All about Retrieval-Augmented Generation (RAG)


What is RAG?
RAG (Retrieval-Augmented Generation) is a technique that enhances Large Language Models (LLMs) by incorporating information retrieved from external sources. It enables more accurate and up-to-date responses by grounding the model's output in that retrieved, domain-specific knowledge.
How does RAG work?
Let’s understand how RAG works with a practical example. Suppose we have a 100-page PDF explaining a company’s leave policy, and a user wants to quickly understand the rules around sick and casual leaves. Here’s how RAG helps:
Chunking the Data: Chunking means splitting the PDF into smaller pieces, for example by a fixed number of characters or by pages.
Generating Embeddings: Each chunk is then converted into a vector embedding. Think of these as points in a high-dimensional space, where semantically related chunks end up close to each other.
Storing Embeddings in a Vector Database: The embedded chunks are stored in a vector database along with metadata such as the page number.
Embedding the User Query: The user's question is converted into a vector embedding using the same embedding model.
Retrieving Relevant Chunks: The vector database returns the chunks whose embeddings are most similar to the query embedding.
Leveraging LLMs: The user query and the retrieved chunks are fed to the LLM so it can generate a context-based response.
Let’s Build a RAG Application
We’ll use the LangChain library to achieve this. LangChain is a framework that connects LLMs with external data sources to build applications.
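The snippets below assume the relevant packages are installed, roughly (package names can vary slightly with your LangChain version):
pip install langchain langchain-community langchain-text-splitters langchain-openai langchain-qdrant pypdf streamlit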
📄Load PDF
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader
pdf_path = Path(__file__).parent / "node.pdf"
loader = PyPDFLoader(pdf_path)
docs = loader.load()
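A quick optional sanity check: PyPDFLoader returns one Document per page, with the page number kept in the metadata.
# Optional: inspect what was loaded
print(len(docs))                   # number of pages
print(docs[0].metadata)            # e.g. source path and page number
print(docs[0].page_content[:200])  # first 200 characters of the first page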
✂️Chunking using LangChain
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(docs)
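With chunk_size=1000 and chunk_overlap=200, each chunk holds at most about 1,000 characters, and consecutive chunks share 200 characters so sentences cut at a boundary still appear whole in at least one chunk. A quick check:
# Optional: compare page count to chunk count
print(f"{len(docs)} pages -> {len(texts)} chunks")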
🧠Create vector embeddings for the Chunks
import os

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY"),
)
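This requires the OPENAI_API_KEY environment variable to be set. To see what an embedding looks like, you can embed a single string directly; the question below is just an illustration:
# Illustrative query text, not taken from the PDF
vector = embeddings.embed_query("How many sick leaves are allowed per year?")
print(len(vector))  # 1536 dimensions by default for text-embedding-3-small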
🗃️Store the embeddings in Vector DB (Qdrant)
from langchain_qdrant import QdrantVectorStore

# from_documents embeds the chunks and indexes them in one step,
# so there is no need to call add_documents on the same chunks again
vector_store = QdrantVectorStore.from_documents(
    documents=texts,
    embedding=embeddings,
    collection_name="my_documents",
    url="http://localhost:6333",
)
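This assumes a Qdrant instance is already running locally on port 6333 (for example via Qdrant's official Docker image). Once the chunks are indexed, you can query the store directly; the query string here is only an example:
# Example semantic search against the indexed chunks
results = vector_store.similarity_search("rules for casual leave", k=3)
for doc in results:
    print(doc.metadata.get("page"), doc.page_content[:100])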
⁉️Get User Query
In the Streamlit UI, the question comes from a text input and a button; the qa_chain used here is built in the next two steps and kept in st.session_state.
import streamlit as st

user_query = st.text_input("Ask a question")  # assumed widget label
send_button = st.button("Send")               # assumed widget label

if send_button and user_query and st.session_state.qa_chain:
    with st.spinner("Thinking..."):
        answer = st.session_state.qa_chain.run(user_query)
        st.markdown(f"**Answer:** {answer}")
🔎Search Relevant chunks
retriever = vector_store.as_retriever()  # expose the vector store as a retriever for the QA chain
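You can test the retriever on its own before wiring it into a chain; in recent LangChain versions retrievers are Runnables, so .invoke() returns the matching Document chunks. The question is only an example:
# Example retrieval, independent of any LLM
relevant_chunks = retriever.invoke("What is the casual leave policy?")
print(len(relevant_chunks))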
🤖Integrate the Relevant chunks with LLMs
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(
        model="gpt-4o-mini",
        api_key=os.getenv("OPENAI_API_KEY"),
    ),
    retriever=retriever,
    chain_type="stuff",
)
st.session_state.qa_chain = qa_chain
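With chain_type="stuff", all retrieved chunks are stuffed into a single prompt together with the question. Outside the Streamlit UI you can also call the chain directly; the question below is just an example:
# Example direct call to the chain
print(qa_chain.run("How many sick leaves can I take in a year?"))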
Let’s Connect
LinkedIn: https://www.linkedin.com/in/revathi-p-22b060208/
Twitter: https://x.com/RevathiP04