All about Retrieval-Augmented Generation (RAG)


What is RAG?
RAG (Retrieval-Augmented Generation) is a technique that enhances Large Language Models (LLMs) by incorporating information retrieved from external sources. It enables more accurate and up-to-date responses by grounding the model's output in that retrieved, domain-specific knowledge.
How does RAG work?
Let’s understand how RAG works with a practical example. Suppose we have a 100-page PDF explaining a company’s leave policy, and a user wants to quickly understand the rules around sick and casual leaves. Here’s how RAG helps:
Chunking the Data: Chunking means splitting the PDF into smaller pieces, for example by a fixed number of characters or by pages.
Generating Embeddings: Each chunk is then converted into a vector embedding. Think of these as points in a high-dimensional space, where semantically related chunks end up close to each other.
Storing Embeddings in a Vector Database: The embedded chunks are stored in a vector database along with metadata such as the page number.
Embedding the User Query: The user's question is converted into a vector embedding using the same embedding model.
Retrieving Relevant Chunks: The vector database returns the chunks whose embeddings are most similar to the query embedding.
Leveraging LLMs: The user query and the retrieved chunks are fed to the LLM so it can generate a context-based response.
Let’s Build a RAG Application
We’ll use the LangChain library to achieve this. LangChain is a framework that connects LLMs with external data sources to build applications.
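The snippets below assume the relevant packages are installed, roughly (package names can vary slightly with your LangChain version):
pip install langchain langchain-community langchain-text-splitters langchain-openai langchain-qdrant pypdf streamlit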
📄Load PDF
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader
pdf_path = Path(__file__).parent / "node.pdf"
loader = PyPDFLoader(pdf_path)
docs = loader.load()
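A quick optional sanity check: PyPDFLoader returns one Document per page, with the page number kept in the metadata.
# Optional: inspect what was loaded
print(len(docs))                   # number of pages
print(docs[0].metadata)            # e.g. source path and page number
print(docs[0].page_content[:200])  # first 200 characters of the first page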
✂️Chunking using LangChain
from langchain_text_splitters import RecursiveCharacterTextSplitter
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(docs)
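With chunk_size=1000 and chunk_overlap=200, each chunk holds at most about 1,000 characters, and consecutive chunks share 200 characters so sentences cut at a boundary still appear whole in at least one chunk. A quick check:
# Optional: compare page count to chunk count
print(f"{len(docs)} pages -> {len(texts)} chunks")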
🧠Create vector embeddings for the Chunks
import os

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(
    model="text-embedding-3-small",
    api_key=os.getenv("OPENAI_API_KEY"),
)
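This requires the OPENAI_API_KEY environment variable to be set. To see what an embedding looks like, you can embed a single string directly; the question below is just an illustration:
# Illustrative query text, not taken from the PDF
vector = embeddings.embed_query("How many sick leaves are allowed per year?")
print(len(vector))  # 1536 dimensions by default for text-embedding-3-small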
🗃️Store the embeddings in Vector DB (Qdrant)
from langchain_qdrant import QdrantVectorStore

# from_documents embeds the chunks and indexes them in one step,
# so there is no need to call add_documents on the same chunks again
vector_store = QdrantVectorStore.from_documents(
    documents=texts,
    embedding=embeddings,
    collection_name="my_documents",
    url="http://localhost:6333",
)
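This assumes a Qdrant instance is already running locally on port 6333 (for example via Qdrant's official Docker image). Once the chunks are indexed, you can query the store directly; the query string here is only an example:
# Example semantic search against the indexed chunks
results = vector_store.similarity_search("rules for casual leave", k=3)
for doc in results:
    print(doc.metadata.get("page"), doc.page_content[:100])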
⁉️Get User Query
In the Streamlit UI, the question comes from a text input and a button; the qa_chain used here is built in the next two steps and kept in st.session_state.
import streamlit as st

user_query = st.text_input("Ask a question")  # assumed widget label
send_button = st.button("Send")               # assumed widget label

if send_button and user_query and st.session_state.qa_chain:
    with st.spinner("Thinking..."):
        answer = st.session_state.qa_chain.run(user_query)
        st.markdown(f"**Answer:** {answer}")
🔎Search Relevant chunks
retriever = vector_store.as_retriever()  # expose the vector store as a retriever for the QA chain
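You can test the retriever on its own before wiring it into a chain; in recent LangChain versions retrievers are Runnables, so .invoke() returns the matching Document chunks. The question is only an example:
# Example retrieval, independent of any LLM
relevant_chunks = retriever.invoke("What is the casual leave policy?")
print(len(relevant_chunks))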
🤖Integrate the Relevant chunks with LLMs
from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI

qa_chain = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(
        model="gpt-4o-mini",
        api_key=os.getenv("OPENAI_API_KEY"),
    ),
    retriever=retriever,
    chain_type="stuff",
)
st.session_state.qa_chain = qa_chain
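With chain_type="stuff", all retrieved chunks are stuffed into a single prompt together with the question. Outside the Streamlit UI you can also call the chain directly; the question below is just an example:
# Example direct call to the chain
print(qa_chain.run("How many sick leaves can I take in a year?"))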
Let’s Connect
LinkedIn: https://www.linkedin.com/in/revathi-p-22b060208/
Twitter: https://x.com/RevathiP04