Understanding RAG: The Smart Foundation of Advanced AI

Kamraan Mulani
6 min read

📚 What is RAG (Retrieval-Augmented Generation)?

“Take the relevant data and feed it to the prompt.” — Piyush sir
RAG is an AI technique that combines the power of information retrieval with large language models (LLMs) to enhance the accuracy and context awareness of AI-generated responses.
It works by first searching for relevant information in external data sources (like databases, documents, or the web) and then using that information to guide the LLM toward a more accurate and informative response.

🧠 What is Context Window?

The context window is the number of tokens the model can process at a given time.

For example, if the user asks a question (a request), the LLM gives a response, and when the user asks the next question, the LLM answers based on the previous conversation.
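
To get a feel for what a token is, here is a small optional sketch that counts tokens with the tiktoken library (an extra package, not used anywhere else in this project):

import tiktoken

# Tokenizer used by recent OpenAI models
enc = tiktoken.get_encoding("cl100k_base")

text = "Take the relevant data and feed it to the prompt"
tokens = enc.encode(text)

# Every message in a conversation consumes tokens from the same context window
print(len(tokens), "tokens")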

⚙️ A Simple RAG application

A chatbot that scans a PDF and provides responses based on the content of that PDF.

This is the workflow for a PDF chatbot:

First, we have a Data Source, such as a PDF. We break down the data source page by page, a process called Chunking. After chunking, we create embeddings for each chunk and store them in a Vector Store. This entire process is called indexing, or the injection process.

If a user asks a question, we create an embedding for that question and search the vector store with it. The similarity search returns the chunks whose embeddings are closest to the question, so we are left with only the most relevant pieces of the data source.

Finally, we take the user's question and the relevant chunks as input and call OpenAI. Based on this context, the model generates the output.

🛠️ Steps for writing code

We have grasped the core concepts of RAG. Now, let’s translate that knowledge into code and see how it works under the hood.

We will use LangChain. If you don't know what LangChain is, it is an open-source framework designed for building applications powered by large language models (LLMs), especially applications that go beyond simple Q&A by integrating external data sources (like documents, databases, APIs, and tools) and reasoning chains.

📥 Load PDF

This will load the specified PDF.

You can refer to the docs: https://python.langchain.com/docs/integrations/document_loaders/pypdfloader/

Run the command below in your terminal:

pip install langchain_community pypdf

Then add this code to your file:

from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader

pdf_path = Path(__file__).parent / "nodejs.pdf"
loader = PyPDFLoader(file_path=pdf_path)
docs = loader.load()
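
As a quick optional sanity check, you can confirm the PDF loaded; each page becomes one Document:

# Each page of the PDF becomes one Document with page_content and metadata
print(len(docs), "pages loaded")
print(docs[0].metadata)             # source file and page number
print(docs[0].page_content[:200])   # preview of the first page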

✂️ Splitter

This will split the loaded PDF.

You can refer to the docs: https://python.langchain.com/docs/concepts/text_splitters/

Run the command below in your terminal:

pip install langchain_text_splitters

Then add this code to your file:

This splits the text into chunks of 1000 characters each, with a 200-character overlap between consecutive chunks to maintain context.

from langchain_text_splitters import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200,
)

split_docs = text_splitter.split_documents(documents=docs)
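
You can verify the split worked by printing how many chunks were produced (optional):

# Each chunk is still a Document, so metadata like the page number is preserved
print(len(docs), "pages ->", len(split_docs), "chunks")
print(split_docs[0].page_content[:200])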

🧬 Embedder

Now we will create embeddings for each chunk and store them in a vector store. I am using the OpenAI embeddings model for this; you can use Gemini as well.

You can refer to the docs: https://python.langchain.com/docs/integrations/text_embedding/openai/

Run the command below in your terminal:

pip install langchain_openai

Then add this code to your file:

from langchain_openai import OpenAIEmbeddings

embedder = OpenAIEmbeddings(
    model="text-embedding-3-large",
    api_key=""  # paste your OpenAI API key here
)
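
If you want to see what an embedding actually looks like, you can embed a single string (a small optional check):

# embed_query turns one string into a list of floats (the vector)
vector = embedder.embed_query("What is the FS module?")
print(len(vector))  # text-embedding-3-large produces 3072-dimensional vectors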

🧱 Qdrant DB

We will run Qdrant DB in Docker. Create a file named docker-compose.db.yml in your current folder and write the code below in it.

services:
  qdrant:
    image: qdrant/qdrant
    ports:
      - 6333:6333

Run the command below in your terminal.

Before running this command, start Docker Desktop and sign in with your account:

docker compose -f docker-compose.db.yml up

After running the above command, Qdrant will start and print its startup logs in the terminal.

You can view the dashboard of your Qdrant DB on port 6333; use the URL below to open it. On the Collections page you can see your existing collections; at the beginning there will be none.

http://localhost:6333/dashboard
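
To confirm the server is up before injecting anything, you can also hit the Qdrant REST API directly from Python (a small optional check, assuming the requests package is installed):

import requests

# Lists the collections in the running Qdrant instance (empty at first)
print(requests.get("http://localhost:6333/collections").json())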

🤝 Interact with Database

To interact with the DB we will use LangChain.

Run the command below in your terminal:

pip install langchain-qdrant

Then add this code to your file:

from langchain_qdrant import QdrantVectorStore

vector_store = QdrantVectorStore.from_documents(
    documents=[],
    url="http://localhost:6333",
    collection_name="learning_langchain",
    embedding=embedder
)

vector_store.add_documents(documents=split_docs)
print("Injection Done")

After the injection is done, the above code will create a collection for you named learning_langchain. You can view the collection and its vectors in the dashboard: each point has page_content (our chunk), metadata, and the vector embedding for that chunk.
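
If you prefer checking from code instead of the dashboard, a small sketch with the qdrant-client package (installed as a dependency of langchain-qdrant) can count the stored vectors:

from qdrant_client import QdrantClient

client = QdrantClient(url="http://localhost:6333")

# Should match the number of chunks you injected
print(client.count(collection_name="learning_langchain"))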

🔍 Retrieval

Now we are moving on to the retrieval stage. For that, comment out the code shown below, or create a separate file for the retrieval stage.

vector_store = QdrantVectorStore.from_documents(
    documents=[],
    url="http://localhost:6333",
    collection_name="learning_langchain",
    embedding=embedder
)

vector_store.add_documents(documents=split_docs)
print("Injection Done")

Add the retriever code below:

retriever = QdrantVectorStore.from_existing_collection(
    url="http://localhost:6333",
    collection_name="learning_langchain",
    embedding=embedder
)

Now, if the user asks a question, we can run a similarity search for it:

search_result = retriever.similarity_search(
    query="What is FS Module?"
)
print("Relevant Chunks", search_result)

The above code returns the relevant chunks for the query "What is FS Module?".
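
By default similarity_search returns a few chunks; you can control how many with k and inspect where each result came from:

# Ask for the top 3 chunks and print the page each one came from
results = retriever.similarity_search(query="What is FS Module?", k=3)
for doc in results:
    print("Page", doc.metadata.get("page"), "->", doc.page_content[:100])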

💬 System Prompt and Chat Loop

We can add our system prompt for the PDF chatbot and also add a chat loop, as given below:

import os
from langchain.schema import SystemMessage, HumanMessage
from langchain_openai import ChatOpenAI

# Chat model used in the loop below (the model name is just an example; any OpenAI chat model works)
model = ChatOpenAI(model="gpt-4o-mini")

# Function to format system prompt with context and rules
def get_system_prompt(chunks):
    return f"""You are a helpful assistant. Use the given context to answer the question.
If the answer isn't in the context, say "I don't know".
Always cite the page number (e.g., "According to page 5...").

{chunks}
"""

# Clears terminal screen
def clear_screen():
    os.system("cls" if os.name == "nt" else "clear")

# Main chat loop
def main():
    clear_screen()
    print("🤖 PDF GPT Chat\nAsk about Node.js or type 'exit' to quit")

    while True:
        user_query = input("\n👤 You: ")
        if user_query.lower() in ["exit", "quit", "bye"]:
            print("\n👋 Thanks for chatting! Goodbye!")
            break

        # Retrieve top 3 relevant document chunks
        results = retriever.similarity_search(query=user_query, k=3)
        chunks = "\n\n".join([f"[Page {doc.metadata.get('page', '?')}]: {doc.page_content}" for doc in results])

        # Create message history
        messages = [
            SystemMessage(content=get_system_prompt(chunks)),
            HumanMessage(content=user_query),
        ]

        # Get AI response
        print("\n🤖 PDF GPT: ", end="")
        print(model.invoke(messages).content)

if __name__ == "__main__":
    main()

This completes your first simple RAG project, but it is a very basic RAG. There are many RAG techniques; we will discuss them in my next blog.

Output

After successfully implementing the code and running your file, you can ask questions in the terminal and the chatbot answers with the relevant content along with the page number.

📄 Summary

In this blog, I explained Retrieval-Augmented Generation (RAG) and how it can be used to build a PDF chatbot. RAG improves AI responses by combining information retrieval with large language models, leading to more accurate and context-aware answers. The blog gives a step-by-step guide on how to create a simple RAG-based system using LangChain, including loading and breaking down a PDF, embedding and storing data in a Qdrant vector database, and retrieving information to answer user questions. It also includes a system prompt and a chat loop. This is a basic introduction to making a simple RAG project; I will discuss the techniques of RAG in my next blog. Thank you so much for reading. If you have any doubts, comment down below and I will resolve them.
