Exploring the Chain of Thought in RAG

Retrieval-Augmented Generation (RAG) systems are excellent for querying documents, but adding reasoning capabilities can take them to the next level. In this blog, we’ll build an interactive PDF query assistant using the Chain of Thought (CoT) prompting technique. We’ll explore CoT in detail, walk through the code step-by-step, and show how it enhances the assistant’s ability to provide thoughtful, accurate answers from a PDF.
What is Chain of Thought (CoT)?
Chain of Thought (CoT) is a prompting technique that instructs a language model to break down its reasoning process into explicit, sequential steps before arriving at an answer. Introduced in research to improve large language models’ performance on complex tasks, CoT mimics how humans solve problems by thinking aloud, step-by-step.
How Does CoT Work?
Normally, a language model might provide a direct answer to a query like "What’s the output of this Python code?" With CoT, we prompt the model to:
Analyze the problem piece by piece.
Show intermediate reasoning steps.
Conclude with a final answer.
For example:
Query: "If x = 5 and y = 3, what’s x + y?"
Standard Response: "8"
CoT Response:
"1. We’re given x = 5 and y = 3.
2. The operation is addition, so we compute x + y as 5 + 3.
3. Adding 5 and 3 gives 8.
So, the answer is 8."
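In code, the difference between the two styles is just the prompt text. Here is a minimal sketch (build_prompt is a hypothetical helper written for this illustration, not a library function; the model call itself is omitted):

```python
def build_prompt(question: str, use_cot: bool = False) -> str:
    """Return either a direct prompt or a Chain of Thought prompt."""
    if not use_cot:
        return f"Question: {question}\nAnswer:"
    return (
        f"Question: {question}\n"
        "Let's reason step-by-step:\n"
        "1. Identify the given values.\n"
        "2. Identify the operation.\n"
        "3. Compute the result.\n"
        "So, the answer is:"
    )

print(build_prompt("If x = 5 and y = 3, what's x + y?", use_cot=True))
```

With use_cot=True, the model is steered into producing the numbered reasoning before its final answer; with the default prompt, it is free to answer in one line.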
Why Use CoT?
CoT shines in tasks requiring:
Logical Reasoning: Breaking down multi-step problems (e.g., math or code analysis).
Synthesis: Combining information from multiple sources (e.g., PDF excerpts).
Transparency: Showing how an answer was derived, increasing trust.
CoT in Practice
In our PDF assistant:
Input: A user query and relevant PDF excerpts.
Process: The model uses CoT to identify key information, analyze it, and formulate an answer.
Output: A response with reasoning steps and a concise conclusion.
Benefits of CoT
Improved Accuracy: Explicit steps reduce the chance of skipping critical logic.
Explainability: Users see the thought process, making answers more understandable.
Adaptability: Works for simple factual queries and complex analytical ones.
Limitations of CoT
Verbose Output: Responses are longer due to reasoning steps.
Prompt Sensitivity: The model’s performance depends on how well the CoT prompt is crafted.
Token Limits: Including steps and context might hit model input limits with large PDFs.
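The token-limit concern is easy to estimate. Assuming roughly 4 characters per token (a common rule of thumb for English text, not an exact tokenizer count), the setup used later in this post — 5 retrieved chunks of 2,000 characters each — costs about 2,500 tokens of context per query:

```python
CHUNK_SIZE = 2000      # characters per chunk (matches the splitter settings used later)
NUM_CHUNKS = 5         # chunks retrieved per query
CHARS_PER_TOKEN = 4    # rough rule of thumb for English text

context_chars = CHUNK_SIZE * NUM_CHUNKS
approx_tokens = context_chars // CHARS_PER_TOKEN
print(f"~{approx_tokens} tokens of context per query")  # ~2500 tokens
```

That budget is comfortable for modern models; the risk grows mainly if you raise k or the chunk size.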
System Overview
Load and Split PDF: Extract text from a PDF and chunk it.
Embed and Store: Convert chunks into embeddings and store them in Qdrant.
Retrieve Context: Fetch relevant chunks based on the user’s query.
CoT Prompting: Build a prompt with CoT instructions and context.
Generate Response: Use a language model to produce a reasoned answer.
Interact: Loop for multiple user queries.
Step-by-Step Code Walkthrough
Step 1: Import Required Libraries
import os
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_qdrant import QdrantVectorStore
Step 2: Load and Process the PDF
pdf_path = Path(__file__).parent / "Python Programming.pdf"
loader = PyPDFLoader(file_path=pdf_path)
docs = loader.load()
Step 3: Split the Document into Chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,
    chunk_overlap=200,
)
split_docs = text_splitter.split_documents(documents=docs)
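To build intuition for chunk_size and chunk_overlap, here is a deliberately naive character-window splitter (the real RecursiveCharacterTextSplitter is smarter — it prefers paragraph and sentence boundaries — but the overlap idea is the same):

```python
def naive_split(text: str, chunk_size: int = 2000, chunk_overlap: int = 200) -> list[str]:
    """Slide a window of chunk_size characters; consecutive chunks share chunk_overlap characters."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = naive_split("a" * 5000)
print([len(c) for c in chunks])  # [2000, 2000, 1400]
```

The 200-character overlap means a sentence cut at one chunk boundary still appears intact in the neighboring chunk, which helps retrieval find it.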
Step 4: Generate Embeddings
embedder = GoogleGenerativeAIEmbeddings(
    model="models/text-embedding-004",
    google_api_key="YOUR_API_KEY",  # Replace with your actual key
)
Step 5: Set Up the Vector Store
vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,
    embedding=embedder,
    url="http://localhost:6333",  # Local Qdrant instance
    collection_name="pdf_assistant",
)
Step 6: Initialize the Language Model
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key="YOUR_API_KEY",  # Replace with your actual key
)
Step 7: Define the System Prompt with CoT Emphasis
The system prompt sets the assistant’s role and introduces CoT.
SYSTEM_PROMPT = """
You are a smart PDF assistant designed to help users understand a PDF document’s content. Your task is to provide accurate, clear, and concise responses based solely on the user’s query and the provided PDF excerpts. When answering, use a step-by-step reasoning process (Chain of Thought) to ensure clarity and accuracy. Follow these guidelines:
1. **Query Handling**:
- For summaries, give a high-level overview of key points.
- For specific info (e.g., "What is X?"), extract it directly.
- For explanations (e.g., "Explain Y"), provide a clear overview, adding details if needed.
- For vague queries, assume a general response is needed.
2. **Use Excerpts Only**:
- Rely entirely on the provided excerpts.
- If the info isn’t there, say: "The PDF does not contain this information."
3. **Response Style**:
- Use simple, clear language.
- Show your reasoning step-by-step before giving the final answer.
If the query is unclear, ask for clarification.
"""
- CoT Integration: Emphasizes step-by-step reasoning in the guidelines.
Step 8: Retrieve Relevant Documents
Fetch the top 5 relevant chunks.
def retrieve_documents(vector_store, query, k=5):
    return vector_store.similarity_search(query, k=k)
- k=5: Limits retrieval to 5 chunks for manageable context.
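Under the hood, similarity_search embeds the query and returns the k stored chunks whose embeddings are most similar to it. That ranking can be sketched in plain Python with cosine similarity over toy 3-dimensional vectors (real embeddings such as text-embedding-004 have hundreds of dimensions; the texts and vectors below are made up for illustration):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Rank (text, embedding) pairs by similarity to the query vector; return the top k texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

chunks = [
    ("chapter on loops",   [0.9, 0.1, 0.0]),
    ("chapter on classes", [0.1, 0.9, 0.0]),
    ("index page",         [0.0, 0.1, 0.9]),
]
print(top_k([1.0, 0.0, 0.0], chunks, k=1))  # ['chapter on loops']
```

Qdrant performs the same idea at scale, using an approximate nearest-neighbor index instead of a full sort.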
Step 9: Construct the CoT Prompt
Build a prompt that guides the model through reasoning.
def construct_cot_prompt(query, context):
    cot_prompt = (
        SYSTEM_PROMPT + "\n\n"
        "Based on the following PDF excerpts, answer the question using Chain of Thought reasoning.\n\n"
        "Excerpts:\n"
        f"{context}\n\n"
        "Question: " + query + "\n\n"
        "Let's reason step-by-step:\n"
        "1. Identify the key information in the excerpts related to the question.\n"
        "2. Analyze how this information applies to the question.\n"
        "3. Formulate a clear, concise answer based on the analysis.\n\n"
        "So, the answer is:"
    )
    return cot_prompt
- Structure: Combines system instructions, context, and CoT steps.
- Steps: Ensures a logical progression from identification to conclusion.
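To see what the model actually receives, the same assembly can be run with a shortened stand-in for SYSTEM_PROMPT (everything else mirrors construct_cot_prompt above; the query and excerpt are made up):

```python
SYSTEM_PROMPT = "You are a smart PDF assistant."  # shortened stand-in for the real prompt

def construct_cot_prompt(query, context):
    return (
        SYSTEM_PROMPT + "\n\n"
        "Based on the following PDF excerpts, answer the question using Chain of Thought reasoning.\n\n"
        "Excerpts:\n"
        f"{context}\n\n"
        "Question: " + query + "\n\n"
        "Let's reason step-by-step:\n"
        "1. Identify the key information in the excerpts related to the question.\n"
        "2. Analyze how this information applies to the question.\n"
        "3. Formulate a clear, concise answer based on the analysis.\n\n"
        "So, the answer is:"
    )

print(construct_cot_prompt("What is a list?", "Lists are ordered, mutable sequences."))
```

The trailing "So, the answer is:" nudges the model to finish its reasoning with a single concluding sentence rather than trailing off.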
Step 10: Generate the Response with CoT
def chat_with_cot(query, vector_store, llm):
    retrieved_docs = retrieve_documents(vector_store, query)
    context = "\n\n".join([doc.page_content for doc in retrieved_docs])
    cot_prompt = construct_cot_prompt(query, context)
    response = llm.invoke(cot_prompt)
    return response.content
- Flow: Retrieve, build prompt, generate response.
Step 11: Build the Interactive Loop
Enable continuous querying.
print("Welcome to the PDF Query Assistant with Chain of Thought!")
while True:
    query = input("Ask a question about the PDF (or type 'exit' to quit): ")
    if query.lower() == 'exit':
        print("Goodbye!")
        break
    if not query.strip():
        print("Please enter a valid question.")
        continue
    try:
        answer = chat_with_cot(query, vector_store, llm)
        print("Assistant:", answer)
    except Exception as e:
        print(f"An error occurred: {e}")
Conclusion
This interactive PDF query assistant uses Chain of Thought (CoT) prompting to produce well-reasoned, transparent answers grounded in PDF content. Layering CoT onto the Retrieval-Augmented Generation (RAG) pipeline adds an explicit step of logical analysis, which makes the assistant particularly useful for educational or technical documents where complex information needs to be broken down. To get started, replace the API keys with your own, make sure Qdrant is running locally, and start querying your own PDFs.
Written by shrihari katti