Exploring the Chain of Thought in RAG

Retrieval-Augmented Generation (RAG) systems are excellent for querying documents, but adding reasoning capabilities can take them to the next level. In this blog, we’ll build an interactive PDF query assistant using the Chain of Thought (CoT) prompting technique. We’ll explore CoT in detail, walk through the code step-by-step, and show how it enhances the assistant’s ability to provide thoughtful, accurate answers from a PDF.
What is Chain of Thought (CoT)?
Chain of Thought (CoT) is a prompting technique that instructs a language model to break down its reasoning process into explicit, sequential steps before arriving at an answer. Introduced in research to improve large language models’ performance on complex tasks, CoT mimics how humans solve problems by thinking aloud, step-by-step.
How Does CoT Work?
Normally, a language model might provide a direct answer to a query like "What’s the output of this Python code?" With CoT, we prompt the model to:
Analyze the problem piece by piece.
Show intermediate reasoning steps.
Conclude with a final answer.
For example:
Query: "If x = 5 and y = 3, what’s x + y?"
Standard Response: "8"
CoT Response:
"1. We’re given x = 5 and y = 3.
2. The operation is addition, so we compute x + y as 5 + 3.
3. Adding 5 and 3 gives 8.
So, the answer is 8."
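In code, the difference between the two styles is just the prompt text. Here is a minimal sketch (build_prompt is a hypothetical helper written for this illustration, not a library function; the model call itself is omitted):

```python
def build_prompt(question: str, use_cot: bool = False) -> str:
    """Return either a direct prompt or a Chain of Thought prompt."""
    if not use_cot:
        return f"Question: {question}\nAnswer:"
    return (
        f"Question: {question}\n"
        "Let's reason step-by-step:\n"
        "1. Identify the given values.\n"
        "2. Identify the operation.\n"
        "3. Compute the result.\n"
        "So, the answer is:"
    )

print(build_prompt("If x = 5 and y = 3, what's x + y?", use_cot=True))
```

With use_cot=True, the model is steered into producing the numbered reasoning before its final answer; with the default prompt, it is free to answer in one line.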
Why Use CoT?
CoT shines in tasks requiring:
Logical Reasoning: Breaking down multi-step problems (e.g., math or code analysis).
Synthesis: Combining information from multiple sources (e.g., PDF excerpts).
Transparency: Showing how an answer was derived, increasing trust.
CoT in Practice
In our PDF assistant:
Input: A user query and relevant PDF excerpts.
Process: The model uses CoT to identify key information, analyze it, and formulate an answer.
Output: A response with reasoning steps and a concise conclusion.
Benefits of CoT
Improved Accuracy: Explicit steps reduce the chance of skipping critical logic.
Explainability: Users see the thought process, making answers more understandable.
Adaptability: Works for simple factual queries and complex analytical ones.
Limitations of CoT
Verbose Output: Responses are longer due to reasoning steps.
Prompt Sensitivity: The model’s performance depends on how well the CoT prompt is crafted.
Token Limits: Including steps and context might hit model input limits with large PDFs.
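The token-limit concern is easy to estimate. Assuming roughly 4 characters per token (a common rule of thumb for English text, not an exact tokenizer count), the setup used later in this post — 5 retrieved chunks of 2,000 characters each — costs about 2,500 tokens of context per query:

```python
CHUNK_SIZE = 2000      # characters per chunk (matches the splitter settings used later)
NUM_CHUNKS = 5         # chunks retrieved per query
CHARS_PER_TOKEN = 4    # rough rule of thumb for English text

context_chars = CHUNK_SIZE * NUM_CHUNKS
approx_tokens = context_chars // CHARS_PER_TOKEN
print(f"~{approx_tokens} tokens of context per query")  # ~2500 tokens
```

That budget is comfortable for modern models; the risk grows mainly if you raise k or the chunk size.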
System Overview
Load and Split PDF: Extract text from a PDF and chunk it.
Embed and Store: Convert chunks into embeddings and store them in Qdrant.
Retrieve Context: Fetch relevant chunks based on the user’s query.
CoT Prompting: Build a prompt with CoT instructions and context.
Generate Response: Use a language model to produce a reasoned answer.
Interact: Loop for multiple user queries.
Step-by-Step Code Walkthrough
Step 1: Import Required Libraries
import os
from pathlib import Path

from langchain_community.document_loaders import PyPDFLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_google_genai import GoogleGenerativeAIEmbeddings, ChatGoogleGenerativeAI
from langchain_qdrant import QdrantVectorStore
Step 2: Load and Process the PDF
pdf_path = Path(__file__).parent / "Python Programming.pdf"
loader = PyPDFLoader(file_path=pdf_path)
docs = loader.load()
Step 3: Split the Document into Chunks
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=2000,
    chunk_overlap=200,
)
split_docs = text_splitter.split_documents(documents=docs)
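To build intuition for chunk_size and chunk_overlap, here is a deliberately naive character-window splitter (the real RecursiveCharacterTextSplitter is smarter — it prefers paragraph and sentence boundaries — but the overlap idea is the same):

```python
def naive_split(text: str, chunk_size: int = 2000, chunk_overlap: int = 200) -> list[str]:
    """Slide a window of chunk_size characters; consecutive chunks share chunk_overlap characters."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = naive_split("a" * 5000)
print([len(c) for c in chunks])  # [2000, 2000, 1400]
```

The 200-character overlap means a sentence cut at one chunk boundary still appears intact in the neighboring chunk, which helps retrieval find it.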
Step 4: Generate Embeddings
embedder = GoogleGenerativeAIEmbeddings(
    model="models/text-embedding-004",
    google_api_key="YOUR_API_KEY",  # Replace with your actual key
)
Step 5: Set Up the Vector Store
vector_store = QdrantVectorStore.from_documents(
    documents=split_docs,
    embedding=embedder,
    url="http://localhost:6333",  # Local Qdrant instance
    collection_name="pdf_assistant",
)
Step 6: Initialize the Language Model
llm = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash",
    google_api_key="YOUR_API_KEY",  # Replace with your actual key
)
Step 7: Define the System Prompt with CoT Emphasis
The system prompt sets the assistant’s role and introduces CoT.
SYSTEM_PROMPT = """
You are a smart PDF assistant designed to help users understand a PDF document’s content. Your task is to provide accurate, clear, and concise responses based solely on the user’s query and the provided PDF excerpts. When answering, use a step-by-step reasoning process (Chain of Thought) to ensure clarity and accuracy. Follow these guidelines:
1. **Query Handling**:
- For summaries, give a high-level overview of key points.
- For specific info (e.g., "What is X?"), extract it directly.
- For explanations (e.g., "Explain Y"), provide a clear overview, adding details if needed.
- For vague queries, assume a general response is needed.
2. **Use Excerpts Only**:
- Rely entirely on the provided excerpts.
- If the info isn’t there, say: "The PDF does not contain this information."
3. **Response Style**:
- Use simple, clear language.
- Show your reasoning step-by-step before giving the final answer.
If the query is unclear, ask for clarification.
"""
- CoT Integration: Emphasizes step-by-step reasoning in the guidelines.
Step 8: Retrieve Relevant Documents
Fetch the top 5 relevant chunks.
def retrieve_documents(vector_store, query, k=5):
    return vector_store.similarity_search(query, k=k)
- k=5: Limits retrieval to 5 chunks for manageable context.
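Under the hood, similarity_search embeds the query and returns the k stored chunks whose embeddings are most similar to it. That ranking can be sketched in plain Python with cosine similarity over toy 3-dimensional vectors (real embeddings such as text-embedding-004 have hundreds of dimensions; the texts and vectors below are made up for illustration):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Rank (text, embedding) pairs by similarity to the query vector; return the top k texts."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

chunks = [
    ("chapter on loops",   [0.9, 0.1, 0.0]),
    ("chapter on classes", [0.1, 0.9, 0.0]),
    ("index page",         [0.0, 0.1, 0.9]),
]
print(top_k([1.0, 0.0, 0.0], chunks, k=1))  # ['chapter on loops']
```

Qdrant performs the same idea at scale, using an approximate nearest-neighbor index instead of a full sort.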
Step 9: Construct the CoT Prompt
Build a prompt that guides the model through reasoning.
def construct_cot_prompt(query, context):
    cot_prompt = (
        SYSTEM_PROMPT + "\n\n"
        "Based on the following PDF excerpts, answer the question using Chain of Thought reasoning.\n\n"
        "Excerpts:\n"
        f"{context}\n\n"
        "Question: " + query + "\n\n"
        "Let's reason step-by-step:\n"
        "1. Identify the key information in the excerpts related to the question.\n"
        "2. Analyze how this information applies to the question.\n"
        "3. Formulate a clear, concise answer based on the analysis.\n\n"
        "So, the answer is:"
    )
    return cot_prompt
- Structure: Combines system instructions, context, and CoT steps.
- Steps: Ensures a logical progression from identification to conclusion.
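To see what the model actually receives, the same assembly can be run with a shortened stand-in for SYSTEM_PROMPT (everything else mirrors construct_cot_prompt above; the query and excerpt are made up):

```python
SYSTEM_PROMPT = "You are a smart PDF assistant."  # shortened stand-in for the real prompt

def construct_cot_prompt(query, context):
    return (
        SYSTEM_PROMPT + "\n\n"
        "Based on the following PDF excerpts, answer the question using Chain of Thought reasoning.\n\n"
        "Excerpts:\n"
        f"{context}\n\n"
        "Question: " + query + "\n\n"
        "Let's reason step-by-step:\n"
        "1. Identify the key information in the excerpts related to the question.\n"
        "2. Analyze how this information applies to the question.\n"
        "3. Formulate a clear, concise answer based on the analysis.\n\n"
        "So, the answer is:"
    )

print(construct_cot_prompt("What is a list?", "Lists are ordered, mutable sequences."))
```

The trailing "So, the answer is:" nudges the model to finish its reasoning with a single concluding sentence rather than trailing off.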
Step 10: Generate the Response with CoT
def chat_with_cot(query, vector_store, llm):
    retrieved_docs = retrieve_documents(vector_store, query)
    context = "\n\n".join([doc.page_content for doc in retrieved_docs])
    cot_prompt = construct_cot_prompt(query, context)
    response = llm.invoke(cot_prompt)
    return response.content
- Flow: Retrieve, build prompt, generate response.
Step 11: Build the Interactive Loop
Enable continuous querying.
print("Welcome to the PDF Query Assistant with Chain of Thought!")
while True:
    query = input("Ask a question about the PDF (or type 'exit' to quit): ")
    if query.lower() == 'exit':
        print("Goodbye!")
        break
    if not query.strip():
        print("Please enter a valid question.")
        continue
    try:
        answer = chat_with_cot(query, vector_store, llm)
        print("Assistant:", answer)
    except Exception as e:
        print(f"An error occurred: {e}")
Conclusion
This interactive PDF query assistant uses Chain of Thought (CoT) prompting to produce well-reasoned, transparent answers grounded in PDF content. Layering CoT onto the Retrieval-Augmented Generation (RAG) pipeline adds an explicit step of logical analysis, which makes the assistant particularly useful for educational or technical documents where complex information needs to be broken down. To get started, replace the API keys with your own, make sure Qdrant is running locally, and start querying your own PDFs.
Written by shrihari katti