Linking Logic: The Power of Chain of Thought in AI


Chain of Thought - CoT
🤓 Explanation
First, we take the user's query, wrap it in a system prompt, and send it to an LLM such as OpenAI's GPT or Google's Gemini.
The LLM expands the query into 3, 5, or 10 related queries, giving us an array of questions.
Let us say it creates 5 queries for us.
For each of these five queries, we run a similarity search against a vector database, which returns the most relevant chunks (vector-embedded pieces of our documents).
Once we have chunks for each generated query, we send Query 1 with its chunks to the LLM to get Response 1.
Response 1 is then fed as input to Query 2, which goes through the same retrieval and generation steps to produce Response 2, and so on down the chain.
After all 5 responses are complete, we collect them, attach the original user query, and send the whole bundle to the LLM (here, OpenAI) so it can synthesize the most relevant response from this broader context.
📖 Step-by-Step Explanation of the Diagram
Step 1: 🤔 Ask the LLM
What we do: Take the user’s original question and send it to an LLM (like OpenAI or Gemini).
Why? To get 3, 5, or 10 related questions; here, let us say we get 5 related queries. (A sketch of such a query-generation helper is shown right after this step.)
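The implementation below relies on a generate_multiple_queries helper. Here is a minimal sketch of what it could look like, assuming a LangChain ChatOpenAI model; the prompt wording and model name are illustrative assumptions, not the exact ones from the project:

from langchain_core.messages import SystemMessage, HumanMessage
from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-4o-mini")  # assumed model; the project may use another

def generate_multiple_queries(original_query, num_queries=5):
    # Ask the LLM to rephrase the user's question from several angles.
    system_prompt = (
        f"Generate {num_queries} different versions of the user's question, "
        f"each exploring a different angle or subtopic. "
        f"Return one question per line, with no numbering."
    )
    messages = [
        SystemMessage(content=system_prompt),
        HumanMessage(content=original_query),
    ]
    response = model.invoke(messages)
    # Split the LLM output into a clean list of query strings.
    return [line.strip() for line in response.content.split("\n") if line.strip()]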
Step 2: 🔍 Search Smart with Vectors
What we do: Take each generated question and do a similarity search using a vector database like Qdrant or Pinecone.
Why? To find useful, related chunks of content for each query.
Step 3: 📦 Chunk It Up
What we do: Collect the top matching chunks returned by the vector DB for each query.
Why? These chunks are your knowledge nuggets for crafting answers later. (A sketch of the retriever setup follows this step.)
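For context, here is a minimal sketch of how the retriever used in the implementation below could be set up with LangChain and Qdrant. The collection name, URL, and embedding model are placeholder assumptions; the real setup lives in the PDF Chatbot project:

from langchain_openai import OpenAIEmbeddings
from langchain_qdrant import QdrantVectorStore

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")  # assumed embedding model

# Named 'retriever' to match the implementation below; this is a vector store
# object that exposes similarity_search() over an already-indexed PDF.
retriever = QdrantVectorStore.from_existing_collection(
    embedding=embeddings,
    collection_name="pdf_chatbot",  # placeholder collection name
    url="http://localhost:6333",    # placeholder Qdrant URL
)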
Step 4: 🧠 Generate Response 1
What we do: Give Query 1 along with its chunks to the LLM to create Response 1.
Why? We are starting the chain; this is our base response.
Step 5: 🔁 Chain Reactions Begin
What we do: Feed Response 1 as input to Query 2, and repeat: get chunks → send to LLM → generate Response 2. Continue until you have all 5 responses.
Why? Each response builds on the last, getting smarter and more contextual.
Step 6: 📚 Bundle the Brains
What we do: Now we have 5 smart responses. Take all 5 and combine them with the original query.
Why? This gives us a zoomed-out, holistic view of the answer space.
Step 7: 🧠✨ Final LLM Fusion
What we do: Send the bundle to the LLM again.
Why? To create a super-relevant final response, one that understands depth, breadth, and user intent.
⚙️ Implementation
💻 Code Example
I have previously posted a blog, “Understanding RAG: The Smart Foundation of Advanced AI”, in which I explain step by step how to implement RAG with a simple project, “PDF Chatbot”. On top of that project, I have built this Chain of Thought (CoT) pipeline.
Here is the code for Chain of Thought (CoT); you can add it on top of the “PDF Chatbot” code to implement it.
If you run into any issue, you can comment, or you can check my code on GitHub: https://github.com/Kamraanmulani
# SystemMessage and HumanMessage come from LangChain; model, retriever,
# get_system_prompt and print_box are defined in the PDF Chatbot project
# this code builds on.
from langchain_core.messages import SystemMessage, HumanMessage

def chain_of_thought_rag(original_query, num_queries=5):
    # Step 1: expand the user's question into several related queries.
    print("Generating queries...")
    queries = generate_multiple_queries(original_query, num_queries)
    print("\nGenerated queries:")
    for i, query in enumerate(queries, 1):
        print(f"{i}. {query}")

    responses = []
    previous_response = ""
    for i, query in enumerate(queries, 1):
        print(f"\nStep {i}/{len(queries)}...")
        # Step 5: from the second query onward, chain in the previous response.
        current_query = query
        if i > 1 and previous_response:
            current_query = f"Based on this information: '{previous_response}', {query}"
        # Steps 2-3: retrieve the most relevant chunks for the (chained) query.
        search_results = retriever.similarity_search(query=current_query, k=3)
        relevant_chunks_with_sources = []
        for doc in search_results:
            page_number = doc.metadata.get("page", "Unknown page")
            relevant_chunks_with_sources.append(
                f"[Page {page_number}]: {doc.page_content}"
            )
        relevant_chunks = "\n\n".join(relevant_chunks_with_sources)
        # Step 4: generate this step's response from the retrieved context.
        messages = [
            SystemMessage(content=get_system_prompt(relevant_chunks)),
            HumanMessage(content=current_query),
        ]
        step_response = model.invoke(messages)
        responses.append(step_response.content)
        previous_response = step_response.content
        print_box(step_response.content, title=f"Response {i}")

    # Steps 6-7: bundle all responses with the original query and synthesize.
    print("\nGenerating final response...")
    final_system_prompt = """You are a helpful assistant. You've been given multiple perspectives
on a question through a chain of thought process. Synthesize these perspectives into a
comprehensive, accurate answer. Focus on providing the most relevant information, eliminating
redundancies, and ensuring clarity. Include specific page references from the sources."""
    final_messages = [
        SystemMessage(content=final_system_prompt),
        HumanMessage(content=f"Original question: {original_query}\n\nChain of thought responses:\n" +
                     "\n\n".join([f"Response {i+1}: {resp}" for i, resp in enumerate(responses)])),
    ]
    final_response = model.invoke(final_messages)
    return final_response.content
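A quick usage example, assuming the PDF Chatbot setup is in place and a PDF has already been indexed (the question here is just an illustration):

if __name__ == "__main__":
    user_question = "What are the key findings discussed in this PDF?"  # example question
    answer = chain_of_thought_rag(user_question, num_queries=5)
    print_box(answer, title="Final Response")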
✅ Code Execution Result
🤔 Why? How?
🧐 Why is Chain of Thought (CoT) Reasoning Used?
Chain of Thought (CoT) reasoning is used to improve the quality and depth of answers generated by language models. Rather than jumping to a conclusion, CoT mimics human-like thinking, breaking down a complex question into smaller parts and processing each step thoughtfully.
This technique:
Helps in tackling multi-layered or ambiguous questions.
Encourages step-by-step reasoning, reducing the chances of incorrect answers.
Extracts richer insights by exploring the question from multiple perspectives.
In short, CoT ensures that the model does not just blurt out an answer; it thinks before it speaks.
⚙️ How Does Chain of Thought Reasoning Work?
The CoT technique works by generating multiple related queries for a single user question. These queries are variations that explore different angles or subtopics of the main question. Here's how it unfolds:
🧠 Original Query Expansion: The user asks a question. The system generates 3–5 different versions of that question (e.g., rephrased or focused on different parts).
🔍 Retrieval for Each Query: For every version, the system searches the document (like a PDF or database) to retrieve the most relevant chunks.
💬 Chained Responses: The language model processes each retrieved context and query pair, producing an individual response that also feeds into the next query.
🧩 Final Synthesis: All responses are combined into one comprehensive answer that merges the best insights, removes repetition, and adds clarity to the final response.
🌍 Real-life Applications:
Chain of Thought (CoT) in Retrieval Augmented Generation (RAG) is excellent for situations where decisions require careful consideration. By breaking down complex questions and examining each part logically, CoT in RAG enhances the accuracy, detail, and context awareness of answers. It is particularly useful in fields like education, finance, research, and tech support.
📚 1. AI Study Tutor (Education Tech)
Use Case:
A student asks: “Can you help me understand Newton’s laws and how they apply to car crashes?”
This question involves both theory and application. CoT creates steps like:
“What are Newton’s three laws of motion?”
“How does Newton’s second law apply to collisions?”
“Examples of Newton’s laws in real-life car accidents”
📌 Why it works:
Instead of just providing a basic definition, CoT explains the concept step-by-step and connects it to real situations, making it easier to understand.
💰 2. Financial Advisory Bot
Use Case:
A user types: “Should I invest in mutual funds or crypto in 2025?”
This is a complex decision. CoT breaks it down with steps like:
“What are the risks and benefits of mutual funds in 2025?”
“Crypto market predictions for 2025”
“How to choose investments based on risk tolerance?”
📌 Why it works:
Each step leads to more personalized advice. Instead of guessing or oversimplifying, the system builds a well-informed response by linking economic forecasts and personal preferences.
📑 Summary
Chain of Thought (CoT) reasoning enhances the quality of answers from language models by breaking down complex questions into simpler steps. By transforming one question into several related ones, CoT allows the model to explore different aspects of the main topic, resulting in a thorough and informed answer.
This approach is particularly useful in fields like education and finance, where detailed analysis is crucial. The article provides a code example of CoT in a PDF chatbot project, illustrating how it aids in making better decisions by eliminating unnecessary details and ensuring clear answers.