Let's build Smart Agents (RAG on steroids)!

If you are new to Generative AI or curious about how LLMs can interact with real objects, make inferences, and work for you, then this article is for you!
We will not go into the details of what LLMs are; if you want to know the magic behind how inference works, please refer to my previous articles.
Building with Traditional RAG
Smart Question-Answering Bot that can read PDFs
We will build a smart question-answering bot that can read PDFs and answer questions, powered by LangChain and LLaMA models.
Quick Refresher on RAG
Step 1: PDF Loading
- Use PyMuPDFLoader to load the PDF content as text.
# Load the PDF file
from langchain_community.document_loaders import PyMuPDFLoader

pdf_path = "/content/bengalicookbook.pdf"  # Replace with your PDF
loader = PyMuPDFLoader(pdf_path)
documents = loader.load()
Step 2: Text Chunking
RecursiveCharacterTextSplitter splits long content into chunks.
Chunk size: 500 characters, overlap: 50 characters for context retention.
# Split into chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)
Step 3: Embedding & Storage
Embed via HuggingFaceEmbeddings (all-MiniLM-L6-v2)
Store the vectorized chunks in a Chroma DB
# 📊 Embed and store in Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vector_store = Chroma.from_documents(chunks, embedding=embedding_model, persist_directory="./chroma_db")
Step 4: Retrieval
- Setup a retriever to fetch relevant chunks for any question.
# Use LLaMA3-70B from Groq
from langchain.chains import RetrievalQA
from langchain_groq import ChatGroq

llm = ChatGroq(model_name="llama3-70b-8192", temperature=0)

# Retriever over the Chroma vector store
retriever = vector_store.as_retriever()

# Set up RetrievalQA
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)
Step 5: Answering with Groq
- Use ChatGroq with LLaMA via the Groq API for fast responses.
query = input("You: ")
result = qa_chain({"query": query})
answer = result['result'].strip()
print("Bot:", answer)
If you want to run it directly on Google Colab, here is the link: https://colab.research.google.com/drive/1sfo_gHIyTk91jd5vtGEuB0jmj6_7Rw3L?usp=sharing
That's it! Now you can upload a bunch of PDFs to a folder and have a bot answer questions about their contents.
Combining RAG with agentic workflows
In the previous example we saw how traditional RAG can be used for a single use case.
In its standalone form, Retrieval-Augmented Generation (RAG) is reactive:
it "retrieves once, generates once".
Let's say the task is to "summarize viewpoints from multiple patients' behavioral data, come up with a viewpoint, critique its own response, and generate more balanced feedback".
Traditional RAG:
X Lacks planning and reasoning
X Assumes whatever data is provided is "right"
X Cannot parallelize
X Does not "learn": no correction loop
Enter the world of Agentic RAG - “From Retrieval to Autonomous Reasoning”
flowchart TD
%%Traditional RAG
A1[User Query] --> B1[Retriever]
B1 --> C1[Context Documents]
C1 --> D1[LLM Generation]
D1 --> E1[Final Answer]
%%Agentic RAG
A2[User Query] --> B2[Query Refiner Agent]
B2 --> C2a[Retriever: Web Search]
B2 --> C2b[Retriever: Vector DB]
C2a --> D2a[Reader: Web Docs]
C2b --> D2b[Reader: Vector Docs]
D2a --> E2[Synthesizer Agent]
D2b --> E2
E2 --> F2[Critic Agent]
F2 --> G2[Final Answer]
Let's use a query use case that you might run on a daily basis and don't need a blog to read from :)
“Summarize today’s top AI research news and breakthroughs.”
If we were to break the above down into tasks:
Collect all the AI-related articles, research papers, or news
Embed and store them in a vector DB
Retrieve the most relevant ones based on your query
Summarize them using an LLM.
Simple, right? We will write some code to bring this together using the following steps.
You can run the detailed code from https://colab.research.google.com/drive/1IwxsCJAg1w575TPbuRhRScl6Dqd3BeTa?usp=sharing
Build a Basic RAG System for “Today’s AI Research News”
# Fetch daily articles from arXiv, a few blogs you follow, and GitHub trending
from langchain.schema import Document

def fetch_all_sources() -> list[Document]:
    docs = []
    print("Fetching ArXiv papers...")
    docs += fetch_arxiv_docs()
    print("Fetching blog articles...")
    docs += fetch_blog_articles(blog_urls)
    print("Fetching GitHub trending projects...")
    docs += fetch_github_trending()
    return docs
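The three fetch helpers are defined in the linked Colab notebook. As an illustration only, here is a minimal sketch of what a fetch_arxiv_docs helper could look like, using feedparser against the public arXiv Atom API; the notebook's implementation may differ.

# Hypothetical sketch of an arXiv fetcher (illustrative, not the notebook's exact code)
import feedparser
from langchain.schema import Document

def fetch_arxiv_docs(max_results: int = 10) -> list[Document]:
    # Query the arXiv Atom API for the latest cs.AI submissions
    url = (
        "http://export.arxiv.org/api/query?"
        f"search_query=cat:cs.AI&sortBy=submittedDate&sortOrder=descending&max_results={max_results}"
    )
    feed = feedparser.parse(url)
    docs = []
    for entry in feed.entries:
        # Keep the abstract as content, and the title/link as metadata
        docs.append(Document(
            page_content=entry.summary,
            metadata={"title": entry.title, "source": entry.link},
        ))
    return docs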
# Save the fetched data as documents and embeddings into the vector DB
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Fetch full documents from arXiv, blogs, GitHub
docs = fetch_all_sources()
# Split into chunks before embedding (mirrors the earlier chunk settings)
split_docs = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)
# Embed and store in Chroma (vector DB)
embedding = OpenAIEmbeddings()
db = Chroma.from_documents(split_docs, embedding, persist_directory="./ai_daily_db")
db.persist()
# Now query / retrieve and use the LLM to generate text
from langchain.chains import RetrievalQA
from langchain_openai import OpenAI

retriever = db.as_retriever(search_kwargs={"k": 5})
# Please refer to my previous article on fine-tuning to understand the parameters
llm = OpenAI(temperature=0)
rag_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
query = "Summarize today’s top AI research news and breakthroughs."
print(rag_chain.run(query))
This standard RAG pipeline generates:
“ Today's top AI research news and breakthroughs include the growing use of AI in various industries such as healthcare and robotics, the development of edge computing technology, and the application of AI in complex areas such as conflict resolution. Additionally, there is ongoing research and discussion on the governance framework for physical artificial intelligence (PAI) and its potential impact on society.”
Modularize into Agent Roles
Now we break the pipeline into query refiner, retriever, reader, synthesizer, and critic agents so that the process can improve, reason, and parallelize its operations.
Query Refiner Agent
This agent is responsible for cleaning up or expanding the user query.
As you can see below:
We will maintain a history of past queries
Use an LLM prompt to rewrite the current query based on that history and the current date.
# Query refiner agent
from datetime import date

def refine_query(user_query: str, llm, past_queries: list[str] = None, past_answers: list[str] = None) -> str:
    today = date.today().strftime("%B %d, %Y")
    history_prompt = ""
    if past_queries and past_answers:
        past_qas = "\n".join(
            [f"Q: {q}\nA: {a}" for q, a in zip(past_queries[-3:], past_answers[-3:])]
        )
        history_prompt = f"\n--- Past Conversations ---\n{past_qas}"
    prompt = f"""
You are a query rewriting assistant for an AI research analyst. Today's date is {today}.
A user has asked:
"{user_query}"
{history_prompt}
Your job is to rewrite this query to:
- Improve clarity
- Include specific temporal markers (e.g. replace "today" with the real date)
- Add missing AI-relevant keywords like "transformers", "LLMs", "retrieval", etc.
- Leverage previous questions and answers to suggest a more informative query
Return only the rewritten query.
"""
    return llm.predict(prompt).strip()
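As a quick usage sketch, assuming an llm instance like the ones set up earlier, you would call the refiner while passing the running history; the history values below are purely illustrative.

# Hypothetical usage of refine_query with a running history
past_queries = ["What's new in LLM research?"]
past_answers = ["Recent work focuses on retrieval-augmented generation and agentic workflows."]

refined = refine_query(
    "Summarize today's top AI research news and breakthroughs.",
    llm,
    past_queries=past_queries,
    past_answers=past_answers,
)
print(refined)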
Data Retriever Agent
Retrieves the relevant documents from the vector DB store.
def retrieve_documents(query: str, vectorstore, k=5):
    retriever = vectorstore.as_retriever(search_kwargs={"k": k})
    return retriever.get_relevant_documents(query)
Data Reader Agent
Summarizes each retrieved document.
def summarize_document(doc, llm) -> str:
    prompt = f"Summarize the following:\n\n{doc.page_content}"
    return llm.predict(prompt)
Answer Synthesizer Agent
Compiles a comprehensive answer from the summaries. Note how the prompt is used to change the format of the answer.
# Synthesize answers agent
def synthesize_answer(summaries: list[str], query: str, llm) -> str:
    joined = "\n".join(summaries)
    prompt = f"""Using the following information, summarize today's top AI research for this query:
"{query}"
{joined}
Output a 3-5 bullet summary."""
    return llm.predict(prompt)
Critic Agent
After the Synthesizer agent generates the answer, the Critic agent checks it for quality and hallucinations.
def critique_answer(answer: str, query: str, llm) -> dict:
    prompt = f"""
You are an AI answer critic. Evaluate the following generated answer to the user's query.
--- User Query ---
{query}
--- Answer ---
{answer}
Provide:
1. A brief critique (1-3 sentences)
2. A verdict: either "Pass" or "Needs Revision"
Respond in this format:
Critique: <your critique>
Verdict: <Pass or Needs Revision>
"""
    output = llm.predict(prompt)
    # Parse result
    lines = output.strip().split("\n")
    critique_text = ""
    verdict = "Pass"
    for line in lines:
        if line.lower().startswith("critique:"):
            critique_text = line[len("Critique:"):].strip()
        if line.lower().startswith("verdict:"):
            verdict = line[len("Verdict:"):].strip()
    return {"critique": critique_text, "verdict": verdict}
Now bring them together and run all the agents sequentially. The answer generated by the Synthesizer agent is:
AI is being utilized in various industries, including healthcare and robotics.
Edge computing is being developed to improve data processing at the edge of a network.
AI has the potential to address global challenges and transform government processes.
The concept of physical AI and its potential governance problems are being discussed and researched.
Further research and discussions are needed to fully address these problems.
The Critique: The AI-generated answer does not fully address the query as it only mentions a few general topics related to AI research, but does not provide any specific news or breakthroughs from today. It also includes some unfounded claims, such as the potential for AI to transform government processes without providing any evidence or examples. The answer is somewhat clear and well-structured, but it lacks completeness and specificity.
Critic Verdict: Needs Revision
Now you can use the verdict to drive an agentic flow with a "while" loop, re-running the pipeline and refreshing the documents the answer is generated from, as sketched below.
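Here is a minimal sketch of that correction loop, assuming the agent functions defined above, the db vector store, and an llm instance; max_rounds and the refresh strategy are illustrative choices, not part of the original notebook.

# Hypothetical correction loop over the agents defined above
def run_agentic_flow(user_query: str, llm, db, max_rounds: int = 3) -> str:
    past_queries, past_answers = [], []
    answer = ""
    for _ in range(max_rounds):
        # Refine -> retrieve -> read -> synthesize -> critique
        refined = refine_query(user_query, llm, past_queries, past_answers)
        docs = retrieve_documents(refined, db)
        summaries = [summarize_document(d, llm) for d in docs]
        answer = synthesize_answer(summaries, refined, llm)
        review = critique_answer(answer, refined, llm)
        past_queries.append(refined)
        past_answers.append(answer)
        if review["verdict"].lower().startswith("pass"):
            break
        # Otherwise loop again; a real system might also re-fetch fresh documents here
    return answer

print(run_agentic_flow("Summarize today's top AI research news and breakthroughs.", llm, db))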
Orchestrate the agents as MCP controllers
Fun fact: MCP stands for "Master Control Program", a reference to the movie Tron.
In the agentic world, MCP is a metaphor for systems that let you set up agents through libraries like LangGraph, AutoGen, etc. Think of it as a command center that brings all the agents together.
In this example we will use LangGraph to show how to control the agents we defined above, and we will implement one parallel path.
flowchart TD
A[User Query Input] --> B[QueryRefiner Agent]
B --> C[Retriever Agent]
C --> D[Reader Agent]
D --> E[Synthesizer Agent]
E --> F[Critic Agent]
F -->|Pass| G[Final Answer Output]
F -->|Needs Revision| C
subgraph MCP[MCP Controller]
B
C
D
E
F
end
style G fill:#c6f6d5,stroke:#2f855a,stroke-width:2px
style MCP fill:#f0f0f0,stroke:#999,stroke-dasharray: 5 5
The Reader agent will need to process many documents at the same time, so we will implement a parallel path to speed up the process; a code sketch follows the diagram below.
flowchart TD
A[Retriever Agent] --> B{{Reader Agent}}
B -->|parallel| B1[Doc 1 Summary]
B -->|parallel| B2[Doc 2 Summary]
B -->|parallel| B3[Doc 3 Summary]
B1 & B2 & B3 --> C[Synthesizer Agent]
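The fan-out in the diagram can be approximated with a simple thread pool around the summarize_document agent from above; this is a minimal sketch, and the Colab notebook may wire the parallelism differently (for example inside the Reader node itself).

# Hypothetical parallel Reader: summarize retrieved docs concurrently
from concurrent.futures import ThreadPoolExecutor

def summarize_documents_parallel(docs, llm, max_workers: int = 4) -> list[str]:
    # Each document summary is an independent LLM call, so they can run side by side
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda d: summarize_document(d, llm), docs))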
We will create the nodes for the MCP-based LangGraph:
Query Refiner Node, Retriever Node, Reader Node, Synthesizer Node, Critic Node. A sketch of the shared state and one node wrapper is shown below.
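The node functions and the shared state are defined in the Colab. As an illustration, here is a minimal sketch of what the state schema, one node wrapper, and the verdict router could look like; the field names and the global llm are assumptions, and the remaining nodes (retriever_node, reader_node, synthesizer_node, critic_node) would follow the same pattern.

from typing import TypedDict
from langgraph.graph import StateGraph, END

# Illustrative shared state passed between the nodes
class AgentState(TypedDict, total=False):
    query: str
    refined_query: str
    docs: list
    summaries: list[str]
    answer: str
    verdict: str

# Example node: wraps the refine_query agent and returns a state update
def query_refiner_node(state: AgentState) -> dict:
    refined = refine_query(state["query"], llm)
    return {"refined_query": refined}

# Router used by the conditional edge after the Critic node
def verdict_router(state: AgentState) -> str:
    return "Pass" if state.get("verdict", "").lower().startswith("pass") else "Needs Revision"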
Then bring it all together -
def build_graph():
    workflow = StateGraph(AgentState)
    workflow.set_entry_point("QueryRefiner")
    workflow.add_node("QueryRefiner", query_refiner_node)
    workflow.add_node("Retriever", retriever_node)
    workflow.add_node("Reader", reader_node)
    workflow.add_node("Synthesizer", synthesizer_node)
    workflow.add_node("Critic", critic_node)
    workflow.add_edge("QueryRefiner", "Retriever")
    workflow.add_edge("Retriever", "Reader")
    workflow.add_edge("Reader", "Synthesizer")
    workflow.add_edge("Synthesizer", "Critic")
    workflow.add_conditional_edges("Critic", verdict_router, {
        "Needs Revision": "Retriever",
        "Pass": END
    })
    return workflow.compile()
As you can see from the code above, the nodes are connected by edges, and conditional edges let the graph revise the answer based on the Critic's verdict.
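Running the compiled graph is then a single invoke call; the input and output keys below assume the illustrative AgentState sketched earlier.

# Hypothetical invocation of the compiled graph
app = build_graph()
final_state = app.invoke({"query": "Summarize today's top AI research news and breakthroughs."})
print(final_state["answer"])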