Let's build Smart Agents (RAG on steroids)!

If you are new to Generative AI or curious about how LLMs can interact with real objects, make inferences, and work for you, then this article is for you!
We will not go into the details of what LLMs are; if you want to know the magic behind how inference works, please refer to my previous articles.
Building with Traditional RAG
Smart Question-Answering Bot that can read PDFs
We will build a smart question-answering bot that can read PDFs and answer questions, powered by LangChain and LLaMA models.
Quick Refresher on RAG
Step 1: PDF Loading
- Use PyMuPDFLoader to load the PDF content as text.
# Load the PDF file
from langchain_community.document_loaders import PyMuPDFLoader

pdf_path = "/content/bengalicookbook.pdf"  # Replace with your PDF
loader = PyMuPDFLoader(pdf_path)
documents = loader.load()
Step 2: Text Chunking
RecursiveCharacterTextSplitter splits long content into chunks.
Chunk size: 500 characters, overlap: 50 characters for context retention.
# Split into chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)
Step 3: Embedding & Storage
Embed via HuggingFaceEmbeddings (all-MiniLM-L6-v2)
Store the vectorized chunks in a Chroma DB
# 📊 Embed and store in Chroma
from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import Chroma

embedding_model = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vector_store = Chroma.from_documents(chunks, embedding=embedding_model, persist_directory="./chroma_db")
Step 4: Retrieval
- Setup a retriever to fetch relevant chunks for any question.
# Use LLaMA3-70B from Groq
from langchain.chains import RetrievalQA
from langchain_groq import ChatGroq

llm = ChatGroq(model_name="llama3-70b-8192", temperature=0)

# Retriever over the Chroma vector store
retriever = vector_store.as_retriever()

# Set up RetrievalQA
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)
Step 5: Answering with Groq
- Use ChatGroq with LLaMA via the Groq API for fast responses.
query = input("You: ")
result = qa_chain({"query": query})
answer = result['result'].strip()
print("Bot:", answer)
If you want to run it directly on Google Colab, here is the link: https://colab.research.google.com/drive/1sfo_gHIyTk91jd5vtGEuB0jmj6_7Rw3L?usp=sharing
That's it! Now you can upload a bunch of PDFs to a folder and have a bot answer questions about their contents.
Combining RAG with agentic workflows
In the previous example we saw how traditional RAG can be used for a single use case.
In its standalone form, Retrieval-Augmented Generation (RAG) is reactive:
it "retrieves once, generates once".
Let's say the task is to "summarize viewpoints from multiple patients' behavioral data, come up with a viewpoint, critique its own response, and generate more balanced feedback".
Traditional RAG:
X Lacks planning and reasoning
X Assumes whatever data is provided is "right"
X Cannot parallelize
X Does not "learn": no correction loop
Enter the world of Agentic RAG - “From Retrieval to Autonomous Reasoning”
flowchart TD
%%Traditional RAG
A1[User Query] --> B1[Retriever]
B1 --> C1[Context Documents]
C1 --> D1[LLM Generation]
D1 --> E1[Final Answer]
%%Agentic RAG
A2[User Query] --> B2[Query Refiner Agent]
B2 --> C2a[Retriever: Web Search]
B2 --> C2b[Retriever: Vector DB]
C2a --> D2a[Reader: Web Docs]
C2b --> D2b[Reader: Vector Docs]
D2a --> E2[Synthesizer Agent]
D2b --> E2
E2 --> F2[Critic Agent]
F2 --> G2[Final Answer]
Let's use a query use case that you might run on a daily basis and don't need a blog to read from :)
“Summarize today’s top AI research news and breakthroughs.”
If we were to break the above down into tasks:
Collect all the AI-related articles, research papers, or news
Embed and store them in a vector DB
Retrieve the most relevant ones based on your query
Summarize them using an LLM.
Simple, right? We will write some code to bring this together using the following steps.
You can run the detailed code from https://colab.research.google.com/drive/1IwxsCJAg1w575TPbuRhRScl6Dqd3BeTa?usp=sharing
Build a Basic RAG System for “Today’s AI Research News”
# Fetch daily articles from arXiv, a few blogs you follow, and GitHub trending
from langchain.schema import Document

def fetch_all_sources() -> list[Document]:
    docs = []
    print("Fetching ArXiv papers...")
    docs += fetch_arxiv_docs()
    print("Fetching blog articles...")
    docs += fetch_blog_articles(blog_urls)
    print("Fetching GitHub trending projects...")
    docs += fetch_github_trending()
    return docs
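The three fetch helpers are defined in the linked Colab notebook. As an illustration only, here is a minimal sketch of what a fetch_arxiv_docs helper could look like, using feedparser against the public arXiv Atom API; the notebook's implementation may differ.

# Hypothetical sketch of an arXiv fetcher (illustrative, not the notebook's exact code)
import feedparser
from langchain.schema import Document

def fetch_arxiv_docs(max_results: int = 10) -> list[Document]:
    # Query the arXiv Atom API for the latest cs.AI submissions
    url = (
        "http://export.arxiv.org/api/query?"
        f"search_query=cat:cs.AI&sortBy=submittedDate&sortOrder=descending&max_results={max_results}"
    )
    feed = feedparser.parse(url)
    docs = []
    for entry in feed.entries:
        # Keep the abstract as content, and the title/link as metadata
        docs.append(Document(
            page_content=entry.summary,
            metadata={"title": entry.title, "source": entry.link},
        ))
    return docs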
# Save the fetched data as documents and embeddings into the vector DB
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Fetch full documents from arXiv, blogs, GitHub
docs = fetch_all_sources()
# Split into chunks before embedding (mirrors the earlier chunk settings)
split_docs = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50).split_documents(docs)
# Embed and store in Chroma (vector DB)
embedding = OpenAIEmbeddings()
db = Chroma.from_documents(split_docs, embedding, persist_directory="./ai_daily_db")
db.persist()
# Now query / retrieve and use the LLM to generate text
from langchain.chains import RetrievalQA
from langchain_openai import OpenAI

retriever = db.as_retriever(search_kwargs={"k": 5})
# Please refer to my previous article on fine-tuning to understand the parameters
llm = OpenAI(temperature=0)
rag_chain = RetrievalQA.from_chain_type(llm=llm, retriever=retriever)
query = "Summarize today’s top AI research news and breakthroughs."
print(rag_chain.run(query))
This standard RAG pipeline generates:
“ Today's top AI research news and breakthroughs include the growing use of AI in various industries such as healthcare and robotics, the development of edge computing technology, and the application of AI in complex areas such as conflict resolution. Additionally, there is ongoing research and discussion on the governance framework for physical artificial intelligence (PAI) and its potential impact on society.”
Modularize into Agent Roles
Now we break the pipeline into query refiner, retriever, reader, synthesizer, and critic agents so that the process can improve, reason, and parallelize its operations.
Query Refiner Agent
This agent is responsible for cleaning up or expanding the user query.
As you can see below:
We will maintain a history of past queries
Use an LLM prompt to rewrite the current query based on that history and the current date.
# Query refiner agent
from datetime import date

def refine_query(user_query: str, llm, past_queries: list[str] = None, past_answers: list[str] = None) -> str:
    today = date.today().strftime("%B %d, %Y")
    history_prompt = ""
    if past_queries and past_answers:
        past_qas = "\n".join(
            [f"Q: {q}\nA: {a}" for q, a in zip(past_queries[-3:], past_answers[-3:])]
        )
        history_prompt = f"\n--- Past Conversations ---\n{past_qas}"
    prompt = f"""
You are a query rewriting assistant for an AI research analyst. Today's date is {today}.
A user has asked:
"{user_query}"
{history_prompt}
Your job is to rewrite this query to:
- Improve clarity
- Include specific temporal markers (e.g. replace "today" with the real date)
- Add missing AI-relevant keywords like "transformers", "LLMs", "retrieval", etc.
- Leverage previous questions and answers to suggest a more informative query
Return only the rewritten query.
"""
    return llm.predict(prompt).strip()
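As a quick usage sketch, assuming an llm instance like the ones set up earlier, you would call the refiner while passing the running history; the history values below are purely illustrative.

# Hypothetical usage of refine_query with a running history
past_queries = ["What's new in LLM research?"]
past_answers = ["Recent work focuses on retrieval-augmented generation and agentic workflows."]

refined = refine_query(
    "Summarize today's top AI research news and breakthroughs.",
    llm,
    past_queries=past_queries,
    past_answers=past_answers,
)
print(refined)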
Data Retriever Agent
Retrieves the relevant documents from the vector DB store.
def retrieve_documents(query: str, vectorstore, k=5):
    retriever = vectorstore.as_retriever(search_kwargs={"k": k})
    return retriever.get_relevant_documents(query)
Data Reader Agent
Summarizes each retrieved document.
def summarize_document(doc, llm) -> str:
    prompt = f"Summarize the following:\n\n{doc.page_content}"
    return llm.predict(prompt)
Answer Synthesizer Agent
Compiles a comprehensive answer from the summaries. Note how the prompt is used to change the format of the answer.
# Synthesize answers agent
def synthesize_answer(summaries: list[str], query: str, llm) -> str:
    joined = "\n".join(summaries)
    prompt = f"""Using the following information, summarize today's top AI research for this query:
"{query}"
{joined}
Output a 3-5 bullet summary."""
    return llm.predict(prompt)
Critic Agent
After the Synthesizer agent generates the answer, the Critic agent checks it for quality and hallucinations.
def critique_answer(answer: str, query: str, llm) -> dict:
    prompt = f"""
You are an AI answer critic. Evaluate the following generated answer to the user's query.
--- User Query ---
{query}
--- Answer ---
{answer}
Provide:
1. A brief critique (1-3 sentences)
2. A verdict: either "Pass" or "Needs Revision"
Respond in this format:
Critique: <your critique>
Verdict: <Pass or Needs Revision>
"""
    output = llm.predict(prompt)
    # Parse result
    lines = output.strip().split("\n")
    critique_text = ""
    verdict = "Pass"
    for line in lines:
        if line.lower().startswith("critique:"):
            critique_text = line[len("Critique:"):].strip()
        if line.lower().startswith("verdict:"):
            verdict = line[len("Verdict:"):].strip()
    return {"critique": critique_text, "verdict": verdict}
Now bring them together and run all the agents sequentially. The answer generated by the Synthesizer agent is:
AI is being utilized in various industries, including healthcare and robotics.
Edge computing is being developed to improve data processing at the edge of a network.
AI has the potential to address global challenges and transform government processes.
The concept of physical AI and its potential governance problems are being discussed and researched.
Further research and discussions are needed to fully address these problems.
The Critique: The AI-generated answer does not fully address the query as it only mentions a few general topics related to AI research, but does not provide any specific news or breakthroughs from today. It also includes some unfounded claims, such as the potential for AI to transform government processes without providing any evidence or examples. The answer is somewhat clear and well-structured, but it lacks completeness and specificity.
Critic Verdict: Needs Revision
Now you can use the verdict to drive an agentic flow with a "while" loop, re-running the pipeline and refreshing the documents the answer is generated from, as sketched below.
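Here is a minimal sketch of that correction loop, assuming the agent functions defined above, the db vector store, and an llm instance; max_rounds and the refresh strategy are illustrative choices, not part of the original notebook.

# Hypothetical correction loop over the agents defined above
def run_agentic_flow(user_query: str, llm, db, max_rounds: int = 3) -> str:
    past_queries, past_answers = [], []
    answer = ""
    for _ in range(max_rounds):
        # Refine -> retrieve -> read -> synthesize -> critique
        refined = refine_query(user_query, llm, past_queries, past_answers)
        docs = retrieve_documents(refined, db)
        summaries = [summarize_document(d, llm) for d in docs]
        answer = synthesize_answer(summaries, refined, llm)
        review = critique_answer(answer, refined, llm)
        past_queries.append(refined)
        past_answers.append(answer)
        if review["verdict"].lower().startswith("pass"):
            break
        # Otherwise loop again; a real system might also re-fetch fresh documents here
    return answer

print(run_agentic_flow("Summarize today's top AI research news and breakthroughs.", llm, db))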
Orchestrate the agents as MCP controllers
Fun fact: MCP stands for "Master Control Program", a reference to the movie Tron.
In the agentic world, MCP is a metaphor for systems that let you set up agents through libraries like LangGraph, AutoGen, etc. Think of it as a command center that brings all the agents together.
In this example we will use LangGraph to show how to control the agents we defined above, and we will implement one parallel path.
flowchart TD
A[User Query Input] --> B[QueryRefiner Agent]
B --> C[Retriever Agent]
C --> D[Reader Agent]
D --> E[Synthesizer Agent]
E --> F[Critic Agent]
F -->|Pass| G[Final Answer Output]
F -->|Needs Revision| C
subgraph MCP[MCP Controller]
B
C
D
E
F
end
style G fill:#c6f6d5,stroke:#2f855a,stroke-width:2px
style MCP fill:#f0f0f0,stroke:#999,stroke-dasharray: 5 5
The Reader agent will need to process many documents at the same time, so we will implement a parallel path to speed up the process; a code sketch follows the diagram below.
flowchart TD
A[Retriever Agent] --> B{{Reader Agent}}
B -->|parallel| B1[Doc 1 Summary]
B -->|parallel| B2[Doc 2 Summary]
B -->|parallel| B3[Doc 3 Summary]
B1 & B2 & B3 --> C[Synthesizer Agent]
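The fan-out in the diagram can be approximated with a simple thread pool around the summarize_document agent from above; this is a minimal sketch, and the Colab notebook may wire the parallelism differently (for example inside the Reader node itself).

# Hypothetical parallel Reader: summarize retrieved docs concurrently
from concurrent.futures import ThreadPoolExecutor

def summarize_documents_parallel(docs, llm, max_workers: int = 4) -> list[str]:
    # Each document summary is an independent LLM call, so they can run side by side
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda d: summarize_document(d, llm), docs))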
We will create the nodes for the MCP-based LangGraph:
Query Refiner Node, Retriever Node, Reader Node, Synthesizer Node, Critic Node. A sketch of the shared state and one node wrapper is shown below.
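The node functions and the shared state are defined in the Colab. As an illustration, here is a minimal sketch of what the state schema, one node wrapper, and the verdict router could look like; the field names and the global llm are assumptions, and the remaining nodes (retriever_node, reader_node, synthesizer_node, critic_node) would follow the same pattern.

from typing import TypedDict
from langgraph.graph import StateGraph, END

# Illustrative shared state passed between the nodes
class AgentState(TypedDict, total=False):
    query: str
    refined_query: str
    docs: list
    summaries: list[str]
    answer: str
    verdict: str

# Example node: wraps the refine_query agent and returns a state update
def query_refiner_node(state: AgentState) -> dict:
    refined = refine_query(state["query"], llm)
    return {"refined_query": refined}

# Router used by the conditional edge after the Critic node
def verdict_router(state: AgentState) -> str:
    return "Pass" if state.get("verdict", "").lower().startswith("pass") else "Needs Revision"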
Then bring it all together -
def build_graph():
    workflow = StateGraph(AgentState)
    workflow.set_entry_point("QueryRefiner")
    workflow.add_node("QueryRefiner", query_refiner_node)
    workflow.add_node("Retriever", retriever_node)
    workflow.add_node("Reader", reader_node)
    workflow.add_node("Synthesizer", synthesizer_node)
    workflow.add_node("Critic", critic_node)
    workflow.add_edge("QueryRefiner", "Retriever")
    workflow.add_edge("Retriever", "Reader")
    workflow.add_edge("Reader", "Synthesizer")
    workflow.add_edge("Synthesizer", "Critic")
    workflow.add_conditional_edges("Critic", verdict_router, {
        "Needs Revision": "Retriever",
        "Pass": END
    })
    return workflow.compile()
As you can see from the code above, the nodes are connected by edges, and conditional edges let the graph revise the answer based on the Critic's verdict.
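Running the compiled graph is then a single invoke call; the input and output keys below assume the illustrative AgentState sketched earlier.

# Hypothetical invocation of the compiled graph
app = build_graph()
final_state = app.invoke({"query": "Summarize today's top AI research news and breakthroughs."})
print(final_state["answer"])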