LangGraph RAG Agent Tutorial | Basics to Advanced Multi-Agent AI Chatbot


Retrieval-Augmented Generation (RAG) is becoming the go-to pattern for building AI systems that can fetch real-time or domain-specific knowledge on demand. But RAG alone doesn’t make your chatbot smart.
With LangGraph, you can build stateful, agent-like flows that combine tools, memory, structured decision logic, and retrieval—all driven by LLMs.
In this blog, we’ll build up to a full LangGraph-based RAG Agent from scratch. We'll follow a practical path:
Start with basic LLM usage
Bind tools to the LLM
Use LangGraph to build stateful agents
Add memory, routing logic, and tool execution
Finally, combine all of it with document retrieval to create a RAG-powered agent
Each section mirrors what you’d build in a notebook, but with clear explanations to help you understand why each piece matters.
Let’s start with the first building block: invoking an LLM.
🧠 Step 1: Invoking a Language Model (LLM)
To begin, we’ll use ChatOpenAI from LangChain to invoke a language model. We’ll keep it simple:
from langchain_openai import ChatOpenAI
# Initialize the LLM
llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0.7)
# Basic prompt
response = llm.invoke("What is artificial intelligence?")
print(response.content)
This returns a standard response from the LLM. But the real value comes when you treat the LLM like a conversation partner using message objects:
from langchain_core.messages import HumanMessage, SystemMessage
messages = [
    SystemMessage(content="You are a helpful AI assistant that explains complex topics simply."),
    HumanMessage(content="Explain machine learning in 2 sentences.")
]
response = llm.invoke(messages)
print(response.content)
Using SystemMessage and HumanMessage gives you more control over behavior and tone. It’s also how you’ll structure inputs later when building memory-enabled and multi-step agents.
Now that we can invoke an LLM in both simple and structured ways, we’re ready to start integrating tools.
🔧 Step 2: Extending LLMs with Tools
LLMs are powerful, but they can’t do math or fetch real-time information on their own. To make your LLM truly useful, you can bind it with external tools. Here’s how:
from langchain_core.tools import tool
from langchain_community.tools import DuckDuckGoSearchRun
@tool
def calculator(expression: str) -> str:
    """Calculate mathematical expressions. Use this for any math calculations."""
    try:
        # NOTE: eval is fine for a quick demo, but don't expose it to untrusted input
        result = eval(expression)
        return f"The result of {expression} is {result}"
    except Exception as e:
        return f"Error calculating {expression}: {str(e)}"
search_tool = DuckDuckGoSearchRun()
We now have two tools: calculator to perform basic arithmetic, and search_tool to fetch information from the web.
To bind these tools to the LLM:
# Bind tools to the LLM
tools = [calculator, search_tool]
llm_with_tools = llm.bind_tools(tools)
Let’s test the LLM with tools:
response = llm_with_tools.invoke("What's 25 * 4 + 17?")
print(response.content)
However, when an LLM is tool-enabled, its response might include tool_calls instead of just plain text. To handle that:
def handle_tool_calls(response, tool_map):
    if not getattr(response, 'tool_calls', None):
        return
    for tool_call in response.tool_calls:
        tool_name = tool_call['name']
        args = tool_call['args']
        tool = tool_map.get(tool_name)
        if tool:
            result = tool.invoke(args)
            print(f"Tool result: {result}")
Then:
tool_map = {
    'calculator': calculator,
    'duckduckgo_search': search_tool,
}
def test_llm_tool(query):
    response = llm_with_tools.invoke(query)
    print(response.content)
    handle_tool_calls(response, tool_map)
# Run some queries
test_llm_tool("What's 25 * 4 + 17?")
test_llm_tool("Search for recent news about artificial intelligence")
With this setup, your LLM is now a tool-using agent.
Next, we’ll take this a step further by wiring everything into a LangGraph to make it stateful and multi-turn.
🧩 Step 3: Building a Basic LangGraph Chatbot
At its core, LangGraph lets you define a graph of nodes that process conversational state. Let’s begin with a minimal chatbot graph.
Define Chatbot State
from typing import Annotated, TypedDict
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langgraph.graph.message import add_messages
class State(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]
Here, we define a State object that will carry the conversation. The add_messages function ensures new messages are appended correctly.
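To see what the reducer actually does, here’s a tiny standalone check (the message IDs are just illustrative): add_messages appends the update to the existing list instead of overwriting it.
from langchain_core.messages import HumanMessage, AIMessage
from langgraph.graph.message import add_messages
existing = [HumanMessage(content="Hi, I'm Pradip", id="1")]
update = [AIMessage(content="Nice to meet you, Pradip!", id="2")]
merged = add_messages(existing, update)
print([m.content for m in merged])  # both messages, in order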
Create the Chatbot Node
def chatbot_node(state: State) -> State:
    response = llm.invoke(state["messages"])
    return {"messages": [response]}
This node accepts messages and returns the updated state with the AI's response.
Build and Compile the Graph
from langgraph.graph import StateGraph, START, END
graph_builder = StateGraph(State)
graph_builder.add_node("chatbot", chatbot_node)
graph_builder.add_edge(START, "chatbot")
graph_builder.add_edge("chatbot", END)
graph = graph_builder.compile()
This sets up a simple one-node chatbot pipeline. You can now test it:
def test_chatbot(message: str):
    initial_state = {"messages": [HumanMessage(content=message)]}
    result = graph.invoke(initial_state)
    print("🤖 Assistant:", result["messages"][-1].content)
test_chatbot("Hello! My name is Pradip")
test_chatbot("Do you remember my name?")
You’ll notice it doesn’t remember past messages yet. That’s what we’ll fix in the next step—by adding memory.
🧠 Step 4: Adding Memory to the Chatbot
To make the chatbot remember previous conversations, we need to add a memory backend.
LangGraph provides MemorySaver for this purpose.
from langgraph.checkpoint.memory import MemorySaver
memory = MemorySaver()
# Compile the graph again with memory enabled
graph_with_memory = graph_builder.compile(checkpointer=memory)
We can now run the chatbot in a threaded manner, and it will retain context:
def chat_with_memory(message: str, thread_id: str):
    config = {"configurable": {"thread_id": thread_id}}
    initial_state = {"messages": [HumanMessage(content=message)]}
    result = graph_with_memory.invoke(initial_state, config)
    print("🤖 Assistant:", result["messages"][-1].content)
# Start a conversation
chat_with_memory("Hi, my name is Pradip", thread_id="thread-1")
chat_with_memory("What's my name?", thread_id="thread-1")
With memory in place, the assistant can now recall previous messages.
This forms the foundation for building multi-turn, context-aware agents.
Next, we’ll add more intelligence to the flow using routing and tools.
🛠️ Step 5 – LangGraph Agent with Tools
So far, our chatbot can talk (Step 3) and remember context (Step 4). Now we want it to recognise when a tool is needed and call it automatically.
At a high level we’ll add two new pieces: a chatbot node, which decides whether it can answer directly or should call a tool, and a tools node, which actually runs the requested tool call and passes the result back.
The conversation state stays the same – a list of LangChain Message objects – so we just rename it to emphasise the agent role:
from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages
from langchain_core.messages import BaseMessage
class AgentState(TypedDict):
    """State for our two-node agent"""
    messages: Annotated[list[BaseMessage], add_messages]
1. Bind the LLM to our existing tools
llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0.7)
llm_with_tools = llm.bind_tools(tools) # `tools` already contains `calculator` and `search_tool`
Binding keeps the API exactly the same – we just swap llm for llm_with_tools when we need tool usage.
2. The chatbot node – decide answer vs. tool
from langchain_core.messages import HumanMessage, AIMessage
def chatbot_node(state: AgentState) -> AgentState:
    """Gatekeeper: answer directly or request a tool"""
    system_message = (
        "You are a helpful assistant.\n"
        "Use the `web_search` tool for real-time facts and `calculator` for maths.\n"
        "Otherwise answer directly."
    )
    messages = [
        {"role": "system", "content": system_message},
        *state["messages"],
    ]
    response = llm_with_tools.invoke(messages)
    return {"messages": [response]}  # LangGraph merges this into the running state
Key idea: we embed the routing logic inside the prompt – the LLM decides whether tool calls are needed and, if so, emits a tool_calls entry in its response.
3. The tools node – run any requested tool‑calls
Instead of re-implementing the execution loop, we reuse the pre-built ToolNode:
from langgraph.prebuilt import ToolNode
tool_node = ToolNode(tools) # automatically dispatches and streams results back
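To get a feel for what ToolNode expects, here’s an illustrative call with a hand-built tool request (the id and values are made up): it reads the last AIMessage’s tool_calls and returns the matching tool results as messages.
from langchain_core.messages import AIMessage
fake_request = AIMessage(
    content="",
    tool_calls=[{"name": "calculator", "args": {"expression": "2 + 2"}, "id": "call_1", "type": "tool_call"}],
)
result = tool_node.invoke({"messages": [fake_request]})
print(result["messages"][-1].content)  # the calculator tool's output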
4. Routing logic
We just need a small helper that checks whether the last message contains tool calls:
from typing import Literal
def should_continue(state: AgentState) -> Literal["tools", "end"]:
    last = state["messages"][-1]
    return "tools" if getattr(last, "tool_calls", None) else "end"
5. Wire it all together with StateGraph
from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver
workflow = StateGraph(AgentState)
workflow.add_node("chatbot", chatbot_node)
workflow.add_node("tools", tool_node)
workflow.add_edge(START, "chatbot")
workflow.add_conditional_edges("chatbot", should_continue, {"tools": "tools", "end": END})
workflow.add_edge("tools", "chatbot") # come back after tools run
app = workflow.compile(checkpointer=MemorySaver())
Why a loop back to chatbot? After a tool runs we want the LLM to integrate the tool output and craft the final answer – so the graph cycles once.
6. Quick manual test
def chat_with_agent(msg: str, thread_id="demo"):
    cfg = {"configurable": {"thread_id": thread_id}}
    state = {"messages": [HumanMessage(content=msg)]}
    result = app.invoke(state, cfg)
    print(result["messages"][-1].content)
chat_with_agent("What's 15% of 240?")
chat_with_agent("Search for recent news about artificial intelligence")
You should see the calculator and search tools being triggered automatically, followed by a neat, fully-formed answer.
That’s a self‑routing, tool‑aware agent. In the next step we’ll plug a knowledge‑base retriever into the tool‑chain and teach the agent when to switch from web search to internal RAG – bringing us one step closer to a production‑ready assistant.
🔍 Step 6 – LangGraph RAG Agent
Goal: Give your agent up‑to‑date, domain‑specific knowledge so it can answer beyond the LLM’s training data.
We’ll layer retrieval, routing, and an optional web‑search fallback on top of the tool‑enabled agent from Step 5.
1️⃣ Index your documents once
# ── Build & persist a Chroma index ────────────────────────────────
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader, Docx2txtLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
SOURCE_DIR = Path("docs") # put your files here
INDEX_DIR = Path("chroma_db_1") # will be created if missing
EMBED_MODEL = "text-embedding-3-small"
# Load docs (keep only pdf/docx for brevity)
docs = []
for f in SOURCE_DIR.glob("*.*"):
    if f.suffix == ".pdf":
        docs += PyPDFLoader(str(f)).load()
    elif f.suffix == ".docx":
        docs += Docx2txtLoader(str(f)).load()
# Split & embed
chunks = RecursiveCharacterTextSplitter(chunk_size=1_000, chunk_overlap=200).split_documents(docs)
embeddings = OpenAIEmbeddings(model=EMBED_MODEL)
vectordb = Chroma.from_documents(
    documents=chunks,
    embedding=embeddings,
    persist_directory=str(INDEX_DIR),
    collection_name="kb_collection",
)
vectordb.persist()
print("✅ Index built →", INDEX_DIR.resolve())
Run this once; the agent will query the saved index at runtime.
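At query time there’s no need to rebuild anything. Here’s a minimal sketch of reopening the persisted index, assuming the same directory, collection name, and embedding model as above:
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
vectordb = Chroma(
    collection_name="kb_collection",
    embedding_function=OpenAIEmbeddings(model="text-embedding-3-small"),
    persist_directory="chroma_db_1",
)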
2️⃣ Expose a Retriever as a LangChain Tool
retriever = vectordb.as_retriever(search_kwargs={"k": 2})
@tool
def rag_search_tool(query: str) -> str:
    """Search the knowledge-base for relevant chunks"""
    results = retriever.invoke(query)
    return "\n\n".join(d.page_content for d in results)
3️⃣ Optional fallback → real‑time web search
from langchain_tavily import TavilySearch
tavily = TavilySearch(max_results=3, topic="general")  # requires a TAVILY_API_KEY environment variable
@tool
def web_search_tool(query: str) -> str:
    """Up-to-date web info via Tavily"""
    return "\n\n".join(r["content"] for r in tavily.invoke({"query": query})["results"])  # simplified
4️⃣ Extend the Agent State
class AgentState(State):  # extends the previous `State`
    route: str         # "rag", "answer", "web", "end"
    rag: str | None    # KB result
    web: str | None    # web-search snippets
5️⃣ Decision / Execution Nodes
| Node | What it does |
| --- | --- |
| router_node | Uses an LLM with structured output to decide the route – rag, answer, or end. |
| rag_node | Runs rag_search_tool, then asks a judge LLM if the chunks are sufficient. Sets route to answer or web. |
| web_node | Calls web_search_tool and passes snippets along. |
| answer_node | Crafts the final reply, combining any rag and/or web context. |
Key implementation points (condensed):
# ── Structured helpers ─────────────────
from typing import Literal
from pydantic import BaseModel

class RouteDecision(BaseModel):
    route: Literal["rag", "answer", "end"]
    reply: str | None = None

class RagJudge(BaseModel):
    sufficient: bool

router_llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0).with_structured_output(RouteDecision)
judge_llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0).with_structured_output(RagJudge)
answer_llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0.7)
# ── Router ─────────────────────────────
def router_node(state: AgentState) -> AgentState:
    q = state["messages"][-1].content
    decision = router_llm.invoke([
        ("system", "Decide route: rag / answer / end"),
        ("user", q)
    ])
    new_state = {**state, "route": decision.route}
    if decision.route == "end":
        new_state["messages"] += [AIMessage(content=decision.reply or "Hello!")]
    return new_state
# ── RAG lookup ─────────────────────────
def rag_node(state: AgentState) -> AgentState:
    q = state["messages"][-1].content
    chunks = rag_search_tool.invoke(q)
    verdict = judge_llm.invoke([("user", f"Question: {q}\nDocs: {chunks[:300]}…")])
    return {**state, "rag": chunks, "route": "answer" if verdict.sufficient else "web"}
# ── Web search & Answer nodes omitted for brevity (same as notebook) ──
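The notebook has the full versions; below is a minimal sketch of what web_node, answer_node, and the two routing helpers (from_router, after_rag) used in the wiring that follows could look like, assuming the state keys defined above (the prompt wording is illustrative, not the notebook’s exact code):
def web_node(state: AgentState) -> AgentState:
    q = state["messages"][-1].content
    snippets = web_search_tool.invoke(q)
    return {**state, "web": snippets, "route": "answer"}

def answer_node(state: AgentState) -> AgentState:
    q = state["messages"][-1].content
    context = "\n\n".join(filter(None, [state.get("rag"), state.get("web")]))
    prompt = f"Answer the question, using the context if it helps.\n\nContext:\n{context}\n\nQuestion: {q}"
    reply = answer_llm.invoke([("user", prompt)])
    return {**state, "messages": [AIMessage(content=reply.content)]}

# Conditional-edge helpers: simply read the route set by the previous node
def from_router(state: AgentState) -> Literal["rag", "answer", "end"]:
    return state["route"]

def after_rag(state: AgentState) -> Literal["answer", "web"]:
    return state["route"]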
6️⃣ Wire up the Graph
agent_graph = StateGraph(AgentState)
agent_graph.add_node("router", router_node)
agent_graph.add_node("rag_lookup", rag_node)
agent_graph.add_node("web_search", web_node)
agent_graph.add_node("answer", answer_node)
agent_graph.set_entry_point("router")
agent_graph.add_conditional_edges("router", from_router,
{"rag": "rag_lookup", "answer": "answer", "end": END})
agent_graph.add_conditional_edges("rag_lookup", after_rag,
{"answer": "answer", "web": "web_search"})
agent_graph.add_edge("web_search", "answer")
agent_graph.add_edge("answer", END)
agent = agent_graph.compile(checkpointer=MemorySaver())
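If you’re working in a notebook, it’s worth rendering the compiled graph to sanity-check the wiring (assumes an IPython environment; drawing the PNG may need extra dependencies):
from IPython.display import Image
Image(agent.get_graph().draw_mermaid_png())  # router → rag_lookup / web_search → answer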
7️⃣ Quick CLI test
if __name__ == "__main__":
    config = {"configurable": {"thread_id": "thread-12"}}
    while True:
        q = input("You: ").strip()
        if q in {"quit", "exit"}:
            break
        result = agent.invoke({"messages": [HumanMessage(content=q)]}, config)
        print(result["messages"][-1].content)
Now your LangGraph agent:
Routes intelligently
Retrieves domain knowledge with RAG
Falls back to web search when KB is insufficient
Carries multi-turn conversations with memory
In short, this is a production‑ready skeleton you can plug into any project.
🚀 Conclusion & Resources
In this tutorial we climbed the ladder from basic LLM calls ➜ tool‑aware agents ➜ memory ➜ RAG ➜ full multi‑step routing with LangGraph. You now have a production‑ready skeleton that can:
Chat naturally across turns (memory)
Decide when to use internal knowledge vs. external tools (router)
Pull trusted data from your own docs (RAG)
Fall back to real‑time web search when the KB is lacking
📂 Grab the code
- Full Notebook on GitHub: LangGraph RAG Agent Notebook
🕹 Try the live RAG Agent: https://agent.futuresmart.ai/
🎥 Watch the build walkthrough
What’s next?
Swap in your own docs. Point the loader at your knowledge base and rebuild the index.
Add streaming. LangGraph supports async generators so you can pipe partial answers to the UI – see the minimal sketch after this list.
Deploy. Package the graph inside a FastAPI endpoint or a serverless function and wire up a front‑end.
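For streaming, a minimal sketch (the question is a placeholder; stream_mode="values" yields the full state after each node, and astream is the async equivalent):
config = {"configurable": {"thread_id": "stream-demo"}}
for step in agent.stream({"messages": [HumanMessage(content="Summarise the onboarding doc")]}, config, stream_mode="values"):
    step["messages"][-1].pretty_print()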
Got questions or improvement ideas? Drop a comment under the YouTube video – I’d love to hear how you extend this skeleton!
Happy building 🛠️🤖
Written by Pradip Nichite