LangGraph RAG Agent Tutorial | Basics to Advanced Multi-Agent AI Chatbot

Pradip Nichite

Retrieval-Augmented Generation (RAG) is becoming the go-to pattern for building AI systems that can fetch real-time or domain-specific knowledge on demand. But RAG alone doesn’t make your chatbot smart.

With LangGraph, you can build stateful, agent-like flows that combine tools, memory, structured decision logic, and retrieval—all driven by LLMs.

In this blog, we’ll build up to a full LangGraph-based RAG Agent from scratch. We'll follow a practical path:

  1. Start with basic LLM usage

  2. Bind tools to the LLM

  3. Use LangGraph to build stateful agents

  4. Add memory, routing logic, and tool execution

  5. Finally, combine all of it with document retrieval to create a RAG-powered agent

Each section mirrors what you’d build in a notebook, but with clear explanations to help you understand why each piece matters.

Let’s start with the first building block: invoking an LLM.


🧠 Step 1: Invoking a Language Model (LLM)

To begin, we’ll use ChatOpenAI from LangChain to invoke a language model. We’ll keep it simple:

from langchain_openai import ChatOpenAI

# Initialize the LLM
llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0.7)

# Basic prompt
response = llm.invoke("What is artificial intelligence?")
print(response.content)

This returns a standard response from the LLM. But the real value comes when you treat the LLM like a conversation partner using message objects:

from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are a helpful AI assistant that explains complex topics simply."),
    HumanMessage(content="Explain machine learning in 2 sentences.")
]

response = llm.invoke(messages)
print(response.content)

Using SystemMessage and HumanMessage gives you more control over behavior and tone. It’s also how you’ll structure inputs later when building memory-enabled and multi-step agents.

Now that we can invoke an LLM in both simple and structured ways, we’re ready to start integrating tools.


🔧 Step 2: Extending LLMs with Tools

LLMs are powerful, but on their own they can't reliably do math or fetch real-time information. To make your LLM truly useful, you can bind external tools to it. Here's how:

from langchain_core.tools import tool
from langchain_community.tools import DuckDuckGoSearchRun

@tool
def calculator(expression: str) -> str:
    """Calculate mathematical expressions. Use this for any math calculations."""
    try:
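        # Note: eval() is fine for a quick demo, but don't use it on untrusted input in production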
        result = eval(expression)
        return f"The result of {expression} is {result}"
    except Exception as e:
        return f"Error calculating {expression}: {str(e)}"

search_tool = DuckDuckGoSearchRun()

We now have two tools:

  • calculator to perform basic arithmetic

  • search_tool to fetch info from the web

To bind these tools to the LLM:

# Bind tools to the LLM
tools = [calculator, search_tool]
llm_with_tools = llm.bind_tools(tools)

Let’s test the LLM with tools:

response = llm_with_tools.invoke("What's 25 * 4 + 17?")
print(response.content)

However, when an LLM is tool-enabled, its response might include tool_calls instead of just plain text. To handle that:

def handle_tool_calls(response, tool_map):
    if not getattr(response, 'tool_calls', None):
        return

    for tool_call in response.tool_calls:
        tool_name = tool_call['name']
        args = tool_call['args']
        tool = tool_map.get(tool_name)
        if tool:
            result = tool.invoke(args)
            print(f"Tool result: {result}")

Then:

tool_map = {
    'calculator': calculator,
    'duckduckgo_search': search_tool,
}

def test_llm_tool(query):
    response = llm_with_tools.invoke(query)
    print(response.content)
    handle_tool_calls(response, tool_map)

# Run some queries
test_llm_tool("What's 25 * 4 + 17?")
test_llm_tool("Search for recent news about artificial intelligence")

With this setup, your LLM is now a tool-using agent.

Next, we’ll take this a step further by wiring everything into a LangGraph to make it stateful and multi-turn.


🧩 Step 3: Building a Basic LangGraph Chatbot

At its core, LangGraph lets you define a graph of nodes that process conversational state. Let’s begin with a minimal chatbot graph.

Define Chatbot State

from typing import Annotated, TypedDict
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]

Here, we define a State object that will carry the conversation. The add_messages reducer tells LangGraph to append new messages to the existing list rather than overwrite it.
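To build intuition, here's a tiny illustrative sketch of what the reducer does when called directly (not part of the chatbot itself):

from langchain_core.messages import HumanMessage, AIMessage
from langgraph.graph.message import add_messages

existing = [HumanMessage(content="Hi")]
update = [AIMessage(content="Hello!")]

# add_messages merges the update into the existing list (assigning message IDs along the way)
merged = add_messages(existing, update)
print([m.content for m in merged])  # ['Hi', 'Hello!']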

Create the Chatbot Node

def chatbot_node(state: State) -> State:
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

This node accepts messages and returns the updated state with the AI's response.

Build and Compile the Graph

from langgraph.graph import StateGraph, START, END

graph_builder = StateGraph(State)
graph_builder.add_node("chatbot", chatbot_node)
graph_builder.add_edge(START, "chatbot")
graph_builder.add_edge("chatbot", END)
graph = graph_builder.compile()

This sets up a simple one-node chatbot pipeline. You can now test it:

def test_chatbot(message: str):
    initial_state = {"messages": [HumanMessage(content=message)]}
    result = graph.invoke(initial_state)
    print("🤖 Assistant:", result["messages"][-1].content)

test_chatbot("Hello! My name is Pradip")
test_chatbot("Do you remember my name?")

You’ll notice it doesn’t remember past messages yet. That’s what we’ll fix in the next step—by adding memory.


🧠 Step 4: Adding Memory to the Chatbot

To make the chatbot remember previous conversations, we need to add a memory backend.
LangGraph provides MemorySaver for this purpose.

from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()

# Compile the graph again with memory enabled
graph_with_memory = graph_builder.compile(checkpointer=memory)

We can now run the chatbot in a threaded manner, and it will retain context:

def chat_with_memory(message: str, thread_id: str):
    config = {"configurable": {"thread_id": thread_id}}
    initial_state = {"messages": [HumanMessage(content=message)]}
    result = graph_with_memory.invoke(initial_state, config)
    print("🤖 Assistant:", result["messages"][-1].content)

# Start a conversation
chat_with_memory("Hi, my name is Pradip", thread_id="thread-1")
chat_with_memory("What's my name?", thread_id="thread-1")

With memory in place, the assistant can now recall previous messages.
This forms the foundation for building multi-turn, context-aware agents.
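If you want to verify what the checkpointer is holding, a compiled graph with a checkpointer exposes get_state; here's a minimal sketch:

# Peek at the state stored for a given thread
config = {"configurable": {"thread_id": "thread-1"}}
snapshot = graph_with_memory.get_state(config)

# The snapshot's values contain the accumulated message history for this thread
for msg in snapshot.values["messages"]:
    print(f"{msg.type}: {msg.content}")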

Next, we’ll add more intelligence to the flow using routing and tools.

🛠️ Step 5 – LangGraph Agent with Tools

So far, our chatbot can talk (Step 3) and remember context (Step 4). Now we want it to recognise when a tool is needed and call it automatically.

At a high level we'll add two new pieces:

  1. chatbot node – decides whether it can answer directly or should call a tool.

  2. tools node – actually runs the requested tool‑call and passes the result back.

The conversation state stays the same – a list of LangChain Message objects – so we just rename it to emphasise the agent role:

from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
    """State for our two‑node agent"""
    messages: Annotated[list[BaseMessage], add_messages]

1. Bind the LLM to our existing tools

llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0.7)
llm_with_tools = llm.bind_tools(tools)  # `tools` already contains `calculator` and `search_tool`

Binding keeps the API exactly the same – we just swap llm for llm_with_tools when we need tool‑usage.


2. The chatbot node – decide answer vs. tool

from langchain_core.messages import HumanMessage, AIMessage

def chatbot_node(state: AgentState) -> AgentState:
    """Gatekeeper: answer directly or request a tool"""
    system_message = (
        "You are a helpful assistant.\n"
        "Use the `web_search` tool for real‑time facts and `calculator` for maths.\n"
        "Otherwise answer directly."
    )

    messages = [
        {"role": "system", "content": system_message},
        *state["messages"],
    ]

    response = llm_with_tools.invoke(messages)
    return {"messages": [response]}  # LangGraph merges this into the running state

Key idea: we embed the routing logic in the system prompt – the LLM decides whether a tool call is needed and, if so, populates tool_calls on its response message.
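If you're curious what that looks like, you can inspect the attribute directly; the exact payload below is illustrative, not guaranteed output:

response = llm_with_tools.invoke("What's 15% of 240?")
print(response.tool_calls)
# Illustrative shape: [{'name': 'calculator', 'args': {'expression': '240 * 0.15'}, 'id': '...', 'type': 'tool_call'}]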


3. The tools node – run any requested tool‑calls

Instead of re‑implementing the execution loop, we reuse the pre‑built ToolNode:

from langgraph.prebuilt import ToolNode

tool_node = ToolNode(tools)  # executes any requested tool calls and returns their results as ToolMessages

4. Routing logic

We just need a small helper that checks whether the last message contains tool calls:

from typing import Literal

def should_continue(state: AgentState) -> Literal["tools", "end"]:
    last = state["messages"][-1]
    return "tools" if getattr(last, "tool_calls", None) else "end"

5. Wire it all together with StateGraph

from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

workflow = StateGraph(AgentState)
workflow.add_node("chatbot", chatbot_node)
workflow.add_node("tools",   tool_node)

workflow.add_edge(START, "chatbot")
workflow.add_conditional_edges("chatbot", should_continue, {"tools": "tools", "end": END})
workflow.add_edge("tools", "chatbot")  # come back after tools run

app = workflow.compile(checkpointer=MemorySaver())

Why a loop back to chatbot? After a tool runs we want the LLM to integrate the tool output and craft the final answer – so the graph cycles back to the chatbot node until no further tool calls are requested.


6. Quick manual test

def chat_with_agent(msg: str, thread_id="demo"):
    cfg = {"configurable": {"thread_id": thread_id}}
    state = {"messages": [HumanMessage(content=msg)]}
    result = app.invoke(state, cfg)
    print(result["messages"][-1].content)

chat_with_agent("What's 15% of 240?")
chat_with_agent("Search for recent news about artificial intelligence")

You should see the calculator and DuckDuckGo search tools being triggered automatically, followed by a neat, fully‑formed answer.


That’s a self‑routing, tool‑aware agent. In the next step we’ll plug a knowledge‑base retriever into the tool‑chain and teach the agent when to switch from web search to internal RAG – bringing us one step closer to a production‑ready assistant.

🔍 Step 6 – LangGraph RAG Agent

Goal: Give your agent up‑to‑date, domain‑specific knowledge so it can answer beyond the LLM’s training data.

We’ll layer retrieval, routing, and an optional web‑search fallback on top of the tool‑enabled agent from Step 5.

1️⃣ Index your documents once

# ── Build & persist a Chroma index ────────────────────────────────
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader, Docx2txtLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

SOURCE_DIR   = Path("docs")          # put your files here
INDEX_DIR    = Path("chroma_db_1")   # will be created if missing
EMBED_MODEL  = "text-embedding-3-small"

# Load docs (keep only pdf/docx for brevity)
docs = []
for f in SOURCE_DIR.glob("*.*"):
    if f.suffix == ".pdf":
        docs += PyPDFLoader(str(f)).load()
    elif f.suffix == ".docx":
        docs += Docx2txtLoader(str(f)).load()

# Split & embed
chunks     = RecursiveCharacterTextSplitter(chunk_size=1_000, chunk_overlap=200).split_documents(docs)
embeddings = OpenAIEmbeddings(model=EMBED_MODEL)

vectordb = Chroma.from_documents(
    documents         = chunks,
    embedding         = embeddings,
    persist_directory = str(INDEX_DIR),
    collection_name   = "kb_collection",
)
vectordb.persist()
print("✅ Index built →", INDEX_DIR.resolve())

Run this once; the agent will query the saved index at runtime.
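At runtime you don't rebuild the index; you reopen the persisted one. A minimal sketch, assuming the same directory, collection name, and embedding model as above:

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Reconnect to the index built earlier
vectordb = Chroma(
    collection_name="kb_collection",
    embedding_function=embeddings,
    persist_directory="chroma_db_1",
)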

2️⃣ Expose a Retriever as a LangChain Tool

retriever = vectordb.as_retriever(search_kwargs={"k": 2})

@tool
def rag_search_tool(query: str) -> str:
    """Search the knowledge‑base for relevant chunks"""
    results = retriever.invoke(query)
    return "

".join(d.page_content for d in results)
from langchain_tavily import TavilySearch

tavily = TavilySearch(max_results=3, topic="general")

@tool
def web_search_tool(query: str) -> str:
    """Up‑to‑date web info via Tavily"""
    return "

".join(r["content"] for r in tavily.invoke({"query": query})["results"])  # simplified

4️⃣ Extend the Agent State

class AgentState(State):          # add to previous `State`
    route:    str          # "rag", "answer", "web", "end"
    rag:      str | None   # KB result
    web:      str | None   # web‑search snippets

5️⃣ Decision / Execution Nodes

  • router_node – uses an LLM with structured output to decide the route: rag, answer, or end.

  • rag_node – runs rag_search_tool, then asks a judge LLM whether the retrieved chunks are sufficient; sets route to answer or web.

  • web_node – calls web_search_tool and passes the snippets along.

  • answer_node – crafts the final reply, combining any rag and/or web context.

Key implementation points (condensed):

# ── Structured helpers ─────────────────
from typing import Literal
from pydantic import BaseModel

class RouteDecision(BaseModel):
    route: Literal["rag", "answer", "end"]
    reply: str | None = None

class RagJudge(BaseModel):
    sufficient: bool

router_llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0).with_structured_output(RouteDecision)
judge_llm  = ChatOpenAI(model="gpt-4.1-mini", temperature=0).with_structured_output(RagJudge)
answer_llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0.7)

# ── Router ─────────────────────────────
def router_node(state: AgentState) -> AgentState:
    q = state["messages"][-1].content
    decision = router_llm.invoke([
        ("system", "Decide route: rag / answer / end"),
        ("user", q)
    ])
    new_state = {**state, "route": decision.route}
    if decision.route == "end":
        new_state["messages"] += [AIMessage(content=decision.reply or "Hello!")]
    return new_state

# ── RAG lookup ─────────────────────────
def rag_node(state: AgentState) -> AgentState:
    q = state["messages"][-1].content
    chunks = rag_search_tool.invoke(q)
    verdict = judge_llm.invoke([("user", f"Question: {q}\nDocs: {chunks[:300]}…")])
    return {**state, "rag": chunks, "route": "answer" if verdict.sufficient else "web"}

# ── Web search & Answer nodes omitted for brevity (same as notebook) ──
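The graph wiring below also references two small routing helpers, from_router and after_rag, that aren't shown in the condensed snippet. They simply read the route field set by the router and RAG nodes; a minimal sketch (your notebook version may differ):

def from_router(state: AgentState) -> str:
    # "rag", "answer", or "end" – set by router_node
    return state["route"]

def after_rag(state: AgentState) -> str:
    # "answer" if the KB chunks were judged sufficient, otherwise "web"
    return state["route"]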

6️⃣ Wire up the Graph

agent_graph = StateGraph(AgentState)
agent_graph.add_node("router",      router_node)
agent_graph.add_node("rag_lookup",  rag_node)
agent_graph.add_node("web_search",  web_node)
agent_graph.add_node("answer",      answer_node)

agent_graph.set_entry_point("router")
agent_graph.add_conditional_edges("router", from_router,
        {"rag": "rag_lookup", "answer": "answer", "end": END})
agent_graph.add_conditional_edges("rag_lookup", after_rag,
        {"answer": "answer", "web": "web_search"})
agent_graph.add_edge("web_search", "answer")
agent_graph.add_edge("answer", END)

agent = agent_graph.compile(checkpointer=MemorySaver())

7️⃣ Quick CLI test

if __name__ == "__main__":
    config = {"configurable": {"thread_id": "thread‑12"}}
    while True:
        q = input("You: ").strip()
        if q in {"quit", "exit"}: break
        result = agent.invoke({"messages": [HumanMessage(content=q)]}, config)
        print(result["messages"][-1].content)

Now your LangGraph agent:

  • Routes intelligently

  • Retrieves domain knowledge with RAG

  • Falls back to web search when KB is insufficient

  • Streams multi‑turn answers with memory

In short, this is a production‑ready skeleton you can plug into any project.


🚀 Conclusion & Resources

In this tutorial we climbed the ladder from basic LLM calls → tool‑aware agents → memory → RAG → full multi‑step routing with LangGraph. You now have a production‑ready skeleton that can:

  • Chat naturally across turns (memory)

  • Decide when to use internal knowledge vs. external tools (router)

  • Pull trusted data from your own docs (RAG)

  • Fall back to real‑time web search when the KB is lacking


📂 Grab the code

🕹 Try the live RAG Agent: https://agent.futuresmart.ai/

🎥 Watch the build walkthrough


What’s next?

  1. Swap in your own docs. Point the loader at your knowledge base and rebuild the index.

  2. Add streaming. LangGraph supports async generators, so you can pipe partial answers to the UI (see the sketch after this list).

  3. Deploy. Package the graph inside a FastAPI endpoint or a serverless function and wire up a front‑end.
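For the streaming idea above, here's a minimal sketch against the compiled agent from Step 6 (the synchronous stream is shown; astream is the async‑generator counterpart):

from langchain_core.messages import HumanMessage

config = {"configurable": {"thread_id": "stream-demo"}}

# Emit each node's state update as soon as that node finishes
for update in agent.stream(
    {"messages": [HumanMessage(content="What's 15% of 240?")]},
    config,
    stream_mode="updates",
):
    print(update)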

Got questions or improvement ideas? Drop a comment under the YouTube video – I'd love to hear how you extend this skeleton!

Happy building 🛠️🤖
