LangGraph RAG Agent Tutorial | Basics to Advanced Multi-Agent AI Chatbot

Pradip Nichite

Retrieval-Augmented Generation (RAG) is becoming the go-to pattern for building AI systems that can fetch real-time or domain-specific knowledge on demand. But RAG alone doesn’t make your chatbot smart.

With LangGraph, you can build stateful, agent-like flows that combine tools, memory, structured decision logic, and retrieval—all driven by LLMs.

In this blog, we’ll build up to a full LangGraph-based RAG Agent from scratch. We'll follow a practical path:

  1. Start with basic LLM usage

  2. Bind tools to the LLM

  3. Use LangGraph to build stateful agents

  4. Add memory, routing logic, and tool execution

  5. Finally, combine all of it with document retrieval to create a RAG-powered agent

Each section mirrors what you’d build in a notebook, but with clear explanations to help you understand why each piece matters.

Let’s start with the first building block: invoking an LLM.


🧠 Step 1: Invoking a Language Model (LLM)

To begin, we’ll use ChatOpenAI from LangChain to invoke a language model. We’ll keep it simple:

from langchain_openai import ChatOpenAI

# Initialize the LLM
llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0.7)

# Basic prompt
response = llm.invoke("What is artificial intelligence?")
print(response.content)

This returns a standard response from the LLM. But the real value comes when you treat the LLM like a conversation partner using message objects:

from langchain_core.messages import HumanMessage, SystemMessage

messages = [
    SystemMessage(content="You are a helpful AI assistant that explains complex topics simply."),
    HumanMessage(content="Explain machine learning in 2 sentences.")
]

response = llm.invoke(messages)
print(response.content)

Using SystemMessage and HumanMessage gives you more control over behavior and tone. It’s also how you’ll structure inputs later when building memory-enabled and multi-step agents.

Now that we can invoke an LLM in both simple and structured ways, we’re ready to start integrating tools.


🔧 Step 2: Extending LLMs with Tools

LLMs are powerful, but on their own they can't reliably do math or fetch real-time information. To make your LLM truly useful, you can bind external tools to it. Here's how:

from langchain_core.tools import tool
from langchain_community.tools import DuckDuckGoSearchRun

@tool
def calculator(expression: str) -> str:
    """Calculate mathematical expressions. Use this for any math calculations."""
    try:
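        # Note: eval() is fine for a quick demo, but don't use it on untrusted input in production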
        result = eval(expression)
        return f"The result of {expression} is {result}"
    except Exception as e:
        return f"Error calculating {expression}: {str(e)}"

search_tool = DuckDuckGoSearchRun()

We now have two tools:

  • calculator to perform basic arithmetic

  • search_tool to fetch info from the web

To bind these tools to the LLM:

# Bind tools to the LLM
tools = [calculator, search_tool]
llm_with_tools = llm.bind_tools(tools)

Let’s test the LLM with tools:

response = llm_with_tools.invoke("What's 25 * 4 + 17?")
print(response.content)

However, when an LLM is tool-enabled, its response might include tool_calls instead of just plain text. To handle that:

def handle_tool_calls(response, tool_map):
    if not getattr(response, 'tool_calls', None):
        return

    for tool_call in response.tool_calls:
        tool_name = tool_call['name']
        args = tool_call['args']
        tool = tool_map.get(tool_name)
        if tool:
            result = tool.invoke(args)
            print(f"Tool result: {result}")

Then:

tool_map = {
    'calculator': calculator,
    'duckduckgo_search': search_tool,
}

def test_llm_tool(query):
    response = llm_with_tools.invoke(query)
    print(response.content)
    handle_tool_calls(response, tool_map)

# Run some queries
test_llm_tool("What's 25 * 4 + 17?")
test_llm_tool("Search for recent news about artificial intelligence")

With this setup, your LLM is now a tool-using agent.

Next, we’ll take this a step further by wiring everything into a LangGraph to make it stateful and multi-turn.


🧩 Step 3: Building a Basic LangGraph Chatbot

At its core, LangGraph lets you define a graph of nodes that process conversational state. Let’s begin with a minimal chatbot graph.

Define Chatbot State

from typing import Annotated, TypedDict
from langchain_core.messages import BaseMessage, HumanMessage, AIMessage
from langgraph.graph.message import add_messages

class State(TypedDict):
    messages: Annotated[list[BaseMessage], add_messages]

Here, we define a State object that will carry the conversation. The add_messages reducer tells LangGraph to append new messages to the existing list rather than overwrite it.
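To build intuition, here's a tiny illustrative sketch of what the reducer does when called directly (not part of the chatbot itself):

from langchain_core.messages import HumanMessage, AIMessage
from langgraph.graph.message import add_messages

existing = [HumanMessage(content="Hi")]
update = [AIMessage(content="Hello!")]

# add_messages merges the update into the existing list (assigning message IDs along the way)
merged = add_messages(existing, update)
print([m.content for m in merged])  # ['Hi', 'Hello!']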

Create the Chatbot Node

def chatbot_node(state: State) -> State:
    response = llm.invoke(state["messages"])
    return {"messages": [response]}

This node accepts messages and returns the updated state with the AI's response.

Build and Compile the Graph

from langgraph.graph import StateGraph, START, END

graph_builder = StateGraph(State)
graph_builder.add_node("chatbot", chatbot_node)
graph_builder.add_edge(START, "chatbot")
graph_builder.add_edge("chatbot", END)
graph = graph_builder.compile()

This sets up a simple one-node chatbot pipeline. You can now test it:

def test_chatbot(message: str):
    initial_state = {"messages": [HumanMessage(content=message)]}
    result = graph.invoke(initial_state)
    print("🤖 Assistant:", result["messages"][-1].content)

test_chatbot("Hello! My name is Pradip")
test_chatbot("Do you remember my name?")

You’ll notice it doesn’t remember past messages yet. That’s what we’ll fix in the next step—by adding memory.


🧠 Step 4: Adding Memory to the Chatbot

To make the chatbot remember previous conversations, we need to add a memory backend.
LangGraph provides MemorySaver for this purpose.

from langgraph.checkpoint.memory import MemorySaver

memory = MemorySaver()

# Compile the graph again with memory enabled
graph_with_memory = graph_builder.compile(checkpointer=memory)

We can now run the chatbot in a threaded manner, and it will retain context:

def chat_with_memory(message: str, thread_id: str):
    config = {"configurable": {"thread_id": thread_id}}
    initial_state = {"messages": [HumanMessage(content=message)]}
    result = graph_with_memory.invoke(initial_state, config)
    print("🤖 Assistant:", result["messages"][-1].content)

# Start a conversation
chat_with_memory("Hi, my name is Pradip", thread_id="thread-1")
chat_with_memory("What's my name?", thread_id="thread-1")

With memory in place, the assistant can now recall previous messages.
This forms the foundation for building multi-turn, context-aware agents.
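If you want to verify what the checkpointer is holding, a compiled graph with a checkpointer exposes get_state; here's a minimal sketch:

# Peek at the state stored for a given thread
config = {"configurable": {"thread_id": "thread-1"}}
snapshot = graph_with_memory.get_state(config)

# The snapshot's values contain the accumulated message history for this thread
for msg in snapshot.values["messages"]:
    print(f"{msg.type}: {msg.content}")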

Next, we’ll add more intelligence to the flow using routing and tools.

🛠️ Step 5 – LangGraph Agent with Tools

So far, our chatbot can talk (Step 3) and remember context (Step 4). Now we want it to recognise when a tool is needed and call it automatically.

At a high level we'll add two new pieces:

  1. chatbot node – decides whether it can answer directly or should call a tool.

  2. tools node – actually runs the requested tool‑call and passes the result back.

The conversation state stays the same – a list of LangChain Message objects – so we just rename it to emphasise the agent role:

from typing import Annotated, TypedDict
from langgraph.graph.message import add_messages
from langchain_core.messages import BaseMessage

class AgentState(TypedDict):
    """State for our two‑node agent"""
    messages: Annotated[list[BaseMessage], add_messages]

1. Bind the LLM to our existing tools

llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0.7)
llm_with_tools = llm.bind_tools(tools)  # `tools` already contains `calculator` and `search_tool`

Binding keeps the API exactly the same – we just swap llm for llm_with_tools when we need tool‑usage.


2. The chatbot node – decide answer vs. tool

from langchain_core.messages import HumanMessage, AIMessage

def chatbot_node(state: AgentState) -> AgentState:
    """Gatekeeper: answer directly or request a tool"""
    system_message = (
        "You are a helpful assistant.\n"
        "Use the `web_search` tool for real‑time facts and `calculator` for maths.\n"
        "Otherwise answer directly."
    )

    messages = [
        {"role": "system", "content": system_message},
        *state["messages"],
    ]

    response = llm_with_tools.invoke(messages)
    return {"messages": [response]}  # LangGraph merges this into the running state

Key idea: we embed the routing logic in the system prompt – the LLM decides whether a tool call is needed and, if so, populates tool_calls on its response message.
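If you're curious what that looks like, you can inspect the attribute directly; the exact payload below is illustrative, not guaranteed output:

response = llm_with_tools.invoke("What's 15% of 240?")
print(response.tool_calls)
# Illustrative shape: [{'name': 'calculator', 'args': {'expression': '240 * 0.15'}, 'id': '...', 'type': 'tool_call'}]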


3. The tools node – run any requested tool‑calls

Instead of re‑implementing the execution loop, we reuse the pre‑built ToolNode:

from langgraph.prebuilt import ToolNode

tool_node = ToolNode(tools)  # executes any requested tool calls and returns their results as ToolMessages

4. Routing logic

We just need a small helper that checks whether the last message contains tool calls:

from typing import Literal

def should_continue(state: AgentState) -> Literal["tools", "end"]:
    last = state["messages"][-1]
    return "tools" if getattr(last, "tool_calls", None) else "end"

5. Wire it all together with StateGraph

from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import MemorySaver

workflow = StateGraph(AgentState)
workflow.add_node("chatbot", chatbot_node)
workflow.add_node("tools",   tool_node)

workflow.add_edge(START, "chatbot")
workflow.add_conditional_edges("chatbot", should_continue, {"tools": "tools", "end": END})
workflow.add_edge("tools", "chatbot")  # come back after tools run

app = workflow.compile(checkpointer=MemorySaver())

Why a loop back to chatbot? After a tool runs we want the LLM to integrate the tool output and craft the final answer – so the graph cycles back to the chatbot node until no further tool calls are requested.


6. Quick manual test

def chat_with_agent(msg: str, thread_id="demo"):
    cfg = {"configurable": {"thread_id": thread_id}}
    state = {"messages": [HumanMessage(content=msg)]}
    result = app.invoke(state, cfg)
    print(result["messages"][-1].content)

chat_with_agent("What's 15% of 240?")
chat_with_agent("Search for recent news about artificial intelligence")

You should see the calculator and DuckDuckGo search tools being triggered automatically, followed by a neat, fully‑formed answer.


That’s a self‑routing, tool‑aware agent. In the next step we’ll plug a knowledge‑base retriever into the tool‑chain and teach the agent when to switch from web search to internal RAG – bringing us one step closer to a production‑ready assistant.

🔍 Step 6 – LangGraph RAG Agent

Goal: Give your agent up‑to‑date, domain‑specific knowledge so it can answer beyond the LLM’s training data.

We’ll layer retrieval, routing, and an optional web‑search fallback on top of the tool‑enabled agent from Step 5.

1️⃣ Index your documents once

# ── Build & persist a Chroma index ────────────────────────────────
from pathlib import Path
from langchain_community.document_loaders import PyPDFLoader, Docx2txtLoader
from langchain_text_splitters import RecursiveCharacterTextSplitter
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

SOURCE_DIR   = Path("docs")          # put your files here
INDEX_DIR    = Path("chroma_db_1")   # will be created if missing
EMBED_MODEL  = "text-embedding-3-small"

# Load docs (keep only pdf/docx for brevity)
docs = []
for f in SOURCE_DIR.glob("*.*"):
    if f.suffix == ".pdf":
        docs += PyPDFLoader(str(f)).load()
    elif f.suffix == ".docx":
        docs += Docx2txtLoader(str(f)).load()

# Split & embed
chunks     = RecursiveCharacterTextSplitter(chunk_size=1_000, chunk_overlap=200).split_documents(docs)
embeddings = OpenAIEmbeddings(model=EMBED_MODEL)

vectordb = Chroma.from_documents(
    documents         = chunks,
    embedding         = embeddings,
    persist_directory = str(INDEX_DIR),
    collection_name   = "kb_collection",
)
vectordb.persist()
print("✅ Index built →", INDEX_DIR.resolve())

Run this once; the agent will query the saved index at runtime.
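At runtime you don't rebuild the index; you reopen the persisted one. A minimal sketch, assuming the same directory, collection name, and embedding model as above:

from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Reconnect to the index built earlier
vectordb = Chroma(
    collection_name="kb_collection",
    embedding_function=embeddings,
    persist_directory="chroma_db_1",
)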

2️⃣ Expose a Retriever as a LangChain Tool

retriever = vectordb.as_retriever(search_kwargs={"k": 2})

@tool
def rag_search_tool(query: str) -> str:
    """Search the knowledge‑base for relevant chunks"""
    results = retriever.invoke(query)
    return "

".join(d.page_content for d in results)
from langchain_tavily import TavilySearch

tavily = TavilySearch(max_results=3, topic="general")

@tool
def web_search_tool(query: str) -> str:
    """Up‑to‑date web info via Tavily"""
    return "

".join(r["content"] for r in tavily.invoke({"query": query})["results"])  # simplified

4️⃣ Extend the Agent State

class AgentState(State):          # add to previous `State`
    route:    str          # "rag", "answer", "web", "end"
    rag:      str | None   # KB result
    web:      str | None   # web‑search snippets

5️⃣ Decision / Execution Nodes

  • router_node – uses an LLM with structured output to decide the route: rag, answer, or end.

  • rag_node – runs rag_search_tool, then asks a judge LLM whether the retrieved chunks are sufficient; sets route to answer or web.

  • web_node – calls web_search_tool and passes the snippets along.

  • answer_node – crafts the final reply, combining any rag and/or web context.

Key implementation points (condensed):

# ── Structured helpers ─────────────────
from typing import Literal
from pydantic import BaseModel

class RouteDecision(BaseModel):
    route: Literal["rag", "answer", "end"]
    reply: str | None = None

class RagJudge(BaseModel):
    sufficient: bool

router_llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0).with_structured_output(RouteDecision)
judge_llm  = ChatOpenAI(model="gpt-4.1-mini", temperature=0).with_structured_output(RagJudge)
answer_llm = ChatOpenAI(model="gpt-4.1-mini", temperature=0.7)

# ── Router ─────────────────────────────
def router_node(state: AgentState) -> AgentState:
    q = state["messages"][-1].content
    decision = router_llm.invoke([
        ("system", "Decide route: rag / answer / end"),
        ("user", q)
    ])
    new_state = {**state, "route": decision.route}
    if decision.route == "end":
        new_state["messages"] += [AIMessage(content=decision.reply or "Hello!")]
    return new_state

# ── RAG lookup ─────────────────────────
def rag_node(state: AgentState) -> AgentState:
    q = state["messages"][-1].content
    chunks = rag_search_tool.invoke(q)
    verdict = judge_llm.invoke([("user", f"Question: {q}\nDocs: {chunks[:300]}…")])
    return {**state, "rag": chunks, "route": "answer" if verdict.sufficient else "web"}

# ── Web search & Answer nodes omitted for brevity (same as notebook) ──
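The graph wiring below also references two small routing helpers, from_router and after_rag, that aren't shown in the condensed snippet. They simply read the route field set by the router and RAG nodes; a minimal sketch (your notebook version may differ):

def from_router(state: AgentState) -> str:
    # "rag", "answer", or "end" – set by router_node
    return state["route"]

def after_rag(state: AgentState) -> str:
    # "answer" if the KB chunks were judged sufficient, otherwise "web"
    return state["route"]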

6️⃣ Wire up the Graph

agent_graph = StateGraph(AgentState)
agent_graph.add_node("router",      router_node)
agent_graph.add_node("rag_lookup",  rag_node)
agent_graph.add_node("web_search",  web_node)
agent_graph.add_node("answer",      answer_node)

agent_graph.set_entry_point("router")
agent_graph.add_conditional_edges("router", from_router,
        {"rag": "rag_lookup", "answer": "answer", "end": END})
agent_graph.add_conditional_edges("rag_lookup", after_rag,
        {"answer": "answer", "web": "web_search"})
agent_graph.add_edge("web_search", "answer")
agent_graph.add_edge("answer", END)

agent = agent_graph.compile(checkpointer=MemorySaver())

7️⃣ Quick CLI test

if __name__ == "__main__":
    config = {"configurable": {"thread_id": "thread‑12"}}
    while True:
        q = input("You: ").strip()
        if q in {"quit", "exit"}: break
        result = agent.invoke({"messages": [HumanMessage(content=q)]}, config)
        print(result["messages"][-1].content)

Now your LangGraph agent:

  • Routes intelligently

  • Retrieves domain knowledge with RAG

  • Falls back to web search when KB is insufficient

  • Streams multi‑turn answers with memory

In short, this is a production‑ready skeleton you can plug into any project.


🚀 Conclusion & Resources

In this tutorial we climbed the ladder from basic LLM calls → tool‑aware agents → memory → RAG → full multi‑step routing with LangGraph. You now have a production‑ready skeleton that can:

  • Chat naturally across turns (memory)

  • Decide when to use internal knowledge vs. external tools (router)

  • Pull trusted data from your own docs (RAG)

  • Fall back to real‑time web search when the KB is lacking


📂 Grab the code

🕹 Try the live RAG Agent: https://agent.futuresmart.ai/

🎥 Watch the build walkthrough


What’s next?

  1. Swap in your own docs. Point the loader at your knowledge base and rebuild the index.

  2. Add streaming. LangGraph supports async generators, so you can pipe partial answers to the UI (see the sketch after this list).

  3. Deploy. Package the graph inside a FastAPI endpoint or a serverless function and wire up a front‑end.
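For the streaming idea above, here's a minimal sketch against the compiled agent from Step 6 (the synchronous stream is shown; astream is the async‑generator counterpart):

from langchain_core.messages import HumanMessage

config = {"configurable": {"thread_id": "stream-demo"}}

# Emit each node's state update as soon as that node finishes
for update in agent.stream(
    {"messages": [HumanMessage(content="What's 15% of 240?")]},
    config,
    stream_mode="updates",
):
    print(update)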

Got questions or improvement ideas? Drop a comment under the YouTube video – I'd love to hear how you extend this skeleton!

Happy building 🛠️🤖
