KGrag MCP Server + Model Context Protocol: From Data to Context for LLMs


KGrag MCP Server implements the Model Context Protocol (MCP) to connect language models to data, tools, and prompts in a standard and secure way, enabling the construction of truly useful workflows and agents on Knowledge Graphs and documents.
What the Model Context Protocol (MCP) Really Is
MCP is an open protocol that standardizes how applications provide context to LLMs. In essence, it is the “USB-C of AI apps”: a single, consistent way to expose data, functions, and reusable prompts to a model. (modelcontextprotocol.io)
In the MCP specification, servers expose three fundamental primitives:
Tools (executable actions),
Resources (data/context addressable via URI),
Prompts (reusable templates and flows).
These primitives are discovered and used by MCP clients through JSON-RPC 2.0 messages with a defined lifecycle (handshake, capability negotiation, and so on).
For communication, MCP defines two standard transports: stdio and streamable HTTP (with streaming via SSE).
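To make this concrete, here is a rough sketch (written as Python dicts) of the JSON-RPC 2.0 exchange a client might use to invoke a tool after the handshake; the tool name and arguments are purely illustrative, not part of the protocol.
# Hypothetical JSON-RPC 2.0 request an MCP client could send after initialize:
# invoke a tool advertised by the server.
tool_call_request = {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "tools/call",
    "params": {
        "name": "ingestion",                        # illustrative tool name
        "arguments": {"path": "/data/report.pdf"},  # illustrative argument
    },
}

# The matching response carries the tool output as content blocks.
tool_call_response = {
    "jsonrpc": "2.0",
    "id": 42,
    "result": {
        "content": [{"type": "text", "text": "Ingestion completed"}],
        "isError": False,
    },
}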
Why This Matters for AI System Builders
With MCP, you can:
connect multiple sources/tools to an LLM with a single interface,
compose workflows between different servers,
maintain a clear perimeter of permissions and responsibilities between host, client, and server.
KGrag MCP Server in Brief
KGrag MCP Server is designed to ingest, semantically enrich, and query data (structured and unstructured) using:
Neo4j for knowledge graphs,
AWS S3 for storage,
Qdrant for vector search,
LLM for analysis and query response,
all orchestrated via Docker Compose for scalability and ease of deployment.
Note: the core design does not rely on Redis.
Tools
1) Ingestion → Knowledge Graph
Tool ingestion(path: str): loads a file from the filesystem and populates nodes/relationships.
Resulting resource: the “normalized” document and the IDs of the created nodes (addressable as URIs).
Supporting prompt: parser_text_prompt, used to extract entities/relationships from the text before insertion.
This combination reflects the MCP model: tools that act, resources that describe the context, prompts that keep the LLM consistent and repeatable.
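As a concrete illustration, here is a minimal sketch of calling the ingestion tool from Python with the official mcp client SDK over SSE; the endpoint URL and the file path are assumptions, and the exact shape of the returned content depends on the server.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def ingest_document() -> None:
    # Assumes the KGrag MCP server is reachable on the SSE endpoint below.
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()  # handshake + capability negotiation
            # Invoke the ingestion tool with an illustrative local path.
            result = await session.call_tool(
                "ingestion", {"path": "/data/contracts/contract_2024.pdf"}
            )
            for block in result.content:
                print(block)

asyncio.run(ingest_document())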
2) Semantic Graph Query
Tool query(query: str): queries the graph (Neo4j + embeddings in Qdrant) and returns citable evidence.
Prompt agent_query_prompt: structures the response using nodes_str, edges_str, and user_query.
The MCP client discovers these tools/prompts and invokes them via standard JSON-RPC messages.
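Reusing a session opened as in the previous sketch, a query call and the retrieval of the supporting prompt could look like this; the question is illustrative and the prompt argument names follow the description above.
async def ask_graph(session) -> None:
    # session: an initialized mcp.ClientSession, as in the previous sketch.
    # Query the knowledge graph; the server combines Neo4j and Qdrant evidence.
    result = await session.call_tool(
        "query", {"query": "Which suppliers are linked to contract_2024?"}
    )
    for block in result.content:
        print(block)

    # Fetch the reusable prompt that structures the final answer.
    prompt = await session.get_prompt(
        "agent_query_prompt",
        {
            "nodes_str": "...",   # serialized nodes returned by the graph query
            "edges_str": "...",   # serialized relationships
            "user_query": "Which suppliers are linked to contract_2024?",
        },
    )
    print(prompt.messages)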
3) Standalone Extraction/Parsing
Tools extract_graph_data(raw_data: str) and parser(text: str): transform text/records into entities/relationships without persisting them.
The result can be exposed as a disposable resource for other workflow steps.
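Again assuming an initialized session, a sketch of this stateless path might be (the input strings are illustrative, and the exact output format is defined by the server):
async def parse_only(session) -> None:
    # Extract entities/relationships from raw text without writing to Neo4j.
    extracted = await session.call_tool(
        "extract_graph_data",
        {"raw_data": "ACME Corp signed contract_2024 with Globex on 2024-03-01."},
    )

    # Plain parsing of free text into structured records.
    parsed = await session.call_tool(
        "parser", {"text": "ACME Corp is a supplier based in Milan."}
    )

    # Both results stay in memory and can feed the next workflow step
    # (for example, a later ingestion call) without persisting anything.
    print(extracted.content, parsed.content)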
Starting the Server
- Create a .env file to pass directly to the docker run command with --env-file .env:
# ====================================================
# MCP Server Configuration - Environment
# ====================================================
LLM_MODEL_TYPE="openai" # options: openai | ollama | vllm
APP_ENV="production"
USER_AGENT="kgrag_agent"
# ------------------------- OpenAI -------------------------
OPENAI_API_KEY= # Enter your OpenAI key here
# ------------------------- AWS -------------------------
AWS_ACCESS_KEY_ID= # Enter your AWS Access Key here
AWS_SECRET_ACCESS_KEY= # Enter your AWS Secret Key here
AWS_REGION= # e.g., eu-central-1
AWS_BUCKET_NAME= # S3 bucket name
# ------------------------- General -------------------------
COLLECTION_NAME="kgrag_data"
# ====================================================
# Embeddings for Qdrant
# ====================================================
# Set VECTORDB_SENTENCE_TYPE to 'local' for a local model,
# or to 'hf' to download automatically from Hugging Face.
# Some available models:
# - BAAI/bge-base-en
# - BAAI/bge-small-en-v1.5
# - snowflake/snowflake-arctic-embed-s
# - jinaai/jina-embeddings-v2-base-en
# - nomic-ai/nomic-embed-text-v1.5
# - sentence-transformers/all-MiniLM-L6-v2
# - intfloat/multilingual-e5-large
# - jinaai/jina-embeddings-v3
# (choose one and enter it below)
VECTORDB_SENTENCE_TYPE="hf"
VECTORDB_SENTENCE_MODEL="BAAI/bge-small-en-v1.5"
# ------------------------- LLM & Embedding -------------------------
LLM_MODEL_NAME="gpt-4.1-mini"
MODEL_EMBEDDING="text-embedding-3-small"
# ------------------------- Neo4j -------------------------
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD= # Enter the Neo4j password
NEO4J_AUTH= # Format e.g., neo4j/<password>
# ------------------------- Redis -------------------------
REDIS_URL="redis://localhost:6379"
REDIS_HOST="localhost"
REDIS_PORT=6379
REDIS_DB=10
The table below maps all supported environment variables:

| Variable | Service | Description |
| --- | --- | --- |
| APP_ENV | Generic App | Execution environment (production, development, etc.) |
| USER_AGENT | Generic App | Client/agent identifier name (e.g., kgrag_agent) |
| LLM_MODEL_TYPE | LLM | Type of model used: openai, ollama, vllm |
| LLM_MODEL_NAME | LLM | Name of the LLM model to use (e.g., gpt-4.1-mini) |
| OPENAI_API_KEY | OpenAI | OpenAI API key (leave empty if not using OpenAI) |
| MODEL_EMBEDDING | LLM / Embedding | Text embedding model (e.g., text-embedding-3-small) |
| VECTORDB_SENTENCE_TYPE | Qdrant / VectorDB | Embedding type: hf (Hugging Face) or local (local model) |
| VECTORDB_SENTENCE_MODEL | Qdrant / VectorDB | Hugging Face embedding model name (e.g., BAAI/bge-small-en-v1.5) |
| AWS_ACCESS_KEY_ID | AWS S3 | AWS access key |
| AWS_SECRET_ACCESS_KEY | AWS S3 | AWS secret key |
| AWS_REGION | AWS S3 | AWS region (e.g., eu-central-1) |
| AWS_BUCKET_NAME | AWS S3 | S3 bucket name |
| COLLECTION_NAME | Qdrant | Name of the collection where embeddings are stored (e.g., kgrag_data) |
| NEO4J_USERNAME | Neo4j | Neo4j username (default: neo4j) |
| NEO4J_PASSWORD | Neo4j | Neo4j password |
| NEO4J_AUTH | Neo4j | Authentication string (e.g., neo4j/<password>) |
| REDIS_URL | Redis | Redis connection URL (e.g., redis://localhost:6379) |
| REDIS_HOST | Redis | Redis host (default: localhost) |
| REDIS_PORT | Redis | Redis port (default: 6379) |
| REDIS_DB | Redis | Redis DB number to use (default: 10) |
| LOKI_URL | Loki (logging) | Loki endpoint to send logs to (e.g., http://kgrag-loki:3100/loki/api/v1/push) |
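Before launching the container, it can help to verify that the .env file has no empty required values; below is a small sketch that uses only the variable names from the table (which ones are strictly required depends on the backends you enable).
from pathlib import Path

# Variables taken from the mapping table above; trim the list to match
# your setup (e.g., drop the AWS block if S3 is not used).
REQUIRED = [
    "LLM_MODEL_TYPE", "LLM_MODEL_NAME", "MODEL_EMBEDDING", "OPENAI_API_KEY",
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION", "AWS_BUCKET_NAME",
    "COLLECTION_NAME", "NEO4J_USERNAME", "NEO4J_PASSWORD", "NEO4J_AUTH",
]

# Parse KEY=VALUE lines, ignoring comments and inline comments.
values = {}
for line in Path(".env").read_text().splitlines():
    line = line.strip()
    if not line or line.startswith("#") or "=" not in line:
        continue
    key, _, value = line.partition("=")
    values[key.strip()] = value.split("#", 1)[0].strip().strip('"')

missing = [name for name in REQUIRED if not values.get(name)]
if missing:
    raise SystemExit(f"Missing or empty variables in .env: {', '.join(missing)}")
print("The .env file looks complete.")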
- Start the container:
docker run -d \
--name kgrag_mcp_server \
--restart always \
--env-file .env \
-p 8000:8000 -p 6379:6379 -p 6333:6333 -p 6334:6334 -p 7474:7474 -p 7687:7687 \
-v qdrant_data:/qdrant/storage:z \
-v redis_data:/data \
-v neo4j_data:/var/lib/neo4j/data \
--network kgrag-network \
ghcr.io/gzileni/kgrag_mcp_server:main
Connecting an MCP Server to VSCode with GitHub Copilot
You can use GitHub Copilot in VSCode to interact with an MCP Server and manage document ingestion in agent mode.
Quick steps:
Open VSCode and ensure Copilot is active.
Create an mcp.json file in the project folder with the following configuration:
{
"servers": {
"kgrag-server": {
"url": "http://localhost:8000/sse",
"type": "sse"
}
},
"inputs": []
}
At this point, Copilot can automatically suggest ingestion code, as well as improvements like error handling or batch processing, based on the configuration.
Connecting an MCP Server to an Agent with LangGraph
Connecting an agent developed with LangGraph to an MCP Server means transforming a simple language model into a true operational assistant.
Thanks to the Model Context Protocol, the agent not only generates text but can use external tools, query knowledge graphs, load documents, and maintain persistent memory.
LangGraph provides step-by-step reasoning logic and memory management, while MCP exposes data and tools in a standard and modular way.
The result? A more powerful, transparent, and scalable agent, capable of linking artificial intelligence and business data in real workflows.
from langgraph.prebuilt import create_react_agent
from langgraph.store.memory import InMemoryStore
from langmem import create_manage_memory_tool, create_search_memory_tool
from langmem.short_term import SummarizationNode
from langchain_core.messages.utils import count_tokens_approximately
from langchain.chat_models import init_chat_model
from langchain_mcp_adapters.client import MultiServerMCPClient

model_name: str = "gpt-4.1-mini"
model_embedding: str = "openai:text-embedding-3-small"

# Chat model shared by the agent and the summarization node
model = init_chat_model(f"openai:{model_name}")

# Set up the store used for long-term memory (vector index on the embedding model)
store = InMemoryStore(
    index={
        "dims": 1536,
        "embed": model_embedding,
    }
)

# Summarize long conversation histories before they reach the model
summarize_node = SummarizationNode(
    token_counter=count_tokens_approximately,
    model=model,
    max_tokens=384,
    max_summary_tokens=128,
    output_messages_key="llm_input_messages",
)

# MCP client pointing at the KGrag server started above (SSE transport on port 8000)
mcp_client = MultiServerMCPClient(
    {
        "kgrag": {
            "url": "http://localhost:8000/sse",
            "transport": "sse",
        }
    }
)

async def build_agent():
    # Discover the KGrag tools exposed over MCP and add the memory tools
    tools = await mcp_client.get_tools()
    tools.extend([
        create_manage_memory_tool(namespace=("memories", "{user_id}")),
        create_search_memory_tool(namespace=("memories", "{user_id}")),
    ])

    # Create the ReAct agent with memory tools and the summarization hook
    return create_react_agent(
        model,
        tools=tools,
        store=store,
        prompt="You are a research assistant",
        pre_model_hook=summarize_node,
    )
async def run_agent(agent, prompt: str):
async for chunk in agent.astream(
{"messages": [("user", prompt)]}
):
yield chunk
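A minimal way to run it end to end (assuming the KGrag MCP server from the previous section is up and an OPENAI_API_KEY is set in the environment) could be:
import asyncio

async def main() -> None:
    agent = await build_agent()
    # Illustrative question; the agent decides when to call the KGrag tools.
    async for chunk in run_agent(agent, "What entities are linked to contract_2024?"):
        print(chunk)

if __name__ == "__main__":
    asyncio.run(main())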
In summary, KGrag MCP Server implements the Model Context Protocol to connect language models to data, tools, and prompts in a secure and standardized way.
With technologies like Neo4j, AWS S3, and Qdrant, it simplifies data management and the creation of Knowledge Graph-based workflows.
Thanks to Docker Compose, GitHub Copilot, and LangGraph, it becomes a scalable and transparent solution to bring AI into real workflows.