KGrag MCP Server + Model Context Protocol: From Data to Context for LLMs


KGrag MCP Server implements the Model Context Protocol (MCP) to connect language models to data, tools, and prompts in a standard and secure way, enabling the construction of truly useful workflows and agents on Knowledge Graphs and documents.
What the Model Context Protocol (MCP) Really Is
MCP is an open protocol that standardizes how applications provide context to LLMs. In essence, it is the “USB-C of AI apps”: a single, consistent way to expose data, functions, and reusable prompts to a model. (modelcontextprotocol.io)
In the MCP specification, servers expose three fundamental primitives:
Tools (executable actions),
Resources (data/context addressable via URI),
Prompts (reusable templates and flows).
These primitives are discovered and used by MCP clients through JSON-RPC 2.0 messages with a defined lifecycle (handshake, capability negotiation, and so on).
For communication, MCP defines two standard transports: stdio and streamable HTTP (with streaming via SSE).
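To make this concrete, here is a rough sketch (written as Python dicts) of the JSON-RPC 2.0 exchange a client might use to invoke a tool after the handshake; the tool name and arguments are purely illustrative, not part of the protocol.
# Hypothetical JSON-RPC 2.0 request an MCP client could send after initialize:
# invoke a tool advertised by the server.
tool_call_request = {
    "jsonrpc": "2.0",
    "id": 42,
    "method": "tools/call",
    "params": {
        "name": "ingestion",                        # illustrative tool name
        "arguments": {"path": "/data/report.pdf"},  # illustrative argument
    },
}

# The matching response carries the tool output as content blocks.
tool_call_response = {
    "jsonrpc": "2.0",
    "id": 42,
    "result": {
        "content": [{"type": "text", "text": "Ingestion completed"}],
        "isError": False,
    },
}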
Why This Matters for AI System Builders
With MCP, you can:
connect multiple sources/tools to an LLM with a single interface,
compose workflows between different servers,
maintain a clear perimeter of permissions and responsibilities between host, client, and server.
KGrag MCP Server in Brief
KGrag MCP Server is designed to ingest, semantically enrich, and query data (structured and unstructured) using:
Neo4j for knowledge graphs,
AWS S3 for storage,
Qdrant for vector search,
LLM for analysis and query response,
all orchestrated via Docker Compose for scalability and ease of deployment.
Note: the core design does not rely on Redis.
Tools
1) Ingestion → Knowledge Graph
Tool ingestion(path: str): loads a file from the filesystem and populates nodes/relationships.
Resulting resource: the “normalized” document and the IDs of the created nodes (addressable as URIs).
Supporting prompt: parser_text_prompt, used to extract entities/relationships from the text before insertion.
This combination reflects the MCP model: tools that act, resources that describe the context, prompts that keep the LLM consistent and repeatable.
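As a concrete illustration, here is a minimal sketch of calling the ingestion tool from Python with the official mcp client SDK over SSE; the endpoint URL and the file path are assumptions, and the exact shape of the returned content depends on the server.
import asyncio

from mcp import ClientSession
from mcp.client.sse import sse_client

async def ingest_document() -> None:
    # Assumes the KGrag MCP server is reachable on the SSE endpoint below.
    async with sse_client("http://localhost:8000/sse") as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()  # handshake + capability negotiation
            # Invoke the ingestion tool with an illustrative local path.
            result = await session.call_tool(
                "ingestion", {"path": "/data/contracts/contract_2024.pdf"}
            )
            for block in result.content:
                print(block)

asyncio.run(ingest_document())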
2) Semantic Graph Query
Tool query(query: str): queries the graph (Neo4j + embeddings in Qdrant) and returns citable evidence.
Prompt agent_query_prompt: structures the response using nodes_str, edges_str, and user_query.
The MCP client discovers these tools/prompts and invokes them via standard JSON-RPC messages.
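Reusing a session opened as in the previous sketch, a query call and the retrieval of the supporting prompt could look like this; the question is illustrative and the prompt argument names follow the description above.
async def ask_graph(session) -> None:
    # session: an initialized mcp.ClientSession, as in the previous sketch.
    # Query the knowledge graph; the server combines Neo4j and Qdrant evidence.
    result = await session.call_tool(
        "query", {"query": "Which suppliers are linked to contract_2024?"}
    )
    for block in result.content:
        print(block)

    # Fetch the reusable prompt that structures the final answer.
    prompt = await session.get_prompt(
        "agent_query_prompt",
        {
            "nodes_str": "...",   # serialized nodes returned by the graph query
            "edges_str": "...",   # serialized relationships
            "user_query": "Which suppliers are linked to contract_2024?",
        },
    )
    print(prompt.messages)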
3) Standalone Extraction/Parsing
Tools extract_graph_data(raw_data: str) and parser(text: str): transform text/records into entities/relationships without persisting them.
The result can be exposed as a disposable resource for other workflow steps.
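Again assuming an initialized session, a sketch of this stateless path might be (the input strings are illustrative, and the exact output format is defined by the server):
async def parse_only(session) -> None:
    # Extract entities/relationships from raw text without writing to Neo4j.
    extracted = await session.call_tool(
        "extract_graph_data",
        {"raw_data": "ACME Corp signed contract_2024 with Globex on 2024-03-01."},
    )

    # Plain parsing of free text into structured records.
    parsed = await session.call_tool(
        "parser", {"text": "ACME Corp is a supplier based in Milan."}
    )

    # Both results stay in memory and can feed the next workflow step
    # (for example, a later ingestion call) without persisting anything.
    print(extracted.content, parsed.content)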
Starting the Server
- Create a .env file to pass directly to the docker run command with --env-file .env:
# ====================================================
# MCP Server Configuration - Environment
# ====================================================
LLM_MODEL_TYPE="openai" # options: openai | ollama | vllm
APP_ENV="production"
USER_AGENT="kgrag_agent"
# ------------------------- OpenAI -------------------------
OPENAI_API_KEY= # Enter your OpenAI key here
# ------------------------- AWS -------------------------
AWS_ACCESS_KEY_ID= # Enter your AWS Access Key here
AWS_SECRET_ACCESS_KEY= # Enter your AWS Secret Key here
AWS_REGION= # e.g., eu-central-1
AWS_BUCKET_NAME= # S3 bucket name
# ------------------------- General -------------------------
COLLECTION_NAME="kgrag_data"
# ====================================================
# Embeddings for Qdrant
# ====================================================
# Set VECTORDB_SENTENCE_TYPE to 'local' for a local model,
# or to 'hf' to download automatically from Hugging Face.
# Some available models:
# - BAAI/bge-base-en
# - BAAI/bge-small-en-v1.5
# - snowflake/snowflake-arctic-embed-s
# - jinaai/jina-embeddings-v2-base-en
# - nomic-ai/nomic-embed-text-v1.5
# - sentence-transformers/all-MiniLM-L6-v2
# - intfloat/multilingual-e5-large
# - jinaai/jina-embeddings-v3
# (choose one and enter it below)
VECTORDB_SENTENCE_TYPE="hf"
VECTORDB_SENTENCE_MODEL="BAAI/bge-small-en-v1.5"
# ------------------------- LLM & Embedding -------------------------
LLM_MODEL_NAME="gpt-4.1-mini"
MODEL_EMBEDDING="text-embedding-3-small"
# ------------------------- Neo4j -------------------------
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD= # Enter the Neo4j password
NEO4J_AUTH= # Format e.g., neo4j/<password>
# ------------------------- Redis -------------------------
REDIS_URL="redis://localhost:6379"
REDIS_HOST="localhost"
REDIS_PORT=6379
REDIS_DB=10
The table below maps all supported environment variables:

| Variable | Service | Description |
| --- | --- | --- |
| APP_ENV | Generic App | Execution environment (production, development, etc.) |
| USER_AGENT | Generic App | Client/agent identifier name (e.g., kgrag_agent) |
| LLM_MODEL_TYPE | LLM | Type of model used: openai, ollama, vllm |
| LLM_MODEL_NAME | LLM | Name of the LLM model to use (e.g., gpt-4.1-mini) |
| OPENAI_API_KEY | OpenAI | OpenAI API key (leave empty if not using OpenAI) |
| MODEL_EMBEDDING | LLM / Embedding | Text embedding model (e.g., text-embedding-3-small) |
| VECTORDB_SENTENCE_TYPE | Qdrant / VectorDB | Embedding type: hf (Hugging Face) or local (local model) |
| VECTORDB_SENTENCE_MODEL | Qdrant / VectorDB | Hugging Face embedding model name (e.g., BAAI/bge-small-en-v1.5) |
| AWS_ACCESS_KEY_ID | AWS S3 | AWS access key |
| AWS_SECRET_ACCESS_KEY | AWS S3 | AWS secret key |
| AWS_REGION | AWS S3 | AWS region (e.g., eu-central-1) |
| AWS_BUCKET_NAME | AWS S3 | S3 bucket name |
| COLLECTION_NAME | Qdrant | Name of the collection where embeddings are stored (e.g., kgrag_data) |
| NEO4J_USERNAME | Neo4j | Neo4j username (default: neo4j) |
| NEO4J_PASSWORD | Neo4j | Neo4j password |
| NEO4J_AUTH | Neo4j | Authentication string (e.g., neo4j/<password>) |
| REDIS_URL | Redis | Redis connection URL (e.g., redis://localhost:6379) |
| REDIS_HOST | Redis | Redis host (default: localhost) |
| REDIS_PORT | Redis | Redis port (default: 6379) |
| REDIS_DB | Redis | Redis DB number to use (default: 10) |
| LOKI_URL | Loki (logging) | Loki endpoint to send logs to (e.g., http://kgrag-loki:3100/loki/api/v1/push) |
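Before launching the container, it can help to verify that the .env file has no empty required values; below is a small sketch that uses only the variable names from the table (which ones are strictly required depends on the backends you enable).
from pathlib import Path

# Variables taken from the mapping table above; trim the list to match
# your setup (e.g., drop the AWS block if S3 is not used).
REQUIRED = [
    "LLM_MODEL_TYPE", "LLM_MODEL_NAME", "MODEL_EMBEDDING", "OPENAI_API_KEY",
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_REGION", "AWS_BUCKET_NAME",
    "COLLECTION_NAME", "NEO4J_USERNAME", "NEO4J_PASSWORD", "NEO4J_AUTH",
]

# Parse KEY=VALUE lines, ignoring comments and inline comments.
values = {}
for line in Path(".env").read_text().splitlines():
    line = line.strip()
    if not line or line.startswith("#") or "=" not in line:
        continue
    key, _, value = line.partition("=")
    values[key.strip()] = value.split("#", 1)[0].strip().strip('"')

missing = [name for name in REQUIRED if not values.get(name)]
if missing:
    raise SystemExit(f"Missing or empty variables in .env: {', '.join(missing)}")
print("The .env file looks complete.")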
- Start the container:
docker run -d \
--name kgrag_mcp_server \
--restart always \
--env-file .env \
-p 8000:8000 -p 6379:6379 -p 6333:6333 -p 6334:6334 -p 7474:7474 -p 7687:7687 \
-v qdrant_data:/qdrant/storage:z \
-v redis_data:/data \
-v neo4j_data:/var/lib/neo4j/data \
--network kgrag-network \
ghcr.io/gzileni/kgrag_mcp_server:main
Connecting an MCP Server to VSCode with GitHub Copilot
You can use GitHub Copilot in VSCode to interact with an MCP Server and manage document ingestion in agent mode.
Quick steps:
Open VSCode and ensure Copilot is active.
Create an mcp.json file in the project folder with the following configuration:
{
"servers": {
"kgrag-server": {
"url": "http://localhost:8000/sse",
"type": "sse"
}
},
"inputs": []
}
At this point, Copilot can automatically suggest ingestion code, as well as improvements like error handling or batch processing, based on the configuration.
Connecting an MCP Server to an Agent with LangGraph
Connecting an agent developed with LangGraph to an MCP Server means transforming a simple language model into a true operational assistant.
Thanks to the Model Context Protocol, the agent not only generates text but can use external tools, query knowledge graphs, load documents, and maintain persistent memory.
LangGraph provides step-by-step reasoning logic and memory management, while MCP exposes data and tools in a standard and modular way.
The result? A more powerful, transparent, and scalable agent, capable of linking artificial intelligence and business data in real workflows.
from langgraph.prebuilt import create_react_agent
from langgraph.store.memory import InMemoryStore
from langmem import create_manage_memory_tool, create_search_memory_tool
from langmem.short_term import SummarizationNode
from langchain_core.messages.utils import count_tokens_approximately
from langchain.chat_models import init_chat_model
from langchain_mcp_adapters.client import MultiServerMCPClient

model_name: str = "gpt-4.1-mini"
model_embedding: str = "openai:text-embedding-3-small"

# Chat model shared by the agent and the summarization node
model = init_chat_model(f"openai:{model_name}")

# Set up the store used for long-term memory (vector index on the embedding model)
store = InMemoryStore(
    index={
        "dims": 1536,
        "embed": model_embedding,
    }
)

# Summarize long conversation histories before they reach the model
summarize_node = SummarizationNode(
    token_counter=count_tokens_approximately,
    model=model,
    max_tokens=384,
    max_summary_tokens=128,
    output_messages_key="llm_input_messages",
)

# MCP client pointing at the KGrag server started above (SSE transport on port 8000)
mcp_client = MultiServerMCPClient(
    {
        "kgrag": {
            "url": "http://localhost:8000/sse",
            "transport": "sse",
        }
    }
)

async def build_agent():
    # Discover the KGrag tools exposed over MCP and add the memory tools
    tools = await mcp_client.get_tools()
    tools.extend([
        create_manage_memory_tool(namespace=("memories", "{user_id}")),
        create_search_memory_tool(namespace=("memories", "{user_id}")),
    ])

    # Create the ReAct agent with memory tools and the summarization hook
    return create_react_agent(
        model,
        tools=tools,
        store=store,
        prompt="You are a research assistant",
        pre_model_hook=summarize_node,
    )
async def run_agent(agent, prompt: str):
async for chunk in agent.astream(
{"messages": [("user", prompt)]}
):
yield chunk
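A minimal way to run it end to end (assuming the KGrag MCP server from the previous section is up and an OPENAI_API_KEY is set in the environment) could be:
import asyncio

async def main() -> None:
    agent = await build_agent()
    # Illustrative question; the agent decides when to call the KGrag tools.
    async for chunk in run_agent(agent, "What entities are linked to contract_2024?"):
        print(chunk)

if __name__ == "__main__":
    asyncio.run(main())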
In summary, KGrag MCP Server implements the Model Context Protocol to connect language models to data, tools, and prompts in a secure and standardized way.
With technologies like Neo4j, AWS S3, and Qdrant, it simplifies data management and the creation of Knowledge Graph-based workflows.
Thanks to Docker Compose, GitHub Copilot, and LangGraph, it becomes a scalable and transparent solution to bring AI into real workflows.