vault-mcp: A Scrappy, Self-Updating RAG Server for Your Markdown Hoard


TL;DR: I built a tiny server that watches my Obsidian vault, re-indexes only what changed, answers questions—and exposes itself as both a REST API and an MCP server so Claude can call it directly.
Grab the repo: https://github.com/robbiemu/vault-mcp
Note: unlike my usual posts, the code blocks here are often pseudocode, meant to give a representative picture of what's in the repo.
The Itch
I have a few hundred Markdown files. They started as “side-project notes” but slowly became the single source of truth I hope to draw on across many different projects. Not just repos but research, workflows, meeting summaries, game design docs, even hobbies.
If you want to give context to your local coding agent, searching that documentation with grep is like archaeology: slow, dusty, and you’re never sure you’ve found all the relevant context. Copy-pasting snippets into a chat is a total pain. There are MCP solutions for people like me who keep project notes in Obsidian, but they’re all-or-nothing, and I follow the established practice of keeping everything unsorted, in the same vault.
So I built vault-mcp: a self-updating RAG server that watches what I choose from my vault, re-indexes only what changed, answers natural-language questions, and exposes itself as both a REST API and an MCP server that Claude (or any agent) can call directly.
It even uses a basic Merkle tree approach to avoid reprocessing 1,000 files when I fix a typo in one.
Quick Spin-Up
git clone https://github.com/robbiemu/vault-mcp
cd vault-mcp
uv sync
# edit config/app.toml → vault_dir = "/path/to/your/notes"
vault-mcp # starts both servers
# or
vault-mcp --serve-api # launch the REST API only
Once it’s running:
REST API: http://localhost:8000/docs (Swagger UI for humans)
MCP Server: http://localhost:8081/mcp/info (machine-readable for agents)
Two Modes, One Toggle
Pick your poison in config/app.toml:
[retrieval]
mode = "static" # or "agentic"
| mode | what it does | speed | LLM calls |
| --- | --- | --- | --- |
| static | expand chunk to full section, no LLM | ⚡ < 150 ms | 0 |
| agentic | rewrite each chunk with context from siblings | 🐢 1–3 s | per chunk |
Let’s dig into what these actually mean.
Static Mode: Fast, Deterministic Context
This is the “I just want to see the whole section” mode. When you query your vault, vault-mcp retrieves the most relevant text chunk — say, a sentence buried in a long note. In static mode, it doesn’t touch an LLM. Instead, it expands that chunk to its parent section, using Markdown headers (#, ##) to find the boundaries.
So if your chunk is in a section titled “Multiplayer Rollback Strategy,” you get the entire section, not just the sentence. No magic, no latency, no cost. Just fast, reliable context.
It’s perfect for quick lookups, debugging, or when you want to preserve the original wording of your notes.
Note: indexing takes advantage of the Markdown structure, so you shouldn’t get chunks that straddle a section boundary, ending one section and beginning the next.
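The section expansion can be sketched in a few lines. This is my illustrative version, not the repo’s exact code: given a note’s full text and the character offset of the retrieved chunk, find the nearest enclosing heading and stop at the next heading of equal or shallower depth.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s", re.MULTILINE)

def expand_to_section(full_text: str, chunk_start: int) -> str:
    """Expand a chunk to its enclosing Markdown section (illustrative sketch)."""
    # Find the last heading at or before the chunk; remember its depth.
    start, level = 0, 7
    for m in HEADING.finditer(full_text):
        if m.start() > chunk_start:
            break
        start, level = m.start(), len(m.group(1))
    # The section ends at the next heading of equal or shallower depth.
    end = len(full_text)
    for m in HEADING.finditer(full_text, chunk_start):
        if len(m.group(1)) <= level:
            end = m.start()
            break
    return full_text[start:end].rstrip()
```

Deeper subheadings inside the section are kept, since they belong to the same parent.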
Agentic Mode: AI-Powered Chunk Rewriting
This is where things get interesting.
In agentic mode, vault-mcp doesn’t just return raw chunks. It uses an LLM to rewrite each one, giving it context from the other retrieved chunks.
Think of it as an analyst who gets several mini-tasks: each one has a document snippet and a briefing on what the others found. They rewrite their snippet to be more self-contained and informative.
Here’s how it works:
1. You ask: “How did we decide to do multiplayer rollback?”
2. The system retrieves the top n most relevant chunks.
3. For each chunk, it runs a prompt like:
User Query: How did we decide to do multiplayer rollback?
Seed Chunk (from 'Game Design Notes.md'):
We decided to use rollback because it's simpler than state synchronization.
Context from other chunks:
- “State sync would require a full game state snapshot every 50ms, which is bandwidth-heavy.”
- “Rollback was chosen after the 2023-04-12 engine team meeting, where latency was prioritized over complexity.”
Your Task: Rewrite the seed chunk to be more comprehensive, using the context above.
The LLM returns a richer version:
We chose rollback over state synchronization because it’s simpler and more bandwidth-efficient. This decision was confirmed in the April 2023 engine team meeting, where low-latency gameplay was prioritized despite the added complexity of input prediction.
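The prompt assembly itself is just string templating. Here is a minimal sketch; `build_rewrite_prompt` is my name for it, and the repo’s actual template may differ:

```python
def build_rewrite_prompt(query: str, seed: str, siblings: list[str]) -> str:
    """Assemble the per-chunk rewrite prompt (illustrative, not the repo's template)."""
    context = "\n".join(f"- {s}" for s in siblings)
    return (
        f"User Query: {query}\n\n"
        f"Seed Chunk:\n{seed}\n\n"
        f"Context from other chunks:\n{context}\n\n"
        "Your Task: Rewrite the seed chunk to be more comprehensive, "
        "using the context above."
    )
```

Each retrieved chunk gets its own prompt, so the LLM calls can run in parallel.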
This is the first step toward true RAG: retrieval + refinement + generation.
Live Sync: Only the Diff
Re-indexing 1,000 files because I renamed a heading is dumb. So instead of scanning everything, vault-mcp uses a file-level Merkle tree to detect changes with cryptographic precision.
Here’s how it works:
1. On startup, hash every .md file and build a Merkle tree from the hashes.
2. Store the root hash and a manifest ({file_path: content_hash}).
3. When a file changes (via watchdog), rebuild the tree.
4. Compare the new root hash with the old one.
5. If they differ, diff the manifests to find exactly which files were added, updated, or removed.
6. Re-index only those files.
# shared/state_tracker.py
class StateTracker:
    def compare_states(self, old_manifest: dict, new_manifest: dict):
        old, new = set(old_manifest.keys()), set(new_manifest.keys())
        return {
            "added": list(new - old),
            "removed": list(old - new),
            "updated": [f for f in old & new if old_manifest[f] != new_manifest[f]],
        }
This makes live sync fast and reliable—even for large vaults.
A Sketch for the Future: Chunk-Level Diffing
The Merkle tree tracks file changes. The next step? Map file diffs to changed chunks and re-index only those.
Imagine:
1. You edit a paragraph.
2. difflib computes the character-level diff.
3. The system maps that diff to the affected chunk(s) using start_char_idx and end_char_idx.
4. Only those chunks are re-embedded.
This would be a massive efficiency win. The foundation is already there.
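To make the sketch concrete, here is one way the difflib step could work, assuming chunks carry start_char_idx/end_char_idx offsets into the file (the names come from the post; the overlap logic is my own illustration):

```python
import difflib

def changed_char_ranges(old: str, new: str) -> list[tuple[int, int]]:
    """Return (start, end) character ranges in `new` that differ from `old`."""
    sm = difflib.SequenceMatcher(a=old, b=new)
    return [(b1, b2) for op, a1, a2, b1, b2 in sm.get_opcodes() if op != "equal"]

def affected_chunks(chunks: list[dict], ranges: list[tuple[int, int]]) -> list[dict]:
    """Select chunks whose [start_char_idx, end_char_idx) overlaps a changed range.

    Note: pure deletions produce zero-width ranges, so a deletion exactly on a
    chunk boundary would need extra handling in a real implementation.
    """
    return [
        c for c in chunks
        if any(c["start_char_idx"] < end and start < c["end_char_idx"]
               for start, end in ranges)
    ]
```

Everything returned by `affected_chunks` goes back through embedding; everything else keeps its existing vectors.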
Dual-Server Architecture: You vs. Your AI
One of the most important design choices in vault-mcp is the separation between you and your AI.
So it runs two servers:
| port | audience | purpose |
| --- | --- | --- |
| 8000 | humans / scripts | REST API with Swagger UI for testing and integration |
| 8081 | AI agents (e.g. Claude) | MCP-compliant interface for automated tool use |
You can run either, both, or neither.
Live updates
Want to bulk-edit your vault without triggering a re-index storm?
# config/app.toml
[watcher]
enabled = false
Then re-index manually:
curl -X POST http://localhost:8000/reindex
Or, if you’re only using it with something like the Gemini CLI or Crush and want the live updates, skip the config change and just run the MCP server:
vault-mcp --serve-mcp
This flexibility means you can tailor the server to your workflow — not the other way around.
The Pipeline: From Markdown to Meaning
1. Load Only What You Need
Before loading any files, vault-mcp applies your allowed_prefixes filter (in the app.toml configuration file). No need to load Personal Diary.md when you only care about Work - ....
# components/document_processing/document_loader.py
def load_documents(config, files_to_process=None):
    if files_to_process:
        return SimpleDirectoryReader(input_files=files_to_process).load_data()
    # Apply prefix filter *before* I/O
    vault = Path(config.paths.vault_dir)
    md_files = [
        str(p) for p in vault.rglob("*.md")
        if config.should_include_file(p.name)
    ]
    # In reality, this uses a custom ObsidianReader or SimpleDirectoryReader
    return SimpleDirectoryReader(input_files=md_files).load_data()
This filter-then-load approach saves memory and startup time.
2. Two-Stage Chunking
First, split by Markdown sections (headings). Then, split those into smaller, embeddable chunks.
# components/vault_service/main.py
initial_nodes = MarkdownNodeParser().get_nodes_from_documents(docs)
splitter = SentenceSplitter(
    chunk_size=config.indexing.chunk_size,
    chunk_overlap=config.indexing.chunk_overlap,
)
final_nodes = splitter.get_nodes_from_documents([
    Document(text=n.get_content(), metadata=n.metadata) for n in initial_nodes
])
This preserves semantic boundaries while creating fine-grained search units.
3. Quality Gate
Not all chunks are worth embedding. Short, low-info snippets get filtered out.
# components/document_processing/quality_scorer.py
def score(self, text: str) -> float:
    # Simplified for clarity; actual implementation uses more heuristics
    if len(text) < 50:
        return 0.0
    return (
        0.4 * length_score(text) +
        0.3 * vocab_richness(text) +
        0.3 * info_density(text)
    )
Configurable via quality_threshold — turn it off if you want everything indexed.
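The helper heuristics (length_score, vocab_richness, info_density) aren’t shown in the snippet above. Here are plausible stand-ins I wrote for illustration; the repo’s actual heuristics may differ:

```python
def length_score(text: str, ideal: int = 512) -> float:
    """Longer chunks score higher, saturating at `ideal` characters."""
    return min(len(text) / ideal, 1.0)

def vocab_richness(text: str) -> float:
    """Type/token ratio: unique words over total words."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0

def info_density(text: str) -> float:
    """Fraction of words that are not common stopwords (toy stopword list)."""
    stopwords = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it"}
    words = text.lower().split()
    return sum(w not in stopwords for w in words) / len(words) if words else 0.0

def score(text: str) -> float:
    """Weighted combination, mirroring the snippet above (illustrative)."""
    if len(text) < 50:
        return 0.0
    return (0.4 * length_score(text)
            + 0.3 * vocab_richness(text)
            + 0.3 * info_density(text))
```

Since each component is bounded by 1.0, the final score stays in [0, 1], which makes quality_threshold easy to reason about.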
4. Store & Search
Chunks go into ChromaDB with metadata (file path, score). Semantic search is powered by sentence-transformers.
# components/vector_store/vector_store.py
embeddings = self.embedding_model.encode([chunk["text"] for chunk in chunks])
self.collection.add(
    embeddings=embeddings,
    documents=[c["text"] for c in chunks],
    metadatas=[{"file_path": c["file_path"], "score": c["score"]} for c in chunks],
    ids=[c["chunk_id"] for c in chunks],
)
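Under the hood, semantic search is nearest-neighbor lookup over those stored embeddings, typically by cosine similarity, which ChromaDB handles for you. A dependency-free sketch of the idea (illustrative, not the repo’s code):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k(query_vec: list[float], stored: dict[str, list[float]], k: int = 3) -> list[str]:
    """Rank stored chunk embeddings by similarity to the query; return top-k ids."""
    ranked = sorted(stored.items(), key=lambda kv: cosine(query_vec, kv[1]), reverse=True)
    return [chunk_id for chunk_id, _ in ranked[:k]]
```

In practice the query text is encoded with the same sentence-transformers model used at index time, so query and chunk vectors live in the same space.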
Plugging into Claude Desktop
Add this to your claude_desktop_config.json:
{
  "mcpServers": {
    "vault": {
      "command": "vault-mcp",
      "args": ["--serve-mcp"]
    }
  }
}
Note: I actually don’t use that one, but the configuration is similar for agent CLIs and IDEs.
Now you can ask:
Claude, how did we decide to do multiplayer rollback?
And it’ll pull from your vault, rewrite the chunks for context, and give your agent coherent answers.
“Real” Numbers
| vault size | cold start | live re-index (1 file) | search (static) |
| --- | --- | --- | --- |
| 200 files / ~50k tokens | ~3 s | < 1 s | ~150 ms |
Agentic mode is slower due to multiple LLM calls, but the quality jump is often worth it.
What’s Next
- Final answer synthesis: The rewritten chunks are ready—now generate a single, human-readable response.
- Web UI: For non-coders who just want to chat with their notes.
- Chunk-level diffing: The Merkle tree tracks file changes. But like git, we could do better if we map file diffs to changed chunks and re-index only those chunks. (The foundation is already there.)
There are other, more speculative and research-oriented ideas in the issues section of the GitHub repo (#9 and #11).
Final Thought
vault-mcp isn’t just a tool. It’s a prototype for a new kind of personal AI: one that’s private, up-to-date, and context-aware. It’s scrappy, yes—but it works.
And it’s built on open standards (MCP), so it’s not locked into one vendor or app.
So go ahead: clone it, break it, make it yours.
Grab the repo: https://github.com/robbiemu/vault-mcp
PRs and issues welcome. Let’s build this together.