vault-mcp: A Scrappy, Self-Updating RAG Server for Your Markdown Hoard

Robert Collins
8 min read

TL;DR: I built a tiny server that watches my Obsidian vault, re-indexes only what changed, answers questions—and exposes itself as both a REST API and an MCP server so Claude can call it directly.

Grab the repo: https://github.com/robbiemu/vault-mcp


note: Unlike my usual posts, the code blocks here are often pseudocode, meant to give a representative picture of what's in the repo.

The Itch

I have a few hundred Markdown files. They started as “side-project notes” but slowly became the single source of truth I hope to draw on across many different projects. Not just repos: research, workflows, meeting summaries, game design docs, even hobbies.

If you want to give context to your local coding agent, searching that documentation with grep is like archaeology: slow, dusty, and you’re never sure you’ve found all the relevant context. Copy-pasting snippets into the agent is a total pain. There are MCP solutions for people like me who keep project notes in Obsidian, but they’re all-or-nothing, and I follow the established practice of keeping everything unsorted in the same vault.

So I built vault-mcp: a self-updating RAG server that watches what I choose from my vault, re-indexes only what changed, answers natural-language questions, and exposes itself as both a REST API and an MCP server that Claude (or any agent) can call directly.

It even uses a basic Merkle tree approach to avoid reprocessing 1,000 files when I fix a typo in one.


Quick Spin-Up

git clone https://github.com/robbiemu/vault-mcp
cd vault-mcp
uv sync
# edit config/app.toml → vault_dir = "/path/to/your/notes"
vault-mcp            # starts both servers
# or
vault-mcp --serve-api   # launch the REST api only

Once it’s running:

  • REST API: http://localhost:8000/docs (Swagger UI for humans)

  • MCP Server: http://localhost:8081/mcp/info (machine-readable for agents)


Two Modes, One Toggle

Pick your poison in config/app.toml:

[retrieval]
mode = "static"   # or "agentic"
| mode    | what it does                                  | speed      | LLM calls |
| ------- | --------------------------------------------- | ---------- | --------- |
| static  | expand chunk to full section, no LLM          | ⚡ < 150 ms | 0         |
| agentic | rewrite each chunk with context from siblings | 🐢 1–3 s    | per chunk |

Let’s dig into what these actually mean.


Static Mode: Fast, Deterministic Context

This is the “I just want to see the whole section” mode. When you query your vault, vault-mcp retrieves the most relevant text chunk — say, a sentence buried in a long note. In static mode, it doesn’t touch an LLM. Instead, it expands that chunk to its parent section, using Markdown headers (#, ##) to find the boundaries.

So if your chunk is in a section titled “Multiplayer Rollback Strategy,” you get the entire section, not just the sentence. No magic, no latency, no cost. Just fast, reliable context.
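The header-based expansion can be sketched in a few lines. This is my own illustration of the idea, not the repo's actual code: find the nearest heading at or before the chunk, then cut at the next heading.

```python
import re


def expand_to_section(note_text: str, chunk_start: int) -> str:
    """Expand a chunk to its enclosing Markdown section.

    Finds the nearest heading line at or before `chunk_start`, then
    returns everything up to the next heading (a simplification: real
    section nesting would also compare heading levels).
    """
    heading_re = re.compile(r"^#{1,6}\s", re.MULTILINE)
    starts = [m.start() for m in heading_re.finditer(note_text)]

    # Nearest heading at or before the chunk's start offset.
    section_start = 0
    for pos in starts:
        if pos <= chunk_start:
            section_start = pos
        else:
            break

    # The section ends where the next heading begins.
    section_end = next((p for p in starts if p > section_start), len(note_text))
    return note_text[section_start:section_end].rstrip()
```

No embeddings, no LLM: just string offsets and a regex, which is why static mode stays deterministic and fast.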

It’s perfect for quick lookups, debugging, or when you want to preserve the original wording of your notes.

note: Indexing takes advantage of the Markdown structure, so you shouldn’t get chunks that straddle a boundary, ending one section and starting the next.


Agentic Mode: AI-Powered Chunk Rewriting

This is where things get interesting.

In agentic mode, vault-mcp doesn’t just return raw chunks. It uses an LLM to rewrite each one, giving it context from the other retrieved chunks.

Think of it as an analyst who gets several mini-tasks: each one has a document snippet and a briefing on what the others found. They rewrite their snippet to be more self-contained and informative.

Here’s how it works:

  1. You ask: “How did we decide to do multiplayer rollback?”

  2. The system retrieves the top n most relevant chunks.

  3. For each chunk, it runs a prompt like:

User Query: How did we decide to do multiplayer rollback?
Seed Chunk (from 'Game Design Notes.md'):
We decided to use rollback because it's simpler than state synchronization.
Context from other chunks:
- “State sync would require a full game state snapshot every 50ms, which is bandwidth-heavy.”
- “Rollback was chosen after the 2023-04-12 engine team meeting, where latency was prioritized over complexity.”

Your Task: Rewrite the seed chunk to be more comprehensive, using the context above.
  4. The LLM returns a richer version:

We chose rollback over state synchronization because it’s simpler and more bandwidth-efficient. This decision was confirmed in the April 2023 engine team meeting, where low-latency gameplay was prioritized despite the added complexity of input prediction.

This is the first step toward true RAG: retrieval + refinement + generation.
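The per-chunk rewrite step boils down to a prompt builder like the one below. The function name and the dict shape (`text`, `file_path`) are my own illustration of the prompt shown above, not the repo's exact API:

```python
def build_rewrite_prompt(query: str, seed: dict, siblings: list[dict]) -> str:
    """Assemble the agentic-mode rewrite prompt for one retrieved chunk.

    `seed` is the chunk being rewritten; `siblings` are the other
    retrieved chunks that supply cross-document context.
    """
    context = "\n".join(f'- "{s["text"]}"' for s in siblings)
    return (
        f"User Query: {query}\n"
        f"Seed Chunk (from '{seed['file_path']}'):\n"
        f"{seed['text']}\n"
        f"Context from other chunks:\n{context}\n\n"
        "Your Task: Rewrite the seed chunk to be more comprehensive, "
        "using the context above."
    )
```

One prompt per retrieved chunk is what makes agentic mode cost "per chunk" in LLM calls.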


Live Sync: Only the Diff

Re-indexing 1,000 files because I renamed a heading is dumb. So instead of scanning everything, vault-mcp uses a file-level Merkle tree to detect changes with cryptographic precision.

Here’s how it works:

  1. On startup, hash every .md file and build a Merkle tree from the hashes.

  2. Store the root hash and a manifest ({file_path: content_hash}).

  3. When a file changes (via watchdog), rebuild the tree.

  4. Compare the new root hash with the old one.

  5. If they differ, diff the manifests to find exactly which files were added, updated, or removed.

  6. Re-index only those files.

# shared/state_tracker.py
class StateTracker:
    def compare_states(self, old_manifest: dict, new_manifest: dict):
        old, new = set(old_manifest.keys()), set(new_manifest.keys())
        return {
            "added": list(new - old),
            "removed": list(old - new),
            "updated": [f for f in old & new if old_manifest[f] != new_manifest[f]]
        }

This makes live sync fast and reliable—even for large vaults.
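Steps 1–2 above (hash every file, fold the hashes into a root) can be sketched as follows. The SHA-256 leaves and pairwise folding are my assumptions; the repo's hashing scheme may differ:

```python
import hashlib
from pathlib import Path


def build_manifest(vault_dir: str) -> dict[str, str]:
    """Hash every .md file: {relative_path: sha256(content)}."""
    vault = Path(vault_dir)
    return {
        str(p.relative_to(vault)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(vault.rglob("*.md"))
    }


def merkle_root(manifest: dict[str, str]) -> str:
    """Fold the sorted leaf hashes pairwise up to a single root hash."""
    level = [h for _, h in sorted(manifest.items())]
    if not level:
        return hashlib.sha256(b"").hexdigest()
    while len(level) > 1:
        if len(level) % 2:  # duplicate the last hash on odd-sized levels
            level.append(level[-1])
        level = [
            hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
            for i in range(0, len(level), 2)
        ]
    return level[0]
```

Comparing two root hashes is O(1); only when they differ do you pay for the manifest diff in `compare_states`.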

A Sketch for the Future: Chunk-Level Diffing

The Merkle tree tracks file changes. The next step? Map file diffs to changed chunks and re-index only those.

Imagine:

  1. You edit a paragraph.

  2. difflib computes the character-level diff.

  3. The system maps that diff to the affected chunk(s) using start_char_idx and end_char_idx.

  4. Only those chunks are re-embedded.

This would be a massive efficiency win. The foundation is already there.
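Steps 2–3 of that sketch could look like this with `difflib`. The `start_char_idx` / `end_char_idx` keys follow the names used above; the function itself is a hypothetical illustration, not shipped code:

```python
import difflib


def changed_chunks(old_text: str, new_text: str, chunks: list[dict]) -> list[dict]:
    """Return the chunks whose character spans overlap an edited region.

    Each chunk dict carries "start_char_idx" / "end_char_idx" offsets
    into `old_text`.
    """
    sm = difflib.SequenceMatcher(None, old_text, new_text)
    # Collect the old-text spans touched by any non-equal opcode;
    # pure insertions (i1 == i2) are widened to a one-char point edit.
    dirty = [
        (i1, i2 if i2 > i1 else i1 + 1)
        for tag, i1, i2, _j1, _j2 in sm.get_opcodes()
        if tag != "equal"
    ]
    return [
        c for c in chunks
        if any(
            c["start_char_idx"] < d_end and d_start < c["end_char_idx"]
            for d_start, d_end in dirty
        )
    ]
```

Only the returned chunks would need re-embedding; everything else keeps its existing vectors.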


Dual-Server Architecture: You vs. Your AI

One of the most important design choices in vault-mcp is the separation between you and your AI.

So it runs two servers:

| port | audience                | purpose                                              |
| ---- | ----------------------- | ---------------------------------------------------- |
| 8000 | humans / scripts        | REST API with Swagger UI for testing and integration |
| 8081 | AI agents (e.g. Claude) | MCP-compliant interface for automated tool use       |

You can run either, both, or neither.

Live updates

Want to bulk-edit your vault without triggering a re-index storm?

# config/app.toml
[watcher]
enabled = false

Then re-index manually:

curl -X POST http://localhost:8000/reindex

Or, if you’re only using it with something like the Gemini CLI or Crush and want live updates, skip the config change and just run the MCP server:

vault-mcp --serve-mcp

This flexibility means you can tailor the server to your workflow — not the other way around.


The Pipeline: From Markdown to Meaning

1. Load Only What You Need

Before loading any files, vault-mcp applies your allowed_prefixes filter (in the app.toml configuration file). No need to load Personal Diary.md when you only care about Work - ....

# components/document_processing/document_loader.py
def load_documents(config, files_to_process=None):
    if files_to_process:
        return SimpleDirectoryReader(input_files=files_to_process).load_data()

    # Apply prefix filter *before* I/O
    vault = Path(config.paths.vault_dir)
    md_files = [
        str(p) for p in vault.rglob("*.md")
        if config.should_include_file(p.name)
    ]
    # In reality, this uses a custom ObsidianReader or SimpleDirectoryReader
    return SimpleDirectoryReader(input_files=md_files).load_data()

This filter-then-load approach saves memory and startup time.
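For reference, the config side of that filter might look like this. The `allowed_prefixes` key is the one named above, but the section layout here is illustrative; check `config/app.toml` in the repo for the exact schema.

```toml
# config/app.toml — illustrative layout, not the repo's exact schema
[paths]
vault_dir = "/path/to/your/notes"

# Only files whose names start with one of these prefixes get indexed.
allowed_prefixes = ["Work - "]
```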


2. Two-Stage Chunking

First, split by Markdown sections (headings). Then, split those into smaller, embeddable chunks.

# components/vault_service/main.py
initial_nodes = MarkdownNodeParser().get_nodes_from_documents(docs)
splitter = SentenceSplitter(
    chunk_size=config.indexing.chunk_size,
    chunk_overlap=config.indexing.chunk_overlap
)
final_nodes = splitter.get_nodes_from_documents([
    Document(text=n.get_content(), metadata=n.metadata) for n in initial_nodes
])

This preserves semantic boundaries while creating fine-grained search units.


3. Quality Gate

Not all chunks are worth embedding. Short, low-info snippets get filtered out.

# components/document_processing/quality_scorer.py
def score(self, text: str) -> float:
    # Simplified for clarity; actual implementation uses more heuristics
    if len(text) < 50: return 0.0
    return (
        0.4 * length_score(text) +
        0.3 * vocab_richness(text) +
        0.3 * info_density(text)
    )

Configurable via quality_threshold—turn it off if you want everything indexed.
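The three helpers in that snippet (`length_score`, `vocab_richness`, `info_density`) aren't shown in the repo excerpt; here is one plausible set of heuristics behind names like those, purely my own sketch:

```python
def length_score(text: str, target: int = 400) -> float:
    """Longer chunks score higher, saturating at the target length."""
    return min(len(text) / target, 1.0)


def vocab_richness(text: str) -> float:
    """Type-token ratio: unique words over total words."""
    words = text.lower().split()
    return len(set(words)) / len(words) if words else 0.0


def info_density(text: str) -> float:
    """Crude proxy for information content: share of words longer
    than three characters (filters out glue words like 'the', 'a')."""
    words = text.split()
    return sum(1 for w in words if len(w) > 3) / len(words) if words else 0.0


def score(text: str) -> float:
    """Weighted blend, gated by a hard minimum length."""
    if len(text) < 50:
        return 0.0
    return (
        0.4 * length_score(text)
        + 0.3 * vocab_richness(text)
        + 0.3 * info_density(text)
    )
```

Each component is bounded in [0, 1], so the blended score is too, which makes a single `quality_threshold` cutoff easy to reason about.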


4. Embed and Store

Chunks go into ChromaDB with metadata (file path, score). Semantic search is powered by sentence-transformers.

# components/vector_store/vector_store.py
embeddings = self.embedding_model.encode([chunk["text"] for chunk in chunks])
self.collection.add(
    embeddings=embeddings,
    documents=[c["text"] for c in chunks],
    metadatas=[{"file_path": c["file_path"], "score": c["score"]} for c in chunks],
    ids=[c["chunk_id"] for c in chunks]
)
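The query side is the mirror image: embed the question and ask Chroma for the nearest neighbors. The function shape below is my own sketch (the repo wraps this in a class), though `collection.query` with `query_embeddings` / `n_results` / `include` is Chroma's real API:

```python
def search(embedding_model, collection, query: str, top_k: int = 5) -> list[dict]:
    """Embed the query and pull the nearest chunks from the collection."""
    query_emb = embedding_model.encode([query])
    results = collection.query(
        query_embeddings=query_emb,
        n_results=top_k,
        include=["documents", "metadatas", "distances"],
    )
    # Chroma returns parallel lists, one inner list per query.
    return [
        {"text": doc, **meta, "distance": dist}
        for doc, meta, dist in zip(
            results["documents"][0],
            results["metadatas"][0],
            results["distances"][0],
        )
    ]
```

In static mode these hits go straight to section expansion; in agentic mode they become the seed and sibling chunks for the rewrite prompts.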

Plugging into Claude Desktop

Add this to your claude_desktop_config.json:

{
  "mcpServers": {
    "vault": {
      "command": "vault-mcp",
      "args": ["--serve-mcp"]
    }
  }
}

note: I don’t actually use that one, but the configuration is similar for agent CLIs and IDEs.

Now you can ask:

Claude, how did we decide to do multiplayer rollback?

And it’ll pull from your vault, rewrite the chunks for context, and give your agent coherent answers.


“Real” Numbers

| vault size              | cold start | live re-index (1 file) | search (static) |
| ----------------------- | ---------- | ---------------------- | --------------- |
| 200 files / ~50k tokens | ~3 s       | < 1 s                  | ~150 ms         |

Agentic mode is slower due to multiple LLM calls, but the quality jump is often worth it.


What’s Next

  • Final answer synthesis: The rewritten chunks are ready—now generate a single, human-readable response.

  • Web UI: For non-coders who just want to chat with their notes.

  • Chunk-level diffing: The Merkle tree tracks file changes. But like git, we could do better if we map file diffs to changed chunks and re-index only those chunks. (The foundation is already there.)

    There are other, more speculative and research-oriented ideas in the issues section of the GitHub repo (#9 and #11).


Final Thought

vault-mcp isn’t just a tool. It’s a prototype for a new kind of personal AI: one that’s private, up-to-date, and context-aware. It’s scrappy, yes—but it works.

And it’s built on open standards (MCP), so it’s not locked into one vendor or app.

So go ahead: clone it, break it, make it yours.

Grab the repo:
https://github.com/robbiemu/vault-mcp

PRs and issues welcome. Let’s build this together.
