Enhance Repository Analysis with MindsDB MCP and Docs-KB MCP


Ever felt tired of staring at a massive codebase, trying to figure out how things work or hunting down a bug? With MindsDB, your AI assistant can understand any codebase for you without the code ever being on your machine. Let's see how.
Introduction
MindsDB is an AI federated query engine that enables humans, AI agents, and applications to get highly accurate answers across sprawled and large-scale data sources. It follows the "Connect, Unify, Respond" philosophy - connecting to hundreds of data sources, unifying them through SQL interfaces, and responding to queries using AI models as virtual tables.
Docs-KB is a CLI tool that lets you create a MindsDB Knowledge Base on top of open-source documentation, specifically the markdown files used for writing documentation (e.g., Mintlify, Docusaurus).
Overview of MindsDB Knowledge Base
Knowledge Bases are one of MindsDB's core unification interfaces that index and organize unstructured data for efficient retrieval. They transform documents, web content, and text data into searchable repositories that can be queried using natural language. Knowledge bases work alongside other MindsDB interfaces (Views, ML Models, Jobs) to create a unified data ecosystem where you can ask questions across both structured and unstructured data sources using simple SQL syntax.
Bottom Line: MindsDB turns scattered data into an intelligent, queryable system where knowledge bases handle unstructured content while the platform federates everything through SQL. And you can do much more with it than you can imagine.
Overview of MindsDB MCP
MindsDB is also an MCP server that enables intelligent applications to query and reason over federated data from databases, data warehouses, and applications.
It supports 200+ AI and data integrations, and via MCP your agent can query any data handler. You can see all supported integrations here.
For this demonstration, we are only going to use the GitHub handler, which is a community-contributed integration.
Overview of Docs-KB and its MCP tools
Docs-KB is a CLI tool that uses MindsDB under the hood to transform GitHub repositories into intelligent, searchable knowledge bases. It saves you from writing queries to create a knowledge base and from integrating embedding models, AI models, and the GitHub handler yourself.
Key features:
Simple Ingestion: docs-kb ingest mindsdb/mindsdb - instantly create searchable docs
Natural Language Queries: Ask questions like "How do I configure authentication?"
Privacy-First: Uses local Ollama models (nomic-embed-text + gemma2) - no data leaves your machine
MCP Server: Integrates with Claude Desktop, Cursor, or VS Code via Model Context Protocol
Tools accessible via MCP
how_to_use_docs_kb_mcp: Get usage guide for the MCP server
list_available_repositories: List all ingested repositories
query_repository_docs: Search documentation with natural language (uses MindsDB KB)
get_single_file: Retrieve content of a specific file from GitHub
load_multiple_files: Load multiple files from GitHub concurrently
With Docs-KB you can also pass an option to the ingest command to create a GitHub handler for the repository in MindsDB.
Analyzing MindsDB Repository Issues with MCP Tools
In the video below, I used both MCP tools to:
Get the top 5 issues from the MindsDB GitHub repository
Analyze an issue by navigating the repository docs and files via MCP tools
Please watch the video to the end; you will love it.
Generated Reports:
You can access the generated report here:
From Video
Using just a few tools (although it called them many times), the GitHub Copilot Agent was able to identify the root cause of the bug, propose a solution, and explain why the solution might work. It also provided recommendations for testing and validation.
Important points to consider
There are a few things to consider before jumping to conclusions:
The GitHub Copilot Agent is using Claude Sonnet 4, which is one of the best LLMs but costly. You might need to use a comparable model to get the same results.
The agent always fetches the latest code from the repository, which can lead to inaccurate analysis.
For example, suppose MindsDB is using LiteLLM v1.63 (which has a bug), but the agent pulls the code of the latest version, where the bug is fixed. It may then fail to find the source of the issue, since it doesn't have access to the older version (1.63) of the code.
Doing this all manually
The analysis above, done with an AI agent like GitHub Copilot, can also be carried out manually with the following steps:
Get the issues by navigating the repositories on GitHub.
Clone the repository.
Open the repository in VS Code.
Give the issue to the agent and let it search for a solution by navigating the repository.
There is nothing wrong with this approach, but we have to repeat these steps for each repository. And if the issues span multiple repositories or packages, it becomes difficult for the agent to perform the analysis.
Is it possible to build this complete solution natively in MindsDB?
Yes, it is possible to build the complete solution using just MindsDB data handlers, AI handlers, and an agent. Alternatively, we can achieve the same result with an external agent using MindsDB MCP. It just needs a few changes.
Challenges Faced While Building the Solution with MindsDB
Loading data with the built-in GitHub handler was the only issue:
It's slow.
It gives an unsupported encoding error when trying to fetch a large number of files.
It limits results to 10 files when no limit is provided.
It only returns path, name, and content. The SHA of a file could be used to check whether files have changed and to automate ingestion of only the changed files, keeping the knowledge base up to date with the latest data.
Is there a solution?
The approach below can improve the performance of the GitHub file handler:
Use the GitHub Tree API endpoint to get the repository structure in a single API call, then apply file or path filters to the results and make batch requests to get the content for those filtered paths.
We can create another table that stores the Tree API results, containing the SHA, path, and other important metadata. Users can experiment with different path filters to identify the files they want, then use those same filters to fetch the actual file content. This way we do not need to change the old file table.
With the SHA we can detect whether a file's content has changed, which helps automate ingestion of only the changed content.
We can also add these tables and the knowledge base to an agent, creating natively the same solution we built using MindsDB MCP, Docs-KB MCP, and the GitHub Copilot Agent.
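To make the Tree API idea more concrete, here is a minimal Python sketch, not the actual handler implementation. The function names (fetch_tree, filter_blobs, changed_paths), the glob-style path filter, and the sample data are all hypothetical and are not part of MindsDB or Docs-KB. The only real API assumed is GitHub's Git Trees endpoint with the recursive parameter, whose response contains a "tree" list of entries with "path", "type", and "sha" fields.

```python
import fnmatch
import json
import urllib.request

# GitHub Git Trees endpoint; {ref} can be a branch name, tag, or tree SHA.
TREE_URL = "https://api.github.com/repos/{owner}/{repo}/git/trees/{ref}?recursive=1"


def fetch_tree(owner, repo, ref="main"):
    """Fetch the full repository tree in one API call (unauthenticated, rate limited).

    Very large repositories may come back truncated; that case is not handled here.
    """
    url = TREE_URL.format(owner=owner, repo=repo, ref=ref)
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)["tree"]


def filter_blobs(tree, pattern):
    """Keep only file entries ("blob") whose path matches a glob pattern.

    Note: in fnmatch, "*" also matches "/", so "docs/*.md" covers nested folders.
    """
    return [
        entry for entry in tree
        if entry["type"] == "blob" and fnmatch.fnmatchcase(entry["path"], pattern)
    ]


def changed_paths(stored_shas, entries):
    """Return paths that are new or whose SHA differs from the stored one."""
    return [e["path"] for e in entries if stored_shas.get(e["path"]) != e["sha"]]


# Made-up sample data shaped like the "tree" list in the API response.
sample_tree = [
    {"path": "docs", "type": "tree", "sha": "c3"},
    {"path": "docs/setup.md", "type": "blob", "sha": "a1"},
    {"path": "docs/sql/kb.md", "type": "blob", "sha": "b2"},
    {"path": "mindsdb/api.py", "type": "blob", "sha": "d4"},
]

# SHAs recorded during a previous (hypothetical) ingestion run.
stored = {"docs/setup.md": "a1", "docs/sql/kb.md": "old"}

docs = filter_blobs(sample_tree, "docs/*.md")
print(changed_paths(stored, docs))  # ['docs/sql/kb.md']
```

Only the paths returned by changed_paths would need their content fetched and re-ingested; everything else in the knowledge base can stay as it is. Calling fetch_tree("mindsdb", "mindsdb") would replace the sample data with the real tree, assuming the default branch is named main.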
What else to try with this combination?
This combination opens the door to several powerful solutions using MindsDB, enabling tasks such as:
Reviewing pull requests with better contextual understanding.
Accelerating root cause analysis of bugs within and across repositories.
Answering user queries directly from documentation using natural language understanding.
Reducing the time users spend searching for information within and across multiple documentation sources by providing precise, AI-driven answers.
Wrapping Up
Credits
This project was developed as a submission to Quira.sh Quest 19.
This project wouldn’t have been possible without the support of the MindsDB and Quira communities.
Resources
Generated Report on Issue analysis:
YouTube Videos
Written by Md Abid Hussain
