Boost Repository Analysis with MindsDB and Docs-KB

Ever felt tired of staring at a massive codebase, trying to figure out how things work or hunting down a bug? With MindsDB, your AI assistant can understand any codebase for you without the code ever being on your machine. Let's see how.

Introduction

MindsDB is an AI federated query engine that enables humans, AI agents, and applications to get highly accurate answers across sprawled and large-scale data sources. It follows the "Connect, Unify, Respond" philosophy - connecting to hundreds of data sources, unifying them through SQL interfaces, and responding to queries using AI models as virtual tables.

Docs-KB is a CLI tool that lets you create a MindsDB Knowledge Base on top of open-source documentation, specifically the Markdown files used for writing documentation (e.g., Mintlify, Docusaurus).

Overview of MindsDB Knowledge Base

Knowledge Bases are one of MindsDB's core unification interfaces that index and organize unstructured data for efficient retrieval. They transform documents, web content, and text data into searchable repositories that can be queried using natural language. Knowledge bases work alongside other MindsDB interfaces (Views, ML Models, Jobs) to create a unified data ecosystem where you can ask questions across both structured and unstructured data sources using simple SQL syntax.
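As a rough illustration of what "querying with simple SQL" can look like, here is a minimal sketch that builds a semantic-search statement for a knowledge base. The knowledge base name docs_kb and the helper build_kb_query are hypothetical, and the statement shape (filtering on the content column) is an assumption based on MindsDB's knowledge base documentation, so check it against the version you run.

```python
def build_kb_query(kb_name: str, question: str, limit: int = 5) -> str:
    """Build a SQL statement that searches a knowledge base by meaning.

    Assumption: the knowledge base accepts a natural-language filter
    on its `content` column. `kb_name` is a hypothetical name.
    """
    # Only allow simple identifiers so the name cannot inject SQL.
    if not kb_name.replace("_", "").isalnum():
        raise ValueError(f"unexpected knowledge base name: {kb_name!r}")
    # Escape single quotes inside the question for the SQL string literal.
    escaped = question.replace("'", "''")
    return f"SELECT * FROM {kb_name} WHERE content = '{escaped}' LIMIT {int(limit)};"


if __name__ == "__main__":
    print(build_kb_query("docs_kb", "How do I configure authentication?", 3))
```

The resulting string can be run from the MindsDB SQL editor or passed to whichever client you already use to talk to MindsDB.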

Bottom Line: MindsDB turns scattered data into an intelligent, queryable system where knowledge bases handle unstructured content while the platform federates everything through SQL. And you can do much more with it than you can imagine.

Overview of MindsDB MCP

MindsDB is also an MCP server that enables intelligent applications to query and reason over federated data from databases, data warehouses, and applications.

It supports 200+ AI and data integrations, and via MCP your agent can query any of these data handlers. You can see all supported integrations here.

For this demonstration, we are only going to use the GitHub handler, which is a community-contributed integration.

Overview of Docs-KB and its MCP tools

Docs-KB is a CLI tool that uses MindsDB under the hood to transform GitHub repositories into intelligent, searchable knowledge bases. It saves you from writing the queries that create a knowledge base and from wiring up the embedding model, the AI model, and the GitHub handler yourself.

Key features:

Simple Ingestion: docs-kb ingest mindsdb/mindsdb - instantly create searchable docs
Natural Language Queries: Ask questions like "How do I configure authentication?"
Privacy-First: Uses local Ollama models (nomic-embed-text + gemma2) - no data leaves your machine
MCP Server: Integrates with Claude Desktop, Cursor, or VS Code via Model Context Protocol

Tools accessible via MCP

how_to_use_docs_kb_mcp Get usage guide for the MCP server
list_available_repositories List all ingested repositories
query_repository_docs Search documentation with natural language (uses MindsDB KB)
get_single_file Retrieve content of a specific file from GitHub
load_multiple_files Load multiple files from GitHub concurrently
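To make the tool list a little more concrete, here is a sketch of the JSON-RPC message an MCP client sends when it invokes one of these tools. The tools/call method and the name/arguments parameters come from the Model Context Protocol itself; the argument names repository and query are assumptions for illustration, since the real input schema is defined by the Docs-KB MCP server.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects each request to carry one.
_request_ids = count(1)


def tool_call_request(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


if __name__ == "__main__":
    request = tool_call_request(
        "query_repository_docs",
        # Hypothetical argument names; check the tool's declared schema.
        {"repository": "mindsdb/mindsdb", "query": "How do I configure authentication?"},
    )
    print(json.dumps(request, indent=2))
```

In practice the agent (Claude Desktop, Cursor, or VS Code) builds and sends these messages for you; the sketch only shows what travels over the wire.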

With Docs-KB you can also pass an option to the ingestion command to create a GitHub handler for the repository in MindsDB.

Analyzing MindsDB Repository Issues with MCP Tools

In the video below, I used both MCP servers (MindsDB MCP and Docs-KB MCP) to:

Get the top 5 issues from the MindsDB GitHub repository
Analyze an issue by navigating the repository docs and files via MCP tools

Please watch the video to the end; you will love it.

https://youtu.be/p0-1fCEDhIU

Generated Reports:

You can access the generated report here:

From Video

Using just a few tools (although it calls them many times), the GitHub Copilot Agent was able to identify the root cause of the bug, propose a solution, and explain why the solution might work. It also provided recommendations for testing and validation.

Important points to consider

There are a few things to consider before jumping to conclusions:

The GitHub Copilot Agent is using Claude Sonnet 4, which is one of the best LLMs but is costly. You might need a comparable model to get the same results.
The agent always fetches the latest code from the repository, which can lead to inaccurate analysis when the bug lives in an older version.
For example, suppose MindsDB is using LiteLLM v1.63 (which has a bug), but the agent pulls the code of the latest version, where the bug is fixed. It may not figure out the source of the issue, since it doesn't have access to the older version (1.63) of the code.
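The version mismatch described above comes down to which Git ref a file is read from. As a small sketch, the helper below builds a raw-content URL for a file at a specific tag instead of the default branch. The function name raw_file_url and the owner, repository, tag, and path values are all hypothetical placeholders; this is not something the current tools do, only an illustration of the difference between "latest" and "pinned".

```python
from urllib.parse import quote


def raw_file_url(owner: str, repo: str, ref: str, path: str) -> str:
    """Build a raw.githubusercontent.com URL for `path` at a given ref.

    `ref` can be a branch, a tag, or a commit SHA. Pinning it to the
    release a project actually depends on avoids reading newer code.
    """
    return (
        "https://raw.githubusercontent.com/"
        f"{owner}/{repo}/{quote(ref)}/{quote(path)}"
    )


if __name__ == "__main__":
    # Placeholder names: the tag must match a tag that exists in the repo.
    print(raw_file_url("example-org", "example-lib", "v1.63.0", "src/client.py"))
    print(raw_file_url("example-org", "example-lib", "main", "src/client.py"))
```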

Doing this all manually

The analysis above, done with an AI agent like GitHub Copilot, can also be set up manually by following these steps:

Get the issues by navigating the repositories on GitHub.
Clone the repository.
Open the repository in VS Code.
Give the issue to the agent and let it search for a solution by navigating the repository.

There is nothing wrong with this approach, but we have to repeat these steps for each repository. And if the issues span multiple repositories or packages, it becomes difficult for the agent to perform the analysis.

Is it possible to build this complete solution natively in MindsDB?

Yes, it is possible to build the complete solution using just a MindsDB data handler, AI handler, and agent. Alternatively, we can achieve the same result with an external agent using MindsDB MCP. It just needs a few changes.

Challenges Faced While Building the Solution with MindsDB

Loading data with the built-in GitHub handler was the only issue:

It's slow.
It gives an unsupported encoding error when fetching a large number of files.
It limits results to 10 files when no limit is provided.
It only returns path, name, and content. If it also returned the SHA of each file, that could be used to check whether files have changed and to automate ingestion of only the changed files, keeping the knowledge base up to date.

Is there a solution?

The approach below can improve the performance of the GitHub file handler:

Use the GitHub Tree API endpoint to get the repository structure in a single API call, then apply file or path filters to the results and make batch requests to fetch the content of the filtered paths.
Create another table that stores the Tree API results, containing the SHA, path, and other important metadata. Users can experiment with different path filters to identify the files they want, then use those same filters to fetch the actual file content. This way the existing file table does not need to change.
Use the SHA to detect whether a file's content has changed, which helps automate ingestion of only the changed content.
Add these tables and the knowledge base to an agent, creating natively the same solution we built with MindsDB MCP, Docs-KB MCP, and the GitHub Copilot Agent.
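The first three points above can be sketched in a few functions. This is only an illustration of the idea, not the handler's implementation: fetch_tree calls GitHub's "get a tree" endpoint with recursive=1, while select_blobs and changed_paths are hypothetical helper names, and the default ref "main" is an assumption about the repository's default branch.

```python
import fnmatch
import json
import urllib.request


def fetch_tree(owner: str, repo: str, ref: str = "main") -> dict:
    """Fetch the full repository tree in one call (unauthenticated)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/git/trees/{ref}?recursive=1"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req) as resp:
        # The response also carries a "truncated" flag for very large repos.
        return json.load(resp)


def select_blobs(tree: dict, patterns: list[str]) -> dict[str, str]:
    """Return {path: sha} for files whose path matches any glob pattern.

    Note: with fnmatch, "*" also matches "/" so "docs/*.md" covers
    nested folders too.
    """
    return {
        entry["path"]: entry["sha"]
        for entry in tree["tree"]
        if entry["type"] == "blob"
        and any(fnmatch.fnmatchcase(entry["path"], p) for p in patterns)
    }


def changed_paths(previous: dict[str, str], current: dict[str, str]):
    """Compare two {path: sha} snapshots.

    Returns (new_or_modified, removed), both sorted, so that only
    those files need to be re-ingested or dropped.
    """
    new_or_modified = sorted(p for p, sha in current.items() if previous.get(p) != sha)
    removed = sorted(p for p in previous if p not in current)
    return new_or_modified, removed


if __name__ == "__main__":
    tree = fetch_tree("mindsdb", "mindsdb")
    docs = select_blobs(tree, ["docs/*.md", "docs/*.mdx"])
    print(f"{len(docs)} documentation files found")
```

Storing the {path: sha} snapshot after each ingestion run and feeding it back in as previous on the next run gives the "only changed files" behavior: unchanged files keep the same blob SHA and are skipped.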

What else to try with this combination?

This solution opens the door to several powerful solutions built with MindsDB, enabling tasks such as:

Reviewing pull requests with better contextual understanding.
Accelerating root cause analysis of bugs within and across repositories.
Answering user queries directly from documentation using natural language understanding.
Reducing the time users spend searching for information within and across multiple documentation sets by providing precise, AI-driven answers.

Wrapping Up

Credits

This project was developed as a submission to Quira.sh Quest 19.

This project wouldn’t be possible without the support of the MindsDB and Quira communities.

Resources

MindsDB
Quira.sh
Docs-KB CLI Blog
Doc-KB CLI repository
Generated Report on Issue analysis:
- GitHub Gist
- Google Drive
YouTube Videos
- Solving for MindsDB with MindsDB MCP and Doc-KB MCP
- Docs-KB CLI tool in action
