The Context-First Revolution: How RAG and MCP Build Smarter Financial AI 🧠

In the world of finance, every decision hinges on data: a mountain of 10-Ks, 10-Qs, and earnings reports that can take teams days or weeks to process. The promise of AI is to make this work instantaneous, but a simple truth stands in the way: a Large Language Model is only as good as the context you give it.

This is where two powerful concepts, Retrieval-Augmented Generation (RAG) and the Model Context Protocol (MCP), are creating a revolution.

Our project, Lyst.ai, is a practical example of how these ideas come together to build an "AI associate" that can process years of financial data in seconds.

The Power of RAG: From Guesswork to Precision


Before RAG, asking an LLM about a specific company's financials was a coin toss. The model might hallucinate, provide outdated information, or simply miss the mark because its training data was not current or precise enough.

RAG solves this by providing a reliable source of truth. The workflow is simple yet powerful:

  1. Ingestion: You feed the system your financial documents (e.g., PDFs, text files).

  2. Vectorization: The system breaks down these documents into small chunks and converts them into numerical representations called vectors, which are then stored in a vector database.

  3. Retrieval: When you ask a question, the system searches the database to find the most relevant chunks of text.

  4. Augmentation: It then injects these retrieved chunks directly into the LLM's prompt, effectively giving the model a "cheat sheet" to generate an accurate answer.
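The four steps above can be sketched end to end in a few dozen lines. This is a deliberately minimal illustration: a bag-of-words counter stands in for a real embedding model, and a plain list stands in for a vector database, so the shape of the pipeline is visible without any external dependencies.

```python
# Minimal RAG sketch: toy bag-of-words "embeddings" stand in for a real
# embedding model, and a plain list stands in for a vector database.
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Real systems use a neural embedding model; word counts are a stand-in.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Ingestion + 2. Vectorization: chunk documents and store their vectors.
chunks = [
    "Revenue grew 12% year over year in fiscal 2023.",
    "Operating margin declined to 18% due to higher input costs.",
    "The company repurchased $2B of shares in Q4.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

def retrieve(question: str, k: int = 2) -> list[str]:
    # 3. Retrieval: rank stored chunks by similarity to the question.
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

def build_prompt(question: str) -> str:
    # 4. Augmentation: inject the retrieved chunks into the LLM prompt.
    context = "\n".join(retrieve(question))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

print(build_prompt("How did revenue grow in fiscal 2023?"))
```

In a production system each stand-in is replaced by the real thing: an embedding model, a vector database with approximate nearest-neighbor search, and an LLM call at the end, but the data flow stays exactly this shape.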

The Problem with RAG: Fragmentation and the N x M Problem


While RAG is a massive leap forward, it still has a core weakness: scalability.

Most RAG systems are built as monolithic applications with hardcoded connections to their data sources and tools. If you want your system to also access a separate market research database, a live stock price API, or a different financial ratio calculator, you have to build a new, custom integration for each one. This creates a mess of tangled connections—the "N times M problem"—where N clients must connect to M servers, leading to a complex web of integrations that is difficult to maintain and scale.
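The scaling difference is easy to quantify. With made-up counts for illustration, point-to-point integrations grow multiplicatively while a shared protocol grows additively:

```python
# N clients, M servers: one custom bridge per pair vs. one adapter per side.
# The counts here are hypothetical, chosen only to illustrate the arithmetic.
n_clients, m_servers = 4, 6
point_to_point = n_clients * m_servers   # N x M custom integrations
shared_protocol = n_clients + m_servers  # N + M protocol adapters
print(point_to_point, shared_protocol)   # 24 vs 10
```

Every new client or server added to the point-to-point web multiplies the maintenance burden; under a shared protocol it adds exactly one adapter.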

This is the exact problem the Model Context Protocol (MCP) was created to solve.

MCP: A Standard for Intelligent AI Systems


Developed by Anthropic, the Model Context Protocol (MCP) is an open standard that provides a unified language for AI systems to communicate with data sources and tools. It makes components interchangeable: any MCP-compatible AI client can talk to any MCP server without needing a custom integration.

The architecture is simple and powerful:

  • Host: Your AI application (the core platform).

  • Clients: Connectors within the host that establish a session.

  • Servers: Independent services that provide context and capabilities.

In this model, servers can expose three types of building blocks:

  • Resources: Structured data like files or database schemas.

  • Tools: Executable functions the model can call to perform actions (e.g., an API call).

  • Prompts: Pre-defined instructions that the user can trigger.
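The three building blocks can be pictured as a simple registry that a server exposes to hosts. The sketch below is purely illustrative and is not the official MCP SDK (a real server speaks JSON-RPC over a transport); all names in it are hypothetical.

```python
# Toy picture of the three MCP building blocks as a plain-Python registry.
# Not the official MCP SDK: a real server exposes these over JSON-RPC.
from typing import Any, Callable

class ToyServer:
    def __init__(self) -> None:
        self.resources: dict[str, str] = {}             # structured data by URI
        self.tools: dict[str, Callable[..., Any]] = {}  # executable functions
        self.prompts: dict[str, str] = {}               # pre-defined instructions

server = ToyServer()

# Resource: a piece of context the host can read.
server.resources["finance://10k/2023"] = "FY2023 revenue: $4.1B"

# Tool: an executable function the model can invoke.
def current_ratio(assets: float, liabilities: float) -> float:
    return round(assets / liabilities, 2)

server.tools["current_ratio"] = current_ratio

# Prompt: a reusable instruction the user can trigger.
server.prompts["summarize"] = "Summarize the filing in three bullet points."

# A host's client would discover and call these over the protocol:
print(server.resources["finance://10k/2023"])
print(server.tools["current_ratio"](8.2, 4.1))
```

The key point is the separation: the server owns the data and the functions, and the host only needs to know the protocol for listing and invoking them.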

MCP enables a shift from rigid, monolithic systems to flexible, composable, and intelligent agents.

Lyst.ai: A Real-World Example of MCP Principles


Our project, Lyst.ai, was built as a learning exercise to understand and apply the principles of a modular, agentic AI architecture, much like the Model Context Protocol (MCP). While we didn't implement the official MCP standard, our system's design mirrors its core philosophy, allowing us to see how a "host" orchestrates multiple "servers" to deliver a powerful application.


Our Project as an MCP Sandbox 🤖

The core of our project is a distributed orchestration engine where each component acts as a separate service. This design allowed us to treat each part as a self-contained "server" that our main application "host" could call.

  • The Lyst.ai API Server (Our "Host"): This is the central brain of our project. It's the api/server.py file that handles user requests, orchestrates the entire workflow, and manages the lifecycle of the query. It's our Host because it holds the conversation and decides which "servers" to call and when.

  • The Weaviate DB (Our "Resource Server"): We treated the Weaviate vector database as a specialized MCP Server. Its job is to provide the core Resources—the vectorized chunks of financial documents. Our main server simply sends a query to this service, and in return, gets the relevant document context. This separation means our main server doesn't need to know anything about the database's internal workings.

  • The Local LLM (Our "Tool Server"): We used Ollama to run a local large language model, treating it as another MCP Server that exposes a key Tool—the ability to reason and generate insights. When our main server needs to process a user question, it sends the retrieved context and the prompt to this service, which handles the complex AI reasoning.

  • The Excel Artifact Generation (Another "Tool Server"): The excel_artifact.py script is a perfect example of a smaller, dedicated MCP Server. Its sole purpose is to take the final results and generate a professionally formatted Excel report. Our main server simply calls this service to perform that specific task.
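The orchestration flow described above can be sketched with in-process stand-ins for each "server". The function names below are hypothetical, not the actual Lyst.ai code; they only show how the host sequences the three services.

```python
# Hedged sketch of the host/server flow: each function is a stand-in for a
# real service, so the orchestration shape is visible. Names are hypothetical.
def query_weaviate(question: str) -> list[str]:
    # Stand-in for the Weaviate resource server: return relevant chunks.
    return ["Revenue grew 12% in fiscal 2023."]

def ask_ollama(prompt: str) -> str:
    # Stand-in for the Ollama tool server: return a generated answer.
    return "Based on the filings, revenue grew 12% year over year."

def generate_excel_report(answer: str) -> str:
    # Stand-in for the Excel tool server: return the path of a written report.
    return "report.xlsx"

def handle_query(question: str) -> dict:
    # The host's role: orchestrate the servers and manage the query lifecycle.
    context = query_weaviate(question)                     # resource server
    prompt = "Context:\n" + "\n".join(context) + "\n\nQuestion: " + question
    answer = ask_ollama(prompt)                            # LLM tool server
    artifact = generate_excel_report(answer)               # Excel tool server
    return {"answer": answer, "artifact": artifact, "sources": context}

result = handle_query("How did revenue grow?")
```

Because the host only calls these interfaces, each stand-in can be replaced by a network call to the real service without changing `handle_query` at all.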

This modular design was the most valuable part of our project. It taught us how to build a powerful system by separating concerns into independent, swappable components. We could have easily swapped Weaviate for another vector database or Ollama for a cloud-based LLM without having to rewrite our core application logic.
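One common way to get that swappability in Python is to depend on a small interface rather than a concrete backend. The class names below are hypothetical, used only to illustrate the pattern:

```python
# Depend on an interface, not a backend: swapping Weaviate for another
# store then requires no change to the core logic. Names are hypothetical.
from typing import Protocol

class Retriever(Protocol):
    def retrieve(self, question: str) -> list[str]: ...

class WeaviateRetriever:
    def retrieve(self, question: str) -> list[str]:
        # A real implementation would query the Weaviate instance here.
        return ["chunk from Weaviate"]

class InMemoryRetriever:
    def retrieve(self, question: str) -> list[str]:
        # Drop-in replacement, e.g. for tests or local development.
        return ["chunk from memory"]

def answer(question: str, retriever: Retriever) -> str:
    # Core application logic: unchanged no matter which backend is passed in.
    return " | ".join(retriever.retrieve(question))

print(answer("How did revenue grow?", WeaviateRetriever()))
print(answer("How did revenue grow?", InMemoryRetriever()))
```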

The Future of Financial Services AI


The combination of RAG and an MCP-like architecture represents the future of enterprise AI. It moves us away from black-box models and toward transparent, intelligent systems that provide not only an answer but also the source of the evidence.

For financial professionals, this means:

  • Unprecedented Efficiency: Automating days of manual work into seconds.

  • Traceability & Trust: Every insight is linked directly back to its source, eliminating guesswork.

  • Scalability: The ability to seamlessly integrate new tools and data sources as needed, without reinventing the wheel.

Our project, Lyst.ai, is a testament to the fact that by focusing on a context-first approach and a modular design, even a student project can achieve the reliability and transparency required for high-stakes applications.
To see the full implementation, check out the project’s code and explore how a distributed orchestration engine brings these principles to life.

Written by Prianshu Mukherjee