Open-source financial research with LLMs, MCP, and the US SEC EDGAR


Overview
In 1934, the US Congress created the Securities and Exchange Commission (SEC) to oversee financial markets and protect investors.
The agency was built on a simple principle: investors deserve accurate, truthful, and complete information about the companies they want to invest in.
[…] “those who seek to draw upon other people's money must be wholly candid regarding the facts on which the investor's judgment is asked.” […]
- Franklin Delano Roosevelt, 32nd President of the United States
For nearly a century, the SEC has served as the financial world's transparency watchdog, requiring public companies to disclose everything from quarterly earnings to executive compensation, from business risks to major corporate events.
Through its Electronic Data Gathering, Analysis, and Retrieval (EDGAR) system, launched in the 1990s, the SEC has made these filings freely available to anyone with an internet connection.
In theory, this created a level playing field where individual investors could access the same data as professional investors. In practice, however, there is a gap.
Institutional investors have analysts, sophisticated software, and millions in technology infrastructure to parse, analyze, and extract insights from SEC filings. Retail investors don’t. They need to manually navigate through dense, complex documents that often span hundreds of pages.
For example, a single Apple annual report (10-K filing) contains over 100 pages of financial data, business descriptions, and risk factors.
Page 1 of 121 of Apple's 2024 annual report (10-K)
The problem goes beyond just having access to data. Professional investors systematically extract specific metrics, compare trends across quarters and years, analyze segment performance, and identify patterns that would be nearly impossible to spot through manual review.
But today things have changed: individual investors can now keep up.
Large language models, the Model Context Protocol, and programmatic access to SEC EDGAR data are finally making sophisticated financial research accessible to individual investors. You can now use AI to extract key financial metrics, perform complex analyses across multiple companies and time periods, and uncover insights that previously required specialized expertise and expensive tools.
How exactly does this work in practice? Let's dive into the technical foundation that makes this possible: the SEC EDGAR MCP Server (released as version 1-alpha on July 21, 2025), an open-source project built in public and maintained by the community that transforms how investors interact with financial research.
How AI changes everything
Before we dive into the technical details, let's understand what makes this approach fundamentally different from traditional financial research tools and workflows.
The Model Context Protocol (MCP) is an open standard that allows AI assistants to securely connect to external data sources and tools. Think of it as a universal connector that lets your AI assistant access and speak directly to databases, APIs, and services, including the SEC's EDGAR system.
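To make the "universal connector" idea a bit more concrete: MCP messages are JSON-RPC 2.0, and a client asks a server to run a tool with a `tools/call` request. The sketch below builds one such request in Python. The tool name `get_financials` and its `ticker` argument are hypothetical placeholders for illustration, not necessarily names the SEC EDGAR MCP server exposes.

```python
import json
from itertools import count

# Simple counter so each request gets a unique JSON-RPC id.
_ids = count(1)

def build_tool_call(tool_name: str, arguments: dict) -> dict:
    """Build a JSON-RPC 2.0 request for MCP's tools/call method."""
    return {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }

# "get_financials" and "ticker" are made-up names used only for this sketch.
request = build_tool_call("get_financials", {"ticker": "MSFT"})
print(json.dumps(request, indent=2))
```

In normal use you never write these messages yourself: the AI assistant decides which tool to call and the MCP client sends the request on your behalf.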
Traditional financial research often involves jumping between multiple platforms: searching for companies on one site, downloading filings from another, then manually copying data into spreadsheets for analysis.
With an MCP server, your AI assistant can do all of this seamlessly in a single conversation.
From raw data to intelligence
Let's look at a practical example. Suppose you want to analyze Microsoft's financial data. Traditionally, this would involve:
Finding MSFT's recent 10-Q (quarterly report) or 10-K (annual report) filings on EDGAR
Downloading and opening multiple PDF documents
Manually searching for segment revenue data
Copying numbers into a spreadsheet
Calculating growth rates and trends
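Even the last step, once the numbers are finally in a spreadsheet, is a small piece of arithmetic you have to set up yourself. A minimal Python sketch of it follows; the figures are placeholders chosen for illustration and do not come from any filing.

```python
def growth_rate(current: float, previous: float) -> float:
    """Return period-over-period growth as a fraction (0.10 means 10%)."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / abs(previous)

# Placeholder figures for illustration only, not taken from any filing.
segment_revenue = {"Q1": 100.0, "Q2": 110.0, "Q3": 121.0}

quarters = list(segment_revenue)
for prev, curr in zip(quarters, quarters[1:]):
    rate = growth_rate(segment_revenue[curr], segment_revenue[prev])
    print(f"{prev} -> {curr}: {rate:.1%}")
```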
With the SEC EDGAR MCP, this entire process becomes a simple conversation: "Show me Microsoft balance sheet, income statement, and cashflow data." The AI assistant handles all the technical complexity behind the scenes and presents you with clean, formatted results:
Another example would be to analyze Apple’s latest revenue:
Or create a dashboard with charts, on the fly, based on the latest NVIDIA financial data:
We could also search for the latest insider transactions in Amazon:
Or investigate company-specific data entries from the Apple filings, and plot them:
To use this workflow, you'll need a running LLM that supports the MCP protocol, such as Claude Desktop (used in the demos). The server runs locally via Docker.
You can follow the instructions on how to install and use it here.
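For orientation, Claude Desktop reads its MCP servers from an `mcpServers` section in its `claude_desktop_config.json` file. The Python sketch below generates such an entry for a Docker-based server. The image name and the `SEC_EDGAR_USER_AGENT` variable are assumptions made for this example; take the real values from the project's installation instructions.

```python
import json

def claude_desktop_entry(image: str, user_agent: str) -> dict:
    """Return an `mcpServers` config fragment that launches a server via Docker."""
    return {
        "mcpServers": {
            "sec-edgar-mcp": {
                "command": "docker",
                "args": [
                    "run", "-i", "--rm",
                    # Assumed variable name; the SEC asks clients to identify themselves.
                    "-e", f"SEC_EDGAR_USER_AGENT={user_agent}",
                    image,
                ],
            }
        }
    }

config = claude_desktop_entry(
    image="example/sec-edgar-mcp:latest",  # hypothetical image name
    user_agent="Jane Doe (jane@example.com)",
)
print(json.dumps(config, indent=2))
```

The printed JSON is what you would merge into the configuration file before restarting the assistant.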
But wait, the SEC does provide APIs for accessing EDGAR data. So why not just use those directly?
Why not the EDGAR API?
The answer lies in complexity and usability. The SEC's REST APIs are powerful but require technical expertise to use effectively. You need to understand company identifiers (CIKs), filing taxonomies, XBRL structures, and how to navigate complex JSON responses. Also, you’d need to know how to code.
For a simple question like "What was Apple's revenue last quarter?" you'd need to write a program that finds Apple's CIK (Central Index Key), locates the right filing, parses the XBRL data, and extracts the specific financial concept. All of this before you even get to analysis.
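To give a feel for that effort, here is a rough Python sketch of the first steps against the SEC's public `data.sec.gov` company-concept endpoint. It is only an illustration under stated assumptions: the choice of the us-gaap tag `RevenueFromContractWithCustomerExcludingAssessedTax` for Apple's revenue is my assumption, the contact string is a placeholder, and the network call is left commented out.

```python
import json
import urllib.request

# The SEC asks API clients to identify themselves; replace with your own contact.
HEADERS = {"User-Agent": "Jane Doe jane@example.com"}

def pad_cik(cik: int) -> str:
    """The data.sec.gov endpoints expect the CIK zero-padded to 10 digits."""
    return f"{cik:010d}"

def concept_url(cik: int, taxonomy: str, tag: str) -> str:
    """URL of the company-concept endpoint for one XBRL tag."""
    return (
        "https://data.sec.gov/api/xbrl/companyconcept/"
        f"CIK{pad_cik(cik)}/{taxonomy}/{tag}.json"
    )

def fetch_json(url: str) -> dict:
    """Download and decode a JSON document (performs a network request)."""
    req = urllib.request.Request(url, headers=HEADERS)
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def latest_fact(payload: dict, unit: str = "USD", form: str = "10-Q") -> dict:
    """Pick the most recently filed fact reported on the given form type."""
    facts = [f for f in payload["units"][unit] if f.get("form") == form]
    return max(facts, key=lambda f: f["filed"])

APPLE_CIK = 320193
REVENUE_TAG = "RevenueFromContractWithCustomerExcludingAssessedTax"  # assumed tag
url = concept_url(APPLE_CIK, "us-gaap", REVENUE_TAG)
print(url)
# payload = fetch_json(url)      # uncomment to hit the live API
# print(latest_fact(payload))
```

Even this is not a full answer: a single 10-Q reports several periods (the quarter, year-to-date, and prior-year comparatives), so you would still need to filter on the `start` and `end` dates of each fact to isolate "last quarter".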
This complexity naturally leads to another question: why not just ask ChatGPT or other AI assistants directly about financial data, without an MCP server?
Why not general LLMs?
The challenge here is accuracy and currency. General-purpose AI models are trained on data with cutoff dates, meaning they lack recent financial information. While they can navigate the web and try to find the financial data, they might miss important details. When you're making investment decisions, you need current, verified, and complete data from the source.
That's exactly the problem the SEC EDGAR MCP server was designed to solve.
As you can see from this conversation, the LLM connected to the MCP server is able to consume the information based on the original filing, the best source of information available:
All data is sourced directly from NVIDIA's SEC EDGAR filing (Form 10-Q, filed May 28, 2025, Accession Number: 0001045810-25-000116) with exact precision preserved from the original XBRL data.
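A citation like this is easy to check yourself, because an accession number maps onto a folder in the public EDGAR archive. The helper below is a small sketch of that mapping; the function name is mine, and the CIK has to be supplied separately because the accession prefix identifies the submitter, which is not always the company itself.

```python
def filing_folder_url(cik: int, accession_number: str) -> str:
    """Build the EDGAR archive folder URL for one filing."""
    compact = accession_number.replace("-", "")
    # Accession numbers have 18 digits: 10 (submitter) + 2 (year) + 6 (sequence).
    if len(compact) != 18 or not compact.isdigit():
        raise ValueError(f"unexpected accession number: {accession_number!r}")
    return f"https://www.sec.gov/Archives/edgar/data/{cik}/{compact}/"

NVIDIA_CIK = 1045810
print(filing_folder_url(NVIDIA_CIK, "0001045810-25-000116"))
```

Opening the resulting folder in a browser lists the documents that make up the filing, so the figures the assistant reports can be compared against the original.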
Inner workings of the MCP server
This open-source package is freely available for anyone to use, modify, and improve. It gives LLMs over 20 specialized tools that handle everything from finding company filings to extracting complex financial metrics:
You can read more about each tool in the documentation.
Here's what makes it powerful:
Smart data extraction and parsing: Instead of manually parsing through hundreds of pages of financial documents, the package can automatically extract specific metrics like revenue by geographic segment, quarterly comparisons, or executive compensation data.
Multiple data sources: The package taps into several SEC data streams, from the main EDGAR database to real-time RSS feeds of new filings, so that you have access to both historical data and the latest company updates.
XBRL analysis: Modern SEC filings use XBRL (eXtensible Business Reporting Language), a structured format that makes financial data machine-readable. The package understands XBRL natively, allowing it to extract precise financial concepts rather than consuming the whole document.
Company-specific insights: Different companies report data differently. Apple might break down revenue by "Americas, Europe, and Greater China" while Microsoft uses different regional categories. The package dynamically discovers and adapts to each company's specific reporting structure.
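As a rough illustration of what "discovering" a company's reporting structure can mean, the sketch below searches a company-facts payload (the shape returned by the SEC's `companyfacts` endpoint) for XBRL concepts that match a keyword. This is not the package's actual implementation, only a simplified stand-in; the sample payload is hand-written, and segment-level breakdowns generally require going to the filing's own XBRL rather than this endpoint.

```python
def find_concepts(company_facts: dict, keyword: str) -> list[tuple[str, str]]:
    """List (taxonomy, tag) pairs whose tag or label mentions the keyword."""
    needle = keyword.lower()
    matches = []
    for taxonomy, concepts in company_facts.get("facts", {}).items():
        for tag, detail in concepts.items():
            label = detail.get("label") or ""
            if needle in tag.lower() or needle in label.lower():
                matches.append((taxonomy, tag))
    return sorted(matches)

# Tiny hand-written stand-in with the same shape as a companyfacts response.
sample = {
    "facts": {
        "us-gaap": {
            "Revenues": {"label": "Revenues", "units": {}},
            "Assets": {"label": "Assets", "units": {}},
        },
        "dei": {
            "EntityCommonStockSharesOutstanding": {
                "label": "Entity Common Stock, Shares Outstanding",
                "units": {},
            }
        },
    }
}
print(find_concepts(sample, "revenue"))
```

Running the same search against two different companies shows quickly which tags each one actually reports, which is the kind of adaptation described above.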
Open source: why it matters
The decision to make this package open source isn't just about free access; it's about transparency and community-driven innovation.
Financial tools shouldn't be black boxes. When you're making investment decisions, you need to trust not just the data, but the methods used to extract and analyze it.
Open source means you can inspect exactly how the package works, contribute improvements, and adapt it for your specific needs. It also means the tool can evolve with the community, incorporating new features.
Looking forward
The SEC has been collecting corporate disclosures for nearly 90 years. The data is all there, freely available to anyone. But until now, extracting meaningful insights from that data required time, deep technical expertise, and expensive analytical tools.
With MCP and LLMs, individual investors can ask questions in plain English and get precise answers backed by official SEC filings.
It's not revolutionary technology; it's simply good engineering applied to a real problem. SEC EDGAR already provides APIs, companies already file in structured formats, and AI assistants already exist.
The MCP server just connects these pieces together in a way that's actually useful for investors.
Roosevelt wanted markets where individual investors could make informed decisions. The SEC provided the transparency. Now open-source tools are providing the accessibility. What took teams of analysts before can now be done in a conversation by anyone.
Maybe the information advantage that Wall Street has held for decades is disappearing.
Acknowledgements
This work wouldn't be possible without the foundation laid by many others.
The US SEC deserves recognition for the incredible work of maintaining one of the world's most comprehensive and accessible corporate disclosure systems. The EDGAR database and REST APIs provide the reliable data foundation that makes tools like this possible.
Anthropic created the Model Context Protocol standard and continues to advance the field of AI safety and capability. Their commitment to open standards enables the kind of interoperability that benefits everyone.
Links
SEC EDGAR MCP package
GitHub Repository: https://github.com/stefanoamorelli/sec-edgar-mcp
Documentation: https://sec-edgar-mcp.amorelli.tech
SEC resources
EDGAR Database: https://www.sec.gov/edgar
SEC REST APIs: https://www.sec.gov/edgar/sec-api-documentation
EDGAR Company Search: https://www.sec.gov/edgar/searchedgar/companysearch
SEC Investor.gov: https://www.investor.gov/
Model Context Protocol (MCP)
MCP Specification: https://modelcontextprotocol.io/
Anthropic MCP Documentation: https://docs.anthropic.com/en/docs/build-with-claude/computer-use
MCP Servers Repository: https://github.com/modelcontextprotocol/servers
Open-source packages
edgartools (by Dwight Gunning): https://github.com/dgunning/edgartools
datamule (by John Friedman): https://github.com/john-friedman/datamule-python
Financial data and analysis
XBRL International: https://www.xbrl.org/
SEC XBRL Information: https://www.sec.gov/structureddata/osd-inline-xbrl.html
OpenFIGI (Financial Instrument Global Identifier): https://www.openfigi.com/
Financial Data Transparency Act: https://www.congress.gov/bill/117th-congress/house-bill/2989
Historical context
Securities Act of 1933: https://www.investor.gov/introduction-investing/investing-basics/role-sec/laws-govern-securities-industry#secact1933
Securities Exchange Act of 1934: https://www.investor.gov/introduction-investing/investing-basics/role-sec/laws-govern-securities-industry#secexact1934
Franklin D. Roosevelt Presidential Library: https://www.fdrlibrary.org/
Written by Stefano Amorelli
