Building an AI-Powered Financial Research Platform with NLP and Vector Databases

Financial analysts and investment strategists spend countless hours manually gathering reports from different institutions, extracting key information, and compiling insights for clients. These documents often arrive as lengthy PDFs, scattered across dozens of websites and filled with dense regulatory language.
For research teams, this process is:
Time-consuming - repetitive tasks like downloading, converting, and categorizing reports eat into time that could be spent on analysis.
Error-prone - with multiple sources and formats, it’s easy to miss critical details.
Difficult to scale - as the number of clients and institutions grows, manual workflows simply can’t keep up.
The question became clear: What if we could automate this entire workflow and let AI handle the heavy lifting?
The Project
As part of my work with Encore Financial, I set out to build a platform that automated financial research workflows end-to-end. The goal was to:
Automatically collect reports from 25+ top financial institutions.
Extract and process content into structured, searchable datasets.
Leverage AI and vector databases to generate insights beyond simple keyword search.
Deliver results through a dashboard that analysts could use directly in their day-to-day work.
This wasn’t just an experiment in coding. It was about creating a system that could reduce manual work significantly and allow the research team at Encore Financial to deliver timely, data-driven advice to their 100+ existing clients.
Technologies Used
The platform combined web automation, NLP, vector databases, and AI into a cohesive pipeline:
Python + Flask → backbone for automation and web dashboard.
Playwright, Requests, BeautifulSoup → scraping reports from institutional sites.
PyPDF2, pdfplumber → parsing PDFs into structured text.
Sentence-Transformers + ChromaDB (vector database) → embeddings and semantic search.
Claude API (Anthropic) → generating AI-powered insights and structured reports.
Google Drive API → cloud storage and categorization by timeframe (weekly, monthly, quarterly).
HTML, CSS, JavaScript → interactive dashboard for analysts.
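Between the PDF parsers and the embedding model sits an implied step the list doesn't show: long reports can't be embedded whole, so the extracted text has to be split into chunks first. The article doesn't describe the exact chunking strategy, so the sketch below is a hedged, minimal word-window chunker with overlap (function name and parameters are illustrative, not from the actual codebase):

```python
def chunk_text(text: str, max_words: int = 200, overlap: int = 40) -> list[str]:
    """Split extracted report text into overlapping word windows so each
    chunk fits comfortably inside a sentence-embedding model's input."""
    words = text.split()
    chunks = []
    step = max(1, max_words - overlap)  # guard against overlap >= max_words
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + max_words])
        if chunk:
            chunks.append(chunk)
        if start + max_words >= len(words):
            break  # the last window already reached the end of the text
    return chunks
```

Overlapping windows are a common default here: they keep sentences that straddle a chunk boundary retrievable from at least one chunk.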
System Architecture
Our platform is built on a clean, logical architecture that transforms scattered web data into actionable intelligence. The diagram below shows how information flows through the system, from user action to AI-powered insight.
Let's trace the two main workflows based on the diagram.
Workflow 1: The Automated Data Pipeline
This is how the system gathers its knowledge:
Initiation: The process starts when the user on the Web Dashboard sends a request to the Backend Server. The server's API Endpoints route this request to the Automation & Query Controller, which initiates the collection.
Collection & Storage: The Data Collector fetches reports, then immediately uploads the raw PDFs to Google Drive for archival.
Processing & Embedding: The collected PDFs are passed to the PDF Processor, which extracts the raw text. This text is then sent to the Text Embedder, an AI model that converts the text into meaningful numerical vectors (embeddings).
Indexing: Finally, these powerful embeddings are stored in our ChromaDB vector database, creating a searchable knowledge base.
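The embed-and-index steps above reduce to a simple shape: turn each chunk into a vector, then store it under a stable ID alongside the original text. In the real pipeline this is a Sentence-Transformers model writing into a ChromaDB collection; the toy sketch below swaps in a deterministic stub embedder and a plain dict so the data shapes are visible without model downloads (all names here are illustrative):

```python
import hashlib
import math

# Stands in for a ChromaDB collection: chunk id -> {"vector", "document"}.
index: dict[str, dict] = {}

def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Stub for the Sentence-Transformers model: a deterministic
    pseudo-embedding, L2-normalised like real sentence embeddings."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    vec = [b / 255.0 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def add_documents(chunks: list[str], source: str) -> None:
    """Mirrors the indexing step: embed every chunk and store it
    under an id derived from its source report."""
    for i, chunk in enumerate(chunks):
        index[f"{source}-{i}"] = {"vector": toy_embed(chunk), "document": chunk}
```

Keeping the source report in the ID (and, in ChromaDB, in metadata) is what later lets results be traced back to the institution and timeframe they came from.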
Workflow 2: The AI-Powered Analysis
This is the interactive part where users get answers in seconds:
Trigger Insights: The user clicks the "Generate Insights" button on the Web Dashboard. The request is routed through the API Endpoints to the Automation & Query Controller.
Semantic Search: The Controller performs a semantic search against ChromaDB, which returns the most contextually relevant document snippets.
AI Synthesis: The Controller bundles this context with the user's query and sends it to the Claude API. Here, the AI synthesizes the information and formulates a comprehensive answer.
Display: The final answer is sent back through the Controller and API Endpoints to be displayed on the Web Dashboard.
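The retrieve-then-synthesize loop above can be sketched end-to-end. In the real system the search is ChromaDB's query call and the synthesis is a request to the Claude API; this hedged sketch replaces both with a plain cosine-similarity scan and a prompt-assembly function, so the control flow is visible without network calls (function names and the prompt wording are illustrative assumptions):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: the ranking metric behind semantic search."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def semantic_search(query_vec: list[float], store: list[dict], k: int = 3) -> list[str]:
    """What the vector-database query does conceptually: rank stored
    vectors by similarity to the query and return the top-k documents."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item["vector"]),
                    reverse=True)
    return [item["document"] for item in ranked[:k]]

def build_prompt(question: str, snippets: list[str]) -> str:
    """Bundle the retrieved context with the user's query, as the
    Controller does before calling the Claude API."""
    context = "\n\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return ("Using only the report excerpts below, answer the question.\n\n"
            f"Excerpts:\n{context}\n\nQuestion: {question}")
```

Numbering the excerpts in the prompt is a small but useful choice: it lets the model cite which snippet supports each claim in its answer.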
This two-workflow system creates a powerful loop: it continuously ingests and processes market intelligence, then stands ready to deliver synthesized, on-demand insights.
Outcomes & Impact
By automating this workflow, the prototype demonstrated:
80% reduction in manual research time, freeing analysts to focus on strategy and client service.
Scalable coverage across 25+ leading institutions, improving reliability and depth of insights.
AI-powered reporting that generated validated investment themes and structured outputs.
Direct business value — supporting the research team in providing faster, more consistent advice to 100+ clients.
Closing Thoughts
This project taught me that the real power of AI in finance isn’t about replacing analysts; it’s about augmenting their expertise. By combining automation, vector databases, and AI models, we can shift the burden from repetitive tasks to higher-value decision-making.
The future of financial research will be defined by workflows like these: where AI handles the grunt work, and humans focus on insight, strategy, and judgment.
Written by Adam Yassine