AInvoice


Project Type: Data Engineering + Dashboarding + NLP/OCR
Tech Stack: Python, OpenAI API, Tesseract OCR, SQLite, Power BI
Project Focus: Multicurrency invoice parsing, FX normalization, interactive dashboard
๐ Project Overview
AInvoice is a fully automated pipeline to extract structured data from scanned invoices or images (in multiple currencies), normalize the amounts via live FX conversion, and store the results in a SQLite database. The pipeline outputs a .csv
file ready for visualization in Power BI, featuring a dashboard that tracks monthly expenses, high-value invoices, and spending per supplier.
This project combines OCR, NLP, FX APIs, and data modeling โ all orchestrated in Python, and visualized in a polished dashboard.
๐งพ Folder Structure & Components
AInvoice/
โ
โโโ main.py # Main execution script
โโโ functions.py # OCR, API, and processing functions
โโโ prompt.py # Prompt template for OpenAI extraction
โโโ queries.sql # SQL file with useful queries
โโโ environment.yml # Conda environment config
โโโ .env # Template for API keys (OpenAI + FX)
โ
โโโ invoices/ # Monthly folders with invoice files (add your invoices manually here)
โ โโโ 01_January/
โ โโโ ...
โ โโโ 12_December/
โ
โโโ PowerBI/ # Power BI visual templates
โ โโโ Dashboard_AInvoice.pbix # Power BI clean template
โ โโโ Dashboard_AInvoice (reference).pbix # Layout reference
โ โโโ IMPORTANT - READ.txt # Dashboard instructions
โ
โโโ invoices_export.csv # Optional CSV export from SQLite queries for Power BI
๐ Important: Make sure the
.env
file is configured with your OpenAI and FX API keys.
โ๏ธ Environment Setup
1. Install prerequisites:
sqlite3
(already included in most Python installs)
2. Create the Conda environment:
conda env create -f environment.yml
3. Activate the environment and install Jupyter kernel (for interactive development):
conda activate invoice-extractor
pip install ipykernel
python -m ipykernel install --user --name=invoice-extractor
๐ง How the Pipeline Works
OCR Extraction
Each file in/invoices/
is parsed using Tesseract and OpenAI to identify fields likesupplier
,description
,date
,import
,currency
.Currency Conversion
A live FX rate API (like Fixer.io) is used to convert all amounts into euros.Data Storage
Results are saved into a local SQLite database (invoices.db
) with automatic schema creation.CSV Export
A finalinvoices_export.csv
is created for use in Power BI or Excel.
๐ Execution of main.py
in VS Code (Interactive Window):
๐งช SQLite Queries
Use the queries.sql
file to interact with your database. To launch SQLite in Anaconda Powershell or CMD (from the project root where invoices.db is located):
sqlite3 invoices.db
.read queries.sql
Example queries include:
Selecting total spending
Filtering invoices by date or supplier
Exporting to
.csv
(optional)
๐ Power BI Integration
1. Load invoices_export.csv
Go to Home > Get Data > Text/CSV and select the file. Configure encoding as 65001: Unicode (UTF-8)
.
2. Handle Import Formatting
To avoid issues with decimal/comma separators and scientific notation:
After selecting
invoices_export.csv
, click Transform Data instead of Load.In Power Query Editor, change the
import
column type to Text first.- When prompted, click Replace current.
Then open the type dropdown again and choose Using Locale....
Set the type to Fixed decimal number
Set Locale to English (United States)
This double conversion (Text โ Fixed decimal with Locale) ensures proper parsing of both scientific notation and decimal separators across regional settings.
3. Close & Apply
Click Close & Apply to load the cleaned data into Power BI.
4. Format Column
In the Column Tools tab:
Data type:
Fixed decimal number
Format:
Currency
Decimal places: 2
๐ Formatted table in Power BI:
๐ Dashboard Output
The dashboard shows:
๐ข Number of invoices
๐ฐ Total spending
๐ Monthly trend of expenses
๐ข Breakdown by supplier
๐งพ Highest value invoices
All visuals are updated dynamically based on the contents of invoices_export.csv
.
๐ Dashboard Preview:
โ Final Notes
Ensure monthly folders are not empty, or they'll be skipped automatically.
Unsupported or missing currency labels will be logged and excluded.
Add or remove currencies from
currency_map
inmain.py
.The project supports adding more FX providers if needed.
You can modify the OpenAI model in
functions.py
(e.g.gpt-4
,gpt-3.5-turbo
).Add or change fields by editing the prompt template in
prompt.py
.
๐ Security Tip
Never upload your real .env
file to GitHub. Always include only the .env.example
template in version control.
๐ Explore the Project
๐ GitHub repo
๐ Power BI dashboard
๐ฌ Feedback & Contact
If you found this useful or have questions, feel free to reach out or fork the repo on GitHub!
Built with ๐ป Python, ๐ Power BI, and ๐ค OpenAI โ by Tayfur Akkaya
Subscribe to my newsletter
Read articles from Tayfur Akkaya Clavijo directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Tayfur Akkaya Clavijo
Tayfur Akkaya Clavijo
I'm an AI, data and finance enthusiast. Passionate about lifelong learning and cross-sector innovation, Iโm always looking to collaborate on impactful projects at the intersection of data, strategy, and technology.