Utility for Data Reconciliation: Comparing Table Extracts and Files


In data reconciliation projects, it's crucial to ensure the integrity and consistency of data between various sources. To address this need, I have developed a Python utility that compares extracts from tables and files. This utility simplifies the process of identifying discrepancies and ensures data accuracy.
Key Features:
Compatibility with CSV and Excel: The utility can handle both CSV and Excel file formats, making it versatile for different data sources.
Visual Differentiation: It highlights matching cells in green and differing cells in red, providing clear visual feedback.
Error Handling: The script includes error handling to manage scenarios where input files may be missing.
Python Utility Code:
import pandas as pd
from openpyxl import Workbook, load_workbook
from openpyxl.styles import PatternFill
def read_file(file):
if file.endswith('.csv'):
return pd.read_csv(file, header=None)
elif file.endswith('.xlsx'):
return pd.read_excel(file, header=None)
else:
raise ValueError("Unsupported file format. Please provide a .csv or .xlsx file.")
def compare_files(file1, file2, file3):
try:
# Load the data from the files
df1 = read_file(file1)
df2 = read_file(file2)
except FileNotFoundError as e:
print(f"Error: {e}")
print(f"Please check if the files {file1} and {file2} exist.")
return
# Create a new workbook for the result
wb = Workbook()
ws = wb.active
green_fill = PatternFill(start_color="C6EFCE", end_color="C6EFCE", fill_type="solid")
red_fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
# Compare each cell
for i in range(df1.shape[0]):
for j in range(df1.shape[1]):
cell_value1 = df1.iloc[i, j] if i < len(df1) and j < len(df1.columns) else None
cell_value2 = df2.iloc[i, j] if i < len(df2) and j < len(df2.columns) else None
if cell_value1 == cell_value2:
ws.cell(row=i+1, column=j+1).value = f"{file1}: {cell_value1}\n{file2}: {cell_value2}"
ws.cell(row=i+1, column=j+1).fill = green_fill
else:
ws.cell(row=i+1, column=j+1).value = f"{file1}: {cell_value1}\n{file2}: {cell_value2}"
ws.cell(row=i+1, column=j+1).fill = red_fill
# Save the result to File3
wb.save(file3)
# Example usage
file1 = 'File1.csv' # or 'File1.xlsx'
file2 = 'File2.xlsx' # or 'File2.xlsx'
file3 = 'comparison_results.xlsx' # result file (should be Excel to handle cell colors)
compare_files(file1, file2, file3)
Result File(comparison_results.xlsx):
Github:
https://github.com/vipinputhanveetil/data-reconcil-util
How to Use:
Install Required Libraries:
bash
pip install pandas openpyxl
Prepare Your Files:
- Ensure your input files (
File1
andFile2
) are in either.csv
or.xlsx
format and place them in the same directory as your script.
- Ensure your input files (
Run the Script:
Update the
file1
,file2
, andfile3
variables with your actual file names.Execute the script to generate the comparison result in
File3
.
This utility is designed to make data reconciliation more efficient and error-free, saving you valuable time and effort. If you have any questions or need further customization, feel free to reach out!
Subscribe to my newsletter
Read articles from Vipin directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Vipin
Vipin
Highly skilled Data Test Automation professional with over 10 years of experience in data quality assurance and software testing. Proven ability to design, execute, and automate testing across the entire SDLC (Software Development Life Cycle) utilizing Agile and Waterfall methodologies. Expertise in End-to-End DWBI project testing and experience working in GCP, AWS, and Azure cloud environments. Proficient in SQL and Python scripting for data test automation.