Utility for Data Reconciliation: Comparing Table Extracts and Files


In data reconciliation projects, it's crucial to ensure the integrity and consistency of data between various sources. To address this need, I have developed a Python utility that compares extracts from tables and files. This utility simplifies the process of identifying discrepancies and ensures data accuracy.
Key Features:
Compatibility with CSV and Excel: The utility can handle both CSV and Excel file formats, making it versatile for different data sources.
Visual Differentiation: It highlights matching cells in green and differing cells in red, providing clear visual feedback.
Error Handling: The script includes error handling to manage scenarios where input files may be missing.
Python Utility Code:
import openpyxl
from openpyxl import load_workbook
from openpyxl.styles import PatternFill
import csv
import os
import time
def load_file(file_path):
try:
if file_path.endswith('.csv'):
with open(file_path, newline='') as csvfile:
reader = list(csv.reader(csvfile))
return reader
elif file_path.endswith('.xlsx'):
wb = load_workbook(file_path)
sheet = wb.active
return [[cell.value for cell in row] for row in sheet.iter_rows()]
else:
raise ValueError("Unsupported file format")
except FileNotFoundError as fnf_error:
print(f"Error: {fnf_error}")
except Exception as e:
print(f"An unexpected error occurred: {e}")
def compare_and_highlight(input_src_file, input_tgt_file, xlsx_file3_path,rextract_ref):
src_file = input_src_file.split('\\')
tgt_file = input_tgt_file.split('\\')
print(f"{rextract_ref} comparison started")
log_file.write(f"\nComparing {rextract_ref}: Source Extract:{src_file[2]}, Target Extract:{tgt_file[2]}\n")
#time.sleep(1)
# Load the files
flag=0
data1 = load_file(input_src_file)
data2 = load_file(input_tgt_file)
# Create a new Excel workbook
wb3 = openpyxl.Workbook()
sheet3 = wb3.active
# Define the fill colors
green_fill = PatternFill(start_color="C6EFCE", end_color="C6EFCE", fill_type="solid")
red_fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
# Get the max row and column count
max_rows = max(len(data1), len(data2))
max_cols = max(len(data1[0]), len(data2[0]))
# Compare the cells and write to the new Excel file
for row in range(max_rows):
for col in range(max_cols):
cell1 = data1[row][col] if row < len(data1) and col < len(data1[row]) else None
cell2 = data2[row][col] if row < len(data2) and col < len(data2[row]) else None
if cell1 == cell2:
sheet3.cell(row=row+1, column=col+1).value = f"{input_src_file}: {cell1}\n {input_tgt_file}: {cell2}"
sheet3.cell(row=row+1, column=col+1).fill = green_fill
else:
sheet3.cell(row=row+1, column=col+1).value = f"{input_src_file}: {cell1}\n {input_tgt_file}: {cell2}"
sheet3.cell(row=row+1, column=col+1).fill = red_fill
flag=1
# Save the results to the Excel file
wb3.save(xlsx_file3_path)
if flag==0:
#print("No difference observed.")
log_file.write(f"Source and Target extract matching. SUCCESS.\n")
log_file.write(f"--------")
else:
log_file.write(f"####################################Mismatch between source and target extract. FAIL####################################\n")
log_file.write(f"--------")
print(f"{rextract_ref} comparison completed")
with open("utility_execution_logs.txt", "w") as log_file:
with open('config.txt', 'r') as file:
# Read each line in the file
for line in file:
# Remove any leading/trailing whitespace, including newline characters
line = line.strip()
words = line.split(',')
rextract_ref=words[0]
input_src_file =words[1]
input_tgt_file=words[2]
output_result_file=words[3]
compare_and_highlight(input_src_file, input_tgt_file, output_result_file,rextract_ref)
Result File(comparison_results.xlsx):
Github:
https://github.com/vipinputhanveetil/data-reconcil-util
How to Use:
Install Required Libraries:
bash
pip install pandas openpyxl
Prepare Your Files:
- Ensure your input files (
File1
andFile2
) are in either.csv
or.xlsx
format and place them in the same directory as your script.
- Ensure your input files (
Run the Script:
Update the
file1
,file2
, andfile3
variables with your actual file names.Execute the script to generate the comparison result in
File3
.
This utility is designed to make data reconciliation more efficient and error-free, saving you valuable time and effort. If you have any questions or need further customization, feel free to reach out!
Subscribe to my newsletter
Read articles from Vipin directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Vipin
Vipin
Highly skilled Data Test Automation professional with over 10 years of experience in data quality assurance and software testing. Proven ability to design, execute, and automate testing across the entire SDLC (Software Development Life Cycle) utilizing Agile and Waterfall methodologies. Expertise in End-to-End DWBI project testing and experience working in GCP, AWS, and Azure cloud environments. Proficient in SQL and Python scripting for data test automation.