In data reconciliation projects, it's crucial to ensure the integrity and consistency of data between various sources. To address this need, I have developed a Python utility that compares extracts from tables and files. This utility simplifies the process of identifying discrepancies and ensures data accuracy.

Key Features:

Compatibility with CSV and Excel: The utility can handle both CSV and Excel file formats, making it versatile for different data sources.
Visual Differentiation: It highlights matching cells in green and differing cells in red, providing clear visual feedback.
Error Handling: The script includes error handling to manage scenarios where input files may be missing.

Python Utility Code:

import openpyxl
from openpyxl import load_workbook
from openpyxl.styles import PatternFill
import csv
import os
import time
def load_file(file_path):
    try:
        if file_path.endswith('.csv'):
            with open(file_path, newline='') as csvfile:
                reader = list(csv.reader(csvfile))
            return reader
        elif file_path.endswith('.xlsx'):
            wb = load_workbook(file_path)
            sheet = wb.active
            return [[cell.value for cell in row] for row in sheet.iter_rows()]
        else:
            raise ValueError("Unsupported file format")
    except FileNotFoundError as fnf_error: 
        print(f"Error: {fnf_error}") 
    except Exception as e: 
        print(f"An unexpected error occurred: {e}")
def compare_and_highlight(input_src_file, input_tgt_file, xlsx_file3_path,rextract_ref):
    src_file = input_src_file.split('\\')
    tgt_file = input_tgt_file.split('\\')
    print(f"{rextract_ref} comparison started")
    log_file.write(f"\nComparing {rextract_ref}: Source Extract:{src_file[2]}, Target Extract:{tgt_file[2]}\n")
    #time.sleep(1)
    # Load the files
    flag=0
    data1 = load_file(input_src_file)
    data2 = load_file(input_tgt_file)
    # Create a new Excel workbook
    wb3 = openpyxl.Workbook()
    sheet3 = wb3.active
    # Define the fill colors
    green_fill = PatternFill(start_color="C6EFCE", end_color="C6EFCE", fill_type="solid")
    red_fill = PatternFill(start_color="FFC7CE", end_color="FFC7CE", fill_type="solid")
    # Get the max row and column count
    max_rows = max(len(data1), len(data2))
    max_cols = max(len(data1[0]), len(data2[0]))
    # Compare the cells and write to the new Excel file
    for row in range(max_rows):
        for col in range(max_cols):
            cell1 = data1[row][col] if row < len(data1) and col < len(data1[row]) else None
            cell2 = data2[row][col] if row < len(data2) and col < len(data2[row]) else None

            if cell1 == cell2:
                sheet3.cell(row=row+1, column=col+1).value = f"{input_src_file}: {cell1}\n {input_tgt_file}: {cell2}"
                sheet3.cell(row=row+1, column=col+1).fill = green_fill
            else:
                sheet3.cell(row=row+1, column=col+1).value = f"{input_src_file}: {cell1}\n {input_tgt_file}: {cell2}"
                sheet3.cell(row=row+1, column=col+1).fill = red_fill
                flag=1

    # Save the results to the Excel file
    wb3.save(xlsx_file3_path)
    if flag==0:
        #print("No difference observed.")
        log_file.write(f"Source and Target extract matching. SUCCESS.\n")
        log_file.write(f"--------")
    else:
        log_file.write(f"####################################Mismatch between source and target extract. FAIL####################################\n")
        log_file.write(f"--------")
    print(f"{rextract_ref} comparison completed")
with open("utility_execution_logs.txt", "w") as log_file:
    with open('config.txt', 'r') as file:
        # Read each line in the file
        for line in file:
            # Remove any leading/trailing whitespace, including newline characters
            line = line.strip()
            words = line.split(',')
            rextract_ref=words[0]
            input_src_file =words[1]
            input_tgt_file=words[2]
            output_result_file=words[3]
            compare_and_highlight(input_src_file, input_tgt_file, output_result_file,rextract_ref)

Result File(comparison_results.xlsx):

Github:

https://github.com/vipinputhanveetil/data-reconcil-util

How to Use:

Install Required Libraries:

bash
```
 pip install pandas openpyxl
```
Prepare Your Files:
- Ensure your input files (File1 and File2) are in either .csv or .xlsx format and place them in the same directory as your script.
Run the Script:
- Update the file1, file2, and file3 variables with your actual file names.
- Execute the script to generate the comparison result in File3.

This utility is designed to make data reconciliation more efficient and error-free, saving you valuable time and effort. If you have any questions or need further customization, feel free to reach out!

Utility for Data Reconciliation: Comparing Table Extracts and Files