Outlier Detection in Osun State Presidential Election 2023 Data Using Geospatial Analysis

Mohammed Taliat

Introduction

This write-up describes the methodology used for a geospatial analysis aimed at identifying potential voting irregularities in the 2023 presidential election in Osun State. The analysis uses election data to flag outlier polling units whose results diverge markedly from those of neighboring units, as a way of addressing concerns about election integrity and transparency.

Data Preparation

The analysis combined two datasets: data_1.csv, which contains election metrics such as voter counts and party-specific results, and PolUni.csv, which provides the geographical coordinates (latitude and longitude) of each polling unit. Both files were loaded into pandas DataFrames for initial inspection, then merged on their common identifiers (Ward, LGA, and State) to align the electoral and geographical data.

import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from geopy.distance import distance
from scipy.spatial import distance_matrix

# Load the two CSV files into DataFrames
df1 = pd.read_csv(r'C:\Users\Muhammad\Desktop\HNG\Stage One\data_1.csv', encoding='latin1')
df2 = pd.read_csv(r'C:\Users\Muhammad\Desktop\HNG\Stage One\PolUni.csv')

# Print the column names to check which columns are available for merging
print("Columns in df1:", df1.columns)
print("Columns in df2:", df2.columns)

# Print the first few rows of df2 to inspect the actual data
print("First few rows of df2:")
print(df2.head())

# Merge on the ward, LGA and state columns, which are named differently in each file
merged_df = pd.merge(df1, df2, left_on=['Ward', 'LGA', 'State'], right_on=['ward_name', 'local_government_name', 'state_name'], how='inner')
print("Merged DataFrame:", merged_df.head())
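
One thing worth checking at this point: Ward, LGA, and State identify a ward, not a single polling unit. If either file holds several rows per ward, an inner merge on these keys pairs every election row with every coordinate row in the same ward. The sketch below uses toy data (all ward names and values are invented, not taken from the real files) to show how the row count can be compared before and after merging.

```python
import pandas as pd

# Toy stand-ins for data_1.csv and PolUni.csv; all values are invented.
df1 = pd.DataFrame({
    "State": ["OSUN", "OSUN", "OSUN"],
    "LGA": ["EDE NORTH", "EDE NORTH", "EDE SOUTH"],
    "Ward": ["WARD A", "WARD A", "WARD B"],
    "APC": [120, 45, 60],
})
df2 = pd.DataFrame({
    "state_name": ["OSUN", "OSUN", "OSUN"],
    "local_government_name": ["EDE NORTH", "EDE NORTH", "EDE SOUTH"],
    "ward_name": ["WARD A", "WARD A", "WARD B"],
    "location.latitude": [7.731, 7.735, 7.702],
    "location.longitude": [4.436, 4.440, 4.455],
})

left_keys = ["Ward", "LGA", "State"]
right_keys = ["ward_name", "local_government_name", "state_name"]

merged = pd.merge(df1, df2, left_on=left_keys, right_on=right_keys, how="inner")

# WARD A has 2 rows in each table, so it contributes 2 x 2 = 4 merged rows.
rows_before = len(df1)
rows_after = len(merged)
print(f"rows before merge: {rows_before}, rows after merge: {rows_after}")

# Keys that occur more than once on the right side are the ones that multiply rows.
repeated_keys = df2[df2.duplicated(subset=right_keys, keep=False)]
print(repeated_keys[right_keys].drop_duplicates())
```

If the row count grows after the merge, adding a finer key (for example a polling-unit name or code, where both files carry one) would keep one row per polling unit.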

Analysis Methodology

1. Geospatial Analysis

A pairwise distance matrix was computed from the geographical coordinates of the polling units to measure how close the units are to one another. Any two units within 1.0 kilometer of each other were treated as neighbors.

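# Extract the latitude and longitude columns (in degrees)
lat_lon = merged_df[['location.latitude', 'location.longitude']].values

# Convert degrees to approximate kilometres (flat-earth projection, Earth radius 6371 km)
# so that distances can be compared with a radius given in kilometres
lat_rad, lon_rad = np.radians(lat_lon[:, 0]), np.radians(lat_lon[:, 1])
xy_km = 6371.0 * np.column_stack([lat_rad, lon_rad * np.cos(lat_rad.mean())])

# Calculate the distance matrix (in km) between all polling units
dist_matrix = distance_matrix(xy_km, xy_km)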
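
Since the neighbor radius is expressed in kilometers, the distances need to be in kilometers too, whereas latitude and longitude are in degrees. A minimal sketch of a great-circle (haversine) distance matrix in km, using made-up coordinates and an assumed mean Earth radius of 6371 km; the function name `haversine_matrix` is hypothetical:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_matrix(lat_deg, lon_deg):
    """Pairwise great-circle distances in km for points given in degrees."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # Clip guards against tiny floating-point excursions outside [0, 1].
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# Three made-up points: the second is 0.005 degrees north of the first,
# the third is 0.1 degrees north of the first.
lats = [7.7700, 7.7750, 7.8700]
lons = [4.5600, 4.5600, 4.5600]

dist_km = haversine_matrix(lats, lons)
neighbours = dist_km <= 1.0          # boolean mask for the 1.0 km radius
np.fill_diagonal(neighbours, False)  # a unit is not its own neighbour
print(np.round(dist_km, 3))
print(neighbours)
```

A full N x N matrix is simple but grows quadratically with the number of polling units, so for very large datasets a spatial index may be a better fit.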

2. Outlier Detection

Outliers were identified from the vote counts of the major political parties: All Progressives Congress (APC), Labour Party (LP), Peoples Democratic Party (PDP), and New Nigeria Peoples Party (NNPP). For each polling unit and each party, an outlier score was calculated by comparing the party's vote count at that unit with the mean vote count of the neighboring polling units within the specified radius.

# Define the radius for neighbours (in kilometres)
radius_km = 1.0

# Create a list to store the outlier scores
results = []

# Iterate over each polling unit to calculate its outlier scores
for index, row in merged_df.iterrows():
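
The loop body is not reproduced here. A minimal sketch of the scoring described above, assuming the score is the absolute difference between a unit's votes and the mean votes of its neighbours, and assuming party columns named `APC`, `LP`, `PDP` and `NNPP` (hypothetical names; the vote counts and the distance matrix are invented toy data):

```python
import numpy as np
import pandas as pd

parties = ["APC", "LP", "PDP", "NNPP"]  # assumed column names
radius_km = 1.0

# Toy data: four polling units with invented vote counts.
votes = pd.DataFrame({
    "APC": [100, 110, 400, 90],
    "LP": [20, 25, 22, 5],
    "PDP": [150, 140, 160, 30],
    "NNPP": [3, 2, 4, 0],
})

# Invented pairwise distances in km; units 0-2 are mutual neighbours,
# unit 3 is far from everything.
dist_km = np.array([
    [0.0, 0.4, 0.7, 9.0],
    [0.4, 0.0, 0.5, 9.2],
    [0.7, 0.5, 0.0, 8.8],
    [9.0, 9.2, 8.8, 0.0],
])

results = []
for i in range(len(votes)):
    # Neighbours: every other unit within the radius.
    mask = dist_km[i] <= radius_km
    mask[i] = False
    record = {"unit": i, "n_neighbours": int(mask.sum())}
    for party in parties:
        if mask.any():
            neighbour_mean = votes.loc[mask, party].mean()
            record[f"{party}_outlier"] = abs(votes.loc[i, party] - neighbour_mean)
        else:
            # No neighbours in range: leave the score undefined.
            record[f"{party}_outlier"] = np.nan
    results.append(record)

outlier_scores = pd.DataFrame(results)
print(outlier_scores)
```

In this sketch an isolated unit gets no score instead of a score of zero, so it is not mistaken for a typical unit. Note also that one extreme unit raises the scores of its neighbours, because it pulls their neighbour mean upward.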

3. Top Outlier Analysis

The analysis then identified the top three outliers for each party. Outlier scores were sorted in descending order, and the three polling units with the highest scores for each party were selected. These units may reflect voting irregularities or exceptionally strong support for a specific party within Osun State.

# Sort the dataset by the outlier score for each party and keep the top three
sorted_apc = outlier_scores.sort_values(by='APC_outlier', ascending=False).head(3)
sorted_lp = outlier_scores.sort_values(by='LP_outlier', ascending=False).head(3)
sorted_pdp = outlier_scores.sort_values(by='PDP_outlier', ascending=False).head(3)
sorted_nnpp = outlier_scores.sort_values(by='NNPP_outlier', ascending=False).head(3)
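
The snippet above reads from `outlier_scores`, which is not defined in the code shown; one plausible reading is that it is the `results` list converted to a DataFrame. A short sketch under that assumption, with invented scores and a hypothetical `unit` column, using `nlargest` as an alternative to sorting and taking the head:

```python
import pandas as pd

# Invented per-unit scores standing in for the real `results` list.
results = [
    {"unit": "PU-1", "APC_outlier": 12.0, "LP_outlier": 3.0},
    {"unit": "PU-2", "APC_outlier": 295.0, "LP_outlier": 1.5},
    {"unit": "PU-3", "APC_outlier": 40.0, "LP_outlier": 18.0},
    {"unit": "PU-4", "APC_outlier": 88.0, "LP_outlier": 0.5},
]
outlier_scores = pd.DataFrame(results)

top_outliers = {}
for column in ["APC_outlier", "LP_outlier"]:
    # nlargest(3, column) returns the three rows with the highest scores,
    # ordered from highest to lowest.
    top_outliers[column] = outlier_scores.nlargest(3, column)[["unit", column]]

for column, table in top_outliers.items():
    print(column)
    print(table.to_string(index=False))
```

Looping over the score columns avoids repeating the same line once per party and makes it easy to add or drop a party later.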

Key Findings

The analysis revealed notable outliers across different parties:

  • APC and NNPP: The flagged polling units recorded markedly higher vote counts for APC and NNPP than their neighbors did, suggesting either strong local support or unusual voting patterns.

  • LP and PDP: Outliers were also identified for these parties, though to a much lesser extent than for APC and NNPP, pointing to localized voting dynamics that deviate from the norm.

Conclusion and Recommendations

Understanding outlier voting patterns is crucial for election monitoring and policy-making. The identified outliers can inform electoral strategies, resource allocation, and potential investigations into irregularities. The findings underscore the value of geospatial analysis for revealing local variation in voting behavior and for supporting transparency in democratic processes.

