Outlier Detection in Osun State Presidential Election 2023 Data Using Geospatial Analysis
Introduction
This write-up describes the methodology used in a geospatial analysis aimed at identifying potential voting irregularities in the Osun State results of the 2023 presidential election. The analysis uses election data to flag outlier polling units, meaning units whose voting outcomes diverge notably from those of neighboring units, and thereby addresses concerns about election integrity and transparency.
Data Preparation
The analysis combined two source files: data_1.csv, which contains election metrics such as voter counts and party-specific results, and PolUni.csv, which provides the geographical coordinates (latitude and longitude) of each polling unit. Both files were loaded into pandas DataFrames for initial inspection and then merged on their common identifiers (Ward, LGA, and State) so that the electoral and geographical data line up.
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt
from geopy.distance import distance
from scipy.spatial import distance_matrix
# Load the two CSV files into DataFrames
df1 = pd.read_csv(r'C:\Users\Muhammad\Desktop\HNG\Stage One\data_1.csv', encoding='latin1')
df2 = pd.read_csv(r'C:\Users\Muhammad\Desktop\HNG\Stage One\PolUni.csv')
# Print the column names to debug
print("Columns in df1:", df1.columns)
print("Columns in df2:", df2.columns)
# Print the first few rows of df2 to inspect the actual data
print("First few rows of df2:")
print(df2.head())
# Merge based on adjusted column names
merged_df = pd.merge(df1, df2, left_on=['Ward', 'LGA', 'State'], right_on=['ward_name', 'local_government_name', 'state_name'], how='inner')
print("Merged DataFrame:", merged_df.head())
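Because the merge joins on text keys, differences in case or stray whitespace can silently drop rows from an inner join. The sketch below shows one way to check for that, using small made-up frames in place of df1 and df2; the helper `normalise_keys` and all sample values are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd

# Tiny stand-ins for df1 (results) and df2 (coordinates); values are illustrative only
results = pd.DataFrame({
    "State": ["OSUN", "OSUN", "OSUN"],
    "LGA": ["Ede North", "Ede North", "Ife East"],
    "Ward": ["Ward A ", "ward b", "Ward C"],
    "APC": [120, 45, 80],
})
coords = pd.DataFrame({
    "state_name": ["Osun", "Osun"],
    "local_government_name": ["EDE NORTH", "Ede North"],
    "ward_name": ["Ward A", "Ward B"],
    "location.latitude": [7.73, 7.74],
    "location.longitude": [4.43, 4.44],
})


def normalise_keys(df, columns):
    """Return a copy with the key columns trimmed and upper-cased (hypothetical helper)."""
    out = df.copy()
    for col in columns:
        out[col] = out[col].astype(str).str.strip().str.upper()
    return out


left_keys = ["Ward", "LGA", "State"]
right_keys = ["ward_name", "local_government_name", "state_name"]

# A left join with indicator=True keeps every result row and records
# in the `_merge` column whether a coordinate row was found for it
checked = pd.merge(
    normalise_keys(results, left_keys),
    normalise_keys(coords, right_keys),
    left_on=left_keys,
    right_on=right_keys,
    how="left",
    indicator=True,
)
unmatched = checked[checked["_merge"] == "left_only"]
print(f"{len(unmatched)} of {len(checked)} result rows have no coordinates")
```

Comparing the row count before and after the merge is also worthwhile: if both files hold several rows per ward, a join on Ward, LGA, and State pairs every combination within that ward.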
Analysis Methodology
1. Geospatial Analysis
Voting behavior was compared across polling units by computing a pairwise distance matrix from their geographical coordinates. The matrix gives the proximity between every pair of polling units, and any unit within a 1.0 kilometer radius of a given unit is treated as its neighbor.
# Extracting Latitude and Longitude columns
lat_lon = merged_df[['location.latitude', 'location.longitude']].values
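# Calculate the pairwise distance matrix between all polling units.
# Note: with raw latitude/longitude input, distance_matrix returns Euclidean
# distances in degrees, so convert to kilometres before applying a kilometre radius.
dist_matrix = distance_matrix(lat_lon, lat_lon)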
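A radius of 1.0 km is only meaningful if the distance matrix is expressed in kilometres. One option is a haversine (great-circle) distance matrix, sketched below with NumPy. This is an illustration under stated assumptions rather than the method used in the analysis: `haversine_matrix`, `EARTH_RADIUS_KM`, and the sample coordinates are hypothetical names and values.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_matrix(lat_lon_deg):
    """Pairwise great-circle distances in km for an (n, 2) array of [lat, lon] in degrees."""
    pts = np.radians(np.asarray(lat_lon_deg, dtype=float))
    lat = pts[:, 0][:, None]  # column vector of latitudes (radians)
    lon = pts[:, 1][:, None]  # column vector of longitudes (radians)
    dlat = lat - lat.T        # broadcasting gives every pairwise difference
    dlon = lon - lon.T
    a = np.sin(dlat / 2) ** 2 + np.cos(lat) * np.cos(lat.T) * np.sin(dlon / 2) ** 2
    # Clip guards against tiny floating-point overshoots outside [0, 1]
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


# Illustrative coordinates only
sample = np.array([
    [7.7700, 4.5600],
    [7.7750, 4.5600],  # roughly 0.56 km north of the first point
    [7.9000, 4.5600],  # roughly 14.5 km north of the first point
])
dist_km = haversine_matrix(sample)

# Neighbours are units within 1.0 km, excluding each unit itself
neighbour_mask = (dist_km <= 1.0) & ~np.eye(len(sample), dtype=bool)
```

The full matrix needs memory proportional to the square of the number of polling units, so for very large inputs a spatial index may be a better fit.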
2. Outlier Detection
Outliers were identified based on voting patterns for the major political parties: All Progressives Congress (APC), Labour Party (LP), Peoples Democratic Party (PDP), and New Nigeria Peoples Party (NNPP). For each polling unit, an outlier score was calculated per party by comparing that party's vote count at the unit to the mean vote count of its neighboring polling units within the specified radius.
# Define the radius for neighbours (in kilometres)
radius_km = 1.0
# Create a list to store the outlier scores
results = []
# Iterate over each polling unit to calculate its outlier scores
for index, row in merged_df.iterrows():
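The body of the scoring loop is not reproduced here. As a rough sketch of the comparison described in this section, the function below scores each unit by the absolute gap between its vote count and the mean vote count of its neighbors. The absolute-difference definition, the name `neighbour_outlier_scores`, and the sample numbers are all assumptions; the original analysis may define the score differently.

```python
import numpy as np


def neighbour_outlier_scores(votes, dist_km, radius_km=1.0):
    """Absolute gap between each unit's votes and the mean of its neighbours' votes.

    Units with no neighbour inside the radius get NaN.
    """
    votes = np.asarray(votes, dtype=float)
    dist_km = np.asarray(dist_km, dtype=float)
    # A unit is never its own neighbour, so mask out the diagonal
    neighbours = (dist_km <= radius_km) & ~np.eye(len(votes), dtype=bool)
    counts = neighbours.sum(axis=1)
    sums = neighbours.astype(float) @ votes
    means = np.full(len(votes), np.nan)
    has_neighbours = counts > 0
    means[has_neighbours] = sums[has_neighbours] / counts[has_neighbours]
    return np.abs(votes - means)


# Illustrative distances (km) and vote counts for four polling units
dist_km = np.array([
    [0.0, 0.4, 0.7, 9.0],
    [0.4, 0.0, 0.5, 9.2],
    [0.7, 0.5, 0.0, 9.5],
    [9.0, 9.2, 9.5, 0.0],
])
apc_votes = [50, 60, 200, 30]
apc_outlier = neighbour_outlier_scores(apc_votes, dist_km)
```

In this toy input the third unit scores highest because its 200 votes sit far above the mean of 55 among its two neighbors, while the isolated fourth unit gets no score at all.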
3. Top Outlier Analysis
The analysis then identified the top three outliers for each party. Outlier scores were sorted in descending order, and the three polling units with the highest score for each party were selected. These units may point to voting irregularities or to exceptionally strong support for a specific party within Osun State.
# Sorting the dataset by the outlier scores for each party
sorted_apc = outlier_scores.sort_values(by='APC_outlier', ascending=False).head(3)
sorted_lp = outlier_scores.sort_values(by='LP_outlier', ascending=False).head(3)
sorted_pdp = outlier_scores.sort_values(by='PDP_outlier', ascending=False).head(3)
sorted_nnpp = outlier_scores.sort_values(by='NNPP_outlier', ascending=False).head(3)
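The `outlier_scores` DataFrame used above is presumably built from the `results` list, although that step is not shown. The sketch below assumes each entry in `results` is a dictionary with one score column per party, and uses `nlargest` as a compact alternative to sorting and taking the head. The `polling_unit` column, the helper `top_outliers`, and the sample rows are hypothetical.

```python
import pandas as pd

# Illustrative rows standing in for the `results` list built in the scoring loop
results = [
    {"polling_unit": "Unit A", "APC_outlier": 145.0, "LP_outlier": 3.0},
    {"polling_unit": "Unit B", "APC_outlier": 80.0, "LP_outlier": 40.0},
    {"polling_unit": "Unit C", "APC_outlier": 65.0, "LP_outlier": float("nan")},
    {"polling_unit": "Unit D", "APC_outlier": float("nan"), "LP_outlier": 12.0},
    {"polling_unit": "Unit E", "APC_outlier": 10.0, "LP_outlier": 25.0},
]
outlier_scores = pd.DataFrame(results)


def top_outliers(scores, party, n=3):
    """Return the n rows with the highest `<party>_outlier` score, skipping missing scores."""
    column = f"{party}_outlier"
    return scores.dropna(subset=[column]).nlargest(n, column)


# One small table of top outliers per party
top_by_party = {party: top_outliers(outlier_scores, party) for party in ["APC", "LP"]}
for party, table in top_by_party.items():
    print(party, list(table["polling_unit"]))
```

Dropping missing scores first keeps units without neighbors from appearing in the ranking.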
Key Findings
The analysis revealed notable outliers across different parties:
APC and NNPP: Identified outliers indicate polling units with significantly higher votes for both APC and NNPP compared to their neighbors, suggesting areas of strong support or unusual voting patterns.
LP and PDP: Similar outliers were identified for these parties, but to a much lesser extent than for APC and NNPP, highlighting localized voting dynamics that deviate from the norm.
Conclusion and Recommendations
Understanding outlier voting patterns is crucial for election monitoring and policy-making. The identified outliers can inform electoral strategies, resource allocation, and potential investigations into irregularities. The findings underscore the importance of geospatial analysis in elucidating nuanced electoral behaviors and ensuring transparency in democratic processes.
Written by
Mohammed Taliat