A Guide to Geospatial Analysis and Outlier Detection in Election Results

What is Geospatial Analysis?

Geospatial analysis is a method used to study data that has a geographic or spatial component. It involves analyzing information based on where things are located and how they relate to each other in space. This type of analysis uses maps and geographical data to understand patterns, relationships, and trends.

Geospatial analysis combines data from various sources such as satellite imagery, GPS data, census data, and social media geotags. It helps decision-makers visualize complex information and make informed choices based on spatial relationships and patterns.

Project Overview

The Independent National Electoral Commission has faced multiple legal challenges concerning the integrity and accuracy of the election results. Allegations of vote manipulation and irregularities have been widespread, prompting a thorough investigation into the matter.

The task is to help us (INEC) uncover potential voting irregularities and support the transparency of the election results. You will do this by identifying outlier polling units whose results deviate significantly from those of neighboring units, which may indicate undue influence or rigging.

1. Dataset Preparation

  • Loading the Dataset: We start by loading the election dataset (ONDO_crosschecked.csv in this case) which contains information about polling units, voter statistics, and election results.

  • Geocoding: Because the dataset does not include latitude and longitude for the polling units, we use a geocoding service, the OpenCage Geocoding API, to obtain these coordinates. Geocoding translates addresses into geographic coordinates, which are essential for spatial analysis.
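The geocoding step can be sketched as follows. This is a minimal sketch under stated assumptions, not the project's actual code: `geocode_all` and `opencage_lookup` are hypothetical helper names, the address strings are assumed to be built from the dataset's polling unit columns, and the OpenCage wiring assumes the `opencage` Python package is installed and that you supply your own API key. A small cache ensures each distinct address is looked up only once.

```python
from typing import Callable, Dict, Iterable, Optional, Tuple

Coordinates = Tuple[float, float]
Lookup = Callable[[str], Optional[Coordinates]]


def opencage_lookup(api_key: str) -> Lookup:
    """Build a lookup backed by the OpenCage Geocoding API (assumed setup)."""
    # Imported lazily so the rest of the sketch runs without the package.
    from opencage.geocoder import OpenCageGeocode

    geocoder = OpenCageGeocode(api_key)

    def lookup(address: str) -> Optional[Coordinates]:
        results = geocoder.geocode(address)
        if not results:
            return None  # the address could not be resolved
        geometry = results[0]["geometry"]
        return geometry["lat"], geometry["lng"]

    return lookup


def geocode_all(
    addresses: Iterable[str], lookup: Lookup
) -> Dict[str, Optional[Coordinates]]:
    """Geocode each distinct address once; return address -> (lat, lng)."""
    cache: Dict[str, Optional[Coordinates]] = {}
    for address in addresses:
        if address not in cache:
            cache[address] = lookup(address)
    return cache
```

A call might look like `coords = geocode_all(addresses, opencage_lookup("YOUR_API_KEY"))`. Addresses that map to `None` were not resolved and would need manual review before any distance calculation.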

2. Neighbor Identification

  • Identifying Neighbors: Polling units that are geographically close to each other (within a specified radius, e.g., 1 km) are considered neighbors. This step involves calculating distances between polling units to determine which ones are neighbors.
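One way to implement the neighbor search is sketched below. It assumes great-circle (haversine) distance on a spherical Earth and a simple pairwise comparison; the function names `haversine_km` and `find_neighbors` are illustrative and not taken from the project. The pairwise loop is quadratic in the number of polling units, which is acceptable for a single state but would call for a spatial index on larger datasets.

```python
import math
from typing import Dict, List, Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (
        math.sin(dlat / 2) ** 2
        + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    )
    # min() guards against rounding pushing h slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, h)))


def find_neighbors(
    points: Sequence[Tuple[float, float]], radius_km: float = 1.0
) -> Dict[int, List[int]]:
    """Map each point index to the indices of points within radius_km."""
    neighbors: Dict[int, List[int]] = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if haversine_km(points[i], points[j]) <= radius_km:
                neighbors[i].append(j)
                neighbors[j].append(i)
    return neighbors
```

With a DataFrame, the input could be built as `list(zip(df['latitude'], df['longitude']))`, so that the returned indices line up with row positions.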

3. Outlier Score Calculation

  • Calculating Outlier Scores: For each polling unit, we calculate outlier scores for each political party (e.g., APC, PDP) based on their vote counts. The outlier score measures how much a polling unit's vote count deviates from the average of its neighboring units. A higher outlier score indicates a larger deviation, potentially indicating irregularities.
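The article does not spell out the exact formula, so the sketch below makes an assumption: the score for one party at one polling unit is the absolute difference between that unit's vote count and the mean vote count of its neighbors. The function name `outlier_scores` and the neighbor mapping (index to list of neighbor indices) are illustrative. Units with no neighbors get `None`, since a deviation from an empty neighborhood is undefined.

```python
from statistics import mean
from typing import Dict, List, Optional, Sequence


def outlier_scores(
    votes: Sequence[float], neighbors: Dict[int, List[int]]
) -> List[Optional[float]]:
    """Absolute deviation of each unit's votes from its neighbors' mean."""
    scores: List[Optional[float]] = []
    for i, count in enumerate(votes):
        nearby = neighbors.get(i, [])
        if not nearby:
            scores.append(None)  # no neighbors within the radius
            continue
        neighbor_mean = mean(votes[j] for j in nearby)
        scores.append(abs(count - neighbor_mean))
    return scores
```

Running this once per party column (for example APC, then PDP) would yield one score column per party. Other definitions, such as a z-score against the neighborhood's standard deviation, are equally plausible and would change the ranking.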

4. Sorting and Reporting

  • Sorting: After calculating outlier scores, we sort the dataset to identify polling units with the highest outlier scores. This helps prioritize units that may require further investigation.

      # Sort the dataset by outlier scores
      sorted_df = df.sort_values(by='outlier_scores', ascending=False)
    
      # Save the sorted dataset to an Excel file
      sorted_df.to_excel('sorted_outlier_scores.xlsx', index=False)
    

5. Visualization

  • Visualizing Results: Visualization with maps and charts makes the findings easier to understand and present. Maps show the geographic distribution of outlier polling units, while charts illustrate voting patterns and deviations.

import folium

# Create a base map centered on the mean coordinates of all polling units
map_center = [df['latitude'].mean(), df['longitude'].mean()]
election_map = folium.Map(location=map_center, zoom_start=10)

# Add polling units to the map
for i, row in df.iterrows():
    folium.CircleMarker(
        location=(row['latitude'], row['longitude']),
        radius=5,
        popup=(
            f"PU Name: {row['PU-Name']}<br>"
            f"APC: {row['APC']}<br>LP: {row['LP']}<br>"
            f"PDP: {row['PDP']}<br>NNPP: {row['NNPP']}<br>"
            f"APC Outlier Score: {row['APC_outlier_score']}<br>"
            f"LP Outlier Score: {row['LP_outlier_score']}<br>"
            f"PDP Outlier Score: {row['PDP_outlier_score']}<br>"
            f"NNPP Outlier Score: {row['NNPP_outlier_score']}"
        ),
        color='blue' if row['APC_outlier_score'] > row[['LP_outlier_score', 'PDP_outlier_score', 'NNPP_outlier_score']].max() else
              'red' if row['LP_outlier_score'] > row[['APC_outlier_score', 'PDP_outlier_score', 'NNPP_outlier_score']].max() else
              'green' if row['PDP_outlier_score'] > row[['APC_outlier_score', 'LP_outlier_score', 'NNPP_outlier_score']].max() else
              'purple',
        fill=True,
        fill_opacity=0.6
    ).add_to(election_map)

# Save the map to an HTML file
election_map.save('election_map.html')
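The nested conditional that picks each marker's color can also be written as a small helper, which is easier to read and to extend with more parties. The sketch below is an illustrative alternative, not the author's code: `marker_color`, `PARTY_COLORS`, and `DEFAULT_COLOR` are hypothetical names. It mirrors the original rule (a party's color is used only when its score strictly exceeds every other party's score, otherwise purple) and assumes no score is missing.

```python
from typing import Mapping

# Colors used in the map code above; purple is the fallback.
PARTY_COLORS = {"APC": "blue", "LP": "red", "PDP": "green"}
DEFAULT_COLOR = "purple"


def marker_color(scores: Mapping[str, float]) -> str:
    """Return the color of the party whose score strictly beats all others."""
    for party, color in PARTY_COLORS.items():
        if party not in scores:
            continue
        others = [s for p, s in scores.items() if p != party]
        if all(scores[party] > s for s in others):
            return color
    return DEFAULT_COLOR
```

Inside the loop it could be called as `color=marker_color({p: row[f"{p}_outlier_score"] for p in ("APC", "LP", "PDP", "NNPP")})`.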

What the final dataset looks like

Summary of Findings

Sorted List of Polling Units by Outlier Scores

The table below shows the polling units with the highest outlier scores for each party, indicating significant deviations from their neighboring units:

Polling Unit Name                                         | Party | Outlier Score
----------------------------------------------------------|-------|--------------
ODE ELESHO/ODE OKELOKO, IN FRONT OF CH. ALAKELUS HOUSE    | APC   | 113.645161
ODEKE/AISA/ODE ASSI ALU, IN FRONT OF CHIEF ASSIS HOUSE    | APC   | 57.274194
ODEKE/AISA/ODE ASSI ALU OPEN SPACE NEAR CHIEF ASSIS HOUSE | APC   | 36.967742
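A ranking like this one can be derived from the per-party score columns. The sketch below is one possible approach, not the project's code: `top_outliers` is a hypothetical helper, and it assumes each record carries a `PU-Name` field and `<PARTY>_outlier_score` fields, as in the map code earlier. It flattens the scores into (unit, party, score) rows and keeps the largest `n` across all parties.

```python
import heapq
from typing import Any, List, Mapping, Sequence, Tuple


def top_outliers(
    records: Sequence[Mapping[str, Any]], parties: Sequence[str], n: int = 3
) -> List[Tuple[str, str, float]]:
    """Return the n highest (unit name, party, score) rows, largest first."""
    rows = [
        (rec["PU-Name"], party, rec[f"{party}_outlier_score"])
        for rec in records
        for party in parties
        # Skip units whose score is undefined (e.g. no neighbors).
        if rec.get(f"{party}_outlier_score") is not None
    ]
    return heapq.nlargest(n, rows, key=lambda row: row[2])
```

With pandas, the records could come from `df.to_dict('records')`.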

Examples of Top 3 Outliers

Polling Unit: ODE ELESHO/ODE OKELOKO, IN FRONT OF CH. ALAKELUS HOUSE

Outlier Score: 113.645161 (APC)

Neighboring Units: [List of neighboring units with respective vote counts]

This polling unit showed a significantly higher number of votes for APC compared to its neighboring units, indicating a possible irregularity.

Polling Unit: ODEKE/AISA/ODE ASSI ALU, IN FRONT OF CHIEF ASSIS HOUSE

Outlier Score: 57.274194 (APC)

Neighboring Units: [List of neighboring units with respective vote counts]

The votes for APC at this polling unit were much higher than those at neighboring units, suggesting potential voting manipulation.

Polling Unit: ODEKE/AISA/ODE ASSI ALU OPEN SPACE NEAR CHIEF ASSIS HOUSE

Outlier Score: 36.967742 (APC)

Neighboring Units: [List of neighboring units with respective vote counts]

This polling unit also displayed a significant deviation in APC votes, warranting further investigation.

Key Insights

Significant Deviations: Certain polling units recorded much higher vote counts for a specific party (APC in all three top cases) than their neighboring units, indicating potential irregularities.

Geospatial Analysis Utility: Combining geographic data with proximity analysis made it possible to flag anomalies in voting patterns that would be hard to spot in the raw results.

Further Investigation: The identified outliers should be investigated further to determine the cause of these deviations and ensure the integrity of the election results.

By applying these geospatial techniques, election authorities can better ensure transparency and trust in the electoral process.

Find dataset and full project here

Written by

Oluwatomisin Bamidele