Optimizing Data Integrity and Availability using SQL

Table of contents

Introduction
On a weekly basis, my team was required to determine the cell availability for all sites – we were monitoring sites located in 4 regions of the country, and we had 10000 sites that were on air. Data was stored on an hourly basis and so for each site for every hour, we had to collate the data, analyze it, and append our findings for each region to the final report. This report was generated every week.
The Challenge
The node collating the data we needed had an issue. If an outage lasted for an hour, the site's data for that hour won’t be recorded and if it went on for more than 24 hours, the site won't be listed for that day. This occurred at random, there was no apparent order to the madness! To resolve this issue, we had to manually check and update every missing site information and hourly record. It was a very tedious task that significantly delayed the delivery of the overall report.
The Solution
This was broken down into two phases:
Data integrity issue.
Data Aggregation.
For the integrity issue, I had to do 3 things:
Confirm that the total number of sites recorded matched the total sites for that region.
Check that each site had records to account for 24 hrs of activity or inactivity.
In the above situations, update where necessary.
The first thing I had to do was load the data to an SQL database.
// Connection to database
$host = 'localhost';
$user = 'root';
$password = '';
$db = 'availability';
$cons = mysqli_connect($host, $user,$password,$db) or die(mysql_error());
// Table
$table = 'cell';
// Data
$file = $_POST['csv'];
// This query reads rows from the CSV file into the database.
mysqli_query($cons, '
LOAD DATA LOCAL INFILE "'.$file.'"
INTO TABLE '.$table.'
FIELDS TERMINATED by \',\'
LINES TERMINATED BY \'\n\'
')or die(mysql_error());
The next step was to determine if the total number of unique sites from the uploaded data matched the sites on the site information database for that region. Since the data uploaded contained multiple event dates, the missing sites would be updated for all the specified event dates.
// Table containing site records
$region = $_POST['region'];
// The count of updates
$numberOfSitesUpdated = 0;
// Loop through the site records to check for every site.
$qrySiteRecords = mysqli_query($cons, "SELECT * FROM $region");
While ($recordsResult = mysqli_fetch_assoc($qrySiteRecords)){
$siteId = $recordsResult ['id'];
// Query the uploaded data for the current site id.
$qryData = mysqli_query($cons, "SELECT * FROM cell WHERE ne = '$siteId'");
if (mysqli_num_rows($qryData) > 1){
// If the site id is found in uploaded data, do nothing
}else {
// Query the data for unique dates.
$qry_add = mysqli_query($cons, "SELECT DISTINCT(eventDate) FROM cell");
$result_add = mysqli_fetch_assoc($qry_add);
$event_add = $result_add['event'];
// Update data's site records for each unique date - a single row for each date.
mysqli_query($cons, "INSERT INTO
cell (event,time,ne,enodeb_dur,duration,percentage)
values ('$event_add', '0:00:00', '$siteId', '3600', '0', '0')");
// Update number of updates.
$numberOfSitesUpdated += 1;
}
}
// Get the new total number of rows
$newCount = mysqli_query($cons, "select count(*) AS total from cell");
$newCountResult = mysqli_fetch_assoc($newCount);
$newCountTotal = $newCountResult['total'];
$alert = "<span class='text-success al'>
<b>$newTotal</b> rows have been added to the table.
<b>$add_count</b> missing site information updated.</span> ";
After updating the missing sites on the uploaded data for the different event dates, we still had the event hours - remember the data was collated hourly.
// First, create an array of timestamps
$time_stamp = Array("0:00:00", "1:00:00", "2:00:00", "3:00:00", "4:00:00", "5:00:00", "6:00:00", "7:00:00", "8:00:00", "9:00:00", "10:00:00", "11:00:00", "12:00:00", "13:00:00", "14:00:00", "15:00:00", "16:00:00", "17:00:00", "18:00:00", "19:00:00", "20:00:00", "21:00:00", "22:00:00", "23:00:00");
// Query the data table for both the event date and site id
$eventDate = mysqli_query($conn, "SELECT DISTINCT(event) from cell");
$neName = mysqli_query($conn, "SELECT DISTINCT(ne) FROM cell");
$countRecordUpdates = 0;
// For every site on the different event dates, update the hourly records
while($recordDate = mysqli_fetch_assoc($eventDate)){
while($siteName = mysqli_fetch_assoc($neName)){
foreach ($time_stamp as $val){
$d = $recordDate['event'];
$s = $siteName['ne'];
// Check if the record for the particular timestamp is available
$checkTime = mysqli_query($conn, "SELECT * FROM cell WHERE time = '$val' AND ne = '$s' AND event ='$d'");
if (mysqli_num_rows($checkTime) > 0){
// If it exists, do nothing
}else {
mysqli_query($conn, "INSERT INTO
cell (event,time,ne,enodeb_dur,duration,percentage)
values('$d', '$val', '$s', '3600', '0', '0')");
}
$countRecordUpdates+= 1;
}
}
}
echo "<br>A total of <b>$countRecordUpdates</b> records have been updated.<br><br>";
Now we aggregate the data. Determine the average availability for the individual days.
$eventDateTwo = mysqli_query($conn, "SELECT DISTINCT(event) from cell");
while($result = mysqli_fetch_assoc($eventDateTwo)){
$qry_date = $result['event'];
$query = mysqli_query($conn, "SELECT AVG(percentage) FROM cell WHERE event = '$qry_date'");
$qry_result = mysqli_fetch_assoc($query);
echo $qry_date. " average: ".round($qry_result['AVG(percentage)'],6). "<br />";
}
$echo = "<script type='text/javascript'>$('.process').attr('disabled',false);</script>";
$link = " <a href='index.php'>CAR Homepage</a>";
$download = "<a href='index.php' id='dwn_link'>Download</a>";
Conclusion
A fairly simple project that yielded massive results.
Subscribe to my newsletter
Read articles from Osahon Ohenhen directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by

Osahon Ohenhen
Osahon Ohenhen
I build things.