Automated CSV Data Processing and Uploading with CodeIgniter

Arif Yulianto

Introduction

In this article, we will discuss how a CodeIgniter-based application can be used to read, process, and upload data from CSV files to a database automatically. This process includes directory exploration, CSV data processing, and updating the database with new entries.

Code Structure

This code resides in a controller named Nameclass, whose primary job is to scan a directory for files, process them, and store the results in a database.

1. Initialization and Configuration

public function __construct(){
    parent::__construct();
    $this->load->helper('directory');   // helper for file and folder operations
    $this->load->library('ftp');        // FTP libraries (only needed when files come from an FTP server)
    $this->load->library('get_ftp');
    // Second database connection used as the main data store
    $this->db2 = $this->load->database('<database>', TRUE);
}

The above code:

  • Loads the directory helper for file and folder manipulation.

  • Loads the ftp and get_ftp libraries for FTP connection (if needed).

  • Establishes a connection to a second database ($this->db2), which is used as the main data store (a sample connection group is sketched below).
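
The '<database>' group name is a placeholder; it has to match a connection group defined in application/config/database.php. A minimal sketch of such a group (the hostname, credentials, and database name below are illustrative assumptions, not values from the original project):

// application/config/database.php
$db['<database>'] = array(
    'dsn'      => '',
    'hostname' => 'localhost',      // assumed host
    'username' => 'db_user',        // assumed credentials
    'password' => 'db_password',
    'database' => 'warehouse_db',   // assumed schema name
    'dbdriver' => 'mysqli',
    'char_set' => 'utf8',
    'dbcollat' => 'utf8_general_ci',
    'db_debug' => (ENVIRONMENT !== 'production')
);

Passing TRUE as the second argument of $this->load->database('<database>', TRUE) makes CodeIgniter return the connection as an object instead of attaching it to $this->db, which is why it can be stored in $this->db2.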

2. Scanning the Directory for CSV Files

$root = '/directory/folder' . DIRECTORY_SEPARATOR; // trailing separator so $dir.$file below forms a valid path
$files = array('files'=>array(), 'dirs'=>array());
$directories[] = $root;

while (sizeof($directories)) {
    $dir  = array_pop($directories);
    if ($handle = opendir($dir)) {
        while (false !== ($file = readdir($handle))) {
            if ($file == '.' || $file == '..') {
                continue;
            }
            $file  = $dir.$file;
            if (is_dir($file)) {
                array_push($directories, $file.DIRECTORY_SEPARATOR);
                $files['dirs'][]  = $file;
            } elseif (is_file($file)) {
                $files['files'][]  = $file;
            }
        }
        closedir($handle);
    }
}

This section of the script scans the /directory/folder directory, including its subfolders, to find files. The identified files are stored in the $files['files'] array for further processing.
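
Note that the loop above collects every file it finds, not only CSVs. If the folder can contain other file types, one small addition (not part of the original code) is to check the extension before queueing the file:

            } elseif (is_file($file)) {
                // Only queue files with a .csv extension (case-insensitive)
                if (strtolower(pathinfo($file, PATHINFO_EXTENSION)) == 'csv') {
                    $files['files'][] = $file;
                }
            }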

3. Processing and Uploading CSV Data

foreach ($files['files'] as $key => $value) {
    $check = $this->db->from($table_uploaded)->where('file_name',$value)->where('done', '1')->get()->num_rows();
    if ($check == 0) {
        $handle = fopen($value, "r");
        $n_data=0;
        $array_insert_data = array();
        while (($row = fgetcsv($handle, 0, '|')) !== FALSE) {
            $n_data++;
            // Skip the header row
            if ($n_data == 1) continue;
            $insert_data = array(
                'array0' => addslashes($row[0]),
                'array1' => addslashes($row[1]),
                'array2' => addslashes($row[2]),
                'array3' => addslashes($row[3]),
                'array4' => addslashes($row[4])
            );
            $check_dwh_key = $this->db2->from($table)->where('array0', addslashes($row[0]))->get()->num_rows();
            if ($check_dwh_key == 0) {
                array_push($array_insert_data,$insert_data);
                if(count($array_insert_data) == 1000){
                    $this->db2->insert_batch($table, $array_insert_data);
                    $array_insert_data = array();
                }
            } else {
                $this->db2->where('array0', addslashes($row[0]));
                $this->db2->update($table, $insert_data);
            }
        }
        // Insert any rows still left in the buffer, then close the file
        if(!empty($array_insert_data)){
            $this->db2->insert_batch($table, $array_insert_data);
        }
        fclose($handle);

Explanation

  • The CSV file is opened and read line by line using fgetcsv() with a | (pipe) delimiter; a sample of the expected file layout is shown after this list.

  • Data from the CSV is stored in the $insert_data array and added to $array_insert_data if it is not already in the database.

  • When the number of records reaches 1000, the data is inserted into the database using insert_batch() for efficiency.

  • If the data already exists in the database, it is updated instead.
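
A hypothetical example of such a pipe-delimited file (the header and values below are made up purely for illustration; only the structure matches what the code expects):

array0|array1|array2|array3|array4
KEY001|first value|second value|third value|fourth value
KEY002|first value|second value|third value|fourth value

The first column (array0) acts as the unique key: rows whose key is not yet in the target table are buffered for insert_batch(), while rows whose key already exists trigger an update instead.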

4. Logging Processed Files

$uploaded = [
    'file_name' => $value,
    'num_rows' => count(file($value, FILE_SKIP_EMPTY_LINES)),
    'done' => '1',
    'created_at' => date("Y-m-d H:i:s")
];
$this->db->insert($table_uploaded, $uploaded);

// Remove the source file so it is not picked up again on the next run
unlink($value);

After all data has been processed:

  • File information is logged in the $table_uploaded table (a possible schema for this table is sketched after this list).

  • The CSV file is deleted (unlink($value)) to prevent reprocessing.
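
The article does not show the structure of the tracking table. Based on the fields used above, one possible way to create it with CodeIgniter's dbforge library (the table name uploaded_files is an assumption; $table_uploaded would hold whatever name you choose):

$this->load->dbforge();
$this->dbforge->add_field(array(
    'id'         => array('type' => 'INT', 'unsigned' => TRUE, 'auto_increment' => TRUE),
    'file_name'  => array('type' => 'VARCHAR', 'constraint' => 255),
    'num_rows'   => array('type' => 'INT'),
    'done'       => array('type' => 'CHAR', 'constraint' => 1, 'default' => '0'),
    'created_at' => array('type' => 'DATETIME')
));
$this->dbforge->add_key('id', TRUE);                    // primary key
$this->dbforge->create_table('uploaded_files', TRUE);   // TRUE = IF NOT EXISTS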

Conclusion

This script automates directory scanning, CSV file reading, and database updating. With batch inserts and an existence check before each insert or update, it handles large datasets efficiently. This kind of implementation suits automated data pipelines such as ETL (Extract, Transform, Load) processes in an enterprise environment.
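
To run the whole process unattended, the controller method can be called from the command line (CodeIgniter 3 supports php index.php <controller> <method>) and scheduled with cron. A sketch, assuming the processing logic lives in Nameclass::index() and the paths below match your server (both are assumptions):

# crontab entry: run every night at 01:00, appending output to a log file
0 1 * * * /usr/bin/php /var/www/html/index.php nameclass index >> /var/log/csv_upload.log 2>&1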

Feel free to leave a comment and join the discussion on this blog!
