Enhancing a Scalable Export in Ruby on Rails: Lessons Learned


In my last blog post, I walked you through the implementation of a scalable export system in a Ruby on Rails application. Since then, I’ve encountered a few challenges, particularly around concurrency, file locking, and ensuring that data exports are reliably processed when multiple jobs need to write to the same ZIP file. In this post, I'll dive deeper into those challenges and share the solutions I found to make the export system even more robust.

The Challenge: Concurrent Jobs and ZIP Files

When you're building a feature to export user data, especially in large applications, handling concurrency is key. Each user might have various types of data to export, such as notes, files, or other content, and all of this needs to be bundled into a single ZIP file.

In my case, multiple background jobs (ExportUserUploadedContentJob and ExportUserNotesJob) were responsible for fetching the data and adding it to the same ZIP file. Initially, everything seemed straightforward—each job would write its content to the ZIP file, and a cleanup job would handle the rest. However, this caused problems when jobs tried to access the ZIP file simultaneously, leading to incomplete data in the final file. After debugging, I realized the following issues:

  • Race conditions: Multiple jobs trying to write to the same ZIP file concurrently.

  • Incomplete writes: Some jobs didn’t finish writing before another job tried to access the file, resulting in missing content.

  • File not properly closed: The ZIP file wasn’t always flushed and saved to disk after every write.

To address these issues, I implemented a series of improvements, starting with how we manage exports in the database.


Managing Exports: Migration and Model

Creating an Export model lets me monitor the status of several export jobs, coordinate job completion, and capture statistics like file size and elapsed time, making it easy to track the progress of every export.

Migration for the exports Table

class CreateExports < ActiveRecord::Migration[7.0]
  def change
    create_table :exports do |t|
      t.references :user, null: false, foreign_key: true
      t.string :prefix, null: false
      t.integer :jobs_completed, default: 0
      t.integer :total_jobs, default: 2  # Adjust for the number of jobs you plan to run
      t.datetime :started_at
      t.datetime :completed_at
      t.integer :file_size, default: 0
      t.timestamps
    end
  end
end

Export Model

The Export model tracks the start and completion of jobs, how many jobs have completed, and other metrics like the size of the final ZIP file. This allows us to track the state of an export, making it easy to determine when all jobs are finished and trigger the final cleanup and upload process.

Here’s the key structure for managing export state:

class Export < ApplicationRecord
  belongs_to :user

  # Mark the start of the export
  def mark_started
    update!(started_at: Time.current)
  end

  # Mark the completion and calculate elapsed time
  def mark_completed(zip_file_path)
    update!(
      completed_at: Time.current,
      file_size: File.size(zip_file_path)  # Get file size in bytes
    )
  end

  # Increment job completion and check if all jobs are finished.
  # The row lock prevents two jobs that finish at the same moment from
  # each reading a stale count, which could leave the export stuck.
  def mark_job_complete
    with_lock do
      increment!(:jobs_completed)
      jobs_completed == total_jobs
    end
  end

  # Calculate the total time taken for the export
  def time_elapsed
    return nil unless completed_at && started_at
    completed_at - started_at
  end
end
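
To make the lifecycle concrete, here's roughly how these methods fit together from start to finish (a console-style sketch; user and zip_file_path are placeholders):

export = Export.create!(user: user, prefix: SecureRandom.hex(8), total_jobs: 2)
export.mark_started

# Each background job calls this as it finishes; it returns true
# only when the last job checks in.
if export.mark_job_complete
  export.mark_completed(zip_file_path)
  export.time_elapsed  # => elapsed time in seconds (a Float)
end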

Changes to the ExportsJob: Job Coordination

The ExportsJob is responsible for coordinating the different export jobs that will append data to the same ZIP file. Before launching the jobs, we make sure the ZIP file is created and ready for use.

require "zip"

class ExportsJob < ApplicationJob
  queue_as :default

  def perform(user_id)
    prefix = SecureRandom.hex(8)
    zip_file_path = Rails.root.join("tmp", "#{prefix}_export.zip").to_s

    # Create an empty ZIP file before jobs start
    Zip::File.open(zip_file_path, Zip::File::CREATE) {}

    # Create a new export tracking record and mark the start
    export = Export.create!(user_id: user_id, prefix: prefix, total_jobs: 2)
    export.mark_started

    # Launch export jobs concurrently
    ExportUserUploadedContentJob.perform_later(user_id, zip_file_path, export.id)
    ExportUserNotesJob.perform_later(user_id, zip_file_path, export.id)
  end
end

  • Create the ZIP file: Ensures that an empty ZIP file is created before any background job tries to write to it. This prevents errors where the file doesn’t exist when jobs attempt to append data.

  • Job coordination: Each job is launched concurrently, but they will use file locks to ensure they don’t interfere with each other.


File Locking and Proper Commit Handling

File Locking for Safe Concurrent Access

To solve these issues, I implemented a file locking mechanism using Ruby’s flock method. This ensures that only one job can write to the ZIP file at a time. Because the lock is requested in non-blocking mode, a job that can’t acquire it finds out immediately and reschedules itself rather than sitting idle.

Here’s how the file locking looks in practice:

File.open(zip_file_path, 'r+') do |file|
  if file.flock(File::LOCK_EX | File::LOCK_NB)
    # Perform ZIP operations...
    file.flock(File::LOCK_UN)  # Release the lock
  else
    # Retry if lock is unavailable
    retry_job(wait: 5.seconds)
  end
end

This guarantees exclusive access to the file during write operations and prevents race conditions.
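
One detail worth calling out: File::LOCK_NB is what makes the call non-blocking, so flock returns false immediately instead of waiting when another process holds the lock. If you'd rather have a caller queue up for its turn, dropping the flag gives you a blocking variant (a sketch, not what my jobs use):

File.open(zip_file_path, 'r+') do |file|
  file.flock(File::LOCK_EX)   # Blocks until the lock becomes available
  begin
    # Perform ZIP operations...
  ensure
    file.flock(File::LOCK_UN) # Always release, even if the write raises
  end
end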

Ensuring Proper Commits with Zip::File#commit

Even with the locking mechanism in place, I noticed that some data was still missing from the final ZIP file. The reason? The ZIP file wasn’t always being flushed to disk after each write. To resolve this, I added the zipfile.commit method at the end of each job to ensure all changes were saved:

Zip::File.open(zip_file_path, Zip::File::CREATE) do |zipfile|
  # Write content to the ZIP file...
  zipfile.commit  # Ensure the file is flushed and saved
end

This simple change ensures that each job commits its work before releasing the lock, preventing data loss.

Here’s what a full example of file locking and committing looks like in a job:

require "zip"
require "csv"

class ExportUserNotesJob < ApplicationJob
  queue_as :default

  def perform(user_id, zip_file_path, export_id)
    user = User.find(user_id)

    File.open(zip_file_path, "r+") do |file|
      Rails.logger.info "#{self.class.name} writing to #{zip_file_path}"
      if file.flock(File::LOCK_EX | File::LOCK_NB)
        Rails.logger.info "Lock acquired by #{self.class.name} for #{zip_file_path}"

        csv_content = CSV.generate(headers: true) do |csv|
          csv << [ "Title", "Body" ]
          user.notes.find_each do |note|
            csv << [ note.title, note.body.to_plain_text ]
          end
        end

        # The plain File handle gives us the lock; rubyzip gives us the
        # archive API. Open the archive only once the lock is held.
        Zip::File.open(zip_file_path) do |zipfile|
          zipfile.get_output_stream("user_notes.csv") do |f|
            f.write(csv_content)
          end
          zipfile.commit  # Flush the archive to disk before releasing the lock
        end

        file.flock(File::LOCK_UN)  # Unlock the file
        Rails.logger.info "Lock released by #{self.class.name} for #{zip_file_path}"
      else
        Rails.logger.info "Failed to acquire lock for #{self.class.name}, retrying..."
        retry_job(wait: 5.seconds)
        return
      end
    end

    # Update export progress
    export = Export.find(export_id)
    if export.mark_job_complete
      ExportCleanupJob.perform_later(user_id, zip_file_path, export.prefix)
    end
  end
end

In the job above, we gather all of a user's notes and generate a CSV for their requested export. We start by opening the ZIP file and locking it with a bitwise OR of File::LOCK_EX | File::LOCK_NB, which requests an exclusive lock without blocking. Once the lock is held, we generate the CSV content, write it into the archive, and make sure the changes are committed. With that done, we release the lock and let the job continue. If the lock can't be acquired, we log the failure and retry the job after 5 seconds.


Key Takeaways

  • File locking with flock is essential when multiple background jobs need to write to the same file concurrently.

  • Always commit changes when modifying a ZIP file in a concurrent environment to ensure data is saved properly.

  • Adding retry logic helps manage cases where a job cannot immediately acquire the lock, allowing the system to recover gracefully.

These small but powerful adjustments have made my export process more reliable and scalable.


Alternative Approaches

When coordinating multiple export jobs around a shared ZIP file, it's important to weigh the tradeoffs and consider alternative approaches depending on your application's needs, scale, and complexity. Here are some tradeoffs and alternatives worth exploring.

File Locking

My current approach locks the ZIP file before writing and releases the lock after it’s done. While this prevents race conditions, file locking comes with its own tradeoffs:

  • Pro: Simple and easy to implement. It provides an effective way to prevent concurrency issues when multiple jobs write to the same file.

  • Con: Lock contention can arise when many jobs are trying to access the same file at once. If several jobs are queued to write to the ZIP file, they must wait for the file lock to be released, which can slow down the process.

Alternatives

  • Database-Backed Job Queue Coordination: Instead of relying on file locks, you can store the results of each job in the database, then have a single process handle creating the ZIP file. This avoids locking issues but adds the overhead of managing results in the database (see the sketch after this list).

  • Distributed Locking Services: For large-scale systems, using a distributed locking service such as Redis-based locking (e.g., with Redlock) allows for more robust concurrency control. This helps ensure locks are respected across distributed systems, which is useful when dealing with multiple servers or processes.
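
Here's a minimal sketch of the database-backed approach. The ExportPart model, build_notes_csv helper, and FinalizeExportJob are all hypothetical; the key idea is that jobs never touch the ZIP, and a single finalizer builds it once every part is stored:

class ExportUserNotesJob < ApplicationJob
  queue_as :default

  def perform(user_id, export_id)
    # Persist the result instead of writing to the shared ZIP
    ExportPart.create!(
      export_id: export_id,
      filename: "user_notes.csv",
      content: build_notes_csv(User.find(user_id))
    )

    export = Export.find(export_id)
    # A single finalizer job owns the ZIP file, so no file lock is needed
    FinalizeExportJob.perform_later(export.id) if export.mark_job_complete
  end
end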

Sequential Job Execution

Although jobs are triggered concurrently, the need for file locking causes them to effectively run sequentially when writing to the ZIP file, as each job must wait for the previous one to finish.

  • Pro: Simple and predictable. Jobs run in sequence and only one writes to the ZIP at a time.

  • Con: This limits the speed at which jobs can complete since they cannot write in parallel.

Alternatives

  • Write Each Job’s Output to Separate Files: Instead of having each job write directly to the ZIP, have each job write its output to a temporary file. Once all jobs are complete, a final job combines the temporary files into a single ZIP. This lets every job run concurrently and avoids file contention (see the sketch after this list).

  • Stream the Output: For very large exports, instead of writing the entire ZIP to disk in one go, you can stream the output directly to the user or to cloud storage (such as S3). This avoids large intermediate files and can reduce the overall time to delivery.
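
Here's a sketch of the separate-files approach: each job writes its own temp file with no locking at all, and a final job assembles the archive with Zip::File#add. The temp-file naming convention and build_notes_csv helper are assumptions:

require "zip"

class ExportUserNotesJob < ApplicationJob
  queue_as :default

  def perform(user_id, export_id)
    export = Export.find(export_id)
    tmp_path = Rails.root.join("tmp", "#{export.prefix}_user_notes.csv").to_s
    # No lock needed: no other job writes to this file
    File.write(tmp_path, build_notes_csv(User.find(user_id)))

    CombineExportJob.perform_later(export.id) if export.mark_job_complete
  end
end

class CombineExportJob < ApplicationJob
  queue_as :default

  def perform(export_id)
    export = Export.find(export_id)
    zip_file_path = Rails.root.join("tmp", "#{export.prefix}_export.zip").to_s

    Zip::File.open(zip_file_path, Zip::File::CREATE) do |zipfile|
      Dir.glob(Rails.root.join("tmp", "#{export.prefix}_*.csv").to_s) do |path|
        zipfile.add(File.basename(path), path)  # Copy each temp file into the archive
      end
    end

    export.mark_completed(zip_file_path)
  end
end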

Resource Usage

  • Pro: Using a single ZIP file reduces memory overhead compared to storing each piece of data in memory before finalizing the export.

  • Con: Writing large files to disk can consume significant disk space and I/O resources, especially when many exports are happening simultaneously.

Alternatives

  • In-Memory Zip Creation: If the exports are relatively small, you can create the ZIP file in memory using StringIO or a similar mechanism. This reduces disk I/O but increases memory usage (see the sketch after this list).

  • Direct Streaming to Cloud Storage: Instead of writing the ZIP file to local disk, you could stream each part of the ZIP directly to cloud storage (e.g., AWS S3) as the jobs are processed. This reduces the need for local disk space but may require more complex coordination.
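
For the in-memory route, rubyzip can build the archive into a StringIO via Zip::OutputStream.write_buffer, which pairs naturally with an upload straight from memory (a sketch assuming the aws-sdk-s3 gem; csv_content and prefix come from the earlier examples, and the bucket name is a placeholder):

require "zip"
require "aws-sdk-s3"

# Build the archive entirely in memory; best suited to small exports
buffer = Zip::OutputStream.write_buffer do |zos|
  zos.put_next_entry("user_notes.csv")
  zos.write(csv_content)
end

# Upload without ever touching the local disk
Aws::S3::Client.new.put_object(
  bucket: "my-exports-bucket",
  key: "exports/#{prefix}_export.zip",
  body: buffer.string
)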

Failure Handling

With multiple concurrent jobs writing to a single file, robust failure handling and retries are important. If one job fails to complete, the entire export could be incomplete or corrupted.

  • Pro: Jobs can retry upon failure, ensuring eventual completion.

  • Con: If retries are not handled carefully (e.g., ensuring the state of the ZIP file isn’t corrupted), multiple failures can still result in partial exports.

Alternatives

  • Atomic File Writes: Using temporary files for each job and only moving them into the final ZIP once completed ensures that partial failures do not corrupt the final file. This way, even if a job fails, you can retry it without affecting the rest of the export.

  • Job State Tracking: Implement tracking of each job’s state in the database. If a job fails, it can be retried without corrupting the overall export, and the ZIP creation can be resumed once the failed job completes successfully.
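
Here's a minimal sketch of per-job state tracking, assuming a hypothetical ExportTask model with an integer status column:

class ExportTask < ApplicationRecord
  belongs_to :export
  enum status: { pending: 0, running: 1, completed: 2, failed: 3 }
end

class ExportUserNotesJob < ApplicationJob
  retry_on StandardError, wait: 10.seconds, attempts: 3

  def perform(task_id)
    task = ExportTask.find(task_id)
    return if task.completed?  # Idempotent: a retried job skips finished work

    task.running!
    # ... generate the CSV and write it into the archive ...
    task.completed!
  rescue StandardError
    task&.failed!
    raise  # Let retry_on reschedule the job
  end
end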

Depending on the scenario, such as large data sets or high concurrency requirements, a mixture of these approaches may be needed. Keeping it simple is the first step. Once you learn more about your system's behavior, that's when a closer look at these alternatives might be the right choice for you.


Final Thoughts

Giving users full control of their data starts with an export system that is simple, fast, and scalable. Kicking off concurrent jobs that look up, generate, and manage files can be complex. In my personal project, file locking and proper commit handling let me avoid race conditions and ensure that each job reliably writes to the ZIP file. Tracking exports in the database also made the system significantly easier to debug and manage. Applying these practices in your Ruby on Rails applications can save you significant complications down the road.

Let me know if you have similar problems and what techniques you have applied to go beyond them! Drop a comment or reach out to me on my website, I’m always up for a friendly chat.


References

File#flock

Zip::File#commit
