Efficient Pagination for Large Datasets in Rails: Avoiding the COUNT Trap

When building applications with infinite scroll or large data tables, pagination is essential. However, the default pagination methods in Rails (like will_paginate or kaminari) often rely on running a COUNT(*) query to determine the total number of entries. For large datasets, this can become a significant performance bottleneck.

The Problem with COUNT in Pagination

By default, paginating a query in Rails will execute a SELECT COUNT(*) FROM ... to determine the total number of records. This is fine for small tables, but as your data grows, this count query can become slow and resource-intensive, especially if your query involves joins or complex filters.

A Smarter Approach: Loading Just Enough (Similar approach used in Cursor-Based Pagination, a.k.a "keyset pagination")

Instead of running a full count, you can optimize pagination by only loading the records you need for the current page, plus one extra record to check if there’s a next page. Here’s how it works:

  1. Calculate the number of records that should be loaded for the current page:

     current_records_loaded_count = page * per_page
    
  2. Check if there is a next page by fetching one extra record:

     next_page_first_element = records.offset(current_records_loaded_count).limit(1).first
     has_next_page = next_page_first_element.present?
    
  3. Set the total_entries for pagination:

    • If there is a next page, set total_entries to the current loaded count plus one.

    • If not, set it to the actual number of records loaded.

    total_entries = has_next_page ? current_records_loaded_count + 1 : records.size
  1. Paginate using the calculated total_entries:

     records = records.paginate(page: page, per_page: per_page, total_entries: total_entries)
    

Why This Works

  • Performance: You avoid the expensive COUNT(*) query, which can be slow on large tables.

  • User Experience: For infinite scroll or "load more" interfaces, you don’t need the exact total count just whether there’s another page.

  • Simplicity: The logic is easy to maintain.

Example Implementation

Here’s a simplified version of the approach:

def records_search_improved
  records = Records.valid_and_active
  # ... apply filters and sorting ...

  current_records_loaded_count = page * per_page
  next_page_first_element = records.offset(current_records_loaded_count).limit(1).first
  has_next_page = next_page_first_element.present?
  total_entries = has_next_page ? current_records_loaded_count + 1 : records.size

  records = records.paginate(page: page, per_page: per_page, total_entries: total_entries)
  # ... further processing ...
end

When to Use This Pattern

  • When you have very large datasets.

  • When the exact total count is not critical for the UI.

  • When you want to improve response times for paginated endpoints.

Alternative Solutions

While the above method is a great fit for many use cases, there are other strategies you can consider for efficient pagination with large datasets:

1. Cursor-Based Pagination

Instead of using page numbers and offsets, cursor-based pagination uses a unique, sequential value (like an ID or timestamp) to fetch the next set of records. This method is highly performant for large or frequently changing datasets, as it avoids the performance issues of large offsets.

Example:

# Fetch records after a given ID
Record.where('id > ?', last_seen_id).limit(per_page)

Cursor-based pagination is ideal for APIs and feeds where you only need to move forward (or backward) through a dataset.

2. Estimate Count Instead of Exact Count

If you still need to display an approximate total, you can use PostgreSQL’s system tables to get a fast estimate:

result = ActiveRecord::Base.connection.execute("SELECT reltuples::bigint FROM pg_class WHERE relname = 'models';")
approximate_count = result[0]['reltuples']

This query returns an estimated row count for the table, which is much faster than a full COUNT(*). While not exact, it’s often “good enough” for user interfaces that just need to show a rough idea of the total.

Conclusion

By rethinking how you calculate the total number of entries for pagination, you can dramatically improve the performance of your Rails applications, especially as your data grows. This approach is particularly well-suited for infinite scroll and similar interfaces where knowing the exact total isn’t necessary.

0
Subscribe to my newsletter

Read articles from Alejandro Monasterios Rios directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Alejandro Monasterios Rios
Alejandro Monasterios Rios

Passionate Software Engineer with over 12 years of experience in software development across diverse projects and industries, including SaaS, B2B, WMF, Social Platforms, Startups, and the Gaming Industry. In recent years, my focus has revolved around full-stack engineering. My responsibilities encompass crucial aspects of software development, including architectural decision-making, starting projects from scratch with scalability in mind, code quality maintenance, analyzing and setting up UI testing strategies, and implementing code standard enforcement tools. On the front end, I worked with JavaScript/TypeScript and frameworks/libraries such as React.js and Next.js. Simultaneously, I have demonstrated proficiency on the server, working with SQL and NoSQL databases, Ruby on Rails, NodeJS, Python fastAPI, and Java Spring, often in conjunction with cloud platforms such as AWS and Azure.