Efficient Performant Pagination for Large Datasets in Rails

When building applications with infinite scroll or large data tables, pagination is essential. However, the default pagination methods in Rails (like will_paginate or kaminari) often rely on running a COUNT(*) query to determine the total number of entries. For large datasets, this can become a significant performance bottleneck.

The Problem with COUNT in Pagination

By default, paginating a query in Rails will execute a SELECT COUNT(*) FROM ... to determine the total number of records. This is fine for small tables, but as your data grows, this count query can become slow and resource-intensive, especially if your query involves joins or complex filters.

A Smarter Approach: Loading Just Enough (Similar approach used in Cursor-Based Pagination, a.k.a "keyset pagination")

Instead of running a full count, you can optimize pagination by only loading the records you need for the current page, plus one extra record to check if there’s a next page. Here’s how it works:

Calculate the number of records that should be loaded for the current page:
```
 current_records_loaded_count = page * per_page
```

Check if there is a next page by fetching one extra record:

 next_page_first_element = records.offset(current_records_loaded_count).limit(1).first
 has_next_page = next_page_first_element.present?

Set the total_entries for pagination:
- If there is a next page, set total_entries to the current loaded count plus one.
- If not, set it to the actual number of records loaded.

    total_entries = has_next_page ? current_records_loaded_count + 1 : records.size

Paginate using the calculated total_entries:

 records = records.paginate(page: page, per_page: per_page, total_entries: total_entries)

Why This Works

Performance: You avoid the expensive COUNT(*) query, which can be slow on large tables.
User Experience: For infinite scroll or "load more" interfaces, you don’t need the exact total count just whether there’s another page.
Simplicity: The logic is easy to maintain.

Example Implementation

Here’s a simplified version of the approach:

def records_search_improved
  records = Records.valid_and_active
  # ... apply filters and sorting ...

  current_records_loaded_count = page * per_page
  next_page_first_element = records.offset(current_records_loaded_count).limit(1).first
  has_next_page = next_page_first_element.present?
  total_entries = has_next_page ? current_records_loaded_count + 1 : records.size

  records = records.paginate(page: page, per_page: per_page, total_entries: total_entries)
  # ... further processing ...
end

When to Use This Pattern

When you have very large datasets.
When the exact total count is not critical for the UI.
When you want to improve response times for paginated endpoints.

Alternative Solutions

While the above method is a great fit for many use cases, there are other strategies you can consider for efficient pagination with large datasets:

1. Cursor-Based Pagination

Instead of using page numbers and offsets, cursor-based pagination uses a unique, sequential value (like an ID or timestamp) to fetch the next set of records. This method is highly performant for large or frequently changing datasets, as it avoids the performance issues of large offsets.

Example:

# Fetch records after a given ID
Record.where('id > ?', last_seen_id).limit(per_page)

Cursor-based pagination is ideal for APIs and feeds where you only need to move forward (or backward) through a dataset.

2. Estimate Count Instead of Exact Count

If you still need to display an approximate total, you can use PostgreSQL’s system tables to get a fast estimate:

result = ActiveRecord::Base.connection.execute("SELECT reltuples::bigint FROM pg_class WHERE relname = 'models';")
approximate_count = result[0]['reltuples']

This query returns an estimated row count for the table, which is much faster than a full COUNT(*). While not exact, it’s often “good enough” for user interfaces that just need to show a rough idea of the total.

Conclusion

By rethinking how you calculate the total number of entries for pagination, you can dramatically improve the performance of your Rails applications, especially as your data grows. This approach is particularly well-suited for infinite scroll and similar interfaces where knowing the exact total isn’t necessary.

Efficient Pagination for Large Datasets in Rails: Avoiding the COUNT Trap