Optimizing DRF LimitOffsetPagination performance to avoid slow count queries.

Table of contents

TLDR

LimitOffsetPagination in Django can cause performance issues due to the expensive count query, especially with large datasets or search_fields. To improve performance, you can override the default pagination to skip the count query.

The issue

LimitOffsetPagination implementation is straight forward.

class LimitOffsetPagination(BasePagination):
    # ...
    def paginate_queryset(self, queryset, request, view=None):
        self.request = request
        # The 'limit' parameter is retrieved from the request
        self.limit = self.get_limit(request)
        if self.limit is None:
            return None

        # Count query
        self.count = self.get_count(queryset)
        # The 'offset' parameter is retrieved from the request
        self.offset = self.get_offset(request)
        # We need the count to not 'overshoot' when using the
        # retrived 'offset' parameter
        if self.count > self.limit and self.template is not None:
            self.display_page_controls = True
        if self.count == 0 or self.offset > self.count:
            return []
        # The limit and offset parameters applied to the initial query
        return list(queryset[self.offset:self.offset + self.limit])

    # ...

We only need 2 SQL queries: count and the initial select query with limit + offset applied. And here is the potential performance issue.

Count can be a quite expensive operation. If you have more than 1 million rows in the table then the count query can take more then 2 secons.

💡
For instance postgres docs mention the issue with slow counting.

Even if you don’t have that much data you might experience similar performance drops when using search_fields for your viewset.

class ProductViewSet(
    mixins.ListModelMixin,
    mixins.RetrieveModelMixin,
    viewsets.GenericViewSet,
):
    pagination_class = LimitOffsetPagination
    search_fields = ["name", "full_name", "description", "type", "website"]

Here’s the resulting count query for searching a table product.

SELECT COUNT(*) AS "__count"
FROM "product"
WHERE (
    UPPER("product"."name"::text) LIKE UPPER('%table%')
    OR UPPER("product"."full_name"::text) LIKE UPPER('%table%')
    OR UPPER("product"."description"::text) LIKE UPPER('%table%')
    OR UPPER("product"."type"::text) LIKE UPPER('%table%')
    OR UPPER("product"."website"::text) LIKE UPPER('%table%')
)

If you don’t have any indexes then the execution time for the table with 800k rows can be more than 5 seconds!

💡
Fun fact: Django admin docs mention show_full_result_count parameter which disables the count query for the model. Also the note for theautocomplete_fields parameter states that the default paginator may cause performance issues because of the count query.

Fixing issue

Thankfully the Django rest framework is flexible enough to let us override built in pagination styles. DRF maintainers even encourage us to do so.

There are a few methods to deal with count issue including:

I’ve settled with this stackoverflow answer which basically removes count query:

class NoCountPagination(LimitOffsetPagination):
    def get_paginated_response(self, data):
        return Response(
            {
                "next": self.get_next_link(),
                "previous": self.get_previous_link(),
                "results": data,
            }
        )

    def paginate_queryset(self, queryset, request, view=None):
        self.offset = self.get_offset(request)
        self.limit = self.get_limit(request)

        # Get one extra element to check if there is a "next" page
        q = list(queryset[self.offset : self.offset + self.limit + 1])
        self.count = self.offset + len(q) if len(q) else self.offset - 1
        if len(q) > self.limit:
            q.pop()

        self.request = request
        if self.count > self.offset + self.limit and self.template is not None:
            self.display_page_controls = True

        return q

    def get_paginated_response_schema(self, schema):
        ret = super().get_paginated_response_schema(schema)
        del ret["properties"]["count"]
        return ret
0
Subscribe to my newsletter

Read articles from Michael Interface directly inside your inbox. Subscribe to the newsletter, and don't miss out.

Written by

Michael Interface
Michael Interface