N+1s and Performance
In 1974 Donald Knuth wrote:
"Programmers waste enormous amounts of time thinking about, or worrying about, the speed of noncritical parts of their programs, and these attempts at efficiency actually have a strong negative impact when debugging and maintenance are considered. We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%."
It's an excellent statement, and it suggests I shouldn't be doing any optimisation until I know there is a problem that needs resolving. As an experienced Rails developer, though, I know of specific cases that will lead to performance problems, and these can be identified and fixed early on without adding much complexity or getting in the way of debugging and maintenance.
I'm not going to go into all possible performance problems in this post, and besides, I don't know yet which ones are relevant. But there are two that can be addressed right now: N+1 queries, and indexes for filter and sort queries. Both of these come to the fore as more data is added to the application. I now have a table with around 1.4m records in it, which, while not massive, is definitely enough to start exhibiting some issues, especially on filtering and sorting.
Identifying and Fixing N+1s
There are many excellent articles that go into what an N+1 query is, so I won't reproduce any of that here, but suffice it to say it can cause poor performance by executing extra, unnecessary queries. I've linked an excellent article below for a more in-depth analysis.
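For anyone who wants the shape of the problem in miniature, here's a plain-Ruby sketch (no Rails required) that simulates the two query patterns and counts the queries each would issue. The DB hash and method names are purely illustrative.

```ruby
# A fake "database" of posts and their comments.
DB = {
  posts: [{ id: 1 }, { id: 2 }, { id: 3 }],
  comments: [
    { id: 1, post_id: 1 }, { id: 2, post_id: 1 },
    { id: 3, post_id: 2 }, { id: 4, post_id: 3 }
  ]
}

# The N+1 pattern: one query for the posts, then one query per post.
def n_plus_one_comment_counts
  queries = 0
  posts = DB[:posts]
  queries += 1                                 # SELECT * FROM posts
  counts = posts.map do |post|
    queries += 1                               # SELECT * FROM comments WHERE post_id = ?
    DB[:comments].count { |c| c[:post_id] == post[:id] }
  end
  [counts, queries]
end

# Eager loading: one query for the posts, one for all their comments.
def eager_loaded_comment_counts
  queries = 0
  posts = DB[:posts]
  queries += 1                                 # SELECT * FROM posts
  queries += 1                                 # SELECT * FROM comments WHERE post_id IN (...)
  grouped = DB[:comments].group_by { |c| c[:post_id] }
  counts = posts.map { |post| (grouped[post[:id]] || []).size }
  [counts, queries]
end
```

One query per parent record (the N) plus the initial fetch (the +1), versus a constant two queries no matter how many posts there are.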
As this is a well known problem there are a few existing tools for helping to identify when these types of queries are occurring.
Bullet vs Prosopite
Bullet is a gem that has been around for quite a while. It watches the queries your application generates, looking for patterns that suggest eager loading or a counter cache should be added, and its output will suggest what should be done to resolve the issues it finds.
Prosopite is a newer gem that likewise examines your queries looking for N+1s. Its claim is that, unlike Bullet, it produces no false negatives or false positives. However, it will not make any suggestions as to how to resolve the issues it finds.
For my application I'm going to choose Prosopite, as I don't feel I need to be prompted with the correct resolution for a problem, and I'd prefer to be informed of problems with better accuracy.
Installation is pretty straightforward. Add the gem to the Gemfile, along with the pg_query gem, which is required when using any database other than MySQL/MariaDB (I am using Postgres). Be aware that if you're on macOS then Xcode and its command line tools need to be installed properly so that pg_query can be built.
Gemfile
group :development, :test do
  # N+1 query locator
  gem 'prosopite'
  gem 'pg_query'
end
Now it needs to be invoked in the controller. I only want this to run outside of the production environment.
app/controllers/application_controller.rb
class ApplicationController < ActionController::Base
  include Localisation

  add_flash_types :success, :information, :error, :warning

  unless Rails.env.production?
    around_action :n_plus_one_detection

    def n_plus_one_detection
      Prosopite.scan
      yield
    ensure
      Prosopite.finish
    end
  end
end
Lastly, it needs to report any issues it finds somewhere. I'll specify that output should appear in the Rails log and also in a file (log/prosopite.log).
config/environments/development.rb
Rails.application.configure do
  # Settings specified here will take precedence over those in config/application.rb.
  ...
  config.after_initialize do
    Prosopite.rails_logger = true
    Prosopite.prosopite_logger = true
  end
end
Now on to fixing any N+1 issues that are found. There are a few different options here as well. The first is to take it out of my hands altogether and use a gem such as goldiloader, which will automatically apply eager loading when it determines an association needs it. I think instead I'd prefer to have a little more control and know exactly when eager loading will be applied.
I can do some collection preloading semi-automatically in the controller. I'll specify that my filtered and sorted results should use a specified set of preloads; if a given controller does not define this, then nothing is preloaded.
app/controllers/concerns/filtered_sorted.rb
def set_collection
  @collection = @q.result.preload(collection_preloads)
end

def collection_preloads
  []
end
An example is the product translations controller, which returns a collection of translations. It will also be retrieving information from the product that owns each translation, so it makes sense here to preload that product.
app/controllers/product_translations_controller.rb
private

def collection_preloads
  [:product]
end
Another way to prevent lazy loading from occurring without realising it is to prevent it from happening at all. Rails 6.1 added a method called strict_loading that can achieve this.
It can be used on a specific query or it can be applied at the relationship level in the model. I'll go with the latter for consistency.
app/models/item_sell_pack.rb
class ItemSellPack < ApplicationRecord
  include Broadcast

  has_many :item_sell_pack_aliases, dependent: :destroy, strict_loading: true
  ...
Here is what happens if I try to use that association without eager loading now:
> p = ItemSellPack.find(22)
ItemSellPack Load (3.5ms) SELECT "item_sell_packs".* FROM "item_sell_packs" WHERE "item_sell_packs"."id" = $1 LIMIT $2 [["id", 22], ["LIMIT", 1]]
=> #<ItemSellPack:0x00007f7a83390648 id: 22, name: "carton", canonical: true, created_at: Fri, 07 Oct 2022 11:07:51.106520000 UTC +00:00, updated_at: Fri, 07 Oct 2022 11:07:51.106520000 UTC +00:00>
> p.item_sell_pack_aliases.first
/activerecord-7.0.3.1/lib/active_record/core.rb:242:in `strict_loading_violation!': `ItemSellPack` is marked for strict_loading. The ItemSellPackAlias association named `:item_sell_pack_aliases` cannot be lazily loaded. (ActiveRecord::StrictLoadingViolationError)
Now trying again, but this time preloading the association. This allows me to return the first item from the association without error.
> p = ItemSellPack.preload(:item_sell_pack_aliases).find(22)
ItemSellPack Load (0.3ms) SELECT "item_sell_packs".* FROM "item_sell_packs" WHERE "item_sell_packs"."id" = $1 LIMIT $2 [["id", 22], ["LIMIT", 1]]
ItemSellPackAlias Load (0.2ms) SELECT "item_sell_pack_aliases".* FROM "item_sell_pack_aliases" WHERE "item_sell_pack_aliases"."item_sell_pack_id" = $1 [["item_sell_pack_id", 22]]
=> #<ItemSellPack:0x00007f7a8e2acc08 id: 22, name: "carton", canonical: true, created_at: Fri, 07 Oct 2022 11:07:51.106520000 UTC +00:00, updated_at: Fri, 07 Oct 2022 11:07:51.106520000 UTC +00:00>
> p.item_sell_pack_aliases.first
=> #<ItemSellPackAlias:0x00007f7a8e2a4f58 id: 1, item_sell_pack_id: 22, alias: "cart", confirmed: true, created_at: Fri, 07 Oct 2022 16:27:20.669752000 UTC +00:00, updated_at: Fri, 07 Oct 2022 16:27:20.669752000 UTC +00:00>
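Conceptually, strict_loading just swaps lazy loading for an immediate error. Here's a plain-Ruby sketch of that guard: an association stand-in that refuses access unless its records were explicitly loaded first. The class and method names are illustrative, not ActiveRecord's.

```ruby
class StrictLoadingViolation < StandardError; end

# A tiny stand-in for an association proxy with strict loading
# enabled: reading records before an explicit preload raises,
# mirroring ActiveRecord::StrictLoadingViolationError.
class StrictAssociation
  def initialize(records)
    @records = records
    @loaded = false
  end

  # Simulates eager loading (e.g. .preload(:item_sell_pack_aliases)).
  def preload!
    @loaded = true
    self
  end

  def first
    raise StrictLoadingViolation, "association cannot be lazily loaded" unless @loaded
    @records.first
  end
end
```

The real implementation is considerably more involved, but the contract is the same: either the data was fetched up front, or you get an exception rather than a hidden extra query.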
Filtering/Sorting and indexes
Executing a query that contains a where or order by clause can potentially cause performance problems, depending on the size of the table being queried. The filtering and sorting I am doing on my index pages obviously do just that, so should there be indexes placed on those columns?
First I'll get some baselines for the two types of filtering queries I'll generally be doing:
-- A "contains" filter, this is converted by ransack to an ILIKE
EXPLAIN ANALYZE SELECT *
FROM products
WHERE item_description ILIKE '%bread%'
Gather (cost=1000.00..40005.01 rows=124 width=394) (actual time=4.898..241.176 rows=13539 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on products (cost=0.00..38992.61 rows=52 width=394) (actual time=2.107..228.841 rows=4513 loops=3)
Filter: (item_description ~~* '%bread%'::text)
Rows Removed by Filter: 426118
Planning Time: 0.194 ms
Execution Time: 242.147 ms
-- An "eq" filter, this is converted by ransack to a straight where query
EXPLAIN ANALYZE SELECT *
FROM products
WHERE item_sell_pack_name = 'carton'
Seq Scan on products (cost=0.00..48412.66 rows=373659 width=394) (actual time=0.007..486.769 rows=365761 loops=1)
Filter: ((item_sell_pack_name)::text = 'carton'::text)
Rows Removed by Filter: 926132
Planning Time: 0.117 ms
Execution Time: 506.680 ms
Now what happens if I introduce indexes on these attributes? Here are the indexes to add:
-- Index for ILIKE on item_description. Note that this is best done with a trigram index.
CREATE EXTENSION pg_trgm;
CREATE INDEX idx_products_item_description_gin ON products USING gin (item_description gin_trgm_ops);
-- Index for item_sell_pack_name. A standard index can be used here
CREATE INDEX idx_products_item_sell_pack_name ON products (item_sell_pack_name NULLS LAST);
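In the application itself these would normally live in a migration rather than ad-hoc SQL. A sketch of what that might look like (the migration class name and version are illustrative, and I've left out the NULLS LAST ordering for brevity):

```ruby
class AddFilterIndexesToProducts < ActiveRecord::Migration[7.0]
  def change
    # Required for the trigram (gin_trgm_ops) index below.
    enable_extension 'pg_trgm'

    # Trigram GIN index to serve ILIKE '%...%' filters.
    add_index :products, :item_description,
              using: :gin, opclass: :gin_trgm_ops,
              name: 'idx_products_item_description_gin'

    # Standard b-tree index for the equality filter.
    add_index :products, :item_sell_pack_name,
              name: 'idx_products_item_sell_pack_name'
  end
end
```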
And now to the results:
EXPLAIN ANALYZE SELECT *
FROM products
WHERE item_description ILIKE '%bread%'
-- POST INDEX
Bitmap Heap Scan on products (cost=40.96..515.45 rows=124 width=394) (actual time=4.915..18.227 rows=13539 loops=1)
Recheck Cond: (item_description ~~* '%bread%'::text)
Rows Removed by Index Recheck: 88
Heap Blocks: exact=8307
-> Bitmap Index Scan on idx_products_item_description_gin (cost=0.00..40.93 rows=124 width=0) (actual time=3.729..3.729 rows=13627 loops=1)
Index Cond: (item_description ~~* '%bread%'::text)
Planning Time: 0.127 ms
Execution Time: 18.976 ms
EXPLAIN ANALYZE SELECT *
FROM products
WHERE item_sell_pack_name = 'carton'
Bitmap Heap Scan on products (cost=4168.28..41103.02 rows=373659 width=394) (actual time=17.142..227.549 rows=365761 loops=1)
Recheck Cond: ((item_sell_pack_name)::text = 'carton'::text)
Heap Blocks: exact=29074
-> Bitmap Index Scan on idx_products_item_sell_pack_name (cost=0.00..4074.87 rows=373659 width=0) (actual time=11.628..11.629 rows=365761 loops=1)
Index Cond: ((item_sell_pack_name)::text = 'carton'::text)
Planning Time: 0.060 ms
Execution Time: 247.326 ms
An obvious substantial bump in performance from both of those. The "contains" query went from 242.147 ms to an amazing 18.976 ms, and the "eq" query went from 506.680 ms to a still impressive 247.326 ms. So it seems like a no-brainer to add indexes to the attributes that can be filtered.
But what happens if sorting is thrown into the mix too? I'll leave the existing indexes in place for the attributes in the where clauses. Here are the baseline results, before adding any sort indexes, for a few scenarios:
-- A "contains" style filter sorting on the same column
EXPLAIN ANALYZE SELECT *
FROM products
WHERE item_description ILIKE '%bread%'
ORDER BY item_description DESC
Sort (cost=519.76..520.07 rows=124 width=394) (actual time=33.207..34.666 rows=13539 loops=1)
Sort Key: item_description DESC
Sort Method: quicksort Memory: 3940kB
-> Bitmap Heap Scan on products (cost=40.96..515.45 rows=124 width=394) (actual time=4.931..18.895 rows=13539 loops=1)
Recheck Cond: (item_description ~~* '%bread%'::text)
Rows Removed by Index Recheck: 88
Heap Blocks: exact=8307
-> Bitmap Index Scan on idx_products_item_description_gin (cost=0.00..40.93 rows=124 width=0) (actual time=3.793..3.793 rows=13627 loops=1)
Index Cond: (item_description ~~* '%bread%'::text)
Planning Time: 0.172 ms
Execution Time: 35.404 ms
-- A "contains" style filter sorting on a different column
EXPLAIN ANALYZE SELECT *
FROM products
WHERE item_description ILIKE '%bread%'
ORDER BY brand DESC
Sort (cost=519.76..520.07 rows=124 width=394) (actual time=31.854..33.330 rows=13539 loops=1)
Sort Key: brand DESC
Sort Method: quicksort Memory: 3940kB
-> Bitmap Heap Scan on products (cost=40.96..515.45 rows=124 width=394) (actual time=5.004..19.316 rows=13539 loops=1)
Recheck Cond: (item_description ~~* '%bread%'::text)
Rows Removed by Index Recheck: 88
Heap Blocks: exact=8307
-> Bitmap Index Scan on idx_products_item_description_gin (cost=0.00..40.93 rows=124 width=0) (actual time=3.903..3.904 rows=13627 loops=1)
Index Cond: (item_description ~~* '%bread%'::text)
Planning Time: 0.143 ms
Execution Time: 34.104 ms
-- An "eq" style filter sorting on a different column
EXPLAIN ANALYZE SELECT *
FROM products
WHERE item_sell_pack_name = 'carton'
ORDER BY brand ASC
Gather Merge (cost=109218.49..145548.93 rows=311382 width=394) (actual time=310.593..459.772 rows=365761 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=108218.47..108607.70 rows=155691 width=394) (actual time=304.356..340.605 rows=121920 loops=3)
Sort Key: brand
Sort Method: external merge Disk: 23984kB
Worker 0: Sort Method: external merge Disk: 20168kB
Worker 1: Sort Method: external merge Disk: 20064kB
-> Parallel Bitmap Heap Scan on products (cost=4168.28..38378.43 rows=155691 width=394) (actual time=14.927..119.857 rows=121920 loops=3)
Recheck Cond: ((item_sell_pack_name)::text = 'carton'::text)
Heap Blocks: exact=10737
-> Bitmap Index Scan on idx_products_item_sell_pack_name (cost=0.00..4074.87 rows=373659 width=0) (actual time=14.275..14.276 rows=365761 loops=1)
Index Cond: ((item_sell_pack_name)::text = 'carton'::text)
Planning Time: 0.094 ms
Execution Time: 483.388 ms
Definitely slower with the sorting added, but does it help to index the columns being sorted on as well as the columns being filtered on?
CREATE INDEX idx_products_brand ON products (brand NULLS LAST);
-- A "contains" style filter sorting on the same column
EXPLAIN ANALYZE SELECT *
FROM products
WHERE item_description ILIKE '%bread%'
ORDER BY item_description DESC
Sort (cost=519.76..520.07 rows=124 width=394) (actual time=33.733..34.674 rows=13539 loops=1)
Sort Key: item_description DESC
Sort Method: quicksort Memory: 3940kB
-> Bitmap Heap Scan on products (cost=40.96..515.45 rows=124 width=394) (actual time=4.869..19.193 rows=13539 loops=1)
Recheck Cond: (item_description ~~* '%bread%'::text)
Rows Removed by Index Recheck: 88
Heap Blocks: exact=8307
-> Bitmap Index Scan on idx_products_item_description_gin (cost=0.00..40.93 rows=124 width=0) (actual time=3.792..3.792 rows=13627 loops=1)
Index Cond: (item_description ~~* '%bread%'::text)
Planning Time: 0.143 ms
Execution Time: 35.429 ms
-- A "contains" style filter sorting on a different column
EXPLAIN ANALYZE SELECT *
FROM products
WHERE item_description ILIKE '%bread%'
ORDER BY brand DESC
Sort (cost=519.76..520.07 rows=124 width=394) (actual time=30.733..32.557 rows=13539 loops=1)
Sort Key: brand DESC
Sort Method: quicksort Memory: 3940kB
-> Bitmap Heap Scan on products (cost=40.96..515.45 rows=124 width=394) (actual time=4.829..18.267 rows=13539 loops=1)
Recheck Cond: (item_description ~~* '%bread%'::text)
Rows Removed by Index Recheck: 88
Heap Blocks: exact=8307
-> Bitmap Index Scan on idx_products_item_description_gin (cost=0.00..40.93 rows=124 width=0) (actual time=3.748..3.748 rows=13627 loops=1)
Index Cond: (item_description ~~* '%bread%'::text)
Planning Time: 0.154 ms
Execution Time: 33.339 ms
-- An "eq" style filter sorting on a different column
EXPLAIN ANALYZE SELECT *
FROM products
WHERE item_sell_pack_name = 'carton'
ORDER BY brand ASC
Gather Merge (cost=109218.49..145548.93 rows=311382 width=394) (actual time=306.586..453.547 rows=365761 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=108218.47..108607.70 rows=155691 width=394) (actual time=300.240..337.387 rows=121920 loops=3)
Sort Key: brand
Sort Method: external merge Disk: 23944kB
Worker 0: Sort Method: external merge Disk: 20224kB
Worker 1: Sort Method: external merge Disk: 20056kB
-> Parallel Bitmap Heap Scan on products (cost=4168.28..38378.43 rows=155691 width=394) (actual time=13.227..119.005 rows=121920 loops=3)
Recheck Cond: ((item_sell_pack_name)::text = 'carton'::text)
Heap Blocks: exact=10754
-> Bitmap Index Scan on idx_products_item_sell_pack_name (cost=0.00..4074.87 rows=373659 width=0) (actual time=13.178..13.178 rows=365761 loops=1)
Index Cond: ((item_sell_pack_name)::text = 'carton'::text)
Planning Time: 0.104 ms
Execution Time: 475.886 ms
This has made a negligible difference to the results! Why? Notice that the newly added index on brand is not even used in the query plan. Postgres has decided it can use just the index for the field being filtered and then apply a quicksort to the result. This article explains a bit more about why: https://use-the-index-luke.com/sql/sorting-grouping/indexed-order-by
The gist of the article is that the sort can only be served by an index if a single index covers both fields: the one in the where clause and the one in the order by. But even when such a composite index is added, while it does lead to fewer operations, it is not any faster. Let's see that in action here:
CREATE INDEX idx_products_item_sell_pack_name_brand ON products (item_sell_pack_name NULLS LAST, brand NULLS LAST);
EXPLAIN ANALYZE SELECT *
FROM products
WHERE item_sell_pack_name = 'carton'
ORDER BY brand ASC
Index Scan using idx_products_item_sell_pack_name_brand on products (cost=0.43..111989.26 rows=373659 width=394) (actual time=0.033..509.517 rows=365761 loops=1)
Index Cond: ((item_sell_pack_name)::text = 'carton'::text)
Planning Time: 0.130 ms
Execution Time: 532.557 ms
So now Postgres is using the composite index but it still isn't any faster than letting it do a Quicksort itself.
Ok, how about if I'm doing an order by without a where clause? Will an index help there?
-- Order the products table without a where clause
EXPLAIN ANALYZE SELECT *
FROM products
ORDER BY brand ASC
Gather Merge (cost=284913.70..410523.24 rows=1076578 width=394) (actual time=878.058..1354.875 rows=1291893 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=283913.68..285259.40 rows=538289 width=394) (actual time=873.688..996.489 rows=430631 loops=3)
Sort Key: brand
Sort Method: external merge Disk: 76880kB
Worker 0: Sort Method: external merge Disk: 74576kB
Worker 1: Sort Method: external merge Disk: 73144kB
-> Parallel Seq Scan on products (cost=0.00..37646.89 rows=538289 width=394) (actual time=0.037..200.792 rows=430631 loops=3)
Planning Time: 0.079 ms
Execution Time: 1454.186 ms
-- Then add an index on that field being ordered
CREATE INDEX index_products_on_brand ON products (brand NULLS LAST);
-- And run again
EXPLAIN ANALYZE SELECT *
FROM products
ORDER BY brand ASC
Index Scan using index_products_on_brand on products (cost=0.43..154214.17 rows=1291893 width=394) (actual time=0.061..1548.000 rows=1291893 loops=1)
Planning Time: 0.087 ms
Execution Time: 1622.027 ms
Nope! It's using the new index but performance is no better (or perhaps slightly worse).
Sorry for the long-winded analysis, but the takeaway from all of this: definitely add indexes for the fields used in a where clause, but it is not worth doing the same for fields that only appear in the order by unless an analysis categorically shows it is better.
What about any other indexes that might be missing? There is a hilariously named gem called lol_dba that purports to help find missing indexes. I'll try it out to see what it reports.
catalogue_cleanser % lol_dba db:find_indexes
catalogue_cleanser %
Nothing missing, and that makes sense given what the tool can do. It only examines the associations on models, and as long as I've done my schema migrations correctly, things like foreign keys are automatically indexed thanks to Rails conventions. Ideally it could also examine the use of where clauses and suggest indexes for those too; I note that this has been requested as an enhancement: https://github.com/plentz/lol_dba/issues/101
So that wraps up, for now, the steps I'm taking to ensure better application performance. In future I'll be looking at the various Application Performance Monitoring (APM) tools that exist (such as Sentry, New Relic, Datadog, etc.). That will mean I can gather data on where performance actually needs attention before attempting to fix anything preemptively.
Useful links
https://blog.appsignal.com/2020/06/09/n-plus-one-queries-explained.html
https://github.com/charkost/prosopite
https://github.com/flyerhzm/bullet
https://dev.to/delbetu/join-vs-includes-vs-eager-load-vs-preload-36l3
https://engineering.gusto.com/a-visual-guide-to-using-includes-in-rails/
https://use-the-index-luke.com/sql/sorting-grouping/indexed-order-by
https://github.com/plentz/lol_dba
Written by
Andrew Foster
I am a web developer who has been in the industry since 1995. My current tech stack preference is Ruby on Rails with JavaScript. Originally from Australia but now living in Scotland.