How to not get into trouble with Active Record
Active Record is a layer, in the Rails application, that makes database manipulation easier but can also harm your application’s performance and reliability if not used properly. This article is a list of good practices that you should follow, and harmful practices that you should avoid.
Improve the performance, produce fewer bugs, and make the code responsible for database communication readable and testable by selecting the right Active Record features for the given scenario.
All my notes are based on years of hands-on experience at iRonin.IT - a top software development company, where we provide custom software development and IT staff augmentation services for a wide array of technologies.
Messing with one million
All examples included in this article are tested against one million of records in the database. I did this so you can see what is the difference in memory usage and execution time for the given scenario.
Generating test data
You can fill your database with the test data by generating the User
model with the following definition:
rails g model User first_name:string last_name:string age:integer email:string
And creating the records with FFaker gem:
1.upto(1_000_000) do
User.create!(
first_name: FFaker::Name.first_name,
last_name: FFaker::Name.last_name,
age: (1..99).to_a.sample,
email: FFaker::Internet.email
)
end
Collecting the information about the performance
I collected execution time with the Benchmark library:
Benchmark.measure { … }
I measured allocated memory with the memory_profiler gem:
# bundle add memory_profiler
report = MemoryProfiler.report do
# execute code
end
report.pretty_print(scale_bytes: true)
The test data that you will create will be different from the one I created so the results can be different on your machine.
Performance
Fetching large collections of records
Is your database filled with thousands or millions of records? Loading them at once into the memory is one of the easiest ways to crash your application in the production environment.
Avoid: using each
and all
with Active Record queries that are designed to return many records:
User.all.each do |user|
puts user.id
end
Use: find_each
to pull records in batches. It won’t load all records into the database and you can use it like an ordinary each method to iterate through all items in the collection:
User.find_each do |user|
puts user.id
end
Selecting values for one column
It often happens that you have a collection of records but you want to receive the array of values for the given column. Doing it wrong you can load too many records into memory instead of doing the heavy-lifting on the database level.
Avoid: using map
or collect
directly on the collection produced by the Active Record:
User.where(age: 30).map(&:email)
Use: pluck
to narrow the query to the database and reduce the time needed to pull the records:
User.where(age: 30).pluck(:email)
Messing with one million
My test database contains exactly 10 186 records where a user has an age equal to 30.
Method | Time of execution | Memory allocated |
#map | 0.17s | 11.67 MB |
#pluck | 0.04s | 1.75 MB |
Counting records
It seems that there is no simpler task than counting the records for a given query. Nothing is more wrong, especially when it comes to databases with thousands or millions of records.
In Rails, we have three methods for counting items in the Active Record collection: size
, count
, and length
. If you will use the wrong one, you can harm the performance of your application and make way more queries to the database than necessary.
It’s enough for you to memorize the following:
When you use
length
, it always loads all records into memory and then counts themUse
count
if you want to perform SQL count query as this method always does it even when you pulled the collection earliersize
is the most flexible one as it performs SQL count query if records are not loaded in the memory and it counts the elements from memory when they are already loaded
An additional advantage of using size
is that you can benefit from using cache counters as Active Record handles them automatically.
Messing with one million
My test database contains exactly 9 960 records where a user has an age equal to 27. If I want to count all users with the age of 27 the results are the following:
Method | Time of execution | Memory allocated |
#count | 0.036s (same if records are loaded) | 8.34kb |
#size | 0.035s (0.000007 if records are loaded) | 8.34kb |
#length | 0.19s (0.000007 if records are loaded) | 9.45mb |
Bulk insert and update
You can harm your application’s performance not only by pulling the information from the database but also by inserting it (especially when it comes to inserting a lot of data).
When you need to insert or update multiple records it is always good to consider doing it in bulk mode.
Avoid: creating records in loop:
users_data.each do |attrs|
User.create!(attrs)
end
Use: create records in bulk:
User.insert_all(users_data)
Of course, inserting records in bulk is not always as easy as it seems to be. You have to deal with duplicate records or other cases but the insert_all
method can handle this also with some additional parameters.
Messing with one million
If we would like to create 20 records the difference between creating in loop and using the bulk mode is the following:
Method | Time of execution | Memory allocated |
.create | 0.035s | 590.54kb |
.insert_all | 0.0076s | 65.06kb |
Let the database do the job
It sounds obvious but the practice sometimes looks different especially when we are in hurry or when we work with the old legacy codebases. By saying about letting the database do the job I mean stuff like sorting, filtering, etc.
Avoid: doing with Ruby operations that can be performed on SQL level
User.where(age: 51).select { |user| user.first_name.starts_with?('A') }.take(10)
Use: SQL to perform actions for better performance
User.where(age: 51).where("first_name LIKE 'A%'").limit(10)
Messing with one million
Method | Time of execution | Memory allocated |
Ruby | 0.19s | 12.61mb |
SQL | 0.000158s | 3.15kb |
Checking the record presence
Another simple yet tricky thing. Rails developers got used to use .present?
or .blank?
for checking the presence of the data given variable or collection. However, calling these methods on Active Record’s collection can introduce performance problems.
If you don’t want to mess up the performance, remember:
Calling
.exists?
will trigger an SQL query even if records are already loaded into the memoryCalling
.present?
or.blank?
will load records into memory so use them only if you are doing something else with these records later onIf you want to load the collection and check for its existence, use
@collection.load.any?
,@collection.load.none?
or@collection.load.none?
Data integrity
Finding records by…
I bet that in the Ruby world, the “undefined method for NilClass” is one of the most common errors that developers experience, especially when working with big and old legacy codebases.
Each time you use the find_by
method, always take into account that it can return a nil
value. You have to handle the nil value or use find_by! method instead to raise an error when the record is not found.
Avoid: using find_by
method without explicitly handling possible nil
value:
user = User.find_by(email: 'john@doe.com')
user.first_name
Use: find_by!
method if you would like to raise an exception if the record is not present:
user = User.find_by!(...)
user.first_name
Use: find_by
and handle the nil
value explicitly if you don’t want to raise an exception:
user = User.find_by(...)
user&.first_name
# or
if (user = User.find_by(...))
user.first_name
end
The bang methods
Continuing the previous example, the bang method (find_by!
in the previous example) is a special method that will raise an exception in case something went wrong. When saving, creating, or updating the record we can also take advantage of the special version of methods.
Transactions
Using bang methods is extremely important when dealing with database transactions.
Avoid: using non-bang methods when dealing with transactions as transaction won’t be rollbacked when the record won’t be saved:
User.transaction do
user = User.create(...)
SomeService.call(user)
end
Use: bang methods to rollback the whole transaction in case of a problem. The whole idea behind the transactions is to achieve “all or nothing” state where all steps of the process are completed or none of them is performed:
User.transaction do
user = User.create!(...)
SomeService.call(user)
end
Difference between delete and destroy
The most important difference is that when you call destroy, all related callbacks from the model are called, and when you call delete you just simply delete the record from the database. Because of that, the delete option is faster.
If you just want to remove the record without triggering any workflows, validations, and other callbacks, simply go for delete. Otherwise use destroy which is a standard way of removing records from the database in a typical Rails application.
Readability and testability
You can bring trouble to your application not only by calling the database in an ineffective manner but also by writing code that is not readable and hard to maintain.
Use scopes for common use cases
If you use the same queries multiple times in your application, consider moving them to the scope for better maintainability but also for better readability.
Avoid: using the same query multiple times in different places
User.where('age > ?', 17).where(...)
Use: scope to have more readable code and be able to introduce changes faster
scope :adults, -> { where('age > ?', 17) }
User.adults.where(...)
Query objects to isolate the query layer
Scopes are great for smaller and repeatable queries but when it comes to complex queries that are used once or more, the query object pattern might be helpful as well.
Avoid: mixing the logic for more complex queries with other layers of logic
results = User.where('first_name = ? or ?', 'John', 'Angie').where('age > ?', 17).order('age DESC')
results.each do |user|
# do something
end
Use: query object pattern to isolate the logic and test the code easier
AdultJohnOrAngieQuery.call.each do |user|
# do something
end
Didn't get enough of Ruby?
Check out our free books about Ruby to level up your skills and become a better software developer.
Subscribe to my newsletter
Read articles from Paweł Dąbrowski directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by
Paweł Dąbrowski
Paweł Dąbrowski
I am a software engineer with over 14 years of experience in building commercial web applications of various sizes and technology stacks. I am passionate about writing for both human beings and computers. I work as a CTO @ iRonin