Rails Creator - Catching stragglers
Setting
This came up with a need to implement a creator to centralize creation of a model, instead of using Active Record Callbacks (due to associated complexity and prior issues with them). But here's the conundrum:
How do you know every instance of a creation is actually using the Creator Pattern?
This type of problem can be particularly hard to solve, since there are a few sneaky ways in which a model object can be created. Here's a sample, off the top of my head, from most obvious to hardest to find:
Model.create
/Model.create!
Model.new
followed byModel.save
orModel.save!
- The elusive nested attributes. Particularly sneaky if you're not familiar with that bit of Rails magic.
- Storing the Model in a variable, and then running the creation methods against the variable. Albeit powerful, this can be particularly hard to discover, especially when coupled with Dependency Injection. E.g.:
def self.creation(model_variable, attributes) model_variable.create!(attributes) end
The Problem
We want to answer the following questions:
- Are there instances where our model creates an object without using the Creator?
- For any such instance, where is it located in the codebase?
Towards a solution
Glimpse
In what I could only describe as a fever dream, I suddenly had a glimpse of hope. What if the Active Record Callbacks, which initially caused us grief, could actually help us? This gives us an entry into every instance of a creation. But then...what?
Initial Approach
Once we are actually using the Active Record Callback, the new question to ask is "How do we know where we came from?". This is to answer our two questions:
- We want to identify whether the object is created by the Creator, which should be our "centralized pattern".
- If we're not coming from the Creator, then we want to know how we got there in order to report it.
Luckily, programming languages tend to keep track of the Stack Trace (particularly helpful when raising an error). And ruby exposes it.
Proof of Concept
Equipped with the trusty console, binding.pry and this wild idea, my journey began.
I opted to create a very simple model (Primary
) and started to test things out. My initial model looked as follows, with one after_create
hook, simply connected to a binding.pry
.
class Primary < ApplicationRecord
after_create :test_location
private
def test_location
binding.pry
end
end
I then went into the console, created a new Primary
and looked at the StackTrace. My first surprise was with how much of the trace was populated with rails activity, mostly from ActiveRecord
and ActiveSupport
. This meant that gems themselves would make the trace harder to parse, and it wouldn't be as simple as seeing what line we had right before the callback.
After some research, I discovered that ActiveSupport
actually has a backtrace cleaner that can simplify this, although it will come at a performance cost. I thus updated my test_location
method:
def test_location
backtrace = Thread.current.backtrace
bc = ActiveSupport::BacktraceCleaner.new
bc.add_filter { |line| line.gsub(Rails.root.to_s, '') }
bc.add_silencer { |line| /puma|rubygems/.match?(line) }
backtrace = bc.clean(backtrace)
binding.pry
end
Now, the backtrace was actually more succinct and cleaner.
Next, I investigated what the trace looked like from a call within the code. I created a service with a simple creator method and looked at the trace:
class PrimaryCreator
def self.create_primary
Primary.create!
end
end
Here is the backtrace I got:
["/app/models/primary.rb:7:in `backtrace'",
"/app/models/primary.rb:7:in `test_location'",
"/app/services/primary_creator.rb:3:in `create_primary'",
"(irb):1:in `<top (required)>'",
"bin/rails:4:in `require'",
"bin/rails:4:in `<main>'"]
Just as desired, we see right away where it was created. Now, as a simple improvement, let's update the test_location
to test whether we're coming from the creator:
def test_location
backtrace = Thread.current.backtrace
bc = ActiveSupport::BacktraceCleaner.new
bc.add_filter { |line| line.gsub(Rails.root.to_s, '') }
bc.add_silencer { |line| !/primary_creator/.match?(line) }
creator_used = bc.clean(backtrace).size.positive?
binding.pry
end
Let's now test this with a few more scenarios to test "creation leaks":
class PrimaryLeakAnalysis
def self.leak
primary_variable = Primary
primary_variable.create!
end
def self.use_creator
PrimaryCreator.create_primary
end
end
Here, if we run PrimaryLeakAnalysis.leak
, we'll get a false value for creator_used
, whereas if we run PrimaryLeakAnalysis.use_creator
, creator_used
will be true.
Lastly, let's test the situation with nested attributes. For this, we create a Secondary
model:
class Secondary < ApplicationRecord
belongs_to :primary
accepts_nested_attributes_for :primary
end
Let's also create a new method for this situation:
def self.nested_leak
params = { primary_attributes: {} }
Secondary.create(params)
end
Here, once we call this method, we once again see that our trace doesn't come from PrimaryCreator
, and can trace it back to coming from PrimaryLeakAnalysis.nested_leak
. However, it is important to note that the backtrace has no explicit indication that we're creating a Primary
object through creation of a Secondary
object.
Reporting
We now have a way to detect and identify these instances, what next? Well, that's up to you. We want to identify them and take an action on it. but there are different actions we could take, each with tradeoffs. Here are a few:
Raise Error
A simple solution could be to raise an error and then look at the report to analyze the error and backtrace, but this could result in problematic behavior. The object is already created, there might have been other side-effects, and we're suddenly stopping execution. Even if we were to use before_create
instead, we might already have created other objects, and we could reach an unexpected state.
Log these instances
You could simply log these instances, in the same way you usually handle other Rails logs. One potential concern with this approach, however, is that you might not get alerted, and so need to be intentional about following up.
Alert without raising an error
You could use a custom approach to notify you and your team whenever such an occurrence arises. Do keep in mind that this could become very noisy if you missed an important spot.
Performance concerns and improvements
Performance cost
While we now have a solution (at runtime) to our initial problem, finding instances where we create an object from outside our Creator, let's take a look at the actual solution. One big concern when looking at this solution is that it might be costly. At each object creation, we suddenly have a few more costly steps we need to take, since we need to parse through the backtrace. There are ways to make this parsing leaner, but this approach still requires parsing the backtrace (or using an alternate backtrace, which in itself would make every operations more costly).
Potential improvements
There are a few things we could do to avoid this cost:
Set a global variable
One naive solution would be to simply set the value of a global variable from within the creator and to check whether it is set. However, there are a lot of things we need to take into consideration, such as:
- Concurrency
- Resetting the variable (What if it somehow fails to reset?)
- Uniqueness of the variable
Create a new dedicated attribute on the model
Another solution would be to create a new attribute on our model dedicated to this purpose. While this would add irrelevant data to the model's table and "pollute" it, it's also not unheard of. It would be a more "efficient" way of handling this, but would remain inelegant. Moreover, some other engineer could simply not understand this pattern and blindly update that column outside of the Creator.
Testing Concerns
Often, people use factories purely for their tests/specs. If the factory itself calls upon the model directly, instead of the creator, this could be problematic: those calls would be detected as not coming from the creator and pass through your reporting process. A few thoughts on this matter:
- Since the whole code base should use the creator, you might consider it meaningless to create the object without the creator, making this a non-issue (although there could be some performance concerns).
- You could simply check the environment and skip the process if you're in the testing environment (although, beware of this in tests where you change the environment).
- Your reporting process might already be disabled in the testing environment (e.g.: a notification where the webhook is inactive) or insignificant to you and your team (e.g.: logs that don't raise errors).
Regardless, this remains a concern to consider.
Edge Case
Of course, things can always get complicated. One potential edge-case would be if the creator for "model A" makes a call that eventually, but unexpectedly, creates an object for "model A" without using the creator. Indeed, this would be detected as coming from our creator, despite the fact it's unintentional. However, if that's the case, there might be different concerns about how the creator pattern is used and the rise of circular dependencies.
Potential extension: Update
The next logical step would be to consider how we would handle this when centralizing the process of updating a model object, as opposed to creating a model object. We should be able to use the same initial solution we've highlighted with the backtrace, replacing after_create
with after_update
(or even after_save
to combine both create and update). And considering how much harder it can be to find instances of a model's object being updated than being created (since we usually need to store the object in a variable), this seems all the more critical. However, we need to be careful with some of our options.
First, since it's harder to identify all locations where we update an object for a specific model, missed instances are likely to be more prevalent, meaning a lot of potential noise through notifications.
Next, not all solutions highlighted for create
work as easily for update
. For instance, one would need to be more careful with the "dedicated attribute" approach. Unlike creation (a one-time action), updates can happen many times throughout the life-cycle of an object, so we need to consider not just the current update, but also the next one. If we still wanted to take this approach, we do have a few options:
- We could update the attribute back during the callback. However, this is not straightforward. This second update would be detected as not respecting the pattern, so you would need a special clause to test against this specific change. Moreover, this would mean updating the object twice for one set of changes.
- This new column could be
datetime
, and we could relate it back to anupdated_at
column. However, one needs to be particularly careful about proper handling of differentdatetime
attributes when they need to be correlated.
Final Thoughts
Here, I've outlined an outlandish way to test whether the Creator pattern is respected. Note that there might be other ways to test this out, and also better improvements than those suggested here. Moreover, it's quite likely that there already is extended literature on this, but I wanted to explore this concept and share my thoughts before researching it.
Feel free to share your thoughts and suggestions, and any references to existing solutions to this problem!
Ideal Solution
I would also like to highlight that, ideally, the kind of solution we'd want would not happen at runtime, but would instead analyze the code itself. In theory, this should be possible, but would likely involve significant effort and require intricate knowledge of Rails' internals. Moreover, it would likely be susceptible to changes in Rails.
Subscribe to my newsletter
Read articles from Shawn directly inside your inbox. Subscribe to the newsletter, and don't miss out.
Written by