Solving Data Consistency in Elixir with Ecto's prepare_changes
When developing an application, it's common for the state of one piece of data to depend on the state of another. This interdependence introduces complexity, as changes in one dataset must remain consistent with changes in others to avoid conflicts or inaccuracies. This challenge is often known as the data consistency problem.
Intro to Data Consistency Problem
Consider the following simple example app where you have a bank and an account that belongs to a bank.
defmodule ElixirApp.Bank do
  use Ecto.Schema

  schema "banks" do
    field(:name, :string)
    field(:is_active, :boolean)

    timestamps()
  end
end
defmodule ElixirApp.Account do
  use Ecto.Schema

  alias ElixirApp.Bank

  schema "accounts" do
    field(:name, :string)
    field(:currency, :string)
    field(:is_active, :boolean)

    belongs_to(:bank, Bank)

    timestamps()
  end
end
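For completeness, the underlying tables could be created with a migration along these lines (a minimal sketch based on the schemas above; the migration module name and the column defaults are assumptions):

defmodule ElixirApp.Repo.Migrations.CreateBanksAndAccounts do
  use Ecto.Migration

  def change do
    create table(:banks) do
      add(:name, :string)
      add(:is_active, :boolean, default: true)

      timestamps()
    end

    create table(:accounts) do
      add(:name, :string)
      add(:currency, :string)
      add(:is_active, :boolean, default: true)
      add(:bank_id, references(:banks))

      timestamps()
    end
  end
end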
In this app, a bank has an "active" status, and if the bank becomes inactive, all accounts associated with it must also be set to inactive. This data consistency rule can be implemented within the bank's domain or context module. For instance, you might implement an update_bank/2 function as shown below:
defmodule ElixirApp.Banks do
  alias ElixirApp.{Account, Bank}
  alias ElixirApp.Repo

  import Ecto.Query, warn: false

  def update_bank(%Bank{} = bank, attrs) do
    with {:ok, updated_bank} <- do_update_bank(bank, attrs),
         {:ok, _num_of_deactivated_accounts} <- maybe_deactivate_bank_accounts(updated_bank) do
      {:ok, updated_bank}
    end
  end

  defp do_update_bank(%Bank{} = bank, attrs) do
    bank
    |> Bank.changeset(attrs)
    |> Repo.update()
  end

  defp maybe_deactivate_bank_accounts(%Bank{is_active: true}) do
    {:ok, 0}
  end

  defp maybe_deactivate_bank_accounts(%Bank{id: bank_id, is_active: false}) do
    {updated_accounts_count, _} =
      from(a in Account,
        where: a.bank_id == ^bank_id,
        update: [set: [is_active: ^false]]
      )
      |> Repo.update_all([])

    {:ok, updated_accounts_count}
  end
end
Within the update_bank/2 function, each time a bank is updated, the updated bank is passed to the maybe_deactivate_bank_accounts/1 function. This function ensures that if the bank is now inactive, all associated accounts are also deactivated, maintaining consistency across related data.
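For illustration, deactivating a bank through this context function could look like the following hypothetical IEx-style sketch (the bank variable is assumed to be a Bank struct loaded from the database):

import Ecto.Query

{:ok, bank} = ElixirApp.Banks.update_bank(bank, %{is_active: false})

# Every account belonging to this bank has now been deactivated as well.
ElixirApp.Repo.all(
  from(a in ElixirApp.Account, where: a.bank_id == ^bank.id, select: a.is_active)
)
#=> [false, false]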
The previous implementation works, but it has one major drawback that developers must handle consistently: every function that updates a bank has to apply the same data consistency logic. This means that:
The data consistency logic can potentially be duplicated across different modules as the application evolves, i.e. whenever the update operation to a bank takes place outside the Banks context module itself.
The data consistency logic becomes an "unknown unknown" in the long run. This manual step introduces a risk of inconsistency if it is overlooked, potentially leading to data integrity issues (see the sketch after this list).
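As a hypothetical illustration of that risk, imagine an admin-facing module added later in the project that deactivates a bank directly through Repo (the module and function names here are invented for the example):

defmodule ElixirApp.Admin do
  alias ElixirApp.{Bank, Repo}

  # Deactivates a bank, but the author of this function did not know that
  # the bank's accounts must be deactivated too, so the data drifts apart.
  def deactivate_bank(%Bank{} = bank) do
    bank
    |> Bank.changeset(%{is_active: false})
    |> Repo.update()
  end
end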
The key takeaway is that this implementation could lead to an increase in "unknown unknowns" over time. This risk arises because developers may unintentionally overlook the need for consistent data handling, leading to hidden bugs and data inconsistencies. To mitigate this, it's essential to encapsulate or abstract this data consistency logic in a more centralized, reliable place. By doing so, we reduce code duplication and ensure that new developers don't need to be aware of this requirement upfront when adding features. This is where Ecto's prepare_changes/2 function comes in, offering a way to enforce consistency automatically as part of the repository operation.
How to Use Ecto prepare_changes/2
As explained in the official documentation, the prepare_changes/2 function allows you to specify a function that the repository will execute upon insert, update, or delete operations. This means that any function passed to prepare_changes/2 will only run when the changeset is submitted to the Repo module. This approach is especially beneficial for abstracting or encapsulating data consistency logic directly within the schema's changeset function, allowing us to enforce consistency while keeping the changeset function itself pure and focused on validation and data transformation.
Let's use the example app from before to see how this function can be used inside the Bank changeset function to maintain consistency between banks and accounts.
defmodule ElixirApp.Bank do
  use Ecto.Schema

  alias Ecto.Changeset

  import Ecto.Changeset
  import Ecto.Query, warn: false

  schema "banks" do
    field(:name, :string)
    field(:is_active, :boolean)

    timestamps()
  end

  @doc false
  def changeset(bank, attrs) do
    bank
    |> cast(attrs, [:name, :is_active])
    |> validate_required([:name, :is_active])
    |> prepare_changes(&ensure_accounts_status/1)
  end

  defp ensure_accounts_status(%Changeset{action: :update} = changeset) do
    bank_id = get_field(changeset, :id)
    bank_active_status = get_field(changeset, :is_active)

    if bank_active_status == false do
      from(a in "accounts",
        where: a.bank_id == ^bank_id,
        update: [set: [is_active: ^false]]
      )
      |> changeset.repo.update_all([])
    end

    changeset
  end

  defp ensure_accounts_status(%Changeset{action: _} = changeset) do
    changeset
  end
end
In the implementation above, the account-deactivation query inside the ensure_accounts_status/1 function is only executed when a bank is updated and its status is set to "inactive". This ensures that account status changes occur only when necessary, maintaining efficiency. Additionally, it's important to note that the changeset provided to the prepare_changes/2 callback includes access to the repository (changeset.repo), allowing the developer to perform CRUD operations as part of the data consistency logic. This makes it possible to handle complex business rules directly within the changeset, without needing to manually invoke operations outside of it.
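To make this concrete, here is a hypothetical IEx-style sketch (the bank variable is assumed to be a Bank struct loaded from the database). Building the changeset performs no database work; ensure_accounts_status/1 only runs once the changeset reaches the repository:

changeset = ElixirApp.Bank.changeset(bank, %{is_active: false})
# No database call has happened yet; the changeset is plain data.

{:ok, _updated_bank} = ElixirApp.Repo.update(changeset)
# Only now does the prepare_changes/2 callback run, using changeset.repo
# to deactivate every account that belongs to this bank.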
The use of prepare_changes/2 has the following pros and cons:
Pros:
Encapsulating the data consistency logic in the changeset function means that any update to a bank that goes through the bank's changeset function is guaranteed to produce a result consistent with related data, in this case accounts.
Less chance of duplicating the same data consistency logic across different domain functions as the application evolves.
Using the changeset function to perform data validation will not make any database call (at least not until the changeset is passed to the Repo module), therefore keeping the changeset function pure.
Cons:
Performing the data consistency logic inside the changeset could increase the sense of "magic" when performing CRUD operations on a bank. It can become slightly less obvious where the operation that deactivates accounts originates from. However, this can be justified as long as it simplifies your application development in a way that reduces bugs or errors.
Logic duplication could still happen if the Bank module has multiple changeset functions and each of them must perform the same data consistency operation. This can, however, be mitigated by abstracting the data consistency logic into a private function and reusing it across the different changeset functions, as sketched below.
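For example, if the Bank module had both the generic changeset/2 and a hypothetical status_changeset/2, both could point to the same private callback (a sketch under that assumption):

def changeset(bank, attrs) do
  bank
  |> cast(attrs, [:name, :is_active])
  |> validate_required([:name, :is_active])
  |> prepare_changes(&ensure_accounts_status/1)
end

# Hypothetical second changeset dedicated to status updates that reuses
# the exact same data consistency callback.
def status_changeset(bank, attrs) do
  bank
  |> cast(attrs, [:is_active])
  |> validate_required([:is_active])
  |> prepare_changes(&ensure_accounts_status/1)
end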
Conclusion
As effective as the prepare_changes/2 function can be for addressing data consistency problems, it's important to know that there should be limits to its use. For example, it's not advisable to overload prepare_changes/2 with complex logic or side effects, such as sending notifications, creating new bank records, or other operations that could obscure the flow of the application. Keeping the function focused on its core responsibility of ensuring data consistency will help maintain clarity and make your application's behavior more predictable to other developers.
Now that you have an alternative approach for tackling data consistency issues with Ecto, you can try it out in your project and assess whether it helps reduce the likelihood of data inconsistency in your application. By keeping your code clean and leveraging Ecto's powerful tools like prepare_changes/2, you'll be able to build more reliable and maintainable applications. Happy coding!