Solving Data Consistency in Elixir with Ecto's prepare_changes
When developing an application, it's common for the state of one piece of data to depend on the state of another. This interdependence introduces complexity, as changes in one dataset must remain consistent with changes in others to avoid conflicts or inaccuracies. This challenge is often known as the data consistency problem.
Intro to Data Consistency Problem
Consider the following simple example app where you have a bank and an account that belongs to a bank.
defmodule ElixirApp.Bank do
  use Ecto.Schema

  schema "banks" do
    field(:name, :string)
    field(:is_active, :boolean)

    timestamps()
  end
end
defmodule ElixirApp.Account do
  use Ecto.Schema

  alias ElixirApp.Bank

  schema "accounts" do
    field(:name, :string)
    field(:currency, :string)
    field(:is_active, :boolean)

    belongs_to(:bank, Bank)

    timestamps()
  end
end
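For completeness, the underlying tables could be created with a migration along these lines (a minimal sketch based on the schemas above; the migration module name and the column defaults are assumptions):

defmodule ElixirApp.Repo.Migrations.CreateBanksAndAccounts do
  use Ecto.Migration

  def change do
    create table(:banks) do
      add(:name, :string)
      add(:is_active, :boolean, default: true)

      timestamps()
    end

    create table(:accounts) do
      add(:name, :string)
      add(:currency, :string)
      add(:is_active, :boolean, default: true)
      add(:bank_id, references(:banks))

      timestamps()
    end
  end
end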
In this app, a bank has an "active" status, and if the bank becomes inactive, all accounts associated with it must also be set to inactive. This data consistency rule can be implemented within the bank's domain or context module. For instance, you might implement an update_bank/2 function as shown below:
defmodule ElixirApp.Banks do
  alias ElixirApp.{Account, Bank}
  alias ElixirApp.Repo

  import Ecto.Query, warn: false

  def update_bank(%Bank{} = bank, attrs) do
    with {:ok, updated_bank} <- do_update_bank(bank, attrs),
         {:ok, _num_of_deactivated_accounts} <- maybe_deactivate_bank_accounts(updated_bank) do
      {:ok, updated_bank}
    end
  end

  defp do_update_bank(%Bank{} = bank, attrs) do
    bank
    |> Bank.changeset(attrs)
    |> Repo.update()
  end

  defp maybe_deactivate_bank_accounts(%Bank{is_active: true}) do
    {:ok, 0}
  end

  defp maybe_deactivate_bank_accounts(%Bank{id: bank_id, is_active: false}) do
    {updated_accounts_count, _} =
      from(a in Account,
        where: a.bank_id == ^bank_id,
        update: [set: [is_active: ^false]]
      )
      |> Repo.update_all([])

    {:ok, updated_accounts_count}
  end
end
Within the update_bank/2 function, each time a bank is updated, the updated bank is passed to the maybe_deactivate_bank_accounts/1 function. This function ensures that if the bank is now inactive, all associated accounts are also deactivated, maintaining consistency across related data.
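For illustration, deactivating a bank through this context function could look like the following hypothetical IEx-style sketch (the bank variable is assumed to be a Bank struct loaded from the database):

import Ecto.Query

{:ok, bank} = ElixirApp.Banks.update_bank(bank, %{is_active: false})

# Every account belonging to this bank has now been deactivated as well.
ElixirApp.Repo.all(
  from(a in ElixirApp.Account, where: a.bank_id == ^bank.id, select: a.is_active)
)
#=> [false, false]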
The previous implementation works, but it has one major drawback that developers must handle consistently: every function that updates a bank has to apply the same data consistency logic. This means that:
The data consistency logic can potentially be duplicated across different modules as the application evolves, i.e. whenever the update operation to a bank takes place outside the Banks context module itself.
The data consistency logic becomes an "unknown unknown" in the long run. This manual step introduces a risk of inconsistency if it is overlooked, potentially leading to data integrity issues (see the sketch after this list).
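As a hypothetical illustration of that risk, imagine an admin-facing module added later in the project that deactivates a bank directly through Repo (the module and function names here are invented for the example):

defmodule ElixirApp.Admin do
  alias ElixirApp.{Bank, Repo}

  # Deactivates a bank, but the author of this function did not know that
  # the bank's accounts must be deactivated too, so the data drifts apart.
  def deactivate_bank(%Bank{} = bank) do
    bank
    |> Bank.changeset(%{is_active: false})
    |> Repo.update()
  end
end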
The key takeaway is that this implementation could lead to an increase in "unknown unknowns" over time. This risk arises because developers may unintentionally overlook the need for consistent data handling, leading to hidden bugs and data inconsistencies. To mitigate this, it's essential to encapsulate or abstract this data consistency logic in a more centralized, reliable place. By doing so, we reduce code duplication and ensure that new developers don't need to be aware of this requirement upfront when adding features. This is where Ecto's prepare_changes/2 function comes in, offering a way to enforce consistency automatically as part of the repository operation.
How to Use Ecto prepare_changes/2
As explained in the official documentation, the prepare_changes/2 function allows you to specify a function that the repository will execute upon insert, update, or delete operations. This means that any function passed to prepare_changes/2 will only run when the changeset is submitted to the Repo module. This approach is especially beneficial for abstracting or encapsulating data consistency logic directly within the schema's changeset function, allowing us to enforce consistency while keeping the changeset function itself pure and focused on validation and data transformation.
Let's use the example app from before to see how this function can be used inside the Bank changeset function to maintain consistency between banks and accounts.
defmodule ElixirApp.Bank do
  use Ecto.Schema

  alias Ecto.Changeset

  import Ecto.Changeset
  import Ecto.Query, warn: false

  schema "banks" do
    field(:name, :string)
    field(:is_active, :boolean)

    timestamps()
  end

  @doc false
  def changeset(bank, attrs) do
    bank
    |> cast(attrs, [:name, :is_active])
    |> validate_required([:name, :is_active])
    |> prepare_changes(&ensure_accounts_status/1)
  end

  defp ensure_accounts_status(%Changeset{action: :update} = changeset) do
    bank_id = get_field(changeset, :id)
    bank_active_status = get_field(changeset, :is_active)

    if bank_active_status == false do
      from(a in "accounts",
        where: a.bank_id == ^bank_id,
        update: [set: [is_active: ^false]]
      )
      |> changeset.repo.update_all([])
    end

    changeset
  end

  defp ensure_accounts_status(%Changeset{action: _} = changeset) do
    changeset
  end
end
In the implementation above, the account-deactivation query inside the ensure_accounts_status/1 function is only executed when a bank is updated and its status is set to "inactive". This ensures that account status changes occur only when necessary, maintaining efficiency. Additionally, it's important to note that the changeset provided to the prepare_changes/2 callback includes access to the repository (changeset.repo), allowing the developer to perform CRUD operations as part of the data consistency logic. This makes it possible to handle complex business rules directly within the changeset, without needing to manually invoke operations outside of it.
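To make this concrete, here is a hypothetical IEx-style sketch (the bank variable is assumed to be a Bank struct loaded from the database). Building the changeset performs no database work; ensure_accounts_status/1 only runs once the changeset reaches the repository:

changeset = ElixirApp.Bank.changeset(bank, %{is_active: false})
# No database call has happened yet; the changeset is plain data.

{:ok, _updated_bank} = ElixirApp.Repo.update(changeset)
# Only now does the prepare_changes/2 callback run, using changeset.repo
# to deactivate every account that belongs to this bank.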
The use of prepare_changes/2 has the following pros and cons:
Pros:
Encapsulating the data consistency logic in the changeset function means that any update to a bank that goes through the bank's changeset function is guaranteed to produce a result consistent with related data, in this case accounts.
Less chance of duplicating the same data consistency logic across different domain functions as the application evolves.
Using the changeset function to perform data validation will not make any database call (at least not until the changeset is passed to the Repo module), therefore keeping the changeset function pure.
Cons:
Performing the data consistency logic inside the changeset could increase the sense of "magic" when performing CRUD operations on a bank. It can become slightly less obvious where the operation that deactivates accounts originates from. However, this can be justified as long as it simplifies your application development in a way that reduces bugs or errors.
Logic duplication could still happen if the Bank module has multiple changeset functions and each of them must perform the same data consistency operation. This can, however, be mitigated by abstracting the data consistency logic into a private function and reusing it across the different changeset functions, as sketched below.
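For example, if the Bank module had both the generic changeset/2 and a hypothetical status_changeset/2, both could point to the same private callback (a sketch under that assumption):

def changeset(bank, attrs) do
  bank
  |> cast(attrs, [:name, :is_active])
  |> validate_required([:name, :is_active])
  |> prepare_changes(&ensure_accounts_status/1)
end

# Hypothetical second changeset dedicated to status updates that reuses
# the exact same data consistency callback.
def status_changeset(bank, attrs) do
  bank
  |> cast(attrs, [:is_active])
  |> validate_required([:is_active])
  |> prepare_changes(&ensure_accounts_status/1)
end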
Conclusion
As effective as the prepare_changes/2 function can be for addressing data consistency problems, it's important to know that there should be limits to its use. For example, it's not advisable to overload prepare_changes/2 with complex logic or side effects, such as sending notifications, creating new bank records, or other operations that could obscure the flow of the application. Keeping the function focused on its core responsibility of ensuring data consistency will help maintain clarity and make your application's behavior more predictable to other developers.
Now that you have an alternative approach for tackling data consistency issues with Ecto, you can try it out in your project and assess whether it helps reduce the likelihood of data inconsistency in your application. By keeping your code clean and leveraging Ecto's powerful tools like prepare_changes/2, you'll be able to build more reliable and maintainable applications. Happy coding!