Concurrency, isolation and fault-tolerance


Before getting into Elixir/BEAM, I did a lot of JavaScript, Java, and Python in my day job. They all have similar patterns for dealing with errors, typically using try/catch.
There are some nuances across different platforms and languages, but in essence, it's about writing code that can fail inside a try block and handling it in a catch block.
However, Elixir provides a very different way of dealing with errors, which I want to explore in this blog post.
Terminology
Before we dive in, here are a few notes on terminology:
BEAM / Elixir – Interchangeable for the context of this blog post.
Errors – Issues we know can go wrong, e.g., "404 Not Found."
Faults – Anything that can go wrong, including issues we did not anticipate. All errors are faults, but not all faults are errors.
So, let's get into it.
A New Way of Thinking
I mentioned this in my introductory post on The Beauty of Elixir. Elixir is built around lightweight, isolated processes with a small memory footprint. They are cheap enough that running hundreds of thousands, or even millions, of them at the same time is feasible.
This is similar to how humans function. We live our lives with some memory, and when we need something from others, we communicate, continue our business, and await a response.
This is how BEAM works. If you need something from another process, you can only interact with it by sending a message.
Since processes are ubiquitous in Elixir, we do this a lot.
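Here is a minimal sketch of that request/response pattern using only spawn, send, and receive. The message shapes ({:favorite_pokemon, from} and {:reply, name}) are made up for this illustration.

```elixir
# Message passing between two processes.
# The message shapes below are assumptions for this sketch, not a convention.
parent = self()

responder =
  spawn(fn ->
    # Wait for a single request, then reply to whoever asked.
    receive do
      {:favorite_pokemon, from} -> send(from, {:reply, "Pikachu"})
    end
  end)

# Ask by sending a message. send/2 returns immediately, so the caller
# could carry on with other work before checking its mailbox.
send(responder, {:favorite_pokemon, parent})

answer =
  receive do
    {:reply, name} -> name
  after
    # Give up after one second instead of waiting forever.
    1_000 -> :timeout
  end

IO.puts("Got: #{answer}")
```

The two processes never share memory. The only thing that crosses the boundary is the message itself.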
What Does This Have to Do with Fault Tolerance?
Well, if all processes are isolated, so are the faults!
This was a big aha! moment for me.
I always thought concurrency was just a means to improve performance, but with the right constraints—like those in BEAM—it also increases availability.
Error Handling in Traditional Languages
In most other languages, if something fails unexpectedly, it throws an exception. If it isn't handled, it propagates up the call stack and can terminate the thread or even the whole process, taking the server down with it.
It’s common practice to wrap code in an overarching try/catch block, allowing exceptions to bubble up to a higher level where they can be handled.
But handling errors is not always enough. An error might leave the system in a bad state or disrupt an ongoing operation. One error can cascade into multiple errors, leading to a system-wide failure.
We've all been there—everything suddenly goes boom.
Depending on the platform, this can become catastrophic. In Node.js, for example, an uncaught exception terminates the entire process by default, dropping every in-flight request and causing massive service degradation or an outright outage.
Typically, a lot of infrastructure-level instrumentation is required to recover from or even observe these kinds of problems.
Elixir’s Approach
Elixir does have try/catch (and try/rescue for exceptions), but it is rarely used. Instead, it provides a much better way of handling faults, particularly those that are unpredictable.
Concurrency and Isolation
In the previous section, I mentioned how exceptions can put the system in a bad state and trigger a chain reaction that leads to a full-scale outage.
This is where process isolation in BEAM helps us—if a process fails, it doesn’t impact other processes. The bad state is confined to that one small process.
This is how BEAM improves system availability.
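To make the isolation concrete, here is a small sketch in which one process crashes while another keeps running. It assumes the processes are not linked to each other. Linked processes do propagate exits, which is what supervisors build on. The variable names are placeholders.

```elixir
# A long-lived process standing in for "some other user".
healthy =
  spawn(fn ->
    receive do
      :stop -> :ok
    end
  end)

# spawn_monitor/1 starts an unlinked process and monitors it, so the caller
# is told about the exit without being taken down by it.
{crashing, ref} = spawn_monitor(fn -> raise "boom" end)

reason =
  receive do
    {:DOWN, ^ref, :process, ^crashing, reason} -> reason
  after
    1_000 -> :no_exit_seen
  end

IO.inspect(reason, label: "crashed process exited with")
IO.inspect(Process.alive?(healthy), label: "healthy process still alive?")
IO.inspect(Process.alive?(self()), label: "caller still alive?")
```

The runtime logs the crash, but neither the caller nor the healthy process is affected by it.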
However, this requires a shift in our mental model for designing systems.
Example: A User’s Favorite Pokémon List
Let’s consider an application where each user maintains a list of their favorite Pokémon.
In the BEAM ecosystem, we can create a separate process for each user to manage their list. Users can add, delete, or update their lists, and all these actions remain isolated to their respective processes.
If an error occurs while updating a user’s list, only that user is affected—everyone else remains unaffected. Since each user has an isolated process, their experience is independent of others.
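A rough sketch of this idea, using one Agent process per user. The FavoritePokemon module and its functions are hypothetical names for this post, and a real application would more likely start these processes under a supervisor instead of calling Agent.start/1 directly.

```elixir
defmodule FavoritePokemon do
  # Hypothetical module for illustration: each user gets their own Agent process.

  # Agent.start/1 is used (not start_link) so a crash stays with that one process.
  def start(initial \\ []), do: Agent.start(fn -> initial end)

  def add(pid, name), do: Agent.update(pid, fn list -> [name | list] end)

  # Names are prepended on add, so reverse to return them in insertion order.
  def list(pid), do: Agent.get(pid, fn list -> Enum.reverse(list) end)
end

{:ok, ash} = FavoritePokemon.start()
{:ok, misty} = FavoritePokemon.start()

FavoritePokemon.add(ash, "Pikachu")
FavoritePokemon.add(misty, "Starmie")
FavoritePokemon.add(misty, "Psyduck")

# Simulate a fault in Ash's process only, and wait until it is really gone.
ref = Process.monitor(ash)
Process.exit(ash, :kill)

receive do
  {:DOWN, ^ref, :process, _pid, _reason} -> :ok
after
  1_000 -> :ok
end

IO.inspect(Process.alive?(ash), label: "ash's process alive?")
IO.inspect(FavoritePokemon.list(misty), label: "misty's list")
```

Ash's process is gone, along with its in-memory list, while Misty's list is untouched.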
You might wonder if this approach is feasible. This is where BEAM’s power comes into play.
Processes are cheap enough that hundreds of thousands, or even millions, of them can run on a single node. Additionally, when a user leaves, their process can be stopped, and when they return, a new one is started for them.
This lightweight, process-based architecture helps you scale naturally.
Enter Supervisors
Supervision is a core concept in BEAM, provided by OTP.
We can have dedicated processes whose sole purpose is to monitor other processes. These are called supervisors.
Processes in BEAM are so lightweight that they can be started in microseconds, making it practical to simply restart a failing process!
Restarting is just one way of handling faults. Because processes are lightweight and easily monitored, BEAM provides several ways to deal with errors effectively.
For example, consider a pool of database connections. If a connection breaks, instead of panicking, we can just restart the process and try connecting again.
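As a sketch of what restart-on-failure looks like, the snippet below puts a single Agent under a supervisor, kills it, and waits for the replacement. The :favorites name and the stored data are placeholders, and a real connection pool would normally come from a library. This only shows the restart mechanics.

```elixir
# One supervised child: an Agent registered under a (hypothetical) name.
children = [
  %{
    id: :favorites,
    start: {Agent, :start_link, [fn -> [] end, [name: :favorites]]}
  }
]

# :one_for_one restarts only the child that failed.
{:ok, sup} = Supervisor.start_link(children, strategy: :one_for_one)

old_pid = Process.whereis(:favorites)
Agent.update(:favorites, fn list -> ["Pikachu" | list] end)

# Simulate a fault by killing the child, then wait for it to go down.
ref = Process.monitor(old_pid)
Process.exit(old_pid, :kill)

receive do
  {:DOWN, ^ref, :process, ^old_pid, _reason} -> :ok
after
  1_000 -> :ok
end

# Poll briefly until the supervisor has registered the replacement process.
new_pid =
  Enum.find_value(1..100, fn _attempt ->
    case Process.whereis(:favorites) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        nil
    end
  end)

IO.inspect(new_pid != old_pid, label: "restarted with a new pid?")
IO.inspect(Agent.get(new_pid, fn list -> list end), label: "state after restart")
```

Note that the restarted process begins from its initial state. The supervisor brings the process back, not the data it held, so anything that must survive a crash has to live somewhere else.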
All of this is built into the language and platform as first-class features!
Written by Varenya Thyagaraj
