Circuit Breaker Pattern: Building Resilient Microservices


Imagine you are working on a distributed system with multiple microservices. For our example we will talk about two services, A & B, where A makes an API call to B for some data.
What could go wrong in a system like this?
Quite a few things can go wrong with the connection between A & B:
- The service might take longer to respond for a while, but once the network gets better, things start working normally again without anyone needing to fix them.
- If too many users hit the service at once, it may slow down or drop requests, but as the traffic goes down or auto-scaling kicks in, it recovers by itself.
- Sometimes the service stores too much temporary data, which makes it sluggish, but it clears out old data on its own and starts running faster again.
Issues like these usually resolve on their own after some time and the services become healthy again. These are called transient faults, and they typically recover by themselves fairly quickly.
To deal with transient faults you could use the Retry pattern: a request that fails with an error is retried a certain number of times, with a buffer time between each attempt, and is only rejected once all the retries are exhausted.
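To make the idea concrete, here is a minimal retry sketch in JavaScript. It is only an illustration, not a library recommendation; the `callServiceB` function, the retry count, and the delay values are hypothetical placeholders.

```javascript
// Minimal retry sketch: retry a failing call a few times with a fixed
// delay between attempts, then give up. `maxRetries` and `delayMs` are
// illustrative values, not recommendations.
async function withRetry(operation, maxRetries = 3, delayMs = 500) {
  let lastError;
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      return await operation(); // success: return immediately
    } catch (err) {
      lastError = err;
      console.warn(`Attempt ${attempt} failed: ${err.message}`);
      if (attempt < maxRetries) {
        // buffer time between tries
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError; // all retries exhausted: reject the request
}

// Usage (callServiceB is a hypothetical call from A to B):
// withRetry(() => callServiceB()).then(console.log).catch(console.error);
```

A fixed delay is used here for simplicity; real-world retry policies often add exponential backoff and jitter between attempts.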
But there are cases where service B is not able to recover on its own, or might take a long time to get healthy again. This can happen when:
- The service tries to fetch or save data, but the data is broken or unreadable, and it won't fix itself until someone steps in to repair it.
- Someone accidentally sets the wrong database password or disables a needed feature; the service will keep failing until a human corrects the mistake.
- The service relies on another service that is completely offline (e.g. due to a server crash or removal); it won't magically start working again without fixing or replacing that dependency.
In cases like these, requests from service A should not continuously try to reach service B. Here we will use the Circuit Breaker pattern instead. But why? Because if we use the Retry pattern here, it can block concurrent requests to the same operation until the timeout period expires.
These blocked requests might hold critical system resources, such as memory, threads, and database connections. This problem can exhaust resources, which might fail other unrelated parts of the system that need to use the same resources.
So when service B goes down and is not able to recover automatically or quickly, service A should identify this and completely stop sending requests to a dead service.
| Retry Pattern | Circuit Breaker Pattern |
| --- | --- |
| Retries failed operations automatically | Prevents requests when failure is likely |
| Good for transient faults | Good for long-lasting or persistent failures |
| May block threads/resources if overused | Frees up resources quickly |
What is The Circuit Breaker Pattern?
The Circuit Breaker pattern serves a different purpose than the Retry pattern. The Retry pattern enables an application to retry an operation with the expectation that it eventually succeeds. The Circuit Breaker pattern prevents an application from performing an operation that's likely to fail. You can think of the breaker as a proxy that sits between the two services and controls whether, and how many, requests A is allowed to make to B.
Circuit Breaker as a Finite State Machine
You can implement the Circuit Breaker with the following states -
Closed: This is the default state; all requests from service A are allowed to reach service B. The circuit breaker keeps count of two things: (1) the number of failures and (2) the duration in which those failures occurred. If the count in (1) crosses a configured threshold within the window of (2), the machine moves to the second state.
Opened: Just like an MCB switch (an electrical circuit breaker) trips and cuts off the current when the load increases, protecting other appliances by breaking the connection, a microservice can cut itself off to stop other microservices from wasting their resources and potentially failing. This type of failure is also known as a cascading failure, where one failing service causes other services to fail and break down until the entire system is down.
In this state the connection between A & B is terminated, and requests from A no longer wait for a response from B. The machine stays in this state for a specified amount of time to allow service B to recover, and then moves to the third state. While in the Open state, instead of just failing requests, service A can return a default response, cached data, or an error message to the client. This improves user experience and maintains system stability.
Half-Opened: After a certain duration has passed in the Open state, the machine slowly lets a few requests through while the rest are still blocked. In this state it keeps count of (1) the number of successful requests and (2) the duration in which no errors occur. If these stay below the threshold, the machine goes back to the Open state; otherwise it confirms that service B is up and running again, moves back to the default Closed state, and everything returns to normal.
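To make the three states concrete, here is a minimal, hand-rolled sketch of such a state machine in JavaScript. It is a simplified illustration, not a production implementation: it only counts failures (it does not track the failure window or the error-free duration described above), and the `failureThreshold` and `resetTimeoutMs` values are arbitrary assumptions; in practice you would use a library like the one shown later.

```javascript
// Minimal circuit breaker state machine sketch (illustrative only).
// Thresholds and timings are arbitrary; real libraries add much more.
class SimpleCircuitBreaker {
  constructor(operation, { failureThreshold = 5, resetTimeoutMs = 5000 } = {}) {
    this.operation = operation;         // the risky call, e.g. A calling B
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.state = 'CLOSED';              // CLOSED -> OPEN -> HALF_OPEN -> ...
    this.failureCount = 0;
    this.openedAt = null;
  }

  async call(...args) {
    if (this.state === 'OPEN') {
      // After the reset timeout, allow a trial request (Half-Open)
      if (Date.now() - this.openedAt >= this.resetTimeoutMs) {
        this.state = 'HALF_OPEN';
      } else {
        throw new Error('Circuit is OPEN: request rejected immediately');
      }
    }
    try {
      const result = await this.operation(...args);
      // A success in Half-Open (or Closed) resets the breaker to Closed
      this.state = 'CLOSED';
      this.failureCount = 0;
      return result;
    } catch (err) {
      this.failureCount++;
      // A failure in Half-Open, or too many failures in Closed, opens the circuit
      if (this.state === 'HALF_OPEN' || this.failureCount >= this.failureThreshold) {
        this.state = 'OPEN';
        this.openedAt = Date.now();
      }
      throw err;
    }
  }
}

// Usage (callServiceB is a hypothetical call from A to B):
// const breaker = new SimpleCircuitBreaker(() => callServiceB());
// breaker.call().then(console.log).catch(console.error);
```

Note how a single failure in the Half-Open state immediately re-opens the circuit, while a success closes it again and resets the failure count.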
The Circuit Breaker pattern provides stability while the system recovers from a failure and minimizes the impact on performance. It can help maintain the response time of the system. This pattern quickly rejects a request for an operation that's likely to fail, rather than waiting for the operation to time out or never return. If the circuit breaker raises an event each time it changes state, this information can help monitor the health of the protected system component or alert an administrator when a circuit breaker switches to the Open state.
How can we implement this pattern?
Hystrix was one of the most famous libraries providing the circuit breaker pattern, built by Netflix, but it is no longer maintained. Resilience4j is a newer library that acts as the successor to Hystrix and is actively maintained.
The example below uses opossum (JavaScript):
```javascript
const fetch = require('node-fetch');
const CircuitBreaker = require('opossum');

// Risky call: API call to service B
async function riskyCall() {
  const response = await fetch('https://service-b/api/data');
  if (!response.ok) {
    throw new Error(`Service B failed with status ${response.status}`);
  }
  return response.json();
}

// Circuit breaker options
const options = {
  timeout: 3000,                // If call takes > 3s, it fails
  errorThresholdPercentage: 50, // If 50% of requests fail, breaker opens
  resetTimeout: 5000            // After 5s, try a request again
};

// Create circuit breaker
const breaker = new CircuitBreaker(riskyCall, options);

// Fallback when service B is unhealthy
breaker.fallback(() => {
  console.warn('Falling back as Service B is unavailable.');
  return { data: 'fallback response' };
});

// Log breaker state changes
breaker.on('open', () => console.log('Circuit is OPEN: calls blocked.'));
breaker.on('halfOpen', () => console.log('Circuit is HALF-OPEN: testing...'));
breaker.on('close', () => console.log('Circuit is CLOSED: calls allowed.'));
breaker.on('fallback', () => console.log('Fallback executed'));
breaker.on('reject', () => console.log('Call rejected: circuit is OPEN'));
breaker.on('timeout', () => console.log('Timeout: call took too long'));
breaker.on('success', () => console.log('Call succeeded'));
breaker.on('failure', () => console.log('Call failed'));

// Fire the breaker periodically (for example, every 2 seconds)
setInterval(() => {
  breaker.fire()
    .then(response => console.log('📦 Response:', response))
    .catch(error => console.error('🔥 Error:', error.message));
}, 2000);
```