CircuitBreaker in AEM with Resilience4j Guide

Introduction

Recently, our AEM servers became unresponsive because all request threads ended up in a hung state. This happened when the backend servers stopped responding. To prevent a repeat, we implemented a Circuit Breaker in AEM so that the website remains operational even when a backend system fails.

The solution had to minimize the impact on users and meet the following requirements:

Stop or pause only the specific APIs that are unresponsive, so that other website functions continue to operate smoothly.
Once an API is stopped, keep checking the backend system periodically and resume the calls as soon as it responds again.
Retry the API call if no response is received within a specific time.
Free the calling threads when there is no response, to avoid thread exhaustion.

We evaluated several options for implementing a Circuit Breaker in AEM and eventually agreed to use Resilience4j.

Why Resilience4j

Active Maintenance: Resilience4j is being actively developed and updated.
Lightweight: Resilience4j 1.x has Vavr as its only notable external library dependency, and Resilience4j 2.x removes Vavr as well.
Feature Rich: Resilience4j supports circuit breaking, rate limiting, retry, bulkhead, time limiting and caching.
Flexibility: You can choose only the modules you need.

Circuit Breaker in Resilience4j

Circuit Breaker in Resilience4j supports three normal states: OPEN, CLOSED and HALF_OPEN.

Initially the Circuit Breaker is in the CLOSED state, i.e. all calls are allowed to pass.
The state changes from CLOSED to OPEN when the failure rate or the percentage of slow calls is equal to or greater than the configured threshold.
The Circuit Breaker rejects all calls while it is in the OPEN state.
After the specified wait time, the state changes from OPEN to HALF_OPEN.
In the HALF_OPEN state a configured number of calls are permitted to pass through.
Additional calls are rejected in the HALF_OPEN state until the permitted calls have completed.
The state changes from HALF_OPEN to CLOSED if the failure rate and slow call rate are both below the threshold.
The state changes back from HALF_OPEN to OPEN if the failure rate or slow call rate is still equal to or greater than the threshold.
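The transitions listed above can be sketched with plain JDK code. This is only a simplified illustration and not how Resilience4j is implemented: the class name `TinyCircuitBreaker` and its parameters are hypothetical, it evaluates fixed batches of calls instead of a sliding window, it ignores slow calls, and it is not thread-safe.

```java
import java.time.Duration;

/** Simplified, single-threaded illustration of the state cycle; not Resilience4j code. */
final class TinyCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;              // calls evaluated per batch while CLOSED
    private final double failureRateThreshold; // percentage that trips the breaker
    private final int halfOpenCalls;           // trial calls evaluated while HALF_OPEN
    private final long waitNanos;              // time to stay OPEN before probing again

    private State state = State.CLOSED;
    private int calls;
    private int failures;
    private long openedAtNanos;

    TinyCircuitBreaker(int windowSize, double failureRateThreshold,
                       int halfOpenCalls, Duration waitInOpenState) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.halfOpenCalls = halfOpenCalls;
        this.waitNanos = waitInOpenState.toNanos();
    }

    State getState() {
        return state;
    }

    /** Returns true if a call may go through at the given time. */
    boolean tryAcquire(long nowNanos) {
        if (state == State.OPEN && nowNanos - openedAtNanos >= waitNanos) {
            moveTo(State.HALF_OPEN, nowNanos); // wait time elapsed: allow trial calls
        }
        return state != State.OPEN;
    }

    /** Records the outcome of a permitted call and re-evaluates the state once a batch is complete. */
    void record(boolean success, long nowNanos) {
        calls++;
        if (!success) {
            failures++;
        }
        int batch = (state == State.HALF_OPEN) ? halfOpenCalls : windowSize;
        if (calls < batch) {
            return; // not enough calls yet to compute a failure rate
        }
        double failureRate = 100.0 * failures / calls;
        moveTo(failureRate >= failureRateThreshold ? State.OPEN : State.CLOSED, nowNanos);
    }

    private void moveTo(State next, long nowNanos) {
        state = next;
        calls = 0;
        failures = 0;
        if (next == State.OPEN) {
            openedAtNanos = nowNanos;
        }
    }
}
```

Passing the current time in explicitly keeps the sketch deterministic. In Resilience4j itself, the corresponding behaviour is driven by the `CircuitBreakerConfig` values rather than by hand-written code.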

Solution Overview

We used Resilience4j to wrap external API calls in AEM with:

CircuitBreaker: Stops calls to a backend once its failure rate crosses the configured threshold.
Retry: Retries failed API calls automatically.
TimeLimiter: Cancels API calls that exceed the configured timeout.

We’ll modularize everything as AEM OSGi services for clean usage across Sling Models and Servlets.

NOTE: Resilience4j offers additional features such as caching; however, we used only the modules our requirements called for.

Architecture Diagram

[Figure: architecture diagram showing the control flow]

Steps:

Maven Dependencies

One benefit of Resilience4j is that you can include only the modules you need rather than the whole library.

<dependency>
  <groupId>io.github.resilience4j</groupId>
  <artifactId>resilience4j-core</artifactId>
  <version>${resilience4jVersion}</version>
</dependency>
<dependency>
  <groupId>io.github.resilience4j</groupId>
  <artifactId>resilience4j-circuitbreaker</artifactId>
  <version>${resilience4jVersion}</version>
</dependency>
<dependency>
  <groupId>io.github.resilience4j</groupId>
  <artifactId>resilience4j-retry</artifactId>
  <version>${resilience4jVersion}</version>
</dependency>
<dependency>
  <groupId>io.github.resilience4j</groupId>
  <artifactId>resilience4j-timelimiter</artifactId>
  <version>${resilience4jVersion}</version>
</dependency>

NOTE: Resilience4j 2 requires Java 17. Use Resilience4j version 1.x if your application is on Java version < 17.

Create Resilience4j Configuration

Set the CircuitBreaker, Retry and TimeLimiter configurations by fetching the values from the OSGi config.

import java.io.IOException;
import java.time.Duration;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeoutException;

import org.osgi.service.component.annotations.Activate;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Deactivate;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.retry.RetryConfig;
import io.github.resilience4j.timelimiter.TimeLimiter;
import io.github.resilience4j.timelimiter.TimeLimiterConfig;

@Component(service = MyResilience4jConfiguration.class, immediate = true)
public class MyResilience4jConfiguration {
    // One instance per API name, created lazily on first use
    private final ConcurrentHashMap<String, CircuitBreaker> circuitBreakerMap = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, Retry> retryMap = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<String, TimeLimiter> timeLimiterMap = new ConcurrentHashMap<>();

    // Backend calls run on this pool instead of the Sling request threads
    private final ExecutorService executorService = Executors.newCachedThreadPool();

    @Activate
    protected void activate(MyResilience4jOSGiConfig config) {
        // Logic to fetch the configuration from OSGi config and set the Resilience4j configuration
    }

    /**
     * Retrieves or creates a Circuit Breaker for a given API.
     */
    public CircuitBreaker getAPICircuitBreaker(String apiName) {
        // Below configuration values should be fetched from OSGi Config
        return circuitBreakerMap.computeIfAbsent(apiName, key -> CircuitBreaker.of(key, CircuitBreakerConfig.custom()
            .failureRateThreshold(50)
            .slidingWindowSize(5)
            .waitDurationInOpenState(Duration.ofSeconds(3))
            .permittedNumberOfCallsInHalfOpenState(2)
            .minimumNumberOfCalls(5)
            .maxWaitDurationInHalfOpenState(Duration.ofMillis(3000))
            .build()));
    }

    /**
     * Retrieves or creates a Retry configuration for a given API.
     */
    public Retry getAPIRetry(String apiName) {
        // Below configuration values should be fetched from OSGi Config
        return retryMap.computeIfAbsent(apiName, key -> Retry.of(key, RetryConfig.custom()
            .maxAttempts(3)
            .waitDuration(Duration.ofMillis(500))
            // TimeoutException is listed so that calls cut off by the TimeLimiter are retried too
            .retryExceptions(IOException.class, TimeoutException.class)
            .build()));
    }

    /**
     * Retrieves or creates a TimeLimiter for timeout management for a given API.
     */
    public TimeLimiter getAPITimeLimiter(String apiName) {
        // Below configuration values should be fetched from OSGi Config
        return timeLimiterMap.computeIfAbsent(apiName, key -> TimeLimiter.of(TimeLimiterConfig.custom()
            .timeoutDuration(Duration.ofSeconds(5))
            .cancelRunningFuture(true) // to ensure that cancel is called on the Future
            .build()));
    }

    public ExecutorService getExecutorServiceObj() {
        return executorService;
    }

    /**
     * Ensures the maps are cleared and the executorService is shut down upon service deactivation.
     */
    @Deactivate
    protected void deactivate() {
        circuitBreakerMap.clear();
        retryMap.clear();
        timeLimiterMap.clear();
        if (!executorService.isShutdown()) {
            executorService.shutdown();
        }
    }

}
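The configuration object passed to `activate` is not shown in this guide. A minimal sketch of what such an OSGi metatype definition might look like is given below; the interface name `MyResilience4jOSGiConfig`, the attribute names and the defaults are assumptions for illustration only, chosen to mirror the sample values hard-coded above.

```java
import org.osgi.service.metatype.annotations.AttributeDefinition;
import org.osgi.service.metatype.annotations.ObjectClassDefinition;

// Hypothetical configuration definition: names and defaults are illustrative only.
@ObjectClassDefinition(name = "My Resilience4j Configuration")
public @interface MyResilience4jOSGiConfig {

    @AttributeDefinition(name = "Failure rate threshold (%)")
    int failureRateThreshold() default 50;

    @AttributeDefinition(name = "Sliding window size (calls)")
    int slidingWindowSize() default 5;

    @AttributeDefinition(name = "Wait duration in OPEN state (seconds)")
    long waitDurationInOpenStateSeconds() default 3;

    @AttributeDefinition(name = "Permitted calls in HALF_OPEN state")
    int permittedCallsInHalfOpenState() default 2;

    @AttributeDefinition(name = "Retry max attempts")
    int retryMaxAttempts() default 3;

    @AttributeDefinition(name = "Retry wait duration (milliseconds)")
    long retryWaitDurationMillis() default 500;

    @AttributeDefinition(name = "Call timeout (seconds)")
    long timeoutSeconds() default 5;
}
```

To expose a definition like this in the OSGi configuration console, the component class is typically also annotated with `@Designate(ocd = MyResilience4jOSGiConfig.class)`, and the values read in `activate` would replace the literals in the builder calls.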

Create Resilience4jServiceExecutor Service

Create a service class that wraps the backend or third-party API call with Resilience4j.

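import java.io.IOException;
import java.util.concurrent.Callable;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import io.github.resilience4j.circuitbreaker.CircuitBreaker;
import io.github.resilience4j.retry.Retry;
import io.github.resilience4j.timelimiter.TimeLimiter;

@Component(service = MyResilience4jServiceExecutor.class, immediate = true)
public class MyResilience4jServiceExecutor {

    private static final Logger log = LoggerFactory.getLogger(MyResilience4jServiceExecutor.class);

    @Reference
    private MyResilience4jConfiguration myConfig;

    /**
     * Makes an API call using TimeLimiter, Circuit Breaker, and Retry.
     */
    public String execute(String apiName) throws MyException {
        CircuitBreaker circuitBreaker = myConfig.getAPICircuitBreaker(apiName);
        Retry retry = myConfig.getAPIRetry(apiName);
        TimeLimiter timeLimiter = myConfig.getAPITimeLimiter(apiName);

        // Innermost: run the API call on the shared executor under the TimeLimiter.
        // Note: cancelling a CompletableFuture does not interrupt the worker thread,
        // so the HTTP client should also set its own connect and read timeouts.
        Callable<String> timeLimitedCall = TimeLimiter.decorateFutureSupplier(timeLimiter,
            () -> CompletableFuture.supplyAsync(() -> {
                try {
                    return callExternalApi(apiName);
                } catch (Exception exception) {
                    // supplyAsync cannot throw checked exceptions, so wrap them
                    throw new CompletionException(exception);
                }
            }, myConfig.getExecutorServiceObj()));

        // Then the Circuit Breaker, so that timeouts are recorded as failures
        Callable<String> circuitBreakerCall = CircuitBreaker.decorateCallable(circuitBreaker, timeLimitedCall);

        // Outermost: Retry
        Callable<String> retryAPICall = Retry.decorateCallable(retry, circuitBreakerCall);

        // Execute the API service
        try {
            return retryAPICall.call();
        } catch (Exception exception) {
            logCircuitBreakerState(apiName, circuitBreaker);
            throw new MyException(exception.getMessage());
        }
    }

    /**
     * Placeholder: replace with the real backend invocation (for example, via an HTTP client).
     */
    private String callExternalApi(String apiName) throws IOException {
        throw new UnsupportedOperationException("Invoke the backend API for " + apiName + " here");
    }

    /**
     * Checks and logs the Circuit Breaker state.
     * This is used to check the state of the API when calls are not sent to backend systems.
     */
    private void logCircuitBreakerState(String apiName, CircuitBreaker circuitBreaker) {
        CircuitBreaker.State state = circuitBreaker.getState();
        if (state == CircuitBreaker.State.HALF_OPEN) {
            log.warn("[{}] Circuit is HALF-OPEN. Limited retries allowed.", apiName);
        } else if (state == CircuitBreaker.State.OPEN) {
            log.warn("[{}] Circuit is OPEN. Waiting before retrying.", apiName);
        }
    }
}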

Explanation:

Each API gets its own CircuitBreaker, Retry and TimeLimiter.
TimeLimiter is applied first, then Circuit Breaker, then Retry.
This order ensures that every API timeout is recorded as a failure by the Circuit Breaker before a Retry is attempted.
The configuration values above are samples for demonstration purposes. The CircuitBreaker, Retry and TimeLimiter configurations can be customized further based on requirements.
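To make the effect of the decoration order concrete, here is a small sketch that uses only JDK classes. It is not Resilience4j code: `DecorationOrderSketch` and its three helper methods are hypothetical stand-ins for the TimeLimiter, the Circuit Breaker's failure recording, and Retry, under the assumption that the time limit is the innermost layer and retry the outermost.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicInteger;

/** Plain-JDK stand-ins for the three layers; illustrative only. */
final class DecorationOrderSketch {

    /** Innermost layer: run the call on the executor and give up after timeoutMillis. */
    static <T> Callable<T> withTimeLimit(Callable<T> call, ExecutorService executor, long timeoutMillis) {
        return () -> {
            Future<T> future = executor.submit(call);
            try {
                return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the worker so its thread is released
                throw e;
            } catch (ExecutionException e) {
                Throwable cause = e.getCause();
                if (cause instanceof Exception) {
                    throw (Exception) cause; // surface the original failure
                }
                throw e;
            }
        };
    }

    /** Middle layer: counts every failure that reaches it, including timeouts. */
    static <T> Callable<T> withFailureCounter(Callable<T> call, AtomicInteger failures) {
        return () -> {
            try {
                return call.call();
            } catch (Exception e) {
                failures.incrementAndGet();
                throw e;
            }
        };
    }

    /** Outermost layer: re-invokes everything beneath it up to maxAttempts times. */
    static <T> Callable<T> withRetry(Callable<T> call, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        return () -> {
            Exception last = null;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    return call.call();
                } catch (Exception e) {
                    last = e; // remember the failure and try again
                }
            }
            throw last;
        };
    }
}
```

Composed as `withRetry(withFailureCounter(withTimeLimit(call, executor, 50), failures), 3)`, a call that always times out increments the failure counter once per attempt. If the counter sat inside the time limit instead, a timeout would pass it by unnoticed, which is why the order matters.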

Invocation of Resilience4jServiceExecutor in Servlet/Model class

Inject and call Resilience4jServiceExecutor from any Servlet/Model.

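import java.io.IOException;

import javax.servlet.Servlet;

import org.apache.sling.api.SlingHttpServletRequest;
import org.apache.sling.api.SlingHttpServletResponse;
import org.apache.sling.api.servlets.SlingSafeMethodsServlet;
import org.osgi.service.component.annotations.Component;
import org.osgi.service.component.annotations.Reference;

@Component(service = Servlet.class,
    property = {
        "sling.servlet.resourceTypes=" + "cq/Page",
        "sling.servlet.selectors=" + "myServlet",
        "sling.servlet.extensions=" + "json",
        "sling.servlet.methods=" + "GET"
})
public class MyServlet extends SlingSafeMethodsServlet {

    @Reference
    private transient MyResilience4jServiceExecutor resilience4jServiceExecutor;

    // The API URL doubles as the key for the per-API CircuitBreaker, Retry and TimeLimiter
    private static final String API_URL = "https://api.myservice.com/my/api";

    @Override
    protected void doGet(SlingHttpServletRequest request, SlingHttpServletResponse response) throws IOException {
        try {
            response.getWriter().write(resilience4jServiceExecutor.execute(API_URL));
        } catch (MyException exception) {
            response.getWriter().write(exception.getMessage());
        }
    }
}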

Common Pitfalls:

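Blocking request threads: Run backend calls asynchronously (for example, with CompletableFuture on a separate executor) and wait for them with a bounded timeout, so that Sling request threads are never blocked indefinitely.
Wrong circuit breaker sharing: Use a separate CircuitBreaker per API, not one shared instance for all.
Timeouts too low: Configure sensible TimeLimiter timeouts; overly aggressive values cause healthy calls to fail and trip the Circuit Breaker.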
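One detail behind the first pitfall is worth illustrating: cancelling a `CompletableFuture` does not interrupt the thread that is running the task, whereas cancelling a `Future` returned by `ExecutorService.submit` does. The sketch below demonstrates this with JDK classes only; the class and method names are hypothetical, and the 300 ms sleep merely stands in for a slow backend call.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;

/** Shows whether cancel(true) interrupts the worker thread; illustrative only. */
final class CancelBehaviourSketch {

    /** Returns true if the worker running the slow call was interrupted by cancel(true). */
    static boolean workerInterruptedByCancel(boolean useCompletableFuture) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch finished = new CountDownLatch(1);
        AtomicBoolean interrupted = new AtomicBoolean(false);

        Runnable slowCall = () -> {
            started.countDown();
            try {
                Thread.sleep(300); // stands in for a slow backend call
            } catch (InterruptedException e) {
                interrupted.set(true);
            } finally {
                finished.countDown();
            }
        };

        try {
            Future<?> future = useCompletableFuture
                ? CompletableFuture.runAsync(slowCall, executor)
                : executor.submit(slowCall);
            started.await();      // make sure the task is really running
            future.cancel(true);  // what a time limiter does on timeout
            finished.await();     // wait until the worker is done either way
            return interrupted.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.shutdown();
        }
    }
}
```

With `submit`, the worker is interrupted and released right away; with `CompletableFuture`, it keeps running until the call returns on its own. In practice this suggests also configuring connect and read timeouts on the HTTP client itself, so that worker threads are freed even when cancellation alone does not stop them.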

Best Practices:

Write JUnit tests: to verify the functionality.
Use API-specific CircuitBreakers: to avoid cascading failures.
Implement proper logging: to facilitate debugging and tracking in the event of any issues.
Handle fallback responses gracefully: don't just throw generic errors.
Keep the Resilience4j version compatible with the AEM libraries to avoid OSGi dependency issues.
Keep configuration values environment-specific (for example, in OSGi configs per run mode) rather than hard-coded.
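As one possible reading of "handle fallback responses gracefully", the sketch below returns the last successful response for an API when the protected call fails, and a caller-supplied default when nothing has been stored yet. The class `FallbackSketch` and its method are hypothetical and use only JDK types; whether serving a stale response is acceptable depends on the API in question.

```java
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

/** Serves the last good response (or a default) when a protected call fails; illustrative only. */
final class FallbackSketch {

    private final Map<String, String> lastGoodResponses = new ConcurrentHashMap<>();

    String callWithFallback(String apiName, Callable<String> protectedCall, String defaultBody) {
        try {
            String body = protectedCall.call();
            lastGoodResponses.put(apiName, body); // remember the latest successful response
            return body;
        } catch (Exception e) {
            // Covers backend errors, timeouts and calls rejected by an open circuit
            return lastGoodResponses.getOrDefault(apiName, defaultBody);
        }
    }
}
```

In a servlet, the `protectedCall` would be the call into the Resilience4j executor service, and the default could be an empty JSON document that the front end knows how to render.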

Conclusion:

By integrating Resilience4j into AEM in a modular way, the project can be safeguarded from unstable or slow external APIs. Using CircuitBreaker, Retry, and TimeLimiter together ensures both reliability and a smooth user experience.

Nitish Jain
