Optimizing Automation: Overcoming Key Challenges

Introduction:

Test Automation is critical to modern software development, driving efficiency and faster feedback for API and UI testing. For our team, it started as a well-planned effort, but over time, our automation suite grew out of control—especially the UI tests built on Selenium WebDriver. With increasing delivery pressures, missing governance, and process gaps, we found ourselves dealing with an unmanageable regression suite, plagued by long execution times and flakiness.

This blog will share the lessons we learned, the challenges we faced, and the comprehensive steps we started taking to fix our automation framework. While we are still in the middle of this transformation, we have already seen improvements in the overall status of our regression suite.

The Problem: Growing Automation Pains

In the early stages, we practised in-sprint automation, and the approach worked seamlessly for some time. However, as our application and feature set expanded, so did our automation suite. We started encountering several significant issues:

Test Suite Bloat:
- As the number of tests increased, so did the complexity and size of our suite, leading to longer execution times.
- The reliance on Selenium-based UI tests made the suite especially prone to flakiness.
Flaky UI Tests:
- Timing issues, dynamic elements, and inconsistent environments resulted in frequent false positives, eroding trust in automation.
Lack of Governance:
- Without a clear test review process, redundant and low-value tests were added without fully understanding their impact on the suite.
Maintenance Overload:
- The growing complexity meant we were spending more time on test maintenance and troubleshooting, often at the cost of writing new tests.
Environmental Challenges:
- Ensuring a stable, consistent environment for test execution became difficult, leading to sporadic failures.

Recognizing that our test suite had become more of a burden than a help, we initiated a major overhaul to address these issues.

The Turning Point: Addressing Root Causes

We took a step back to analyze the situation and identify the root causes of our problems. Here’s what we found:

Over-reliance on UI Tests: Many validations were happening at the UI level when they could have been handled more effectively at the API or unit test level.
Redundant Tests Across Layers: Several tests duplicated functionality already covered by API or unit tests, making them unnecessary.
Inconsistent Test Design: The lack of modularity in test design meant that even minor changes to the application resulted in large-scale test failures, increasing maintenance costs.
Flaky Tests with No Immediate Fixes: Flaky tests were often ignored or rerun without addressing the underlying causes, which only compounded the issue over time.

The Fix: Transforming Our Automation Strategy

We knew that to regain control, we had to make some fundamental changes. While we are still in the process of implementing these improvements, here are the key steps we have undertaken so far:

1. Migrating UI Tests from Selenium to Selenide

One of our first major initiatives was moving our UI tests from Selenium WebDriver to Selenide. This shift provided several immediate benefits:

Increased Stability: Selenide’s built-in synchronization and smart waiting mechanisms drastically reduced flakiness, making tests more reliable.
Cleaner and Concise Code: The Selenide framework allowed us to write more concise and readable tests, cutting down on boilerplate code.
Additional Features: Selenide offers features like automatic screenshots on failure, simpler handling of dynamic elements, and improved support for file downloads and uploads, which enhanced the overall robustness of our test scripts.

Although we’re still migrating tests, we have already seen improvements in the stability and reliability of the suite.

Selenide vs Selenium

2. Creating a Modular Page Object Library with Maven

To address the issue of maintainability, we created a separate Maven module dedicated to our Page Object Model (POM). This allowed us to:

Reuse Across Environments: By creating a modular design, we can use the same Page Object library for both our stage and production environments, ensuring consistency.
Simplified Updates: Any changes to the UI can be updated in the Page Object module without needing to touch the main test suite, making maintenance easier.

3. Test Data Service Using Spring Boot

One of the challenges we faced was the complexity of managing test data. To simplify this, we built a Test Data Service using Spring Boot, which generates the necessary test data through an abstracted business layer:

Simplified API Interaction: Instead of making multiple backend API calls directly from the test scripts, our test data service acts as a central hub, orchestrating API calls and returning the required data in a clean, standardized format.
Consistency Across Environments: This service helps ensure that test data remains consistent across environments, reducing flakiness due to data discrepancies.

Although we’re still fine-tuning this service, the improvement in test data consistency has been noticeable.

4. Test Execution Using Selenoid & Ggr

To improve our execution pipeline further, we adopted Selenoid and Ggr from Aerokube. These tools allowed us to run tests in browser Docker containers:

Parallel Execution: By running tests in parallel across multiple containers, we’ve already begun reducing our total execution time.
Efficient Resource Utilization: Selenoid provides lightweight, on-demand browser instances, allowing us to run tests on different browsers without maintaining a complex infrastructure.
Scalability with Ggr: Ggr enables us to scale our test execution across multiple machines, improving overall throughput and test execution efficiency.

5. Optimizing the Test Pyramid

We realigned our test strategy to follow a test pyramid approach:

API and Unit Tests as Priority: We moved as many tests as possible from the UI layer to the API and unit test levels. API tests are faster and more reliable, providing quicker feedback.
Reducing UI Tests: UI tests were reserved only for critical end-to-end workflows, minimizing their impact on test execution time and stability.

This shift is ongoing, but the reduced emphasis on UI testing has already helped reduce execution time.

6. Integrating with ReportPortal for Smarter Failure Analysis

To streamline our test result analysis, we integrated ReportPortal:

Enhanced Reporting and Triaging: ReportPortal’s comprehensive reporting allows us to track test results over time, making it easier to identify and prioritize issues.
AI-Driven Failure Analysis: With ReportPortal’s AI-powered analysis, we can automatically detect patterns in test failures and categorize them for easier triaging. This significantly speeds up our debugging process and improves the quality of our automation.

Key Lessons Learned

Modularization Is Essential: Separating concerns into distinct modules, like the Page Object library and test data service, makes the test framework more maintainable and easier to extend.
Don’t Overload the UI Layer: Focusing too much on UI tests leads to longer execution times and increased flakiness. A balanced test pyramid with more API and unit tests improves both speed and stability.
Tools Matter: Migrating to Selenide and leveraging tools like Selenoid and Ggr improved the reliability and efficiency of our test suite by providing better browser management and parallel execution capabilities.
Data Management Is Critical: The Test Data Service helped streamline data management, making tests less dependent on direct API interactions and ensuring data consistency.
Smarter Failure Analysis: ReportPortal’s AI-powered failure analysis has started saving us significant time in debugging and improving our ability to fix flaky tests and recurring issues.

Conclusion:

Our automation journey is still ongoing, and while we haven’t yet completed all the changes, the improvements we’ve seen so far have been encouraging. Moving to Selenide, creating a Test Data Service, and using Selenoid for browser execution have already contributed to more stable and faster test runs. The integration with ReportPortal is helping us catch and fix issues more efficiently, even as we continue to optimize our test suite.

The key takeaway is that while the road to improving automation may be challenging, incremental improvements can lead to significant results over time. We are excited to continue on this journey and look forward to achieving a fully optimized and maintainable automation framework soon!

Happy Testing!

Transforming Our Automation Suite: From Challenges to Solutions

Table of contents