Advanced Software Engineering Series - Benchmarking
Building high-quality applications that users and developers can rely on requires verifying program behavior in a variety of ways. Developers use different techniques for this, and benchmarking is one of them. In this article we will look at the metrics used in benchmarking, get familiar with micro and macro benchmarking, and see what difficulties can arise when using this kind of process.
What is benchmarking?
Performance evaluation of software applications can be approached through two distinct methods: benchmarking and profiling. Benchmarks are tools designed to simulate various load scenarios, including concurrent usage, to assess an application's performance. They deliberately create diverse conditions to observe how the application behaves under different circumstances. To ensure reliable data, benchmarks typically run multiple iterations, allowing for the collection of meaningful statistics and reproducible results.
Profiling, while serving a similar ultimate goal of performance assessment, operates differently. Profilers take a more passive role, observing the application's execution without actively interfering. They gather statistics about the application's behavior during normal operation, rather than creating artificial load scenarios. While both techniques aim to improve application performance, they differ significantly in their approach. Benchmarks actively test the application under various conditions, while profilers quietly monitor its regular operation.
Benchmarking types
Benchmarking can be categorized into two main types, each serving different purposes in performance evaluation:
Macro benchmarks: These focus on testing the entire application under realistic usage scenarios. They simulate actions a typical user would perform, providing a holistic view of the application's performance. While effective for overall assessment, macro benchmarks may not pinpoint minor performance issues due to their broad scope.
Micro benchmarks: This approach targets specific, small portions of code. It allows for a more granular examination of the application, helping identify subtle performance bottlenecks. For example, micro benchmarks can compare different implementations of algorithms (like recursive vs. iterative Fibonacci calculations), assess various sorting methods, or evaluate the efficiency of different data structures for specific operations.
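As a minimal sketch of the Fibonacci comparison just mentioned (the class and method names here are illustrative, not from a specific library):

import org.openjdk.jmh.annotations.*;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MICROSECONDS)
@Warmup(iterations = 3)
@Measurement(iterations = 5)
@Fork(1)
public class FibonacciBenchmark {

    // Held in a @State field so the JIT cannot constant-fold the input
    @Param({"10", "20", "30"})
    private int n;

    @Benchmark
    public long recursiveFibonacci() {
        return fibRecursive(n);
    }

    @Benchmark
    public long iterativeFibonacci() {
        return fibIterative(n);
    }

    private static long fibRecursive(int n) {
        return n <= 1 ? n : fibRecursive(n - 1) + fibRecursive(n - 2);
    }

    private static long fibIterative(int n) {
        long prev = 0, curr = 1;
        for (int i = 0; i < n; i++) {
            long next = prev + curr;
            prev = curr;
            curr = next;
        }
        return prev;
    }
}

Note that each benchmark method returns its result so that JMH consumes it; this detail matters, as we'll see in the section on benchmark pitfalls.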
Both macro and micro benchmarking techniques offer valuable but distinct insights into application performance. They answer different questions about the software's behavior and efficiency. For comprehensive performance evaluation, it's often beneficial to employ both approaches, as they complement each other and provide a more complete picture of the application's performance characteristics.
Macro benchmarking example
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 2, time = 1)
@Measurement(iterations = 5, time = 1)
@Fork(1)
public class EcommerceBenchmark {

    private List<Product> inventory;
    private ShoppingCart cart;
    private Random random;

    @Setup
    public void setup() {
        inventory = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            inventory.add(new Product("Product " + i, i + 0.99));
        }
        cart = new ShoppingCart();
        random = new Random();
    }

    @Benchmark
    public double simulateShoppingSession() {
        // Add 5-10 random items to cart
        int itemsToAdd = 5 + random.nextInt(6);
        for (int i = 0; i < itemsToAdd; i++) {
            Product randomProduct = inventory.get(random.nextInt(inventory.size()));
            cart.addItem(randomProduct);
        }

        // Remove 0-2 random items
        int itemsToRemove = random.nextInt(3);
        for (int i = 0; i < itemsToRemove && !cart.getItems().isEmpty(); i++) {
            int indexToRemove = random.nextInt(cart.getItems().size());
            cart.removeItem(indexToRemove);
        }

        // Checkout
        double total = cart.checkout();

        // Clear cart for next iteration
        cart.clear();

        // Returning the total lets JMH consume the result,
        // preventing dead-code elimination of the checkout work
        return total;
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(EcommerceBenchmark.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }
}

class Product {
    private String name;
    private double price;

    public Product(String name, double price) {
        this.name = name;
        this.price = price;
    }

    public double getPrice() {
        return price;
    }
}

class ShoppingCart {
    private List<Product> items = new ArrayList<>();

    public void addItem(Product product) {
        items.add(product);
    }

    public void removeItem(int index) {
        items.remove(index);
    }

    public List<Product> getItems() {
        return items;
    }

    public double checkout() {
        double total = 0;
        for (Product item : items) {
            total += item.getPrice();
        }
        // Simulate some processing time
        try {
            Thread.sleep(50); // 50ms processing time
        } catch (InterruptedException e) {
            // Restore the interrupt flag instead of swallowing it
            Thread.currentThread().interrupt();
        }
        return total;
    }

    public void clear() {
        items.clear();
    }
}
This macro benchmark simulates a typical e-commerce user session. Notice that we are benchmarking the entire workflow of a user session, from browsing to checkout. Remember that in a real-world scenario you'd likely have more complex logic in your Product and ShoppingCart classes, and you might include database operations or API calls, which would make the benchmark even more representative of actual application performance.
Micro benchmarking example
import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

import java.sql.*;
import java.util.concurrent.TimeUnit;

@BenchmarkMode(Mode.AverageTime)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@State(Scope.Benchmark)
@Fork(value = 2, jvmArgs = {"-Xms2G", "-Xmx2G"})
@Warmup(iterations = 3)
@Measurement(iterations = 5)
public class DatabaseOperationsBenchmark {

    @Param({"100", "1000", "10000"})
    private int recordCount;

    private Connection conn;

    @Setup(Level.Trial)
    public void setUp() throws SQLException {
        // Set up H2 in-memory database
        conn = DriverManager.getConnection("jdbc:h2:mem:test");
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("CREATE TABLE users (id INT AUTO_INCREMENT PRIMARY KEY, name VARCHAR(255))");
        }
    }

    @Setup(Level.Invocation)
    public void clearTable() throws SQLException {
        // Empty the table before every benchmark call so each
        // measurement inserts into a table of the same size
        try (Statement stmt = conn.createStatement()) {
            stmt.execute("TRUNCATE TABLE users");
        }
    }

    @TearDown(Level.Trial)
    public void tearDown() throws SQLException {
        if (conn != null && !conn.isClosed()) {
            conn.close();
        }
    }

    @Benchmark
    public void individualInserts() throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (int i = 0; i < recordCount; i++) {
                stmt.executeUpdate("INSERT INTO users (name) VALUES ('User" + i + "')");
            }
        }
    }

    @Benchmark
    public void batchInserts() throws SQLException {
        try (Statement stmt = conn.createStatement()) {
            for (int i = 0; i < recordCount; i++) {
                stmt.addBatch("INSERT INTO users (name) VALUES ('User" + i + "')");
            }
            stmt.executeBatch();
        }
    }

    @Benchmark
    public void preparedStatementInserts() throws SQLException {
        try (PreparedStatement pstmt = conn.prepareStatement("INSERT INTO users (name) VALUES (?)")) {
            for (int i = 0; i < recordCount; i++) {
                pstmt.setString(1, "User" + i);
                pstmt.executeUpdate();
            }
        }
    }

    @Benchmark
    public void preparedStatementBatchInserts() throws SQLException {
        try (PreparedStatement pstmt = conn.prepareStatement("INSERT INTO users (name) VALUES (?)")) {
            for (int i = 0; i < recordCount; i++) {
                pstmt.setString(1, "User" + i);
                pstmt.addBatch();
            }
            pstmt.executeBatch();
        }
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(DatabaseOperationsBenchmark.class.getSimpleName())
                .build();
        new Runner(opt).run();
    }
}
This example is particularly useful because database operations are often a bottleneck in real-world applications, and choosing the right method for bulk inserts can significantly impact performance. Remember that while this benchmark uses an in-memory database for simplicity, in a real-world scenario you'd want to benchmark against your actual production database to get the most accurate results.
How to measure benchmarks
Benchmarking in software development typically involves creating methods to test specific operations within an application. These tests use specialized metrics to quantify performance, often through repeated executions of the same operation. Two key metrics commonly used are:
Throughput: Measures the number of times a method can be executed within a set time frame. It's essentially a measure of the operation's speed and efficiency.
Latency: This metric focuses on the time taken for a single method execution. Latency can be represented in various ways:
- Average time across all benchmark runs
- Minimum and maximum execution times
- Percentiles, which provide a more nuanced view of the performance distribution

Percentiles are particularly useful as they show the percentage of executions that fall below a certain time threshold. For instance, a 90th percentile latency of 12ms means that 90% of the executions took 12ms or less. This can reveal more about performance characteristics than simple averages or extremes. For example, you might have a maximum latency of 20ms, but a 90th percentile of 12ms. This indicates that while some outlier executions took up to 20ms, the vast majority (90%) completed in 12ms or less. This information helps in understanding both typical performance and potential outliers.
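Both metrics can be collected in a single JMH run. The sketch below (the class name and sorting workload are illustrative, not taken from the examples above) requests throughput and sampled latency together: Mode.Throughput reports how many operations complete per unit of time, while Mode.SampleTime samples individual execution times and prints percentiles such as p0.50 and p0.90 in its output.

import org.openjdk.jmh.annotations.*;

import java.util.Arrays;
import java.util.Random;
import java.util.concurrent.TimeUnit;

@State(Scope.Benchmark)
@BenchmarkMode({Mode.Throughput, Mode.SampleTime})
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 3)
@Measurement(iterations = 5)
@Fork(1)
public class MetricsBenchmark {

    private int[] data;

    @Setup
    public void setup() {
        // Fixed seed so every run sorts the same input
        data = new Random(42).ints(10_000).toArray();
    }

    @Benchmark
    public int[] sortCopy() {
        int[] copy = data.clone();
        Arrays.sort(copy);
        return copy; // consumed by JMH, so the sort is not optimized away
    }
}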
The specific format and detail of benchmark results can vary depending on the benchmarking tool used and how it's configured. Different tools and settings can provide varying levels of granularity and types of information, allowing you to tailor the benchmarking process to your specific needs and requirements.
Problems with benchmarks
Benchmarking in Java can be complicated by the JVM's optimization techniques, which, while generally beneficial, can skew benchmark results. Below are some common issues to be aware of:
Constant folding: When operations use hard-coded values, the JVM may pre-compute results rather than executing the operation each time. This optimization can make operations appear faster than they would be with variable inputs.
Dead-code elimination: The JIT compiler may remove code it deems unnecessary within the context of the benchmark, even if that code is used elsewhere in the full application. For example, a method that performs a calculation but doesn't use the result might be entirely removed by the compiler, leading to inaccurate timing results. The sketch after this list shows how to guard against both constant folding and dead-code elimination.
JVM warm-up effects: The JVM's performance improves over time as it optimizes frequently executed code. Initial iterations of a benchmark may be slower due to class loading and other startup processes. Proper benchmarking typically involves a warm-up period to allow the JVM to reach a steady state before measuring performance.
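As a minimal sketch of how to defend against the first two pitfalls (the class name and logarithm workload are illustrative): keep inputs in non-final @State fields so the JIT cannot constant-fold them, and make sure results are consumed, either by returning them or by sinking them into JMH's Blackhole.

import org.openjdk.jmh.annotations.*;
import org.openjdk.jmh.infra.Blackhole;

@State(Scope.Benchmark)
public class PitfallsBenchmark {

    // A non-final @State field: the JIT cannot treat it as a
    // compile-time constant, so the computation is not pre-folded
    private double x = 3.14159;

    @Benchmark
    public double measureRight() {
        // Returning the result means JMH consumes it,
        // so the JIT cannot eliminate the computation as dead code
        return Math.log(x);
    }

    @Benchmark
    public void measureWithBlackhole(Blackhole bh) {
        // Alternatively, sink the value into a Blackhole explicitly
        bh.consume(Math.log(x));
    }

    @Benchmark
    public void measureWrong() {
        // Broken: the result is unused, so the JIT may remove
        // the call entirely and this method measures nothing
        Math.log(x);
    }
}

The measureWrong variant is the anti-pattern: because its result is discarded, the JIT is free to remove the computation, and the benchmark may report implausibly fast times.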
Conclusion
Benchmarking is a crucial, and often underestimated, technique for enhancing application performance, and this introduction has provided a foundation for understanding its importance and basic principles. We've explored the two primary types of benchmarks, macro and micro, and discussed key metrics used to measure performance. Additionally, we've highlighted some common pitfalls and challenges in benchmarking, such as dealing with JVM optimizations that can skew results.
Remember that benchmarking is not just about measuring speed, but about gaining insights that lead to informed decisions in software design and optimization. As you progress, you'll learn to balance the trade-offs between different performance aspects and how to apply benchmarking results to make meaningful improvements to your applications. Happy development!