Understanding Stop-the-World Events in Java Garbage Collection


If you’re having performance issues with your Java application, there’s a good chance that stop-the-world events are contributing to your problem. What are they? Why do they happen? How do you minimize them to get better performance?
This article looks at these questions.
Tasks Performed During Java Garbage Collection
First let’s look at the tasks carried out by the garbage collector (GC).
Mark: The GC first marks all objects that are currently in use. To do this, it starts with ‘roots’ – pointers to objects in the heap. To find them, it scans the thread stacks, references held by native code, and static variables. Following these roots, it creates a mark map, where it records which areas of memory are in use.
Sweep: During this phase, it adds all memory areas that are not in use to the list of free memory for reuse.
Compact: To avoid fragmentation, GC must from time to time compact memory so that all free space is together. This is a large task, as all references must be adjusted to point to the new location of each object.
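The mark and sweep steps above can be sketched with a toy object graph. This is only an illustration: the ToyMarkSweep and Node names are hypothetical, and a real JVM collector works on raw heap memory rather than on Java objects like these. Compaction is left out.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Toy model only: not how HotSpot is implemented.
final class ToyMarkSweep {

    // Hypothetical stand-in for a heap object that can reference other objects.
    static final class Node {
        final String name;
        final List<Node> refs = new ArrayList<>();
        Node(String name) { this.name = name; }
    }

    // Mark: follow references from the roots and record everything reachable.
    static Set<Node> mark(List<Node> roots) {
        Set<Node> marked = new HashSet<>(); // Node uses identity equality
        Deque<Node> pending = new ArrayDeque<>(roots);
        while (!pending.isEmpty()) {
            Node current = pending.pop();
            if (marked.add(current)) {
                pending.addAll(current.refs);
            }
        }
        return marked;
    }

    // Sweep: anything on the heap that was not marked can be reclaimed.
    static List<Node> sweep(List<Node> heap, Set<Node> marked) {
        List<Node> garbage = new ArrayList<>();
        for (Node n : heap) {
            if (!marked.contains(n)) {
                garbage.add(n);
            }
        }
        return garbage;
    }

    public static void main(String[] args) {
        Node a = new Node("a");
        Node b = new Node("b");
        Node c = new Node("c");
        Node d = new Node("d");
        a.refs.add(b); // a -> b
        c.refs.add(d); // c -> d
        d.refs.add(c); // d -> c: a cycle that no root can reach
        List<Node> heap = List.of(a, b, c, d);
        Set<Node> live = mark(List.of(a)); // a is the only root
        for (Node n : sweep(heap, live)) {
            System.out.println("reclaim " + n.name);
        }
    }
}
```

With a as the only root, the sweep reports c and d. They reference each other, but nothing reachable from a root points at them, which is why tracing collectors have no trouble with cycles.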
What are Stop-the-World Events, and Why Are They a Problem?
A stop-the-world event happens when the GC must pause all application threads to carry out critical tasks. No actual work can take place until the GC finishes its job. By doing this, the GC can guarantee that no new objects are created during this period, and no existing objects will become unreachable.
These pauses add latency: the application stops responding for the duration of each one. Batch jobs can tolerate a certain amount of latency. User-facing software, as well as process control and robotics applications, have less tolerance.
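One rough way to see such pauses from inside an application is to sleep for a short, fixed interval and measure how much longer than requested each sleep took. The sketch below assumes that approach; PauseProbe is a hypothetical name. The overshoot it reports includes operating system scheduling delays as well as GC pauses, so treat it as an indicator rather than a measurement of GC alone.

```java
// Minimal sketch: observes stalls of the current thread, whatever their cause.
final class PauseProbe {

    // Sleeps repeatedly and returns the largest overshoot seen, in milliseconds.
    static double maxOvershootMillis(int samples, long sleepMillis) {
        long worstNanos = 0;
        for (int i = 0; i < samples; i++) {
            long start = System.nanoTime();
            try {
                Thread.sleep(sleepMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop sampling
                break;
            }
            long elapsed = System.nanoTime() - start;
            long overshoot = elapsed - sleepMillis * 1_000_000L;
            worstNanos = Math.max(worstNanos, overshoot);
        }
        return worstNanos / 1_000_000.0;
    }

    public static void main(String[] args) {
        double worst = maxOvershootMillis(1_000, 1);
        System.out.println("Largest stall observed (ms): " + worst);
    }
}
```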
How Do Newer GC Algorithms Reduce Latency?
In early GC algorithms, all GC tasks were stop-the-world. Much work has been done to reduce latency, including:
Performing some tasks concurrently with application threads. These may include root scanning, some of the marking, and identifying regions needing compaction;
Splitting memory into smaller regions, each of which is cleaned separately. This means that each GC cycle is completed much faster, reducing pause times;
Activating several parallel GC threads to complete tasks quickly.
How Do You Monitor Latency?
You can activate GC logging using command line arguments when you invoke the JVM. These logs provide valuable information about how GC is performing.
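On JDK 9 and later, GC logging is typically enabled with an option such as -Xlog:gc*:file=gc.log. On JDK 8 the equivalent is -XX:+PrintGCDetails -Xloggc:gc.log. The file name gc.log is just an example. As a complement to the log file, the JVM also exposes basic counters through the standard java.lang.management API. The sketch below reads them. GcStats is a hypothetical class name, and the reported time is an approximate accumulated collection time, which for concurrent collectors is not the same thing as pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch: reads the GC counters the running JVM exposes.
final class GcStats {

    // Sums the approximate collection time, in milliseconds, across all collectors.
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime(); // -1 if the collector does not report it
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", time(ms)=" + gc.getCollectionTime());
        }
        System.out.println("Total (ms): " + totalCollectionMillis());
    }
}
```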
A useful monitoring tool is GCeasy, which analyzes the logs and provides key performance indicators and other diagnostic information. The image below is a section of a GCeasy report.
Fig: Key Performance Indicators from GCeasy Report
This shows the average and maximum pause times, as well as a graph analyzing the range. This is extremely useful for tuning and monitoring. You can adjust JVM parameters and quickly see the effect.
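To make the average and maximum pause figures concrete, here is a minimal sketch that computes them from a few log lines. It assumes lines shaped like the JDK unified logging output, where each pause line contains the word Pause and ends with a duration in milliseconds. The sample lines are made up for illustration, GcPauseSummary is a hypothetical name, and a real analyzer handles many more formats than this.

```java
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Minimal sketch: extracts pause durations from log lines of an assumed format.
final class GcPauseSummary {

    // Matches lines that mention "Pause" and end in a duration such as "3.500ms".
    private static final Pattern PAUSE =
            Pattern.compile("\\bPause\\b.*\\s(\\d+(?:\\.\\d+)?)ms\\s*$");

    static DoubleSummaryStatistics summarize(List<String> logLines) {
        return logLines.stream()
                .map(PAUSE::matcher)
                .filter(Matcher::find) // keep only the pause lines
                .mapToDouble(m -> Double.parseDouble(m.group(1)))
                .summaryStatistics();
    }

    public static void main(String[] args) {
        // Illustrative sample lines, not taken from a real application.
        List<String> sample = List.of(
                "[0.512s][info][gc] GC(0) Pause Young (Normal) (G1 Evacuation Pause) 24M->4M(256M) 3.500ms",
                "[1.204s][info][gc] GC(1) Pause Young (Normal) (G1 Evacuation Pause) 60M->9M(256M) 6.500ms",
                "[2.050s][info][gc] GC(2) Concurrent Mark Cycle 12.000ms",
                "[2.100s][info][gc] GC(2) Pause Remark 70M->70M(256M) 2.000ms");
        DoubleSummaryStatistics stats = summarize(sample);
        System.out.println("pauses=" + stats.getCount()
                + ", average(ms)=" + stats.getAverage()
                + ", max(ms)=" + stats.getMax());
    }
}
```

The concurrent cycle line is skipped on purpose: it runs alongside the application threads, so it is not a stop-the-world pause.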
Can You Reduce the Duration of Stop-the-World Events?
Stop-the-world events are an essential part of GC, but by tuning and monitoring, it should be possible to reduce their frequency and duration considerably.
Here are some pointers as to what factors may help to minimize latency.
Make sure each of the heap spaces is sized appropriately for your application;
The young generation (YG) should usually be fairly small in comparison to the old generation (OG). Because all new objects are created in YG, the GC has to stop application threads for much of the work of cleaning it, so YG should be small enough to be cleaned very quickly.
Tweaking the tenuring and fill thresholds may help, as may increasing the number of GC threads.
Select the correct algorithm to suit your application. G1 generally has much lower latency than earlier algorithms. Shenandoah and ZGC work better with larger heaps if you have an up-to-date version of the JVM.
Don’t use System.gc() within your applications. This triggers a major GC event, and is likely to increase latency. You can, in fact, disable this feature using JVM command line arguments.
Ensure you have sufficient RAM and CPU cores for the task.
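When experimenting with the pointers above, it helps to confirm what the running JVM actually received. The sketch below prints the maximum heap, the CPU cores visible to the JVM, and the startup options, and checks for -XX:+DisableExplicitGC, the HotSpot option that makes calls to System.gc() do nothing. JvmSettings is a hypothetical class name, and the flag values in the comment are placeholders rather than recommendations.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

// Minimal sketch: reports the settings the running JVM was started with.
// Example launch (values are placeholders, app.jar is hypothetical):
//   java -Xms2g -Xmx2g -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -XX:+DisableExplicitGC -jar app.jar
final class JvmSettings {

    // True if the given JVM options turn calls to System.gc() into no-ops.
    static boolean explicitGcDisabled(List<String> jvmArgs) {
        return jvmArgs.contains("-XX:+DisableExplicitGC");
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        List<String> jvmArgs = ManagementFactory.getRuntimeMXBean().getInputArguments();
        System.out.println("Max heap (MB): " + rt.maxMemory() / (1024 * 1024));
        System.out.println("CPU cores visible to the JVM: " + rt.availableProcessors());
        System.out.println("JVM options: " + jvmArgs);
        System.out.println("System.gc() disabled: " + explicitGcDisabled(jvmArgs));
    }
}
```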
For more detailed information, it’s worth reading this article that deals with Java Garbage Collection principles and tuning.
Conclusion
Stop-the-world events are an essential part of Java garbage collection. They don’t, however, need to have a negative impact on our applications.
With proper monitoring and tuning, we can reduce latency to acceptable levels.
Written by Jill Thornhill
