Preventing OutOfMemoryError: Container Process Kills

Jill Thornhill
4 min read

Intermittent errors are the most frustrating problems to deal with in production. This is especially so when an application apparently disappears without trace. Perhaps a service becomes unavailable, or a batch process terminates without completing its job.

Checking the application logs reveals nothing: there are no errors or termination messages to help us. Delving deeper, we may find a sinister message in the kernel logs, something like this:

Out of memory: Kill process 1024 (java) score 350 or sacrifice child

There is no java.lang.OutOfMemoryError and no helpful stack trace: the operating system kills the process outright, giving the application no opportunity to shut down in an orderly way.

This article looks at how to trace the cause of this problem, and solve it.

OutOfMemoryError: Container Process Kills

Container process kills occur under Linux when available memory becomes dangerously low. Unlike Windows, which simply freezes, Linux has a mechanism to prevent the entire system from crashing.

The kernel includes a mechanism known as the OOM killer, which steps in when available memory becomes critical. It uses a scoring algorithm to rank every running process and determine which is most likely to be the culprit. Having selected the one it deems rogue, it terminates it, freeing up memory to prevent a system crash.

It’s possible to tune the OOM killer to suit different requirements.
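
The scoring is exposed through the /proc filesystem. As a rough illustration (the PID 3070 is just a placeholder, and writing the adjustment requires root):

# Show the kernel's current OOM score for the process (higher = more likely to be killed)
cat /proc/3070/oom_score

# Lower the process's badness adjustment so the OOM killer is less likely to pick it
# (valid range is -1000 to 1000; -1000 exempts the process entirely)
echo -500 > /proc/3070/oom_score_adj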

This error only occurs on Linux, and it is more likely to happen in containers, since they typically run with tight memory limits.

This out-of-memory (OOM) condition is unlike other Java memory errors, in that the operating system of the device or container kills the application, rather than the JVM throwing a java.lang.OutOfMemoryError.

Where Do You Find the Error Message, and What Does It Look Like?

To find the error messages, we need to use the dmesg utility, which displays the kernel logs.

The error messages may look like this:

[ 1584.087068] Out of memory: Kill process 3070 (java)  score 547 or sacrifice child 
[ 1584.094170] Killed process 3070 (java) total-vm:56994588kB, anon-rss:35690996kB, file-rss:0kB, shmem-rss:0kB

Piping dmesg through grep 'Kill process' lets us find it quickly, as shown below.
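
For example (the exact wording varies between kernel versions; newer kernels may log "Killed process" rather than "Kill process ... or sacrifice child", so a case-insensitive search on a broader pattern is safer):

# Search the kernel ring buffer for OOM killer activity (-T prints human-readable timestamps)
sudo dmesg -T | grep -iE "kill process|killed process|out of memory"

# On systemd-based systems the kernel log is also available through journalctl
sudo journalctl -k | grep -i "out of memory"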

What Causes Container Process Kills?

Let’s briefly look at some of the issues that cause this problem. You may also like to read this article: How to Solve OutOfMemoryError: Kill Process or Sacrifice Child.

Causes include:

· The device or container has insufficient RAM;

· Other running processes are using too much memory;

· Too many processes are running on the device;

· The Java application has a memory leak or other memory management problems;

· The application has poor thread management;

· If the JVM's maximum heap (-Xmx) is configured larger than its initial heap (-Xms), it can contribute to the problem: an application that keeps requesting more memory is more likely to be flagged by the OOM killer as a rogue process (see the example after this list).
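
As a simple sketch of that last point (the heap sizes and app.jar name are placeholders, not recommendations):

# Heap grows on demand from 512 MB towards 4 GB; each expansion raises the process's memory demand
java -Xms512m -Xmx4g -jar app.jar

# Initial and maximum heap are equal, so the heap footprint is declared once, up front
java -Xms4g -Xmx4g -jar app.jar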

Analyzing the Problem

To get to the root cause of the problem, we need to look at the container as a whole to see what’s causing it to run out of memory. How many processes are running? How much memory are they using?

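A quick way to answer those questions, assuming standard Linux tooling is available, is to rank processes by resident memory; if the container runs under Docker, its overall usage and limit can be checked from the host:

# List the ten largest processes by resident set size (RSS, in kilobytes)
ps -eo pid,comm,rss,%mem --sort=-rss | head -n 10

# From the host: show each container's memory usage against its limit (Docker assumed)
docker stats --no-stream
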
We also need to look in depth at the Java application itself. Other OOM errors give us valuable clues in the application log, such as a reason for the crash and a stack trace. With this error, we have to look at all aspects, such as heap management, thread management, garbage collection and use of I/O buffers. Here are some tools that help us do this quickly.

· HeapHero: Analyzes heap usage;

· GCeasy: Evaluates garbage collection efficiency;

· FastThread: Analyzes thread-related issues;

· yCrash: Provides 360° process analytics, with AI recommendations.

Solutions

Once you’ve determined the cause of the problem, you may decide to:

· Fix application errors;

· Make -Xms and -Xmx the same;

· Increase the container’s memory (see the example after this list);

· Move some processes to a different device;

· Tune the OOM killer.
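
For example, if the container runs under Docker (an assumption; other runtimes have equivalent settings), its memory limit can be raised at startup or adjusted on an existing container. The 2g value, image name and container name below are placeholders:

# Start the container with a larger memory limit
docker run --memory=2g --memory-swap=2g my-java-app

# Raise the limit on an already-running container without recreating it
docker update --memory=2g --memory-swap=2g my-container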

Conclusion

Processes that disappear intermittently are frustrating to troubleshoot. Container process kills are a likely cause, and the kernel log is where we find evidence of them.

To diagnose the problem, look in depth at the affected application, as well as all other processes running on the device.

This article has looked at possible causes, as well as analytical tools that can help troubleshoot the problem. I hope you’ve found it helpful.
