Why Are My Oracle Sessions Getting SIGSTOP’d?

Finally, I encountered a case that couldn't be solved using traditional Oracle Database tools. I've been experimenting with eBPF for some time, and this is a real-world scenario from an Exadata environment where I needed eBPF to help me truly understand the problem.
The Problem
Users complained that database sessions were occasionally "frozen." Interestingly, these "frozen" sessions could not be killed. When DBAs used alter system kill session, those sessions stayed in the KILLED state.
Also, the load average of the machine where this was happening was suspiciously high all the time, even though no one complained about things running "too slow."
The Investigation
I'll keep this story brief: the sessions couldn't be killed because their dedicated server process was STOPPED. By STOPPED, I mean the Linux process was in a T state. You can replicate this by sending a SIGSTOP signal to any Linux process, like this:
kill -SIGSTOP 12345
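To see the T state for yourself without touching a real database process, here is a minimal sketch that stops and resumes a throwaway sleep process and reads its state from /proc. The helper name proc_state and the use of sleep as a stand-in are my own choices for illustration; fractional sleep assumes GNU coreutils or busybox.

```shell
#!/bin/sh
# Print the one-letter state of a PID as reported by the kernel (R, S, D, T, Z, ...).
proc_state() {
    awk '/^State:/ { print $2 }' "/proc/$1/status"
}

sleep 300 &          # harmless stand-in for a dedicated server process
pid=$!

kill -STOP "$pid"
# Signal handling is asynchronous, so poll briefly until the state changes.
i=0
while [ "$(proc_state "$pid")" != "T" ] && [ "$i" -lt 50 ]; do
    sleep 0.1
    i=$((i + 1))
done
stopped_state=$(proc_state "$pid")
echo "after SIGSTOP: $stopped_state"

kill -CONT "$pid"
i=0
while [ "$(proc_state "$pid")" = "T" ] && [ "$i" -lt 50 ]; do
    sleep 0.1
    i=$((i + 1))
done
resumed_state=$(proc_state "$pid")
echo "after SIGCONT: $resumed_state"

# Clean up the stand-in process.
kill "$pid"
wait "$pid" 2>/dev/null || true
```

The same proc_state check can be pointed at the server process of a session stuck in KILLED to confirm whether it is stopped rather than merely busy.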
So, why is Oracle stopping those sessions? Initially, I suspected it might be related to Oracle Resource Manager and cpu_count settings because the load was high. However, as far as I know, Oracle doesn't use STOP/CONTINUE signals for resmgr. I even ran a quick test in my lab using resmgr under heavy load and did not see any such signals being sent.
Observing the Signals
This is where bpftrace comes into play. We can use it to detect when the signal is sent (or received) and who sent it. And it isn't really voodoo; all you need is a simple bpftrace script:
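#!/usr/bin/env -S bpftrace
// Fires whenever a signal is generated; 19 is SIGSTOP on x86-64.
// pid/comm describe the sender, args->pid/args->comm the target.
tracepoint:signal:signal_generate /args->sig == 19/ {
    time("%Y-%m-%d %H:%M:%S");
    printf(" SIGSTOP sent from %d (%s) to %d (%s)\n", pid, comm, args->pid, args->comm);
}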
I was hoping to let this script run for a week or so and then see which processes were sending SIGSTOP signals. But to my surprise, there were tons of such signals, many of them within the same second.
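With that many events, reading the raw output line by line gets tedious. As a rough sketch, the output can be saved to a file and summarized per sender/target pair. The function name summarize_sigstop is hypothetical, and the field positions assume exactly the line format printed by the script above, with process names that contain no spaces.

```shell
#!/bin/sh
# Count SIGSTOP events per "(sender comm) -> (target comm)" pair.
# Expects lines like: DATE TIME SIGSTOP sent from PID (COMM) to PID (COMM)
# Reads the files given as arguments, or stdin if none are given.
summarize_sigstop() {
    awk '$3 == "SIGSTOP" { n[$7 " -> " $10]++ }
         END { for (k in n) print n[k], k }' "$@" |
        LC_ALL=C sort -k1,1nr -k2
}
```

Called as summarize_sigstop sigstop.log (the file name is just an example), it prints the busiest pairs first.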
When I tried to see details of the process that sent the signal (the pid is printed by the script above), I noticed that such a process no longer existed. Even when I added a call that prints information about the pid to the bpftrace script (system() requires running bpftrace with --unsafe), like this:
system("cat /proc/%d/stack", pid);
I noticed that most of the time the process was already gone by the time my script reached this call. So, these were very short-lived processes. Using a similar approach (reading PPid from /proc/pid/status), I found the parent process that was spawning them. And lo and behold, the parent process was the Oracle database session process itself!
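The PPid lookup can be sketched as a pair of small shell helpers. The names parent_of and ancestry are hypothetical, and this is the plain /proc approach rather than the exact command I ran from bpftrace. It only works while the process still exists, so for short-lived senders it has to run at the moment the signal is seen.

```shell
#!/bin/sh
# Print the parent PID of the given PID, or nothing if the process is already gone.
parent_of() {
    [ -r "/proc/$1/status" ] || return 0
    awk '/^PPid:/ { print $2 }' "/proc/$1/status" 2>/dev/null
}

# Print "PID (comm)" for a process and each of its ancestors, stopping at PID 1.
ancestry() {
    p=$1
    while [ -n "$p" ] && [ "$p" -gt 0 ]; do
        printf '%s (%s)\n' "$p" "$(cat "/proc/$p/comm" 2>/dev/null)"
        if [ "$p" -eq 1 ]; then
            break
        fi
        p=$(parent_of "$p")
    done
}
```

For example, ancestry $$ prints the current shell followed by its parents, which is the same walk that leads from a short-lived sender back to the process that spawned it.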
Turns out, the Oracle database process was spawning a child process every second or so to send a STOP signal, do something, then send a CONTINUE signal again.
Why would Oracle do that?!
The Why
To understand why, I tried profiling a database session that had been waiting on SQL*Net message from client for a while but was still receiving STOP/CONTINUE signals.
I’ve blurred out functions that were named like Something::doSomething. Notice how it looks like C++ code? If I remember correctly, most of Oracle's core is written in C. Could this really be Oracle code?!
Turns out that it is not. Inspecting shared libraries used by this process, like this:
cat /proc/12345/maps
showed that the Oracle process was using .so libraries from /opt/some-other-product.
So, it's not even Oracle code; it's code from a product that hooked into Oracle database sessions in order to extract session-related info in a consistent manner.
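Scanning the maps output by eye works, but the check can also be narrowed to libraries outside the locations you expect. The sketch below assumes the standard /proc/PID/maps layout, where the sixth field is the mapped path, and paths without spaces. The function name list_foreign_libs is hypothetical.

```shell
#!/bin/sh
# List the distinct shared libraries in a maps file whose path does not
# start with the given prefix (pass "" to list all of them).
# Usage: list_foreign_libs MAPS_FILE EXCLUDE_PREFIX
list_foreign_libs() {
    awk -v skip="$2" '
        $6 ~ /\.so(\.|$)/ && (skip == "" || index($6, skip) != 1) { print $6 }
    ' "$1" | LC_ALL=C sort -u
}
```

Something like list_foreign_libs /proc/12345/maps "$ORACLE_HOME/" would then leave system libraries plus anything injected from elsewhere, such as a path under /opt/some-other-product.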
Written by

Urh Srecnik
I'm an Oracle DBA and a developer at Abakus Plus d.o.o. My team is responsible for pro-active maintenance of many Oracle Databases and their infrastructure. I am co-author of Abakus's solutions for monitoring Oracle Database performance (APPM, also available as a Free Edition), for backup and recovery (Backup Server), and for quick provisioning of test & development databases (Deja Vu). I am also the author of the open-source DDLFS filesystem, which is available on GitHub. I am an OCP Database Administrator, OCP Java SE Programmer, OCIS Exadata Database Machine, and a few more; check my LinkedIn profile for the complete list.