The subtle art of waiting


Recently, while working on a workshop titled Testing Your Pull Request on Kubernetes with GKE, and GitHub Actions, I faced the same issue twice: service A needs service B, but service A starts faster than service B, and the system fails. In this post, I want to describe the context of these issues and how I solved them both with the same tool.
Waiting in Kubernetes
It might sound strange to wait in Kubernetes. The self-healing nature of the Kubernetes platform is one of its biggest benefits. Let's consider two pods: a Python application and a PostgreSQL database.
The application starts very fast and eagerly tries to establish a connection to the database. Meanwhile, the database is initializing itself with the provided data; the connection fails. The pod ends up in the Failed state.
After a while, Kubernetes requests the application pod's state. Because it has failed, it terminates it and starts a new pod. At this point, two things can happen: either the database pod isn't ready yet and it's back to square one, or it's ready and the application finally connects.
To speed up the process, Kubernetes offers startup probes:
startupProbe:
  httpGet:
    path: /health
    port: 8080
  failureThreshold: 30
  periodSeconds: 10
With the above probe, Kubernetes waits for an initial ten seconds before requesting the pod's status. If the check fails, it waits another ten seconds and tries again. Rinse and repeat up to 30 times, i.e., five minutes at most, before it definitively marks the pod as failed.
You may have noticed the HTTP /health endpoint above. Kubernetes offers several mutually exclusive Probe mechanisms; the most common are httpGet and exec. The former is suitable for web applications, while the latter covers everything else. It implies we need to know which kind of container the pod runs and how to check its status, provided it offers a way to do so at all. I'm no PostgreSQL expert, so I searched for a status check command. Here's what the Bitnami PostgreSQL Helm chart produces once applied:
startupProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - -e
      - exec pg_isready -U $PG_USER -h $PG_HOST -p $PG_PORT
Note that the above is a simplification, as it gladly ignores the database name and an SSL certificate.
The startup probe speeds things up compared to the default situation if you configure it properly. You can set a long initial delay, and then shorter increments. Yet, the more diverse the containers, the harder it gets to configure, as you need to be an expert in each of the underlying containers.
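For illustration, here's a sketch of a probe tuned along those lines for the PostgreSQL pod; the numbers are arbitrary and should be adjusted to how long the database actually takes to start:

startupProbe:
  exec:
    command:
      - /bin/sh
      - -c
      - exec pg_isready -U $PG_USER -h $PG_HOST -p $PG_PORT
  initialDelaySeconds: 30    # don't even bother checking before 30 seconds
  periodSeconds: 5           # afterwards, check every 5 seconds
  failureThreshold: 12       # give up after roughly 30 + 12 x 5 = 90 seconds

Even this sketch requires knowing about pg_isready and its flags; a Redis or Kafka container would need a completely different command.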
It would be beneficial to look for alternatives.
Wait4x
Alternatives are tools whose focus is on waiting. A long time ago, I found the wait-for script for this. The idea is straightforward:
./wait-for is a script designed to synchronize services like docker containers. It is sh and alpine compatible.
Here's how to wait for an HTTP API:
./wait-for http://my.api/health -- echo "The api is up! Let's use it"
It got the job done, but at the time, you had to copy the script and manually check for updates. I've checked, and the project now provides a regular container.
wait4x plays the same role, but is available as a versioned container and provides more services to wait for: HTTP, DNS, databases, and message queues. That's my current choice.
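To give an idea, here are a few invocations; the postgresql subcommand and the --expect-status-code flag appear later in this post, while the hosts, credentials, and timeout values are made up and worth double-checking against the wait4x documentation:

# Wait until a TCP port accepts connections, giving up after two minutes
wait4x tcp postgres:5432 --timeout 2m

# Wait until an HTTP endpoint answers with a 200
wait4x http https://my.api/health --expect-status-code 200

# Wait until PostgreSQL accepts connections
wait4x postgresql 'postgres://app_user:app_password@postgres:5432/app?sslmode=disable'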
Whatever tool you use, you can use it inside an init container:
A Pod can have multiple containers running apps within it, but it can also have one or more init containers, which are run before the app containers are started.
Init containers are regular containers, except:
Init containers always run to completion.
Each init container must complete successfully before the next one starts.
Imagine the following Pod that depends on a PostgreSQL Deployment:
apiVersion: v1
kind: Pod
metadata:
  labels:
    type: app
    app: recommandations
spec:
  containers:
    - name: recommandations
      image: recommandations:latest
      envFrom:
        - configMapRef:
            name: postgres-config
The application is Python and starts quite fast. It attempts to connect to the PostgreSQL database. Unfortunately, the database hasn't finished initializing, so the connection fails, and Kubernetes restarts the pod.
We can fix it with an initContainer whose only job is to wait:
apiVersion: v1
kind: Pod
metadata:
  labels:
    type: app
    app: recommandations
spec:
  initContainers:
    - name: wait-for-postgres
      image: atkrad/wait4x:3.1
      command:
        - wait4x
        - postgresql
        - postgres://$(DATABASE_URL)?sslmode=disable
      envFrom:
        - configMapRef:
            name: postgres-config
  containers:
    - name: recommandations
      image: recommandations:latest
      envFrom:
        - configMapRef:
            name: postgres-config
In the above setup, the initContainer doesn't stop until the database accepts connections. When it does, it terminates, and the recommandations container can start. Kubernetes doesn't need to terminate the Pod as in the previous setup! It entails fewer logs and potentially fewer alerts.
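For completeness, the postgres-config ConfigMap referenced by both containers isn't shown above. Given how the wait4x command uses it, it presumably holds a DATABASE_URL key containing everything between postgres:// and the query string; a hypothetical version could look like this:

apiVersion: v1
kind: ConfigMap
metadata:
  name: postgres-config
data:
  # Hypothetical value: user, password, host, port, and database name
  DATABASE_URL: "app_user:app_password@postgres:5432/recommandations"

In a real setup, the credentials part would rather live in a Secret than in a ConfigMap.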
When waiting becomes mandatory
The above is a slight improvement, but you can do without it. In other cases, waiting becomes mandatory. I experienced it recently when preparing for the workshop mentioned above. The scenario is the following:
The pipeline applies a manifest on the Kubernetes side
In the next step, it runs the test
As the test starts before the application is ready, it fails.
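In GitHub Actions terms, the failing part of the workflow boils down to something like this; the step names, manifest path, and test command below are made up for illustration:

- name: Deploy the application
  run: kubectl apply -f kubernetes/manifest.yaml   # the Pod is created, but not ready yet

- name: Run the end-to-end tests
  run: ./run-tests.sh                              # hits the application right away and fails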
We must wait until the backend is ready before we test. Let's use wait4x to wait for the Pod to accept requests before we launch the tests:
- name: Wait until the application has started
  uses: addnab/docker-run-action@v3 #1
  with:
    image: atkrad/wait4x:latest
    run: wait4x http ${{ env.BASE_URL }}/health --expect-status-code 200 #2
1. The GitHub Action allows running a container. I could have downloaded the Go binary instead.
2. Wait until the /health endpoint returns a 200 response code.
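As a hypothetical alternative to the third-party action, the same wait could run as a plain step that calls Docker directly; I'd double-check the --timeout flag against the wait4x documentation before relying on it:

- name: Wait until the application has started
  run: >
    docker run --rm --entrypoint wait4x atkrad/wait4x:latest
    http ${{ env.BASE_URL }}/health
    --expect-status-code 200
    --timeout 2m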
Conclusion
Kubernetes startup probes are a great way to avoid unnecessary restarts when you start services that depend on each other. The alternative is an external waiting tool configured in an initContainer. wait4x is such a tool, and it can be used in other contexts too. It's now part of my toolbelt.
Originally published at A Java Geek on April 20th, 2025