Halyk Tech Sprints 2025: "DevOps is not only about Kubernetes"

https://www.youtube.com/watch?v=0kZIjfEhuqo&list=PLQteg1GS8z2TICflx1p8jeLNieiM7QU-P

Introduction

The talk begins with a question that almost every DevOps engineer has heard at some point:
“What do DevOps engineers actually do?”

Developers ask it. Family and friends ask it. Sometimes even colleagues inside IT ask it. And the uncomfortable truth is that DevOps engineers often struggle to give a clear answer.

Why? Because the essence of DevOps work is making things run smoothly in the background. If they do their job well, you don’t see it. Systems don’t break. Deployments go out without incident. Applications scale seamlessly. And because nobody notices, the work goes underappreciated.

[Image; source: AWS]

This talk explores that tension - explaining what DevOps really is, why it’s so hard to explain, and why DevOps is one of the most impactful yet invisible roles in modern software engineering.

DevOps: From Culture to Job Title

The speaker reminds us of the origins of the term DevOps:

Coined around 2009, when Patrick Debois organized the first DevOpsDays conference in Ghent.
Meant to describe a culture and set of practices to bridge the gap between development and operations.
The goal was collaboration, removing silos, and avoiding conflicts between developers who wanted to release fast and operators who wanted stability.

But somewhere along the way, DevOps transformed from a philosophy into a profession. Instead of just “a set of practices,” companies began hiring “DevOps engineers.”

And with that shift, confusion began: is DevOps about culture, tools, or a job title? The speaker argues it’s all of those - but in practice, DevOps engineers are the ones who carry responsibility for infrastructure and problem-solving.

Why It’s So Hard to Explain DevOps

When asked “What do you do?”, the speaker says he has two choices:

Talk about DevOps as a culture, set of practices, origins, and philosophy.
- Usually too abstract and boring for the listener.
Point at developers’ code and say, “I support that.”
- Feels unfair, like taking credit for developers’ work.

So the speaker came up with his own practical definition:

“A DevOps engineer is someone who manages infrastructure - making it usable, observable, scalable, resilient, and solving problems along the way.”

It’s not just maintaining servers. It’s enabling developers to focus on writing code while the infrastructure behind it remains reliable, secure, and invisible.

Case Study: Debugging the Undebuggable

To illustrate, the speaker tells a story about a production issue.

A Java/Kotlin service started dropping 50% of its requests. Developers blamed infrastructure. DNS engineers swore their systems were fine. The problem seemed unsolvable.

But the DevOps engineer dug deeper:

They traced requests and discovered failures happened during DNS resolution.
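It turned out that the JVM resolves hostnames through the system C library, so its behavior depended on which libc the environment used (glibc vs musl; Alpine Linux, for example, uses musl).
- With glibc, the DNS servers listed in resolv.conf are tried in order.
- With musl, all DNS servers are queried in parallel and the first reply wins, so which server answers is effectively random.
Whenever that race was won by a server holding a poisoned cache entry, the hostname resolved to an address that pointed “into nowhere.”

Result: half the requests failed, depending on which DNS server happened to answer first.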

glibc, short for GNU C Library, is the standard C library on most Linux distributions. It provides the core C library functions and the wrappers around system calls that most programs rely on to run.

Think of it as the bridge between a program and the Linux kernel. When a program needs to do something like open a file, write to the screen, or manage memory, it uses functions provided by glibc. glibc then translates these function calls into low-level instructions (system calls) that the Linux kernel can understand and execute. This abstraction allows developers to write code without needing to know the specific details of the underlying hardware or kernel.
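To make the “bridge” concrete, the Python sketch below calls the C library's `getpid()` wrapper directly through `ctypes` and compares the result with Python's own `os.getpid()`, which ends up in the same libc function. It assumes a Linux (or other POSIX) system; the library is located at runtime rather than hard-coded.

```python
import ctypes
import ctypes.util
import os

# Locate the C library (libc.so.6 on glibc systems). If the lookup fails,
# CDLL(None) falls back to the symbols already loaded into this process.
libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.getpid.restype = ctypes.c_int  # pid_t is an int on Linux
libc.getpid.argtypes = []


def pid_via_libc() -> int:
    """Ask the C library for the process ID; it issues the getpid system call."""
    return libc.getpid()


print(pid_via_libc() == os.getpid())  # True
```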

musl is a lightweight, clean, and modern C library designed for Linux systems. It's often used as an alternative to glibc, particularly in embedded systems, containers, and other environments where a small footprint and fast boot-up times are critical.

Unlike glibc, which is known for its extensive feature set and large size, musl prioritizes simplicity and correctness. It aims to be a complete and reliable implementation of the C standard while being as small and efficient as possible. This makes it a popular choice for building static executables, which bundle all necessary libraries into a single file, reducing dependencies and simplifying deployment.
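Because the case study hinged on which C library was present, it helps to be able to check an environment quickly. A best-effort Python sketch, assuming Linux: glibc reports its version through the `CS_GNU_LIBC_VERSION` `confstr` key, and musl installs its dynamic loader as `/lib/ld-musl-<arch>.so.1`. Statically linked binaries or unusual distributions can defeat both checks.

```python
import glob
import os


def detect_libc() -> str:
    """Best-effort guess at which C library this Linux environment uses."""
    try:
        # glibc answers this query with a string such as "glibc 2.35".
        version = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        # Not available on this platform or not recognized by this libc.
        version = None
    if version:
        return version
    # musl's dynamic loader lives at /lib/ld-musl-<arch>.so.1.
    if glob.glob("/lib/ld-musl-*.so.1"):
        return "musl"
    return "unknown"


print(detect_libc())  # e.g. "glibc 2.35" or "musl"
```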

Nobody else found the root cause. DNS engineers insisted all was fine. Developers looked at their unchanged code. But the DevOps engineer identified the cause and fixed it.

For a related look at how glibc affects Java applications (memory fragmentation rather than DNS), see How glibc Memory Handling Affects Java Applications: The Hidden Cost of Fragmentation

Lesson: DevOps engineers are often the detectives of IT - solving strange, cross-domain problems that don’t belong neatly to any one team.

The Invisible Value of DevOps

The speaker highlights that DevOps contributions are rarely visible until something breaks. To make the invisible visible, he outlines six areas where DevOps engineers create significant value:

Standards & Conventions
- Consistent naming, semantic conventions, templates.
- Reduces confusion in legacy systems and helps new team members onboard quickly.
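Conventions are most useful when they are checked automatically, for example in a CI job. The Python sketch below assumes a hypothetical `<team>-<service>-<env>` naming scheme; the pattern and the environment list are illustrative, not Halyk Bank's actual conventions. The 63-character limit comes from Kubernetes, which requires many object names to be valid DNS labels.

```python
import re

# Hypothetical convention: <team>-<service>-<env>, e.g. "payments-gateway-prod".
ENVIRONMENTS = ("dev", "test", "prod")
NAME_PATTERN = re.compile(
    r"^[a-z][a-z0-9]*"                          # team: lowercase, no hyphens
    r"-[a-z][a-z0-9-]*[a-z0-9]"                 # service: may contain hyphens
    r"-(?:" + "|".join(ENVIRONMENTS) + r")$"    # environment suffix
)
MAX_LENGTH = 63  # Kubernetes DNS label limit


def check_name(name: str) -> list[str]:
    """Return a list of convention violations (empty means the name is fine)."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if not NAME_PATTERN.match(name):
        problems.append("does not match <team>-<service>-<env>")
    return problems


for candidate in ["payments-gateway-prod", "Payments_Gateway", "cards-api-staging"]:
    print(candidate, check_name(candidate) or "ok")
```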

Preventing Failures Before They Happen
- If nothing breaks, nobody notices. But that’s success.
- The best DevOps work is invisible because disasters are avoided.

Saving Time in Non-Obvious Places
- Example: Kafka worker deployments.
- By changing the deployment strategy, the DevOps team reduced downtime and saved roughly 4 days per year in total across 160 workers.
- Developers never asked for it - but everyone benefited.
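To put the figure in perspective: assuming the ~4 days is the combined saving across all 160 workers, as stated in the talk, each worker saves about half an hour per year (the per-worker breakdown is simple arithmetic, not a number from the talk):

```latex
\frac{4~\tfrac{\text{days}}{\text{year}} \times 1440~\tfrac{\text{min}}{\text{day}}}{160~\text{workers}}
  = \frac{5760~\tfrac{\text{min}}{\text{year}}}{160~\text{workers}}
  = 36~\tfrac{\text{min}}{\text{worker} \cdot \text{year}}
```

A saving too small to notice on any single worker adds up once it is applied to the whole fleet.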

Observability (Meaningful Monitoring)
- Not just CPU/memory dashboards.
- Metrics and traces that help developers see latency, errors, and user-facing problems.
- Monitoring that actually answers the question: “Why is this broken, and what should I do?”
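As an illustration of monitoring that answers “what is wrong?” rather than only showing CPU graphs, here is a minimal Python sketch that turns raw request samples into actionable findings. The p99 latency budget (500 ms), the 1% error-rate threshold, and the request record shape are hypothetical; real systems would usually express such rules as alerts in the metrics stack.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    """Nearest-rank percentile; pct is in the range (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


def evaluate(requests: list[dict], p99_budget_ms: float = 500.0,
             max_error_rate: float = 0.01) -> list[str]:
    """Return human-readable findings; an empty list means everything is within budget."""
    latencies = [r["latency_ms"] for r in requests]
    errors = sum(1 for r in requests if r["status"] >= 500)
    p99 = percentile(latencies, 99)
    error_rate = errors / len(requests)

    findings = []
    if p99 > p99_budget_ms:
        findings.append(f"p99 latency {p99:.0f} ms exceeds the {p99_budget_ms:.0f} ms budget")
    if error_rate > max_error_rate:
        findings.append(f"error rate {error_rate:.1%} exceeds {max_error_rate:.1%} (server errors)")
    return findings


# 100 synthetic requests taking 1..100 ms, two of which returned 503.
sample = [{"latency_ms": float(i), "status": 200} for i in range(1, 101)]
sample[0]["status"] = sample[1]["status"] = 503
print(evaluate(sample))  # ['error rate 2.0% exceeds 1.0% (server errors)']
```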

Resilience & High Availability
- Redundant infrastructure, active-active clusters, distributed brokers.

An active-active cluster is a high-availability system where multiple servers or nodes handle workloads simultaneously, providing enhanced performance and fault tolerance. Unlike active-passive clusters, where only one node is active at a time, all nodes in an active-active cluster are actively processing requests. This configuration distributes the workload across all available nodes, leading to improved load balancing, increased throughput, and faster response times.

Developers don’t need to care where the app runs - it just keeps running.
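From a client's point of view, active-active means any healthy node can take the next request. The Python sketch below shows simple round-robin across nodes with failed ones skipped; the node names are hypothetical, and in practice this logic usually lives in a load balancer or in a distributed broker's client library rather than in hand-written code.

```python
import itertools


class ActiveActivePool:
    """Round-robin over all nodes, skipping any that are marked down."""

    def __init__(self, nodes: list[str]):
        self.nodes = list(nodes)
        self.healthy = set(self.nodes)
        self._cycle = itertools.cycle(self.nodes)

    def mark_down(self, node: str) -> None:
        self.healthy.discard(node)

    def mark_up(self, node: str) -> None:
        if node in self.nodes:
            self.healthy.add(node)

    def pick(self) -> str:
        # Check each node at most once per call before giving up.
        for _ in range(len(self.nodes)):
            node = next(self._cycle)
            if node in self.healthy:
                return node
        raise RuntimeError("no healthy nodes available")


pool = ActiveActivePool(["dc1-app-1", "dc2-app-1"])  # hypothetical nodes in two data centers
print([pool.pick() for _ in range(4)])  # ['dc1-app-1', 'dc2-app-1', 'dc1-app-1', 'dc2-app-1']
pool.mark_down("dc1-app-1")             # e.g. data center 1 goes offline
print([pool.pick() for _ in range(2)])  # ['dc2-app-1', 'dc2-app-1']
```

Because every node is already serving traffic, losing one only shifts load to the others; there is no standby to promote.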

Simplified Infrastructure
- Self-service pipelines, one-click rollbacks, reusable templates.
- Developers don’t have to manually copy files or manage servers.
- In large enterprises (e.g., 300–400 microservices), this standardization is the only way to stay sane.
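The “reusable templates” idea can be sketched in a few lines: every service gets its manifest from one function, so conventions such as labels and selectors are defined once. This is a hypothetical Python illustration that builds a Kubernetes `Deployment` as a plain dictionary; in a Kubernetes setup, Helm charts typically play this role, and the registry URL, service names, and versions here are made up.

```python
def render_deployment(service: str, image: str, replicas: int = 2) -> dict:
    """Build a standard Kubernetes Deployment manifest for one service."""
    labels = {"app.kubernetes.io/name": service}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": service, "labels": dict(labels)},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": dict(labels)},
            "template": {
                "metadata": {"labels": dict(labels)},
                "spec": {"containers": [{"name": service, "image": image}]},
            },
        },
    }


# The same template serves every service; only the inputs change.
services = {"orders": "1.4.2", "payments": "2.0.0"}  # hypothetical services and versions
manifests = [
    render_deployment(name, f"registry.example.com/{name}:{version}")
    for name, version in services.items()
]
print([m["metadata"]["name"] for m in manifests])  # ['orders', 'payments']
```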

The Challenges DevOps Face

Technology changes constantly.
- Infrastructure “ages” quickly.
- At Halyk Bank, Kubernetes deployment strategies changed twice in three years.
Scaling under load.
- DevOps engineers make sure resources are always available (vertical/horizontal scaling).
- They add autoscaling and optimize resource use to handle unpredictable spikes.
Legacy systems.
- The hardest part is when no one knows how old systems work.
- DevOps engineers must reverse-engineer and rebuild them while keeping the business running.
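For horizontal scaling in particular, the Kubernetes Horizontal Pod Autoscaler documents its core rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, skipping changes when the ratio is within a tolerance (0.1 by default). The Python sketch below reproduces that rule with min/max bounds; it is a simplification (the real controller also handles stabilization windows, missing metrics, and pods that are not yet ready), and the example numbers are hypothetical.

```python
import math


def desired_replicas(current_replicas: int, current_metric: float, target_metric: float,
                     min_replicas: int = 1, max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale in proportion to how far the metric is from target."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # close enough to the target: leave it alone
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 90% CPU against a 60% target -> scale out to 6 pods.
print(desired_replicas(4, 90, 60))  # 6
# Load drops to 20% -> scale in to 2 pods.
print(desired_replicas(4, 20, 60))  # 2
```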

Conclusion: The Work Nobody Sees (But Everyone Needs)

The talk ends with a simple truth:

DevOps work is often invisible, but that doesn’t mean it’s unimportant.
On the contrary, it’s the invisible backbone that keeps systems alive, scalable, and reliable.
DevOps engineers aren’t trying to be “cool” by throwing around jargon - they’re solving real, often small, but critical problems that have an outsized impact on the business.
Companies and developers should value their DevOps engineers more: praise them, involve them, and ask them questions.

“Sometimes we do small things that cost very little but improve life for everyone. Praise your DevOps - they’ll be happy to explain what they do.”

This talk is not just about what DevOps do, but about the paradox of invisibility: the better they are, the less you see them. They prevent failures, save time, and make developers’ lives easier, but their success often looks like “nothing happened.”

Possible Infrastructure and Tech Stack

Languages: Java/Kotlin for applications, Go/Python for tooling.
Platform: Kubernetes + Helm (with templating for deployments).
CI/CD: GitLab CI.
Messaging: Kafka, RabbitMQ.
Observability: VictoriaMetrics, Grafana, Elasticsearch.
Infrastructure: multiple data centers, active-active replication, bare metal with virtualization.