DevOps Complexity Philosophy: Pragmatism and Skepticism - 2025 Challenges Forecast
Like any field or discipline, DevOps is divided into theory and practice. The practice (or pragmatics) should lead us to a production-ready environment: code running on a production cloud or on-premise, with underlying networking suitable for my Kubernetes environment, obeying best-practice microservices decomposition for my business logic (whether a bank, fintech, ad-tech, ML, SaaS, etc.), and serving the business's customers, which translates to ARR and profitability.
This complexity leads to deconstruction and abstraction of the infrastructure, which in turn brings more challenges that will stay with us in 2025, along with questions you and your DevOps team need to ask yourselves.
Patterns, More Patterns
Developers are well familiar with MVC, which mutated into MVVM, alongside the classic OOP design patterns, and the same goes for microservices design patterns (e.g., Saga).
Eventually, separating the front end and the back end infrastructure-wise is required: for the front end I'm going to need a CDN, and for the back end I'm going to need Kubernetes. So the back end is back to OOP-style design patterns, which take event-driven patterns and message queues into consideration. Further, Kubernetes Operators and Controllers offer more and more design patterns (in the form of concepts and CRDs), not to mention how the microservices design patterns should be translated into Pod-to-Pod relationships.
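To make the event-driven part concrete, here is a minimal sketch of two microservices handing off work through a queue (SQS in this case); the queue URL, event shape, and service roles are hypothetical placeholders, not a prescription.

# Minimal event-driven hand-off between two microservices via SQS.
# Queue URL and event payload are hypothetical placeholders.
import json
import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/orders-events"  # placeholder

def publish_order_created(order_id: str) -> None:
    """Producer side: the 'orders' service emits an event instead of calling consumers directly."""
    sqs.send_message(QueueUrl=QUEUE_URL,
                     MessageBody=json.dumps({"type": "OrderCreated", "order_id": order_id}))

def consume_once() -> None:
    """Consumer side: the 'billing' service polls, processes, and acknowledges."""
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=10)
    for msg in resp.get("Messages", []):
        event = json.loads(msg["Body"])
        print("processing", event["type"], event.get("order_id"))
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])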
Historiography: Modern (EC2/Ansible) vs. Post-Modern (IaC/K8s/Helm Charts)
Is there a right and a wrong, or is it a matter of historiography? Is bringing up EC2 manually without IaC and running an Ansible playbook to configure your server considered legacy? The same can be said of companies that avoid MSK/SQS in favor of the reliability and comfort of RDS.
If so, is it like the relationship between modernity and post-modernity?
Should we use Managed MSK/SQS or should we run them inside K8s?
Continuous Integration: Development ←→ QA ←→ Staging
After the relevant microservices patterns have been decided (have they, really?), I'm going to need to integrate the development teams with the QA team. For this, I'll create an EKS cluster while allowing the developers to push OCI images to ECR using GitHub Actions or any other CI tool, integrated with Flux's ImageRepository or ArgoCD, and this will add Kubernetes Pods to my EKS development cluster.
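As a hedged sketch of the CI side, this is roughly what such a pipeline step does when it builds and pushes an image to ECR; the repository name, tag, and region are placeholders, and the real job would run inside GitHub Actions rather than locally.

# Hedged sketch of a CI step that builds an OCI image and pushes it to ECR.
# Repository name, tag, and region are hypothetical placeholders.
import base64
import boto3
import docker  # pip install docker

REGION = "us-east-1"
REPO = "my-backend"          # ECR repository name (placeholder)
TAG = "feature-login-123"    # feature-branch tag (placeholder)

ecr = boto3.client("ecr", region_name=REGION)
auth = ecr.get_authorization_token()["authorizationData"][0]
user, password = base64.b64decode(auth["authorizationToken"]).decode().split(":")
registry = auth["proxyEndpoint"].replace("https://", "")

cli = docker.from_env()
cli.login(username=user, password=password, registry=registry)

image_ref = f"{registry}/{REPO}:{TAG}"
cli.images.build(path=".", tag=image_ref)        # build from the local Dockerfile
cli.images.push(f"{registry}/{REPO}", tag=TAG)   # Flux/ArgoCD image automation can pick this tag up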
IMPORTANT: Try to avoid using third-party software that commits and pushes to your Git repo just to deploy a Pod!
All that's left is to implement the ingress architecture to allow HTTPS/443 traffic into my EKS cluster and to route each feature-branch Pod with Traefik or Istio.
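As a hedged sketch of that routing, the following creates a host-based Ingress for a single feature branch; the ingress class, hostname, namespace, and service name are assumptions for illustration, not part of the original setup.

# Hedged sketch: host-based routing for a feature-branch Service behind an ingress controller
# (Traefik, Istio ingress, or NGINX). Hostname, namespace, and service name are hypothetical.
from kubernetes import client, config

config.load_kube_config()

branch = "feature-login"
ingress = client.V1Ingress(
    metadata=client.V1ObjectMeta(name=f"{branch}-ingress", namespace="dev"),
    spec=client.V1IngressSpec(
        ingress_class_name="traefik",  # assumption: Traefik is the installed ingress class
        rules=[client.V1IngressRule(
            host=f"{branch}.dev.example.com",  # each feature branch gets its own host
            http=client.V1HTTPIngressRuleValue(paths=[client.V1HTTPIngressPath(
                path="/", path_type="Prefix",
                backend=client.V1IngressBackend(service=client.V1IngressServiceBackend(
                    name=f"{branch}-svc", port=client.V1ServiceBackendPort(number=80))))]))]))

client.NetworkingV1Api().create_namespaced_ingress(namespace="dev", body=ingress)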
Continuous Deployment is easy with Flagger
Later, when the developers finish their development and are ready for production, I'll take a specific tag from a specific ECR repository and promote it to production. After another cycle of deploying Pods to a staging environment that more closely resembles the production environment, and sessions of QA, we move from deployment to production.
Now the CD of the Pods and the Helm chart provisioning of production (for scalability and observability) take forking paths.
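For the Flagger path mentioned above, a canary rollout is declared as a Canary resource next to the Deployment. The following is a minimal sketch; the Deployment name, namespace, and analysis thresholds are placeholder assumptions.

# Hedged sketch of a Flagger Canary driving progressive delivery for a Deployment.
# Deployment name, namespace, and analysis thresholds are hypothetical placeholders.
from kubernetes import client, config

config.load_kube_config()

canary = {
    "apiVersion": "flagger.app/v1beta1",
    "kind": "Canary",
    "metadata": {"name": "backend", "namespace": "prod"},
    "spec": {
        "targetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "backend"},
        "service": {"port": 80},
        "analysis": {                      # shift traffic in steps and roll back on failed checks
            "interval": "1m",
            "threshold": 5,
            "maxWeight": 50,
            "stepWeight": 10,
            "metrics": [{"name": "request-success-rate",
                         "thresholdRange": {"min": 99},
                         "interval": "1m"}],
        },
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="flagger.app", version="v1beta1", namespace="prod", plural="canaries", body=canary)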
Philosophy of Production
Pod as Microcosmos: SideCars
Alongside your Pod's main application container (this is your primary application), did you consider adding sidecars for networking, secrets, observability, VPA recommendations, authentication, and logs? (See the sketch after this list.)
Envoy Proxy: Handles traffic management and load balancing.
Vault Agent: Manages secrets and provides security features.
Prometheus Exporter: Collects and exposes performance metrics.
Vertical Pod Autoscaler (VPA) Recommender: Optimizes resource allocation.
OAuth2 Proxy: Handles authentication and authorization.
Fluentd: Collects and forwards logs.
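As a hedged illustration of the Pod-as-microcosm idea, here is a minimal Pod spec carrying the main container plus two of the sidecars above (Envoy and Fluentd). The images and names are illustrative placeholders, and in practice injectors (Istio, Vault Agent Injector) usually add these containers for you.

# Hedged sketch of a Pod with a main application container and two sidecars.
# Image tags and the app name are illustrative placeholders.
from kubernetes import client, config

config.load_kube_config()

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="backend", labels={"app": "backend"}),
    spec=client.V1PodSpec(containers=[
        client.V1Container(name="app", image="example/backend:1.0",              # main application
                           ports=[client.V1ContainerPort(container_port=8080)]),
        client.V1Container(name="envoy", image="envoyproxy/envoy:v1.31-latest"), # traffic management
        client.V1Container(name="fluentd", image="fluent/fluentd:edge"),         # log collection/forwarding
    ]))

client.CoreV1Api().create_namespaced_pod(namespace="prod", body=pod)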
Managing Helm Charts with a Helmfile Operator: CRDs
Deployment to production is easy, but how do I coordinate Helm charts and Pod deployments alongside ReplicaSet and DaemonSet reconfiguration? How does GitOps work with Helm charts?
Should we use a GitOps repository for provisioning the production environment, or should I use the Helmfile Operator? How do I manage all the Helm charts consistently?
https://github.com/mumoshu/helmfile-operator
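For intuition, here is a hedged sketch of what the Helmfile Operator linked above (or Helmfile itself) automates: reconciling a declarative list of releases with helm upgrade --install. The chart versions and values paths are placeholder assumptions.

# Hedged sketch of declarative, consistent Helm release management.
# Chart versions and values paths are hypothetical placeholders.
import subprocess

RELEASES = [
    {"name": "cilium", "chart": "cilium/cilium", "namespace": "kube-system",
     "version": "1.16.0", "values": "values/cilium.yaml"},
    {"name": "keda", "chart": "kedacore/keda", "namespace": "keda",
     "version": "2.15.0", "values": "values/keda.yaml"},
]

def sync(release: dict) -> None:
    """Idempotently install or upgrade one release, pinned to an explicit chart version."""
    subprocess.run([
        "helm", "upgrade", "--install", release["name"], release["chart"],
        "--namespace", release["namespace"], "--create-namespace",
        "--version", release["version"], "-f", release["values"],
    ], check=True)

for r in RELEASES:
    sync(r)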
Dealing with values.yaml
Has your DevOps team successfully managed to integrate GitOps with each Helm chart's values.yaml? Does your DevOps team understand all of Cilium's values.yaml (~3,600 lines), Istio's (~500 lines), and KEDA's values.yaml (~850 lines), and take full advantage of the functionality?
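One recurring source of confusion is how layered values files combine. The following hedged sketch mimics what helm upgrade -f base.yaml -f prod.yaml does when merging them; the file names are placeholders.

# Hedged sketch of layered values files: later files override earlier ones,
# the same way multiple -f flags merge in Helm. File names are placeholders.
import yaml  # pip install pyyaml

def deep_merge(base: dict, override: dict) -> dict:
    """Recursively merge override into base (maps merge, scalars/lists are replaced)."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

with open("values/base.yaml") as f, open("values/prod.yaml") as g:
    effective = deep_merge(yaml.safe_load(f) or {}, yaml.safe_load(g) or {})

print(yaml.safe_dump(effective, sort_keys=False))  # the values the chart will actually render with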
Q: Do you use Lens?
CNI, Networking & Ingresses Design Patterns (Envoy/Traefik/Istio)
Finishing the design of the Pods with their sidecars, as outlined by the microservices design patterns, leads us to the service mesh and K8s networking: how do the Pods interact with each other over mTLS while having all the certificates ready at hand?
In this book, the complexity is dealt with at a low level, and how to handle networking practices will still be a challenge in 2025. But what really helps is being able to debug all of this networking with Kiali.
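If Istio is the mesh in use, a hedged sketch of the mTLS piece is a PeerAuthentication resource that enforces strict mutual TLS for a namespace, with certificates issued and rotated by the mesh; the namespace here is a placeholder.

# Hedged sketch: strict mTLS between Pods in one namespace via Istio PeerAuthentication.
# Namespace name is a hypothetical placeholder.
from kubernetes import client, config

config.load_kube_config()

peer_auth = {
    "apiVersion": "security.istio.io/v1beta1",
    "kind": "PeerAuthentication",
    "metadata": {"name": "default", "namespace": "prod"},
    "spec": {"mtls": {"mode": "STRICT"}},   # reject any plaintext Pod-to-Pod traffic
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="security.istio.io", version="v1beta1",
    namespace="prod", plural="peerauthentications", body=peer_auth)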
DevSecOps Aesthetics
Comprehensively securing the code (developers), the container (Trivy, gVisor), the cluster (RBAC, NetworkPolicy, Pod Security Admission), and the cloud (IAM, OIDC, TLS/SSL, Security Groups, MFA, STS) has become pretty basic in today's DevSecOps world, and following the methodology of the CIS Kubernetes Benchmark plus eBPF could be a good start. Try to add Pods inside your K8s cluster to take care of DevSecOps!
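As a hedged example of one cluster-layer control from that list, here is a default-deny NetworkPolicy for a namespace, on top of which you would explicitly allow the flows you need; the namespace is a placeholder.

# Hedged sketch: default-deny NetworkPolicy for a namespace (namespace is a placeholder).
from kubernetes import client, config

config.load_kube_config()

deny_all = client.V1NetworkPolicy(
    metadata=client.V1ObjectMeta(name="default-deny-all", namespace="prod"),
    spec=client.V1NetworkPolicySpec(
        pod_selector=client.V1LabelSelector(),   # empty selector = every Pod in the namespace
        policy_types=["Ingress", "Egress"],      # no rules listed => deny all traffic by default
    ))

client.NetworkingV1Api().create_namespaced_network_policy(namespace="prod", body=deny_all)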
Q: Are you running Vault inside your k8s cluster?
eBPF-based Security Observability: Tetragon & Cilium
Between zero trust and air gaps, it has become best practice to embed kernel observability with eBPF tools like Tetragon and Cilium, so make sure you add them to your observability Helm chart toolset alongside your OpenTelemetry/SIEM observability, while reading up on how to build a WAF to prevent command injection, backdoors, and reverse shells.
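As a hedged sketch, Tetragon is typically installed from the cilium Helm repository; the release name and namespace below follow the upstream defaults, but treat them as assumptions for your environment.

# Hedged sketch: adding the eBPF security-observability layer via its Helm chart.
import subprocess

def run(*cmd: str) -> None:
    subprocess.run(list(cmd), check=True)

run("helm", "repo", "add", "cilium", "https://helm.cilium.io")
run("helm", "repo", "update")
# Tetragon: kernel-level (eBPF) security observability events per Pod/process
run("helm", "upgrade", "--install", "tetragon", "cilium/tetragon", "--namespace", "kube-system")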
DevSecOps Tools
https://medium.com/@noah_h/kubernetes-security-tools-seccomp-apparmor-586fdc61e6d9
https://github.com/aquasecurity/kube-bench
https://www.cisecurity.org/benchmark/kubernetes
https://www.kernel.org/doc/html/v4.19/userspace-api/seccomp_filter.html
https://spacelift.io/blog/aws-sts
https://spiffe.io/docs/latest/spiffe-about/overview/
https://medium.com/@vanchi811/aws-iam-roles-anywhere-63656682c7aa
Production Skepticism: Chaos Engineering
SRE or Continuous Observability? Chaos Engineering Experiments or Disaster Recovery?
[ TIP: Keep challenging your production environment. Consider Chaos Mesh or Gremlin and the full ChaosHub. ]
If you think your production environment is safe because you took care of auto-scaling, and you can sleep well at night, think again: beyond metrics observability (and alerting systems), you want to prepare for disasters in production by shifting left with chaos engineering and running experiments such as the following.
Case: Scalability Chaos Engineering
If you are using auto-scaling solutions (KEDA/Karpenter), try to challenge them with a Pod autoscaler chaos engineering experiment, as sketched below.
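A hedged sketch with Chaos Mesh: a PodChaos experiment that kills one replica of a scaled workload so you can watch KEDA/Karpenter and the Deployment recover. The namespace and labels are placeholders.

# Hedged sketch of a Chaos Mesh PodChaos experiment (namespace and labels are placeholders).
from kubernetes import client, config

config.load_kube_config()

pod_chaos = {
    "apiVersion": "chaos-mesh.org/v1alpha1",
    "kind": "PodChaos",
    "metadata": {"name": "kill-one-backend", "namespace": "prod"},
    "spec": {
        "action": "pod-kill",
        "mode": "one",                                   # pick a single matching Pod at random
        "selector": {"namespaces": ["prod"],
                     "labelSelectors": {"app": "backend"}},
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="chaos-mesh.org", version="v1alpha1",
    namespace="prod", plural="podchaos", body=pod_chaos)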
About the Author:
Amit Sides is a development, DevOps & SRE expert, specializing in DevSecOps and MLOps.
Useful links
https://github.com/cncf/curriculum
https://github.com/notaryproject/notary
https://github.com/notaryproject/notary/blob/master/docs/service_architecture.md
https://spiffe.io/docs/latest/spiffe-about/overview/#aws
https://github.com/Silas-cloudspace/terraform-modules
https://medium.com/@vanchi811/aws-iam-roles-anywhere-63656682c7aa
https://docs.cast.ai/docs/about-the-read-only-agent
https://github.com/bottlerocket-os/bottlerocket/tree/develop