Infra Made Simple: For Data Scientists, ML Engineers & Research Scientists

A Zero-to-Hero Guide for AI Practitioners Navigating Infrastructure

Why This Guide?

Have you ever:

  • Struggled to understand CI/CD errors like No space left on device?

  • Waited 40 minutes for a simple app to deploy?

  • Asked "What’s a pod?" during a critical production incident?

This document is your go-to resource to:

  • Understand how your code actually runs in production

  • Collaborate effectively with DevOps and platform teams

  • Troubleshoot infra-related issues without being a Kubernetes expert

Designed for non-DevOps professionals working in AI — including data scientists, ML engineers, and researchers.

1. The Infra Mindset for Data Roles

"Infra isn’t someone else’s job anymore. It’s part of building intelligent apps."

Modern ML systems are:

  • Real-time (e.g., chatbots, APIs)

  • Multi-service (e.g., RAG pipelines, ingestion workflows)

  • GPU-dependent (LLMs, CV models)

Basic infra knowledge saves hours of debugging, enables faster iteration, and improves reliability.

2. From Code to Pod: A User Request's Journey

Here’s what happens when a user interacts with your deployed service:

  1. User sends a request (e.g., via Teams or API call)

  2. Load balancer routes it to a healthy pod

  3. Pod (a containerized app instance) runs your logic

  4. Your app queries data/models, returns response

Knowing this lets you debug which layer broke (app? infra? scaling?).

3. Core Infra Concepts You Should Know

| Term | What It Means |
| --- | --- |
| Image | Frozen code + dependencies (a Docker snapshot) |
| Container | A running, isolated instance of an image |
| Pod | The smallest K8s unit, running one or more containers |
| Node | A physical or virtual machine that hosts pods |
| Cluster | A group of nodes managed by Kubernetes |
| Deployment | YAML config that defines how pods are run |
| Service | A stable endpoint for accessing a set of pods |
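To make these terms concrete, here is a minimal, hypothetical Deployment plus Service for an app called my-ml-app — the name, image tag, and port are placeholders, not from this guide:

```yaml
# Hypothetical manifest: one Deployment (how pods run) plus one Service (stable endpoint).
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-ml-app
spec:
  replicas: 2                     # two pods for availability
  selector:
    matchLabels:
      app: my-ml-app
  template:
    metadata:
      labels:
        app: my-ml-app
    spec:
      containers:
        - name: my-ml-app
          image: my-ml-app:latest # built from your Docker image
          ports:
            - containerPort: 8000
---
apiVersion: v1
kind: Service
metadata:
  name: my-ml-app
spec:
  selector:
    app: my-ml-app               # routes traffic to any pod with this label
  ports:
    - port: 80
      targetPort: 8000
```

The Service never targets pods directly; it matches labels, which is why a crashed pod can be replaced without clients noticing.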

4. Metrics That Matter in ML/AI Systems

| Metric | Description | Why You Should Care |
| --- | --- | --- |
| Latency | Time taken to serve one request | Impacts user satisfaction |
| RPS | Requests handled per second | Shows how much load you can absorb |
| Throughput | Maximum stable request rate | Indicates system limits |
| p95 Latency | Response time that 95% of requests beat | Highlights spikes and bottlenecks |
| Memory | RAM usage of your pod | Prevents OOM kills and slowness |
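A quick way to see why p95 matters: compute it from a handful of made-up latencies with standard shell tools (nearest-rank method; the numbers are purely illustrative):

```shell
# Made-up per-request latencies in milliseconds, one value per line.
latencies="100
110
120
130
140
150
160
170
180
900"

n=$(printf '%s\n' "$latencies" | wc -l)
idx=$(( (n * 95 + 99) / 100 ))   # nearest-rank index for the 95th percentile
p95=$(printf '%s\n' "$latencies" | sort -n | sed -n "${idx}p")
echo "p95 latency: ${p95} ms"
```

The mean here is about 216 ms and hides the outlier; p95 surfaces the 900 ms tail that your slowest users actually experience.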

5. Bottlenecks You’ll Hit — and How to Fix Them

| Symptom | Root Cause | Fix |
| --- | --- | --- |
| CI build takes 45 minutes | No caching, poorly ordered Dockerfile | Use --cache-from; avoid COPY . . early |
| "No space left on device" error | Image layers too large | Clean up old builds; use .dockerignore |
| Pod stuck in Pending | No compatible node (e.g., no GPU available) | Add GPU nodes or a queue fallback |
| Frequent OOM crashes | RAM underestimated | Set memory requests/limits in YAML |
| Logs missing or unclear | Not printed or collected properly | Log to stdout; use kubectl logs |

6. Best Practices for Infra-Aware ML Engineers

Docker

  • Use python:3.11-slim or similar lean images

  • COPY requirements.txt and install dependencies before copying the rest of the code, so those layers stay cached

  • .dockerignore = lifesaver for large repos
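A starter .dockerignore for a typical ML repo — these entries are suggestions, not from this guide; keep anything your build actually needs:

```
.git
__pycache__/
*.pyc
.venv/
data/
models/       # large weights: mount or download at runtime instead
notebooks/
tests/
```

Every excluded path shrinks the build context, which speeds up both the docker build upload step and layer hashing.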

Kubernetes

  • Always set resources.requests and limits

  • Use HPA (horizontal pod autoscaler)

  • Define readiness/liveness probes
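The bullets above map to a fragment of the container spec; a sketch with illustrative values (tune them to your model's actual footprint):

```yaml
# Hypothetical fragment of a Deployment's container spec.
resources:
  requests:
    cpu: "500m"
    memory: "2Gi"       # what the scheduler reserves for the pod
  limits:
    cpu: "2"
    memory: "4Gi"       # exceeding this gets the container OOM-killed
readinessProbe:          # don't send traffic until the model is loaded
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 10
livenessProbe:           # restart the container if it stops responding
  httpGet:
    path: /health
    port: 8000
  periodSeconds: 30
```

For ML services the readiness probe matters most: model loading can take minutes, and without it Kubernetes will route requests to pods that are not ready.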

Monitoring

  • Emit logs in structured format

  • Capture latency, memory, error rates

  • Share dashboards with DevOps for visibility
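A minimal sketch of structured logging from a shell entrypoint — the JSON field names here are just one possible convention:

```shell
# Print one JSON object per log event so collectors can parse fields reliably.
log_json() {
  printf '{"ts":"%s","level":"%s","msg":"%s"}\n' \
    "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$2"
}

line=$(log_json INFO "model loaded")
echo "$line"
```

One event per line, always to stdout: that is what kubectl logs and most log collectors expect.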

Workflow Hygiene

  • Test locally with mocks before pushing

  • Run docker build and docker run locally before every push

  • Document infra assumptions (RAM/CPU needed) in README

7. CI/CD & Long Builds — Why They Hurt and How to Fix Them

Problem: Change one line → entire pipeline re-runs

Why?

  • You copied all code too early

  • No layer caching

  • Pip install reruns every time

Better Dockerfile:

FROM python:3.11-slim

WORKDIR /app

# Copy only the dependency list first so the install layer stays cached
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Code changes only invalidate the layers below this line
COPY . .
CMD ["python", "app.py"]

Also:

  • Use build cache in GitHub/GitLab CI

  • Avoid rebuilding on markdown/doc-only commits
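For example, GitHub Actions can skip doc-only commits with a paths-ignore trigger — the paths below are placeholders for your repo layout:

```yaml
# Hypothetical workflow trigger: pushes that only touch docs don't rebuild.
on:
  push:
    paths-ignore:
      - '**.md'
      - 'docs/**'
```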

8. Common Mistakes & Lessons Learned

| Mistake | What You Should Do Instead |
| --- | --- |
| Ignored memory limits in YAML | Set CPU/memory requests and limits |
| Pushed without mocking downstream services | Use stubs for fast validation |
| App crashed and logs were missing | Stream logs with kubectl logs -f <pod> |
| Copied the whole repo early in the Docker build | Copy only needed files; order layers for caching |
| Didn't know why a pod was stuck | Run kubectl describe pod <pod> |
| Image size ballooned | Use .dockerignore and a slim base image |

9. Real-World Cheat Sheet

| Goal | Command or Tip |
| --- | --- |
| See running pods | kubectl get pods |
| View logs for a pod | kubectl logs <pod-name> |
| Explain why a pod is stuck | kubectl describe pod <pod-name> |
| Free up Docker disk space | docker system prune |
| View memory usage | top, free -h, or your pod dashboard |
| Build & test locally | docker build -t app . && docker run app |
| Identify top memory consumers | ps aux --sort=-%mem |

10. Final Words for Data-Focused Engineers

Infra is no longer optional for ML teams. You don’t need to master Kubernetes, but you should:

  • Know how your code gets deployed

  • Track memory, latency, and pod health

  • Build with infra in mind (not as an afterthought)

Great models + poor infra = poor user experience

This doc is your on-ramp.

Written by Sai Sandeep Kantareddy