Docker for Data Science

This project provides a Docker-based environment for running Data Science workflows inside VS Code. Using Docker ensures reproducibility, isolation, and portability, so you can focus on coding without worrying about dependency issues.
What is Docker?
Docker is a platform that lets you package your application (code, libraries, dependencies, and the OS-level packages they rely on) into a lightweight, portable unit called a container.
Think of Docker as a way to "ship your code with everything it needs" so it runs the same on any machine.
Key Concepts
1. Docker Image
A blueprint (template) for containers.
Defines the OS, libraries, tools, and environment your code needs.
Example: A Python image with NumPy, Pandas, and Jupyter installed.
2. Docker Container
A running instance of an image.
It behaves like a lightweight virtual machine, but it is really an isolated process that shares the host's kernel, which is why containers start quickly.
You can start, stop, or remove containers anytime.
3. Docker Compose
A tool to manage multiple containers together using one YAML file (docker-compose.yml). Example:
One container runs Jupyter Notebook
Another runs a PostgreSQL database
Another runs Redis (cache)
With Docker Compose, you can start all services with just:
docker compose up
For Data Science, Docker Compose is important because many projects need multiple services (e.g., Jupyter + Database + ML Model API).
Why Use Docker for Data Science?
Reproducibility: Same Python + library versions everywhere.
Portability: Move your project from laptop → server → cloud easily.
Isolation: Keep data science projects separated (no package conflicts).
Scalability: Connect multiple tools (Jupyter, PostgreSQL, Spark, Hadoop, etc.) with Compose.
Team Collaboration: Everyone gets the same environment.
Usage for Data Engineers & Data Scientists
Data Scientist: Run Jupyter, Python, and ML libraries inside Docker.
Data Engineer: Orchestrate multiple tools (Spark, Kafka, Hadoop, PostgreSQL) using Docker Compose.
ML Engineer: Deploy trained ML models as APIs inside containers.
Big Data Projects: Use Compose to manage clusters (HDFS + Spark + Airflow).
Final Takeaway
Docker → Packages your environment
Image → Blueprint of the environment
Container → Running environment
Docker Compose → Manages multiple services easily
For Data Science & Engineering, Docker ensures smooth collaboration, reproducibility, and scalability.
Redis
What is Redis? An in-memory key-value data store. Extremely fast; supports strings, lists, sets, hashes, sorted sets, pub/sub, TTLs (key expiry), atomic operations, and simple persistence options.
Why use Redis?
Super fast (data lives in RAM), which makes it ideal for caching and counters.
Flexible data types that go beyond plain key/value.
Great for sessions, rate limiting, leaderboards, lightweight queues.
Simple pub/sub messaging.
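To make the key/value, TTL, and counter ideas above concrete, here is a toy, in-process Python model of three Redis string commands: SET with an expiry, GET, and INCR. It is not Redis and not a Redis client; the TinyKV class and its method names are hypothetical, and a fake clock stands in for real time so expiry can be shown without sleeping.

```python
import threading
import time
from typing import Callable, Dict, Optional, Tuple


class TinyKV:
    """Toy in-process model of SET (with EX), GET and INCR. NOT Redis, NOT a client."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._data: Dict[str, Tuple[str, Optional[float]]] = {}  # key -> (value, expiry time)
        self._lock = threading.Lock()  # one lock makes each call atomic within this process
        self._clock = clock

    def _live(self, key: str) -> Optional[str]:
        # Return the value if the key exists and has not expired; drop it lazily otherwise.
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]
            return None
        return value

    def set(self, key: str, value: str, ex: Optional[float] = None) -> None:
        # Like SET key value [EX seconds]: overwrites the value and any previous TTL.
        with self._lock:
            expires_at = None if ex is None else self._clock() + ex
            self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[str]:
        with self._lock:
            return self._live(key)

    def incr(self, key: str) -> int:
        # Like INCR: a missing key counts as 0, and an existing TTL is kept.
        with self._lock:
            current = self._live(key)
            expires_at = self._data[key][1] if current is not None else None
            new_value = (0 if current is None else int(current)) + 1
            self._data[key] = (str(new_value), expires_at)
            return new_value


# Usage with a fake clock so expiry can be shown without sleeping.
now = [0.0]
kv = TinyKV(clock=lambda: now[0])
kv.set("session:42", "alice", ex=30)  # like SET session:42 alice EX 30
kv.incr("visits")
kv.incr("visits")
now[0] = 31.0  # jump 31 seconds ahead: the session key has expired
print(kv.get("session:42"), kv.get("visits"))  # None 2
```

With a real server, redis-py's Redis.set(name, value, ex=30), get, and incr cover the same operations; the practical difference is that Redis keeps the data in a separate process that every Flask container can share.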
Common usages
Cache DB or API responses.
Session store for web apps.
Rate limiting and counters.
Message queue / pub-sub for background tasks.
Leaderboards with sorted sets.
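The leaderboard use case relies on sorted sets, which keep members ordered by score. The sketch below models only ZINCRBY and ZREVRANGE ... WITHSCORES in plain Python to show what those commands return; the Leaderboard class is hypothetical and is not a Redis client API.

```python
from typing import Dict, List, Tuple


class Leaderboard:
    """Toy model of a Redis sorted set used as a leaderboard (not a Redis client)."""

    def __init__(self) -> None:
        self._scores: Dict[str, float] = {}  # member -> score

    def zincrby(self, member: str, amount: float) -> float:
        # Like ZINCRBY key amount member: a missing member starts from 0.
        self._scores[member] = self._scores.get(member, 0.0) + amount
        return self._scores[member]

    def top(self, n: int) -> List[Tuple[str, float]]:
        # Like ZREVRANGE key 0 n-1 WITHSCORES: highest score first; equal scores
        # are ordered by member name in descending lexicographic order, as in Redis.
        ranked = sorted(self._scores.items(), key=lambda item: (item[1], item[0]), reverse=True)
        return ranked[:n]


board = Leaderboard()
board.zincrby("alice", 10)
board.zincrby("bob", 25)
board.zincrby("alice", 20)
board.zincrby("carol", 5)
print(board.top(2))  # [('alice', 30.0), ('bob', 25.0)]
```

In redis-py the corresponding calls are zincrby(name, amount, value) and zrevrange(name, 0, n - 1, withscores=True). Redis keeps the set ordered as scores change, whereas this toy simply sorts on every read.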
Flask
What is Flask? A lightweight Python web framework (microframework). It provides routing, request handling, and templating (Jinja2), and is easy to extend.
Why use Flask?
Simple to learn and fast to prototype.
Perfect for REST APIs (e.g., serve ML models).
Highly customizable: pick only what you need.
Works well with Docker and microservice architectures.
Common usages
REST API for ML models.
Small web apps or admin dashboards.
Backend for SPA (single-page applications).
Microservice endpoints in data pipelines.
Why Use Redis + Flask in Docker?
1. Flask in Docker
Flask is your web layer: the place where you expose APIs (for ML predictions, dashboards, or data services).
In Docker, you package Flask with its exact Python version and dependencies, so there are no "works on my machine" issues.
Your Flask API becomes portable: run it locally, on a server, or in the cloud with the same behavior.
Example use case: You trained an ML model → you wrap it in Flask (a /predict endpoint) → Docker makes it a microservice → deploy anywhere.
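A /predict endpoint boils down to: read JSON, call the model, return JSON. In Flask this would be a function decorated with @app.route("/predict", methods=["POST"]); to keep the sketch dependency-free, it uses the standard library's WSGI interface instead, which is the same interface Flask applications implement. The predict function is a hypothetical stand-in for a trained model, and the JSON shape {"features": [...]} is an assumption.

```python
import json
from typing import Callable, Iterable, List


def predict(features: List[float]) -> float:
    # Hypothetical stand-in for a trained model: a fixed linear function.
    weights = [0.5, -0.25, 1.0]
    return sum(w * x for w, x in zip(weights, features))


def _json_response(start_response: Callable, status: str, payload: dict) -> Iterable[bytes]:
    # Serialize the payload and send the status line and headers.
    body = json.dumps(payload).encode("utf-8")
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(body)))])
    return [body]


def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """WSGI app exposing POST /predict with a JSON body like {"features": [1, 2, 3]}."""
    if environ.get("PATH_INFO") != "/predict" or environ.get("REQUEST_METHOD") != "POST":
        return _json_response(start_response, "404 Not Found", {"error": "not found"})
    length = int(environ.get("CONTENT_LENGTH") or 0)
    try:
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
        features = [float(x) for x in payload["features"]]
    except (ValueError, KeyError, TypeError):
        return _json_response(start_response, "400 Bad Request",
                              {"error": 'expected {"features": [numbers]}'})
    return _json_response(start_response, "200 OK", {"prediction": predict(features)})


def serve(port: int = 5000) -> None:
    """Serve `app` with the stdlib development server (blocks forever)."""
    from wsgiref.simple_server import make_server

    # Listen on 0.0.0.0, not 127.0.0.1, so Docker's port mapping can reach the app.
    with make_server("0.0.0.0", port, app) as server:
        server.serve_forever()
```

The 0.0.0.0 detail matters in Docker whichever framework you use: a server bound only to 127.0.0.1 inside the container is not reachable through -p 8000:5000, which is why Flask apps in containers are typically started with app.run(host="0.0.0.0").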
2. Redis in Docker
Redis is your fast in-memory store (cache, session store, counters, pub/sub).
In Docker, Redis runs as a separate, isolated service, so there is no need to install it manually on your system.
You can spin up Redis instantly for dev/test/prod using docker run or docker compose.
Example use case:
Cache expensive ML predictions or database queries.
Store Flask user sessions in Redis (instead of local memory).
Count visits to your Flask app with Redis.
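Caching expensive predictions usually follows the cache-aside pattern: look the key up, return it if present, otherwise compute, store it with a TTL, and return it. The sketch below is a hypothetical cache_aside decorator in which a plain dict stands in for Redis; the comments mark which Redis command each step would correspond to.

```python
import functools
import json
import time
from typing import Any, Callable, Dict, Tuple


def cache_aside(ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> Callable:
    """Cache-aside decorator: return a fresh cached result, else compute and store it."""
    store: Dict[str, Tuple[Any, float]] = {}  # key -> (value, expiry time); stands in for Redis

    def decorator(fn: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(fn)
        def wrapper(*args: Any) -> Any:
            key = f"{fn.__name__}:{json.dumps(args)}"  # deterministic key from the inputs
            hit = store.get(key)  # ~ GET key
            if hit is not None and clock() < hit[1]:
                return hit[0]  # fresh cache hit: skip the expensive call
            value = fn(*args)  # miss or expired: do the expensive work
            store[key] = (value, clock() + ttl_seconds)  # ~ SET key value EX ttl
            return value

        return wrapper

    return decorator


calls = []


@cache_aside(ttl_seconds=60)
def expensive_prediction(x: float) -> float:
    calls.append(x)  # record how often the "model" really runs
    return x * 2


expensive_prediction(3.0)
expensive_prediction(3.0)
print(len(calls))  # 1
```

The dict only lives inside one process, so each Flask container would have its own cache; moving the GET/SET steps to Redis (serializing values, e.g. as JSON, since Redis stores strings) is what lets several containers share one cache.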
Benefits of Using Docker Here
Isolation: The Flask container has Python + dependencies; the Redis container has its own environment. No conflicts.
Reproducibility: Same versions across dev, staging, and production.
Portability: Run the same app locally, on a server, or in the cloud.
Scalability: With Compose/Kubernetes you can scale Flask (docker compose up --scale web=3) while keeping Redis shared. With Compose, the web service must not publish a fixed host port such as "5000:5000", because only one container can bind a given host port; publish only the container port or put a load balancer in front.
Simplicity: One command (docker compose up) starts your whole stack.
In short:
Flask = your web service (APIs, dashboards, ML endpoints)
Redis = your cache/store (fast data access, counters, sessions)
Docker = your packaging and orchestration (easy to run anywhere, together).
DOCKER COMPOSE
Docker Compose is a tool that helps you define and manage multi-container Docker applications easily.
Instead of running containers one by one with long docker run commands, you can describe your entire setup in a single docker-compose.yml file and start everything with a single command.
Key Points:
Multi-container apps: For example, a Flask app + a Redis database.
YAML configuration: You define services (containers), networks, and volumes in docker-compose.yml.
One command: docker compose up starts all services together.
Easy management: Stop and remove everything with docker compose down.
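Example (docker-compose.yml)
# No top-level version key needed: Docker Compose v2 treats it as obsolete.
services:
  web:
    build: .               # Flask app, built from the Dockerfile in the current directory
    ports:
      - "5000:5000"        # host port 5000 → container port 5000
  redis:
    image: "redis:alpine"  # official Redis image, used as-is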
Here:
web = Flask app (built from the Dockerfile in the current directory)
redis = Redis container (from the official image)
Both run together with just:
docker compose up
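Inside the network Compose creates for this file, each service is reachable by its service name, so the web container can connect to Redis at host "redis" on Redis's default port 6379. A minimal sketch of reading those settings, assuming hypothetical REDIS_HOST and REDIS_PORT environment variables that default to the Compose values:

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class RedisSettings:
    host: str
    port: int


def redis_settings(env: Optional[Mapping[str, str]] = None) -> RedisSettings:
    """Read Redis connection settings; REDIS_HOST/REDIS_PORT are hypothetical names.

    The defaults match the Compose file: the service name "redis" resolves to the
    Redis container, and 6379 is Redis's default port.
    """
    env = os.environ if env is None else env
    return RedisSettings(
        host=env.get("REDIS_HOST", "redis"),
        port=int(env.get("REDIS_PORT", "6379")),
    )


settings = redis_settings()
print(settings)
```

In the Flask app these values would feed redis.Redis(host=settings.host, port=settings.port); when running Flask outside Compose against a locally published Redis, REDIS_HOST=localhost would be the override.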
In short: docker run starts one container per command, while Docker Compose runs and connects multiple containers (like a whole project stack) with a single command.
Docker Commands Cheat Sheet
Build Image
docker build -t welcome-app .
-t welcome-app → names (tags) the image welcome-app
. → uses the current directory as the build context (important!)
Run Container
docker run -d --name welcome-app -p 8000:5000 welcome-app
-d → run in the background (detached mode)
--name welcome-app → names the container
-p 8000:5000 → maps host port 8000 to container port 5000
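The host side of -p HOST:CONTAINER must be free, or Docker refuses to start the container. As a quick pre-flight check, the sketch below tries a TCP bind on the candidate host port; host_port_is_free is a hypothetical helper, and since another process could grab the port right after the check, treat it as a hint rather than a guarantee.

```python
import socket


def host_port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP socket can currently bind to (host, port)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:  # typically "Address already in use"
            return False
        return True


# Pick the first free host port and print the matching docker run command.
for candidate in (8000, 9000):
    if host_port_is_free(candidate):
        print(f"docker run -d --name welcome-app -p {candidate}:5000 welcome-app")
        break
else:
    print("Ports 8000 and 9000 are both taken; pick another host port.")
```

Only the host port changes between candidates; the container port stays 5000 because that is where the app inside the image listens.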
Remove Image
docker rmi -f <image_id_or_name>
-f → force removal
List Images
docker images
List Containers
docker ps -a
- Shows all containers (running + stopped).
Docker Compose
Start all services:
docker compose up
Build/rebuild images:
docker compose build
Stop services:
docker compose stop
Stop and remove containers and networks (clean up); add -v to also remove volumes:
docker compose down
Common Docker Commands
| Command | Usage |
|---|---|
| docker build -t welcome-app . | Build image |
| docker run -p 8000:5000 welcome-app | Run container |
| docker rmi -f <id> | Remove image |
| docker images | List images |
| docker ps -a | List all containers |
| docker compose up | Start services |
| docker compose build | Build images for services |
| docker compose stop | Stop services |
Common Errors
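Mistyping commands → Double-check command names (e.g., docker, not doker).
Port already in use → Change the host side of the port mapping (9000:5000 instead of 8000:5000).
Module not found → Add the missing package to the image's dependency list (e.g., requirements.txt), then rebuild with docker compose up --build.
Redis connection error → Redis may still be starting; wait a few seconds (the app's retry loop around the Redis call handles this).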
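The retry loop mentioned above is not shown in this post, so here is a minimal sketch of one, assuming a hypothetical call_with_retry helper that takes the operation as a zero-argument function. In the Flask app it could wrap cache.incr("hits") with retry_on=(redis.exceptions.ConnectionError,); note that redis-py's ConnectionError is its own exception class, not Python's built-in ConnectionError, so the exception type has to be passed explicitly.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    operation: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    retries: int = 5,
    delay_seconds: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `operation`; on one of the `retry_on` errors, wait and try again.

    Makes at most `retries + 1` attempts, then re-raises the last error.
    """
    while True:
        try:
            return operation()
        except retry_on:
            if retries == 0:
                raise  # out of retries: surface the original error
            retries -= 1
            sleep(delay_seconds)  # give the Redis container time to finish starting


# Usage with a stand-in operation that fails twice before succeeding.
attempts = []


def flaky_incr() -> int:
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("Redis is not ready yet")
    return len(attempts)


print(call_with_retry(flaky_incr, retry_on=(ConnectionError,), delay_seconds=0.01))  # 3
```

Errors not listed in retry_on propagate immediately, so a genuine bug (say, a wrong key type) is not hidden behind repeated retries.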
With this, you have:
A Flask app in Docker
A Flask + Redis app in Docker Compose
Let's Connect: LinkedIn → Maddy Das
Read articles from MATHAVAN E directly inside your inbox. Subscribe to the newsletter, and don't miss out.