📦 Docker for Data Science

MATHAVAN E

This project provides a Docker-based environment for running Data Science workflows inside VS Code. Using Docker ensures reproducibility, isolation, and portability, so you can focus on coding without worrying about dependency issues.


🔑 What is Docker?

👉 Docker is a platform that lets you package an application (code, libraries, dependencies, and the OS-level files it needs) into an image and run it as a lightweight, portable unit called a container.

Think of Docker as a way to “ship your code with everything it needs” so it runs the same on any machine.


🧩 Key Concepts

1. Docker Image 🖼️

  • A blueprint (template) for containers.

  • Defines the OS, libraries, tools, and environment your code needs.

  • Example: A Python image with NumPy, Pandas, and Jupyter installed.

2. Docker Container 📦

  • A running instance of an image.

  • Think of it as an isolated process that behaves much like a small virtual machine but shares the host's kernel, which is why it starts in seconds.

  • You can start, stop, or remove containers anytime.

3. Docker Compose 🛠️

  • A tool to manage multiple containers together using one YAML file (docker-compose.yml).

  • Example:

    • One container runs Jupyter Notebook

    • Another runs a PostgreSQL database

    • Another runs Redis (cache)

  • With Docker Compose, you can start all services with just:

      docker compose up
    

👉 For Data Science, Docker Compose is important because many projects need multiple services (e.g., Jupyter + Database + ML Model API).


⚡ Why Use Docker for Data Science?

  • ✅ Reproducibility → Same Python + library versions everywhere.

  • ✅ Portability → Move your project from laptop → server → cloud easily.

  • ✅ Isolation → Keep data science projects separated (no package conflicts).

  • ✅ Scalability → Connect multiple tools (Jupyter, PostgreSQL, Spark, Hadoop, etc.) with Compose.

  • ✅ Team Collaboration → Everyone gets the same environment.
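
One way to check the reproducibility claim is to print the interpreter and library versions from inside the container and compare them across machines. The sketch below is a minimal example that uses only the standard library; the helper name environment_report and the package list are illustrative assumptions, not part of the original project.

```python
import platform
from importlib import metadata


def environment_report(packages):
    """Map 'python' and each package name to its installed version.

    Packages that are not installed map to None instead of raising.
    """
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = None
    return report


if __name__ == "__main__":
    # Illustrative package list; adjust it to whatever your image installs.
    for name, version in environment_report(["numpy", "pandas", "jupyter"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If two teammates run this inside containers built from the same image, the output should be identical line for line.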


🧑‍💻 Usage for Data Engineers & Data Scientists

  • Data Scientist → Run Jupyter, Python, ML libraries inside Docker.

  • Data Engineer → Orchestrate multiple tools (Spark, Kafka, Hadoop, PostgreSQL) using Docker Compose.

  • ML Engineer → Deploy trained ML models as APIs inside containers.

  • Big Data Projects → Use Compose to manage clusters (HDFS + Spark + Airflow).



🚀 Final Takeaway

  • Docker → Packages your environment

  • Image → Blueprint of environment

  • Container → Running environment

  • Docker Compose → Manages multiple services easily

👉 For Data Science & Engineering, Docker ensures smooth collaboration, reproducibility, and scalability.


Redis

What is Redis? An in-memory key-value data store. Extremely fast, supports strings, lists, sets, hashes, sorted sets, pub/sub, TTLs, atomic ops, and simple persistence options.

Why use Redis?

  • Super fast (in RAM) → ideal for caching and counters.

  • Flexible data types → beyond plain key/value.

  • Great for sessions, rate limiting, leaderboards, lightweight queues.

  • Simple pub/sub messaging.

Common usages

  • Cache DB or API responses.

  • Session store for web apps.

  • Rate limiting and counters.

  • Message queue / pub-sub for background tasks.

  • Leaderboards with sorted sets.
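
The caching use case usually follows the cache-aside pattern: look in Redis first, and only run the slow query on a miss. The sketch below illustrates that flow under some assumptions: InMemoryStore is a hypothetical stand-in that mimics the get and setex methods of a redis-py client so the example runs without a Redis server, and the key name and TTL are made up.

```python
import json
import time


class InMemoryStore:
    """Tiny stand-in exposing the two client methods used below: get and setex."""

    def __init__(self):
        self._data = {}

    def get(self, name):
        value, expires_at = self._data.get(name, (None, 0.0))
        if value is not None and time.monotonic() >= expires_at:
            del self._data[name]  # entry has expired
            return None
        return value

    def setex(self, name, time_seconds, value):
        # Store the value together with its expiry deadline.
        self._data[name] = (value, time.monotonic() + time_seconds)


def cached(store, key, ttl_seconds, compute):
    """Cache-aside: return the cached JSON value for key, or compute and store it."""
    hit = store.get(key)
    if hit is not None:
        return json.loads(hit)
    result = compute()
    store.setex(key, ttl_seconds, json.dumps(result))
    return result
```

With a real server you would pass a redis-py client (for example redis.Redis(host="redis")) instead of InMemoryStore; json.loads accepts the bytes that Redis returns, so cached should not need changes.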



Flask

What is Flask? A lightweight Python web framework (microframework). Gives routing, request handling, templates (Jinja2) and is easy to extend.

Why use Flask?

  • Simple to learn and fast to prototype.

  • Perfect for REST APIs (e.g., serve ML models).

  • Highly customizable: pick only what you need.

  • Works well with Docker and microservice architectures.

Common usages

  • REST API for ML models.

  • Small web apps or admin dashboards.

  • Backend for SPA (single-page applications).

  • Microservice endpoints in data pipelines.



🔗 Why Use Redis + Flask in Docker?

1. Flask in Docker

  • Flask is your web layer: the place where you expose APIs (for ML predictions, dashboards, or data services).

  • In Docker, you package Flask with its exact Python version + dependencies → no “works on my machine” issues.

  • Your Flask API becomes portable: run locally, on a server, or in the cloud with the same behavior.

Example use case: You trained an ML model → you wrap it in Flask (/predict endpoint) → Docker makes it a microservice → deploy anywhere.
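
A /predict endpoint along those lines might look like the sketch below. It is not the author's code: predict is a placeholder that returns the mean of the inputs in place of a real trained model, and the JSON shape ({"features": [...]}) is an assumption. Flask is imported inside create_app only so the two helper functions can be tried without Flask installed.

```python
def predict(features):
    """Placeholder model: returns the mean of the features.

    Swap this for a call to your real trained model.
    """
    return sum(features) / len(features)


def parse_features(payload):
    """Validate the JSON body and return a list of floats, or raise ValueError."""
    if not isinstance(payload, dict) or "features" not in payload:
        raise ValueError("expected a JSON object with a 'features' list")
    features = payload["features"]
    if not isinstance(features, list) or not features:
        raise ValueError("'features' must be a non-empty list")
    try:
        return [float(x) for x in features]
    except (TypeError, ValueError):
        raise ValueError("'features' must contain only numbers") from None


def create_app():
    # Imported here so the helpers above work without Flask installed.
    from flask import Flask, jsonify, request

    app = Flask(__name__)

    @app.route("/predict", methods=["POST"])
    def predict_endpoint():
        try:
            features = parse_features(request.get_json(silent=True))
        except ValueError as exc:
            return jsonify(error=str(exc)), 400
        return jsonify(prediction=predict(features))

    return app
```

Inside a container you would start it with create_app().run(host="0.0.0.0", port=5000); binding to 0.0.0.0 rather than 127.0.0.1 is what makes the app reachable through a published port.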


2. Redis in Docker

  • Redis is your fast in-memory store (cache, session store, counters, pub/sub).

  • In Docker, Redis runs as a separate, isolated service, so there is no need to install it manually on your system.

  • You can spin up Redis instantly for dev/test/prod using docker run or docker compose.

Example use case:

  • Cache expensive ML predictions or database queries.

  • Store Flask user sessions in Redis (instead of local memory).

  • Count visits to your Flask app with Redis.
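
The visit counter is the simplest of these to sketch, because Redis increments a key atomically with a single command. In the example below, FakeRedis is a hypothetical in-memory stand-in that mirrors the incr method of a redis-py client, and the visits:<page> key format is an assumption.

```python
class FakeRedis:
    """In-memory stand-in exposing incr with the same shape as a redis-py client."""

    def __init__(self):
        self._counts = {}

    def incr(self, name, amount=1):
        self._counts[name] = self._counts.get(name, 0) + amount
        return self._counts[name]


def record_visit(client, page):
    """Increment and return the visit count for a page."""
    return client.incr(f"visits:{page}")


client = FakeRedis()
record_visit(client, "home")
print(record_visit(client, "home"))  # second visit to the same page
```

Against a real server, client would be a redis-py connection such as redis.Redis(host="redis", port=6379), and record_visit would stay the same.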


✅ Benefits of Using Docker Here

  • Isolation → Flask container has Python + deps, Redis container has its own environment. No conflicts.

  • Reproducibility → Same versions across dev, staging, production.

  • Portability → Run the same app locally, on a server, or in the cloud.

  • Scalability → With Compose you can run several copies of the Flask service (docker compose up --scale web=3) while they all share one Redis. This only works if the web service does not pin a fixed host port such as "5000:5000". Kubernetes offers the same idea with its own tooling.

  • Simplicity → One command (docker compose up) starts your whole stack.


👉 In short:

  • Flask = your web service (APIs, dashboards, ML endpoints)

  • Redis = your cache/store (fast data access, counters, sessions)

  • Docker = your packaging and orchestration (easy to run anywhere, together).

🐳 Docker Compose

Docker Compose is a tool that helps you define and manage multi-container Docker applications easily.

Instead of running containers one by one with long docker run commands, you can describe your entire setup in a single docker-compose.yml file and start everything with a single command.


🔑 Key Points:

  • Multi-container apps → Example: A Flask app + a Redis database.

  • YAML configuration → You define services (containers), networks, and volumes in docker-compose.yml.

  • One command → docker compose up will start all services together.

  • Easy management → Stop everything with docker compose down.


📝 Example (docker-compose.yml)

version: '3'
services:
  web:
    build: .
    ports:
      - "5000:5000"
  redis:
    image: "redis:alpine"

➡️ Here:

  • web = Flask app (built from Dockerfile in the current directory .)

  • redis = Redis container (from official image)

  • Both run together with just:

docker compose up

⚡ In short: docker run starts one container per command, while Docker Compose starts and connects multiple containers (like a whole project stack) with a single command.
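
Compose puts both services on a shared network where each one is reachable by its service name, so the Flask container can reach Redis at the hostname redis. The sketch below shows one way the web service might build its connection settings; the REDIS_HOST and REDIS_PORT variable names are assumptions, with defaults chosen to match the Compose file above and Redis's standard port.

```python
import os


def redis_settings(environ=None):
    """Build Redis connection settings from environment variables.

    REDIS_HOST and REDIS_PORT are assumed names. The defaults match the
    `redis` service in the Compose file and Redis's standard port 6379.
    """
    environ = os.environ if environ is None else environ
    return {
        "host": environ.get("REDIS_HOST", "redis"),
        "port": int(environ.get("REDIS_PORT", "6379")),
    }


print(redis_settings({}))
```

With redis-py installed, the result can be passed straight through as redis.Redis(**redis_settings()). Setting REDIS_HOST=localhost lets the same code run outside Compose against a locally published Redis.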


🐳 Docker Commands Cheat Sheet

🔨 Build Image

docker build -t welcome-app .
  • -t welcome-app → names the image welcome-app

  • . → build from current directory (important!)


▶️ Run Container

docker run -d --name welcome-app -p 8000:5000 welcome-app
  • -d → run in background (detach mode)

  • --name welcome-app → names the container

  • -p 8000:5000 → maps host port 8000 → container port 5000
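
After starting the container, a quick way to confirm that the host side of the mapping is live is to attempt a TCP connection to it. The helper below is a small sketch using only the standard library; port_is_open is a hypothetical name. It only tells you that something is accepting connections on that port, not that the app inside the container is healthy.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For the command above you would call port_is_open("localhost", 8000). The same check, run before docker run, can tell you whether a host port is already taken by another process.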


🗑️ Remove Image

docker rmi -f <image_id_or_name>
  • -f → force remove

📸 List Images

docker images

📦 List Containers

docker ps -a
  • Shows all containers (running + stopped).

📑 Docker Compose

Start all services:

docker compose up

Build/rebuild images:

docker compose build

Stop services:

docker compose stop

Remove containers and networks (add -v to also remove volumes):

docker compose down

🐳 Common Docker Commands

Command                                Usage
docker build -t welcome-app .          Build image
docker run -p 8000:5000 welcome-app    Run container
docker rmi -f <id>                     Remove image
docker images                          List images
docker ps -a                           List all containers
docker compose up                      Start services
docker compose build                   Build images for services
docker compose stop                    Stop services

❌ Common Errors

  • Mistyping commands → Double-check command names (e.g., docker, not doker).

  • Port already in use → Change the host port in the mapping (9000:5000 instead of 8000:5000).

  • Module not found → Rebuild the image with docker compose up --build.

  • Redis connection error → Redis may still be starting. Wait a few seconds and try again, or have the app retry the connection in a loop.
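
A retry loop for the Redis connection error can be quite small. The sketch below is one possible shape, not the original app's code: retry is a hypothetical helper, and the attempt count and delay are arbitrary defaults. The exception types are passed in because redis-py raises its own redis.exceptions.ConnectionError rather than the built-in one.

```python
import time


def retry(operation, exceptions, attempts=5, delay_seconds=0.5, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts run out.

    Re-raises the last exception if every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except exceptions:
            if attempt == attempts:
                raise
            sleep(delay_seconds)  # give the other service time to start
```

With redis-py, a startup check might then read retry(client.ping, (redis.exceptions.ConnectionError,)), where client is your Redis connection.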


✅ With this, you have:

  • A Flask app in Docker

  • A Flask + Redis app in Docker Compose

📬 Let’s Connect: LinkedIn – Maddy Das

