Airflow Installation Using Docker Compose on Mac Mini

Table of contents
- 1. What is Airflow?
- 2. Setting Up the Mac Mini Environment
- 3. Installing and Running Airflow with Docker Compose
- 4. Testing a Simple DAG
- 5. Optimizing and Troubleshooting Airflow on Mac
- 6. Conclusion
- Additional Notes: Backfill
- Additional Notes: Executors
- Additional Notes: Flower
- Additional Notes: Using Airflow CLI with Docker Compose
- Additional Notes: Resolving Disk Usage Issues
1. What is Airflow?
Apache Airflow is an open-source tool for creating and running workflows, providing powerful automation and scheduling capabilities for data pipelines. It allows you to define and execute tasks using DAGs (Directed Acyclic Graphs) and monitor their status easily through a web UI.
My favorite feature is the ability to share logs and code.
The scheduling function is similar to Cron, but I particularly like that I can check execution results and logs directly from the web.
The web UI makes it convenient to view and share code.
Airflow is known for its powerful backfill functionality, but I have never actually used it.
- I didn't use backfill at work; instead, I used task clear to re-run past tasks. I looped over past dates, so setting the execution_date was necessary.
2. Setting Up the Mac Mini Environment
To run Airflow with Docker Compose on a Mac Mini, I first need to set up the required environment.
2.1 Required Software Installation
To run Airflow on a Mac, you need the following:
- Docker and Docker Compose (installable via Homebrew)
- Python 3 (needed for Airflow configuration)
I use pyenv to manage Python versions. Since Airflow runs in Docker, installing Python separately is not necessary.
However, having the Airflow package installed locally makes writing DAGs more convenient.
If these are not installed, you can install them using Homebrew:
brew install --cask docker
brew install python
After installing Docker, start Docker Desktop and verify that it runs correctly.
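A quick way to verify the installation from the terminal is to print the versions (the exact output depends on which versions Homebrew installed):
docker --version
docker-compose --version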
3. Installing and Running Airflow with Docker Compose
3.1 Downloading the Official Airflow Docker Compose File
Retrieve the Docker Compose file from the official Airflow GitHub repository:
mkdir airflow && cd airflow
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.10.3/docker-compose.yaml'
For the latest version:
curl -LfO 'https://airflow.apache.org/docs/apache-airflow/stable/docker-compose.yaml'
Note: The above commands may not be exact, as I had previously downloaded the Docker Compose file and backed it up on Google Cloud. I can't confirm if this is the exact method I used—ChatGPT suggested this.
Additionally, I run the following services as standalone installations rather than using Docker:
- PostgreSQL
- Redis
Thus, I configure the following environment variables in my docker-compose.yaml file:
environment: &airflow-common-env
  AIRFLOW__CORE__EXECUTOR: CeleryExecutor
  AIRFLOW__DATABASE__SQL_ALCHEMY_CONN: postgresql+psycopg2://id:password!@localhost/airflow_db?sslmode=disable
  AIRFLOW__CELERY__RESULT_BACKEND: db+postgresql://id:password!@localhost/airflow_db
  AIRFLOW__CELERY__BROKER_URL: redis://:@localhost:6379/0
  AIRFLOW__CORE__FERNET_KEY: ""
  AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: "true"
  AIRFLOW__CORE__LOAD_EXAMPLES: "false"
  AIRFLOW__API__AUTH_BACKENDS: "airflow.api.auth.backend.basic_auth,airflow.api.auth.backend.session"
  # yamllint disable rule:line-length
  # Use simple http server on scheduler for health checks
  # See https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/logging-monitoring/check-health.html#scheduler-health-check-server
  # yamllint enable rule:line-length
  AIRFLOW__SCHEDULER__ENABLE_HEALTH_CHECK: "true"
  # WARNING: Use _PIP_ADDITIONAL_REQUIREMENTS option ONLY for quick checks
  # for other purposes (development, test and especially production usage) build/extend Airflow image.
  _PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
Since I use standalone PostgreSQL and Redis, I comment out the corresponding sections in docker-compose.yaml:
services:
  # postgres:
  #   # image: postgres:13
  #   image: busybox
  #   # environment:
  #   #   POSTGRES_USER: airflow
  #   #   POSTGRES_PASSWORD: airflow
  #   #   POSTGRES_DB: airflow
  #   # volumes:
  #   #   - postgres-db-volume:/var/lib/postgresql/data
  #   healthcheck:
  #     test: ["CMD", "pg_isready", "-h", "192.168.0.1", "-U", "admin"]
  #     interval: 10s
  #     retries: 5
  #     start_period: 5s
  #   # restart: always
  #   command: ["sleep", "infinity"]
  # redis:
  #   # Redis is limited to 7.2-bookworm due to licencing change
  #   # https://redis.io/blog/redis-adopts-dual-source-available-licensing/
  #   # image: redis:7.2-bookworm
  #   image: busybox
  #   # expose:
  #   #   - 6379
  #   healthcheck:
  #     test: ["CMD", "redis-cli", "-h", "192.168.0.2", "ping"]
  #     interval: 10s
  #     timeout: 30s
  #     retries: 50
  #     start_period: 30s
  #   # restart: always
  #   command: ["sleep", "infinity"]
  airflow-webserver:
3.2 Setting Up Environment Variables
Create a .env file for Airflow with the required configurations:
echo -e "AIRFLOW_UID=$(id -u)\nAIRFLOW_GID=0" > .env
On Mac, setting AIRFLOW_GID=0 helps avoid permission issues (as suggested by ChatGPT). However, I did not include GID in my .env file. Instead, I structured my Airflow project as follows:
AIRFLOW_PROJ_DIR=/path/to/airflow/project/airflow
- File Structure
├── .env
└── docker-compose.yaml
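For reference, the resulting .env holds only a couple of entries along these lines (the UID comes from id -u and will differ per machine; 501 is just a typical macOS example, and the project path is a placeholder):
AIRFLOW_UID=501
AIRFLOW_PROJ_DIR=/path/to/airflow/project/airflow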
3.3 Creating the Required Directory Structure
Create the necessary folders for Airflow:
mkdir -p dags logs plugins
Folder structure:
- There are no files in the config folder
.
├── config
├── dags
├── logs
└── plugins
3.4 Running Docker Containers
Start the Airflow containers using:
docker-compose up -d
Once all services (Webserver, Scheduler, etc.) are running, you can access the Airflow UI.
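To confirm that every container has started and reports healthy before opening the UI, you can list their status (the service names come from docker-compose.yaml):
docker-compose ps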
3.5 Verifying Web UI Access
Open http://localhost:8080 in your browser to check if the Airflow UI is running.
Username: airflow
Password: airflow
4. Testing a Simple DAG
4.1 Enabling Default Example DAGs
Whether Airflow's example DAGs are loaded is controlled by the load_examples option in airflow.cfg, or equivalently by the AIRFLOW__CORE__LOAD_EXAMPLES environment variable; setting it to true enables them.
However, I kept the example DAGs disabled. Rather than editing airflow.cfg inside the container, I set the following in docker-compose.yaml:
AIRFLOW__CORE__LOAD_EXAMPLES: "false"
Initially, I enabled example DAGs for testing, but they generated excessive logs, so I disabled them.
Log management in Airflow was particularly challenging for me due to the large volume of logs generated.
4.2 Running an Example DAG
Navigate to the DAGs page in the Airflow UI and run the example_bash_operator DAG to verify that everything is working correctly.
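Since the example DAGs are disabled in my setup, a minimal DAG dropped into the dags/ folder works just as well as a smoke test. This is only a sketch; the DAG id and schedule are arbitrary:
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="smoke_test",             # arbitrary id for this sketch
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,                   # don't run past dates for a simple check
) as dag:
    BashOperator(
        task_id="say_hello",
        bash_command="echo 'Airflow is working'",
    )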
5. Optimizing and Troubleshooting Airflow on Mac
5.1 Limiting Container Resources
To prevent Docker from consuming excessive resources on Mac, adjust the CPU and RAM limits in Preferences > Resources in Docker Desktop.
Also, be mindful of Disk Usage. On macOS, Docker storage is classified under "System Data," and its size increases proportionally with usage.
The main problem is not losing a fixed amount of disk space, but that the growth is unpredictable, which makes the system hard to keep under control.
I ran into system crashes twice when disk usage reached nearly 100%; restarting the machine resolved the issue both times.
Instead of adjusting settings in Docker, I created a DAG that periodically removes old logs.
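As a rough sketch (not my exact DAG), a daily cleanup DAG could look like this. Here /opt/airflow/logs is where the logs volume is mounted inside the official image, and the 7-day retention window is arbitrary:
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(
    dag_id="cleanup_airflow_logs",   # illustrative name only
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    BashOperator(
        task_id="delete_old_logs",
        # remove log files older than 7 days inside the container's logs mount
        bash_command="find /opt/airflow/logs -type f -mtime +7 -delete",
    )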
ChatGPT’s Recommendations for Log Management
- Configure log retention settings in airflow.cfg:
base_log_folder = /path/to/logs
logging_level = INFO
log_retention_days = 7
- Alternatively, adjust the logging level via docker-compose.yml by modifying AIRFLOW__LOGGING__LOGGING_LEVEL.
5.2 Resolving Port Conflicts
By default, Airflow uses port 8080. If this port is already in use by another process, you'll need to change it. Modify docker-compose.yaml as follows:
airflow-webserver:
  ports:
    - "9090:8080"
After this, you can access the UI at http://localhost:9090.
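If you are not sure which process is occupying port 8080, a quick check on macOS is:
lsof -i :8080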
5.3 Fixing Volume Permission Issues
On macOS, volume mounting may cause permission issues. Check the volumes section in docker-compose.yaml and, if necessary, adjust permissions using the chmod command.
5.4 Running Multiple DAGs Concurrently and Optimization
To execute multiple runs of the same DAG simultaneously, increase the max_active_runs_per_dag value in airflow.cfg. If certain DAGs depend on each other, use TriggerDagRunOperator to enforce sequential execution. Prevent system overload by appropriately setting parallelism and max_active_tasks_per_dag (called dag_concurrency in older Airflow versions):
parallelism = 8
max_active_tasks_per_dag = 4
This configuration allows up to 4 tasks to run concurrently within a single DAG, and up to 8 tasks to run simultaneously across the whole installation.
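If you prefer to keep these settings in docker-compose.yaml rather than airflow.cfg, the same options can be set through Airflow's AIRFLOW__SECTION__KEY environment-variable convention in the shared environment block, for example (values mirror the airflow.cfg example above):
AIRFLOW__CORE__PARALLELISM: "8"
AIRFLOW__CORE__MAX_ACTIVE_TASKS_PER_DAG: "4"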
6. Conclusion
This guide covered setting up and running Airflow using Docker Compose on a Mac Mini. I tested basic DAG execution and addressed common issues that may arise in a macOS environment.
Airflow enables the creation of complex data pipelines. Future topics to explore include integrating external data sources, adding custom operators, and using the Kubernetes Executor.
Additional Notes: Backfill
Airflow’s Backfill feature is used to retroactively execute DAG runs for missed periods, often necessary when adding or modifying DAGs.
🔹 Understanding Backfill
Airflow executes DAGs based on execution_date. If DAG runs were missed or a new DAG needs to process historical data, backfill can be used.
Backfill runs DAGs for past dates according to their schedule, ensuring that missing task executions are completed.
🔹 Running Backfill
To manually trigger backfill for a specific DAG over a past period, use the following Airflow CLI command:
airflow dags backfill -s 2024-02-01 -e 2024-02-10 my_dag_id
- -s 2024-02-01: Start date
- -e 2024-02-10: End date
- my_dag_id: DAG ID to run
This command runs my_dag_id from February 1 to February 10, 2024.
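Since Airflow runs inside Docker in this setup, the same backfill can also be launched through the airflow-cli service (see the CLI notes further down), for example:
docker-compose run --rm airflow-cli airflow dags backfill -s 2024-02-01 -e 2024-02-10 my_dag_id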
🔹 Things to Consider When Using Backfill
Check the catchup Setting
If catchup=False in the DAG definition, the scheduler will not automatically catch up on past dates. To have past runs executed automatically, set catchup=True:
dag = DAG(
    'my_dag',
    default_args=default_args,
    schedule_interval='@daily',
    catchup=True  # allow past runs
)
Optimizing Parallel Execution
- If processing a large backfill job, optimize execution by adjusting parallelism and max_active_runs_per_dag in airflow.cfg:
parallelism = 10
max_active_runs_per_dag = 5
Consider Resource Usage
Backfill runs multiple historical DAG executions simultaneously, which increases CPU/memory usage.
Adjust scheduler and worker settings accordingly.
🔹 When to Use Backfill
✅ Running a new DAG on historical data
✅ Applying DAG modifications retroactively to past data
✅ Re-executing DAG runs for periods when they failed or weren’t triggered
Additional Notes: Executors
Airflow’s Executor determines how tasks are executed. The main types of Executors are:
- Since this setup is for a home server, LocalExecutor or StandaloneExecutor might be sufficient. However, CeleryExecutor was used here for testing.
SequentialExecutor
- Executes one task at a time
- Default executor with SQLite
- Recommended only for small test environments
LocalExecutor
- Allows parallel task execution
- Runs on a single machine using multiprocessing
- Suitable for development or small production setups
CeleryExecutor
- Distributes tasks across multiple worker nodes
- Uses a message broker (Redis, RabbitMQ, etc.)
- Ideal for large-scale distributed environments
KubernetesExecutor
- Runs each task in an isolated Kubernetes Pod
- Provides strong resource isolation and scalability
- Best for cloud environments
DaskExecutor
- Uses Dask for distributed execution
- Supports dynamic scaling and parallel processing
StandaloneExecutor (Airflow 2.7+)
- Similar to LocalExecutor but with a simpler setup
- Runs easily with the airflow standalone command
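As a reference, the executor in this docker-compose setup is chosen through the AIRFLOW__CORE__EXECUTOR variable shown earlier; switching a small home server to LocalExecutor would mainly be a matter of changing that value (and dropping the Celery worker and Flower services):
AIRFLOW__CORE__EXECUTOR: LocalExecutor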
Additional Notes: Flower
What is Flower?
Flower is a web-based monitoring tool for Celery tasks. If you use CeleryExecutor in Airflow, Flower lets you track worker and task statuses.
Flower’s Key Features
- Monitor currently running Celery tasks
- Check the status of individual workers
- Retry or terminate tasks
- View execution logs and queue status
Running Flower in Airflow
If using CeleryExecutor, start the Flower UI with:
airflow celery flower
However, in docker-compose.yml, Flower is configured to start with:
# You can enable flower by adding "--profile flower" option e.g. docker-compose --profile flower up
# or by explicitly targeted on the command line e.g. docker-compose up flower.
Once running, access Flower UI at http://localhost:5555.
- I haven't tried running it.
Additional Notes: Using Airflow CLI with Docker Compose
In the current docker-compose.yaml, the airflow-cli service is assigned the profile debug. To enable it, use:
docker-compose --profile debug up
Simply running docker-compose up will not start airflow-cli unless the debug profile is explicitly included.
To execute Airflow commands (airflow dags list, etc.), run:
docker-compose run --rm airflow-cli airflow dags list
- This starts the airflow-cli container, executes the command, and then shuts it down.
For an interactive shell inside the container:
docker-compose run --rm airflow-cli bash
Additional Notes: Resolving Disk Usage Issues
- To free up disk space, periodically remove unused Docker containers, images, and volumes with:
docker system prune -a --volumes
- To automatically clear old Airflow logs, schedule a cron job:
find /path/to/airflow/logs -type f -mtime +7 -delete
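For example, the corresponding crontab entry to run this cleanup every night at 03:00 could look like the following (the path and schedule are placeholders):
0 3 * * * find /path/to/airflow/logs -type f -mtime +7 -delete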
Automating Container Restarts
To ensure Airflow containers restart automatically after a system reboot on macOS,
- use launchctl or cron to run:
docker-compose up -d