Build Your First Airflow Pipeline: A Beginner’s Blueprint

Jayasree S
2 min read

Apache Airflow can seem intimidating at first, with its steep learning curve and occasional setup errors. However, once you get it running on your machine, the sense of achievement is unparalleled, especially for beginners. This guide provides a clear, step-by-step approach to mastering Airflow, complete with practical tips, resources, and hands-on examples to build your first ETL pipeline.

Why Airflow Rocks

Airflow lets you automate data tasks, schedule jobs, and track everything in a cool web dashboard — all using Python! Perfect for beginners.

Prerequisites

  1. Docker installed on your system.

  2. Basic Python knowledge.

  3. A working understanding of ETL (extract, transform, load) concepts.

Step 1: Learn Airflow Basics

Dive into the official Apache Airflow documentation to understand:

  • DAGs: Workflows defining task order.

  • Operators: Tasks like PythonOperator or BashOperator (a short sketch after this list shows both in a DAG).

  • Schedulers & Executors: Manage task timing and execution.

For visual learners, there are also plenty of beginner-friendly Airflow video walkthroughs on YouTube.

Step 2: Install Airflow Locally

Fetching docker-compose.yaml

To deploy Airflow with Docker Compose, first fetch the official docker-compose.yaml:

curl -LfO 'https://airflow.apache.org/docs/apache-airflow/3.0.0/docker-compose.yaml'

This file contains several service definitions:

airflow-scheduler - The scheduler monitors all tasks and DAGs, then triggers the task instances once their dependencies are complete.

airflow-dag-processor - The DAG processor parses DAG files.

airflow-api-server - The API server, available at http://localhost:8080.

airflow-worker - The worker that executes the tasks given by the scheduler.

airflow-triggerer - The triggerer runs an event loop for deferrable tasks.

airflow-init - The initialization service.

postgres - The PostgreSQL database that stores Airflow's metadata.

redis - The Redis broker that forwards messages from the scheduler to the worker.

Initialize the database

Before initializing, the official guide also recommends creating the dags, logs, plugins, and config folders next to the compose file and, on Linux, setting AIRFLOW_UID in a .env file so that files created inside the containers are owned by your user. Then run:

docker compose up airflow-init

Running Airflow

docker compose up

Step 3: Create Your First DAG

A DAG is a Python script defining your workflow. Save this as hello_airflow.py in your Airflow dags folder:

from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def print_hello():
    # The task body: just log a greeting.
    print("Hello, Airflow!")


# Airflow 3 takes a "schedule" argument (the older "schedule_interval" was removed).
with DAG(dag_id='hello_airflow', start_date=datetime(2025, 1, 1), schedule='@daily', catchup=False) as dag:
    hello_task = PythonOperator(task_id='print_hello_task', python_callable=print_hello)

Access the Airflow UI at http://localhost:8080 (the Docker Compose setup creates a default airflow / airflow account), find hello_airflow in the DAG list, unpause it, and trigger a run.

Step 4: Build a Simple ETL Pipeline

Create an ETL pipeline using PostgreSQL and an Amazon dataset (a minimal sketch follows the list):

  • Extract: Read data from a CSV.

  • Transform: Clean or modify data.

  • Load: Save to a PostgreSQL database.
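
Putting the three steps into one DAG, here is a minimal sketch. Treat every specific name in it as an assumption to adapt: the CSV path, the 'price' column used in the cleanup, the target table, and the PostgreSQL connection string are all placeholders, and pandas, SQLAlchemy, and psycopg2 must be available in the Airflow containers (for quick experiments the official compose file supports _PIP_ADDITIONAL_REQUIREMENTS).

from datetime import datetime

import pandas as pd
from sqlalchemy import create_engine

from airflow import DAG
from airflow.operators.python import PythonOperator

# Placeholders - adjust to your environment.
CSV_PATH = '/opt/airflow/dags/data/amazon.csv'                      # hypothetical dataset location
DB_URI = 'postgresql+psycopg2://etl_user:etl_pass@mydb:5432/shop'   # hypothetical target database


def extract():
    # Read the raw CSV and stage it for the next task.
    pd.read_csv(CSV_PATH).to_csv('/tmp/amazon_raw.csv', index=False)


def transform():
    # Example cleanup: drop duplicates and rows with no price (assumes a 'price' column).
    df = pd.read_csv('/tmp/amazon_raw.csv')
    df = df.drop_duplicates().dropna(subset=['price'])
    df.to_csv('/tmp/amazon_clean.csv', index=False)


def load():
    # Write the cleaned data into a PostgreSQL table.
    df = pd.read_csv('/tmp/amazon_clean.csv')
    engine = create_engine(DB_URI)
    df.to_sql('amazon_products', engine, if_exists='replace', index=False)


with DAG(dag_id='amazon_etl', start_date=datetime(2025, 1, 1), schedule='@daily', catchup=False) as dag:
    extract_task = PythonOperator(task_id='extract', python_callable=extract)
    transform_task = PythonOperator(task_id='transform', python_callable=transform)
    load_task = PythonOperator(task_id='load', python_callable=load)

    extract_task >> transform_task >> load_task

Staging data through files in /tmp keeps the sketch simple and works in the single-worker Compose setup; for anything beyond a toy pipeline, pass data through a shared volume, object storage, or XCom instead.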

Step 5: Explore Astronomer.io

Try Astronomer.io for easier Airflow deployment and monitoring. Their free tier is great for beginners.

Conclusion

Building your first Airflow pipeline is a rewarding challenge. Use the documentation, tutorials, and platforms like Astronomer.io to guide you. With practice, Airflow will become intuitive and powerful!

🚀 Stay curious. Stay consistent. Happy coding!
