Apache Airflow - Getting Started

samyak jain

What is Apache Airflow?

Apache Airflow was created at Airbnb in 2014 by Maxime Beauchemin and released as open source in June 2015.

Apache Airflow is an open-source platform designed to programmatically author, schedule, and monitor workflows.

Let's understand the basics first.

TERMINOLOGIES AND COMPONENTS

  1. Workflow -
    A group of interdependent processes or tasks that achieve a specific outcome.
    A workflow is divided into one or more tasks that depend on each other and together form a DAG (Directed Acyclic Graph).

  2. DAG -
    Directed - Dependencies run in one direction, e.g. Task1 -> Task2 -> Task3.
    Acyclic - The graph must not contain cycles; otherwise tasks would end up waiting on each other forever.
    Graph - Made up of nodes (the tasks) and edges (the dependencies between them).
    In simple terms, a DAG is a collection of small tasks that join together to perform a bigger job.

  3. Tasks -
    A Task is the basic unit of execution in Airflow.
    It can run a Bash script, a Python script, etc.

  4. Scheduler -
    Responsible for determining when to execute each task in a DAG based on the defined dependencies and schedules. It ensures that tasks are executed in the correct order and at the right times.

  5. Workers -
    Workers are the execution environments where your tasks actually run. They pick up the tasks that the scheduler has queued and execute them. How tasks reach the workers depends on the configured executor; Airflow supports several, including Celery and Kubernetes.

  6. Web Interface -
    Airflow provides a web-based UI for monitoring and managing your DAGs, including viewing task logs, retrying failed tasks, and triggering DAG runs.

  7. Metadata Database -
    Stores credentials, connection information, run history, and configuration. This database is crucial for tracking the status and history of your workflows.
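
To make the DAG idea above concrete, here is a minimal sketch in plain Python. It does not use Airflow at all: it relies on the standard library's graphlib module (Python 3.9+) to show how a directed, acyclic set of dependencies yields a valid run order, and how a cycle is rejected. The task names (extract, transform, load) and the execution_order helper are hypothetical and exist only for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each key is a task, each value is the set of
# tasks that must finish before it can start (its upstream tasks).
workflow = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}


def execution_order(dag):
    """Return one valid run order, or raise ValueError if the graph has a cycle."""
    try:
        # static_order() yields tasks so that every task comes after its upstream tasks.
        return list(TopologicalSorter(dag).static_order())
    except CycleError as err:
        # err.args[1] holds the list of nodes that form the cycle.
        raise ValueError(f"not a DAG, cycle found: {err.args[1]}") from err


print(execution_order(workflow))  # ['extract', 'transform', 'load']
```

If "extract" were made to depend on "load", the three tasks would form a cycle and execution_order would raise a ValueError. This is the situation the "Acyclic" rule guards against. Airflow's scheduler does far more than this (schedules, retries, state tracking), so treat the sketch only as an illustration of dependency ordering.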

Next Steps-

Apache Airflow Installation - https://hashnode.com/post/clxhrqbm0000008l43et9hzq0
