Apache Airflow - Getting Started
What is Apache Airflow?
Apache Airflow is an open-source platform designed to programmatically author, schedule, and monitor workflows.
It was created at Airbnb in 2014 by Maxime Beauchemin and published in June 2015.
Let's understand the basics first.
TERMINOLOGIES AND COMPONENTS
Workflow -
A group of interdependent processes or tasks that achieve a specific outcome.
A workflow is divided into one or more tasks that relate to each other and form a DAG (Directed Acyclic Graph).
DAG -
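Directed - The workflow has a direction, e.g. Task1 -> Task2 -> Task3.
Acyclic - Tasks must not form a cycle; otherwise a task could end up waiting on itself and the workflow would never finish.
Graph - Made up of nodes (vertices), which are the tasks, and edges, which are the dependencies between them.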
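To make these three properties concrete, here is a minimal sketch in plain Python (standard library only, Python 3.9+, not Airflow code). The task names and the `execution_order` helper are made up for illustration. Each task maps to the set of tasks it depends on, and `graphlib.TopologicalSorter` either returns a valid run order or raises `CycleError` when the graph is not acyclic.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical task names; each key maps to the set of tasks it depends on.
workflow = {
    "task1": set(),
    "task2": {"task1"},
    "task3": {"task2"},
}


def execution_order(graph):
    """Return one valid run order, or raise CycleError if the graph has a cycle."""
    return list(TopologicalSorter(graph).static_order())


print(execution_order(workflow))

# Making task1 depend on task3 closes the loop, so no valid order exists.
cyclic = {**workflow, "task1": {"task3"}}
try:
    execution_order(cyclic)
except CycleError:
    print("cycle detected: not a valid DAG")
```

Because the chain is strictly linear, there is only one valid order here; with independent branches, several orders could be equally valid.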
In simple terms, a DAG is a collection of small tasks that join together to perform one big task.
Tasks -
A Task is the basic unit of execution in Airflow.
It can be a Bash script, a Python script, etc.
Scheduler -
Responsible for determining when to execute each task in a DAG based on the defined dependencies and schedules. It ensures that tasks are executed in the correct order and at the right times.
Workers -
Workers are the actual execution environments where your tasks run. They pick up tasks that the scheduler has queued and execute them. How work reaches the workers depends on the configured executor; Airflow supports several, including Celery and Kubernetes.
Web Interface -
Airflow provides a web-based UI for monitoring and managing your DAGs, including viewing task logs, retrying failed tasks, and triggering DAG runs.
Metadatabase -
Used to store credentials, connection information, history, and configuration. This database is crucial for tracking the status and history of your workflows.
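To see how these components fit together, here is a toy, single-process sketch in plain Python (standard library only, Python 3.9+). It is not how Airflow is implemented, and it ignores schedules, retries, and parallel workers. The `dag` contents, the `run_dag` function, and the state names are all assumptions made for illustration: a scheduler loop queues tasks whose dependencies are complete, a worker step takes a task from the queue and runs it, and a dictionary stands in for the metadata database.

```python
from graphlib import TopologicalSorter
from queue import Queue

# Hypothetical DAG: each task maps to the set of tasks it depends on.
dag = {"extract": set(), "transform": {"extract"}, "load": {"transform"}}


def run_dag(graph, run_task):
    """Toy scheduler loop: queue ready tasks, run them one at a time, record state."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()      # raises CycleError if the graph is not acyclic
    task_queue = Queue()  # stands in for the queue between scheduler and workers
    metadata = {}         # stands in for the metadata database
    while sorter.is_active():
        # Scheduler: queue every task whose upstream tasks have all finished.
        for task in sorter.get_ready():
            metadata[task] = "queued"
            task_queue.put(task)
        # Worker: take one queued task, execute it, and report the outcome.
        task = task_queue.get()
        metadata[task] = "running"
        run_task(task)
        metadata[task] = "success"
        sorter.done(task)  # unblocks downstream tasks
    return metadata


executed = []
final_state = run_dag(dag, executed.append)
print(executed)
print(final_state)
```

In a real deployment the task state lives in a database rather than a dictionary, which is what lets the web interface show the status and history of each run.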
Next Steps -
Apache Airflow Installation - https://hashnode.com/post/clxhrqbm0000008l43et9hzq0
Written by
samyak jain
Hi there, I'm Samyak Jain, a seasoned data & analytics professional with a problem-solving mindset, passionate about solving challenging real-world problems using data and technology.