Introduction to Apache Airflow for DLME
An introduction to what Airflow is and how it is used for DLME.
- This repository
- Airflow documentation
- Airflow AWS Deployment
- Enable/Disable DAG: A DAG will not run (even manually) unless enabled
- DAG name & tags: Clicking on the label will display the DAG
- Runs: Displayed in Successful/Running/Failed order. Clicking each will display a list of DAG runs
- Schedule: How this DAG is scheduled, using cron syntax or special presets (e.g. `@yearly`, `@once`)
- Last Run: Links to the DAG view from the most recent DAG run
- Play: Manually trigger the DAG
- Reload: Refresh the DAG definition
- Delete: Delete the DAG
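
To make the Schedule column easier to read, the following minimal sketch maps the common schedule presets to their cron equivalents. The helper name `describe_schedule` is hypothetical and is not part of Airflow; only the preset strings and their cron expressions come from Airflow itself.

```python
# Cron equivalents of common Airflow schedule presets.
# "@once" and None are special values, not cron expressions.
CRON_PRESETS = {
    "@hourly": "0 * * * *",
    "@daily": "0 0 * * *",
    "@weekly": "0 0 * * 0",
    "@monthly": "0 0 1 * *",
    "@yearly": "0 0 1 1 *",
}


def describe_schedule(schedule):
    """Return a short description of a schedule value (hypothetical helper)."""
    if schedule is None:
        return "manual trigger only"
    if schedule == "@once":
        return "runs exactly once"
    if schedule in CRON_PRESETS:
        return f"preset for cron '{CRON_PRESETS[schedule]}'"
    # Anything else is assumed to be a raw cron expression.
    return f"cron expression '{schedule}'"


print(describe_schedule("@yearly"))
print(describe_schedule("@once"))
```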
NOTE: This is the default display when navigating into a DAG. As a DAG grows in complexity, the task display can become hard to understand in the tree view, though the grid view of DAG runs can be helpful when debugging.
This DAG view is generally more appealing and easier to understand. It updates as a DAG runs, so the visual representation of where a particular DAG run is in the task list can be very helpful.
Here we see a simple DAG with five (5) tasks:
- configure_git
- validate_metadata_folder
- clone_metadata
- pull_metadata
- finished_pulling
The graph display makes it clear that `validate_metadata_folder` runs after `configure_git` and results in a branch between `clone_metadata` and `pull_metadata`. The final task, `finished_pulling`, is a `DummyOperator`: a placeholder task used for control flow.
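
A DAG with this shape could be sketched roughly as follows. This is not the DLME DAG definition: the DAG id, the metadata path, the bash commands, and the rule for choosing a branch (an existing `.git` folder means "pull") are all assumptions made for illustration. The import paths assume Airflow 2.x, and Airflow is only imported when the hypothetical `build_dag` function is called.

```python
import os
from datetime import datetime


def choose_metadata_task(metadata_dir):
    """Return the task_id to follow: pull if a checkout exists, else clone."""
    # Assumption: an existing ".git" folder means the repo was cloned before.
    if os.path.isdir(os.path.join(metadata_dir, ".git")):
        return "pull_metadata"
    return "clone_metadata"


def build_dag(metadata_dir="/tmp/metadata"):  # hypothetical path
    """Wire the five tasks together (imports Airflow 2.x lazily)."""
    from airflow import DAG
    from airflow.operators.bash import BashOperator
    from airflow.operators.dummy import DummyOperator
    from airflow.operators.python import BranchPythonOperator

    with DAG(
        dag_id="metadata_sync_sketch",  # hypothetical name
        start_date=datetime(2021, 1, 1),
        schedule_interval=None,  # manual triggers only
        catchup=False,
    ) as dag:
        configure_git = BashOperator(
            task_id="configure_git",
            bash_command="echo 'configure git'",  # placeholder command
        )
        # The callable's return value names the branch to run;
        # the other branch is marked as skipped.
        validate_metadata_folder = BranchPythonOperator(
            task_id="validate_metadata_folder",
            python_callable=choose_metadata_task,
            op_args=[metadata_dir],
        )
        clone_metadata = BashOperator(
            task_id="clone_metadata", bash_command="echo 'clone'"
        )
        pull_metadata = BashOperator(
            task_id="pull_metadata", bash_command="echo 'pull'"
        )
        # "none_failed" lets the join run even though one branch is skipped.
        finished_pulling = DummyOperator(
            task_id="finished_pulling", trigger_rule="none_failed"
        )

        configure_git >> validate_metadata_folder
        validate_metadata_folder >> [clone_metadata, pull_metadata]
        [clone_metadata, pull_metadata] >> finished_pulling
    return dag
```

The trigger rule on `finished_pulling` matters in this sketch: with the default rule, a join task downstream of a skipped branch would itself be skipped.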
The border color of the tasks in this display is important, and a key is provided at the top of the display. Here we see that `configure_git`, `validate_metadata_folder`, `clone_metadata`, and `finished_pulling` each have a dark green border, indicating `SUCCESS`. The `pull_metadata` task has a pink border, indicating `SKIPPED`.
This indicates that:
- `configure_git` ran and completed with a `SUCCESS` state.
- `validate_metadata_folder` then ran and completed with a `SUCCESS` state. It also returned a value that forced triggering of `clone_metadata` and skipping of `pull_metadata`.
- `finished_pulling` captured the flow between `clone_metadata` and `pull_metadata` and ended in a `SUCCESS` state.
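
The reasoning in this list can be expressed as a small toy model. It is plain Python, not Airflow code, and the function name `resolve_states` is hypothetical. It assumes that every task that actually runs succeeds and that the final task runs whenever no upstream task has failed.

```python
def resolve_states(branch_choice):
    """Toy model: final state of each task given the branch task's result."""
    branches = ("clone_metadata", "pull_metadata")
    if branch_choice not in branches:
        raise ValueError(f"unknown branch: {branch_choice}")

    states = {
        "configure_git": "SUCCESS",
        "validate_metadata_folder": "SUCCESS",
    }
    for task_id in branches:
        # The chosen branch runs; the other one is skipped.
        states[task_id] = "SUCCESS" if task_id == branch_choice else "SKIPPED"
    # The join task still runs because no upstream task failed.
    states["finished_pulling"] = "SUCCESS"
    return states


# The run described above: the branch task chose clone_metadata.
for task_id, state in resolve_states("clone_metadata").items():
    print(f"{task_id}: {state}")
```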