Apache Airflow: Self-Trigger & Looping in a DAG

Kuan-Chih Wang
2 min read · Nov 7, 2022


Apache Airflow has become one of the most popular scheduling and orchestration platforms in the current data engineering landscape.

In some use cases, you might need a job that runs with a specific time gap between each execution. This kind of requirement is very common in manufacturing, for example refreshing bill-of-materials content or refreshing the current on-hand amount in a warehouse. The problem is that a DAG is a directed acyclic graph, which means there can be no looping inside your DAG workflow.

So how do we implement this in Apache Airflow? It turns out to be very straightforward. The built-in operator TriggerDagRunOperator was originally designed for coupling DAGs and establishing dependencies between them.
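
For context, here is a minimal sketch of that conventional use, assuming a hypothetical downstream DAG with the ID "downstream_demo" that we want to kick off from an upstream DAG:

from airflow.operators.trigger_dagrun import TriggerDagRunOperator

# Placed inside the upstream DAG: when this task runs, it creates a new
# DagRun of the DAG whose ID is "downstream_demo" (hypothetical name).
trigger_downstream = TriggerDagRunOperator(task_id="trigger_downstream",
                                           trigger_dag_id="downstream_demo")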

Looping can be achieved by using TriggerDagRunOperator to trigger the current DAG itself.

Here is the example DAG file:

First of all, we need to import the operator. In Airflow 2 it lives under airflow.operators.trigger_dagrun (older versions imported it from airflow.operators.dagrun_operator):

from airflow.operators.trigger_dagrun import TriggerDagRunOperator

Then specify the DAG ID that we want to trigger, which in this case is the current DAG itself. We therefore pass the dag_id attribute of the dag object as the value of the trigger_dag_id parameter:

with DAG("Trigger_Self_Demo"
, default_args=default_args
, schedule_interval=None
, tags=['DEMO']
, description = 'DEMO'
, catchup=False
, max_active_runs=1
, concurrency=1) as dag:
Task1 = What ever your task is
Interval = BashOperator(task_id="INTERVAL"
, bash_command = "sleep 60"
, retries=1)
Trigger = TriggerDagRunOperator(task_id='Trigger_Self'
, trigger_dag_id=dag.dag_id)
Task1 >> Interval >> Trigger

With this DAG definition, whenever the first task completes, the DAG waits for a minute, then triggers itself again, and the loop goes on forever.
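
Because the loop never ends on its own, you may want a stop switch. Here is a minimal sketch of one way to do that, assuming an Airflow Variable named keep_looping (a hypothetical name) and gating the self-trigger with the built-in ShortCircuitOperator:

from airflow.models import Variable
from airflow.operators.python import ShortCircuitOperator

def _should_keep_looping():
    # Returning False makes ShortCircuitOperator skip all downstream tasks,
    # including Trigger_Self, so the loop stops after the current run.
    return Variable.get("keep_looping", default_var="true") == "true"

Check = ShortCircuitOperator(task_id="CHECK_LOOP_SWITCH",
                             python_callable=_should_keep_looping)

Task1 >> Interval >> Check >> Trigger

Flipping the keep_looping Variable to anything other than "true" from the UI or CLI then lets the current run finish without scheduling the next one.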

Conclusion

Airflow provides extremely high flexibility in designing and orchestrating workflows. Whenever you cannot find an operator that works exactly the way you want, you can often still find a solution by combining or modifying existing operators. That is why Airflow has become such a powerful tool in the data engineering field.
