Task dependent DAGs
This page describes how to enable task-dependent Directed Acyclic Graphs
(DAGs), which define dependencies between SQL table nodes within a single DAG
instead of relying on cron schedules spread across multiple DAGs. Customizable
settings generate Cloud Composer DAGs that contain multiple table refresh
nodes that depend on each other.
Cortex Framework provides recommended settings for task-dependent
SAP DAGs (ECC and S/4HANA). However, you can
customize them further or define task-dependent DAGs for any
data source.
Enable task dependent DAGs
1. Modify the config.json file by setting the enableTaskDependencies field
   to true. This setting makes Cortex Framework search for task-dependent
   reporting settings files with the suffix _task_dep.yaml.
2. Create a dedicated reporting settings file with the suffix _task_dep.yaml
   for each data source requiring task dependencies. For more details, see
   Define task-dependent reporting settings.
3. Customize the task dependencies by adding
   table_setting.dag_setting as a new section to table type nodes.
   For more details, see
   Specify and customize task dependencies.
4. Build Cortex Framework using the standard build process.
5. Examine the generated files, which are located in the target bucket under
   dags/data_source/reporting/task_dep_dags/dag_name, where data_source and
   dag_name are placeholders for your data source and DAG name. This folder
   contains a Python file defining the Cloud Composer DAG and a SQL file
   with the refresh query for each table node within the DAG.
6. Copy the files to the Cloud Composer DAG bucket to deploy
   Cortex Framework using the
   standard build process.
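Step 1 and the folder layout from step 5 can be sketched in Python. This is a minimal illustration, not part of Cortex Framework: it assumes enableTaskDependencies sits at the top level of config.json, and the helper names enable_task_dependencies and generated_dag_folder are hypothetical.

```python
import json
from pathlib import Path


def enable_task_dependencies(config_path: Path) -> dict:
    """Set enableTaskDependencies to true in a config.json file.

    Assumption: the field is a top-level key of the JSON object.
    """
    config = json.loads(config_path.read_text(encoding="utf-8"))
    config["enableTaskDependencies"] = True  # written out as JSON `true`
    config_path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return config


def generated_dag_folder(data_source: str, dag_name: str) -> str:
    """Build the target-bucket folder that holds one generated DAG."""
    return f"dags/{data_source}/reporting/task_dep_dags/{dag_name}"
```

Editing the file by hand works just as well. The helper mainly shows that the value is stored as the JSON boolean true rather than as a string.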
Define task-dependent reporting settings
For each data source requiring task dependencies, Cortex Framework
expects task-dependent reporting settings files with the suffix _task_dep.yaml.
When creating and updating these files, consider the following:
- For SAP, you can customize and use the provided settings files
  reporting_settings_ecc_task_dep.yaml and reporting_settings_s4_task_dep.yaml.
- For other data sources, create your own task-dependent reporting settings
  file alongside the original, for example: reporting_settings_task_dep.yaml.
Note: These settings are distinct from the standard reporting_settings.yaml
files so they can be enabled and deployed independently. If a data source
lacks a reporting_settings_task_dep.yaml file, it uses the regular
reporting_settings.yaml file without task dependencies, regardless of the
enableTaskDependencies value in config.json.
For more information about the fields available within reporting settings
files, see dag_types.py.
Specify and customize task dependencies
Customize the task dependencies by adding
table_setting.dag_setting as a new section to table type nodes:

    - sql_file: dependent_table2.sql
      type: table
      table_setting:
        dag_setting:
          name: "dag1"
          parents: ["dependent_table1.sql"]

DAG settings include two fields:
- name: A required string for all nodes in a task-dependent DAG that
  designates the name of the DAG to which the table node belongs. This
  includes top-level nodes that are referenced as a parent by other
  nodes within the DAG.
- parents: An optional list of strings containing the sql_file paths of
  other table nodes within the same DAG. These parents must run
  successfully before the node is triggered.
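To illustrate how name and parents determine execution order, the following sketch groups nodes by DAG name and orders them so that every parent precedes its children. It is not Cortex Framework's generator. The node dictionaries only mirror the shape of the settings entries, the run_order helper is hypothetical, the "@daily" value is illustrative, and the standard-library graphlib module (Python 3.9+) stands in for the scheduler.

```python
from graphlib import TopologicalSorter

# Hypothetical nodes shaped like entries of a _task_dep.yaml file.
nodes = [
    {"sql_file": "dependent_table1.sql", "type": "table",
     "table_setting": {"load_frequency": "@daily",
                       "dag_setting": {"name": "dag1"}}},
    {"sql_file": "dependent_table2.sql", "type": "table",
     "table_setting": {"dag_setting": {"name": "dag1",
                                       "parents": ["dependent_table1.sql"]}}},
    {"sql_file": "dependent_table3.sql", "type": "table",
     "table_setting": {"dag_setting": {"name": "dag1",
                                       "parents": ["dependent_table2.sql"]}}},
]


def run_order(nodes: list, dag_name: str) -> list:
    """Return the sql_file values of one DAG, with parents before children."""
    graph = {}
    for node in nodes:
        dag_setting = node.get("table_setting", {}).get("dag_setting")
        if dag_setting and dag_setting["name"] == dag_name:
            # Map each node to the set of nodes that must finish first.
            graph[node["sql_file"]] = set(dag_setting.get("parents", []))
    return list(TopologicalSorter(graph).static_order())


print(run_order(nodes, "dag1"))
# ['dependent_table1.sql', 'dependent_table2.sql', 'dependent_table3.sql']
```

A cycle among parents would make TopologicalSorter raise graphlib.CycleError, which matches the requirement that the graph be acyclic.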
Additional considerations:
- Nodes without defined parents are considered top-level nodes and run
  at the start of the DAG.
- At least one top-level node must have table_setting.load_frequency
  defined, which is used as the DAG schedule.
- If multiple top-level nodes define load_frequency, the values must be
  the same.
- Child nodes that have parents defined can't define load_frequency.
- Nodes that don't have dag_setting defined are generated as before: each
  becomes a DAG with a single table refresh node and no task dependencies.
- Other node types, like views and scripts, can't be included in
  task-dependent DAGs, which only generate nodes with DML to refresh tables.
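The considerations above can be expressed as a small pre-build check. This is a hedged sketch, not the validation that Cortex Framework performs: validate_task_dep_dag is a hypothetical helper, and it assumes the nodes passed in all belong to the same DAG and follow the dictionary shape of the settings entries.

```python
def validate_task_dep_dag(nodes: list) -> list:
    """Return a list of rule violations for the nodes of one DAG."""
    errors = []
    top_level_frequencies = set()
    for node in nodes:
        setting = node.get("table_setting", {})
        dag_setting = setting.get("dag_setting", {})
        frequency = setting.get("load_frequency")
        if node.get("type") != "table":
            errors.append(f"{node['sql_file']}: only table nodes are supported")
        if dag_setting.get("parents"):
            # Child node: the schedule comes from the top-level nodes.
            if frequency is not None:
                errors.append(
                    f"{node['sql_file']}: child nodes can't define load_frequency")
        elif frequency is not None:
            top_level_frequencies.add(frequency)
    if not top_level_frequencies:
        errors.append("no top-level node defines load_frequency")
    elif len(top_level_frequencies) > 1:
        errors.append("top-level nodes define different load_frequency values")
    return errors
```

Returning every violation rather than stopping at the first one makes it easier to fix a settings file in a single pass.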