This document shows you how to define the relationship between objects in your workflow in Dataform by declaring dependencies.
You can define a dependency relationship between objects of a workflow. In a dependency relationship, the execution of the dependent object depends on the execution of the dependency object. This means that Dataform runs the dependent after the dependency. You define the relationship by declaring dependencies inside the SQLX definition file of the dependent object.
The dependency declarations make up a dependency tree of your workflow that determines the order in which Dataform runs your workflow actions.
You can define the dependency relationship between the following workflow actions:
- Data source declarations
- Declarations of BigQuery data sources that let you reference these data sources in Dataform table definitions and SQL operations. You can set a data source declaration as a dependency, but not as a dependent.
- Tables
- Tables that you create in Dataform based on the declared data sources or other tables in your workflow. Dataform supports the following table types: table, incremental table, view, and materialized view. You can set a table as a dependency and as a dependent.
- Custom SQL operations
- SQL statements that Dataform runs in BigQuery as they are, without modification. You can set a custom SQL operation defined in a type: operations file as a dependency and as a dependent. To declare a custom SQL operation as a dependency in the ref function, you need to set the hasOutput property to true in the custom SQL operation SQLX definition file.
- Assertions
- Data quality test queries that you can use to test table data. Dataform runs assertions every time it updates your workflow and it alerts you if any assertions fail. You can set an assertion defined in a type: assertion file as a dependency and as a dependent by declaring dependencies in the config block.
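As a minimal sketch of the hasOutput requirement, the following hypothetical custom SQL operation creates a table that other actions could then reference with the ref function. The filename custom_operation.sqlx and the id column are illustrative assumptions and are not part of the workflow used elsewhere in this document.

```sqlx
// filename is custom_operation.sqlx (hypothetical name)
config {
  type: "operations",
  // Needed so that other actions can reference this operation with ref.
  hasOutput: true
}
-- self() resolves to the table that this operation is expected to create.
CREATE OR REPLACE TABLE ${self()} AS
SELECT 1 AS id
```

With a file like this in place, a table definition could select from ${ref("custom_operation")}, which would also declare the operation as a dependency of that table.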
You can define the dependency relationship in the following ways:
- Declare a dependency by using the Dataform core ref function to reference the dependency in a SELECT statement.
- Declare a list of dependencies in the config block of a SQLX definition file.
Before you begin
- Create and initialize a development workspace in your repository.
- Optional: Declare a data source.
- Create at least two workflow actions: tables, assertions, data source declarations, or operations.
Required roles
To get the permissions that you need to declare dependencies for tables, assertions, data source declarations, and custom SQL operations, ask your administrator to grant you the Dataform Editor (roles/dataform.editor) IAM role on workspaces.
For more information about granting roles, see Manage access to projects, folders, and organizations.
You might also be able to get the required permissions through custom roles or other predefined roles.
Declare a dependency as an argument of the ref function
To reference and automatically declare a dependency in a SELECT statement, add the dependency as an argument of the ref function.
The ref function is a Dataform core built-in function that lets you reference and automatically depend on any table, data source declaration, or custom SQL operation with the hasOutput property set to true in your workflow.
For more information about the ref function, see Dataform core context methods reference.
For more information about using the ref function in a table definition, see About table definitions.
The following code sample shows the source_data data source declaration added as an argument of the ref function in the incremental_table.sqlx SQLX definition file of an incremental table:
// filename is incremental_table.sqlx
config { type: "incremental" }
SELECT * FROM ${ref("source_data")}
In the preceding code sample, source_data is automatically declared a dependency of incremental_table.
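For context, the source_data action referenced above would itself be defined in a data source declaration file. The following is a minimal sketch of what such a file might look like; the project ID and dataset name are placeholder assumptions, not values taken from this document.

```sqlx
// filename is source_data.sqlx
config {
  type: "declaration",
  database: "my-project",  // hypothetical Google Cloud project ID
  schema: "my_dataset",    // hypothetical BigQuery dataset
  name: "source_data"      // the name that ref("source_data") resolves to
}
```

Because a data source declaration can only be a dependency and not a dependent, a file like this has no dependencies of its own.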
The following code sample shows the some_table table added as an argument of the ref function in the custom_assertion.sqlx SQLX definition file of an assertion:
// filename is custom_assertion.sqlx
config { type: "assertion" }
SELECT
*
FROM
${ref("some_table")}
WHERE
a is null
or b is null
or c is null
In the preceding code sample, some_table is automatically declared a dependency of custom_assertion. During execution, Dataform runs some_table first, and then runs custom_assertion once some_table is created.
Declare dependencies in the config block
To declare dependencies that are not referenced in the SQL statement definition of the dependent, but need to be run before the table, assertion, or custom SQL operation, follow these steps:
- In your development workspace, in the Files pane, expand the definitions/ directory.
- Select the table, assertion, or custom SQL operation SQLX file that you want to edit.
- In the config block of the file, enter the following code snippet:
dependencies: [ "DEPENDENCY", ]
Replace DEPENDENCY with the string target, for example, the filename of the action that you want to add as a dependency. You can enter multiple targets, separated by commas.
- Optional: Click Format.
The following code sample shows the some_table table and some_assertion assertion added as dependencies to the config block of a table definition file:
config { dependencies: [ "some_table", "some_assertion" ] }
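The two declaration styles can appear in the same file. The following sketch, which assumes a hypothetical combined_table.sqlx file and reuses the action names from the earlier samples, lists some_table and some_assertion in the config block because the query never references them, while source_data becomes a dependency through the ref function.

```sqlx
// filename is combined_table.sqlx (hypothetical name)
config {
  type: "table",
  // Declared explicitly because the query below does not reference them.
  dependencies: [ "some_table", "some_assertion" ]
}
-- Referenced with ref, so source_data is declared as a dependency automatically.
SELECT * FROM ${ref("source_data")}
```

In this sketch, Dataform would run combined_table only after source_data, some_table, and some_assertion.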
Set assertions as dependencies
When workflow action B depends on workflow action A, which has assertions, the failure of the assertions of action A does not block Dataform from executing action B. To run action B only if the assertions of action A pass, you need to set the assertions of action A as dependencies of action B.
You can set assertions as dependencies of a selected action in the following ways:
- Set selected assertions as dependencies
You can manually set selected assertions as dependencies by adding them to the dependencies: [ "" ] line in the config block of the edited action.
For example, if action B depends on action A, and you want action B to depend only on the selected assertions of action A, you can add those selected assertions to the config block of action B.
You can manually set selected assertions as dependencies for all action types except data source declarations.
- Set the assertions of a selected dependency action as dependencies
You can set the includeDependentAssertions parameter to automatically set all the direct assertions of a selected dependency workflow action as dependencies of the edited action. Dataform adds these assertions as dependencies during each compilation of the action to verify that the dependencies are up to date if the assertions of the dependency action change.
For example, if action C depends on actions A and B, but you only want action C to depend on the assertions of action A, you can edit action C and set the includeDependentAssertions parameter to automatically set all assertions of action A as dependencies of action C.
You can set the includeDependentAssertions parameter for actions of the following types: table, view, and operations.
- Set the assertions of all the dependency actions as dependencies
You can set the dependOnDependencyAssertions parameter to automatically set all the direct assertions from all the dependency actions of the edited action as additional dependencies of the edited action. Dataform adds these assertions as dependencies during each compilation of the action to verify that the dependencies are up to date if the assertions of the dependency action change.
For example, if action C depends on actions A and B, you can edit action C and set the dependOnDependencyAssertions parameter to automatically set all the assertions of actions A and B as dependencies of action C.
You can set the dependOnDependencyAssertions parameter for actions of the following types: table, view, and operations.
When you set both the dependOnDependencyAssertions parameter and the includeDependentAssertions parameter in a single file, the includeDependentAssertions parameter takes priority. For example, if you set dependOnDependencyAssertions to true, but you also set includeDependentAssertions to false for a selected dependency action, Dataform won't add the assertions of that action to the dependencies.
The following code sample shows the dependOnDependencyAssertions and includeDependentAssertions parameters set in the same table definition file:
// filename is tableName.sqlx
config {
type: "table",
dependOnDependencyAssertions: true,
dependencies: [ "actionA", {name: "actionB", includeDependentAssertions: false} ]
}
SELECT * FROM ${ref("actionC")}
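In the preceding code sample, Dataform adds all the direct assertions of actionA and actionC to the dependencies of tableName during compilation. The assertions of actionB are not added, because includeDependentAssertions is set to false for that dependency.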
Set selected assertions as dependencies
To run a workflow action only when selected assertions pass, you can add the selected assertion to the dependencies: [ "" ] line in the config block of the edited action.
To set a selected assertion as a dependency of a selected workflow action, follow these steps:
- In your development workspace, in the Files pane, expand definitions/.
- Select a workflow action SQLX file.
- In the config block of the action file, enter dependencies: [ "" ].
- Inside dependencies: [ "" ], enter the name of the action assertion or the filename of the manual assertion that you want to set as a dependency in one of the following formats:
nonNull
config { type: "ACTION_TYPE", dependencies: [ "ACTION_DATASET_NAME_ACTION_NAME_assertions_nonNull"] }
Replace the following:
- ACTION_TYPE: the type of workflow action: table, view, or operations.
- ACTION_DATASET_NAME: the name of the dataset in which the action is defined. The default dataset is defined in the workflow settings file.
- ACTION_NAME: the name of the action in which the assertion is defined.
rowConditions
config { type: "ACTION_TYPE", dependencies: [ "ACTION_DATASET_NAME_ACTION_NAME_assertions_rowConditions"] }
Replace the following:
- ACTION_TYPE: the type of workflow action: table, view, or operations.
- ACTION_DATASET_NAME: the name of the dataset in which the action is defined. The default dataset is defined in the workflow settings file.
- ACTION_NAME: the name of the action in which the assertion is defined.
uniqueKey
config { type: "ACTION_TYPE", dependencies: [ "ACTION_DATASET_NAME_ACTION_NAME_assertions_uniqueKey_INDEX"] }
Replace the following:
- ACTION_TYPE: the type of workflow action: table, view, or operations.
- ACTION_DATASET_NAME: the name of the dataset in which the table is defined. The default dataset is defined in the workflow settings file.
- ACTION_NAME: the name of the table in which the assertion is defined.
- INDEX: the index of the array of keys defined in the uniqueKey assertion that you want to add as a dependency, for example, 0 or 1. If only one array of keys is defined in the assertion, the index is 0.
uniqueKeys
config { type: "ACTION_TYPE", dependencies: [ "ACTION_DATASET_NAME_ACTION_NAME_assertions_uniqueKeys_INDEX"] }
Replace the following:
- ACTION_TYPE: the type of workflow action: table, view, or operations.
- ACTION_DATASET_NAME: the name of the dataset in which the table is defined. The default dataset is defined in the workflow settings file.
- ACTION_NAME: the name of the table in which the assertion is defined.
- INDEX: the index of the array of keys defined in the uniqueKeys assertion that you want to add as a dependency, for example, 0 or 1. If only one array of keys is defined in the assertion, the index is 0.
manual assertion
config { type: "ACTION_TYPE", dependencies: [ "MANUAL_ASSERTION_NAME"] }
Replace the following:
- ACTION_TYPE: the type of workflow action: table, view, or operations.
- MANUAL_ASSERTION_NAME: the name of the manual assertion.
- To add another assertion as a dependency to the edited table, repeat the previous step.
- Optional: Click Format.
The following code sample shows assertions added to table A, which is defined in the dataform dataset:
config {
type: "table",
assertions: {
uniqueKey: ["user_id"],
nonNull: ["user_id", "customer_id"],
}
}
The following code sample shows table A assertions added as dependencies to table B:
config {
type: "table",
dependencies: [ "dataform_A_assertions_uniqueKey_0", "dataform_A_assertions_nonNull"]
}
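To illustrate how INDEX selects one array of keys, the following sketch assumes a hypothetical orders table in the dataform dataset whose config block declares two arrays of keys under uniqueKeys. The table name, column names, and the downstream_table.sqlx filename are assumptions made for this example; the dependency string follows the uniqueKeys naming format described earlier.

```sqlx
// filename is downstream_table.sqlx (hypothetical name)
// Assumes that orders.sqlx declares the following assertion:
//   assertions: { uniqueKeys: [ ["user_id"], ["customer_id", "order_id"] ] }
config {
  type: "table",
  // Index 1 selects the second array of keys: ["customer_id", "order_id"].
  dependencies: [ "dataform_orders_assertions_uniqueKeys_1" ]
}
SELECT * FROM ${ref("orders")}
```

In this sketch, downstream_table would wait only for the assertion on the second array of keys. Depending on the first array as well would mean adding the same string with index 0 to the list.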
The following code sample shows a manual assertion that's defined in the manualAssertion.sqlx file and added as a dependency to a view:
config {
type: "view",
dependencies: [ "manualAssertion"]
}
The following code sample shows the manual_assertion file and the assertions of the sometable table added as dependencies to a table:
config {
type: "table",
dependencies: [ "manual_assertion", "dataform_sometable_assertions_nonNull" , "dataform_sometable_assertions_rowConditions"]
}
SELECT * FROM ${ref("referenced_table")} LEFT JOIN ...
Set the assertions of a selected action as dependencies
To run a workflow action only when all the direct assertions of a selected dependency action pass, set the includeDependentAssertions parameter to true in the edited action. Dataform automatically adds the direct assertions of the selected dependency action to the dependencies during compilation. The default value is false.
To set all the assertions of a selected dependency action as dependencies, follow these steps:
- In your development workspace, in the Files pane, expand definitions/.
- Select a workflow action SQLX file.
- In the file, set the includeDependentAssertions parameter to true in one of the following ways:
In the config block
config { type: "ACTION_TYPE", dependencies: [{name: "DEPENDENCY_ACTION_NAME", includeDependentAssertions: true}] }
Replace the following:
- ACTION_TYPE: the type of workflow action: table, view, or operations.
- DEPENDENCY_ACTION_NAME: the name of the dependency action with the assertions that you want to set as dependencies of the edited action.
In the SELECT statement
config { type: "ACTION_TYPE" }
SELECT * FROM ${ref({name: "DEPENDENCY_ACTION_NAME", includeDependentAssertions: true})}
Replace the following:
- ACTION_TYPE: the type of workflow action: table, view, or operations.
- DEPENDENCY_ACTION_NAME: the name of the dependency action with the assertions that you want to set as dependencies of the edited action.
- Optional: Click Format.
The following code sample shows tableC, which depends on viewA, tableB, and all the assertions of tableB:
// filename is tableC.sqlx
config {
type: "table",
dependencies: ["viewA", {name: "tableB", includeDependentAssertions: true}]
}
SELECT * FROM ...
In the preceding code sample, Dataform automatically adds all the direct assertions of tableB as dependencies to tableC during compilation.
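The same relationship can be sketched with the SELECT statement form of the parameter. In the following variant of tableC.sqlx, tableB is passed to the ref function as an object instead of being listed in the config block. The query itself is an assumption, since the original sample elides it.

```sqlx
// filename is tableC.sqlx
config {
  type: "table",
  // viewA is not referenced in the query, so it is declared here.
  dependencies: ["viewA"]
}
-- Passing an object to ref declares tableB as a dependency and also
-- adds the direct assertions of tableB as dependencies of tableC.
SELECT * FROM ${ref({name: "tableB", includeDependentAssertions: true})}
```

This form keeps the dependency next to the query that uses it, which may be easier to maintain when the table is referenced in the SELECT statement anyway.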
Set the assertions of all the dependency actions as dependencies
To run a workflow action only when all the direct assertions of all the dependency actions pass, set the dependOnDependencyAssertions parameter to true in the edited action. Dataform automatically adds the direct assertions of the dependency actions as dependencies during compilation. The default value is false.
When you set both the dependOnDependencyAssertions parameter and the includeDependentAssertions parameter in a single file, the includeDependentAssertions parameter takes priority for the dependency action for which it is set.
To set all the assertions of all the dependency actions as dependencies, follow these steps:
- In your development workspace, in the Files pane, expand definitions/.
- Select a workflow action SQLX file.
- In the file, set the dependOnDependencyAssertions parameter to true in the following format:
config { type: "ACTION_TYPE", dependOnDependencyAssertions: true, dependencies: [ "dependency1", "dependency2" ] }
Replace ACTION_TYPE with the type of workflow action. Supported values include table, view, and operations.
- Optional: Click Format.
The following code sample shows sometableE, which depends on sometableA, sometableB, sometableC, sometableD, and all the direct assertions of the dependency tables:
// filename is sometableE.sqlx
config {
type: "table",
dependOnDependencyAssertions: true,
dependencies: [ "sometableA", "sometableB" ]
}
-- Assumes that sometableC and sometableD have matching columns.
SELECT * FROM ${ref("sometableC")}
UNION ALL
SELECT * FROM ${ref("sometableD")}
In the preceding code sample, Dataform automatically adds all the direct assertions of sometableA, sometableB, sometableC, and sometableD as dependencies to sometableE during compilation.
Reference a table with an overridden table name
- To reference a table with an overridden table name, in the ref function, enter the overridden table name that's set in name: "".
The following code sample references a table with a name overridden to overridden_name:
SELECT * FROM ${ref("overridden_name")}
For more information about overriding table names, see Override table settings.
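For completeness, the following sketch shows what the referenced table's own definition file might look like. The filename original_table.sqlx and the query are assumptions for illustration; the name property in the config block is what sets the overridden table name.

```sqlx
// filename is original_table.sqlx (hypothetical name)
config {
  type: "table",
  // Overrides the table name that would otherwise come from the filename.
  name: "overridden_name"
}
SELECT 1 AS id
```

With this definition, other actions would reference the table as ${ref("overridden_name")} and not by the filename.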
What's next
- To learn how to set assertions as dependencies, see Test data quality.
- To learn how to declare a data source, see Declare a data source.
- To learn how to define custom SQL operations, see Create operations.
- To learn how to reuse code across your workflow with includes, see Reuse code across a single repository with includes.