Dataform core reference

This document describes the methods, properties, and configuration options of Dataform core. You can use Dataform core in SQLX and JavaScript files.

assert()

assert (name: string, query?: AContextable)

Adds a Dataform assertion to the compiled graph.

Available only in the /definitions directory.

Example:

// definitions/file.js

assert("name").query(ctx => "select 1");

CommonContext

Context methods are available when evaluating contextable SQL code, such as within SQLX files, or when using a Contextable argument with Dataform core.

database () => string
Returns the database of this dataset, if applicable.
name () => string
Returns the name of this table.
ref (ref: Resolvable | string[], rest: string[]) => string
References another action, adds it as a dependency of this action, and returns valid SQL that can be used in a from expression.

This function can be called with a Resolvable object, for example: ${ref({ name: "name", schema: "schema", database: "database" })}

This function can also be called using individual arguments for the "database", "schema", and "name" values. When only two values are provided, the default database is used and the values are interpreted as "schema" and "name". When only one value is provided, the default database and schema are used, with the provided value interpreted as "name". For example:

${ref("database", "schema", "name")}
${ref("schema", "name")}
${ref("name")}

resolve (ref: Resolvable | string[], rest: string[]) => string
Similar to ref, but it does not add the referenced action as a dependency to this action.
self () => string
Equivalent to resolve(name()).

Returns a valid SQL string that can be used to reference the table produced by this action.

schema () => string
Returns the schema of this dataset.
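To make the argument rules for ref() concrete, here is a minimal JavaScript sketch that models how one, two, or three string arguments map onto database, schema, and name. The function interpretRefArgs and the defaults object are hypothetical stand-ins for illustration, not Dataform's implementation; in a real project the defaults come from defaultDatabase and defaultSchema in the project configuration.

```javascript
// Hypothetical model of how ref()/resolve() interpret positional string arguments.
// This is NOT Dataform's source code; it only mirrors the documented rules.
function interpretRefArgs(args, defaults) {
  switch (args.length) {
    case 3:
      // ref("database", "schema", "name"): everything is explicit.
      return { database: args[0], schema: args[1], name: args[2] };
    case 2:
      // ref("schema", "name"): the default database is used.
      return { database: defaults.database, schema: args[0], name: args[1] };
    case 1:
      // ref("name"): the default database and schema are used.
      return { database: defaults.database, schema: defaults.schema, name: args[0] };
    default:
      throw new Error(`Expected 1 to 3 arguments, got ${args.length}`);
  }
}

// Hypothetical project defaults (placeholders for defaultDatabase / defaultSchema).
const defaults = { database: "my-project", schema: "dataform" };

console.log(interpretRefArgs(["orders"], defaults));
console.log(interpretRefArgs(["sales", "orders"], defaults));
console.log(interpretRefArgs(["other-project", "sales", "orders"], defaults));
```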

Contextable

A Contextable argument accepts either a plain value of its generic type T or a function that is called with the context object for this type of operation and returns a value of type T.

T | (ctx: Context) => T
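The two accepted shapes of a Contextable can be illustrated with a small JavaScript sketch. Dataform performs this evaluation internally; the helper evaluateContextable and the stub context below are hypothetical, written only so the example runs outside Dataform.

```javascript
// Hypothetical helper mirroring the Contextable type: T | (ctx: Context) => T.
function evaluateContextable(contextable, ctx) {
  // A function is called with the context object; any other value is used as-is.
  return typeof contextable === "function" ? contextable(ctx) : contextable;
}

// Minimal stub context exposing self() (hypothetical table name).
const stubCtx = { self: () => "`my-project.dataform.my_table`" };

// Plain value form: the SQL string is used directly.
const plain = evaluateContextable("SELECT 1 AS test", stubCtx);

// Function form: the SQL is built from the context object.
const fromCtx = evaluateContextable(ctx => `SELECT * FROM ${ctx.self()}`, stubCtx);

console.log(plain);
console.log(fromCtx);
```

The function form is what makes context methods such as ref(), self(), and incremental() available when the SQL is generated.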

Dataform

Global variable that exposes the IProjectConfig object through its projectConfig property. Use it to read IProjectConfig properties, for example:

dataform.projectConfig.vars.myVariableName === "myVariableValue"

declare()

declare (dataset: dataform.ITarget)

Declares the dataset as a Dataform data source.

Available only in the /definitions directory.

Example:

// definitions/file.js

declare({name: "a-declaration"})

IActionConfig

Defines Dataform tags and dependencies applied to a SQL workflow action.

tags string[]

A list of user-defined tags with which the action should be labeled.

dependencies Resolvable | Resolvable[]

Dependencies of the action.

disabled boolean

If set to true, this action will not be executed. However, the action can still be depended upon. Useful for temporarily turning off broken actions.

IAssertionConfig

Configuration options for assertion action types.

database string
The database (Google Cloud project ID) in which to create the corresponding view for this assertion.
description string
A description of this assertion.
disabled boolean
If set to true, this action is not executed. The action can still be depended upon. Useful for temporarily turning off broken actions.
hermetic boolean
Declares whether this action is hermetic. An action is hermetic if all of its dependencies are explicitly declared.

If this action depends on data from a source which is not declared as a dependency, then set hermetic to false. Otherwise, set to true.

schema string
The schema (BigQuery dataset) in which to create the corresponding view for this assertion.
tags string[]
A list of user-defined tags applied to this action.

IBigQueryOptions

BigQuery-specific warehouse options.

additionalOptions
Key-value pairs for the table, view, and materialized view options.

Some options, for example, partitionExpirationDays, have dedicated fields that are type- and validity-checked. For such options, use the dedicated fields instead of additionalOptions.

String values must be enclosed in double quotes, for example: additionalOptions: {numeric_option: "5", string_option: '"string-value"'}

If the option name contains special characters, enclose the name in quotes, for example: additionalOptions: { "option-name": "value" }.

clusterBy string[]
The keys by which to cluster partitions.
labels
Key-value pairs for BigQuery labels.

If the label name contains special characters, such as hyphens, quote its name, for example: labels: { "label-name": "value" }.

partitionBy string
The key by which to partition the table. Typically the name of a timestamp or date column.
partitionExpirationDays number
The number of days for which BigQuery stores data in each partition. The setting applies to all partitions in a table, but is calculated independently for each partition based on the partition time.
requirePartitionFilter boolean
Declares whether the partitioned table requires a WHERE clause predicate filter that filters the partitioning column.
updatePartitionFilter string
SQL-based filter for when incremental updates are applied.
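As an illustration of how these options fit together, the following JavaScript sketch defines a table configuration with a bigquery block. Column names, the label, and the additionalOptions keys are placeholders (the option names reuse the generic names from the description above); in a definitions file an object like this would be passed as the config argument of publish().

```javascript
// Hypothetical ITableConfig with BigQuery-specific options (all names are placeholders).
// In definitions/*.js it could be used as: publish("events", eventsConfig).query(...)
const eventsConfig = {
  type: "table",
  bigquery: {
    // Partition on the date of a hypothetical timestamp column.
    partitionBy: "DATE(event_timestamp)",
    // Dedicated, type-checked fields for partition settings.
    partitionExpirationDays: 90,
    requirePartitionFilter: true,
    clusterBy: ["customer_id", "event_type"],
    // Label names with special characters such as hyphens must be quoted.
    labels: { "cost-center": "analytics" },
    // String option values carry their own double quotes; numeric values do not.
    additionalOptions: {
      numeric_option: "5",
      string_option: '"string-value"',
    },
  },
};

console.log(JSON.stringify(eventsConfig.bigquery.additionalOptions));
```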

IColumnsDescriptor

Describes columns in a table.

{ [name]: string | IRecordDescriptor }

IDeclarationConfig

Configuration options for declaration action types.

columns IColumnsDescriptor
A description of columns within the table.
database string
The database (Google Cloud project ID) in which the source table exists.
description string
A description of the table.
schema string
The schema (BigQuery dataset) in which the source table exists.

IDependenciesConfig

Defines dependencies of a SQL workflow action.

dependencies Resolvable | Resolvable[]

One or more explicit dependencies for this action. Dependency actions will run before dependent actions. Typically this would remain unset, because most dependencies are declared as a by-product of using the ref function.

hermetic boolean

Declares whether or not this action is hermetic. An action is hermetic if all of its dependencies are explicitly declared. If this action depends on data from a source which has not been declared as a dependency, then hermetic should be explicitly set to false. Otherwise, if this action only depends on data from explicitly-declared dependencies, then it should be set to true.

IDocumentableConfig

Defines descriptions of a dataset and its columns.

columns IColumnsDescriptor

A description of columns within the dataset.

description string

A description of the dataset.

INamedConfig

Defines the type and name of a SQL workflow action.

type string

The type of the action.

name string

The name of the action.

IOperationConfig

Configuration options for operations action types.

columns IColumnsDescriptor
A description of columns within the table.
database string
The database (Google Cloud project ID) in which to create the output of this action.
description string
A description of the table.
disabled boolean
If set to true, this action is not executed. The action can still be depended upon. Useful for temporarily turning off broken actions.
hasOutput boolean
Declares that this operations action creates a table that is referenceable using the ref function.

If set to true, the operation's SQL is expected to create a table with the action's configured name, referenced through the self() context function.

For example: create or replace table ${self()} as select ...

hermetic boolean
Declares whether this action is hermetic. An action is hermetic if all of its dependencies are explicitly declared.

If this action depends on data from a source which is not declared as a dependency, then set hermetic to false. Otherwise, set to true.

schema string
The schema (BigQuery dataset) in which to create the output of this action.
tags string[]
A list of user-defined tags applied to this action.
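The following JavaScript sketch shows an operation query, written as a Contextable function, for an operation configured with hasOutput: true. It creates the action's own output table through self(), so other actions can reference it with ref(). The table names and the stub context are hypothetical; the stub exists only so the function can run outside Dataform.

```javascript
// Hypothetical Contextable query for an operation with hasOutput: true.
// It writes the action's own output table, named via self().
const snapshotQuery = ctx => `
CREATE OR REPLACE TABLE ${ctx.self()} AS
SELECT customer_id, COUNT(*) AS order_count
FROM ${ctx.ref("orders")}
GROUP BY customer_id`;

// Stub context for running the sketch outside Dataform (hypothetical names).
const stubCtx = {
  self: () => "`my-project.dataform.customer_snapshot`",
  ref: name => `\`my-project.dataform.${name}\``,
};

console.log(snapshotQuery(stubCtx));
```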

IProjectConfig

Contains compilation settings of a Dataform repository.

defaultDatabase string
Required. The default database (Google Cloud project ID).
defaultSchema string
Required. The default schema (BigQuery dataset ID).
defaultLocation string
Required. The default BigQuery location to use. For more information on BigQuery locations, see https://cloud.google.com/bigquery/docs/locations.
assertionSchema string
Required. The default schema (BigQuery dataset ID) for assertions.
vars map (key: string, value: string)
Optional. User-defined variables that are made available to project code during compilation. An object containing a list of "key": value pairs. Example: { "name": "wrench", "mass": "1.3kg", "count": "3" }.
databaseSuffix string
Optional. The suffix that should be appended to all database (Google Cloud project ID) names.
schemaSuffix string
Optional. The suffix that should be appended to all schema (BigQuery dataset ID) names.
tablePrefix string
Optional. The prefix that should be prepended to all table names.
warehouse string
Required. Must be set to bigquery.

You can set IProjectConfig properties in dataform.json at the repository level.

You can override the defaultSchema and defaultDatabase properties for individual tables.

You can access all IProjectConfig properties in a SQL SELECT statement in a SQLX or JavaScript file.

The following code sample shows the myVariableName custom compilation variable set in dataform.json with the projectConfig.vars property, accessed in a SELECT statement in a SQLX file:

  config { type: "view" }
  SELECT ${when(
    dataform.projectConfig.vars.myVariableName === "myVariableValue",
    "'myVariableName is set to myVariableValue!'",
    "'myVariableName is not set to myVariableValue!'"
  )} AS message

For more information about overriding project configuration settings for individual compilation results, see the projects.locations.repositories.compilationResults#CodeCompilationConfig REST resource in Dataform API.
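The IProjectConfig fields are normally set in dataform.json. The JavaScript sketch below mirrors those fields as a plain object with placeholder values and uses a stand-in dataform object so it can run outside Dataform. It illustrates one practical consequence of vars being a string-to-string map: a value such as "3" needs an explicit conversion before arithmetic.

```javascript
// Hypothetical project configuration mirroring IProjectConfig (placeholder values).
const projectConfig = {
  warehouse: "bigquery",
  defaultDatabase: "my-project",
  defaultSchema: "dataform",
  defaultLocation: "US",
  assertionSchema: "dataform_assertions",
  vars: { name: "wrench", mass: "1.3kg", count: "3" },
};

// Stand-in for the dataform global that Dataform provides during compilation.
const dataform = { projectConfig };

// vars values are strings, so convert explicitly before doing arithmetic.
const count = Number(dataform.projectConfig.vars.count);
const limitClause = `LIMIT ${count * 10}`;

console.log(limitClause);
```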

IRecordDescriptor

Describes a struct, object or record in a table that has nested columns.

bigqueryPolicyTags string | string[]
Full identifiers of BigQuery policy tags applied to this column. A full identifier of a BigQuery policy tag includes the project name, location, and taxonomy.

For example: "projects/1/locations/eu/taxonomies/2/policyTags/3"

Currently, BigQuery supports one tag per column.

columns IColumnsDescriptor
A description of columns within the struct, object, or record.
description string
A description of the struct, object, or record.
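A columns descriptor mixes plain description strings with IRecordDescriptor objects, which are used when a column needs more than a description, such as nested columns or policy tags. The JavaScript sketch below uses placeholder column names and the example policy tag identifier from above; the helper columnPaths is hypothetical and only walks the descriptor to list every described column.

```javascript
// Hypothetical IColumnsDescriptor for a table with a nested customer record.
const columns = {
  order_id: "Unique identifier of the order.",
  customer: {
    description: "Customer who placed the order.",
    columns: {
      customer_id: "Unique identifier of the customer.",
      email: {
        description: "Customer email address.",
        // Placeholder identifier in the documented format; one tag per column.
        bigqueryPolicyTags: "projects/1/locations/eu/taxonomies/2/policyTags/3",
      },
    },
  },
};

// Hypothetical helper: list every described column as a dotted path.
function columnPaths(descriptor, prefix = "") {
  return Object.entries(descriptor).flatMap(([name, value]) => {
    const path = prefix ? `${prefix}.${name}` : name;
    // Strings and records without nested columns are leaf columns.
    if (typeof value === "string" || !value.columns) return [path];
    return [path, ...columnPaths(value.columns, path)];
  });
}

console.log(columnPaths(columns));
```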

ITableAssertions

Options for creating assertions as part of a table definition.

nonNull string | string[]
Column(s) which can never be NULL.

If set, the corresponding assertion fails if any row contains NULL values for these column(s).

rowConditions string[]
General condition(s) which should hold true for all rows in the table.

If set, the corresponding assertion fails if any row violates any of these condition(s).

uniqueKey string | string[]
Column(s) which constitute the unique key index of the table.

If set, the resulting assertion fails if there is more than one row in the table with the same values for all of these column(s).

uniqueKeys string[][]
Combinations of column(s), each of which constitutes a unique key index of the table.

If set, the resulting assertion fails if there is more than one row in the table with the same values for all of the column(s) in the unique key(s).
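To illustrate what the uniqueKey and nonNull rules check, the JavaScript sketch below applies them to a few in-memory rows. Dataform itself generates SQL assertions; the helpers duplicateKeys and nullRows, the column names, and the rows are hypothetical and only mirror the documented failure conditions.

```javascript
// Hypothetical ITableAssertions for an orders table (column names are placeholders).
const assertions = {
  uniqueKey: ["order_id"],
  nonNull: ["order_id", "customer_id"],
};

// In-memory analogue of uniqueKey: key values shared by more than one row.
function duplicateKeys(rows, keyColumns) {
  const counts = new Map();
  for (const row of rows) {
    const key = JSON.stringify(keyColumns.map(column => row[column]));
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return [...counts].filter(([, n]) => n > 1).map(([key]) => key);
}

// In-memory analogue of nonNull: rows with null or undefined in any listed column.
function nullRows(rows, nonNullColumns) {
  return rows.filter(row => nonNullColumns.some(column => row[column] == null));
}

const rows = [
  { order_id: 1, customer_id: "a" },
  { order_id: 2, customer_id: null },
  { order_id: 2, customer_id: "b" },
];

console.log(duplicateKeys(rows, assertions.uniqueKey)); // the uniqueKey assertion would fail
console.log(nullRows(rows, assertions.nonNull).length); // the nonNull assertion would fail
```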

ITableConfig

Configuration options for table actions, including table, view and incremental table types.

Extends IActionConfig, IDependenciesConfig, IDocumentableConfig, INamedConfig, and ITargetableConfig.

assertions ITableAssertions
Assertions to be run on the table.

If configured, relevant assertions are automatically created and run as a dependency of this table.

bigquery IBigQueryOptions
BigQuery-specific warehouse options.
columns IColumnsDescriptor
A description of columns within the table.
database string
The database (Google Cloud project ID) in which to create the output of this action.
description string
A description of the table.
disabled boolean
If set to true, this action is not executed. The action can still be depended upon. Useful for temporarily turning off broken actions.
hermetic boolean
Declares whether this action is hermetic. An action is hermetic if all of its dependencies are explicitly declared.

If this action depends on data from a source which is not declared as a dependency, then set hermetic to false. Otherwise, set to true.

materialized boolean
Only valid when the table type is view.

If set to true, a materialized view will be created.

protected boolean
Only allowed for the incremental table type.

If set to true, running this action ignores the full-refresh option. This is useful for tables which are built from transient data, to ensure that historical data is never lost.

schema string
The schema (BigQuery dataset) in which to create the output of this action.
tags string[]
A list of user-defined tags applied to this action.
type TableType
The type of the table.
uniqueKey string[]
Unique keys for merge criteria for incremental tables.

If configured, records with matching unique key(s) are updated instead of new rows being inserted.

ITableContext

Context methods are available when evaluating contextable SQL code, such as within SQLX files, or when using a Contextable argument with Dataform core.

incremental () => boolean
Returns true when the current context indicates that the table will be built incrementally.
name () => string
Returns the fully qualified name of this table.
ref (ref: Resolvable | string[], rest: string[]) => string
References another action, adds it as a dependency of this action, and returns valid SQL that can be used in a from expression.

This function can be called with a Resolvable object, for example: ${ref({ name: "name", schema: "schema", database: "database" })}

This function can also be called using individual arguments for the "database", "schema", and "name" values.

When only two values are provided, the default database is used and the values are interpreted as "schema" and "name".

When only one value is provided, the default database and schema are used, with the provided value interpreted as "name". For example:

${ref("database", "schema", "name")}
${ref("schema", "name")}
${ref("name")}

resolve (ref: Resolvable | string[], rest: string[]) => string
Similar to ref, but instead of adding a dependency, it resolves the provided reference so that it can be used in SQL, for example, in a `from` expression.
self () => string
Equivalent to resolve(name()).

Returns a valid SQL string that can be used to reference the table produced by this action.

when (cond: boolean, trueCase: string, falseCase: string) => string
Shorthand for an if condition. Equivalent to cond ? trueCase : falseCase.

falseCase is optional, and defaults to an empty string.
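The semantics of when() fit in a few lines of JavaScript. The function below is a local stand-in that mirrors the documented behavior, including the empty-string default for falseCase; inside SQLX files and contextable functions, Dataform provides the real helper. The isIncremental flag is hypothetical.

```javascript
// Local stand-in mirroring the documented when(): cond ? trueCase : falseCase.
function when(cond, trueCase, falseCase = "") {
  return cond ? trueCase : falseCase;
}

const isIncremental = true; // hypothetical; in Dataform this would come from incremental()
const sql = `SELECT * FROM events ${when(isIncremental, "WHERE event_date = CURRENT_DATE()")}`;

console.log(sql);
console.log(JSON.stringify(when(false, "only when true"))); // prints "" because falseCase defaults to an empty string
```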

ITarget

A reference to a table within BigQuery.

database string
name string
schema string

ITargetableConfig

Defines the target database and schema of a SQL workflow action.

database string

The database (Google Cloud project ID) in which the output of this action should be created.

schema string

The schema (BigQuery dataset) in which the output of this action should be created.

operate()

operate (name: string, queries?: Contextable)

Defines a SQL operation.

Available only in the /definitions directory.

Example:

// definitions/file.js

operate("an-operation", ["SELECT 1", "SELECT 2"])

publish()

publish (name: string, queryOrConfig?: Contextable | ITableConfig)

Creates a table or view.

Available only in the /definitions directory.

Example:

// definitions/file.js

publish("published-table", {
  type: "table",
  dependencies: ["a-declaration"],
}).query(ctx => "SELECT 1 AS test");

Resolvable

A Resolvable is either the name of a table as a string, or an ITarget object that describes the full path to the relation.

string | ITarget

TableType

Supported types of table actions.

Tables of type view will be created as views.

Tables of type table will be created as tables.

Tables of type incremental must include a where clause. For more information, see Configure incremental tables.