Use the open-source Dataform CLI

This document shows you how to use the open-source Dataform command line interface (CLI) to locally develop SQL workflows by using the terminal.

With the open-source Dataform CLI, you can initialize, compile, test, and run Dataform core locally, outside of Google Cloud.

Dataform also distributes a Docker image that you can use to run the equivalent of Dataform CLI commands.

The Dataform CLI supports Application Default Credentials (ADC). With ADC, you can make credentials available to your application in a variety of environments, such as local development or production, without needing to modify your application code. To use ADC, you must first provide your credentials to ADC.
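
As a rough illustration of where ADC looks for credentials during local development, the following sketch reports which source is present. The adc_source function is a hypothetical helper, not part of the Dataform CLI, and the file path is the gcloud default on Linux and macOS; it differs on Windows or when the gcloud configuration directory has been moved.

```shell
# adc_source is a hypothetical helper, not part of the Dataform CLI.
# It prints "env", "gcloud", or "none" depending on which ADC source exists.
adc_source() {
  if [ -n "${GOOGLE_APPLICATION_CREDENTIALS:-}" ]; then
    # ADC checks this environment variable first.
    echo "env"
  elif [ -f "${HOME}/.config/gcloud/application_default_credentials.json" ]; then
    # Default location written by gcloud on Linux and macOS (assumed here).
    echo "gcloud"
  else
    echo "none"
  fi
}
```

If the function prints none, running gcloud auth application-default login is the usual way to create the local ADC file.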

Before you begin

Before installing the Dataform CLI, install NPM.

Install Dataform CLI

  • To install Dataform CLI, run the following command:

    npm i -g @dataform/cli@^3.0.0-beta
    

Initialize a Dataform project

  • To initialize a new Dataform project, run the following command inside your project directory:

    dataform init . PROJECT_NAME DEFAULT_LOCATION
    

    Replace the following:

    • PROJECT_NAME: the name of your project.
    • DEFAULT_LOCATION: the region where you want Dataform to write BigQuery data. For more information about BigQuery regions, see BigQuery locations.

Update Dataform

  • To update the Dataform framework, update the dataformCoreVersion value in the workflow_settings.yaml file, and then re-run the npm install command:

    npm i
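
The version edit can also be scripted. The following is a minimal sketch that assumes workflow_settings.yaml contains a top-level dataformCoreVersion line. The bump_core_version function is a hypothetical helper, not a Dataform CLI command, and the version you pass is your own choice.

```shell
# bump_core_version is a hypothetical helper, not a Dataform CLI command.
# Usage: bump_core_version NEW_VERSION [SETTINGS_FILE]
bump_core_version() {
  new_version="$1"
  settings_file="${2:-workflow_settings.yaml}"
  # Rewrite only the dataformCoreVersion line. Writing to a temporary file
  # first keeps the edit portable across GNU and BSD sed.
  sed "s/^dataformCoreVersion:.*/dataformCoreVersion: ${new_version}/" \
    "$settings_file" > "${settings_file}.tmp" &&
    mv "${settings_file}.tmp" "$settings_file"
}
```

For example, bump_core_version 3.0.0 followed by npm i would apply the two steps described above, with 3.0.0 standing in for whichever version you target.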
    

Update Dataform CLI

  • To update the Dataform CLI tool, run the following command:

    npm i -g @dataform/cli@^3.0.0-beta.2
    

Create a credentials file

Dataform requires a credentials file to connect to remote services. The init-creds command creates this file, .df-credentials.json, on your disk.

To create the credentials file, follow these steps:

  1. Run the following command:

    dataform init-creds
    
  2. Follow the init-creds wizard that walks you through credentials file creation.

Create a project

An empty Dataform project has the following structure:

   project-dir
   ├── definitions
   ├── includes
   └── workflow_settings.yaml

  • To create a Dataform project to deploy assets to BigQuery, run the following command:

    dataform init PROJECT_NAME --default-project YOUR_GOOGLE_CLOUD_PROJECT_ID
    

    Replace the following:

    • PROJECT_NAME: the name of your project.
    • YOUR_GOOGLE_CLOUD_PROJECT_ID: your Google Cloud project ID.

Clone a project

To clone an existing Dataform project from a third-party Git repository, follow the instructions from your Git provider.

  • Once the repository is cloned, run the following command inside the cloned repository directory:

    dataform install
    

Define a table

Store definitions in the definitions/ folder.

  • To define a table, run the following command:

    echo "config { type: 'TABLE_TYPE' } SELECT_STATEMENT" > definitions/FILE.sqlx
    

    Replace the following:

    • TABLE_TYPE: the type of the table: table, incremental, or view.
    • SELECT_STATEMENT: a SELECT statement that defines the table.
    • FILE: the name for the table definition file.

The following code sample defines a view in the example SQLX file.

    echo "config { type: 'view' } SELECT 1 AS test" > definitions/example.sqlx
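
For definitions that span several lines or use Dataform expressions, a quoted heredoc may be easier to work with than echo. The following sketch writes an incremental table definition. The file name daily_events.sqlx, the events source, and the column names are hypothetical, and events would need to be defined or declared elsewhere in the project.

```shell
mkdir -p definitions

# The quoted 'EOF' stops the shell from expanding ${...} and backticks,
# so the Dataform expressions reach the file unchanged.
cat > definitions/daily_events.sqlx <<'EOF'
config { type: "incremental" }

SELECT event_id, event_timestamp
FROM ${ref("events")}
${when(incremental(), `WHERE event_timestamp > (SELECT MAX(event_timestamp) FROM ${self()})`)}
EOF
```

On an incremental run, the when(incremental(), ...) clause restricts the query to rows newer than those already in the table. On a full rebuild, the clause is omitted.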

Define a manual assertion

Store definitions in the definitions/ folder.

  • To define a manual assertion, run the following command:

    echo "config { type: 'assertion' } SELECT_STATEMENT" > definitions/FILE.sqlx
    

    Replace the following:

    • SELECT_STATEMENT: a SELECT statement that defines the assertion.
    • FILE: the name for the assertion definition file.
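
As an illustration, the following sketch writes an assertion against the example view defined earlier in this document. An assertion query selects the rows that violate a rule, so the assertion passes only when the query returns no rows. The file name example_not_null.sqlx is hypothetical.

```shell
mkdir -p definitions

# The quoted 'EOF' keeps the shell from trying to expand ${ref(...)}.
cat > definitions/example_not_null.sqlx <<'EOF'
config { type: "assertion" }

-- Rows returned here are violations; the assertion passes when none come back.
SELECT *
FROM ${ref("example")}
WHERE test IS NULL
EOF
```

A double-quoted echo is a poor fit for this file because the shell would attempt to interpret ${ref("example")} as a parameter expansion.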

Define a custom SQL operation

Store definitions in the definitions/ folder.

  • To define a custom SQL operation, run the following command:

    echo "config { type: 'operations' } SQL_QUERY" > definitions/FILE.sqlx
    

    Replace the following:

    • SQL_QUERY: your custom SQL operation.
    • FILE: the name for the custom SQL operation definition file.
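
The following sketch shows what an operations file might look like when it holds more than one statement. It assumes that statements in an operations file are separated by a line containing ---. The file name cleanup.sqlx and the events_archive table are hypothetical.

```shell
mkdir -p definitions

# The quoted 'EOF' keeps the shell from trying to expand ${ref(...)}.
cat > definitions/cleanup.sqlx <<'EOF'
config { type: "operations" }

DELETE FROM ${ref("events_archive")}
WHERE event_timestamp < TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 90 DAY)
---
DELETE FROM ${ref("events_archive")}
WHERE event_id IS NULL
EOF
```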

View compilation output

Dataform compiles your code in real time.

  • To view the output of the compilation process in the terminal, run the following command:

    dataform compile
    
  • To view the output of the compilation process as a JSON object, run the following command:

    dataform compile --json
    
  • To view the output of the compilation with custom compilation variables, run the following command:

    dataform compile --vars=SAMPLE_VAR=SAMPLE_VALUE,foo=bar
    

    Replace the following:

    • SAMPLE_VAR: your custom compilation variable.
    • SAMPLE_VALUE: the value of your custom compilation variable.
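
To show where a compilation variable ends up, the following sketch writes a view that reads SAMPLE_VAR through dataform.projectConfig.vars. The file name var_example.sqlx and the sample_var column alias are hypothetical.

```shell
mkdir -p definitions

# The quoted 'EOF' keeps ${...} literal so that Dataform, not the shell,
# substitutes the variable at compile time.
cat > definitions/var_example.sqlx <<'EOF'
config { type: "view" }

SELECT "${dataform.projectConfig.vars.SAMPLE_VAR}" AS sample_var
EOF
```

Compiling with --vars=SAMPLE_VAR=SAMPLE_VALUE should then place SAMPLE_VALUE inside the string literal in the compiled query.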

Execute code

To execute your code, Dataform accesses BigQuery to determine its current state and tailor the resulting SQL accordingly.

  • To execute the code of your Dataform project, run the following command:

    dataform run
    
  • To execute the code of your Dataform project in BigQuery with custom compilation variables, run the following command:

    dataform run --vars=SAMPLE_VAR=SAMPLE_VALUE,sampleVar2=sampleValue2
    

    Replace the following:

    • SAMPLE_VAR: your custom compilation variable.
    • SAMPLE_VALUE: the value of your custom compilation variable.

  • To execute the code of your Dataform project in BigQuery and rebuild all tables from scratch, run the following command:

    dataform run --full-refresh
    

Without --full-refresh, Dataform updates incremental tables without rebuilding them from scratch.

  • To see the final compiled SQL code tailored to the current state of BigQuery, without executing it inside BigQuery, run the following command:

    dataform run --dry-run
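
The dry run and the real run can be chained so that nothing executes in BigQuery unless the dry run succeeds. The following is a minimal sketch. The preview_then_run function and the DATAFORM_BIN variable are hypothetical names, not part of the Dataform CLI.

```shell
# DATAFORM_BIN is a hypothetical override for the CLI binary to call.
# It defaults to the dataform command found on PATH.
DATAFORM_BIN="${DATAFORM_BIN:-dataform}"

# preview_then_run is a hypothetical helper, not a Dataform CLI command.
# It passes the same arguments to the dry run and to the real run.
preview_then_run() {
  "$DATAFORM_BIN" run --dry-run "$@" || {
    echo "dry run failed; skipping execution" >&2
    return 1
  }
  "$DATAFORM_BIN" run "$@"
}
```

For example, preview_then_run --full-refresh would first run dataform run --dry-run --full-refresh and, if that succeeds, dataform run --full-refresh.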
    

Get help

  • To list all of the available commands and options, run the following command:

    dataform help
    
  • To view a description of a specific command, run the following command:

    dataform help COMMAND
    

    Replace COMMAND with the command you want to learn about.
