Schedule executions with Workflows and Cloud Scheduler

This document shows you how to run scheduled executions of Dataform SQL workflows using Workflows and Cloud Scheduler.

About scheduled workflow executions

You can set the frequency of executions of your Dataform SQL workflow by creating a Cloud Scheduler job that triggers a Workflows workflow. Workflows executes services in an orchestration workflow that you define.

Workflows executes your Dataform SQL workflow in a two-step process. First, it pulls your Dataform repository code from your Git provider and compiles it into a compilation result. Then, it uses the compilation result to create a workflow invocation, which executes your Dataform SQL workflow. Cloud Scheduler triggers this process at the frequency that you set.

Before you begin

  1. In the Google Cloud console, go to the Dataform page.

  2. Select or create a repository.

Required roles

To get the permissions that you need to schedule Dataform executions with Workflows, ask your administrator to grant you the Dataform Editor (roles/dataform.editor) IAM role on repositories. For more information about granting roles, see Manage access to projects, folders, and organizations.

You might also be able to get the required permissions through custom roles or other predefined roles.

Create a scheduled orchestration workflow

To run scheduled executions of your Dataform SQL workflow, use Workflows to create an orchestration workflow and add a Cloud Scheduler job as a trigger.

  1. Workflows uses service accounts to give workflows access to Google Cloud resources. Create a service account and grant it the Dataform Editor role as well as the minimum permissions required to manage your orchestration workflow. For more information, see Grant a workflow permission to access Google Cloud resources.

  2. Create an orchestration workflow and use the following YAML source code as your workflow definition:

    main:
        steps:
        - init:
            assign:
            - repository: projects/PROJECT_ID/locations/REPOSITORY_LOCATION/repositories/REPOSITORY_ID
        - createCompilationResult:
            call: http.post
            args:
                url: ${"https://dataform.googleapis.com/v1beta1/" + repository + "/compilationResults"}
                auth:
                    type: OAuth2
                body:
                    gitCommitish: GIT_COMMITISH
            result: compilationResult
        - createWorkflowInvocation:
            call: http.post
            args:
                url: ${"https://dataform.googleapis.com/v1beta1/" + repository + "/workflowInvocations"}
                auth:
                    type: OAuth2
                body:
                    compilationResult: ${compilationResult.body.name}
            result: workflowInvocation
        - complete:
            return: ${workflowInvocation.body.name}
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project
    • REPOSITORY_LOCATION: the location of your Dataform repository
    • REPOSITORY_ID: the name of your Dataform repository
    • GIT_COMMITISH: the Git branch that you want to execute Dataform code from. For a newly created repository, replace with main.

  3. Schedule the orchestration workflow using Cloud Scheduler.

Customize the Dataform SQL workflow create compilation result request

You can update the existing orchestration workflow and define the Dataform SQL workflow create compilation result request settings in the YAML format. For more information about the settings, see the projects.locations.repositories.compilationResults REST resource reference.

For example, to add a _dev schemaSuffix to all actions during compilation, replace the createCompilationResult step with the following code snippet:

    - createCompilationResult:
        call: http.post
        args:
            url: ${"https://dataform.googleapis.com/v1beta1/" + repository + "/compilationResults"}
            auth:
                type: OAuth2
            body:
                gitCommitish: GIT_COMMITISH
                codeCompilationConfig:
                    schemaSuffix: dev
        # Keep the result so that the createWorkflowInvocation step
        # can reference compilationResult.body.name.
        result: compilationResult

You can also pass additional settings as runtime arguments in a Workflows execution request and access those arguments using variables. For more information, see Pass runtime arguments in an execution request.
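
As a rough illustration of the runtime-argument approach, the following sketch declares a params entry on main and reads the compilation settings from it. The argument names gitCommitish and schemaSuffix are hypothetical and chosen only for this example. The sketch assumes that every execution supplies both keys, because referencing a missing key fails at run time.

```yaml
# Sketch only: the argument names gitCommitish and schemaSuffix are
# hypothetical and must match the JSON object passed to the execution.
main:
    params: [args]
    steps:
    - init:
        assign:
        - repository: projects/PROJECT_ID/locations/REPOSITORY_LOCATION/repositories/REPOSITORY_ID
    - createCompilationResult:
        call: http.post
        args:
            url: ${"https://dataform.googleapis.com/v1beta1/" + repository + "/compilationResults"}
            auth:
                type: OAuth2
            body:
                # Values come from the runtime arguments instead of being hard-coded.
                gitCommitish: ${args.gitCommitish}
                codeCompilationConfig:
                    schemaSuffix: ${args.schemaSuffix}
        result: compilationResult
```

With this definition, an execution started with the argument {"gitCommitish": "main", "schemaSuffix": "dev"} would compile the main branch with the dev schema suffix. The remaining steps of the orchestration workflow stay unchanged.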

Customize the Dataform SQL workflow invocation request

You can update the existing orchestration workflow and define the Dataform SQL workflow invocation request settings in the YAML format. For more information about the invocation request settings, see the projects.locations.repositories.workflowInvocations REST resource reference.

For example, to execute only actions with the hourly tag, with all transitive dependencies included, replace the createWorkflowInvocation step with the following code snippet:

    - createWorkflowInvocation:
        call: http.post
        args:
            url: ${"https://dataform.googleapis.com/v1beta1/" + repository + "/workflowInvocations"}
            auth:
                type: OAuth2
            body:
                compilationResult: ${compilationResult.body.name}
                invocationConfig:
                    includedTags:
                    - hourly
                    transitiveDependenciesIncluded: true
        # Keep the result so that the complete step
        # can return workflowInvocation.body.name.
        result: workflowInvocation

You can also pass additional settings as runtime arguments in a Workflows execution request and access those arguments using variables. For more information, see Pass runtime arguments in an execution request.
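
For the invocation request, the same idea might look like the following fragment, which takes the list of tags from a runtime argument. The argument name includedTags is hypothetical, and the fragment assumes that main declares params: [args] and that every execution supplies the key as a list of strings.

```yaml
# Sketch only: assumes main declares "params: [args]" and that the
# hypothetical includedTags argument is a list, for example ["hourly"].
- createWorkflowInvocation:
    call: http.post
    args:
        url: ${"https://dataform.googleapis.com/v1beta1/" + repository + "/workflowInvocations"}
        auth:
            type: OAuth2
        body:
            compilationResult: ${compilationResult.body.name}
            invocationConfig:
                # The tag list is supplied per execution instead of being fixed.
                includedTags: ${args.includedTags}
                transitiveDependenciesIncluded: true
    result: workflowInvocation
```

An execution started with the argument {"includedTags": ["hourly"]} would then behave like the hard-coded example, while a different schedule could reuse the same workflow with another tag list.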

What's next