Export job information

This page describes how to use Workflows to export the job information that is stored in Batch to a BigQuery table. Specifically, learn how to run the sample export-jobs workflow and how to view the resulting job information in BigQuery.

Exporting a job's information is useful when you want to retain the information after a job is automatically or manually deleted, or when you want to analyze the information outside of Batch. Alternatively, if you want to export only task and job state-change information to BigQuery, see Monitor jobs using notifications instead.

To learn how to export job information that is stored in other Google Cloud services, see the export documentation for that service. For example, see the following pages:

Before you begin

  1. If you haven't used Batch before, review Get started with Batch and enable Batch by completing the prerequisites for projects and users.
  2. Enable the BigQuery and Workflows APIs.

    Enable the APIs

  3. Optional: Identify an existing BigQuery dataset or table in your project where you want to store the job information. An existing table must have a schema that matches the job information that the export-jobs workflow exports.

    Otherwise, you can use the export-jobs workflow to create a new dataset or table.

  4. Prepare a service account for the export-jobs workflow by doing the following:

    1. Create a new service account or identify an existing service account.
    2. To ensure that the service account has the necessary permissions to execute the export-jobs workflow, ask your administrator to grant the service account the following IAM roles on the project:

      For more information about granting roles, see Manage access to projects, folders, and organizations.

      Your administrator might also be able to give the service account the required permissions through custom roles or other predefined roles.

  5. To get the permissions that you need to create, deploy, and execute the export-jobs workflow, ask your administrator to grant you the following IAM roles on the project:

  6. Ensure that users in your project can view the exported job information.

    To ensure that a user has the necessary permissions to view the exported job information, ask your administrator to grant that user the BigQuery Data Viewer (roles/bigquery.dataViewer) IAM role on the table, dataset, or project.

Export job information

This section explains how to export job information by using the export-jobs workflow, which is from the export-to-bigquery code sample. The export-jobs workflow exports the information for your project's jobs that are in the specified region and meet the specified filter criteria.

The export-jobs workflow exports job information to a specified table in a specified dataset. If the dataset or table doesn't already exist in your project, the workflow creates it automatically. By default, the export-jobs workflow also deletes the exported jobs from Batch, but you can optionally edit the workflow so that it doesn't delete the jobs.
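
To illustrate the overall approach, the following is a minimal sketch of what such a workflow can look like, written with the Workflows connectors for the Batch and BigQuery APIs. This sketch is not the actual export-to-bigquery-delete-batch-jobs.yaml file: among other simplifications, it assumes that the dataset and table already exist, it doesn't handle pagination or the case where no jobs match the filter, and its step names and placeholder values are illustrative only.

    main:
      steps:
        - init:
            assign:
              # Placeholder values for illustration only.
              - projectId: "my-project-id"
              - region: "us-central1"
              - datasetId: "default_dataset_id"
              - tableId: "default_table_id"
              - jobFilter: '(status.state:SUCCEEDED OR status.state:FAILED) AND create_time<="2023-05-01T00:00:00Z"'
        - list_jobs:
            # List the project's jobs in the region that match the filter criteria.
            call: googleapis.batch.v1.projects.locations.jobs.list
            args:
              parent: ${"projects/" + projectId + "/locations/" + region}
              filter: ${jobFilter}
            result: listResult
        - export_each_job:
            for:
              value: j
              in: ${listResult.jobs}
              steps:
                - insert_row:
                    # Write the job's information as a row in the BigQuery table.
                    # The real workflow maps job fields to the table's schema.
                    call: googleapis.bigquery.v2.tabledata.insertAll
                    args:
                      projectId: ${projectId}
                      datasetId: ${datasetId}
                      tableId: ${tableId}
                      body:
                        rows:
                          - json: ${j}
                - delete_job:
                    # By default, the workflow deletes each job after exporting it.
                    call: googleapis.batch.v1.projects.locations.jobs.delete
                    args:
                      name: ${j.name}
        - finish:
            return: ${"Exported and deleted " + string(len(listResult.jobs)) + " jobs"}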

For each export-jobs workflow that you want to use, do the following:

  1. Configure the workflow definition.
  2. Create and deploy the workflow.
  3. Execute the workflow. Repeat this step each time you want to export the specified jobs.

Configure the workflow definition

  1. Download the export-to-bigquery-delete-batch-jobs.yaml file from GitHub.

  2. Open the export-to-bigquery-delete-batch-jobs.yaml file in a text editor. Then, make the following edits:

    1. Replace sys.get_env("GOOGLE_CLOUD_PROJECT_ID") with your project ID formatted as a string—for example, "my-project-id".

    2. Replace sys.get_env("GOOGLE_CLOUD_LOCATION") with the region that contains the jobs that you want to export, formatted as a string—for example, "us-central1".

    3. Optional: Edit the filter criteria that specifies which jobs to export.

      By default, the export-jobs workflow specifies the filter criteria "(status.state:SUCCEEDED OR status.state:FAILED) AND create_time<=\"2023-05-01T00:00:00Z\"". This default filter criteria exports information only for jobs that are in the SUCCEEDED or FAILED state and were created at or before the RFC 3339 timestamp 2023-05-01T00:00:00Z. For an example of a modified filter, see the sketch after these steps.

    4. Optional: Replace default_dataset_id with a different name for the dataset that you want the export-jobs workflow to use or create.

    5. Optional: Replace default_table_id with a different name for the table that you want the export-jobs workflow to use or create.

    6. If you don't want the exported jobs to be deleted by the workflow, do the following:

      1. Remove the following lines:

                - log_delete_step:
                    call: sys.log
                    args:
                      text: ${"Deleting Batch job " + j.name}
                      severity: NOTICE
                - delete_job:
                    call: googleapis.batch.v1.projects.locations.jobs.delete
                    args:
                      name: ${j.name}
        
      2. Remove + " and deleted".

    7. Save the file. Keep the file open.
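
As an example of a modified filter, the following hypothetical snippet exports only jobs that failed and were created during April 2023. It assumes that the downloaded file defines the filter criteria as a variable in an assign step; the exact variable name and its location in the file might differ.

    # Hypothetical edit: export only FAILED jobs created during April 2023.
    # The variable name and its exact location in the downloaded file might differ.
    - jobFilter: '(status.state:FAILED) AND create_time>="2023-04-01T00:00:00Z" AND create_time<="2023-05-01T00:00:00Z"'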

Create and deploy the workflow

  1. In the Google Cloud console, go to the Workflows page:

    Go to Workflows

  2. On the Workflows page, click Create.

  3. In the Workflow name field, enter a name for the workflow—for example, export-batch-jobs-us-central1.

  4. In the Service account list, select the service account that you prepared.

  5. Click Next.

  6. In the workflow editor, replace the sample workflow with the contents of the export-to-bigquery-delete-batch-jobs.yaml file. Then, you can close the file.

  7. Click Deploy. The Workflow details page opens.

Execute the workflow

  1. On the Workflow details page, click Execute. The Execute workflow page opens.

  2. On the Execute workflow page, click Execute.

  3. On the Execution details page that opens, wait for the workflow to finish executing. For example, exporting and deleting a few jobs usually takes only a few seconds, but execution might take longer if you are exporting and deleting many jobs.

    When the workflow has finished executing, the Output pane displays the results.

View the exported job information

This section explains how to view the table data created by the export-jobs workflow. For example, you can use the following steps to verify that the workflow executed successfully and to browse the table data. For more information about how to view and use your exported job information, such as writing queries, see Managing table data in the BigQuery documentation.

  1. In the Google Cloud console, go to the BigQuery page:

    Go to BigQuery

  2. In the Explorer panel, open the table for your exported job information:

    1. In the Search BigQuery resources field, enter the name of the table from your export-jobs workflow. For example, the default table name is default_table_id.
    2. Click the name of the table. The table details page opens.
    3. On the table details page, click the Details tab.

      On the Details tab, note the Last modified timestamp and the Number of rows.

    4. On the table details page, click the Preview tab.

What's next