Create and manage jobs

A job runs your Apache Flink pipeline. You can run a job within an existing deployment, or you can run an on-demand job.

To create a deployment for your jobs, see Create a deployment.

Required APIs

To create and manage a BigQuery Engine for Apache Flink job, you must enable the BigQuery Engine for Apache Flink API.

gcloud services enable managedflink.googleapis.com

You might need to enable additional APIs such as Cloud Storage if your pipeline requires it.
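For example, if your pipeline reads from or writes to Cloud Storage, you can enable that API in the same command. Add or remove service names to match what your pipeline uses:

```shell
# Enable the BigQuery Engine for Apache Flink API together with the
# Cloud Storage API. Add other service names as your pipeline requires.
gcloud services enable \
  managedflink.googleapis.com \
  storage.googleapis.com
```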

Required roles and permissions

This section describes the roles required to manage your jobs. For more information about BigQuery Engine for Apache Flink roles, see BigQuery Engine for Apache Flink predefined roles.

Create, update, and delete

To get the permissions that you need to create, update, and delete a job, ask your administrator to grant you the Managed Flink Developer (roles/managedflink.developer) IAM role on your project. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to create, update, and delete a job. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to create, update, and delete a job:

  • Create a job: managedflink.jobs.create
  • Update a job: managedflink.jobs.update
  • Delete a job: managedflink.jobs.delete

You might also be able to get these permissions with custom roles or other predefined roles.
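For example, an administrator can grant the role by using the gcloud CLI. The project ID and user email are placeholders:

```shell
# Grant the Managed Flink Developer role on the project.
# Replace PROJECT_ID and USER_EMAIL with your own values.
gcloud projects add-iam-policy-binding PROJECT_ID \
  --member="user:USER_EMAIL" \
  --role="roles/managedflink.developer"
```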

Get and list

To get the permissions that you need to retrieve information about a job and list jobs, ask your administrator to grant you the Managed Flink Viewer (roles/managedflink.viewer) IAM role on your project. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to retrieve information about a job and list jobs. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to retrieve information about a job and list jobs:

  • Get details about a job: managedflink.jobs.get
  • List jobs: managedflink.jobs.list

You might also be able to get these permissions with custom roles or other predefined roles.
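To check which roles a principal already has on the project, you can inspect the project's IAM policy. The email address is a placeholder:

```shell
# List the roles granted to a specific user on the project.
gcloud projects get-iam-policy PROJECT_ID \
  --flatten="bindings[].members" \
  --format="table(bindings.role)" \
  --filter="bindings.members:user:USER_EMAIL"
```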

Properties of a job

BigQuery Engine for Apache Flink jobs have the following properties.

Job ID

The ID for the job. BigQuery Engine for Apache Flink automatically generates the job ID when you create the job.

Job name

An optional name for the job. Job names don't need to be unique.

Location

The location where the job runs. The location must be one of the supported Google Cloud regions. If the job is created within an existing deployment, the job location must match the deployment location. You can't change the location of a job. For a list of available locations, see BigQuery Engine for Apache Flink locations.

Project ID

The ID of the Google Cloud project for the job that you are creating. Your job is created in the project that you specify. If the job is created within an existing deployment, the job project must match the deployment project. You can't change the project of a job. For information about Google Cloud project IDs and project numbers, see Identifying projects.

Deployment ID

The name of the BigQuery Engine for Apache Flink deployment to use for this job. If you don't specify an existing deployment, a one-time deployment is created to run the job. That deployment only exists while the job is running and can't be used to run other jobs.

JAR or SQL file

When you create a job, you specify either a JAR file or a SQL file that defines your Apache Flink pipeline. BigQuery Engine for Apache Flink uses this file to execute your job.

Along with the JAR or SQL file, you might need to specify the following additional information when you create the job, depending on your pipeline:

  • Job arguments. Parameters to pass to the pipeline. The job arguments to use depend on your pipeline code.
  • Entry class. For JAR files, the class that contains the entry point. You don't need to set the entry class if the JAR file specifies the entry class in its manifest. For more information, see Program packaging and distributed execution in the Apache Flink documentation.

For more information about using Apache Flink JAR files, see Program packaging and distributed execution in the Apache Flink documentation.

For more information about using Apache Flink SQL, see SQL in the Apache Flink documentation.
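As an illustration, a SQL file might contain a self-contained pipeline like the following sketch, which uses the standard Apache Flink datagen and print connectors. The table names and schema are arbitrary examples:

```sql
-- Generate a stream of synthetic rows.
CREATE TABLE orders (
  order_id BIGINT,
  price    DOUBLE
) WITH (
  'connector' = 'datagen',
  'rows-per-second' = '1'
);

-- Write the rows to the task manager logs.
CREATE TABLE print_sink (
  order_id BIGINT,
  price    DOUBLE
) WITH (
  'connector' = 'print'
);

INSERT INTO print_sink
SELECT order_id, price FROM orders;
```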

Create an on-demand job

Follow these steps to create an on-demand job. On-demand jobs aren't associated with an existing deployment.

To create an on-demand job by using the gcloud CLI, use the gcloud alpha managed-flink jobs create command.

gcloud alpha managed-flink jobs create FILE \
  --location=REGION \
  --project=PROJECT_ID \
  --name=JOB_NAME \
  --staging-location=STAGING_LOCATION \
  --min-parallelism=MINIMUM_SLOTS \
  --max-parallelism=MAXIMUM_SLOTS \
  -- JOB_ARGUMENTS

Replace the following:

  • FILE: the absolute path to an Apache Flink JAR file or Apache Flink SQL file
  • REGION: a BigQuery Engine for Apache Flink region, like us-central1
  • PROJECT_ID: your BigQuery Engine for Apache Flink project ID
  • JOB_NAME: a name for the job
  • STAGING_LOCATION: the Cloud Storage location to stage job artifacts
  • MINIMUM_SLOTS: the minimum number of task slots available to your job
  • MAXIMUM_SLOTS: the maximum number of task slots available to your job
  • JOB_ARGUMENTS: a list of job arguments to pass to the Apache Flink job

To use service account impersonation, see Use service account impersonation.

The first time you create either a deployment or an on-demand job in a project or in a subnet, the creation can take 30 minutes or more to complete. After that, it takes less time to create a new deployment or job.
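With hypothetical values filled in, an on-demand job creation command might look like the following. The file path, bucket, and pipeline argument are examples only:

```shell
# Create an on-demand job from a JAR file, staging artifacts in
# Cloud Storage and passing one pipeline-specific argument.
gcloud alpha managed-flink jobs create /path/to/wordcount.jar \
  --location=us-central1 \
  --project=my-project \
  --name=wordcount \
  --staging-location=gs://my-bucket/staging \
  --min-parallelism=1 \
  --max-parallelism=4 \
  -- --outputPath=gs://my-bucket/output
```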

Create a job in an existing deployment

Follow these steps to create a job in an existing deployment. To create a deployment, see Create and manage deployments.

To create a job by using the gcloud CLI, use the gcloud alpha managed-flink jobs create command.

gcloud alpha managed-flink jobs create FILE \
  --location=REGION \
  --project=PROJECT_ID \
  --deployment=DEPLOYMENT_ID \
  --name=JOB_NAME \
  --staging-location=STAGING_LOCATION \
  --min-parallelism=MINIMUM_SLOTS \
  --max-parallelism=MAXIMUM_SLOTS \
  -- JOB_ARGUMENTS

Replace the following:

  • FILE: the absolute path to an Apache Flink JAR file or Apache Flink SQL file
  • REGION: a BigQuery Engine for Apache Flink region, like us-central1
  • PROJECT_ID: your BigQuery Engine for Apache Flink project ID
  • DEPLOYMENT_ID: the name of your BigQuery Engine for Apache Flink deployment
  • JOB_NAME: a name for the job
  • STAGING_LOCATION: the Cloud Storage location to stage job artifacts
  • MINIMUM_SLOTS: the minimum number of task slots available to your job
  • MAXIMUM_SLOTS: the maximum number of task slots available to your job
  • JOB_ARGUMENTS: a list of job arguments to pass to the Apache Flink job

To use service account impersonation, see Use service account impersonation.

Update a job

You can modify the autotuning settings for your jobs. For more information, see BigQuery Engine for Apache Flink autoscaling.

Get details about a job

console

To get information about a job in the Google Cloud console, follow these steps:

  1. In the Google Cloud console, go to the BigQuery Engine for Apache Flink Jobs page.

    Go to Jobs

    The Jobs page displays details of your job, including the job status.

  2. To open the Job details page, click the name of a job. On the Job details page, you can see the job graph and job metrics.

gcloud

To retrieve information about a job by using the gcloud CLI, use the gcloud alpha managed-flink jobs describe command. This command retrieves the job's initial configuration and its current state.

gcloud alpha managed-flink jobs describe \
  JOB_ID \
  --project=PROJECT_ID \
  --location=REGION

Replace the following:

  • JOB_ID: the ID of your BigQuery Engine for Apache Flink job
  • PROJECT_ID: your BigQuery Engine for Apache Flink project ID
  • REGION: the region that the BigQuery Engine for Apache Flink job is in
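To extract a single field from the output, you can use the gcloud CLI's --format flag. For example, the following sketch prints only the job's state, assuming the describe response includes a state field:

```shell
# Print only the job's state. The `state` field name is an assumption
# about the shape of the describe response.
gcloud alpha managed-flink jobs describe JOB_ID \
  --project=PROJECT_ID \
  --location=REGION \
  --format="value(state)"
```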

List jobs

console

To see a list of jobs, in the Google Cloud console, go to the BigQuery Engine for Apache Flink Jobs page.

Go to Jobs

gcloud

To list the jobs in a project by using the gcloud CLI, use the gcloud alpha managed-flink jobs list command. This command lists all of the jobs in the specified region and project.

gcloud alpha managed-flink jobs list \
  REGION \
  --project=PROJECT_ID

Replace the following:

  • REGION: the region that the BigQuery Engine for Apache Flink jobs are in
  • PROJECT_ID: your BigQuery Engine for Apache Flink project ID
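The list output can be tabulated with the gcloud CLI's --format flag. For example, the following sketch assumes the list response items include name and state fields:

```shell
# List jobs as a table of name and state. The field names are
# assumptions about the shape of the list response.
gcloud alpha managed-flink jobs list REGION \
  --project=PROJECT_ID \
  --format="table(name,state)"
```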

Delete jobs

gcloud

To delete a job by using the gcloud CLI, use the gcloud alpha managed-flink jobs delete command.

gcloud alpha managed-flink jobs delete \
  JOB_ID \
  --project=PROJECT_ID \
  --location=REGION

Replace the following:

  • JOB_ID: the ID of your BigQuery Engine for Apache Flink job
  • PROJECT_ID: your BigQuery Engine for Apache Flink project ID
  • REGION: the region that the BigQuery Engine for Apache Flink job is in

Limitations

Your Apache Flink pipelines must be compatible with Apache Flink 1.19.

What's next