Package google.cloud.datapipelines.v1

Pipelines

Provides an interface for creating, updating, and managing recurring Data Analytics jobs. The service supports existing executors like Dataflow for processing, and it enables launching jobs automatically through an internal scheduler or directly through invocation by an external user.
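For illustration, the short sketches throughout this reference use the generated Python client library. The setup sketch below is reused by the later sketches; it assumes the google-cloud-datapipelines package is installed and that the v1 client for the Pipelines service is exposed as shown (the exact class name may differ in your installed release, so verify it before use).

    # Assumed setup reused by the later sketches on this page.
    # pip install google-cloud-datapipelines
    from google.cloud import datapipelines_v1

    # The client class name is inferred from the "Pipelines" service name;
    # confirm the symbol exported by your installed google-cloud-datapipelines.
    client = datapipelines_v1.PipelinesClient()

    parent = "projects/PROJECT_ID/locations/LOCATION_ID"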

CreatePipeline

rpc CreatePipeline(CreatePipelineRequest) returns (Pipeline)

Creates a pipeline.

For a batch pipeline, you can pass scheduler information. Data Pipelines uses the scheduler information to create an internal scheduler that runs jobs periodically.

If the internal scheduler is not configured, you can use RunPipeline to run jobs.
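As a sketch, a batch pipeline with an internal schedule might be created as follows, reusing the client and parent from the setup sketch above. The message and field names come from this reference; the workload value is built as in the Workload sketch at the end of this page.

    # Sketch: create a batch pipeline that runs on an internal schedule.
    # Reuses "client", "parent", and the datapipelines_v1 import from the setup sketch.
    pipeline = datapipelines_v1.Pipeline(
        name=f"{parent}/pipelines/PIPELINE_ID",
        display_name="my-batch-pipeline",
        # The attribute may be "type" or "type_" depending on the generator's
        # renaming of the builtin; adjust to your installed library.
        type_=datapipelines_v1.PipelineType.PIPELINE_TYPE_BATCH,
        schedule_info=datapipelines_v1.ScheduleSpec(
            schedule="0 2 * * *",          # run daily at 02:00
            time_zone="America/New_York",
        ),
        workload=workload,                 # see the Workload sketch at the end of this page
    )

    created = client.create_pipeline(
        request=datapipelines_v1.CreatePipelineRequest(parent=parent, pipeline=pipeline)
    )
    print(created.name, created.state)     # state defaults to active on creation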

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • datapipelines.pipelines.create

For more information, see the IAM documentation.

DeletePipeline

rpc DeletePipeline(DeletePipelineRequest) returns (Empty)

Deletes a pipeline.

If a scheduler job is attached to the pipeline, the scheduler job is also deleted.

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • datapipelines.pipelines.delete

For more information, see the IAM documentation.

GetPipeline

rpc GetPipeline(GetPipelineRequest) returns (Pipeline)

Looks up a single pipeline.

Returns a "NOT_FOUND" error if no such pipeline exists. Returns a "FORBIDDEN" error if the caller doesn't have permission to access it.

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • datapipelines.pipelines.get

For more information, see the IAM documentation.

ListJobs

rpc ListJobs(ListJobsRequest) returns (ListJobsResponse)

Lists jobs for a given pipeline.

Throws a "FORBIDDEN" error if the caller doesn't have permission to access it.

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • datapipelines.jobs.list

For more information, see the IAM documentation.

ListPipelines

rpc ListPipelines(ListPipelinesRequest) returns (ListPipelinesResponse)

Lists pipelines.

Returns a "FORBIDDEN" error if the caller doesn't have permission to access it.

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the parent resource:

  • datapipelines.pipelines.list

For more information, see the IAM documentation.

RunPipeline

rpc RunPipeline(RunPipelineRequest) returns (RunPipelineResponse)

Creates a job for the specified pipeline directly. You can use this method when the internal scheduler is not configured and you want to trigger the job directly or through an external system.

Returns a "NOT_FOUND" error if the pipeline doesn't exist. Returns a "FORBIDDEN" error if the user doesn't have permission to access the pipeline or run jobs for the pipeline.

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • datapipelines.pipelines.run

For more information, see the IAM documentation.

StopPipeline

rpc StopPipeline(StopPipelineRequest) returns (Pipeline)

Freezes pipeline execution permanently.

If there's a corresponding scheduler entry, it's deleted, and the pipeline state is changed to "ARCHIVED". However, pipeline metadata is retained.
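A stop sketch, reusing the setup above:

    # Sketch: permanently stop (archive) a pipeline.
    # Reuses "client", "parent", and the datapipelines_v1 import from the setup sketch.
    stopped = client.stop_pipeline(
        request=datapipelines_v1.StopPipelineRequest(
            name=f"{parent}/pipelines/PIPELINE_ID",
        )
    )
    # Any scheduler entry is deleted; the pipeline metadata is retained.
    print(stopped.state)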

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • datapipelines.pipelines.stop

For more information, see the IAM documentation.

UpdatePipeline

rpc UpdatePipeline(UpdatePipelineRequest) returns (Pipeline)

Updates a pipeline.

If successful, the updated Pipeline is returned. Returns a "NOT_FOUND" error if the pipeline doesn't exist.

If UpdatePipeline does not return successfully, you can retry the UpdatePipeline request until you receive a successful response.
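A field-mask sketch, reusing the setup above; the mask path string is an assumption and should be checked against the generated message definition:

    # Sketch: update only the schedule of an existing pipeline.
    # Reuses "client", "parent", and the datapipelines_v1 import from the setup sketch.
    from google.protobuf import field_mask_pb2

    updated = client.update_pipeline(
        request=datapipelines_v1.UpdatePipelineRequest(
            pipeline=datapipelines_v1.Pipeline(
                name=f"{parent}/pipelines/PIPELINE_ID",
                schedule_info=datapipelines_v1.ScheduleSpec(
                    schedule="0 */6 * * *",    # every six hours
                    time_zone="UTC",
                ),
            ),
            update_mask=field_mask_pb2.FieldMask(paths=["schedule_info"]),
        )
    )
    print(updated.schedule_info.schedule)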

Authorization Scopes

Requires the following OAuth scope:

  • https://www.googleapis.com/auth/cloud-platform

For more information, see the Authentication Overview.

IAM Permissions

Requires the following IAM permission on the name resource:

  • datapipelines.pipelines.update

For more information, see the IAM documentation.

CreatePipelineRequest

Request message for CreatePipeline

Fields
parent

string

Required. The location name. For example: projects/PROJECT_ID/locations/LOCATION_ID.

pipeline

Pipeline

Required. The pipeline to add.

DataflowJobDetails

Pipeline job details specific to the Dataflow API. This is encapsulated here to allow for more executors to store their specific details separately.

Fields
sdk_version

SdkVersion

Output only. The SDK version used to run the job.

current_workers

int32

Output only. The current number of workers used to run the job. Only set to a value if the job is still running.

resource_info

map<string, double>

Cached version of all the metrics of interest for the job. This value gets stored here when the job is terminated. As long as the job is running, this field is populated from the Dataflow API.

DeletePipelineRequest

Request message for deleting a pipeline using DeletePipeline.

Fields
name

string

Required. The pipeline name. For example: projects/PROJECT_ID/locations/LOCATION_ID/pipelines/PIPELINE_ID.

FlexResourceSchedulingGoal

Specifies the resource to optimize for in Flexible Resource Scheduling.

Enums
FLEXRS_UNSPECIFIED Run in the default mode.
FLEXRS_SPEED_OPTIMIZED Optimize for lower execution time.
FLEXRS_COST_OPTIMIZED Optimize for lower cost.

FlexTemplateRuntimeEnvironment

The environment values to be set at runtime for a Flex Template.

Fields
num_workers

int32

The initial number of Compute Engine instances for the job.

max_workers

int32

The maximum number of Compute Engine instances to be made available to your pipeline during execution, from 1 to 1000.

zone

string

The Compute Engine availability zone for launching worker instances to run your pipeline. In the future, worker_zone will take precedence.

service_account_email

string

The email address of the service account to run the job as.

temp_location

string

The Cloud Storage path to use for temporary files. Must be a valid Cloud Storage URL, beginning with gs://.

machine_type

string

The machine type to use for the job. Defaults to the value from the template if not specified.

additional_experiments[]

string

Additional experiment flags for the job.

network

string

Network to which VMs will be assigned. If empty or unspecified, the service will use the network "default".

subnetwork

string

Subnetwork to which VMs will be assigned, if desired. You can specify a subnetwork using either a complete URL or an abbreviated path. Expected to be of the form "https://www.googleapis.com/compute/v1/projects/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNETWORK" or "regions/REGION/subnetworks/SUBNETWORK". If the subnetwork is located in a Shared VPC network, you must use the complete URL.

additional_user_labels

map<string, string>

Additional user labels to be specified for the job. Keys and values must follow the restrictions specified on the labeling restrictions page.

An object containing a list of key/value pairs. Example: { "name": "wrench", "mass": "1kg", "count": "3" }.

kms_key_name

string

Name for the Cloud KMS key for the job. Key format is: projects/PROJECT_ID/locations/LOCATION_ID/keyRings/KEY_RING_ID/cryptoKeys/KEY_ID

ip_configuration

WorkerIPAddressConfiguration

Configuration for VM IPs.

worker_region

string

The Compute Engine region (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in which worker processing should occur, e.g. "us-west1". Mutually exclusive with worker_zone. If neither worker_region nor worker_zone is specified, defaults to the control plane region.

worker_zone

string

The Compute Engine zone (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in which worker processing should occur, e.g. "us-west1-a". Mutually exclusive with worker_region. If neither worker_region nor worker_zone is specified, a zone in the control plane region is chosen based on available capacity. If both worker_zone and zone are set, worker_zone takes precedence.

enable_streaming_engine

bool

Whether to enable Streaming Engine for the job.

flexrs_goal

FlexResourceSchedulingGoal

Set FlexRS goal for the job. https://cloud.google.com/dataflow/docs/guides/flexrs

GetPipelineRequest

Request message for GetPipeline.

Fields
name

string

Required. The pipeline name. For example: projects/PROJECT_ID/locations/LOCATION_ID/pipelines/PIPELINE_ID.

Job

Definition of the job information maintained by the pipeline. Fields in this entity are retrieved from the executor API (e.g. Dataflow API).

Fields
name

string

Required. The fully qualified resource name for the job.

id

string

Output only. The internal ID for the job.

create_time

Timestamp

Output only. The time of job creation.

end_time

Timestamp

Output only. The time of job termination. This is absent if the job is still running.

state

State

The current state of the job.

status

Status

Status capturing any error code or message related to job creation or execution.

dataflow_job_details

DataflowJobDetails

All the details that are specific to a Dataflow job.

State

Enum listing all the job execution states.

Enums
STATE_UNSPECIFIED The job state isn't specified.
STATE_PENDING The job is waiting to start execution.
STATE_RUNNING The job is executing.
STATE_DONE The job has finished execution successfully.
STATE_FAILED The job has finished execution with a failure.
STATE_CANCELLED The job has been terminated upon user request.

LaunchFlexTemplateParameter

Launch Flex Template parameter.

Fields
job_name

string

Required. The job name to use for the created job. For an update job request, the job name should be the same as the existing running job.

parameters

map<string, string>

The parameters for the Flex Template. Example: {"num_workers":"5"}

launch_options

map<string, string>

Launch options for this Flex Template job. This is a common set of options across languages and templates. This should not be used to pass job parameters.

environment

FlexTemplateRuntimeEnvironment

The runtime environment for the Flex Template job.

update

bool

Set this to true if you are sending a request to update a running streaming job. When set, the job name should be the same as the running job.

transform_name_mappings

map<string, string>

Use this to pass transform name mappings for streaming update jobs. Example: {"oldTransformName":"newTransformName",...}

container_spec_gcs_path

string

Cloud Storage path to a file with a JSON-serialized ContainerSpec as content.

LaunchFlexTemplateRequest

A request to launch a Dataflow job from a Flex Template.

Fields
project_id

string

Required. The ID of the Cloud Platform project that the job belongs to.

launch_parameter

LaunchFlexTemplateParameter

Required. Parameter to launch a job from a Flex Template.

location

string

Required. The regional endpoint to which to direct the request. For example, us-central1, us-west1.

validate_only

bool

If true, the request is validated but not actually executed. Defaults to false.

LaunchTemplateParameters

Parameters to provide to the template being launched.

Fields
job_name

string

Required. The job name to use for the created job.

parameters

map<string, string>

The runtime parameters to pass to the job.

environment

RuntimeEnvironment

The runtime environment for the job.

update

bool

If set, replaces the existing pipeline with the name specified by jobName with this pipeline, preserving state.

transform_name_mapping

map<string, string>

Map of transform name prefixes of the job to be replaced to the corresponding name prefixes of the new job. Only applicable when updating a pipeline.

LaunchTemplateRequest

A request to launch a template.

Fields
project_id

string

Required. The ID of the Cloud Platform project that the job belongs to.

validate_only

bool

If true, the request is validated but not actually executed. Defaults to false.

launch_parameters

LaunchTemplateParameters

The parameters of the template to launch. This should be part of the body of the POST request.

location

string

The regional endpoint to which to direct the request.

gcs_path

string

A Cloud Storage path to the template from which to create the job. Must be a valid Cloud Storage URL, beginning with 'gs://'.

ListJobsRequest

Request message for ListJobs.

Fields
parent

string

Required. The pipeline name. For example: projects/PROJECT_ID/locations/LOCATION_ID/pipelines/PIPELINE_ID.

page_size

int32

The maximum number of entities to return. The service may return fewer than this value, even if there are additional pages. If unspecified, the max limit will be determined by the backend implementation.

page_token

string

A page token, received from a previous ListJobs call. Provide this to retrieve the subsequent page.

When paginating, all other parameters provided to ListJobs must match the call that provided the page token.

ListJobsResponse

Response message for ListJobs.

Fields
jobs[]

Job

Results that were accessible to the caller. Results are always in descending order of job creation date.

next_page_token

string

A token, which can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages.

ListPipelinesRequest

Request message for ListPipelines

Fields
parent

string

Required. The location name. For example: projects/PROJECT_ID/locations/LOCATION_ID.

filter

string

An expression for filtering the results of the request. If unspecified, all pipelines will be returned. Multiple filters can be applied and must be comma separated.

Fields eligible for filtering are:

  • type: The type of the pipeline (streaming or batch). Allowed values are ALL, BATCH, and STREAMING.
  • status: The activity status of the pipeline. Allowed values are ALL, ACTIVE, ARCHIVED, and PAUSED.

For example, to limit results to active batch processing pipelines:

type:BATCH,status:ACTIVE

page_size

int32

The maximum number of entities to return. The service may return fewer than this value, even if there are additional pages. If unspecified, the max limit will be determined by the backend implementation.

page_token

string

A page token, received from a previous ListPipelines call. Provide this to retrieve the subsequent page.

When paginating, all other parameters provided to ListPipelines must match the call that provided the page token.

ListPipelinesResponse

Response message for ListPipelines.

Fields
pipelines[]

Pipeline

Results that matched the filter criteria and were accessible to the caller. Results are always in descending order of pipeline creation date.

next_page_token

string

A token, which can be sent as page_token to retrieve the next page. If this field is omitted, there are no subsequent pages.

Pipeline

The main pipeline entity and all the necessary metadata for launching and managing linked jobs.

Fields
name

string

The pipeline name. For example: projects/PROJECT_ID/locations/LOCATION_ID/pipelines/PIPELINE_ID.

  • PROJECT_ID can contain letters ([A-Za-z]), numbers ([0-9]), hyphens (-), colons (:), and periods (.). For more information, see Identifying projects.
  • LOCATION_ID is the canonical ID for the pipeline's location. The list of available locations can be obtained by calling google.cloud.location.Locations.ListLocations. Note that the Data Pipelines service is not available in all regions. It depends on Cloud Scheduler, an App Engine application, so it's only available in App Engine regions.
  • PIPELINE_ID is the ID of the pipeline. Must be unique for the selected project and location.

display_name

string

Required. The display name of the pipeline. It can contain only letters ([A-Za-z]), numbers ([0-9]), hyphens (-), and underscores (_).

type

PipelineType

Required. The type of the pipeline. This field affects the scheduling of the pipeline and the type of metrics to show for the pipeline.

state

State

Required. The state of the pipeline. When the pipeline is created, the state is set to 'STATE_ACTIVE' by default. State changes can be requested by setting the state to stopping, paused, or resuming. State cannot be changed through UpdatePipeline requests.

create_time

Timestamp

Output only. Immutable. The timestamp when the pipeline was initially created. Set by the Data Pipelines service.

last_update_time

Timestamp

Output only. Immutable. The timestamp when the pipeline was last modified. Set by the Data Pipelines service.

workload

Workload

Workload information for creating new jobs.

schedule_info

ScheduleSpec

Internal scheduling information for a pipeline. If this information is provided, periodic jobs will be created per the schedule. If not, users are responsible for creating jobs externally.

job_count

int32

Output only. Number of jobs.

scheduler_service_account_email

string

Optional. A service account email to be used with the Cloud Scheduler job. If not specified, the default Compute Engine service account will be used.

pipeline_sources

map<string, string>

Immutable. The sources of the pipeline (for example, Dataplex). The keys and values are set by the corresponding sources during pipeline creation.

PipelineType

The type of a pipeline. For example, batch or streaming.

Enums
PIPELINE_TYPE_UNSPECIFIED The pipeline type isn't specified.
PIPELINE_TYPE_BATCH A batch pipeline. It runs jobs on a specific schedule, and each job will automatically terminate once execution is finished.
PIPELINE_TYPE_STREAMING A streaming pipeline. The underlying job is continuously running until it is manually terminated by the user. This type of pipeline doesn't have a schedule to run on, and the linked job gets created when the pipeline is created.

State

The current state of pipeline execution.

Enums
STATE_UNSPECIFIED The pipeline state isn't specified.
STATE_RESUMING The pipeline is getting started or resumed. When finished, the pipeline state will be 'STATE_ACTIVE'.
STATE_ACTIVE The pipeline is actively running.
STATE_STOPPING The pipeline is in the process of stopping. When finished, the pipeline state will be 'STATE_ARCHIVED'.
STATE_ARCHIVED The pipeline has been stopped. This is a terminal state and cannot be undone.
STATE_PAUSED The pipeline is paused. This is a non-terminal state. When the pipeline is paused, it will hold processing jobs, but can be resumed later. For a batch pipeline, this means pausing the scheduler job. For a streaming pipeline, creating a job snapshot to resume from will give the same effect.

RunPipelineRequest

Request message for RunPipeline

Fields
name

string

Required. The pipeline name. For example: projects/PROJECT_ID/locations/LOCATION_ID/pipelines/PIPELINE_ID.

RunPipelineResponse

Response message for RunPipeline

Fields
job

Job

Job that was created as part of RunPipeline operation.

RuntimeEnvironment

The environment values to set at runtime.

Fields
num_workers

int32

The initial number of Compute Engine instances for the job.

max_workers

int32

The maximum number of Compute Engine instances to be made available to your pipeline during execution, from 1 to 1000.

zone

string

The Compute Engine availability zone for launching worker instances to run your pipeline. In the future, worker_zone will take precedence.

service_account_email

string

The email address of the service account to run the job as.

temp_location

string

The Cloud Storage path to use for temporary files. Must be a valid Cloud Storage URL, beginning with gs://.

bypass_temp_dir_validation

bool

Whether to bypass the safety checks for the job's temporary directory. Use with caution.

machine_type

string

The machine type to use for the job. Defaults to the value from the template if not specified.

additional_experiments[]

string

Additional experiment flags for the job.

network

string

Network to which VMs will be assigned. If empty or unspecified, the service will use the network "default".

subnetwork

string

Subnetwork to which VMs will be assigned, if desired. You can specify a subnetwork using either a complete URL or an abbreviated path. Expected to be of the form "https://www.googleapis.com/compute/v1/projects/HOST_PROJECT_ID/regions/REGION/subnetworks/SUBNETWORK" or "regions/REGION/subnetworks/SUBNETWORK". If the subnetwork is located in a Shared VPC network, you must use the complete URL.

additional_user_labels

map<string, string>

Additional user labels to be specified for the job. Keys and values should follow the restrictions specified in the labeling restrictions page. An object containing a list of key/value pairs. Example: { "name": "wrench", "mass": "1kg", "count": "3" }.

kms_key_name

string

Name for the Cloud KMS key for the job. The key format is: projects/PROJECT_ID/locations/LOCATION_ID/keyRings/KEY_RING_ID/cryptoKeys/KEY_ID

ip_configuration

WorkerIPAddressConfiguration

Configuration for VM IPs.

worker_region

string

The Compute Engine region (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in which worker processing should occur, e.g. "us-west1". Mutually exclusive with worker_zone. If neither worker_region nor worker_zone is specified, defaults to the control plane region.

worker_zone

string

The Compute Engine zone (https://cloud.google.com/compute/docs/regions-zones/regions-zones) in which worker processing should occur, e.g. "us-west1-a". Mutually exclusive with worker_region. If neither worker_region nor worker_zone is specified, a zone in the control plane's region is chosen based on available capacity. If both worker_zone and zone are set, worker_zone takes precedence.

enable_streaming_engine

bool

Whether to enable Streaming Engine for the job.

ScheduleSpec

Details of the schedule the pipeline runs on.

Fields
schedule

string

Unix-cron format of the schedule. This information is retrieved from the linked Cloud Scheduler.

time_zone

string

Timezone ID. This matches the timezone IDs used by the Cloud Scheduler API. If empty, UTC time is assumed.

next_job_time

Timestamp

Output only. When the next Scheduler job is going to run.
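For illustration, a sketch of a ScheduleSpec using unix-cron syntax and a Cloud Scheduler time zone ID (the values are examples, and the client-library types are assumed as in the earlier sketches):

    from google.cloud import datapipelines_v1

    # Sketch: run weekdays at 06:00 local time.
    schedule_info = datapipelines_v1.ScheduleSpec(
        schedule="0 6 * * 1-5",      # minute hour day-of-month month day-of-week
        time_zone="Europe/Paris",    # Cloud Scheduler time zone ID; UTC if empty
    )
    # next_job_time is output only and is populated by the service.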

SdkVersion

The version of the SDK used to run the job.

Fields
version

string

The version of the SDK used to run the job.

version_display_name

string

A readable string describing the version of the SDK.

sdk_support_status

SdkSupportStatus

The support status for this SDK version.

SdkSupportStatus

The support status of the SDK used to run the job.

Enums
UNKNOWN Dataflow is unaware of this version.
SUPPORTED This is a known version of an SDK, and is supported.
STALE A newer version of the SDK exists, and an update is recommended.
DEPRECATED This version of the SDK is deprecated and will eventually be unsupported.
UNSUPPORTED Support for this SDK version has ended and it should no longer be used.

StopPipelineRequest

Request message for StopPipeline.

Fields
name

string

Required. The pipeline name. For example: projects/PROJECT_ID/locations/LOCATION_ID/pipelines/PIPELINE_ID.

UpdatePipelineRequest

Request message for UpdatePipeline.

Fields
pipeline

Pipeline

Required. The pipeline to update. For example: projects/PROJECT_ID/locations/LOCATION_ID/pipelines/PIPELINE_ID.

update_mask

FieldMask

The list of fields to be updated.

WorkerIPAddressConfiguration

Specifies how IP addresses should be allocated to the worker machines.

Enums
WORKER_IP_UNSPECIFIED The configuration is unknown, or unspecified.
WORKER_IP_PUBLIC Workers should have public IP addresses.
WORKER_IP_PRIVATE Workers should have private IP addresses.

Workload

Workload details for creating the pipeline jobs.

Fields

Union field template_details.

template_details can be only one of the following:

dataflow_launch_template_request

LaunchTemplateRequest

Template information and additional parameters needed to launch a Dataflow job using the standard launch API.

dataflow_flex_template_request

LaunchFlexTemplateRequest

Template information and additional parameters needed to launch a Dataflow job using the flex launch API.
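As a sketch, a Workload that launches a Dataflow Flex Template job can be assembled from the messages documented above and assigned to Pipeline.workload when creating a pipeline. The bucket paths and parameter names below are placeholders, and the client-library types are assumed as in the earlier sketches.

    from google.cloud import datapipelines_v1

    # Sketch: a Workload built from a Flex Template launch request.
    workload = datapipelines_v1.Workload(
        dataflow_flex_template_request=datapipelines_v1.LaunchFlexTemplateRequest(
            project_id="PROJECT_ID",
            location="LOCATION_ID",                 # regional endpoint, e.g. us-central1
            launch_parameter=datapipelines_v1.LaunchFlexTemplateParameter(
                job_name="my-flex-job",
                container_spec_gcs_path="gs://BUCKET/templates/spec.json",
                parameters={"inputFile": "gs://BUCKET/input/*.csv"},
                environment=datapipelines_v1.FlexTemplateRuntimeEnvironment(
                    max_workers=10,
                    temp_location="gs://BUCKET/temp",
                ),
            ),
        )
    )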