Resource labeling by Vertex AI Pipelines

Depending on the type of component, resource, and the Google Cloud Pipeline Components SDK version, Vertex AI Pipelines either automatically propagates the labels from your pipeline run to the resources generated from Google Cloud Pipeline Components or requires you to label the generated resources. For user-defined components, you need to author your component code to attach the labels from an environment variable. For more information, see Resources generated from user-defined components.

Resources with automatic labeling

Vertex AI Pipelines automatically labels the following resources, regardless of the Google Cloud Pipeline Components SDK version:

CustomJob resources

Vertex AI Pipelines automatically propagates the labels from your pipeline run to CustomJob resources. This is supported by the following components in all versions of the Google Cloud Pipeline Components SDK:

Resources with automatic labeling in Google Cloud Pipeline Components SDK v1.0.31 or later

Vertex AI Pipelines automatically labels the following resources if you use Google Cloud Pipeline Components SDK v1.0.31 or later:

BatchPredictionJob resources

Vertex AI Pipelines automatically propagates labels from your pipeline run to BatchPredictionJob resources generated from the ModelBatchPredictOp component if you use v1.0.31 or later of the Google Cloud Pipeline Components SDK.

Vertex AI endpoint resources

Vertex AI Pipelines automatically propagates labels from your pipeline run to Vertex AI endpoint resources generated from the EndpointCreateOp component if you use v1.0.31 or later of the Google Cloud Pipeline Components SDK.

HyperparameterTuningJob resources

Vertex AI Pipelines automatically propagates labels from your pipeline run to HyperparameterTuningJob resources generated from the HyperparameterTuningJobRunOp component if you use v1.0.31 or later of the Google Cloud Pipeline Components SDK.

Vertex AI dataset resources

Vertex AI Pipelines automatically propagates labels from your pipeline run to Vertex AI dataset resources generated from the following Vertex AI components if you use v1.0.31 or later of the Google Cloud Pipeline Components SDK:

Google Cloud BigQuery Job resources

Vertex AI Pipelines automatically propagates labels from your pipeline run to Google Cloud BigQuery Job resources generated from any of the BigQuery ML components if you use v1.0.31 or later of the Google Cloud Pipeline Components SDK.

Google Cloud Dataproc Job resources

Vertex AI Pipelines automatically propagates labels from your pipeline run to Google Cloud Dataproc Job resources generated from any of the Dataproc Serverless components if you use v1.0.31 or later of the Google Cloud Pipeline Components SDK.

TrainingPipeline and Model resources

Vertex AI Pipelines automatically propagates labels from your pipeline run to TrainingPipeline and Model resources generated from the following AutoML components if you use v1.0.31 or later of the Google Cloud Pipeline Components SDK:

Google Cloud BigQuery table resources

Vertex AI Pipelines automatically propagates labels from your pipeline run to Google Cloud BigQuery table resources generated from the ForecastingPreprocessingOp component if you use v1.0.31 or later of the Google Cloud Pipeline Components SDK.

Resources without automatic labeling

Vertex AI Pipelines doesn't label the following resources automatically, regardless of the Google Cloud Pipeline Components SDK version:

Google Cloud Dataflow resources

Vertex AI Pipelines doesn't automatically label Dataflow resources generated by the DataflowPythonJobOp component. You can include instructions in your code to label the resources.

Use the following code sample to propagate billing labels from your pipeline run to any Google Cloud Dataflow resource generated using the DataflowPythonJobOp component:

  import argparse
  import apache_beam as beam
  from apache_beam.options.pipeline_options import PipelineOptions
  ...
  def run(argv=None):
    parser = argparse.ArgumentParser()
    # Don't add `--labels` to the parser's argument list. Unrecognized arguments,
    # including `--labels`, are returned in pipeline_args and passed to PipelineOptions.
    parser.add_argument('--input', )
    parser.add_argument('--output', )
    ...
    known_args, pipeline_args = parser.parse_known_args(argv)
    pipeline_options = PipelineOptions(pipeline_args)
    with beam.Pipeline(options=pipeline_options) as p:
      ...
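
The pass-through described in the comment above relies on how argparse handles arguments it doesn't recognize. The following minimal sketch uses only the standard library to show that behavior. The bucket paths and the label value team=ml are hypothetical, and the sketch makes no claim about the exact label arguments that DataflowPythonJobOp supplies:

```python
import argparse


def split_args(argv):
    """Separate the arguments the parser knows from the ones left for Beam."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--input')
    parser.add_argument('--output')
    # `--labels` is deliberately not registered on the parser, so
    # parse_known_args returns it in the list of leftover arguments.
    return parser.parse_known_args(argv)


# Hypothetical command line, used for illustration only.
known_args, pipeline_args = split_args([
    '--input', 'gs://example-bucket/input.txt',
    '--output', 'gs://example-bucket/output',
    '--labels', 'team=ml',
])

print(known_args.input)
print(pipeline_args)
```

Because the leftover list still contains `--labels`, passing it to PipelineOptions is what lets the labels reach the pipeline options. If the parser declared `--labels` itself, the argument would end up in known_args and would not be forwarded.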

Resources generated from user-defined components

Vertex AI Pipelines doesn't automatically label Google Cloud resources generated from user-defined components. You can include instructions in your code to retrieve the labels from the environment variable VERTEX_AI_PIPELINES_RUN_LABELS and attach those labels to the Google Cloud resources generated using the component at runtime.

The environment variable VERTEX_AI_PIPELINES_RUN_LABELS contains the labels in JSON format as key-value pairs.

For example: { "label1_key": "label1_value", "label2_key": "label2_value", ...}

If you're using the Vertex AI SDK for Python, use the following code sample in your component code to propagate labels from the environment variable to a new resource generated from the component:

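import os
import json
from google.cloud import aiplatform

aiplatform.init(
  project='PROJECT_ID',
  location='LOCATION'
)

aiplatform.RESOURCE.create(
  ...
  # Pass the run labels through the resource's `labels` parameter.
  # The "{}" default avoids an error if the variable isn't set.
  labels=json.loads(os.getenv("VERTEX_AI_PIPELINES_RUN_LABELS", "{}"))
)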

Replace the following:

  • PROJECT_ID: The Google Cloud project that this pipeline runs in.

  • LOCATION: The location or region that this pipeline runs in.

  • RESOURCE: Google Cloud resource generated from the component, for example, CustomJob or Model.
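
If a component also runs outside Vertex AI Pipelines, for example in a local test, the environment variable might not be set. The following is a minimal sketch of a helper that handles that case and merges the run labels with labels of your own. The helper name load_run_labels, its extra_labels parameter, and the choice to let run labels win on key conflicts are assumptions made for illustration and aren't part of any Google Cloud library:

```python
import json
import os

LABELS_ENV_VAR = "VERTEX_AI_PIPELINES_RUN_LABELS"


def load_run_labels(extra_labels=None):
    """Return the pipeline run labels merged with optional extra labels.

    Hypothetical helper for illustration. Returns only the extra labels
    (or an empty dict) when the environment variable is unset or empty.
    """
    raw = os.getenv(LABELS_ENV_VAR)
    run_labels = json.loads(raw) if raw else {}

    merged = dict(extra_labels or {})
    # Design choice for this sketch: run labels override extra labels
    # that use the same key.
    merged.update(run_labels)
    return merged
```

You could then pass the returned dictionary as the labels of the resource that your component creates, for example `labels=load_run_labels({"team": "ml"})`, where the team label is a hypothetical value.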

If your component is written in Python and has access to the Google Cloud Pipeline Components library, you can instead use the gcp_labels_util.attach_system_labels utility to parse the environment variable. For more information, see the source code of the utility function in GitHub.

Resources without labeling support

Vertex AI Pipelines doesn't support billing label propagation to the following resources:

ML Metadata resources

ML Metadata resources are billed at the store level. You can't use billing labels to understand the resource-level cost.

Cloud Storage resources

Vertex AI Pipelines doesn't propagate billing labels to Cloud Storage resources, such as Cloud Storage buckets.

What's next