
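Build your own pipeline components

Write a component to show a Google Cloud console link

When you run a component, you usually want a link not only to the component job being launched but also to the underlying cloud resources, such as Vertex AI batch prediction jobs or Dataflow jobs.

The gcp_resources proto is a special parameter that you can use in your component so that the Google Cloud console can provide a customized view of the resource's logs and status in the Vertex AI Pipelines console.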

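Output the `gcp_resources` parameter

Use a container-based component

First, define the gcp_resources parameter in your component, as shown in the following example component.py file: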

Vertex AI SDK for Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Vertex AI SDK for Python API reference documentation.

# Copyright 2023 The Kubeflow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
from typing import List

from google_cloud_pipeline_components import _image
from google_cloud_pipeline_components import _placeholders
from kfp.dsl import container_component
from kfp.dsl import ContainerSpec
from kfp.dsl import OutputPath


@container_component
def dataflow_python(
    python_module_path: str,
    temp_location: str,
    gcp_resources: OutputPath(str),
    location: str = 'us-central1',
    requirements_file_path: str = '',
    args: List[str] = [],
    project: str = _placeholders.PROJECT_ID_PLACEHOLDER,
):
  # fmt: off
  """Launch a self-executing Beam Python file on Google Cloud using the
  Dataflow Runner.

  Args:
      location: Location of the Dataflow job. If not set, defaults to `'us-central1'`.
      python_module_path: The GCS path to the Python file to run.
      temp_location: A GCS path for Dataflow to stage temporary job files created during the execution of the pipeline.
      requirements_file_path: The GCS path to the pip requirements file.
      args: The list of args to pass to the Python file. Can include additional parameters for the Dataflow Runner.
      project: Project to create the Dataflow job. Defaults to the project in which the PipelineJob is run.

  Returns:
      gcp_resources: Serialized gcp_resources proto tracking the Dataflow job. For more details, see https://github.com/kubeflow/pipelines/blob/master/components/google-cloud/google_cloud_pipeline_components/proto/README.md.
  """
  # fmt: on
  return ContainerSpec(
      image=_image.GCPC_IMAGE_TAG,
      command=[
          'python3',
          '-u',
          '-m',
          'google_cloud_pipeline_components.container.v1.dataflow.dataflow_launcher',
      ],
      args=[
          '--project',
          project,
          '--location',
          location,
          '--python_module_path',
          python_module_path,
          '--temp_location',
          temp_location,
          '--requirements_file_path',
          requirements_file_path,
          '--args',
          args,
          '--gcp_resources',
          gcp_resources,
      ],
  )

Next, install the Google Cloud Pipeline Components package in your container:

pip install --upgrade google-cloud-pipeline-components

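Then, in the Python code that runs in the container, describe the resource in a GcpResources proto and write it to the gcp_resources output path: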


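import argparse

from google.protobuf.json_format import MessageToJson
from google_cloud_pipeline_components.proto.gcp_resources_pb2 import GcpResources

# The ContainerSpec passes the output path as --gcp_resources; read it here.
parser = argparse.ArgumentParser()
parser.add_argument('--gcp_resources', type=str, required=True)
args, _ = parser.parse_known_args()
gcp_resources = args.gcp_resources

# Describe the Dataflow job so that the console can link to it.
dataflow_resources = GcpResources()
dr = dataflow_resources.resources.add()
dr.resource_type = 'DataflowJob'
dr.resource_uri = 'https://dataflow.googleapis.com/v1b3/projects/[your-project]/locations/us-east1/jobs/[dataflow-job-id]'

# Serialize the proto as JSON and write it to the output path.
with open(gcp_resources, 'w') as f:
    f.write(MessageToJson(dataflow_resources))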
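MessageToJson writes the proto to the output file as a JSON document. To make the shape of that file concrete, the following stdlib-only sketch builds an equivalent payload. It assumes MessageToJson's default behavior of emitting lowerCamelCase field names (resourceType, resourceUri) with a 2-space indent; build_gcp_resources_json is a hypothetical helper for illustration, not part of Google Cloud Pipeline Components, and a real component should keep using the GcpResources proto.

```python
import json


def build_gcp_resources_json(resource_type: str, resource_uri: str) -> str:
    """Hypothetical helper: builds JSON shaped like MessageToJson(GcpResources) output.

    Assumes MessageToJson's default lowerCamelCase field names. This is an
    illustration only, not a replacement for the GcpResources proto.
    """
    payload = {
        "resources": [
            {"resourceType": resource_type, "resourceUri": resource_uri},
        ]
    }
    # MessageToJson indents with 2 spaces by default; mirror that here.
    return json.dumps(payload, indent=2)


if __name__ == "__main__":
    # "my-project" and "my-job-id" are placeholder values.
    uri = (
        "https://dataflow.googleapis.com/v1b3/projects/my-project"
        "/locations/us-central1/jobs/my-job-id"
    )
    print(build_gcp_resources_json("DataflowJob", uri))
```

Reading a file like this back with json.loads is a quick way to check, during local testing, that the resource type and URI you wrote are the ones you intended.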

Use a Python component

Alternatively, you can return gcp_resources as an output parameter, just as you would any other string output parameter:

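from typing import NamedTuple

from kfp import dsl


@dsl.component(
    base_image='python:3.9',
    packages_to_install=['google-cloud-pipeline-components==2.19.0'],
)
def launch_dataflow_component(project: str, location: str) -> NamedTuple('Outputs', [('gcp_resources', str)]):
  # Lightweight components run in their own container, so import inside the function.
  from google.protobuf.json_format import MessageToJson
  from google_cloud_pipeline_components.proto.gcp_resources_pb2 import GcpResources

  # Launch the Dataflow job, then record its job ID here.
  dataflow_job_id = '[dataflow-id]'
  dataflow_resources = GcpResources()
  dr = dataflow_resources.resources.add()
  dr.resource_type = 'DataflowJob'
  dr.resource_uri = f'https://dataflow.googleapis.com/v1b3/projects/{project}/locations/{location}/jobs/{dataflow_job_id}'
  gcp_resources = MessageToJson(dataflow_resources)
  # The return annotation is a NamedTuple, so return one value per declared field.
  return (gcp_resources,)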

Supported `resource_type` values

You can set resource_type to any string, but only the following types get a link in the Google Cloud console:

BatchPredictionJob
BigQueryJob
CustomJob
DataflowJob
HyperparameterTuningJob
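Because any string is accepted but only the types listed above get a console link, a component might want to flag a value that is not on the list (for example, a misspelling) before writing its output. The following is a minimal sketch, assuming you maintain your own copy of the list; the set mirrors this page and could drift from what the console supports in later releases, and has_console_link is a hypothetical helper.

```python
# Resource types that, according to this page, get a link in the Google Cloud console.
CONSOLE_LINKED_RESOURCE_TYPES = frozenset({
    "BatchPredictionJob",
    "BigQueryJob",
    "CustomJob",
    "DataflowJob",
    "HyperparameterTuningJob",
})


def has_console_link(resource_type: str) -> bool:
    """Hypothetical check: True if resource_type is one of the console-linked types.

    The comparison is exact and case-sensitive, so "DataFlowJob" returns False.
    """
    return resource_type in CONSOLE_LINKED_RESOURCE_TYPES


if __name__ == "__main__":
    for candidate in ("DataflowJob", "DataFlowJob", "MyCustomResource"):
        print(candidate, has_console_link(candidate))
```

A component could log a warning when this returns False rather than failing, since other resource_type strings are still accepted.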

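Write a component to cancel the underlying resources

When a pipeline job is canceled, the default behavior is for the underlying Google Cloud resources to keep running; they are not canceled automatically. To change this behavior, attach a SIGTERM handler to the pipeline job. A good place to do this is just before the polling loop of a job that might run for a long time.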
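The pattern described above can be sketched with Python's standard signal module: register a SIGTERM handler right before the polling loop, have it request cancellation of the underlying resource, and restore the previous handler afterwards. This is a minimal illustration, not the Google Cloud Pipeline Components implementation; poll_until_done, is_done, and cancel_resource are hypothetical names, and in a real component cancel_resource would call the relevant Google Cloud API's cancel method. Note that signal.signal can only be called from the main thread.

```python
import signal
import time
from typing import Callable


def poll_until_done(
    is_done: Callable[[], bool],
    cancel_resource: Callable[[], None],
    poll_interval_s: float = 30.0,
) -> bool:
    """Polls a long-running resource; on SIGTERM, cancels it and stops polling.

    Returns True if the resource finished, False if polling stopped because of SIGTERM.
    """
    cancelled = False

    def _handle_sigterm(signum, frame):
        nonlocal cancelled
        if not cancelled:
            cancel_resource()  # hypothetical: e.g. call the job's cancel API
        cancelled = True

    # Attach the handler just before the polling loop.
    previous_handler = signal.signal(signal.SIGTERM, _handle_sigterm)
    try:
        while not cancelled:
            if is_done():
                return True
            # If SIGTERM arrives during sleep, the handler runs and the loop
            # exits once the current sleep completes.
            time.sleep(poll_interval_s)
        return False
    finally:
        # Restore the earlier handler (None means it was not set from Python).
        signal.signal(
            signal.SIGTERM,
            previous_handler if previous_handler is not None else signal.SIG_DFL,
        )
```

Guarding cancel_resource with the cancelled flag keeps a repeated SIGTERM from issuing a second cancel request for the same resource.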

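Cancellation has been implemented in several Google Cloud Pipeline Components, including:

Batch prediction jobs
BigQuery ML jobs
Custom jobs
Dataproc Serverless batch jobs
Hyperparameter tuning jobs

For more information, including sample code that shows how to attach a SIGTERM handler, see the following GitHub links: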

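Consider the following when you implement your SIGTERM handler:

Cancellation propagation takes effect only after the component has been running for a few minutes. This is typically because background startup tasks need to finish before the Python signal handlers are invoked.
Some Google Cloud resources might not implement cancellation. For example, creating or deleting a Vertex AI endpoint or model can create a long-running operation that accepts a cancellation request through its REST API but doesn't actually implement the cancel operation.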
