When you perform custom training, you must specify what machine learning (ML) code you want Vertex AI to run. To do this, configure training container settings for either a custom container or a Python training application that runs on a prebuilt container.
To determine whether you want to use a custom container or a prebuilt container, read Training code requirements.
This document describes the fields of the Vertex AI API that you must specify in either of the preceding cases.
Where to specify container settings
Specify configuration details within a WorkerPoolSpec. Depending on how you perform custom training, put this WorkerPoolSpec in one of the following API fields:
- If you are creating a CustomJob resource, specify the WorkerPoolSpec in CustomJob.jobSpec.workerPoolSpecs. If you are using the Google Cloud CLI, then you can use the --worker-pool-spec flag or the --config flag on the gcloud ai custom-jobs create command to specify worker pool options. Learn more about creating a CustomJob.
- If you are creating a HyperparameterTuningJob resource, specify the WorkerPoolSpec in HyperparameterTuningJob.trialJobSpec.workerPoolSpecs. If you are using the gcloud CLI, then you can use the --config flag on the gcloud ai hp-tuning-jobs create command to specify worker pool options. Learn more about creating a HyperparameterTuningJob.
- If you are creating a TrainingPipeline resource without hyperparameter tuning, specify the WorkerPoolSpec in TrainingPipeline.trainingTaskInputs.workerPoolSpecs. Learn more about creating a custom TrainingPipeline.
- If you are creating a TrainingPipeline with hyperparameter tuning, specify the WorkerPoolSpec in TrainingPipeline.trainingTaskInputs.trialJobSpec.workerPoolSpecs.
If you are performing distributed training, you can use different settings for each worker pool.
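As a rough sketch of the shape these fields take, the workerPoolSpecs field is a list, so a distributed job carries one entry per worker pool and each entry can use different settings. The display name, project path, image URI, machine types, and replica counts below are hypothetical placeholders, not values from this guide:

```python
# Illustrative sketch (not an official sample): the workerPoolSpecs portion of
# a CustomJob request body as a plain Python structure. All names and URIs are
# hypothetical placeholders.
custom_job = {
    "displayName": "example-distributed-job",  # hypothetical job name
    "jobSpec": {
        "workerPoolSpecs": [
            {  # first pool: the primary replica
                "machineSpec": {"machineType": "n1-standard-4"},
                "replicaCount": 1,
                "containerSpec": {
                    "imageUri": "us-docker.pkg.dev/my-project/my-repo/trainer:latest"
                },
            },
            {  # second pool: additional workers, with their own settings
                "machineSpec": {"machineType": "n1-standard-8"},
                "replicaCount": 3,
                "containerSpec": {
                    "imageUri": "us-docker.pkg.dev/my-project/my-repo/trainer:latest"
                },
            },
        ]
    },
}

# Each pool can differ in machine type, replica count, and container settings.
for pool in custom_job["jobSpec"]["workerPoolSpecs"]:
    print(pool["machineSpec"]["machineType"], pool["replicaCount"])
```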
Configure container settings
Depending on whether you are using a prebuilt container or a custom container, you must specify different fields within the WorkerPoolSpec. Select the tab for your scenario:
Prebuilt container
- Select a prebuilt container that supports the ML framework you plan to use for training. Specify one of the container image's URIs in the pythonPackageSpec.executorImageUri field.
- Specify the Cloud Storage URIs of your Python training application in the pythonPackageSpec.packageUris field.
- Specify your training application's entry point module in the pythonPackageSpec.pythonModule field.
- Optionally, specify a list of command-line arguments to pass to your training application's entry point module in the pythonPackageSpec.args field.
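Taken together, the steps above can be sketched as a single WorkerPoolSpec. This is an illustrative assembly, not an official sample: the executor image tag, bucket path, module name, and arguments are all hypothetical placeholders you would replace with your own values:

```python
# Illustrative sketch: a WorkerPoolSpec that runs a Python training
# application on a prebuilt container. All URIs, the module name, and the
# arguments are hypothetical placeholders.
worker_pool_spec = {
    "machineSpec": {"machineType": "n1-standard-4"},
    "replicaCount": 1,
    "pythonPackageSpec": {
        # URI of a prebuilt container image for your ML framework (hypothetical tag)
        "executorImageUri": "us-docker.pkg.dev/vertex-ai/training/tf-cpu.2-11:latest",
        # Cloud Storage URIs of your packaged Python training application
        "packageUris": ["gs://my-bucket/trainer-0.1.tar.gz"],
        # Entry point module that Vertex AI runs when training starts
        "pythonModule": "trainer.task",
        # Optional command-line arguments passed to the entry point module
        "args": ["--epochs=10", "--batch-size=32"],
    },
}
```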
The following examples highlight where you specify these container settings when you create a CustomJob:
Console
In the Google Cloud console, you can't create a CustomJob directly. However, you can create a TrainingPipeline that creates a CustomJob. When you create a TrainingPipeline in the Google Cloud console, you can specify prebuilt container settings in certain fields on the Training container step:

- pythonPackageSpec.executorImageUri: Use the Model framework and Model framework version drop-down lists.
- pythonPackageSpec.packageUris: Use the Package location field.
- pythonPackageSpec.pythonModule: Use the Python module field.
- pythonPackageSpec.args: Use the Arguments field.
gcloud
gcloud ai custom-jobs create \
--region=LOCATION \
--display-name=JOB_NAME \
--python-package-uris=PYTHON_PACKAGE_URIS \
--worker-pool-spec=machine-type=MACHINE_TYPE,replica-count=REPLICA_COUNT,executor-image-uri=PYTHON_PACKAGE_EXECUTOR_IMAGE_URI,python-module=PYTHON_MODULE
For more context, read the guide to creating a CustomJob.
Custom container
- Specify the Artifact Registry or Docker Hub URI of your custom container in the containerSpec.imageUri field.
- Optionally, if you want to override the ENTRYPOINT or CMD instructions in your container, specify the containerSpec.command or containerSpec.args fields. These fields affect how your container runs according to the following rules:
  - If you specify neither field: Your container runs according to its ENTRYPOINT instruction and CMD instruction (if it exists). Refer to the Docker documentation about how CMD and ENTRYPOINT interact.
  - If you specify only containerSpec.command: Your container runs with the value of containerSpec.command replacing its ENTRYPOINT instruction. If the container has a CMD instruction, it is ignored.
  - If you specify only containerSpec.args: Your container runs according to its ENTRYPOINT instruction, with the value of containerSpec.args replacing its CMD instruction.
  - If you specify both fields: Your container runs with containerSpec.command replacing its ENTRYPOINT instruction and containerSpec.args replacing its CMD instruction.
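The four override rules above can be sketched as four variants of the same containerSpec. This is an illustrative sketch, not an official sample; the image URI, command, and arguments are hypothetical placeholders:

```python
# Illustrative sketch of how containerSpec.command and containerSpec.args map
# onto Docker's ENTRYPOINT and CMD. The image URI and values are hypothetical.
IMAGE = "us-docker.pkg.dev/my-project/my-repo/trainer:latest"

# Neither field: the image's own ENTRYPOINT and CMD run unchanged.
spec_default = {"imageUri": IMAGE}

# Only command: it replaces ENTRYPOINT; the image's CMD is ignored.
spec_command = {"imageUri": IMAGE, "command": ["python", "train.py"]}

# Only args: the image's ENTRYPOINT runs, with these args replacing CMD.
spec_args = {"imageUri": IMAGE, "args": ["--epochs=10"]}

# Both fields: command replaces ENTRYPOINT and args replaces CMD.
spec_both = {
    "imageUri": IMAGE,
    "command": ["python", "train.py"],
    "args": ["--epochs=10"],
}
```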
The following example highlights where you can specify some of these container settings when you create a CustomJob:
Console
In the Google Cloud console, you can't create a CustomJob directly. However, you can create a TrainingPipeline that creates a CustomJob. When you create a TrainingPipeline in the Google Cloud console, you can specify custom container settings in certain fields on the Training container step:

- containerSpec.imageUri: Use the Container image field.
- containerSpec.command: This API field is not configurable in the Google Cloud console.
- containerSpec.args: Use the Arguments field.
gcloud
gcloud ai custom-jobs create \
--region=LOCATION \
--display-name=JOB_NAME \
--worker-pool-spec=machine-type=MACHINE_TYPE,replica-count=REPLICA_COUNT,container-image-uri=CUSTOM_CONTAINER_IMAGE_URI
Java
Before trying this sample, follow the Java setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Java API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Node.js
Before trying this sample, follow the Node.js setup instructions in the Vertex AI quickstart using client libraries. For more information, see the Vertex AI Node.js API reference documentation.
To authenticate to Vertex AI, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
For more context, read the guide to creating a CustomJob.
What's next
- Learn how to perform custom training by creating a CustomJob.