- NAME
-
- gcloud ai endpoints deploy-model - deploy a model to an existing Vertex AI endpoint
- SYNOPSIS
-
-
gcloud ai endpoints deploy-model (ENDPOINT : --region=REGION) --display-name=DISPLAY_NAME --model=MODEL [--accelerator=[count=COUNT],[type=TYPE]] [--autoscaling-metric-specs=[METRIC-NAME=TARGET,…]] [--deployed-model-id=DEPLOYED_MODEL_ID] [--disable-container-logging] [--enable-access-logging] [--machine-type=MACHINE_TYPE] [--max-replica-count=MAX_REPLICA_COUNT] [--min-replica-count=MIN_REPLICA_COUNT] [--reservation-affinity=[key=KEY],[reservation-affinity-type=RESERVATION-AFFINITY-TYPE],[values=VALUES]] [--service-account=SERVICE_ACCOUNT] [--spot] [--traffic-split=[DEPLOYED_MODEL_ID=VALUE,…]] [GCLOUD_WIDE_FLAG …]
-
- EXAMPLES
-
To deploy a model 456 to an endpoint 123 under project example in region us-central1, run:
gcloud ai endpoints deploy-model 123 --project=example --region=us-central1 --model=456 --display-name=my_deployed_model
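The example above can be parameterized for scripting. The following is a minimal sketch, not part of the gcloud reference: `deploy_model_cmd` is a hypothetical helper that only prints a command built from the flags documented on this page, so the result can be reviewed before it is run. It assumes none of the values contain whitespace.

```shell
# Hypothetical helper: print (do not run) a deploy-model command.
# Arguments: ENDPOINT PROJECT REGION MODEL DISPLAY_NAME
deploy_model_cmd() {
  printf 'gcloud ai endpoints deploy-model %s --project=%s --region=%s --model=%s --display-name=%s\n' \
    "$1" "$2" "$3" "$4" "$5"
}

# Placeholder IDs taken from the example above.
deploy_model_cmd 123 example us-central1 456 my_deployed_model
```

With the placeholder values, the printed line matches the example command above.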
- POSITIONAL ARGUMENTS
-
-
Endpoint resource - The endpoint to deploy a model to. The arguments in this group can be used to specify the attributes of this resource. (NOTE) Some attributes are not given arguments in this group but can be set in other ways.
To set the project attribute:
- provide the argument endpoint on the command line with a fully specified name;
- provide the argument --project on the command line;
- set the property core/project.
This must be specified.
ENDPOINT
- ID of the endpoint or fully qualified identifier for the endpoint.
To set the name attribute:
- provide the argument endpoint on the command line.
This positional argument must be specified if any of the other arguments in this group are specified.
--region=REGION
- Cloud region for the endpoint.
To set the region attribute:
- provide the argument endpoint on the command line with a fully specified name;
- provide the argument --region on the command line;
- set the property ai/region;
- choose one from the prompted list of available regions.
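A fully specified endpoint name carries both the project and the region, so neither --project nor --region has to be passed separately. The sketch below assumes the Vertex AI resource name pattern projects/PROJECT/locations/REGION/endpoints/ENDPOINT. `endpoint_resource_name` is a hypothetical helper, the IDs are placeholders, and the command is printed rather than run.

```shell
# Hypothetical helper: build a fully specified endpoint name.
# Arguments: PROJECT REGION ENDPOINT
endpoint_resource_name() {
  printf 'projects/%s/locations/%s/endpoints/%s\n' "$1" "$2" "$3"
}

# Placeholder values from the EXAMPLES section.
ENDPOINT_NAME=$(endpoint_resource_name example us-central1 123)

# Printed, not executed: the project and region come from the endpoint name.
echo gcloud ai endpoints deploy-model "$ENDPOINT_NAME" \
  --model=456 --display-name=my_deployed_model
```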
- REQUIRED FLAGS
-
--display-name=DISPLAY_NAME
- Display name of the deployed model.
--model=MODEL
- ID of the uploaded model.
- OPTIONAL FLAGS
-
--accelerator=[count=COUNT],[type=TYPE]
- Manage the accelerator config for GPU serving. When deploying a model with Compute Engine Machine Types, a GPU accelerator may also be selected.
type
- The type of the accelerator. Choices are 'nvidia-a100-80gb', 'nvidia-h100-80gb', 'nvidia-h100-mega-80gb', 'nvidia-l4', 'nvidia-tesla-a100', 'nvidia-tesla-k80', 'nvidia-tesla-p100', 'nvidia-tesla-p4', 'nvidia-tesla-t4', 'nvidia-tesla-v100'.
count
- The number of accelerators to attach to each machine running the job. This is usually 1. If not specified, the default value is 1.
For example:
--accelerator=type=nvidia-tesla-k80,count=1
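A script can check the accelerator type against the choices listed above before calling gcloud. This is a sketch only: `accelerator_flag` and `ACCELERATOR_TYPES` are hypothetical names, and the list is copied from this page, so it may lag behind what the service accepts.

```shell
# Accelerator types listed under --accelerator on this page.
ACCELERATOR_TYPES='nvidia-a100-80gb nvidia-h100-80gb nvidia-h100-mega-80gb nvidia-l4 nvidia-tesla-a100 nvidia-tesla-k80 nvidia-tesla-p100 nvidia-tesla-p4 nvidia-tesla-t4 nvidia-tesla-v100'

# Hypothetical helper: print an --accelerator flag, or fail for an unlisted type.
# Arguments: TYPE [COUNT]  (COUNT defaults to 1, the documented default)
accelerator_flag() {
  accel_type=$1
  accel_count=${2:-1}
  case " $ACCELERATOR_TYPES " in
    *" $accel_type "*) ;;
    *) echo "unknown accelerator type: $accel_type" >&2; return 1 ;;
  esac
  printf '%s\n' "--accelerator=type=$accel_type,count=$accel_count"
}

accelerator_flag nvidia-tesla-t4
```

The printed flag can be appended to a deploy-model command together with a --machine-type that supports the chosen accelerator.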
--autoscaling-metric-specs=[METRIC-NAME=TARGET,…]
- Metric specifications that override a resource utilization metric's target value. At most one entry is allowed per metric.
METRIC-NAME
- Resource metric name. Choices are 'cpu-usage', 'gpu-duty-cycle'.
TARGET
- Target resource utilization in percentage (1% - 100%) for the given metric. If the value is set to 60, the target resource utilization is 60%.
For example:
--autoscaling-metric-specs=cpu-usage=70
--deployed-model-id=DEPLOYED_MODEL_ID
- User-specified ID of the deployed model.
--disable-container-logging
- For custom-trained Models and AutoML Tabular Models, the container of the deployed model instances sends stderr and stdout streams to Cloud Logging by default. These logs incur costs, which are subject to Cloud Logging pricing. To disable container logging, set this flag to true.
--enable-access-logging
- If true, online prediction access logs are sent to Cloud Logging. These logs are standard server access logs, containing information like timestamp and latency for each prediction request.
--machine-type=MACHINE_TYPE
- The machine resources to be used for each node of this deployment. For available machine types, see https://cloud.google.com/ai-platform-unified/docs/predictions/machine-types.
--max-replica-count=MAX_REPLICA_COUNT
- Maximum number of machine replicas for the deployment resources the model will be deployed on.
--min-replica-count=MIN_REPLICA_COUNT
- Minimum number of machine replicas for the deployment resources the model will be deployed on. If specified, the value must be equal to or larger than 1. If not specified and the uploaded models use dedicated resources, the default value is 1.
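The machine type, replica counts, and autoscaling target described above are typically set together. The sketch below checks the replica bounds first and then prints a command. `replica_bounds_ok` is a hypothetical helper. The requirement that the minimum is at least 1 comes from the flag description; the requirement that the maximum is not below the minimum is an assumption. The machine type n1-standard-4 and all IDs are placeholders.

```shell
# Hypothetical helper: succeed only if 1 <= MIN <= MAX.
# MIN >= 1 comes from the flag description; MIN <= MAX is an assumption.
# Arguments: MIN MAX
replica_bounds_ok() {
  case $1 in ''|*[!0-9]*) return 1 ;; esac   # MIN must be a non-negative integer
  case $2 in ''|*[!0-9]*) return 1 ;; esac   # MAX must be a non-negative integer
  [ "$1" -ge 1 ] && [ "$2" -ge "$1" ]
}

MIN_REPLICAS=1   # placeholder
MAX_REPLICAS=3   # placeholder

if replica_bounds_ok "$MIN_REPLICAS" "$MAX_REPLICAS"; then
  # Printed rather than executed; endpoint, model, and names are placeholders.
  echo gcloud ai endpoints deploy-model 123 --region=us-central1 \
    --model=456 --display-name=my_deployed_model \
    --machine-type=n1-standard-4 \
    --min-replica-count="$MIN_REPLICAS" --max-replica-count="$MAX_REPLICAS" \
    --autoscaling-metric-specs=cpu-usage=70
else
  echo "invalid replica bounds: min=$MIN_REPLICAS max=$MAX_REPLICAS" >&2
fi
```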
--reservation-affinity=[key=KEY],[reservation-affinity-type=RESERVATION-AFFINITY-TYPE],[values=VALUES]
- A ReservationAffinity can be used to configure a Vertex AI resource (e.g., a DeployedModel) to draw its Compute Engine resources from a Shared Reservation, or exclusively from on-demand capacity.
--service-account=SERVICE_ACCOUNT
- Service account that the deployed model's container runs as. Specify the email address of the service account. If this service account is not specified, the container runs as a service account that doesn't have access to the resource project.
--spot
- If true, schedule the deployment workload on Spot VMs.
--traffic-split=[DEPLOYED_MODEL_ID=VALUE,…]
- List of pairs of deployed model id and value to set as traffic split.
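A traffic split can be checked before it is passed to gcloud. This sketch assumes, as the Vertex AI API describes for an endpoint's traffic split, that each value is a percentage and that the values must add up to 100. `traffic_split_total` is a hypothetical helper. The ID 1111 is a placeholder for an already deployed model, and 0 is assumed to stand for the model being deployed.

```shell
# Hypothetical helper: print the sum of the VALUEs in ID=VALUE[,ID=VALUE...].
# Fails on a pair without '=' or on a value that is not a plain decimal integer.
traffic_split_total() {
  rest=$1
  total=0
  while [ -n "$rest" ]; do
    pair=${rest%%,*}                 # first ID=VALUE pair
    case $rest in
      *,*) rest=${rest#*,} ;;        # drop the first pair
      *)   rest= ;;                  # that was the last pair
    esac
    case $pair in *=*) ;; *) return 1 ;; esac
    value=${pair#*=}
    case $value in ''|*[!0-9]*|0?*) return 1 ;; esac
    total=$(( total + value ))
  done
  echo "$total"
}

SPLIT='0=10,1111=90'   # assumed: 0 = model being deployed, 1111 = placeholder ID

if [ "$(traffic_split_total "$SPLIT")" = 100 ]; then
  echo "--traffic-split=$SPLIT"
else
  echo "traffic split must add up to 100: $SPLIT" >&2
fi
```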
- GCLOUD WIDE FLAGS
-
These flags are available to all commands: --access-token-file, --account, --billing-project, --configuration, --flags-file, --flatten, --format, --help, --impersonate-service-account, --log-http, --project, --quiet, --trace-token, --user-output-enabled, --verbosity.
Run $ gcloud help for details.
- NOTES
-
These variants are also available:
gcloud alpha ai endpoints deploy-model
gcloud beta ai endpoints deploy-model
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.
Last updated 2025-01-07 UTC.