You must deploy a model to an endpoint before that model can be used to serve online predictions in Google Distributed Cloud (GDC) air-gapped. Deploying a model associates physical resources with the model so it can serve online predictions with low latency.
This page describes the steps you must follow to deploy a model to an endpoint for online predictions.
Before you begin
Before deploying a model, perform the following steps:
- Create and train a prediction model targeting one of the supported containers.
- If you don't have a project, work with your Platform Administrator (PA) to create one.
- Work with your Infrastructure Operator (IO) to create the Prediction user cluster. The IO creates the cluster for you, associates it with your project, and assigns the appropriate node pools within the cluster, considering the resources you need for online predictions.
- Create a storage bucket for your project.
- Create the Vertex AI Default Serving (`vai-default-serving-sa`) service identity within your project. For more information about how to create service identities, see Manage service identities.
- Grant the Project Bucket Object Viewer (`project-bucket-object-viewer`) role to the Vertex AI Default Serving (`vai-default-serving-sa`) service identity for the storage bucket you created. For more information about how to grant bucket access to service identities, see Grant bucket access.
Upload your model
You must upload your model to the storage bucket you created. For more information about how to upload objects to storage buckets, see Upload and download storage objects in projects.
If you use TensorFlow to train a model, export your model as a TensorFlow SavedModel directory.
There are several ways to export SavedModels from TensorFlow training code. The following list describes a few ways that work for various TensorFlow APIs:

- If you use Keras for training, use `tf.keras.Model.save` to export a SavedModel.
- If you use an Estimator for training, use `tf.estimator.Estimator.export_saved_model` to export a SavedModel.
- Otherwise, use `tf.saved_model.save` or `tf.compat.v1.saved_model.SavedModelBuilder`.

If you are not using Keras or an Estimator, make sure to use the `serve` tag and `serving_default` signature when you export your SavedModel so that Vertex AI can use your model artifacts to serve predictions. Keras and Estimator handle this task automatically. Learn more about specifying signatures during export.
To serve predictions using these artifacts, create a Model
with the prebuilt container for prediction
matching the version of TensorFlow that you used for training.
The path to your model in the storage bucket must have the following structure:

```
s3://BUCKET_NAME/MODEL_ID/MODEL_VERSION_ID
```

In the `MODEL_VERSION_ID` folder, you must have the following structure for your files:
- A PB (protocol buffer or protobuf) file.
- A `variables` folder with the following files:
  - A `variables.index` file.
  - One or more `variables.data` files, for example, `variables.data-00000-of-00001`.
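Before uploading, you can sanity-check this layout locally. The following Python sketch is a hypothetical helper (not part of any GDC tooling) that verifies a local copy of a `MODEL_VERSION_ID` folder matches the structure described above:

```python
import re
from pathlib import Path

def check_savedmodel_layout(version_dir: str) -> list:
    """Return a list of problems with a MODEL_VERSION_ID folder, or an empty
    list if it matches the expected layout: a .pb (SavedModel protobuf) file
    plus a variables folder containing an index file and data shards."""
    root = Path(version_dir)
    problems = []
    if not list(root.glob("*.pb")):
        problems.append("missing a .pb (SavedModel protobuf) file")
    variables = root / "variables"
    if not variables.is_dir():
        problems.append("missing the variables/ folder")
        return problems
    if not list(variables.glob("*.index")):
        problems.append("missing the variables index file")
    # Data shards look like variables.data-00000-of-00001.
    shard = re.compile(r".*\.data-\d{5}-of-\d{5}$")
    if not any(shard.match(p.name) for p in variables.iterdir()):
        problems.append("missing variables.data-*-of-* shard files")
    return problems
```

Running the helper against the folder you intend to upload catches a missing `variables` folder or index file before the model reaches the bucket.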
Create a resource pool
A `ResourcePool` custom resource (CR) lets you have fine-grained control over the behavior of your model. You can define settings such as the following:

- Autoscaling configurations
- Machine type, which defines CPU and memory requirements
- Accelerator options, for example, GPU resources

To create a `ResourcePool` CR, perform the following steps:
Create a YAML file defining the `ResourcePool` CR.

Sample YAML file without GPU accelerators (CPU-based models):

```yaml
apiVersion: prediction.aiplatform.gdc.goog/v1
kind: ResourcePool
metadata:
  name: RESOURCE_POOL_NAME
  namespace: PROJECT_NAMESPACE
spec:
  resourcePoolID: RESOURCE_POOL_NAME
  enableContainerLogging: false
  dedicatedResources:
    machineSpec:
      # The system adds computing overhead to the nodes for mandatory components.
      # Choose a machineType value that allocates fewer CPU and memory resources
      # than those used by the nodes in the Prediction user cluster.
      machineType: n2-highcpu-8-gdc
    autoscaling:
      minReplica: 2
      maxReplica: 10
```
Sample YAML file including GPU accelerators (GPU-based models):

```yaml
apiVersion: prediction.aiplatform.gdc.goog/v1
kind: ResourcePool
metadata:
  name: RESOURCE_POOL_NAME
  namespace: PROJECT_NAMESPACE
spec:
  resourcePoolID: RESOURCE_POOL_NAME
  enableContainerLogging: false
  dedicatedResources:
    machineSpec:
      # The system adds computing overhead to the nodes for mandatory components.
      # Choose a machineType value that allocates fewer CPU and memory resources
      # than those used by the nodes in the Prediction user cluster.
      machineType: a2-highgpu-1g-gdc
      acceleratorType: nvidia-a100-80gb
      # The accelerator count is a slice of the requested virtualized GPUs.
      # The value corresponds to one-seventh of 80 GB of GPUs for each count.
      acceleratorCount: 2
    autoscaling:
      minReplica: 2
      maxReplica: 10
```
Replace the following:

- `RESOURCE_POOL_NAME`: the name you want to give to the `ResourcePool` definition file.
- `PROJECT_NAMESPACE`: the name of the project namespace associated with the Prediction user cluster.
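The `acceleratorCount` comment in the GPU sample describes fractional GPUs: each count unit corresponds to one-seventh of an 80 GB GPU. As a rough sizing aid, this sketch (a hypothetical helper under that one-seventh assumption, not a GDC API) turns a count into an approximate memory figure:

```python
def estimated_gpu_memory_gb(accelerator_count: int, full_gpu_gb: float = 80.0) -> float:
    """Estimate GPU memory for an acceleratorCount value, assuming each
    count unit is one-seventh of a full GPU as noted in the sample manifest."""
    return accelerator_count * full_gpu_gb / 7

# The sample's acceleratorCount of 2 requests roughly 22.9 GB.
print(round(estimated_gpu_memory_gb(2), 1))
```

Use an estimate like this only to choose a count; the authoritative limits are whatever your Infrastructure Operator provisioned in the Prediction user cluster.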
Modify the values in the `dedicatedResources` fields according to your resource needs and what is available in your Prediction user cluster.

Apply the `ResourcePool` definition file to the Prediction user cluster:

```
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f RESOURCE_POOL_NAME.yaml
```
Replace the following:

- `PREDICTION_CLUSTER_KUBECONFIG`: the path to the kubeconfig file of the Prediction user cluster.
- `RESOURCE_POOL_NAME`: the name of the `ResourcePool` definition file.
When you create the `ResourcePool` CR, the Kubernetes API and the webhook service validate the YAML file and report success or failure. The Prediction operator provisions and reserves your resources from the resource pool when you deploy your models to an endpoint.
Deploy your model to an endpoint
If you have a resource pool, you can deploy more than one model to an endpoint, and you can deploy a model to more than one endpoint. Deploy a prediction model targeting supported containers. Depending on whether the endpoint already exists, choose one of the following two methods:
Deploy a model to a new endpoint
To deploy a prediction model to a new endpoint, perform the following steps:
Create a YAML file defining a `DeployedModel` CR.

The following YAML file shows a sample configuration for a TensorFlow model:

```yaml
apiVersion: prediction.aiplatform.gdc.goog/v1
kind: DeployedModel
metadata:
  name: DEPLOYED_MODEL_NAME
  namespace: PROJECT_NAMESPACE
spec:
  # The endpoint path structure is endpoints/<endpoint-id>
  endpointPath: endpoints/PREDICTION_ENDPOINT
  modelSpec:
    # The artifactLocation field must be the s3 path to the folder that
    # contains the various model versions.
    # For example, s3://my-prediction-bucket/tensorflow
    artifactLocation: s3://PATH_TO_MODEL
    # The value in the id field must be unique to each model.
    id: img-detection-model
    modelDisplayName: my_img_detection_model
    # The model resource name structure is models/<model-id>/<model-version-id>
    modelResourceName: models/img-detection-model/1
    # The model version ID must match the name of the first folder in
    # the artifactLocation bucket, inside the 'tensorflow' folder.
    # For example, if the bucket path is
    # s3://my-prediction-bucket/tensorflow/1/,
    # then the value for the model version ID is "1".
    modelVersionID: "1"
    modelContainerSpec:
      args:
        - --model_config_file=/models/models.config
        - --rest_api_port=8080
        - --port=8500
        - --file_system_poll_wait_seconds=30
        - --model_config_file_poll_wait_seconds=30
      command:
        - /bin/tensorflow_model_server
      # The image URI field must contain one of the following values:
      # For CPU-based models: gcr.io/aiml/prediction/containers/tf2-cpu.2-6:latest
      # For GPU-based models: gcr.io/aiml/prediction/containers/tf2-gpu.2-6:latest
      imageURI: gcr.io/aiml/prediction/containers/tf2-gpu.2-6:latest
      ports:
        - 8080
      grpcPorts:
        - 8500
  resourcePoolRef:
    kind: ResourcePool
    name: RESOURCE_POOL_NAME
    namespace: PROJECT_NAMESPACE
```
Replace the following:

- `DEPLOYED_MODEL_NAME`: the name you want to give to the `DeployedModel` definition file.
- `PROJECT_NAMESPACE`: the name of the project namespace associated with the Prediction user cluster.
- `PREDICTION_ENDPOINT`: the name you want to give to the new endpoint, for example, `my-img-prediction-endpoint`.
- `PATH_TO_MODEL`: the path to your model in the storage bucket.
- `RESOURCE_POOL_NAME`: the name you gave to the `ResourcePool` definition file when you created a resource pool to host the model.
Modify the values in the remaining fields according to your prediction model.
Apply the `DeployedModel` definition file to the Prediction user cluster:

```
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f DEPLOYED_MODEL_NAME.yaml
```
Replace the following:

- `PREDICTION_CLUSTER_KUBECONFIG`: the path to the kubeconfig file of the Prediction user cluster.
- `DEPLOYED_MODEL_NAME`: the name of the `DeployedModel` definition file.
When you create the `DeployedModel` CR, the Kubernetes API and the webhook service validate the YAML file and report success or failure. The Prediction operator reconciles the `DeployedModel` CR and serves it in the Prediction user cluster.

Create a YAML file defining an `Endpoint` CR.

The following YAML file shows a sample configuration:
```yaml
apiVersion: aiplatform.gdc.goog/v1
kind: Endpoint
metadata:
  name: ENDPOINT_NAME
  namespace: PROJECT_NAMESPACE
spec:
  createDns: true
  id: PREDICTION_ENDPOINT
  destinations:
    - serviceRef:
        kind: DeployedModel
        name: DEPLOYED_MODEL_NAME
        namespace: PROJECT_NAMESPACE
      trafficPercentage: 50
      grpcPort: 8501
      httpPort: 8081
    - serviceRef:
        kind: DeployedModel
        name: DEPLOYED_MODEL_NAME_2
        namespace: PROJECT_NAMESPACE
      trafficPercentage: 50
      grpcPort: 8501
      httpPort: 8081
```
Replace the following:

- `ENDPOINT_NAME`: the name you want to give to the `Endpoint` definition file.
- `PROJECT_NAMESPACE`: the name of the project namespace associated with the Prediction user cluster.
- `PREDICTION_ENDPOINT`: the name of the new endpoint. You defined this name in the `DeployedModel` definition file.
- `DEPLOYED_MODEL_NAME`: the name you gave to the `DeployedModel` definition file.
You can have one or more `serviceRef` destinations. If you have a second `serviceRef` object, add it to the YAML file in the `destinations` field and replace `DEPLOYED_MODEL_NAME_2` with the name you gave to the second `DeployedModel` definition file you created. Keep adding or removing `serviceRef` objects as you need them, depending on the number of models you are deploying.

Set the `trafficPercentage` fields based on how you want to split traffic between the models on this endpoint. Modify the values in the remaining fields according to your endpoint configurations.

Apply the `Endpoint` definition file to the Prediction user cluster:

```
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f ENDPOINT_NAME.yaml
```
Replace `ENDPOINT_NAME` with the name of the `Endpoint` definition file.
To get the endpoint URL path for the prediction model, run the following command:
```
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get endpoint PREDICTION_ENDPOINT -n PROJECT_NAMESPACE -o jsonpath='{.status.endpointFQDN}'
```
Replace the following:

- `PREDICTION_CLUSTER_KUBECONFIG`: the path to the kubeconfig file of the Prediction user cluster.
- `PREDICTION_ENDPOINT`: the name of the new endpoint.
- `PROJECT_NAMESPACE`: the name of the prediction project namespace.
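After you retrieve the FQDN, you can call the endpoint. Because the sample `modelContainerSpec` runs `tensorflow_model_server`, whose REST surface uses `/v1/models/<model-id>:predict` paths, the following sketch builds such a request under that assumption. The FQDN and instance values are placeholders, and the exact path your endpoint exposes may differ:

```python
import json

def build_predict_request(endpoint_fqdn: str, model_id: str, instances: list) -> tuple:
    """Build the URL and JSON body for a TensorFlow Serving-style predict call."""
    url = f"https://{endpoint_fqdn}/v1/models/{model_id}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return url, body

# Placeholder FQDN; use the value returned by the kubectl command above.
url, body = build_predict_request(
    "my-img-prediction-endpoint.example.gdc.goog",
    "img-detection-model",
    [[0.0, 1.0, 2.0]],
)
```

You can then POST `body` to `url` with any HTTP client, for example `urllib.request` or `curl`.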
Deploy a model to an existing endpoint
You can only deploy a model to an existing endpoint if you previously deployed another model to that endpoint when it was new. The system requires this earlier deployment to create the endpoint.
To deploy a prediction model to an existing endpoint, perform the following steps:
Create a YAML file defining a `DeployedModel` CR.

The following YAML file shows a sample configuration:

```yaml
apiVersion: prediction.aiplatform.gdc.goog/v1
kind: DeployedModel
metadata:
  name: DEPLOYED_MODEL_NAME
  namespace: PROJECT_NAMESPACE
spec:
  # The endpoint path structure is endpoints/<endpoint-id>
  endpointPath: endpoints/PREDICTION_ENDPOINT
  modelSpec:
    # The artifactLocation field must be the s3 path to the folder that
    # contains the various model versions.
    # For example, s3://my-prediction-bucket/tensorflow
    artifactLocation: s3://PATH_TO_MODEL
    # The value in the id field must be unique to each model.
    id: img-detection-model-v2
    modelDisplayName: my_img_detection_model
    # The model resource name structure is models/<model-id>/<model-version-id>
    modelResourceName: models/img-detection-model/2
    # The model version ID must match the name of the first folder in
    # the artifactLocation bucket, inside the 'tensorflow' folder.
    # For example, if the bucket path is
    # s3://my-prediction-bucket/tensorflow/2/,
    # then the value for the model version ID is "2".
    modelVersionID: "2"
    modelContainerSpec:
      args:
        - --model_config_file=/models/models.config
        - --rest_api_port=8080
        - --port=8500
        - --file_system_poll_wait_seconds=30
        - --model_config_file_poll_wait_seconds=30
      command:
        - /bin/tensorflow_model_server
      # The image URI field must contain one of the following values:
      # For CPU-based models: gcr.io/aiml/prediction/containers/tf2-cpu.2-6:latest
      # For GPU-based models: gcr.io/aiml/prediction/containers/tf2-gpu.2-6:latest
      imageURI: gcr.io/aiml/prediction/containers/tf2-gpu.2-6:latest
      ports:
        - 8080
      grpcPorts:
        - 8500
  resourcePoolRef:
    kind: ResourcePool
    name: RESOURCE_POOL_NAME
    namespace: PROJECT_NAMESPACE
```
Replace the following:

- `DEPLOYED_MODEL_NAME`: the name you want to give to the `DeployedModel` definition file.
- `PROJECT_NAMESPACE`: the name of the project namespace associated with the Prediction user cluster.
- `PREDICTION_ENDPOINT`: the name of the existing endpoint, for example, `my-img-prediction-endpoint`.
- `PATH_TO_MODEL`: the path to your model in the storage bucket.
- `RESOURCE_POOL_NAME`: the name you gave to the `ResourcePool` definition file when you created a resource pool to host the model.
Modify the values in the remaining fields according to your prediction model.
Apply the `DeployedModel` definition file to the Prediction user cluster:

```
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f DEPLOYED_MODEL_NAME.yaml
```
Replace the following:

- `PREDICTION_CLUSTER_KUBECONFIG`: the path to the kubeconfig file of the Prediction user cluster.
- `DEPLOYED_MODEL_NAME`: the name of the `DeployedModel` definition file.
When you create the `DeployedModel` CR, the Kubernetes API and the webhook service validate the YAML file and report success or failure. The Prediction operator reconciles the `DeployedModel` CR and serves it in the Prediction user cluster.

Show details of the existing `Endpoint` CR:

```
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG describe -f ENDPOINT_NAME.yaml
```
Replace `ENDPOINT_NAME` with the name of the `Endpoint` definition file.

Update the YAML file of the `Endpoint` CR definition by adding a new `serviceRef` object in the `destinations` field. In the new object, include the appropriate service name based on your newly created `DeployedModel` CR.

The following YAML file shows a sample configuration:
```yaml
apiVersion: aiplatform.gdc.goog/v1
kind: Endpoint
metadata:
  name: ENDPOINT_NAME
  namespace: PROJECT_NAMESPACE
spec:
  createDns: true
  id: PREDICTION_ENDPOINT
  destinations:
    - serviceRef:
        kind: DeployedModel
        name: DEPLOYED_MODEL_NAME
        namespace: PROJECT_NAMESPACE
      trafficPercentage: 40
      grpcPort: 8501
      httpPort: 8081
    - serviceRef:
        kind: DeployedModel
        name: DEPLOYED_MODEL_NAME_2
        namespace: PROJECT_NAMESPACE
      trafficPercentage: 50
      grpcPort: 8501
      httpPort: 8081
    - serviceRef:
        kind: DeployedModel
        name: DEPLOYED_MODEL_NAME_3
        namespace: PROJECT_NAMESPACE
      trafficPercentage: 10
      grpcPort: 8501
      httpPort: 8081
```
Replace the following:

- `ENDPOINT_NAME`: the name of the existing `Endpoint` definition file.
- `PROJECT_NAMESPACE`: the name of the project namespace associated with the Prediction user cluster.
- `PREDICTION_ENDPOINT`: the name of the existing endpoint. You referenced this name in the `DeployedModel` definition file.
- `DEPLOYED_MODEL_NAME`: the name of a previously created `DeployedModel` definition file.
- `DEPLOYED_MODEL_NAME_2`: the name you gave to the newly created `DeployedModel` definition file.
You can have one or more `serviceRef` destinations. If you have a third `serviceRef` object, add it to the YAML file in the `destinations` field and replace `DEPLOYED_MODEL_NAME_3` with the name you gave to the third `DeployedModel` definition file you created. Keep adding or removing `serviceRef` objects as you need them, depending on the number of models you are deploying.

Set the `trafficPercentage` fields based on how you want to split traffic between the models of this endpoint. Modify the values in the remaining fields according to your endpoint configurations.

Apply the `Endpoint` definition file to the Prediction user cluster:

```
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f ENDPOINT_NAME.yaml
```
Replace `ENDPOINT_NAME` with the name of the `Endpoint` definition file.
To get the endpoint URL path for the prediction model, run the following command:
```
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get endpoint PREDICTION_ENDPOINT -n PROJECT_NAMESPACE -o jsonpath='{.status.endpointFQDN}'
```
Replace the following:

- `PREDICTION_CLUSTER_KUBECONFIG`: the path to the kubeconfig file of the Prediction user cluster.
- `PREDICTION_ENDPOINT`: the name of the endpoint.
- `PROJECT_NAMESPACE`: the name of the prediction project namespace.
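The traffic splits in the samples on this page (50/50 and 40/50/10) always sum to 100. Assuming the platform expects the `trafficPercentage` values of an endpoint's destinations to cover all traffic, a quick pre-apply check might look like the following sketch (a hypothetical helper, not part of GDC tooling):

```python
def validate_traffic_split(destinations: list) -> int:
    """Return the total trafficPercentage across destinations, raising
    ValueError if the values do not sum to 100."""
    total = sum(d["trafficPercentage"] for d in destinations)
    if total != 100:
        raise ValueError(f"trafficPercentage values sum to {total}, expected 100")
    return total

# Mirrors the 40/50/10 sample split above.
validate_traffic_split([
    {"name": "model-a", "trafficPercentage": 40},
    {"name": "model-b", "trafficPercentage": 50},
    {"name": "model-c", "trafficPercentage": 10},
])
```

Running a check like this before `kubectl apply` catches a split that silently drops or double-counts traffic when you add or remove `serviceRef` objects.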