You must deploy a model to an endpoint before that model can be used to serve online predictions in Google Distributed Cloud (GDC) air-gapped. Deploying a model associates physical resources with the model so it can serve online predictions with low latency.
This page describes the steps you must follow to deploy a model to an endpoint for online predictions.
Before you begin
Before deploying your model to an endpoint, export your model artifacts for prediction and ensure you meet all the prerequisites from that page.
Create a resource pool
A ResourcePool custom resource (CR) lets you have fine-grained control over the behavior of your model. You can define settings such as the following:
- Autoscaling configurations
- Machine type, which defines CPU and memory requirements
- Accelerator options, for example, GPU resources
The machine type is essential for the node pool specification request you send to the Infrastructure Operator to create the Prediction user cluster.

For the resource pool of a deployed model, the accelerator count and type determine GPU usage, while the machine type dictates only the requested CPU and memory resources. Consequently, when you include GPU accelerators in the ResourcePool specification, the machineType field controls the CPU and memory requirements for the model, the acceleratorType field controls the GPU model, and the acceleratorCount field controls the number of GPU slices.
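As an illustration, using the values from the GPU sample later in this section, the three fields combine in a machineSpec as follows (an illustrative fragment only, not a complete CR):

```yaml
dedicatedResources:
  machineSpec:
    machineType: a2-highgpu-1g-gdc    # CPU and memory requested for the model
    acceleratorType: nvidia-a100-80gb # GPU model to attach
    acceleratorCount: 2               # number of GPU slices to reserve
```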
To create a ResourcePool CR, perform the following steps:
1. Create a YAML file defining the ResourcePool CR.

Sample YAML file without GPU accelerators (CPU-based models):
```yaml
apiVersion: prediction.aiplatform.gdc.goog/v1
kind: ResourcePool
metadata:
  name: RESOURCE_POOL_NAME
  namespace: PROJECT_NAMESPACE
spec:
  resourcePoolID: RESOURCE_POOL_NAME
  enableContainerLogging: false
  dedicatedResources:
    machineSpec:
      # The system adds computing overhead to the nodes for mandatory components.
      # Choose a machineType value that allocates fewer CPU and memory resources
      # than those used by the nodes in the Prediction user cluster.
      machineType: n2-highcpu-8-gdc
    autoscaling:
      minReplica: 2
      maxReplica: 10
```
Sample YAML file including GPU accelerators (GPU-based models):
```yaml
apiVersion: prediction.aiplatform.gdc.goog/v1
kind: ResourcePool
metadata:
  name: RESOURCE_POOL_NAME
  namespace: PROJECT_NAMESPACE
spec:
  resourcePoolID: RESOURCE_POOL_NAME
  enableContainerLogging: false
  dedicatedResources:
    machineSpec:
      # The system adds computing overhead to the nodes for mandatory components.
      # Choose a machineType value that allocates fewer CPU and memory resources
      # than those used by the nodes in the Prediction user cluster.
      machineType: a2-highgpu-1g-gdc
      acceleratorType: nvidia-a100-80gb
      # The accelerator count is a slice of the requested virtualized GPUs.
      # Each count corresponds to one-seventh of an 80 GB GPU.
      acceleratorCount: 2
    autoscaling:
      minReplica: 2
      maxReplica: 10
```
Replace the following:
- RESOURCE_POOL_NAME: the name you want to give to the ResourcePool definition file.
- PROJECT_NAMESPACE: the name of the project namespace associated with the Prediction user cluster.
Modify the values in the dedicatedResources fields according to your resource needs and what is available in your Prediction user cluster.

2. Apply the ResourcePool definition file to the Prediction user cluster:

```sh
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f RESOURCE_POOL_NAME.yaml
```
Replace the following:
- PREDICTION_CLUSTER_KUBECONFIG: the path to the kubeconfig file of the Prediction user cluster.
- RESOURCE_POOL_NAME: the name of the ResourcePool definition file.
When you create the ResourcePool CR, the Kubernetes API and the webhook service validate the YAML file and report success or failure. The Prediction operator provisions and reserves your resources from the resource pool when you deploy your models to an endpoint.
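To confirm that the cluster accepted the CR, you can query it with kubectl. Note that the resource name resourcepool is an assumption inferred from the CR kind, mirroring the get endpoint command used later on this page:

```sh
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get resourcepool RESOURCE_POOL_NAME -n PROJECT_NAMESPACE
```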
Deploy your model to an endpoint
After you create a resource pool, you can deploy more than one model to an endpoint, and you can deploy a model to more than one endpoint. Deploy a prediction model targeting one of the supported containers. Depending on whether the endpoint already exists, choose one of the following two methods:
Deploy a model to a new endpoint
To deploy a prediction model to a new endpoint, perform the following steps:
1. Create a YAML file defining a DeployedModel CR:

TensorFlow

The following YAML file shows a sample configuration for a TensorFlow model:
```yaml
apiVersion: prediction.aiplatform.gdc.goog/v1
kind: DeployedModel
metadata:
  name: DEPLOYED_MODEL_NAME
  namespace: PROJECT_NAMESPACE
spec:
  # The endpoint path structure is endpoints/<endpoint-id>
  endpointPath: endpoints/PREDICTION_ENDPOINT
  modelSpec:
    # The artifactLocation field must be the s3 path to the folder that
    # contains the various model versions.
    # For example, s3://my-prediction-bucket/tensorflow
    artifactLocation: s3://PATH_TO_MODEL
    # The value in the id field must be unique to each model.
    id: img-detection-model
    modelDisplayName: my_img_detection_model
    # The model resource name structure is models/<model-id>/<model-version-id>
    modelResourceName: models/img-detection-model/1
    # The model version ID must match the name of the first folder in
    # the artifactLocation bucket, inside the 'tensorflow' folder.
    # For example, if the bucket path is
    # s3://my-prediction-bucket/tensorflow/1/,
    # then the value for the model version ID is "1".
    modelVersionID: "1"
    modelContainerSpec:
      args:
      - --model_config_file=/models/models.config
      - --rest_api_port=8080
      - --port=8500
      - --file_system_poll_wait_seconds=30
      - --model_config_file_poll_wait_seconds=30
      command:
      - /bin/tensorflow_model_server
      # The image URI field must contain one of the following values:
      # For CPU-based models: gcr.io/aiml/prediction/containers/tf2-cpu.2-14:latest
      # For GPU-based models: gcr.io/aiml/prediction/containers/tf2-gpu.2-14:latest
      imageURI: gcr.io/aiml/prediction/containers/tf2-gpu.2-14:latest
      ports:
      - 8080
      grpcPorts:
      - 8500
  resourcePoolRef:
    kind: ResourcePool
    name: RESOURCE_POOL_NAME
    namespace: PROJECT_NAMESPACE
```
Replace the following:
- DEPLOYED_MODEL_NAME: the name you want to give to the DeployedModel definition file.
- PROJECT_NAMESPACE: the name of the project namespace associated with the Prediction user cluster.
- PREDICTION_ENDPOINT: the name you want to give to the new endpoint, for example, my-img-prediction-endpoint.
- PATH_TO_MODEL: the path to your model in the storage bucket.
- RESOURCE_POOL_NAME: the name you gave to the ResourcePool definition file when you created a resource pool to host the model.
Modify the values in the remaining fields according to your prediction model.
PyTorch
The following YAML file shows a sample configuration for a PyTorch model:
```yaml
apiVersion: prediction.aiplatform.gdc.goog/v1
kind: DeployedModel
metadata:
  name: DEPLOYED_MODEL_NAME
  namespace: PROJECT_NAMESPACE
spec:
  endpointPath: PREDICTION_ENDPOINT
  endpointInfo:
    id: PREDICTION_ENDPOINT
  modelSpec:
    # The artifactLocation field must be the s3 path to the folder that
    # contains the various model versions.
    # For example, s3://my-prediction-bucket/pytorch
    artifactLocation: s3://PATH_TO_MODEL
    # The value in the id field must be unique to each model.
    id: "pytorch"
    modelDisplayName: my-pytorch-model
    # The model resource name structure is models/<model-id>/<model-version-id>
    modelResourceName: models/pytorch/1
    modelVersionID: "1"
    modelContainerSpec:
      # The image URI field must contain one of the following values:
      # For CPU-based models: gcr.io/aiml/prediction/containers/pytorch-cpu.2-1:latest
      # For GPU-based models: gcr.io/aiml/prediction/containers/pytorch-gpu.2-1:latest
      imageURI: gcr.io/aiml/prediction/containers/pytorch-cpu.2-1:latest
      ports:
      - 8080
      grpcPorts:
      - 7070
  sharesResourcePool: false
  resourcePoolRef:
    kind: ResourcePool
    name: RESOURCE_POOL_NAME
    namespace: PROJECT_NAMESPACE
```
Replace the following:
- DEPLOYED_MODEL_NAME: the name you want to give to the DeployedModel definition file.
- PROJECT_NAMESPACE: the name of the project namespace associated with the Prediction user cluster.
- PREDICTION_ENDPOINT: the name you want to give to the new endpoint, for example, my-img-prediction-endpoint.
- PATH_TO_MODEL: the path to your model in the storage bucket.
- RESOURCE_POOL_NAME: the name you gave to the ResourcePool definition file when you created a resource pool to host the model.
Modify the values in the remaining fields according to your prediction model.
2. Apply the DeployedModel definition file to the Prediction user cluster:

```sh
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f DEPLOYED_MODEL_NAME.yaml
```
Replace the following:
- PREDICTION_CLUSTER_KUBECONFIG: the path to the kubeconfig file of the Prediction user cluster.
- DEPLOYED_MODEL_NAME: the name of the DeployedModel definition file.
When you create the DeployedModel CR, the Kubernetes API and the webhook service validate the YAML file and report success or failure. The Prediction operator reconciles the DeployedModel CR and serves it in the Prediction user cluster.
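Optionally, verify that the CR is present before creating the endpoint. As with the resource pool, the resource name deployedmodel is an assumption inferred from the CR kind:

```sh
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get deployedmodel DEPLOYED_MODEL_NAME -n PROJECT_NAMESPACE
```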
3. Create a YAML file defining an Endpoint CR.

The following YAML file shows a sample configuration:
```yaml
apiVersion: aiplatform.gdc.goog/v1
kind: Endpoint
metadata:
  name: ENDPOINT_NAME
  namespace: PROJECT_NAMESPACE
spec:
  createDns: true
  id: PREDICTION_ENDPOINT
  destinations:
  - serviceRef:
      kind: DeployedModel
      name: DEPLOYED_MODEL_NAME
      namespace: PROJECT_NAMESPACE
    trafficPercentage: 50
    grpcPort: 8501
    httpPort: 8081
  - serviceRef:
      kind: DeployedModel
      name: DEPLOYED_MODEL_NAME_2
      namespace: PROJECT_NAMESPACE
    trafficPercentage: 50
    grpcPort: 8501
    httpPort: 8081
```
Replace the following:
- ENDPOINT_NAME: the name you want to give to the Endpoint definition file.
- PROJECT_NAMESPACE: the name of the project namespace associated with the Prediction user cluster.
- PREDICTION_ENDPOINT: the name of the new endpoint. You defined this name in the DeployedModel definition file.
- DEPLOYED_MODEL_NAME: the name you gave to the DeployedModel definition file.
You can have one or more serviceRef destinations. If you have a second serviceRef object, add it to the YAML file in the destinations field and replace DEPLOYED_MODEL_NAME_2 with the name you gave to the second DeployedModel definition file you created. Keep adding or removing serviceRef objects as you need them, depending on the number of models you are deploying.

Set the trafficPercentage fields based on how you want to split traffic between the models on this endpoint; in the sample, the values across all destinations sum to 100. Modify the values in the remaining fields according to your endpoint configuration.

4. Apply the Endpoint definition file to the Prediction user cluster:

```sh
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f ENDPOINT_NAME.yaml
```
Replace ENDPOINT_NAME with the name of the Endpoint definition file.
To get the endpoint URL path for the prediction model, run the following command:
```sh
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get endpoint PREDICTION_ENDPOINT -n PROJECT_NAMESPACE -o jsonpath='{.status.endpointFQDN}'
```
Replace the following:
- PREDICTION_CLUSTER_KUBECONFIG: the path to the kubeconfig file of the Prediction user cluster.
- PREDICTION_ENDPOINT: the name of the new endpoint.
- PROJECT_NAMESPACE: the name of the prediction project namespace.
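With the FQDN, you can send prediction requests to the deployed model. The following is a minimal sketch based on the TensorFlow sample above; it assumes the endpoint serves the standard TensorFlow Serving REST API over HTTPS and that request.json contains instances in the format your model expects. The URL path follows TensorFlow Serving conventions and is an assumption here, not a confirmed GDC path:

```sh
curl -X POST "https://ENDPOINT_FQDN/v1/models/img-detection-model:predict" \
  -H "Content-Type: application/json" \
  -d @request.json
```

Replace ENDPOINT_FQDN with the value returned by the previous command.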
Deploy a model to an existing endpoint
You can only deploy a model to an existing endpoint if you previously deployed another model to that endpoint when it was new. The system requires this earlier deployment to create the endpoint.
To deploy a prediction model to an existing endpoint, perform the following steps:
1. Create a YAML file defining a DeployedModel CR.

The following YAML file shows a sample configuration:
```yaml
apiVersion: prediction.aiplatform.gdc.goog/v1
kind: DeployedModel
metadata:
  name: DEPLOYED_MODEL_NAME
  namespace: PROJECT_NAMESPACE
spec:
  # The endpoint path structure is endpoints/<endpoint-id>
  endpointPath: endpoints/PREDICTION_ENDPOINT
  modelSpec:
    # The artifactLocation field must be the s3 path to the folder that
    # contains the various model versions.
    # For example, s3://my-prediction-bucket/tensorflow
    artifactLocation: s3://PATH_TO_MODEL
    # The value in the id field must be unique to each model.
    id: img-detection-model-v2
    modelDisplayName: my_img_detection_model
    # The model resource name structure is models/<model-id>/<model-version-id>
    modelResourceName: models/img-detection-model/2
    # The model version ID must match the name of the first folder in
    # the artifactLocation bucket, inside the 'tensorflow' folder.
    # For example, if the bucket path is
    # s3://my-prediction-bucket/tensorflow/2/,
    # then the value for the model version ID is "2".
    modelVersionID: "2"
    modelContainerSpec:
      args:
      - --model_config_file=/models/models.config
      - --rest_api_port=8080
      - --port=8500
      - --file_system_poll_wait_seconds=30
      - --model_config_file_poll_wait_seconds=30
      command:
      - /bin/tensorflow_model_server
      # The image URI field must contain one of the following values:
      # For CPU-based models: gcr.io/aiml/prediction/containers/tf2-cpu.2-6:latest
      # For GPU-based models: gcr.io/aiml/prediction/containers/tf2-gpu.2-6:latest
      imageURI: gcr.io/aiml/prediction/containers/tf2-gpu.2-6:latest
      ports:
      - 8080
      grpcPorts:
      - 8500
  resourcePoolRef:
    kind: ResourcePool
    name: RESOURCE_POOL_NAME
    namespace: PROJECT_NAMESPACE
```
Replace the following:
- DEPLOYED_MODEL_NAME: the name you want to give to the DeployedModel definition file.
- PROJECT_NAMESPACE: the name of the project namespace associated with the Prediction user cluster.
- PREDICTION_ENDPOINT: the name of the existing endpoint, for example, my-img-prediction-endpoint.
- PATH_TO_MODEL: the path to your model in the storage bucket.
- RESOURCE_POOL_NAME: the name you gave to the ResourcePool definition file when you created a resource pool to host the model.
Modify the values in the remaining fields according to your prediction model.
2. Apply the DeployedModel definition file to the Prediction user cluster:

```sh
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f DEPLOYED_MODEL_NAME.yaml
```
Replace the following:
- PREDICTION_CLUSTER_KUBECONFIG: the path to the kubeconfig file of the Prediction user cluster.
- DEPLOYED_MODEL_NAME: the name of the DeployedModel definition file.
When you create the DeployedModel CR, the Kubernetes API and the webhook service validate the YAML file and report success or failure. The Prediction operator reconciles the DeployedModel CR and serves it in the Prediction user cluster.

3. Show details of the existing Endpoint CR:

```sh
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG describe -f ENDPOINT_NAME.yaml
```
Replace ENDPOINT_NAME with the name of the Endpoint definition file.

4. Update the YAML file of the Endpoint CR definition by adding a new serviceRef object in the destinations field. In the new object, include the appropriate service name based on your newly created DeployedModel CR.

The following YAML file shows a sample configuration:
```yaml
apiVersion: aiplatform.gdc.goog/v1
kind: Endpoint
metadata:
  name: ENDPOINT_NAME
  namespace: PROJECT_NAMESPACE
spec:
  createDns: true
  id: PREDICTION_ENDPOINT
  destinations:
  - serviceRef:
      kind: DeployedModel
      name: DEPLOYED_MODEL_NAME
      namespace: PROJECT_NAMESPACE
    trafficPercentage: 40
    grpcPort: 8501
    httpPort: 8081
  - serviceRef:
      kind: DeployedModel
      name: DEPLOYED_MODEL_NAME_2
      namespace: PROJECT_NAMESPACE
    trafficPercentage: 50
    grpcPort: 8501
    httpPort: 8081
  - serviceRef:
      kind: DeployedModel
      name: DEPLOYED_MODEL_NAME_3
      namespace: PROJECT_NAMESPACE
    trafficPercentage: 10
    grpcPort: 8501
    httpPort: 8081
```
Replace the following:
- ENDPOINT_NAME: the name of the existing Endpoint definition file.
- PROJECT_NAMESPACE: the name of the project namespace associated with the Prediction user cluster.
- PREDICTION_ENDPOINT: the name of the existing endpoint. You referenced this name in the DeployedModel definition file.
- DEPLOYED_MODEL_NAME: the name of a previously created DeployedModel definition file.
- DEPLOYED_MODEL_NAME_2: the name you gave to the newly created DeployedModel definition file.
You can have one or more serviceRef destinations. If you have a third serviceRef object, add it to the YAML file in the destinations field and replace DEPLOYED_MODEL_NAME_3 with the name you gave to the third DeployedModel definition file you created. Keep adding or removing serviceRef objects as you need them, depending on the number of models you are deploying.

Set the trafficPercentage fields based on how you want to split traffic between the models of this endpoint; in the sample, the values across all destinations sum to 100. Modify the values in the remaining fields according to your endpoint configuration.

5. Apply the Endpoint definition file to the Prediction user cluster:

```sh
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f ENDPOINT_NAME.yaml
```
Replace ENDPOINT_NAME with the name of the Endpoint definition file.
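To confirm that the new destination and the updated traffic split were accepted, you can inspect the live object, for example:

```sh
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get endpoint PREDICTION_ENDPOINT -n PROJECT_NAMESPACE -o yaml
```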
To get the endpoint URL path for the prediction model, run the following command:
```sh
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get endpoint PREDICTION_ENDPOINT -n PROJECT_NAMESPACE -o jsonpath='{.status.endpointFQDN}'
```
Replace the following:
- PREDICTION_CLUSTER_KUBECONFIG: the path to the kubeconfig file of the Prediction user cluster.
- PREDICTION_ENDPOINT: the name of the endpoint.
- PROJECT_NAMESPACE: the name of the prediction project namespace.