You must deploy a model to an endpoint before you can use that model to serve online predictions. Deploying a model associates physical resources with it so it can serve online predictions with low latency.
This page describes the steps you must follow to deploy a model to an endpoint using Online Prediction.
Before you begin
Before deploying your model to an endpoint, export your model artifacts for prediction and ensure you meet all the prerequisites from that page.
Create a resource pool
A ResourcePool custom resource lets you have fine-grained control over
the behavior of your model. You can define settings such as the following:
- Autoscaling configurations.
- The machine type, which defines CPU and memory requirements.
- Accelerator options such as GPU resources.
The machine type is essential for the node pool specification request you send to create the prediction cluster.
For the resource pool of a deployed model, the accelerator count and type determine GPU usage, while the machine type dictates only the requested CPU and memory resources. Therefore, when you include GPU accelerators in the
ResourcePool specification, the machineType field controls the CPU and memory
requirements for the model, the acceleratorType field selects the GPU, and the
acceleratorCount field sets the number of GPU slices.
Follow these steps to create a ResourcePool custom resource:
Create a YAML file defining the ResourcePool custom resource. The following examples contain YAML files for resource pools with GPU accelerators (GPU-based models) and without GPU accelerators (CPU-based models):

GPU-based models

apiVersion: prediction.aiplatform.gdc.goog/v1
kind: ResourcePool
metadata:
  name: RESOURCE_POOL_NAME
  namespace: PROJECT_NAMESPACE
spec:
  resourcePoolID: RESOURCE_POOL_NAME
  enableContainerLogging: false
  dedicatedResources:
    machineSpec:
      # The system adds computing overhead to the nodes for mandatory components.
      # Choose a machineType value that allocates fewer CPU and memory resources
      # than those used by the nodes in the prediction cluster.
      machineType: a2-highgpu-1g-gdc
      acceleratorType: nvidia-a100-80gb
      # The accelerator count is a slice of the requested virtualized GPUs.
      # The value corresponds to one-seventh of 80 GB of GPUs for each count.
      acceleratorCount: 2
    autoscaling:
      minReplica: 2
      maxReplica: 10

CPU-based models

apiVersion: prediction.aiplatform.gdc.goog/v1
kind: ResourcePool
metadata:
  name: RESOURCE_POOL_NAME
  namespace: PROJECT_NAMESPACE
spec:
  resourcePoolID: RESOURCE_POOL_NAME
  enableContainerLogging: false
  dedicatedResources:
    machineSpec:
      # The system adds computing overhead to the nodes for mandatory components.
      # Choose a machineType value that allocates fewer CPU and memory resources
      # than those used by the nodes in the prediction cluster.
      machineType: n2-highcpu-8-gdc
    autoscaling:
      minReplica: 2
      maxReplica: 10

Replace the following:
- RESOURCE_POOL_NAME: the name you want to give to the ResourcePool definition file.
- PROJECT_NAMESPACE: the name of the project namespace associated with the prediction cluster.
Modify the values in the dedicatedResources fields according to your resource needs and what is available in your prediction cluster.

Apply the ResourcePool definition file to the prediction cluster:

kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f RESOURCE_POOL_NAME.yaml

Replace the following:

- PREDICTION_CLUSTER_KUBECONFIG: the path to the kubeconfig file of the prediction cluster.
- RESOURCE_POOL_NAME: the name of the ResourcePool definition file.
When you create the ResourcePool custom resource, the Kubernetes API and the
webhook service validate the YAML file and report success or failure. The
prediction operator provisions and reserves your resources from the resource
pool when you deploy your models to an endpoint.
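To verify that the resource pool was created, you can query the custom resource directly. The following check is a minimal sketch; the fully qualified resource name resourcepools.prediction.aiplatform.gdc.goog is an assumption derived from the apiVersion and kind in the definition file, so adjust it if your cluster registers the custom resource definition differently:

# Assumed resource name, derived from apiVersion prediction.aiplatform.gdc.goog/v1 and kind ResourcePool.
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get resourcepools.prediction.aiplatform.gdc.goog -n PROJECT_NAMESPACE

If the creation succeeded, the output lists the ResourcePool objects in the project namespace, including RESOURCE_POOL_NAME.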
Deploy your model to an endpoint
If you have a resource pool, you can deploy more than one model to an endpoint, and you can deploy a model to more than one endpoint. Deploy a prediction model targeting one of the supported containers. Depending on whether the endpoint already exists, choose one of the following two methods:
Deploy a model to a new endpoint
Follow these steps to deploy a prediction model to a new endpoint:
Create a YAML file defining a DeployedModel custom resource:

TensorFlow
The following YAML file shows a sample configuration for a TensorFlow model:
apiVersion: prediction.aiplatform.gdc.goog/v1
kind: DeployedModel
metadata:
  name: DEPLOYED_MODEL_NAME
  namespace: PROJECT_NAMESPACE
spec:
  # The endpoint path structure is endpoints/<endpoint-id>
  endpointPath: endpoints/PREDICTION_ENDPOINT
  modelSpec:
    # The artifactLocation field must be the s3 path to the folder that
    # contains the various model versions.
    # For example, s3://my-prediction-bucket/tensorflow
    artifactLocation: s3://PATH_TO_MODEL
    # The value in the id field must be unique to each model.
    id: img-detection-model
    modelDisplayName: my_img_detection_model
    # The model resource name structure is models/<model-id>/<model-version-id>
    modelResourceName: models/img-detection-model/1
    # The model version ID must match the name of the first folder in
    # the artifactLocation bucket, inside the 'tensorflow' folder.
    # For example, if the bucket path is
    # s3://my-prediction-bucket/tensorflow/1/,
    # then the value for the model version ID is "1".
    modelVersionID: "1"
    modelContainerSpec:
      args:
      - --model_config_file=/models/models.config
      - --rest_api_port=8080
      - --port=8500
      - --file_system_poll_wait_seconds=30
      - --model_config_file_poll_wait_seconds=30
      command:
      - /bin/tensorflow_model_server
      # The image URI field must contain one of the following values:
      # For CPU-based models: gcr.io/aiml/prediction/containers/tf2-cpu.2-14:latest
      # For GPU-based models: gcr.io/aiml/prediction/containers/tf2-gpu.2-14:latest
      imageURI: gcr.io/aiml/prediction/containers/tf2-gpu.2-14:latest
      ports:
      - 8080
      grpcPorts:
      - 8500
  resourcePoolRef:
    kind: ResourcePool
    name: RESOURCE_POOL_NAME
    namespace: PROJECT_NAMESPACE

Replace the following:

- DEPLOYED_MODEL_NAME: the name you want to give to the DeployedModel definition file.
- PROJECT_NAMESPACE: the name of the project namespace associated with the prediction cluster.
- PREDICTION_ENDPOINT: the name you want to give to the new endpoint, such as my-img-prediction-endpoint.
- PATH_TO_MODEL: the path to your model in the storage bucket.
- RESOURCE_POOL_NAME: the name you gave to the ResourcePool definition file when you created a resource pool to host the model.
Modify the values in the remaining fields according to your prediction model.
PyTorch
The following YAML file shows a sample configuration for a PyTorch model:
apiVersion: prediction.aiplatform.gdc.goog/v1
kind: DeployedModel
metadata:
  name: DEPLOYED_MODEL_NAME
  namespace: PROJECT_NAMESPACE
spec:
  endpointPath: PREDICTION_ENDPOINT
  endpointInfo:
    id: PREDICTION_ENDPOINT
  modelSpec:
    # The artifactLocation field must be the s3 path to the folder that
    # contains the various model versions.
    # For example, s3://my-prediction-bucket/pytorch
    artifactLocation: s3://PATH_TO_MODEL
    # The value in the id field must be unique to each model.
    id: "pytorch"
    modelDisplayName: my-pytorch-model
    # The model resource name structure is models/<model-id>/<model-version-id>
    modelResourceName: models/pytorch/1
    modelVersionID: "1"
    modelContainerSpec:
      # The image URI field must contain one of the following values:
      # For CPU-based models: gcr.io/aiml/prediction/containers/pytorch-cpu.2-4:latest
      # For GPU-based models: gcr.io/aiml/prediction/containers/pytorch-gpu.2-4:latest
      imageURI: gcr.io/aiml/prediction/containers/pytorch-cpu.2-4:latest
      ports:
      - 8080
      grpcPorts:
      - 7070
  sharesResourcePool: false
  resourcePoolRef:
    kind: ResourcePool
    name: RESOURCE_POOL_NAME
    namespace: PROJECT_NAMESPACE

Replace the following:

- DEPLOYED_MODEL_NAME: the name you want to give to the DeployedModel definition file.
- PROJECT_NAMESPACE: the name of the project namespace associated with the prediction cluster.
- PREDICTION_ENDPOINT: the name you want to give to the new endpoint, such as my-img-prediction-endpoint.
- PATH_TO_MODEL: the path to your model in the storage bucket.
- RESOURCE_POOL_NAME: the name you gave to the ResourcePool definition file when you created a resource pool to host the model.
Modify the values in the remaining fields according to your prediction model.
Apply the DeployedModel definition file to the prediction cluster:

kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f DEPLOYED_MODEL_NAME.yaml

Replace the following:

- PREDICTION_CLUSTER_KUBECONFIG: the path to the kubeconfig file of the prediction cluster.
- DEPLOYED_MODEL_NAME: the name of the DeployedModel definition file.
When you create the DeployedModel custom resource, the Kubernetes API and the webhook service validate the YAML file and report success or failure. The prediction operator reconciles the DeployedModel custom resource and serves it in the prediction cluster.
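Before creating the endpoint, you can optionally check that the prediction operator finished reconciling the deployed model. This is a sketch under the assumption that the DeployedModel kind is exposed as deployedmodels.prediction.aiplatform.gdc.goog; adapt the resource name if your cluster registers it differently:

# Assumed resource name, derived from apiVersion prediction.aiplatform.gdc.goog/v1 and kind DeployedModel.
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG describe deployedmodels.prediction.aiplatform.gdc.goog DEPLOYED_MODEL_NAME -n PROJECT_NAMESPACE

The describe output includes the resource status and recent events reported by the prediction operator.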
Create a YAML file defining an Endpoint custom resource. The following YAML file shows a sample configuration:
apiVersion: aiplatform.gdc.goog/v1
kind: Endpoint
metadata:
  name: ENDPOINT_NAME
  namespace: PROJECT_NAMESPACE
spec:
  createDns: true
  id: PREDICTION_ENDPOINT
  destinations:
    - serviceRef:
        kind: DeployedModel
        name: DEPLOYED_MODEL_NAME
        namespace: PROJECT_NAMESPACE
      trafficPercentage: 50
      grpcPort: 8501
      httpPort: 8081
    - serviceRef:
        kind: DeployedModel
        name: DEPLOYED_MODEL_NAME_2
        namespace: PROJECT_NAMESPACE
      trafficPercentage: 50
      grpcPort: 8501
      httpPort: 8081

Replace the following:

- ENDPOINT_NAME: the name you want to give to the Endpoint definition file.
- PROJECT_NAMESPACE: the name of the project namespace associated with the prediction cluster.
- PREDICTION_ENDPOINT: the name of the new endpoint. You defined this name in the DeployedModel definition file.
- DEPLOYED_MODEL_NAME: the name you gave to the DeployedModel definition file.
You can have one or more serviceRef destinations. If you have a second serviceRef object, add it to the YAML file in the destinations field and replace DEPLOYED_MODEL_NAME_2 with the name you gave to the second DeployedModel definition file you created. Keep adding or removing serviceRef objects as needed, depending on the number of models you are deploying.

Set the trafficPercentage fields based on how you want to split traffic between the models on this endpoint. Modify the values in the remaining fields according to your endpoint configuration.

Apply the Endpoint definition file to the prediction cluster:

kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f ENDPOINT_NAME.yaml

Replace ENDPOINT_NAME with the name of the Endpoint definition file.
To get the endpoint URL path for the prediction model, run the following command:
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get endpoint PREDICTION_ENDPOINT -n PROJECT_NAMESPACE -o jsonpath='{.status.endpointFQDN}'
Replace the following:
- PREDICTION_CLUSTER_KUBECONFIG: the path to the kubeconfig file of the prediction cluster.
- PREDICTION_ENDPOINT: the name of the new endpoint.
- PROJECT_NAMESPACE: the name of the prediction project namespace.
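With the returned FQDN, you can send a prediction request to the endpoint. The following curl call is an illustrative sketch only: the URL path, the authentication token, and the request body format depend on your environment and serving container, so treat ENDPOINT_FQDN, TOKEN, MODEL_NAME, and the payload as assumptions to adapt:

# Hypothetical request. The path follows the TensorFlow Serving REST convention and
# may differ in your setup; replace ENDPOINT_FQDN, TOKEN, and MODEL_NAME accordingly.
curl -X POST \
  -H "Authorization: Bearer TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"instances": [[1.0, 2.0, 3.0]]}' \
  https://ENDPOINT_FQDN/v1/models/MODEL_NAME:predict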
Deploy a model to an existing endpoint
You can deploy a model to an existing endpoint only if you previously deployed another model to that endpoint when it was new. The system requires this earlier deployment to create the endpoint.
Follow these steps to deploy a prediction model to an existing endpoint:
Create a YAML file defining a DeployedModel custom resource. The following YAML file shows a sample configuration:
apiVersion: prediction.aiplatform.gdc.goog/v1
kind: DeployedModel
metadata:
  name: DEPLOYED_MODEL_NAME
  namespace: PROJECT_NAMESPACE
spec:
  # The endpoint path structure is endpoints/<endpoint-id>
  endpointPath: endpoints/PREDICTION_ENDPOINT
  modelSpec:
    # The artifactLocation field must be the s3 path to the folder that
    # contains the various model versions.
    # For example, s3://my-prediction-bucket/tensorflow
    artifactLocation: s3://PATH_TO_MODEL
    # The value in the id field must be unique to each model.
    id: img-detection-model-v2
    modelDisplayName: my_img_detection_model
    # The model resource name structure is models/<model-id>/<model-version-id>
    modelResourceName: models/img-detection-model/2
    # The model version ID must match the name of the first folder in
    # the artifactLocation bucket, inside the 'tensorflow' folder.
    # For example, if the bucket path is
    # s3://my-prediction-bucket/tensorflow/2/,
    # then the value for the model version ID is "2".
    modelVersionID: "2"
    modelContainerSpec:
      args:
      - --model_config_file=/models/models.config
      - --rest_api_port=8080
      - --port=8500
      - --file_system_poll_wait_seconds=30
      - --model_config_file_poll_wait_seconds=30
      command:
      - /bin/tensorflow_model_server
      # The image URI field must contain one of the following values:
      # For CPU-based models: gcr.io/aiml/prediction/containers/tf2-cpu.2-6:latest
      # For GPU-based models: gcr.io/aiml/prediction/containers/tf2-gpu.2-6:latest
      imageURI: gcr.io/aiml/prediction/containers/tf2-gpu.2-6:latest
      ports:
      - 8080
      grpcPorts:
      - 8500
  resourcePoolRef:
    kind: ResourcePool
    name: RESOURCE_POOL_NAME
    namespace: PROJECT_NAMESPACE

Replace the following:

- DEPLOYED_MODEL_NAME: the name you want to give to the DeployedModel definition file.
- PROJECT_NAMESPACE: the name of the project namespace associated with the prediction cluster.
- PREDICTION_ENDPOINT: the name of the existing endpoint, such as my-img-prediction-endpoint.
- PATH_TO_MODEL: the path to your model in the storage bucket.
- RESOURCE_POOL_NAME: the name you gave to the ResourcePool definition file when you created a resource pool to host the model.
Modify the values in the remaining fields according to your prediction model.
Apply the DeployedModel definition file to the prediction cluster:

kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f DEPLOYED_MODEL_NAME.yaml

Replace the following:

- PREDICTION_CLUSTER_KUBECONFIG: the path to the kubeconfig file of the prediction cluster.
- DEPLOYED_MODEL_NAME: the name of the DeployedModel definition file.
When you create the DeployedModel custom resource, the Kubernetes API and the webhook service validate the YAML file and report success or failure. The prediction operator reconciles the DeployedModel custom resource and serves it in the prediction cluster.

Show details of the existing Endpoint custom resource:

kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG describe -f ENDPOINT_NAME.yaml

Replace ENDPOINT_NAME with the name of the Endpoint definition file.

Update the YAML file of the Endpoint custom resource definition by adding a new serviceRef object in the destinations field. In the new object, include the appropriate service name based on your newly created DeployedModel custom resource.

The following YAML file shows a sample configuration:
apiVersion: aiplatform.gdc.goog/v1
kind: Endpoint
metadata:
  name: ENDPOINT_NAME
  namespace: PROJECT_NAMESPACE
spec:
  createDns: true
  id: PREDICTION_ENDPOINT
  destinations:
    - serviceRef:
        kind: DeployedModel
        name: DEPLOYED_MODEL_NAME
        namespace: PROJECT_NAMESPACE
      trafficPercentage: 40
      grpcPort: 8501
      httpPort: 8081
    - serviceRef:
        kind: DeployedModel
        name: DEPLOYED_MODEL_NAME_2
        namespace: PROJECT_NAMESPACE
      trafficPercentage: 50
      grpcPort: 8501
      httpPort: 8081
    - serviceRef:
        kind: DeployedModel
        name: DEPLOYED_MODEL_NAME_3
        namespace: PROJECT_NAMESPACE
      trafficPercentage: 10
      grpcPort: 8501
      httpPort: 8081

Replace the following:

- ENDPOINT_NAME: the name of the existing Endpoint definition file.
- PROJECT_NAMESPACE: the name of the project namespace associated with the prediction cluster.
- PREDICTION_ENDPOINT: the name of the existing endpoint. You referenced this name in the DeployedModel definition file.
- DEPLOYED_MODEL_NAME: the name of a previously created DeployedModel definition file.
- DEPLOYED_MODEL_NAME_2: the name you gave to the newly created DeployedModel definition file.
You can have one or more serviceRef destinations. If you have a third serviceRef object, add it to the YAML file in the destinations field and replace DEPLOYED_MODEL_NAME_3 with the name you gave to the third DeployedModel definition file you created. Keep adding or removing serviceRef objects as needed, depending on the number of models you are deploying.

Set the trafficPercentage fields based on how you want to split traffic between the models of this endpoint. Modify the values in the remaining fields according to your endpoint configuration.

Apply the Endpoint definition file to the prediction cluster:

kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG apply -f ENDPOINT_NAME.yaml

Replace ENDPOINT_NAME with the name of the Endpoint definition file.
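To confirm that the endpoint picked up the new destination and traffic split, you can read back the destinations from the updated resource. This sketch reuses the kubectl get endpoint pattern from the FQDN command in the next step; the jsonpath expression is derived from the spec shown in the sample YAML:

# Read the configured traffic split; the field path matches the sample Endpoint spec.
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get endpoint PREDICTION_ENDPOINT -n PROJECT_NAMESPACE -o jsonpath='{.spec.destinations[*].trafficPercentage}'

In the samples on this page, the trafficPercentage values across all destinations add up to 100.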
To get the endpoint URL path for the prediction model, run the following command:
kubectl --kubeconfig PREDICTION_CLUSTER_KUBECONFIG get endpoint PREDICTION_ENDPOINT -n PROJECT_NAMESPACE -o jsonpath='{.status.endpointFQDN}'
Replace the following:
- PREDICTION_CLUSTER_KUBECONFIG: the path to the kubeconfig file of the prediction cluster.
- PREDICTION_ENDPOINT: the name of the endpoint.
- PROJECT_NAMESPACE: the name of the prediction project namespace.