You must deploy your prediction custom resources in the Prediction user cluster that the Infrastructure Operator (IO) creates for you. The Prediction Operator creates prediction workloads in this same cluster.
To create the Prediction user cluster, work with the IO to associate your prediction project and allocate the node pools that you need for online predictions.
To request the cluster, perform the following steps:
- Identify the project in your organization that you want to associate with the new user cluster for online predictions.
- From the list of available machine types in Google Distributed Cloud (GDC) air-gapped, choose the machine type for the nodes that your workloads need in the user cluster. The machine type that you choose depends on the size and complexity of your prediction model, and it determines the amount of compute and graphics processing unit (GPU) resources that your IO provides to the user cluster. Follow the node selection recommendations when determining the machine type for your nodes.
- To open a case requesting the creation of the cluster, send an email to the IO using the Prediction user cluster case template.
- If necessary, maintain communication with the IO until they finish creating the Prediction user cluster associated with your project and assigning the appropriate node pools within the cluster.
After the IO completes cluster provisioning, the Prediction user cluster is ready to use for Vertex AI online predictions.
Node selection recommendations
When the IO creates node pools in a user cluster, they assign one of the available machine types in Distributed Cloud to provide a predefined set of resources for the worker nodes. Depending on the size and complexity of your model, your workloads need a different level of computing performance and, consequently, specific amounts of CPU, memory, and GPU resources. You must provide these details in your communication with the IO when you request a Prediction cluster.
When you and the IO determine the machine type for the node pools that you require in the Prediction cluster, you must adhere to the following practices:
- Distributed Cloud adds computing overhead to the nodes for mandatory system components. Therefore, you must choose a larger machine type for your node pools than the one you intend to use in the resource pool for your model or models.
- Choose the solution that provides the minimum amount of memory and computing resources necessary for your requirements. For example, if your model requires eight vCPUs, choose the n2-highcpu-8-gdc machine type, the smallest solution with eight vCPUs and 8 GB of memory in Distributed Cloud.
- Progressively, consider higher performance solutions only if smaller solutions are not adequate for your needs and the size and complexity of the model. You must adhere to the principle of least privilege by using the least amount of resources you require to execute your specific workflow.
- Only choose solutions that have GPUs if you require them for your model.
- If your model requires GPUs, consider the a2-highgpu-1g-gdc machine type, the smallest solution providing GPUs.
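The selection practices above can be sketched as a small helper that picks the smallest machine type meeting a set of requirements. This is an illustration under stated assumptions, not a Distributed Cloud tool: the `MachineType` class, the `pick_machine_type` function, and the overhead parameters are hypothetical names. The catalog is also illustrative. Only the n2-highcpu-8-gdc figures (eight vCPUs, 8 GB) and the existence of a2-highgpu-1g-gdc come from this page; the remaining entries and values are placeholders to replace with the real list of available machine types. The size of the system overhead is not stated here, so it defaults to zero and should be confirmed with the IO.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class MachineType:
    name: str
    vcpus: int
    memory_gb: int
    gpus: int


# Illustrative catalog, NOT the real list of machine types.
CATALOG = [
    MachineType("n2-highcpu-8-gdc", vcpus=8, memory_gb=8, gpus=0),
    MachineType("example-cpu-16", vcpus=16, memory_gb=64, gpus=0),     # hypothetical entry
    MachineType("a2-highgpu-1g-gdc", vcpus=12, memory_gb=85, gpus=1),  # vCPU/memory values assumed
    MachineType("example-gpu-2", vcpus=24, memory_gb=170, gpus=2),     # hypothetical entry
]


def pick_machine_type(
    catalog: Sequence[MachineType],
    vcpus: int,
    memory_gb: int,
    gpus: int = 0,
    overhead_vcpus: int = 0,      # assumed allowance for system components
    overhead_memory_gb: int = 0,  # assumed allowance for system components
) -> Optional[MachineType]:
    """Return the smallest machine type that fits, or None if nothing fits."""
    need_vcpus = vcpus + overhead_vcpus
    need_memory = memory_gb + overhead_memory_gb
    candidates = [
        m for m in catalog
        if m.vcpus >= need_vcpus
        and m.memory_gb >= need_memory
        and m.gpus >= gpus
        # Only consider GPU machine types when the model needs GPUs.
        and (gpus > 0 or m.gpus == 0)
    ]
    # "Smallest" here means fewest GPUs, then fewest vCPUs, then least memory.
    return min(candidates, key=lambda m: (m.gpus, m.vcpus, m.memory_gb), default=None)


# A model that needs eight vCPUs and no GPUs.
choice = pick_machine_type(CATALOG, vcpus=8, memory_gb=8)
print(choice.name if choice else "no machine type fits")
```

Passing a nonzero overhead pushes the result to the next larger machine type, which mirrors the first practice in the list. The result is only a starting point for the conversation with the IO, who can suggest a different machine type.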
Prediction user cluster case template
Use the following template to send an email to your IO. The email opens a case to create the Prediction user cluster that you need for online predictions.
Good day,
I need to create a Prediction user cluster and associate it with a project in my organization to use the Vertex AI online prediction functionality.
Please use the following information for the creation of the cluster:
- **Cluster name:** vtx-ai-prediction
- **Name of the organization:** [Specify your organization's name.]
- **Project name:** [Specify the name of your project to associate with the Prediction user cluster.]
- **Machine type for the node pool:** [Specify the machine type you chose from the list of available machine types for the cluster nodes based on node selection recommendations. Please note that the IO can respond with a different suggestion based on your needs.]
- **Compute resources:** [Optionally, if you know the amount of compute resources your workloads need, describe it in this field.]
- **Memory resources:** [Optionally, if you know how much memory your workloads need, describe it in this field.]
- **GPU resources:** [Optionally, if you know how many GPUs your workloads need, describe them in this field.]
**Note for IO:** Review the instructions to create the Prediction user cluster in the following section of the documentation: Operator > Configure the deployment > Create the Prediction cluster
Thank you,
[Your name]
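If you send this request often, you can fill in the template above from a script. The following is a minimal sketch, assuming you only need the email body as text: `ClusterRequest` and `render_case_email` are hypothetical names, the organization, project, and requester values are placeholders, and sending the email is left to your own mail tooling. Optional fields that you leave unset are omitted from the output.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterRequest:
    organization: str
    project: str
    machine_type: str
    requester: str
    cluster_name: str = "vtx-ai-prediction"  # name used in the template
    compute: Optional[str] = None            # optional fields
    memory: Optional[str] = None
    gpu: Optional[str] = None


def render_case_email(req: ClusterRequest) -> str:
    """Build the case email body, skipping optional fields that are unset."""
    fields = [
        ("Cluster name", req.cluster_name),
        ("Name of the organization", req.organization),
        ("Project name", req.project),
        ("Machine type for the node pool", req.machine_type),
        ("Compute resources", req.compute),
        ("Memory resources", req.memory),
        ("GPU resources", req.gpu),
    ]
    bullets = [f"- **{label}:** {value}" for label, value in fields if value]
    parts = [
        "Good day,",
        "I need to create a Prediction user cluster and associate it with a "
        "project in my organization to use the Vertex AI online prediction "
        "functionality.",
        "Please use the following information for the creation of the cluster:",
        *bullets,
        "**Note for IO:** Review the instructions to create the Prediction "
        "user cluster in the following section of the documentation: "
        "Operator > Configure the deployment > Create the Prediction cluster",
        "Thank you,",
        req.requester,
    ]
    return "\n".join(parts)


# Placeholder values for illustration only.
email_body = render_case_email(
    ClusterRequest(
        organization="example-org",
        project="example-project",
        machine_type="n2-highcpu-8-gdc",
        requester="Example Requester",
        compute="8 vCPUs",
    )
)
print(email_body)
```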