In Model Garden, you can discover, test, tune, and deploy models by using the Google Cloud console or the Google Cloud CLI.
This document shows you how to use models from Model Garden and covers the following topics:
- Send test prompts: Test a model by sending it prompts directly from the Google Cloud console.
- Tune a model: Customize a foundation model for your specific task by using a tuning pipeline or a notebook.
- Deploy an open model: Deploy open models to a Vertex AI endpoint by using the console, gcloud CLI, or programmatically.
- Deploy a partner model: Deploy partner models that you have access to.
- View or manage an endpoint: See the details of your deployed models and endpoints.
- Undeploy models and delete resources: Clean up resources by undeploying models and deleting endpoints and models.
Send test prompts
- In the Google Cloud console, go to the Model Garden page.
- Find a model that you want to test and click View details.
- Click Open prompt design.
- In the Prompt field, enter a prompt to test.
- Optional: Configure the model parameters.
- Click Submit.
Tune a model
You can tune a model by using a pre-configured pipeline for a guided experience or a notebook for more flexibility. The following table compares these two methods.
Tuning Method | Description | Pros | Cons | Use Case |
---|---|---|---|---|
Fine-tuning pipeline | Uses a pre-configured Vertex AI pipeline to tune a model. | Guided, automated process. Less setup required. | Less flexible than a notebook. | For users who want a straightforward, managed tuning experience without writing custom code. |
Notebook | Uses a Jupyter notebook to run the tuning process. | Highly customizable and flexible. Allows for custom logic and experimentation. | Requires more manual setup and coding. | For users who need full control over the tuning process or want to integrate custom steps. |
Tune with a fine-tuning pipeline
The model cards for some open source foundation models, such as BERT and T5-FLAN, support tuning with a pre-configured pipeline.
- In the Google Cloud console, go to the Model Garden page.
- In Search models, enter BERT or T5-FLAN, and then click the magnifying glass to search.
- Click View details on the T5-FLAN or the BERT model card.
- Click Open fine-tuning pipeline. The Vertex AI Pipelines page opens.
- To start tuning, click Create run.
Tune in a notebook
The model cards for most open source foundation models and fine-tunable models support tuning in a notebook.
- In the Google Cloud console, go to the Model Garden page.
- Find a model that you want to tune and click its model card.
- Click Open notebook.
Deploy an open model
You can deploy a model by using its model card in the Google Cloud console or programmatically.
For more information about setting up the Google Gen AI SDK or Google Cloud CLI, see the Google Gen AI SDK overview or Install the Google Cloud CLI.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
List the deployable models and record the ID of the model you want to deploy. You can also list the supported Hugging Face models in Model Garden and filter them by name. The output doesn't include tuned models.
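The following is a minimal sketch of this step, assuming the `model_garden` module that ships with recent Vertex AI SDK releases; the exact parameter names, such as `model_filter` and `list_hf_models`, may differ by SDK version, so check the SDK reference:

```python
import vertexai
from vertexai import model_garden

# Assumption: PROJECT_ID is your project and the model_garden module is available.
vertexai.init(project="PROJECT_ID", location="us-central1")

# List deployable Model Garden models; optionally filter by name or
# list supported Hugging Face models instead (tuned models aren't included).
models = model_garden.list_deployable_models(model_filter="gemma", list_hf_models=False)
print(models)
```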
Using the model ID from the previous step, view the deployment specifications for the model. The output shows the machine type, accelerator type, and container image URI that Model Garden has verified for that model.
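Continuing the sketch, the deployment specifications might be inspected like this (again assuming the `model_garden` module; the Gemma model ID is only an example):

```python
from vertexai import model_garden

# Use a model ID recorded in the previous step (example value shown).
model = model_garden.OpenModel("google/gemma3@gemma-3-1b-it")

# Shows the machine types, accelerator types, and container images that
# Model Garden has verified for this model.
deploy_options = model.list_deploy_options()
print(deploy_options)
```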
Deploy a model to an endpoint. By default, Model Garden uses the default deployment configuration. You can override these defaults by specifying additional arguments.
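And the deployment itself, with optional overrides of the default configuration (the machine type and accelerator values below are illustrative; pick a verified combination from the previous step):

```python
from vertexai import model_garden

model = model_garden.OpenModel("google/gemma3@gemma-3-1b-it")

# Deploy with defaults, or override them with explicit arguments.
endpoint = model.deploy(
    machine_type="g2-standard-12",   # illustrative override
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
    accept_eula=True,                # review the model's license terms first
)
```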
gcloud
Before you begin, specify a quota project to run the following commands. The commands you run are counted against the quotas for that project. For more information, see Set the quota project.
To list the models that you can deploy, run the `gcloud ai model-garden models list` command. This command lists all model IDs and indicates which ones you can self-deploy.

```
gcloud ai model-garden models list
```

In the output, find the model ID to deploy. The following example shows an abbreviated output.

```
MODEL_ID                              CAN_DEPLOY  CAN_PREDICT
google/gemma2@gemma-2-27b             Yes         No
google/gemma2@gemma-2-27b-it          Yes         No
google/gemma2@gemma-2-2b              Yes         No
google/gemma2@gemma-2-2b-it           Yes         No
google/gemma2@gemma-2-9b              Yes         No
google/gemma2@gemma-2-9b-it           Yes         No
google/gemma3@gemma-3-12b-it          Yes         No
google/gemma3@gemma-3-12b-pt          Yes         No
google/gemma3@gemma-3-1b-it           Yes         No
google/gemma3@gemma-3-1b-pt           Yes         No
google/gemma3@gemma-3-27b-it          Yes         No
google/gemma3@gemma-3-27b-pt          Yes         No
google/gemma3@gemma-3-4b-it           Yes         No
google/gemma3@gemma-3-4b-pt           Yes         No
google/gemma3n@gemma-3n-e2b           Yes         No
google/gemma3n@gemma-3n-e2b-it        Yes         No
google/gemma3n@gemma-3n-e4b           Yes         No
google/gemma3n@gemma-3n-e4b-it        Yes         No
google/gemma@gemma-1.1-2b-it          Yes         No
google/gemma@gemma-1.1-2b-it-gg-hf    Yes         No
google/gemma@gemma-1.1-7b-it          Yes         No
google/gemma@gemma-1.1-7b-it-gg-hf    Yes         No
google/gemma@gemma-2b                 Yes         No
google/gemma@gemma-2b-gg-hf           Yes         No
google/gemma@gemma-2b-it              Yes         No
google/gemma@gemma-2b-it-gg-hf        Yes         No
google/gemma@gemma-7b                 Yes         No
google/gemma@gemma-7b-gg-hf           Yes         No
google/gemma@gemma-7b-it              Yes         No
google/gemma@gemma-7b-it-gg-hf        Yes         No
```
The output doesn't include any tuned models or Hugging Face models. To view which Hugging Face models are supported, add the `--can-deploy-hugging-face-models` flag.

To view the deployment specifications for a model, run the `gcloud ai model-garden models list-deployment-config` command. The output shows the machine type, accelerator type, and container image URI that Model Garden supports for a particular model.

```
gcloud ai model-garden models list-deployment-config \
  --model=MODEL_ID
```

Replace MODEL_ID with the model ID from the previous list command, such as `google/gemma@gemma-2b` or `stabilityai/stable-diffusion-xl-base-1.0`.
To deploy a model to an endpoint, run the `gcloud ai model-garden models deploy` command. By default, Model Garden generates a display name for your endpoint and uses the default deployment configuration. You can override these defaults by specifying additional arguments. To run the command asynchronously, include the `--asynchronous` flag.

```
gcloud ai model-garden models deploy \
  --model=MODEL_ID \
  [--machine-type=MACHINE_TYPE] \
  [--accelerator-type=ACCELERATOR_TYPE] \
  [--endpoint-display-name=ENDPOINT_NAME] \
  [--hugging-face-access-token=HF_ACCESS_TOKEN] \
  [--reservation-affinity reservation-affinity-type=any-reservation] \
  [--reservation-affinity reservation-affinity-type=specific-reservation, key="compute.googleapis.com/reservation-name", values=RESERVATION_RESOURCE_NAME] \
  [--asynchronous]
```
Replace the following placeholders:

- MODEL_ID: The model ID from the previous list command. For Hugging Face models, use the Hugging Face model URL format, such as `stabilityai/stable-diffusion-xl-base-1.0`.
- MACHINE_TYPE: Defines the set of resources to deploy for your model, such as `g2-standard-4`.
- ACCELERATOR_TYPE: Specifies accelerators to add to your deployment to help improve performance when working with intensive workloads, such as `NVIDIA_L4`.
- ENDPOINT_NAME: A name for the deployed Vertex AI endpoint.
- HF_ACCESS_TOKEN: For gated Hugging Face models, provide an access token.
- RESERVATION_RESOURCE_NAME: To use a specific Compute Engine reservation, specify the name of your reservation. If you specify a specific reservation, you can't specify `any-reservation`.
The output includes the deployment configuration that Model Garden used, the endpoint ID, and the deployment operation ID, which you can use to check the deployment status.
```
Using the default deployment configuration:
 Machine type: g2-standard-12
 Accelerator type: NVIDIA_L4
 Accelerator count: 1

The project has enough quota. The current usage of quota for accelerator type NVIDIA_L4 in region us-central1 is 0 out of 28.

Deploying the model to the endpoint. To check the deployment status, you can try one of the following methods:
1) Look for endpoint `ENDPOINT_DISPLAY_NAME` at the [Vertex AI] -> [Online prediction] tab in Cloud Console
2) Use `gcloud ai operations describe OPERATION_ID --region=LOCATION` to find the status of the deployment long-running operation
```
To view details about your deployment, run the `gcloud ai endpoints list --list-model-garden-endpoints-only` command:

```
gcloud ai endpoints list --list-model-garden-endpoints-only \
  --region=LOCATION_ID
```

Replace LOCATION_ID with the region where you deployed the model.

The output lists all endpoints created from Model Garden and includes details like the endpoint ID and name. To find your deployment, look for the endpoint name returned from the previous command.
REST
List all deployable models and get the ID of the model you want to deploy. You can then deploy the model with its default configuration and endpoint, or customize your deployment by setting a specific machine type or using a dedicated endpoint.
1. List models that you can deploy
Before using any of the request data, make the following replacements:
- PROJECT_ID: Your project ID.
- QUERY_PARAMETERS: To list Model Garden models, add the following query parameters: `listAllVersions=True&filter=is_deployable(true)`. To list Hugging Face models, set the filter to `alt=json&is_hf_wildcard(true)+AND+labels.VERIFIED_DEPLOYMENT_CONFIG%3DVERIFIED_DEPLOYMENT_SUCCEED&listAllVersions=True`.
HTTP method and URL:
```
GET https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models?QUERY_PARAMETERS
```
To send your request, choose one of these options:
curl
Execute the following command:
```
curl -X GET \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "x-goog-user-project: PROJECT_ID" \
  "https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models?QUERY_PARAMETERS"
```
PowerShell
Execute the following command:
```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred"; "x-goog-user-project" = "PROJECT_ID" }

Invoke-WebRequest `
  -Method GET `
  -Headers $headers `
  -Uri "https://us-central1-aiplatform.googleapis.com/v1/publishers/*/models?QUERY_PARAMETERS" | Select-Object -Expand Content
```
You receive a JSON response similar to the following.
{ "publisherModels": [ { "name": "publishers/google/models/gemma3", "versionId": "gemma-3-1b-it", "openSourceCategory": "GOOGLE_OWNED_OSS_WITH_GOOGLE_CHECKPOINT", "supportedActions": { "openNotebook": { "references": { "us-central1": { "uri": "https://colab.research.google.com/github/GoogleCloudPlatform/vertex-ai-samples/blob/main/notebooks/community/model_garden/model_garden_gradio_streaming_chat_completions.ipynb" } }, "resourceTitle": "Notebook", "resourceUseCase": "Chat Completion Playground", "resourceDescription": "Chat with deployed Gemma 2 endpoints via Gradio UI." }, "deploy": { "modelDisplayName": "gemma-3-1b-it", "containerSpec": { "imageUri": "us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20250312_0916_RC01", "args": [ "python", "-m", "vllm.entrypoints.api_server", "--host=0.0.0.0", "--port=8080", "--model=gs://vertex-model-garden-restricted-us/gemma3/gemma-3-1b-it", "--tensor-parallel-size=1", "--swap-space=16", "--gpu-memory-utilization=0.95", "--disable-log-stats" ], "env": [ { "name": "MODEL_ID", "value": "google/gemma-3-1b-it" }, { "name": "DEPLOY_SOURCE", "value": "UI_NATIVE_MODEL" } ], "ports": [ { "containerPort": 8080 } ], "predictRoute": "/generate", "healthRoute": "/ping" }, "dedicatedResources": { "machineSpec": { "machineType": "g2-standard-12", "acceleratorType": "NVIDIA_L4", "acceleratorCount": 1 } }, "publicArtifactUri": "gs://vertex-model-garden-restricted-us/gemma3/gemma3.tar.gz", "deployTaskName": "vLLM 128K context", "deployMetadata": { "sampleRequest": "{\n \"instances\": [\n {\n \"@requestFormat\": \"chatCompletions\",\n \"messages\": [\n {\n \"role\": \"user\",\n \"content\": \"What is machine learning?\"\n }\n ],\n \"max_tokens\": 100\n }\n ]\n}\n" } }, ...
2. Deploy a model
Deploy a model from Model Garden or a model from Hugging Face. You can also customize the deployment by specifying additional JSON fields.
Deploy a model with its default configuration
Before using any of the request data, make the following replacements:
- LOCATION: A region where the model is deployed.
- PROJECT_ID: Your project ID.
- MODEL_ID: The ID of the model to deploy, which you can get from listing all the deployable models. The ID uses the following format: publishers/PUBLISHER_NAME/models/MODEL_NAME@MODEL_VERSION.
HTTP method and URL:
```
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy
```
Request JSON body:
{ "publisher_model_name": "MODEL_ID", "model_config": { "accept_eula": "true" } }
To send your request, choose one of these options:
curl
Save the request body in a file named `request.json`. Run the following command in the terminal to create or overwrite this file in the current directory:

```
cat > request.json << 'EOF'
{
  "publisher_model_name": "MODEL_ID",
  "model_config": {
    "accept_eula": "true"
  }
}
EOF
```
Then execute the following command to send your REST request:
```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy"
```
PowerShell
Save the request body in a file named `request.json`. Run the following command in the terminal to create or overwrite this file in the current directory:

```
@'
{
  "publisher_model_name": "MODEL_ID",
  "model_config": {
    "accept_eula": "true"
  }
}
'@ | Out-File -FilePath request.json -Encoding utf8
```
Then execute the following command to send your REST request:
```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy" | Select-Object -Expand Content
```
You receive a JSON response similar to the following.
{ "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployOperationMetadata", "genericMetadata": { "createTime": "2025-03-13T21:44:44.538780Z", "updateTime": "2025-03-13T21:44:44.538780Z" }, "publisherModel": "publishers/google/models/gemma3@gemma-3-1b-it", "destination": "projects/PROJECT_ID/locations/LOCATION", "projectNumber": "PROJECT_ID" } }
Deploy a Hugging Face model
Before using any of the request data, make the following replacements:
- LOCATION: A region where the model is deployed.
- PROJECT_ID: Your project ID.
- MODEL_ID: The ID of the Hugging Face model to deploy, which you can get from listing all the deployable models. The ID uses the following format: PUBLISHER_NAME/MODEL_NAME.
- ACCESS_TOKEN: If the model is gated, provide an access token.
HTTP method and URL:
```
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy
```
Request JSON body:
{ "hugging_face_model_id": "MODEL_ID", "hugging_face_access_token": "ACCESS_TOKEN", "model_config": { "accept_eula": "true" } }
To send your request, choose one of these options:
curl
Save the request body in a file named `request.json`. Run the following command in the terminal to create or overwrite this file in the current directory:

```
cat > request.json << 'EOF'
{
  "hugging_face_model_id": "MODEL_ID",
  "hugging_face_access_token": "ACCESS_TOKEN",
  "model_config": {
    "accept_eula": "true"
  }
}
EOF
```
Then execute the following command to send your REST request:
```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy"
```
PowerShell
Save the request body in a file named `request.json`. Run the following command in the terminal to create or overwrite this file in the current directory:

```
@'
{
  "hugging_face_model_id": "MODEL_ID",
  "hugging_face_access_token": "ACCESS_TOKEN",
  "model_config": {
    "accept_eula": "true"
  }
}
'@ | Out-File -FilePath request.json -Encoding utf8
```
Then execute the following command to send your REST request:
```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy" | Select-Object -Expand Content
```
You receive a JSON response similar to the following.
{ "name": "projects/PROJECT_ID/locations/us-central1LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployOperationMetadata", "genericMetadata": { "createTime": "2025-03-13T21:44:44.538780Z", "updateTime": "2025-03-13T21:44:44.538780Z" }, "publisherModel": "publishers/PUBLISHER_NAME/model/MODEL_NAME", "destination": "projects/PROJECT_ID/locations/LOCATION", "projectNumber": "PROJECT_ID" } }
Deploy a model with customizations
Before using any of the request data, make the following replacements:
- LOCATION: A region where the model is deployed.
- PROJECT_ID: Your project ID.
- MODEL_ID: The ID of the model to deploy, which you can get from listing all the deployable models. The ID uses the following format: publishers/PUBLISHER_NAME/models/MODEL_NAME@MODEL_VERSION, such as `google/gemma@gemma-2b` or `stabilityai/stable-diffusion-xl-base-1.0`.
- MACHINE_TYPE: Defines the set of resources to deploy for your model, such as `g2-standard-4`.
- ACCELERATOR_TYPE: Specifies accelerators to add to your deployment to help improve performance when working with intensive workloads, such as `NVIDIA_L4`.
- ACCELERATOR_COUNT: The number of accelerators to use in your deployment.
- `reservation_affinity_type`: To use an existing Compute Engine reservation for your deployment, specify any reservation or a specific one. If you specify this value, don't specify `spot`.
- `spot`: Whether to use Spot VMs for your deployment.
- IMAGE_URI: The location of the container image to use, such as `us-docker.pkg.dev/vertex-ai/vertex-vision-model-garden-dockers/pytorch-vllm-serve:20241016_0916_RC00_maas`.
- CONTAINER_ARGS: Arguments to pass to the container during the deployment.
- CONTAINER_PORT: A port number for your container.
- `fast_tryout_enabled`: When testing a model, you can choose to use a faster deployment. This option is available only for highly used models with certain machine types. If enabled, you can't specify model or deployment configurations.
HTTP method and URL:
```
POST https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy
```
Request JSON body:
{ "publisher_model_name": "MODEL_ID", "deploy_config": { "dedicated_resources": { "machine_spec": { "machine_type": "MACHINE_TYPE", "accelerator_type": "ACCELERATOR_TYPE", "accelerator_count": ACCELERATOR_COUNT, "reservation_affinity": { "reservation_affinity_type": "ANY_RESERVATION" } }, "spot": "false" } }, "model_config": { "accept_eula": "true", "container_spec": { "image_uri": "IMAGE_URI", "args": [CONTAINER_ARGS ], "ports": [ { "container_port": CONTAINER_PORT } ] } }, "deploy_config": { "fast_tryout_enabled": false }, }
To send your request, choose one of these options:
curl
Save the request body in a file named `request.json`. Run the following command in the terminal to create or overwrite this file in the current directory:

```
cat > request.json << 'EOF'
{
  "publisher_model_name": "MODEL_ID",
  "deploy_config": {
    "dedicated_resources": {
      "machine_spec": {
        "machine_type": "MACHINE_TYPE",
        "accelerator_type": "ACCELERATOR_TYPE",
        "accelerator_count": ACCELERATOR_COUNT,
        "reservation_affinity": {
          "reservation_affinity_type": "ANY_RESERVATION"
        }
      },
      "spot": "false"
    },
    "fast_tryout_enabled": false
  },
  "model_config": {
    "accept_eula": "true",
    "container_spec": {
      "image_uri": "IMAGE_URI",
      "args": [CONTAINER_ARGS],
      "ports": [
        {
          "container_port": CONTAINER_PORT
        }
      ]
    }
  }
}
EOF
```
Then execute the following command to send your REST request:
```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy"
```
PowerShell
Save the request body in a file named `request.json`. Run the following command in the terminal to create or overwrite this file in the current directory:

```
@'
{
  "publisher_model_name": "MODEL_ID",
  "deploy_config": {
    "dedicated_resources": {
      "machine_spec": {
        "machine_type": "MACHINE_TYPE",
        "accelerator_type": "ACCELERATOR_TYPE",
        "accelerator_count": ACCELERATOR_COUNT,
        "reservation_affinity": {
          "reservation_affinity_type": "ANY_RESERVATION"
        }
      },
      "spot": "false"
    },
    "fast_tryout_enabled": false
  },
  "model_config": {
    "accept_eula": "true",
    "container_spec": {
      "image_uri": "IMAGE_URI",
      "args": [CONTAINER_ARGS],
      "ports": [
        {
          "container_port": CONTAINER_PORT
        }
      ]
    }
  }
}
'@ | Out-File -FilePath request.json -Encoding utf8
```
Then execute the following command to send your REST request:
```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:deploy" | Select-Object -Expand Content
```
You receive a JSON response similar to the following.
{ "name": "projects/PROJECT_ID/locations/LOCATION/operations/OPERATION_ID", "metadata": { "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployOperationMetadata", "genericMetadata": { "createTime": "2025-03-13T21:44:44.538780Z", "updateTime": "2025-03-13T21:44:44.538780Z" }, "publisherModel": "publishers/google/models/gemma3@gemma-3-1b-it", "destination": "projects/PROJECT_ID/locations/LOCATION", "projectNumber": "PROJECT_ID" } }
Console
- In the Google Cloud console, go to the Model Garden page.
- Find a model to deploy and click its model card.
- Click Deploy to open the Deploy model pane.
- In the Deploy model pane, configure your deployment settings.
- Accept or modify the generated model and endpoint names.
- Select a location to create your model endpoint in.
- Select a machine type to use for each node of your deployment.
- To use a Compute Engine reservation, in the Deployment settings section, select Advanced. For the Reservation type field, select a reservation type. The reservation must match your specified machine specs.
- Automatically use created reservation: Vertex AI selects an available reservation with matching properties. If no capacity is available in the selected reservation, Vertex AI uses the general Google Cloud resource pool.
- Select specific reservations: Vertex AI uses a specific reservation. If no capacity is available in your selected reservation, the deployment fails.
- Don't use (default): Vertex AI uses the general Google Cloud resource pool.
- Click Deploy.
Terraform
To learn how to apply or remove a Terraform configuration, see Basic Terraform commands. For more information, see the Terraform provider reference documentation.
Deploy a model
The following example deploys the `gemma-3-1b-it` model to a new Vertex AI endpoint in `us-central1` by using default configurations.

```hcl
terraform {
  required_providers {
    google = {
      source  = "hashicorp/google"
      version = "6.45.0"
    }
  }
}

provider "google" {
  region = "us-central1"
}

resource "google_vertex_ai_endpoint_with_model_garden_deployment" "gemma_deployment" {
  publisher_model_name = "publishers/google/models/gemma3@gemma-3-1b-it"
  location             = "us-central1"

  model_config {
    accept_eula = true
  }
}
```
To deploy a model with customization, see Vertex AI Endpoint with Model Garden Deployment for details.
Apply the configuration

```
terraform init
terraform plan
terraform apply
```
After you apply the configuration, Terraform provisions a new Vertex AI endpoint and deploys the specified open model.
Clean up
To delete the endpoint and model deployment, run the following command:
```
terraform destroy
```
Deploy a partner model
Before you begin, you need an agreement with the partner. This agreement includes partner-specific terms, licensing, and pricing. For more information, see the partner's documentation on their Model Garden model card; to contact the partner, click Contact sales on the model card.
You must deploy on the machine types required by the partner, which are listed in the "Recommended hardware configuration" section on their Model Garden model card. The model serving resources are located in a secure, Google-managed project.
For self-deploy partner models, if you have sufficient quotas but encounter serving quota issues during deployment, contact your Google Cloud account team for assistance.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
In your code, replace the following placeholders with your information. The values for `machine_type`, `accelerator_type`, and `accelerator_count` must match one of the partner's recommended hardware configurations. To find the recommended configurations, see the model card in Model Garden.
- LOCATION: The region where you plan to deploy the model and endpoint.
- PROJECT_ID: Your project ID.
- DISPLAY_NAME: A descriptive name for the associated resource.
- PUBLISHER_NAME: The name of the partner that provides the model to upload or deploy.
- PUBLISHER_MODEL_NAME: The name of the model to upload.
- MACHINE_TYPE: Defines the set of resources to deploy for your model, such as `g2-standard-4`.
- ACCELERATOR_TYPE: Specifies accelerators to add to your deployment, such as `NVIDIA_L4`.
- ACCELERATOR_COUNT: The number of accelerators to use.
- REQUEST_PAYLOAD: The fields and values to include in your prediction request. View the partner's Model Garden model card to see the available fields.
```python
import json

from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION)

# Upload a model
model = aiplatform.Model.upload(
    display_name="DISPLAY_NAME_MODEL",
    model_garden_source_model_name="publishers/PUBLISHER_NAME/models/PUBLISHER_MODEL_NAME",
)

# Create endpoint
my_endpoint = aiplatform.Endpoint.create(display_name="DISPLAY_NAME_ENDPOINT")

# Deploy model
MACHINE_TYPE = "MACHINE_TYPE"  # @param {type: "string"}
ACCELERATOR_TYPE = "ACCELERATOR_TYPE"  # @param {type: "string"}
ACCELERATOR_COUNT = ACCELERATOR_COUNT  # @param {type: "number"}

model.deploy(
    endpoint=my_endpoint,
    deployed_model_display_name="DISPLAY_NAME_DEPLOYED_MODEL",
    traffic_split={"0": 100},
    machine_type=MACHINE_TYPE,
    accelerator_type=ACCELERATOR_TYPE,
    accelerator_count=ACCELERATOR_COUNT,
    min_replica_count=1,
    max_replica_count=1,
)

# Unary call for predictions
PAYLOAD = {REQUEST_PAYLOAD}
request = json.dumps(PAYLOAD)

response = my_endpoint.raw_predict(
    body=request,
    headers={"Content-Type": "application/json"},
)
print(response)

# Streaming call for predictions
PAYLOAD = {REQUEST_PAYLOAD}
request = json.dumps(PAYLOAD)

for stream_response in my_endpoint.stream_raw_predict(
    body=request,
    headers={"Content-Type": "application/json"},
):
    print(stream_response)
```
REST
In the following sample curl commands, replace the placeholders with your information.
The values for `machineType`, `acceleratorType`, and `acceleratorCount` must match one of the partner's recommended hardware configurations. To find the recommended configurations, see the model card in Model Garden.
- LOCATION: The region where you plan to deploy the model and endpoint.
- PROJECT_ID: Your project ID.
- DISPLAY_NAME: A descriptive name for the associated resource.
- PUBLISHER_NAME: The name of the partner that provides the model to upload or deploy.
- PUBLISHER_MODEL_NAME: The name of the model to upload.
- ENDPOINT_ID: The ID of the endpoint.
- MACHINE_TYPE: Defines the set of resources to deploy for your model, such as `g2-standard-4`.
- ACCELERATOR_TYPE: Specifies accelerators to add to your deployment, such as `NVIDIA_L4`.
- ACCELERATOR_COUNT: The number of accelerators to use.
- REQUEST_PAYLOAD: The fields and values to include in your prediction request. View the partner's Model Garden model card to see the available fields.
Upload a model to add it to Model Registry.
```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://LOCATION-aiplatform.googleapis.com/v1beta1/projects/PROJECT_ID/locations/LOCATION/models:upload \
  -d '{
    "model": {
      "displayName": "DISPLAY_NAME_MODEL",
      "baseModelSource": {
        "modelGardenSource": {
          "publicModelName": "publishers/PUBLISHER_NAME/models/PUBLISHER_MODEL_NAME"
        }
      }
    }
  }'
```
Create an endpoint.
```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints \
  -d '{
    "displayName": "DISPLAY_NAME_ENDPOINT"
  }'
```
Deploy the uploaded model to the endpoint.
```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:deployModel \
  -d '{
    "deployedModel": {
      "model": "projects/PROJECT_ID/locations/LOCATION/models/MODEL_ID",
      "displayName": "DISPLAY_NAME_DEPLOYED_MODEL",
      "dedicatedResources": {
        "machineSpec": {
          "machineType": "MACHINE_TYPE",
          "acceleratorType": "ACCELERATOR_TYPE",
          "acceleratorCount": "ACCELERATOR_COUNT"
        },
        "minReplicaCount": 1,
        "maxReplicaCount": 1
      }
    },
    "trafficSplit": {
      "0": 100
    }
  }'
```
After the model is deployed, you can make a unary or streaming call for predictions. To see which API methods are supported, view the partner's Model Garden model card.
Sample unary call:
```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:rawPredict \
  -d 'REQUEST_PAYLOAD'
```
Sample streaming call:
```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID:streamRawPredict \
  -d 'REQUEST_PAYLOAD'
```
Console
- In the Google Cloud console, go to the Model Garden page.
- To find a specific model, enter its name in the Model Garden search box.
- To view all models that you can self-deploy, in the Model collections section of the filter pane, select Self-deploy partner models. The page lists all self-deployable partner models.
- Click the name of the model that you want to deploy to open its model card.
- Click Deploy options.
- In the Deploy on Vertex AI pane, configure your deployment, such as the location and machine type.
- Click Deploy.
After the deployment is complete, you can request predictions by using the SDK or API. For more instructions, see the Documentation section on the model card.
View or manage an endpoint
To view and manage your endpoint, go to the Vertex AI Online prediction page.
Vertex AI lists all endpoints in your project for the selected region. To view an endpoint's details, such as which models are deployed to it, click the endpoint.
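If you prefer to inspect endpoints programmatically, the following is a minimal sketch using the Vertex AI SDK's `aiplatform.Endpoint.list` (the same call that the undeploy samples below reference); PROJECT_ID and LOCATION are placeholders for your values:

```python
from google.cloud import aiplatform

# Assumption: PROJECT_ID and LOCATION are your project ID and region.
aiplatform.init(project="PROJECT_ID", location="LOCATION")

# List every endpoint in the project and region, then print basic details.
for endpoint in aiplatform.Endpoint.list():
    print(endpoint.display_name, endpoint.resource_name)
```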
Undeploy models and delete resources
To stop a deployed model from using resources in your project, undeploy your model from its endpoint. You must undeploy a model before you can delete the endpoint and the model.
Undeploy models
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
In your code, replace the following placeholders:
- PROJECT_ID: Your project ID.
- LOCATION: Your region, for example, `us-central1`.
- ENDPOINT_ID: Your endpoint ID.
```python
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION)

# To find out which endpoints are available, uncomment the line below:
# endpoints = aiplatform.Endpoint.list()

endpoint = aiplatform.Endpoint(ENDPOINT_ID)
endpoint.undeploy_all()
```
gcloud
In these commands, replace the following placeholders:
- PROJECT_ID: Your project ID.
- LOCATION_ID: The region where you deployed the model and endpoint.
- ENDPOINT_ID: The endpoint ID.
- MODEL_ID: The model ID.
- DEPLOYED_MODEL_ID: The deployed model ID.
To find the endpoint ID associated with your deployment, run the `gcloud ai endpoints list` command.

```
gcloud ai endpoints list \
  --project=PROJECT_ID \
  --region=LOCATION_ID
```

To find the model ID, run the `gcloud ai models list` command.

```
gcloud ai models list \
  --project=PROJECT_ID \
  --region=LOCATION_ID
```

Using the model ID from the previous command, get the deployed model ID by running the `gcloud ai models describe` command.

```
gcloud ai models describe MODEL_ID \
  --project=PROJECT_ID \
  --region=LOCATION_ID
```

The output looks similar to the following example. The ID is the value of `deployedModelId`.

```
Using endpoint [https://us-central1-aiplatform.googleapis.com/]
artifactUri: [URI removed]
baseModelSource:
  modelGardenSource:
    publicModelName: publishers/google/models/gemma2
...
deployedModels:
- deployedModelId: '1234567891234567891'
  endpoint: projects/12345678912/locations/us-central1/endpoints/12345678912345
displayName: gemma2-2b-it-12345678912345
etag: [ETag removed]
modelSourceInfo:
  sourceType: MODEL_GARDEN
name: projects/123456789123/locations/us-central1/models/gemma2-2b-it-12345678912345
...
```

To undeploy the model from the endpoint, run the `gcloud ai endpoints undeploy-model` command with the endpoint ID and the deployed model ID from the previous commands.

```
gcloud ai endpoints undeploy-model ENDPOINT_ID \
  --project=PROJECT_ID \
  --region=LOCATION_ID \
  --deployed-model-id=DEPLOYED_MODEL_ID
```
This command doesn't return any output.
Console
- In the Google Cloud console, go to the Endpoints tab on the Online prediction page.
- In the Region drop-down list, choose the region where your endpoint is located.
- Click the endpoint name to open the details page.
- On the row for the model, click Actions, and then select Undeploy model from endpoint.
- In the Undeploy model from endpoint dialog, click Undeploy.
Delete endpoints
After you undeploy all models from an endpoint, you can delete it.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
In your code, replace the following placeholders:
- PROJECT_ID: Your project ID.
- LOCATION: Your region, for example, `us-central1`.
- ENDPOINT_ID: Your endpoint ID.
```python
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION)

# To find out which endpoints are available, uncomment the line below:
# endpoints = aiplatform.Endpoint.list()

endpoint = aiplatform.Endpoint(ENDPOINT_ID)
endpoint.delete()
```
gcloud
In these commands, replace the following placeholders:
- PROJECT_ID: Your project ID.
- LOCATION_ID: The region where you deployed the model and endpoint.
- ENDPOINT_ID: The endpoint ID.
To get the ID of the endpoint to delete, run the `gcloud ai endpoints list` command. This command lists the endpoint IDs for all endpoints in your project.

```
gcloud ai endpoints list \
  --project=PROJECT_ID \
  --region=LOCATION_ID
```

To delete the endpoint, run the `gcloud ai endpoints delete` command.

```
gcloud ai endpoints delete ENDPOINT_ID \
  --project=PROJECT_ID \
  --region=LOCATION_ID
```

When prompted, enter `y` to confirm. This command doesn't return any output.
Console
- In the Google Cloud console, go to the Endpoints tab on the Online prediction page.
- In the Region drop-down list, choose the region where your endpoint is located.
- At the end of the endpoint's row, click Actions, and then select Delete endpoint.
- In the confirmation prompt, click Confirm.
Delete models
After a model is undeployed from all endpoints, you can delete the model resource.
Python
To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.
In your code, replace the following placeholders:
- PROJECT_ID: Your project ID.
- LOCATION: Your region, for example, `us-central1`.
- MODEL_ID: Your model ID.
```python
from google.cloud import aiplatform

aiplatform.init(project=PROJECT_ID, location=LOCATION)

# To find out which models are available in Model Registry, uncomment the line below:
# models = aiplatform.Model.list()

model = aiplatform.Model(MODEL_ID)
model.delete()
```
gcloud
In these commands, replace the following placeholders:
- PROJECT_ID: Your project ID.
- LOCATION_ID: The region where you deployed the model and endpoint.
- MODEL_ID: The model ID.
To find the ID of the model to delete, run the `gcloud ai models list` command.

```
gcloud ai models list \
  --project=PROJECT_ID \
  --region=LOCATION_ID
```

To delete the model, run the `gcloud ai models delete` command with the model ID and its location.

```
gcloud ai models delete MODEL_ID \
  --project=PROJECT_ID \
  --region=LOCATION_ID
```
Console
- In the Google Cloud console, go to the Model Registry page.
- In the Region drop-down list, choose the region where you deployed your model.
- On the row for your model, click Actions, and then select Delete model. Deleting a model also deletes all of its associated versions and evaluations from your Google Cloud project.
- In the confirmation prompt, click Delete.
View code samples
Most model cards for task-specific solution models contain code samples that you can copy and test.
- In the Google Cloud console, go to the Model Garden page.
- Find a model that you want to view code samples for and click its Documentation tab.

The page scrolls to the documentation section, which contains embedded code samples.
Create a vision app
The model cards for some computer vision models let you create a vision application.
- In the Google Cloud console, go to the Model Garden page.
- In the Task specific solutions section, find a vision model that you want to use to create a vision application, and click View details.
- Click Build app. Vertex AI Vision opens.
- In Application name, enter a name for your application and click Continue.
- Select a billing plan and click Create. Vertex AI Vision Studio opens, where you can continue to create your computer vision application.