# Use Spot VMs with inference
Overview
--------
You can reduce the cost of running your prediction jobs by using Spot VMs.
Spot VMs are virtual machine (VM) instances that are excess Compute Engine capacity. Spot VMs have significant discounts, but Compute Engine might preemptively stop or delete (preempt) Spot VMs to reclaim the capacity at any time.

To learn more, see [Spot VMs](/compute/docs/instances/spot).
Limitations and requirements
----------------------------

Consider the following limitations and requirements when using Spot VMs with Vertex AI:

- All [Spot VMs limitations](/compute/docs/instances/spot#limitations) apply when using Spot VMs with Vertex AI.
- Using Spot VMs with Vertex AI is supported only for custom training and inference.
- Using Spot VMs with TPU Pods isn't supported.
- Submitting your job through the Google Cloud console isn't supported.
Billing
-------
If your workloads are fault-tolerant and can withstand possible VM preemption, Spot VMs can reduce your compute costs significantly.
If some of your VMs stop during processing, the job slows but doesn't completely stop. Spot VMs complete your batch processing tasks without placing additional load on your existing VMs and without requiring you to pay full price for additional standard VMs.
For details, see [Preemption handling](#preemption-handling).

When you use Spot VMs, you're billed by job duration and machine type. You don't pay for the time that the job is in a queue or preempted.

**Note:** When consuming from a reservation or spot capacity, billing is spread across two SKUs: the Compute Engine SKU with the label `vertex-ai-online-prediction` and the Vertex AI Management Fee SKU. This enables you to use your Committed Use Discounts (CUDs) in Vertex AI.
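To make the billing model concrete, the following is a minimal sketch of estimating the cost of a job from its billable duration. The hourly rate, replica count, and duration below are hypothetical placeholders, not published prices; check the Compute Engine Spot pricing page for the actual rate of your machine type.

```python
# Minimal sketch: estimate the billable cost of a prediction job on Spot VMs.
# The rate and replica count are hypothetical placeholders, not published prices.
SPOT_HOURLY_RATE_USD = 0.10   # hypothetical Spot rate for the chosen machine type
REPLICA_COUNT = 2             # number of nodes serving the job

def estimate_cost(billable_hours: float) -> float:
    """Billable time excludes time the job spends queued or preempted."""
    return billable_hours * SPOT_HOURLY_RATE_USD * REPLICA_COUNT

# Example: a job that accumulated 3.5 billable hours.
print(f"Estimated cost: ${estimate_cost(3.5):.2f}")
```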
Preemption handling
-------------------
Spot VMs can be reclaimed by Compute Engine at any time, so your prediction job must be fault-tolerant to benefit from them. When Spot VMs are preempted, the prediction job fails with a `STOCKOUT` error, and Compute Engine tries to restart the job up to six times. To learn how to get the most out of your Spot VMs, see [Spot VM best practices](/compute/docs/instances/create-use-spot#best-practices).
The following are some ways to make your prediction job resilient to preemption:

- Create checkpoints to save progress. By periodically storing your model's progress, a preempted prediction job can resume from the last stored checkpoint instead of starting over. A minimal checkpointing sketch follows this list.
- Use a shutdown script. When Compute Engine preempts a Spot VM, you can use a shutdown script that performs cleanup actions before the VM is preempted. To learn more, see the Compute Engine documentation on handling preemption with a shutdown script.
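The following is a minimal, framework-agnostic sketch of checkpoint-based resumption. The checkpoint path and the `process()` function are hypothetical placeholders; in practice you would write the checkpoint to durable storage such as Cloud Storage so it survives the preempted VM.

```python
import json
import os

# Hypothetical checkpoint location; use durable storage (for example, Cloud
# Storage) in a real job so the state outlives the preempted VM.
CHECKPOINT_PATH = "/tmp/batch_checkpoint.json"

def load_checkpoint() -> int:
    """Return the index of the last fully processed item, or -1 if none."""
    if os.path.exists(CHECKPOINT_PATH):
        with open(CHECKPOINT_PATH) as f:
            return json.load(f)["last_index"]
    return -1

def save_checkpoint(index: int) -> None:
    """Persist the index of the last fully processed item."""
    with open(CHECKPOINT_PATH, "w") as f:
        json.dump({"last_index": index}, f)

def process(item):
    """Placeholder for the real per-item prediction work."""
    return item

def run_batch(items: list) -> None:
    start = load_checkpoint() + 1   # resume after the last saved item
    for i in range(start, len(items)):
        process(items[i])
        save_checkpoint(i)          # save progress after each item

if __name__ == "__main__":
    run_batch(["a", "b", "c"])
```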
Get inferences by using Spot VMs
--------------------------------
To use Spot VMs when you deploy a model to get inferences, you can use the REST API or the Vertex AI SDK for Python.
[[["Leicht verständlich","easyToUnderstand","thumb-up"],["Mein Problem wurde gelöst","solvedMyProblem","thumb-up"],["Sonstiges","otherUp","thumb-up"]],[["Schwer verständlich","hardToUnderstand","thumb-down"],["Informationen oder Beispielcode falsch","incorrectInformationOrSampleCode","thumb-down"],["Benötigte Informationen/Beispiele nicht gefunden","missingTheInformationSamplesINeed","thumb-down"],["Problem mit der Übersetzung","translationIssue","thumb-down"],["Sonstiges","otherDown","thumb-down"]],["Zuletzt aktualisiert: 2025-08-28 (UTC)."],[],[],null,["# Use Spot VMs with inference\n\nOverview\n--------\n\n\nYou can reduce the cost of running your prediction jobs by using Spot VMs.\nSpot VMs are virtual machine (VM) instances that are excess\nCompute Engine capacity. Spot VMs have significant discounts,\nbut Compute Engine might preemptively stop or delete (preempt) Spot VMs to\nreclaim the capacity at any time.\n\nTo learn more, see [Spot VMs](/compute/docs/instances/spot).\n\nLimitations and requirements\n----------------------------\n\n\nConsider the following limitations and requirements when using\nSpot VMs with Vertex AI:\n\n- All [Spot VMs\n limitations](/compute/docs/instances/spot#limitations) apply when using Spot VMs with Vertex AI.\n- Using Spot VMs with Vertex AI is supported only for custom training and inference.\n- Using Spot VMs with TPU Pods isn't supported.\n- Submitting your job through the Google Cloud console is not supported.\n\nBilling\n-------\n\n\nIf your workloads are fault-tolerant and can withstand possible VM preemption,\nSpot VMs can reduce your compute costs significantly.\nIf some of your VMs stop during processing, the job slows but does not completely\nstop. Spot VMs complete your batch processing tasks without placing additional\nload on your existing VMs and without requiring you to pay full price for additional standard VMs.\nSee [Preemption handling](#preemption-handling).\n\n\nWhen you use Spot VMs, you're billed by job duration and machine type.\nYou don't pay for the time that the job is in a queue or preempted.\n\n\n| **Note:** When consuming from a reservation or spot capacity, billing is spread across two SKUs: the Compute Engine SKU with the label `vertex-ai-online-prediction` and the Vertex AI Management Fee SKU. This enables you to use your Committed Use Discounts (CUDs) in Vertex AI.\n\n\u003cbr /\u003e\n\nPreemption handling\n-------------------\n\n\nSpot VMs can be reclaimed by Compute Engine at any time.\n\nWhen Spot VMs are preempted, the prediction job\nfails with a `STOCKOUT` error and Compute Engine tries to restart the job up\nto six times. To learn how to get the most out of your Spot VMs, see\n[Spot VM\nbest practices](/compute/docs/instances/create-use-spot#best-practices).\n\nGet inferences by using Spot VMs\n--------------------------------\n\nTo use Spot VMs when you deploy a model\nto get inferences, you can use the REST API or the Vertex AI SDK for Python. 
### REST

Before using any of the request data, make the following replacements:

- `LOCATION_ID`: The region where you are using Vertex AI.
- `PROJECT_ID`: Your [project ID](/resource-manager/docs/creating-managing-projects#identifiers).
- `ENDPOINT_ID`: The ID for the endpoint.
- `MODEL_ID`: The ID for the model to be deployed.
- `DEPLOYED_MODEL_NAME`: A name for the `DeployedModel`. You can use the display name of the `Model` for the `DeployedModel` as well.
- `MACHINE_TYPE`: Optional. The machine resources used for each node of this deployment. Its default setting is `n1-standard-2`. [Learn more about machine types.](/vertex-ai/docs/predictions/configure-compute)
- `ACCELERATOR_TYPE`: Optional. The type of accelerator to be attached to the machine. [Learn more](/vertex-ai/docs/predictions/configure-compute#gpus).
- `ACCELERATOR_COUNT`: Optional. The number of accelerators for each replica to use.
- `MIN_REPLICA_COUNT`: The minimum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to the maximum number of nodes and never fewer than this number of nodes. This value must be greater than or equal to 1.
- `MAX_REPLICA_COUNT`: The maximum number of nodes for this deployment. The node count can be increased or decreased as required by the inference load, up to this number of nodes and never fewer than the minimum number of nodes.
- `TRAFFIC_SPLIT_THIS_MODEL`: The percentage of the prediction traffic to this endpoint to be routed to the model being deployed with this operation. Defaults to 100. All traffic percentages must add up to 100. [Learn more about traffic splits](/vertex-ai/docs/general/deployment#models-endpoint).
- `DEPLOYED_MODEL_ID_N`: Optional. If other models are deployed to this endpoint, you must update their traffic split percentages so that all percentages add up to 100.
- `TRAFFIC_SPLIT_MODEL_N`: The traffic split percentage value for the deployed model ID key.
- `PROJECT_NUMBER`: Your project's automatically generated [project number](/resource-manager/docs/creating-managing-projects#identifiers).

HTTP method and URL:

```
POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel
```

Request JSON body:

```
{
  "deployedModel": {
    "model": "projects/PROJECT/locations/us-central1/models/MODEL_ID",
    "displayName": "DEPLOYED_MODEL_NAME",
    "enableContainerLogging": true,
    "dedicatedResources": {
      "machineSpec": {
        "machineType": "MACHINE_TYPE",
        "acceleratorType": "ACCELERATOR_TYPE",
        "acceleratorCount": ACCELERATOR_COUNT
      },
      "spot": true,
      "minReplicaCount": MIN_REPLICA_COUNT,
      "maxReplicaCount": MAX_REPLICA_COUNT
    }
  },
  "trafficSplit": {
    "0": TRAFFIC_SPLIT_THIS_MODEL,
    "DEPLOYED_MODEL_ID_1": TRAFFIC_SPLIT_MODEL_1,
    "DEPLOYED_MODEL_ID_2": TRAFFIC_SPLIT_MODEL_2
  }
}
```

To send your request, use one of these options:

#### curl (Linux, macOS, or Cloud Shell)

**Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login), or by using [Cloud Shell](/shell/docs), which automatically logs you into the `gcloud` CLI. You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).

Save the request body in a file named `request.json`, and execute the following command:

```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json; charset=utf-8" \
  -d @request.json \
  "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel"
```

#### PowerShell (Windows)

**Note:** The following command assumes that you have logged in to the `gcloud` CLI with your user account by running [`gcloud init`](/sdk/gcloud/reference/init) or [`gcloud auth login`](/sdk/gcloud/reference/auth/login).
You can check the currently active account by running [`gcloud auth list`](/sdk/gcloud/reference/auth/list).

Save the request body in a file named `request.json`, and execute the following command:

```
$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
  -Method POST `
  -Headers $headers `
  -ContentType: "application/json; charset=utf-8" `
  -InFile request.json `
  -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel" | Select-Object -Expand Content
```

You should receive a JSON response similar to the following:

```
{
  "name": "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1beta1.DeployModelOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-10-19T17:53:16.502088Z",
      "updateTime": "2020-10-19T17:53:16.502088Z"
    }
  }
}
```

### Python

To learn how to install or update the Vertex AI SDK for Python, see [Install the Vertex AI SDK for Python](/vertex-ai/docs/start/install-sdk). For more information, see the [Vertex AI SDK for Python API reference documentation](/python/docs/reference/aiplatform/latest).

```python
# endpoint5 is a previously created aiplatform.Endpoint, and model is the
# aiplatform.Model to deploy; setting spot=True deploys it on Spot VMs.
endpoint5.deploy(
    model=model,
    deployed_model_display_name=DEPLOYED_NAME,
    traffic_split=TRAFFIC_SPLIT,
    machine_type="MACHINE_TYPE",
    accelerator_type="ACCELERATOR_TYPE",
    accelerator_count=ACCELERATOR_COUNT,
    min_replica_count=MIN_REPLICA_COUNT,
    max_replica_count=MAX_REPLICA_COUNT,
    spot=True,
    sync=True
)
```

What's next
-----------

- Learn more about [Spot VMs](/compute/docs/instances/spot).
- To learn more about Compute Engine VMs in general, read the [Virtual machine instances](/compute/docs/instances) documentation.
- To learn how to create Spot VMs, read [Create and use Spot VMs](/compute/docs/instances/create-use-spot).
- [Use Spot VMs with Vertex AI training](/vertex-ai/docs/training/use-spot-vms).
- [Use Flex-start VMs with Vertex AI inference](/vertex-ai/docs/predictions/use-flex-start-vms).