This guide shows you how to troubleshoot 429 resource exhausted errors for different quota frameworks in Vertex AI. A 429 error indicates that the number of your requests exceeds the capacity allocated to process them.
The following table shows the error message for each quota framework:
| Quota framework | Message |
|------------------------|-----------------------------------------------------------|
| Pay-as-you-go | `Resource exhausted, please try again later.` |
| Provisioned Throughput | `Too many requests. Exceeded the Provisioned Throughput.` |
Troubleshoot pay-as-you-go errors
In the pay-as-you-go model, you use a shared pool of resources. If resources aren't available when you make a request, Vertex AI returns a 429 error. This error doesn't count against your error rate as described in your service level agreement (SLA).
To resolve 429 errors, consider the following options:

- If possible, use the global endpoint instead of a regional endpoint.
- Implement a retry strategy that uses truncated exponential backoff, as shown in the sketch after this list.
- If your model uses quotas, you can submit a Quota Increase Request (QIR). If your model uses Dynamic shared quota, smoothing traffic and reducing large spikes can help. For more information, see Dynamic shared quota (DSQ).
- Subscribe to Provisioned Throughput for a more consistent level of service. For more information, see Provisioned Throughput.
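The following is a minimal sketch of a truncated exponential backoff retry in Python. It assumes that the Vertex AI Python client surfaces 429 responses as `google.api_core.exceptions.ResourceExhausted`; the helper name, retry limits, and the commented usage with a placeholder model ID are illustrative rather than a prescribed implementation.

```python
import random
import time

# google-api-core maps HTTP 429 / gRPC RESOURCE_EXHAUSTED to this exception class.
from google.api_core.exceptions import ResourceExhausted


def call_with_backoff(send_request, max_attempts=5, base_delay=1.0, max_delay=32.0):
    """Retry a callable with truncated exponential backoff plus jitter on 429 errors."""
    for attempt in range(max_attempts):
        try:
            return send_request()
        except ResourceExhausted:
            if attempt == max_attempts - 1:
                raise  # Out of retries; surface the 429 to the caller.
            # Wait min(base * 2^attempt, max_delay) seconds, plus up to 1 second of jitter.
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay + random.uniform(0, 1))


# Illustrative usage with the Vertex AI SDK (model ID and prompt are placeholders):
# from vertexai.generative_models import GenerativeModel
# model = GenerativeModel("gemini-2.0-flash")
# response = call_with_backoff(lambda: model.generate_content("Hello"))
```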
Troubleshoot Provisioned Throughput errors
If you have a Provisioned Throughput subscription, you receive a 429 error when your requests exceed your reserved throughput and you have configured your endpoint to reject overages.
To resolve 429 errors, you can do one of the following:

- Configure your endpoint to process overages on-demand, which is the default behavior and doesn't require setting a header in prediction requests (see the sketch after this list). With this setting, overages are billed as pay-as-you-go instead of being rejected.
- Increase the number of GSUs in your Provisioned Throughput subscription.
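The following is a minimal sketch of a `generateContent` REST call that relies on the default overage behavior by not setting a Provisioned Throughput request-type header. The project, location, model ID, and access token are placeholders, and the commented header name is an assumption to verify against the Provisioned Throughput documentation before use.

```python
import requests

# Placeholders: replace with your project, region, model, and an OAuth token,
# for example from `gcloud auth print-access-token`.
PROJECT_ID = "your-project"
LOCATION = "us-central1"
MODEL_ID = "gemini-2.0-flash"
ACCESS_TOKEN = "ya29...."

url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{LOCATION}/publishers/google/models/{MODEL_ID}:generateContent"
)

headers = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json",
    # Default behavior: omit the Provisioned Throughput request-type header so
    # requests above your reserved throughput are processed on-demand instead of
    # being rejected. Adding a header such as
    #   "X-Vertex-AI-LLM-Request-Type": "dedicated"   # assumed header name; verify in the docs
    # would instead restrict the request to reserved throughput.
}

body = {"contents": [{"role": "user", "parts": [{"text": "Hello"}]}]}

response = requests.post(url, headers=headers, json=body)
print(response.status_code)
```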
Provisioned Throughput behavior
When you subscribe to Provisioned Throughput, Vertex AI reserves the purchased amount of throughput for your project. How Vertex AI handles requests depends on whether you use more or less than your purchased throughput:

- Under-utilization: If you use less than your purchased throughput, the handling of capacity-related errors depends on the type of Provisioned Throughput subscription:
  - Standard: Capacity-related errors that would otherwise be 429 are returned as 5XX and count toward the SLA error rate.
  - Single Zone: Capacity-related 429 errors are treated as 5XX but don't count toward the SLA error rate.
- Over-utilization: By default, when you exceed your purchased throughput, additional requests are processed on-demand and billed as pay-as-you-go.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-26 UTC."],[],[],null,["If the number of your requests exceeds the capacity allocated to process\nrequests, then error code `429` is returned. The following table displays the\nerror message generated by each type of quota framework:\n\n| Quota framework | Message |\n|------------------------|-----------------------------------------------------------|\n| Pay-as-you-go | `Resource exhausted, please try again later.` |\n| Provisioned Throughput | `Too many requests. Exceeded the Provisioned Throughput.` |\n\nWith a Provisioned Throughput subscription, you can reserve an\namount of throughput for specific generative AI models. If you don't have a\nProvisioned Throughput subscription and resources aren't available\nto your application, then an error code `429` is returned. Although you don't\nhave reserved capacity, you can try your request again. However, the request\nisn't counted against your error rate as described in your [service level\nagreement (SLA)](/vertex-ai/generative-ai/sla).\n\nFor projects that have purchased Provisioned Throughput,\nVertex AI measures a project's throughput and reserves the purchased\namount of throughput for the project's actual usage.\n\nFor standard Provisioned Throughput, when you use less than your\npurchased amount, errors that might otherwise be `429` are returned as `5XX` and\ncount toward the SLA error rate. For Single Zone Provisioned Throughput,\nwhen you use less than your purchased amount, capacity-related `429` errors are\ntreated as `5XX` but don't count toward the SLA error rate. When you exceed your\npurchased amount, the additional requests are processed on-demand as pay-as-you-go.\n\nPay-as-you-go\n\nOn the pay-as-you-go quota framework, you have the following options to\nresolving `429` errors:\n\n- Use the [global endpoint](/vertex-ai/generative-ai/docs/learn/locations#global-endpoint) instead of a regional endpoint whenever possible.\n- Implement a retry strategy by using [truncated exponential backoff](/storage/docs/retry-strategy#exponential-backoff).\n- If your model uses quotas, you can submit a Quota Increase Request (QIR). If your model uses [Dynamic shared\n quota](/vertex-ai/generative-ai/docs/dynamic-shared-quota#supported_models), smoothing traffic and reducing large spikes can help. For more information, see [Dynamic shared\n quota (DSQ)](/vertex-ai/generative-ai/docs/dynamic-shared-quota).\n- Subscribe to Provisioned Throughput for a more consistent level of service. For more information, see [Provisioned Throughput](/vertex-ai/generative-ai/docs/provisioned-throughput).\n\nProvisioned Throughput\n\nTo correct the 429 error generated by Provisioned Throughput, do the\nfollowing:\n\n- Use the [Default behavior\n example](/vertex-ai/generative-ai/docs/use-provisioned-throughput#default), which doesn't set a header in prediction requests. 
Any overages are processed on-demand and billed as pay-as-you-go.\n- Increase the number of GSUs in your Provisioned Throughput subscription.\n\nWhat's next\n\n- To learn more about dynamic shared quota, see [Dynamic shared\n quota](/vertex-ai/generative-ai/docs/dsq).\n- To learn more about Provisioned Throughput, see [Provisioned Throughput](/vertex-ai/generative-ai/docs/provisioned-throughput).\n- To learn about quotas and limits for Vertex AI, see [Vertex AI quotas and limits](/vertex-ai/docs/quotas).\n- To learn more about Google Cloud quotas and system limits, see the [Cloud Quotas documentation](/docs/quotas/overview).\n- To learn more about API errors, see [API errors](/vertex-ai/generative-ai/docs/model-reference/api-errors)."]]