This guide shows you how to troubleshoot 429 resource exhausted errors for different quota frameworks in Vertex AI. A 429 error indicates that the number of your requests exceeds the capacity allocated to process them.
The following table shows the error message for each quota framework:
| Quota framework | Message |
|------------------------|-----------------------------------------------------------|
| Pay-as-you-go | `Resource exhausted, please try again later.` |
| Provisioned Throughput | `Too many requests. Exceeded the Provisioned Throughput.` |
Troubleshoot pay-as-you-go errors
In the pay-as-you-go model, you use a shared pool of resources. If resources aren't available when you make a request, Vertex AI returns a 429 error. This error doesn't count against your error rate as described in your service level agreement (SLA).
To resolve 429 errors, consider the following options:

- If possible, use the global endpoint instead of a regional endpoint.
- Implement a retry strategy that uses truncated exponential backoff, as shown in the sketch after this list.
- If your model uses quotas, you can submit a Quota Increase Request (QIR). If your model uses Dynamic shared quota, smoothing traffic and reducing large spikes can help. For more information, see Dynamic shared quota (DSQ).
- Subscribe to Provisioned Throughput for a more consistent level of service. For more information, see Provisioned Throughput.
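The following is a minimal sketch of a truncated exponential backoff retry in Python. It assumes that the Vertex AI Python client surfaces 429 responses as `google.api_core.exceptions.ResourceExhausted`; the helper name, retry limits, and the commented usage with a placeholder model ID are illustrative rather than a prescribed implementation.

```python
import random
import time

# google-api-core maps HTTP 429 / gRPC RESOURCE_EXHAUSTED to this exception class.
from google.api_core.exceptions import ResourceExhausted


def call_with_backoff(send_request, max_attempts=5, base_delay=1.0, max_delay=32.0):
    """Retry a callable with truncated exponential backoff plus jitter on 429 errors."""
    for attempt in range(max_attempts):
        try:
            return send_request()
        except ResourceExhausted:
            if attempt == max_attempts - 1:
                raise  # Out of retries; surface the 429 to the caller.
            # Wait min(base * 2^attempt, max_delay) seconds, plus up to 1 second of jitter.
            delay = min(base_delay * (2 ** attempt), max_delay)
            time.sleep(delay + random.uniform(0, 1))


# Illustrative usage with the Vertex AI SDK (model ID and prompt are placeholders):
# from vertexai.generative_models import GenerativeModel
# model = GenerativeModel("gemini-2.0-flash")
# response = call_with_backoff(lambda: model.generate_content("Hello"))
```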
Troubleshoot Provisioned Throughput errors
If you have a Provisioned Throughput subscription, you receive a 429 error when your requests exceed your reserved throughput and you have configured your endpoint to reject overages.
To resolve 429 errors, you can do one of the following:

- Configure your endpoint to process overages on-demand, which is the default behavior and doesn't require setting a header in prediction requests (see the sketch after this list). With this setting, overages are billed as pay-as-you-go instead of being rejected.
- Increase the number of GSUs in your Provisioned Throughput subscription.
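The following is a minimal sketch of a `generateContent` REST call that relies on the default overage behavior by not setting a Provisioned Throughput request-type header. The project, location, model ID, and access token are placeholders, and the commented header name is an assumption to verify against the Provisioned Throughput documentation before use.

```python
import requests

# Placeholders: replace with your project, region, model, and an OAuth token,
# for example from `gcloud auth print-access-token`.
PROJECT_ID = "your-project"
LOCATION = "us-central1"
MODEL_ID = "gemini-2.0-flash"
ACCESS_TOKEN = "ya29...."

url = (
    f"https://{LOCATION}-aiplatform.googleapis.com/v1/projects/{PROJECT_ID}"
    f"/locations/{LOCATION}/publishers/google/models/{MODEL_ID}:generateContent"
)

headers = {
    "Authorization": f"Bearer {ACCESS_TOKEN}",
    "Content-Type": "application/json",
    # Default behavior: omit the Provisioned Throughput request-type header so
    # requests above your reserved throughput are processed on-demand instead of
    # being rejected. Adding a header such as
    #   "X-Vertex-AI-LLM-Request-Type": "dedicated"   # assumed header name; verify in the docs
    # would instead restrict the request to reserved throughput.
}

body = {"contents": [{"role": "user", "parts": [{"text": "Hello"}]}]}

response = requests.post(url, headers=headers, json=body)
print(response.status_code)
```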
Provisioned Throughput behavior
When you subscribe to Provisioned Throughput, Vertex AI reserves the purchased amount of throughput for your project. How Vertex AI handles requests depends on whether you use more or less than your purchased throughput:

- Under-utilization: If you use less than your purchased throughput, the handling of capacity-related errors depends on the type of Provisioned Throughput subscription:
  - Standard: Capacity-related errors that would otherwise be 429 are returned as 5XX and count toward the SLA error rate.
  - Single Zone: Capacity-related 429 errors are treated as 5XX but don't count toward the SLA error rate.
- Over-utilization: By default, when you exceed your purchased throughput, additional requests are processed on-demand and billed as pay-as-you-go.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-26 UTC."],[],[],null,["If the number of your requests exceeds the capacity allocated to process\nrequests, then error code `429` is returned. The following table displays the\nerror message generated by each type of quota framework:\n\n| Quota framework | Message |\n|------------------------|-----------------------------------------------------------|\n| Pay-as-you-go | `Resource exhausted, please try again later.` |\n| Provisioned Throughput | `Too many requests. Exceeded the Provisioned Throughput.` |\n\nWith a Provisioned Throughput subscription, you can reserve an\namount of throughput for specific generative AI models. If you don't have a\nProvisioned Throughput subscription and resources aren't available\nto your application, then an error code `429` is returned. Although you don't\nhave reserved capacity, you can try your request again. However, the request\nisn't counted against your error rate as described in your [service level\nagreement (SLA)](/vertex-ai/generative-ai/sla).\n\nFor projects that have purchased Provisioned Throughput,\nVertex AI measures a project's throughput and reserves the purchased\namount of throughput for the project's actual usage.\n\nFor standard Provisioned Throughput, when you use less than your\npurchased amount, errors that might otherwise be `429` are returned as `5XX` and\ncount toward the SLA error rate. For Single Zone Provisioned Throughput,\nwhen you use less than your purchased amount, capacity-related `429` errors are\ntreated as `5XX` but don't count toward the SLA error rate. When you exceed your\npurchased amount, the additional requests are processed on-demand as pay-as-you-go.\n\nPay-as-you-go\n\nOn the pay-as-you-go quota framework, you have the following options to\nresolving `429` errors:\n\n- Use the [global endpoint](/vertex-ai/generative-ai/docs/learn/locations#global-endpoint) instead of a regional endpoint whenever possible.\n- Implement a retry strategy by using [truncated exponential backoff](/storage/docs/retry-strategy#exponential-backoff).\n- If your model uses quotas, you can submit a Quota Increase Request (QIR). If your model uses [Dynamic shared\n quota](/vertex-ai/generative-ai/docs/dynamic-shared-quota#supported_models), smoothing traffic and reducing large spikes can help. For more information, see [Dynamic shared\n quota (DSQ)](/vertex-ai/generative-ai/docs/dynamic-shared-quota).\n- Subscribe to Provisioned Throughput for a more consistent level of service. For more information, see [Provisioned Throughput](/vertex-ai/generative-ai/docs/provisioned-throughput).\n\nProvisioned Throughput\n\nTo correct the 429 error generated by Provisioned Throughput, do the\nfollowing:\n\n- Use the [Default behavior\n example](/vertex-ai/generative-ai/docs/use-provisioned-throughput#default), which doesn't set a header in prediction requests. 
Any overages are processed on-demand and billed as pay-as-you-go.\n- Increase the number of GSUs in your Provisioned Throughput subscription.\n\nWhat's next\n\n- To learn more about dynamic shared quota, see [Dynamic shared\n quota](/vertex-ai/generative-ai/docs/dsq).\n- To learn more about Provisioned Throughput, see [Provisioned Throughput](/vertex-ai/generative-ai/docs/provisioned-throughput).\n- To learn about quotas and limits for Vertex AI, see [Vertex AI quotas and limits](/vertex-ai/docs/quotas).\n- To learn more about Google Cloud quotas and system limits, see the [Cloud Quotas documentation](/docs/quotas/overview).\n- To learn more about API errors, see [API errors](/vertex-ai/generative-ai/docs/model-reference/api-errors)."]]