Error code 429

Error code 429 is returned when the number of your requests exceeds the capacity allocated to process them. The following table shows the error message generated by each type of quota framework:

Quota framework         Message
Pay-as-you-go           Resource exhausted, please try again later.
Provisioned Throughput  Too many requests. Exceeded the Provisioned Throughput.

With a Provisioned Throughput subscription, you can reserve an amount of throughput for specific generative AI models. If you don't have a Provisioned Throughput subscription and resources aren't available to your application, then error code 429 is returned. Although you don't have reserved capacity, you can try your request again. The failed request, however, isn't counted against the error rate described in your service level agreement (SLA).

For projects that have purchased Provisioned Throughput, Vertex AI measures a project's throughput and reserves the purchased amount of throughput for the project's actual usage. When you're using less than your purchased throughput amount, errors that might otherwise return as 429 are returned as 5XX and are counted as part of the error rate that is described in the SLA. When you're using more than your purchased throughput amount, the additional requests are processed as pay-as-you-go.

Pay-as-you-go

On the pay-as-you-go quota framework, you have the following options for resolving 429 errors:
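Because a pay-as-you-go request that fails with 429 can be tried again, one common approach is to retry with truncated exponential backoff. The following is a minimal sketch of that idea, not an official client: `send_request`, `RateLimitError`, and the default attempt and delay values are hypothetical placeholders, and real code would map its HTTP library's 429 response onto the exception.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical exception standing in for an HTTP 429 response."""


def call_with_backoff(send_request, max_attempts=5, base_delay=1.0,
                      max_delay=32.0, sleep=time.sleep):
    """Call send_request(), retrying on RateLimitError with truncated
    exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return send_request()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the 429 to the caller
            # The delay doubles on each attempt and is capped at max_delay
            # (the "truncated" part). Picking a random point in [0, delay]
            # keeps many clients from retrying at the same instant.
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))
```

The `sleep` parameter exists only so that the waiting can be replaced in tests. In normal use, `call_with_backoff(lambda: client_call(payload))` is enough, where `client_call` is whatever function issues the prediction request.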

Provisioned Throughput

To correct the 429 error generated by Provisioned Throughput, do the following:

  • Use the Default behavior example, which doesn't set a header in prediction requests. Any overages are processed on-demand and billed as pay-as-you-go.
  • Increase the number of GSUs in your Provisioned Throughput subscription.
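The first option above depends on whether a request-type header is attached to the prediction request. The sketch below shows how such a header could be added conditionally. The header name `X-Vertex-AI-LLM-Request-Type` and the values `dedicated` and `shared` are not given in this section; they are stated here as assumptions to verify against the Provisioned Throughput documentation. `build_headers` and `access_token` are hypothetical names used only for illustration.

```python
# Assumption: header name and values are not taken from this section;
# confirm them in the Provisioned Throughput documentation before use.
REQUEST_TYPE_HEADER = "X-Vertex-AI-LLM-Request-Type"
ALLOWED_REQUEST_TYPES = ("dedicated", "shared")


def build_headers(access_token, request_type=None):
    """Build HTTP headers for a prediction request (hypothetical helper).

    With request_type=None, no request-type header is set, which matches
    the default behavior described above: overages are processed
    on-demand and billed as pay-as-you-go.
    """
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    if request_type is not None:
        if request_type not in ALLOWED_REQUEST_TYPES:
            raise ValueError(f"unsupported request type: {request_type!r}")
        headers[REQUEST_TYPE_HEADER] = request_type
    return headers
```

Calling `build_headers(token)` without a second argument leaves the header out, so a client that currently forces a request type can fall back to the default behavior by dropping that argument.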

What's next