Generative AI on Vertex AI inference API errors

This document describes common API errors for Generative AI on Vertex AI and provides guidance on how to resolve them.

  • API errors: Learn about specific HTTP error codes, their causes, and recommended solutions.
  • Handle errors: Find best practices for managing API requests to avoid errors, such as implementing retry logic and avoiding traffic spikes.

API errors

The following table lists each HTTP error code with its canonical error code, cause, an example, and a recommended solution.

| HTTP error code | Canonical error code | Cause | Example | Solution |
|---|---|---|---|---|
| 400 | INVALID_ARGUMENT / FAILED_PRECONDITION | The request fails API validation, or you tried to access a model that requires allowlisting or is disallowed by your organization's policy. | The request exceeds the model's input token limit. | Refer to the Model API reference for Generative AI for request parameters, token count, and other parameters. |
| 403 | PERMISSION_DENIED | The client doesn't have sufficient permission to call the API. | A service account doesn't have permission to access the Cloud Storage bucket that hosts image or video resources. | 1. Verify that all necessary APIs are enabled and the service account has the correct permission to access the selected Vertex AI service. 2. Ensure the Vertex AI per-product, per-project service account (P4SA) is granted the necessary permission to access resources referenced in the input. |
| 404 | NOT_FOUND | A valid object was not found at the specified URL. | An image file isn't found in the storage URL. | Check and fix the file location. |
| 429 | RESOURCE_EXHAUSTED | Depending on the error message, the cause could be one of the following: 1. API quota is over the limit. 2. The server is overloaded due to shared server capacity. 3. You've reached the daily limit for requests using logprobs. | The Gemini API exceeds the request per minute limit. | 1. Check Vertex AI Generative AI quota limits. If needed, apply for a higher quota. 2. Retry after a few seconds. If the error persists for a prolonged period (hours), contact Vertex AI support. 3. Consider purchasing Provisioned Throughput. |
| 499 | CANCELLED | The request was canceled by the client. | | |
| 500 | UNKNOWN / INTERNAL | A server error occurred due to overload or dependency failure. | The request is throttled because the service is temporarily overloaded. | Retry after a few seconds. If the error persists for a prolonged period (hours), contact Vertex AI support. |
| 503 | UNAVAILABLE | The service is temporarily unavailable. | The server isn't responding to incoming requests. | The unavailable status might be temporary. If the error persists, contact Vertex AI support. |
| 504 | DEADLINE_EXCEEDED | The client set a deadline that is shorter than the server's default deadline (10 minutes), and the request didn't finish within the client-provided deadline. | | Consider increasing the client-provided deadline. |
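
As a rough illustration of how client code might act on the table above, the following Python sketch maps each HTTP status to its canonical codes and to a suggested next step. The canonical codes are copied from the table. The grouping into "retry", "fix_request", and "increase_deadline" is an assumption based on one reading of the Solution column, and the names `CANONICAL_CODES` and `suggested_action` are hypothetical helpers, not part of any Vertex AI SDK.

```python
# Canonical error codes per HTTP status, copied from the table above.
CANONICAL_CODES = {
    400: ("INVALID_ARGUMENT", "FAILED_PRECONDITION"),
    403: ("PERMISSION_DENIED",),
    404: ("NOT_FOUND",),
    429: ("RESOURCE_EXHAUSTED",),
    499: ("CANCELLED",),
    500: ("UNKNOWN", "INTERNAL"),
    503: ("UNAVAILABLE",),
    504: ("DEADLINE_EXCEEDED",),
}

# Assumption: these groups are inferred from the "Solution" column and are
# not an official classification.
FIX_REQUEST = {400, 403, 404}   # change the request, permissions, or file location
RETRY_LATER = {429, 500, 503}   # the table suggests retrying after a few seconds


def suggested_action(status):
    """Return a short label for the next step after a failed request."""
    if status in RETRY_LATER:
        return "retry"
    if status in FIX_REQUEST:
        return "fix_request"
    if status == 504:
        return "increase_deadline"
    if status == 499:
        return "none"  # the client canceled the request itself
    return "unknown"   # status is not covered by the table
```

For example, `suggested_action(429)` returns `"retry"`, while `suggested_action(403)` returns `"fix_request"`, because retrying a permission error unchanged is unlikely to help.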

Handle errors

To manage API requests and avoid errors, follow these best practices:

  • Avoid traffic spikes: Sudden, large increases in requests within a short period can cause quota enforcement issues and server overloads. To avoid this, distribute your requests more evenly over time.
  • Implement retry logic carefully: When you retry a failed request, limit the number of retries to a maximum of two. Use exponential backoff, and start with a minimum delay of one second between retries.
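
The two practices above can be sketched in Python as follows. This is a minimal sketch, not an official client: `ApiError`, `call_with_retries`, and `spread_evenly` are hypothetical names, the set of retryable status codes (429, 500, 503) is an assumption drawn from the error table, and the small random jitter added to each delay is an extra not mentioned in the guidance. The retry helper caps retries at two and starts with a one-second delay that doubles on each retry.

```python
import random
import time

# Assumption: retry only the statuses for which the error table suggests
# retrying; this is not an official list.
RETRYABLE_STATUS = {429, 500, 503}


class ApiError(Exception):
    """Hypothetical error type carrying the HTTP status of a failed call."""

    def __init__(self, status, message=""):
        super().__init__(message or f"HTTP {status}")
        self.status = status


def call_with_retries(send, max_retries=2, base_delay=1.0, sleep=time.sleep):
    """Call send(); retry at most max_retries times with exponential backoff.

    The delay before retry n (counting from 0) is base_delay * 2**n, plus up
    to 10% random jitter so the delay never drops below the base value.
    """
    attempt = 0
    while True:
        try:
            return send()
        except ApiError as err:
            # Give up on non-retryable errors or when retries are used up.
            if err.status not in RETRYABLE_STATUS or attempt >= max_retries:
                raise
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, 0.1 * delay))
            attempt += 1


def spread_evenly(requests, per_second, sleep=time.sleep):
    """Yield requests with a fixed gap between them instead of in one burst."""
    interval = 1.0 / per_second
    for index, request in enumerate(requests):
        if index:  # no wait before the first request
            sleep(interval)
        yield request
```

A caller might combine the two as `for prompt in spread_evenly(prompts, per_second=2): call_with_retries(lambda: send_prompt(prompt))`, where `send_prompt` stands in for whatever function issues the real request and raises `ApiError` on failure. The `sleep` parameter exists only so the waiting behavior can be replaced in tests.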

What's next