Dynamic shared quota

Dynamic shared quota distributes on-demand capacity among all queries being processed by Google Cloud services. This capability eliminates the need for you to submit quota increase requests (QIRs).

Supported Google model versions

The following Google model versions support dynamic shared quota:

  • Gemini 1.5 Flash (gemini-1.5-flash-002)
  • Gemini 1.5 Pro (gemini-1.5-pro-002)

Other supported models

For information about Claude models that support dynamic shared quota, see Use the Claude models from Anthropic.

Example of how dynamic shared quota works

Google Cloud looks at the available capacity in a specific region, such as North America, and then looks at how many customers are sending requests. Consider customer A, who sends 25 queries per minute (QPM), and customer B, who also sends 25 QPM. The service can support 100 QPM. If customer A increases their rate to 75 QPM, then dynamic shared quota supports the increase, because the combined 100 QPM still fits within the available capacity. If customer A increases their rate to 100 QPM, then the combined demand of 125 QPM exceeds capacity, so dynamic shared quota throttles customer A down to 75 QPM in order to continue to serve customer B at 25 QPM.
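
The arithmetic in this example can be reproduced with a small simulation. The following is a minimal sketch that assumes a max-min fair sharing rule; it is not Google Cloud's actual allocation algorithm, which this page does not describe. The function name `allocate` and the customer labels are hypothetical and exist only for illustration.

```python
def allocate(demands, capacity):
    """Split capacity among customers using max-min fair sharing.

    Assumption: this toy rule is chosen only because it reproduces the
    numbers in the example above. It is not the real service logic.
    """
    allocation = {}
    remaining = capacity
    # Serve the smallest demands first so that light users are never
    # throttled on behalf of heavy users.
    pending = sorted(demands.items(), key=lambda item: item[1])
    for index, (name, demand) in enumerate(pending):
        # Equal share of whatever capacity is still unassigned.
        fair_share = remaining / (len(pending) - index)
        granted = min(demand, fair_share)
        allocation[name] = granted
        remaining -= granted
    return allocation


# Customer A raises demand to 75 QPM: the full demand is granted.
print(allocate({"A": 75, "B": 25}, capacity=100))

# Customer A raises demand to 100 QPM: A is capped at 75, B keeps 25.
print(allocate({"A": 100, "B": 25}, capacity=100))
```

Under this assumed rule, a customer is throttled only when total demand exceeds capacity, and only down to the capacity left over after smaller demands are met.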

To troubleshoot errors that might occur with the use of dynamic shared quota, see Troubleshoot quota errors.
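
Because requests can be throttled when shared capacity is exhausted, a client may want to retry after a pause. The following is a minimal sketch of retrying with exponential backoff and jitter. `QuotaExceededError`, `call_with_backoff`, and the `send` callable are hypothetical placeholders, not part of any Google Cloud client library; substitute the throttling error that your own client raises.

```python
import random
import time


class QuotaExceededError(Exception):
    """Hypothetical stand-in for the error a client raises when throttled."""


def call_with_backoff(send, max_attempts=5, base_delay=1.0,
                      max_delay=32.0, sleep=time.sleep):
    """Call send() and retry with exponential backoff when throttled.

    `send` is any zero-argument callable that issues one request.
    `sleep` is injectable so the function can be tested without waiting.
    """
    for attempt in range(max_attempts):
        try:
            return send()
        except QuotaExceededError:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the error to the caller.
            # Double the delay ceiling each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter spreads out retries from many clients.
            sleep(random.uniform(0, ceiling))
```

The default delays and attempt count here are illustrative values, not recommendations from the product documentation.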

Considerations

Consideration | Solution
Control cost and prevent budget overruns. | Configure a self-imposed quota called a consumer quota override. For more information, see Creating a consumer quota override.
Prioritize traffic. | Use Provisioned Throughput.
Monitor your usage. | View Quotas & System Limits in your Google Cloud console.

What's next