Quotas and limits

This document lists the quotas and limits that apply to Gemini.

A quota restricts how much of a shared Google Cloud resource, such as hardware, software, or network components, your Google Cloud project can use. Quotas are part of a system that does the following:

  • Monitors your use or consumption of Google Cloud products and services.
  • Restricts your consumption of those resources, for reasons that include ensuring fairness and reducing spikes in usage.
  • Maintains configurations that automatically enforce prescribed restrictions.
  • Provides a means to request or make changes to the quota.

In most cases, when a quota is exceeded, the system immediately blocks access to the relevant Google resource, and the task that you're trying to perform fails. Quotas generally apply to each Google Cloud project and are shared across all applications and IP addresses that use that project.

There are also limits on Gemini resources. These limits are unrelated to the quota system. Limits cannot be changed unless otherwise stated.

Requests per second

Gemini enforces quotas on requests per second for each user in a project.

Quota | Value
Requests per second | 2
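
To stay under the per-second quota, a client can space out its own calls before sending them. The following is a minimal sketch of that idea, not part of any Gemini client library: the class name MinIntervalThrottle and its parameters are hypothetical, and it assumes that keeping calls at least 0.5 seconds apart is enough to respect a quota of 2 requests per second. How the service itself counts requests within a second is not specified here.

```python
import time


class MinIntervalThrottle:
    """Spaces calls at least 1 / rate_per_second seconds apart.

    Hypothetical helper for illustration; the default rate of 2 comes
    from the "Requests per second" quota in this document.
    """

    def __init__(self, rate_per_second=2.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 1.0 / rate_per_second
        self._clock = clock    # injectable so the logic can be tested without waiting
        self._sleep = sleep
        self._next_allowed = None  # earliest time the next call may start

    def wait(self):
        """Block until the next call is allowed and return the seconds waited."""
        now = self._clock()
        waited = 0.0
        if self._next_allowed is not None and now < self._next_allowed:
            waited = self._next_allowed - now
            self._sleep(waited)
            now = self._next_allowed
        # Reserve the next slot relative to when this call actually starts.
        self._next_allowed = now + self.min_interval
        return waited
```

A caller would create one throttle per user and call wait() immediately before each request. The throttle only limits the calling process; requests that the same user sends from other tools still count toward the same quota.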

Requests per day

Gemini enforces quotas for the total number of requests per day for each user in a project.

Quota | Value
Requests per day for Gemini code requests, such as code generation and code completion. | 6000
Requests per day for chat and other requests that display responses in the Gemini pane in the Google Cloud console and IDEs. | 240
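
A tool that issues many requests on a user's behalf can keep a local count against these daily values and stop early instead of waiting for the service to reject a request. The sketch below illustrates one way to do that. The names DailyRequestBudget, DAILY_LIMITS, and the "code" and "chat" keys are hypothetical, and the sketch assumes that the daily count resets when the local calendar date changes. This document does not state when the service resets its count, so treat the rollover logic as a placeholder.

```python
import datetime

# Values taken from the table above; the "code" and "chat" keys are illustrative labels.
DAILY_LIMITS = {"code": 6000, "chat": 240}


class DailyRequestBudget:
    """Tracks locally how many requests of each kind were sent today."""

    def __init__(self, limits=None, today=datetime.date.today):
        self._limits = dict(limits or DAILY_LIMITS)
        self._today = today  # injectable so rollover can be tested
        self._day = today()
        self._used = {kind: 0 for kind in self._limits}

    def _roll_over(self):
        # Assumption: counts reset when the local date changes.
        current = self._today()
        if current != self._day:
            self._day = current
            self._used = {kind: 0 for kind in self._limits}

    def remaining(self, kind):
        """Return how many requests of this kind are still within budget."""
        self._roll_over()
        return self._limits[kind] - self._used[kind]

    def try_consume(self, kind):
        """Record one request and return True, or return False if the budget is spent."""
        self._roll_over()
        if self._used[kind] >= self._limits[kind]:
            return False
        self._used[kind] += 1
        return True
```

Calling try_consume("chat") before each chat request returns False once 240 requests have been recorded for the day. The count is only an estimate of real usage, because it does not see requests sent from other tools or sessions.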

Request a quota increase

To increase or decrease most quotas, use the Google Cloud console. For more information, see "Request a higher quota."