The following tables show the models that support Provisioned Throughput, the throughput for each generative AI scale unit (GSU), and the burndown rates for each model.
Google models
This table shows the throughput, purchase increment, and burndown rates for Google models that support Provisioned Throughput. Throughput for Google models is measured in characters per second, defined as the total number of prompt input characters and generated output characters across all requests per second.
Model | Throughput per GSU (chars/sec) | Minimum GSU purchase increment | Burndown rates |
---|---|---|---|
Gemini 1.5 Flash | Less than or equal to 128,000 context window: 54,000. Greater than 128,000 context window: 27,000 | 1 | Less than or equal to 128,000 context window: 1 input char = 1 char; 1 output char = 4 chars; 1 image = 1,067 chars; 1 video per second = 1,067 chars; 1 audio per second = 107 chars. Greater than 128,000 context window: 1 input char = 2 chars; 1 output char = 8 chars; 1 image = 2,134 chars; 1 video per second = 2,134 chars; 1 audio per second = 214 chars |
Gemini 1.5 Pro | 800 | 1 | Less than or equal to 128,000 context window: 1 input char = 1 char; 1 output char = 3 chars; 1 image = 1,052 chars; 1 video per second = 1,052 chars; 1 audio per second = 100 chars. Greater than 128,000 context window: 1 input char = 2 chars; 1 output char = 6 chars; 1 image = 2,104 chars; 1 video per second = 2,104 chars; 1 audio per second = 200 chars |
Gemini 1.0 Pro | 8,000 | 1 | 1 input char = 1 char; 1 output char = 3 chars; 1 image = 20,000 chars; 1 video per second = 16,000 chars |
Imagen 3 | 0.025 (throughput is measured in images/sec instead of chars/sec) | 1 | Only output images count toward your Provisioned Throughput quota. |
Imagen 3 Fast | 0.05 (throughput is measured in images/sec instead of chars/sec) | 1 | Only output images count toward your Provisioned Throughput quota. |
Imagen 2 | 0.05 (throughput is measured in images/sec instead of chars/sec) | 1 | Only output images count toward your Provisioned Throughput quota. |
Imagen 2 Edit | 0.05 (throughput is measured in images/sec instead of chars/sec) | 1 | Only output images count toward your Provisioned Throughput quota. |
MedLM medium | 2,000 | 1 | 1 input char = 1 char; 1 output char = 2 chars |
MedLM large | 200 | 1 | 1 input char = 1 char; 1 output char = 3 chars |
MedLM large 1.5 | 200 | 1 | 1 input char = 1 char; 1 output char = 3 chars |
For more information about supported locations, see Available locations.
You can upgrade to new models as they are made available. For information about model availability and discontinuation dates, see Google models.
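As a rough illustration of how the burndown rates in the Google models table translate into a GSU estimate, the following sketch uses the Gemini 1.5 Flash row for context windows of 128,000 or less. The workload figures (request rate and per-request sizes) and all function names are hypothetical, and the rounding rule assumes the purchase increment of 1 shown in the table.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BurndownRates:
    """Characters counted against the quota per unit of each modality."""
    input_char: int
    output_char: int
    image: int
    video_second: int
    audio_second: int


# Gemini 1.5 Flash, context window of 128,000 or less (values from the table).
FLASH_RATES = BurndownRates(input_char=1, output_char=4, image=1067,
                            video_second=1067, audio_second=107)
FLASH_CHARS_PER_SEC_PER_GSU = 54_000


def burndown_chars(rates, input_chars=0, output_chars=0, images=0,
                   video_seconds=0, audio_seconds=0):
    """Return the characters a single request counts against the quota."""
    return (input_chars * rates.input_char
            + output_chars * rates.output_char
            + images * rates.image
            + video_seconds * rates.video_second
            + audio_seconds * rates.audio_second)


def gsus_needed(requests_per_second, chars_per_request, chars_per_sec_per_gsu):
    """Round the required throughput up to whole GSUs (increment of 1 assumed)."""
    required = requests_per_second * chars_per_request
    return max(1, math.ceil(required / chars_per_sec_per_gsu))


# Hypothetical workload: 2,000 input chars, 500 output chars, 1 image per request.
per_request = burndown_chars(FLASH_RATES, input_chars=2000,
                             output_chars=500, images=1)
print(per_request)  # 2000*1 + 500*4 + 1*1067 = 5067
print(gsus_needed(20, per_request, FLASH_CHARS_PER_SEC_PER_GSU))  # ceil(101340 / 54000) = 2
```

Because output characters burn down at four times the rate of input characters for this model, output-heavy workloads need noticeably more GSUs than their raw character counts suggest.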
Preview features
The preview features for Provisioned Throughput require access approval. To request access, fill out and submit the Provisioned Throughput access control form.
The Preview version provides the following for Google models:
Provisioned Throughput can be applied to both base models and supervised fine-tuned versions of those base models.
Supervised fine-tuned model endpoints and their corresponding base model count towards the same Provisioned Throughput quota.
For example, Provisioned Throughput purchased for gemini-1.5-pro-002 for a specific project prioritizes requests that are made from supervised fine-tuned versions of gemini-1.5-pro-002 created within that project. Use the appropriate header to control traffic behavior.
Provisioned Throughput can be purchased for a one-week term instead of a monthly subscription, with the option to specify a start date up to two weeks after you place your order.
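This section does not name the header that controls traffic behavior. The sketch below assumes it is X-Vertex-AI-LLM-Request-Type with the values dedicated (use only Provisioned Throughput) and shared (use only pay-as-you-go); treat both the header name and its values as assumptions and confirm them against the Provisioned Throughput usage documentation. The function name and the access token argument are hypothetical, and no request is actually sent.

```python
from typing import Dict, Optional

# Assumed header name and values; not stated in this section.
REQUEST_TYPE_HEADER = "X-Vertex-AI-LLM-Request-Type"
ALLOWED_REQUEST_TYPES = ("dedicated", "shared")


def build_request_headers(access_token: str,
                          request_type: Optional[str] = None) -> Dict[str, str]:
    """Build HTTP headers for a prediction request.

    request_type=None omits the header and leaves the default behavior in place.
    """
    if request_type is not None and request_type not in ALLOWED_REQUEST_TYPES:
        raise ValueError(f"unsupported request type: {request_type!r}")
    headers = {
        "Authorization": f"Bearer {access_token}",
        "Content-Type": "application/json",
    }
    if request_type is not None:
        headers[REQUEST_TYPE_HEADER] = request_type
    return headers


# Hypothetical usage with a placeholder token.
print(build_request_headers("PLACEHOLDER_TOKEN", "dedicated"))
```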
Google legacy models
See Legacy models that support Provisioned Throughput.
Partner models
This table shows the throughput, minimum purchase, purchase increment, and burndown rates for partner models that support Provisioned Throughput. Throughput for Claude models is measured in tokens per second, defined as the total number of input and output tokens across all requests per second.
Model | Throughput per GSU (tokens/sec) | Minimum GSU purchase | GSU purchase increment | Burndown rates |
---|---|---|---|---|
Anthropic's Claude 3.5 Sonnet v2 | 350 | 25 | 1 | 1 input token = 1 token; 1 output token = 5 tokens |
Anthropic's Claude 3.5 Haiku | 2,000 | 10 | 1 | 1 input token = 1 token; 1 output token = 5 tokens |
Anthropic's Claude 3 Opus | 70 | 35 | 1 | 1 input token = 1 token; 1 output token = 5 tokens |
Anthropic's Claude 3 Haiku | 4,200 | 5 | 1 | 1 input token = 1 token; 1 output token = 5 tokens |
Anthropic's Claude 3.5 Sonnet | 350 | 25 | 1 | 1 input token = 1 token; 1 output token = 5 tokens |
Anthropic's Claude 3 Sonnet | 350 | 25 | 1 | 1 input token = 1 token; 1 output token = 5 tokens |
For more information about supported locations, see Available locations.
To subscribe to a partner model that supports Provisioned Throughput, contact your Google Cloud account representative.
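For partner models, the minimum GSU purchase matters as much as the burndown rate. The following sketch estimates a purchase size from the partner models table; the dictionary keys are hypothetical labels rather than official model IDs, the workload numbers are invented, and the sketch assumes that increments are counted on top of the minimum purchase.

```python
import math

# (throughput per GSU in tokens/sec, minimum GSU purchase, GSU purchase increment)
# Values come from the partner models table; the keys are hypothetical labels.
PARTNER_MODELS = {
    "claude-3.5-sonnet-v2": (350, 25, 1),
    "claude-3.5-haiku": (2000, 10, 1),
    "claude-3-opus": (70, 35, 1),
    "claude-3-haiku": (4200, 5, 1),
}

OUTPUT_TOKEN_WEIGHT = 5  # 1 output token = 5 tokens for every listed Claude model


def burndown_tokens(input_tokens, output_tokens):
    """Return the tokens one request counts against the quota."""
    return input_tokens + OUTPUT_TOKEN_WEIGHT * output_tokens


def gsus_to_purchase(model, requests_per_second, input_tokens, output_tokens):
    """Estimate the GSU purchase, honoring the minimum and the increment."""
    throughput, minimum, increment = PARTNER_MODELS[model]
    required = requests_per_second * burndown_tokens(input_tokens, output_tokens)
    needed = math.ceil(required / throughput)
    if needed <= minimum:
        return minimum
    steps = math.ceil((needed - minimum) / increment)
    return minimum + steps * increment


# Hypothetical workload: 1,000 input tokens and 200 output tokens per request.
print(burndown_tokens(1000, 200))                             # 1000 + 5*200 = 2000
print(gsus_to_purchase("claude-3.5-sonnet-v2", 5, 1000, 200))  # ceil(10000 / 350) = 29
print(gsus_to_purchase("claude-3.5-sonnet-v2", 2, 1000, 200))  # 12 needed, minimum is 25
```

In the second call the workload needs only 12 GSUs, so the minimum purchase of 25 determines the order size.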