Starting April 29, 2025, Gemini 1.5 Pro and Gemini 1.5 Flash models are not available in projects that have no prior usage of these models, including new projects. For details, see Model versions and lifecycle.
Vertex AI offers two ways to manage throughput for your generative AI
models, which lets you balance cost, flexibility, and performance. You can
either use a flexible pay-as-you-go model or reserve a dedicated amount of
throughput for a fixed price.
Pay-as-you-go
For the default pay-as-you-go model, Vertex AI uses
Dynamic Shared Quota,
which doesn't have a predefined usage limit. Instead, you get access to a large,
shared pool of resources that are dynamically allocated based on real-time
availability and demand.
This model allows your workloads to use more resources when they are available.
If you receive a `resource exhausted` (429) error, it means the shared pool is
temporarily experiencing high demand from many users at once. You should
implement a retry mechanism in your application, because availability can
change quickly.
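One common retry pattern is exponential backoff with jitter. The following is a minimal sketch, not tied to a specific Vertex AI SDK: `request_fn` stands in for any callable that issues a request, and the `code` attribute checked on the raised exception is an assumption about how your client library surfaces the HTTP status.

```python
import random
import time

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry a request on a 429 (resource exhausted) error.

    `request_fn` is any callable that issues the request and raises an
    exception on failure. Checking a `code` attribute for 429 is an
    assumption; adapt it to the error type your client library raises.
    """
    for attempt in range(max_retries):
        try:
            return request_fn()
        except Exception as err:
            # Re-raise anything that isn't a 429, or the final attempt.
            if getattr(err, "code", None) != 429 or attempt == max_retries - 1:
                raise
            # Exponential backoff (1s, 2s, 4s, ...) plus random jitter
            # so that many clients don't retry in lockstep.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
            time.sleep(delay)
```

Because Dynamic Shared Quota availability fluctuates, backing off briefly and retrying is often enough for the request to succeed once demand on the shared pool subsides.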
Reserved Capacity
For critical production applications that require consistent performance and
predictable costs, you can use
Provisioned Throughput.
Provisioned Throughput is a fixed-cost subscription that reserves a
specific amount of throughput for your models in a chosen location.
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-27 UTC."],[],[],null,["# Throughput quota\n\nVertex AI offers two ways to manage throughput for your generative AI models, which lets you balance cost, flexibility, and performance. You can either use a flexible pay-as-you-go model or reserve a dedicated amount of throughput for a fixed price.\n\n\u003cbr /\u003e\n\nPay-as-you-go\n-------------\n\nFor the default pay-as-you-go model, Vertex AI uses\n[Dynamic Shared Quota](/vertex-ai/generative-ai/docs/dynamic-shared-quota),\nwhich doesn't have a predefined usage limit. Instead, you get access to a large,\nshared pool of resources that are dynamically allocated based on real-time\navailability and demand.\n\nThis model allows your workloads to use more resources when they are available.\nIf you receive a `resource exhausted` (429) error, it means the shared pool is\ntemporarily experiencing high demand from many users at once. 
You should\nimplement retry mechanisms in your application, as availability can change\nquickly.\n\nReserved Capacity\n-----------------\n\nFor critical production applications that require consistent performance and\npredictable costs, you can use\n[Provisioned Throughput](/vertex-ai/generative-ai/docs/provisioned-throughput/overview).\nProvisioned Throughput is a fixed-cost subscription that reserves a\nspecific amount of throughput for your models in a chosen location.\n\nWhat's next\n-----------\n\n- Learn more about [Dynamic Shared Quota](/vertex-ai/generative-ai/docs/dynamic-shared-quota).\n- Learn more about [Provisioned Throughput](/vertex-ai/generative-ai/docs/provisioned-throughput/overview).\n- Learn more about [Google Cloud quotas](/docs/quotas/overview)."]]