- Introduction to Provisioned Throughput: Learn what Provisioned Throughput is and how it works.
- When to use Provisioned Throughput: Compare Provisioned Throughput with the pay-as-you-go model to decide which is right for your use case.
Introduction to Provisioned Throughput
Provisioned Throughput is a fixed-cost, fixed-term subscription that reserves throughput for supported generative AI models on Vertex AI. It is available in several term lengths. To reserve throughput, you must specify the model and the available locations where the model runs.
When to use Provisioned Throughput
Provisioned Throughput is one of two ways to use your generative AI models. The other option is pay-as-you-go, also referred to as on-demand. The following table compares these two options.
Option | Description | Use Case |
---|---|---|
Provisioned Throughput | Reserves model processing capacity at a fixed cost for a specific term, which provides consistent performance. | Production applications that require high, consistent throughput and predictable costs (for example, real-time chatbots). |
Pay-as-you-go (on-demand) | You pay only for the resources you use, with no upfront commitment. Capacity is shared and subject to availability. | Development, testing, or applications with variable or unpredictable traffic. |
Consider using Provisioned Throughput if any of the following apply to your use case:
- You are building real-time generative AI production applications, such as chatbots and agents.
- Your critical workloads consistently require high throughput. Throughput is measured differently depending on the model.
- You want to provide a consistent and predictable experience for your users.
- You want predictable generative AI costs through a fixed price, with control over overages.
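The cost side of this decision can be sketched as a simple break-even calculation. The following Python example is only an illustration: the `WorkloadEstimate` class, its field names, and all figures are hypothetical and do not reflect actual Vertex AI pricing or any Vertex AI API. It assumes a single flat pay-as-you-go price per unit of work and a fixed monthly subscription cost.

```python
from dataclasses import dataclass


@dataclass
class WorkloadEstimate:
    """Hypothetical monthly figures; replace them with your own measurements."""
    monthly_units: float             # billable units of work per month (unit depends on the model)
    on_demand_price_per_unit: float  # assumed pay-as-you-go price per unit
    fixed_monthly_cost: float        # assumed fixed subscription cost per month


def monthly_on_demand_cost(w: WorkloadEstimate) -> float:
    """Estimated monthly cost if all usage is billed pay-as-you-go."""
    return w.monthly_units * w.on_demand_price_per_unit


def break_even_units(w: WorkloadEstimate) -> float:
    """Monthly usage at which both options cost the same."""
    return w.fixed_monthly_cost / w.on_demand_price_per_unit


def cheaper_option(w: WorkloadEstimate) -> str:
    """Return the option with the lower estimated monthly cost."""
    if monthly_on_demand_cost(w) > w.fixed_monthly_cost:
        return "provisioned"
    return "pay-as-you-go"


# Illustrative numbers only (not real prices).
steady = WorkloadEstimate(monthly_units=3000, on_demand_price_per_unit=0.5,
                          fixed_monthly_cost=1000)
print(break_even_units(steady), cheaper_option(steady))
```

Cost is only one input. Even below the break-even point, reserved capacity may still be the better fit when consistent performance matters more than the lowest price, because pay-as-you-go capacity is shared and subject to availability.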
What's next
- Learn about the supported models for Provisioned Throughput.