Provisioned Throughput overview

This page provides an overview of Provisioned Throughput and compares it with the pay-as-you-go option.

Introduction to Provisioned Throughput

Provisioned Throughput is a fixed-cost, fixed-term subscription that reserves throughput for supported generative AI models on Vertex AI. It is available in several term lengths. To reserve throughput, you must specify the model and the available locations in which the model runs.

When to use Provisioned Throughput

Provisioned Throughput is one of two ways to use your generative AI models. The other option is pay-as-you-go, also referred to as on-demand. The following table compares these two options.

| Option | Description | Use case |
| --- | --- | --- |
| Provisioned Throughput | Reserves model processing capacity at a fixed cost for a specific term for consistent performance. | Production applications requiring high, consistent throughput and predictable costs (for example, real-time chatbots). |
| Pay-as-you-go (on-demand) | Pay only for the resources you use with no upfront commitment. Capacity is shared and subject to availability. | Development, testing, or applications with variable or unpredictable traffic. |

Consider using Provisioned Throughput if any of the following apply to your use case:

  • You are building real-time generative AI production applications, such as chatbots and agents.
  • Your critical workloads consistently require high throughput. Throughput is measured differently depending on the model.
  • You want to provide a consistent and predictable experience for your users.
  • You want predictable generative AI costs through a fixed price, with control over overages.
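The cost side of this decision can be sketched as a break-even calculation. The following Python example is illustrative only: the `Workload` class, the generic "throughput units", and every price and traffic figure are hypothetical assumptions, not Vertex AI pricing or APIs. It also ignores overages and the value of reserved capacity, so treat it as a rough starting point rather than a sizing tool.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    """Hypothetical monthly traffic profile (illustrative fields only)."""
    requests_per_month: int
    units_per_request: float  # generic "throughput units"; real units vary by model


def pay_as_you_go_cost(workload: Workload, price_per_unit: float) -> float:
    """Usage-based cost, assuming a flat (hypothetical) per-unit price."""
    return workload.requests_per_month * workload.units_per_request * price_per_unit


def break_even_units(fixed_monthly_cost: float, price_per_unit: float) -> float:
    """Monthly usage above which the fixed-cost option is cheaper."""
    if price_per_unit <= 0:
        raise ValueError("price_per_unit must be positive")
    return fixed_monthly_cost / price_per_unit


def cheaper_option(workload: Workload, fixed_monthly_cost: float,
                   price_per_unit: float) -> str:
    """Compare the two options on cost alone (capacity guarantees are ignored)."""
    on_demand = pay_as_you_go_cost(workload, price_per_unit)
    return "provisioned" if fixed_monthly_cost < on_demand else "pay-as-you-go"


if __name__ == "__main__":
    # All numbers below are made up for illustration.
    steady = Workload(requests_per_month=1000, units_per_request=2.0)
    print(pay_as_you_go_cost(steady, price_per_unit=0.5))
    print(break_even_units(fixed_monthly_cost=800.0, price_per_unit=0.5))
    print(cheaper_option(steady, fixed_monthly_cost=800.0, price_per_unit=0.5))
```

Cost is only one factor. Because pay-as-you-go capacity is shared and subject to availability, a workload that needs consistent performance may still favor Provisioned Throughput even when its usage falls below the break-even point.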

What's next