Cloud TPU consumption options

This document describes the consumption options supported for Cloud TPU. Consumption options are the methods you use to request capacity. When choosing a consumption option, consider the following factors:

How quickly you need the capacity
How long you need the capacity
Whether you need the capacity for a fixed or flexible amount of time
Whether your capacity can be preempted
Pricing

If you're using Cloud TPU APIs, you need either on-demand or preemptible quota for the number of Cloud TPU cores you want to use. The exception is future reservations in calendar mode, which require no quota. Default on-demand and preemptible quotas differ for each TPU version. For more information, see Cloud TPU quotas.

If you're using TPUs with Google Kubernetes Engine (GKE), you use Compute Engine API quota, which is a different quota. For more information about quotas you use for TPUs in GKE, see Ensure that you have TPU quota.
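The quota rules described here can be sketched as a small lookup. This is only an illustrative summary, not an official API: the dictionary values are copied from the "Quota type for Cloud TPU API" column of the table that follows, and the names `CLOUD_TPU_API_QUOTA` and `quota_type` are hypothetical.

```python
# Illustrative lookup only; names are hypothetical, values come from the
# "Quota type for Cloud TPU API" column of the consumption options table.
CLOUD_TPU_API_QUOTA = {
    "Future reservations for one year or longer": "On-demand quota",
    "Future reservations for up to 90 days (in calendar mode)": "No quota required",
    "On demand": "On-demand quota",
    "Flex-start": "Preemptible quota",
    "Spot": "Preemptible quota",
}


def quota_type(option: str, using_gke: bool = False) -> str:
    """Return the quota type that applies to a consumption option.

    With GKE, TPUs draw on Compute Engine API quota instead of
    Cloud TPU API quota, so the per-option table does not apply.
    """
    if using_gke:
        return "Compute Engine API quota"
    # Raises KeyError for a name that is not in the table.
    return CLOUD_TPU_API_QUOTA[option]


if __name__ == "__main__":
    for name in CLOUD_TPU_API_QUOTA:
        print(f"{name}: {quota_type(name)}")
```

The lookup only names the quota type. Whether your project has enough of that quota for the number of cores you want still has to be checked against your actual quota limits.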

Consumption option	How it works	Best used for	Supported TPU versions and zones	Quota type for Cloud TPU API
Future reservations for one year or longer	You request TPU resources in advance for one year or longer. These resources are reserved for your exclusive use during that period of time. Reservations provide the highest level of assurance for capacity and are cost-effective, with a lower price than on-demand resources. Future reservations of TPUs have a committed use discount (CUD) attached. CUDs provide discounted prices in exchange for purchasing a committed use contract. For more information, see Request a future reservation for one year or longer.	Future reservations for one year or longer are ideal for long-running training jobs and inference workloads.	All TPU versions: See TPU regions and zones	On-demand quota
Future reservations for up to 90 days (in calendar mode) (Preview)	You request TPU resources for a specific start time and duration, between 1 and 90 days. These resources are reserved for your exclusive use during that period of time. Reservations provide the highest level of assurance for capacity and are cost-effective, with a lower price than on-demand resources. For more information, see Request a future reservation for up to 90 days in calendar mode.	Future reservations in calendar mode are a good fit for training and experimentation workloads that require precise start times and have a defined duration.	v6e (Trillium) for training and serving: asia-northeast1-b, us-east5-a; v5p for training and serving: us-east5-a; v5e for training: us-west4-a; v5e for serving: us-central1-a	No quota required
On demand	You request TPU resources to be used as soon as possible, for as long as you want. On-demand offers the highest flexibility. On-demand resources won't be preempted, but there's no guarantee that there will be enough available TPU resources to satisfy your request. On demand is the default when you create TPU resources. For more information about creating and using on-demand TPUs, see Manage TPU resources.	On demand is a good fit for urgent jobs and workloads that require a flexible end time.	All TPU versions: See TPU regions and zones	On-demand quota
Flex-start (Preview)	You request TPU resources for a specific amount of time, up to 7 days, without needing to reserve capacity in advance. Flex-start resources are delivered from a dedicated pool of capacity, so the availability of these resources is higher than that of on-demand resources. For more information about using Flex-start with TPUs, see Request Cloud TPUs using Flex-start. For more information about using Flex-start with Google Kubernetes Engine (GKE), see About GPU and TPU provisioning with flex-start provisioning mode.	Flex-start is ideal for experimentation, small-scale testing, dynamic provisioning of TPUs for inference workloads, model fine-tuning, and workload runs that take less than 7 days.	v6e (Trillium): asia-northeast1-b, us-east5-a; v5p: us-east5-a; v5e: us-west4-a	Preemptible quota
Spot	You request TPU resources that can be preempted. Spot VMs are available at a much lower price than on-demand resources. Spot VMs might be easier to obtain than on-demand resources but can be preempted (shut down) at any time. There is no limit on runtime duration. For more information about TPU Spot VMs, see Manage TPU Spot VMs.	Spot is a good fit for scheduling lower-priority workloads like model pre-training, model fine-tuning, and simulation jobs that tolerate availability disruptions.	All TPU versions: See TPU regions and zones	Preemptible quota
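
As a rough way to read the table above, the following sketch maps three of the factors listed at the start of this document (duration, tolerance for preemption, and whether you can plan ahead) to a consumption option. It is a simplified heuristic under stated assumptions, not an official selection rule: the function name, its parameters, and the order of the checks are hypothetical, and it ignores pricing, TPU version, and zone availability.

```python
from typing import Optional


def suggest_consumption_option(
    duration_days: Optional[float],
    can_be_preempted: bool,
    can_plan_ahead: bool,
) -> str:
    """Suggest a consumption option (hypothetical heuristic).

    duration_days: expected run time in days, or None if open-ended.
    can_be_preempted: True if the workload tolerates being shut down.
    can_plan_ahead: True if you can request capacity before you need it.
    """
    # Assumption: preemption-tolerant work goes to Spot first, because
    # Spot is the lowest-priced option and has no runtime limit.
    if can_be_preempted:
        return "Spot"
    if can_plan_ahead and duration_days is not None:
        if duration_days >= 365:
            return "Future reservations for one year or longer"
        if 1 <= duration_days <= 90:
            return "Future reservations for up to 90 days (in calendar mode)"
    # Flex-start covers runs of up to 7 days without an advance reservation.
    if duration_days is not None and duration_days <= 7:
        return "Flex-start"
    # Everything else, including open-ended runs and durations that fit no
    # reservation window, falls back to the default option.
    return "On demand"


if __name__ == "__main__":
    print(suggest_consumption_option(3, False, False))
    print(suggest_consumption_option(30, False, True))
    print(suggest_consumption_option(None, False, False))
```

Because calendar mode and Flex-start are in Preview and limited to specific TPU versions and zones, treat the result as a starting point and confirm it against the "Supported TPU versions and zones" column.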

What's next

Get started with TPUs using: