Cloud TPU consumption options

This document describes the consumption options that Cloud TPU supports. A consumption option is the method you use to request capacity. When choosing a consumption option, consider the following factors:

  • How quickly you need the capacity
  • How long you need the capacity
  • Whether you need the capacity for a fixed or flexible amount of time
  • Whether your capacity can be preempted
  • Pricing
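As a rough illustration, the factors above can be turned into a small decision helper. The class name, function name, fields, and the order of the checks are all assumptions made for this sketch. They only mirror the guidance in this document and are not an official selection rule.

```python
from dataclasses import dataclass


@dataclass
class CapacityNeeds:
    """Hypothetical summary of the factors listed above."""
    has_reservation: bool       # capacity was reserved in advance (requires a CUD)
    tolerates_preemption: bool  # the workload can be shut down at any time
    fixed_duration: bool        # the run time is known up front
    duration_days: float        # expected run time, in days


def suggest_consumption_option(needs: CapacityNeeds) -> str:
    """Return one of the consumption option names used in this document."""
    if needs.has_reservation:
        return "Reservations"
    if needs.tolerates_preemption:
        return "Spot"
    # Flex-start requests are limited to 7 days.
    if needs.fixed_duration and needs.duration_days <= 7:
        return "Flex-start"
    return "On demand"
```

For example, a three-day fine-tuning run that can't be preempted and has no reservation would map to Flex-start under these assumptions, while the same run with an open-ended duration would fall back to On demand.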

If you're using the Cloud TPU API, every consumption option requires either on-demand or preemptible quota for the number of Cloud TPU cores you want to use. Each TPU version has its own default on-demand and preemptible quotas. For more information, see Cloud TPU quotas.

If you're using TPUs with Google Kubernetes Engine (GKE), you use Compute Engine API quota instead, which is separate from Cloud TPU API quota. For more information about the quotas you use for TPUs in GKE, see Ensure that you have TPU quota.
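For the Cloud TPU API case, the quota type that each option draws on (as listed for each option later in this document) can be written as a simple lookup. The sketch below is illustrative only: `QUOTA_TYPE_BY_OPTION` and `has_enough_quota` are hypothetical names, and the quota numbers are supplied by the caller rather than read from any Google Cloud API.

```python
# Quota type per consumption option for the Cloud TPU API,
# as stated in this document.
QUOTA_TYPE_BY_OPTION = {
    "Reservations": "on-demand",
    "On demand": "on-demand",
    "Flex-start": "preemptible",
    "Spot": "preemptible",
}


def has_enough_quota(option: str, requested_cores: int, quota_cores: dict) -> bool:
    """Check a request against caller-supplied quota limits, in TPU cores.

    quota_cores maps a quota type ("on-demand" or "preemptible") to the
    number of cores available. These values are assumptions for the example.
    """
    quota_type = QUOTA_TYPE_BY_OPTION[option]
    return requested_cores <= quota_cores.get(quota_type, 0)
```

A project with preemptible quota but no on-demand quota would pass this check for Spot and Flex-start requests and fail it for On demand and Reservations.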

The following entries describe each consumption option: how it works, what it's best used for, the supported TPU versions and zones, and the quota type for the Cloud TPU API.
Reservations

How it works: You request TPU resources in advance for a specific amount of time. These resources are reserved for your exclusive use during that period of time.

Reservations provide the highest level of assurance for capacity and are cost-effective, with a lower price than on-demand resources.

You can only use a reservation for TPUs if you have a committed use discount (CUD). For more information, contact Google Cloud sales.

Best used for: Long-running training jobs and inference workloads.

Supported TPU versions and zones: All TPU versions. See TPU regions and zones.

Quota type for Cloud TPU API: On-demand quota
On demand

How it works: You request TPU resources to be used as soon as possible, for as long as you want.

On-demand offers the highest flexibility. On-demand resources won't be preempted, but there's no guarantee that there will be enough available TPU resources to satisfy your request. On demand is the default when you create TPU resources. For more information about creating and using on-demand TPUs, see Manage TPU resources.

Best used for: Urgent jobs and workloads that require a flexible end time.

Supported TPU versions and zones: All TPU versions. See TPU regions and zones.

Quota type for Cloud TPU API: On-demand quota
Flex-start (Public preview)

How it works: You request TPU resources for a specific amount of time, up to 7 days, without needing to reserve capacity in advance.

Flex-start resources are delivered from a dedicated pool of capacity, so the availability of these resources is higher than on-demand. For more information about using Flex-start with TPUs, see Request Cloud TPUs using Flex-start.

Note: Flex-start for TPUs is only supported using the Cloud TPU API. It's not supported using the GKE API.

Best used for: Experimentation, small-scale testing, dynamic provisioning of TPUs for inference workloads, model fine-tuning, and workload runs that take less than 7 days.

Supported TPU versions and zones:

  • v6e (Trillium) for training and serving: asia-northeast1-b, us-east5-a
  • v5e for training: us-west4-a
  • v5e for serving: us-central1-a

Quota type for Cloud TPU API: Preemptible quota
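The Flex-start constraints described above (a 7-day maximum and a fixed set of zones per TPU version and workload) can be checked before submitting a request. This is a minimal sketch: the names are hypothetical, the zone table is copied from this document and may go out of date, and nothing here calls the Cloud TPU API.

```python
from datetime import timedelta

MAX_FLEX_START_DURATION = timedelta(days=7)

# (TPU version, workload) -> zones listed in this document for Flex-start.
FLEX_START_ZONES = {
    ("v6e", "training"): {"asia-northeast1-b", "us-east5-a"},
    ("v6e", "serving"): {"asia-northeast1-b", "us-east5-a"},
    ("v5e", "training"): {"us-west4-a"},
    ("v5e", "serving"): {"us-central1-a"},
}


def validate_flex_start_request(version, workload, zone, duration):
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    zones = FLEX_START_ZONES.get((version, workload))
    if zones is None:
        problems.append(f"{version} for {workload} is not listed for Flex-start")
    elif zone not in zones:
        problems.append(f"{zone} is not a listed zone for {version} {workload}")
    if duration > MAX_FLEX_START_DURATION:
        problems.append("requested duration exceeds 7 days")
    return problems
```

For instance, a six-hour v6e training request in us-east5-a returns an empty list, while an eight-day v5e serving request in us-west4-a reports both a zone problem and a duration problem.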
Spot

How it works: You request TPU resources that can be preempted.

Spot VMs are available at a much lower price than on-demand resources. Spot VMs might be easier to obtain than on-demand resources but can be preempted (shut down) at any time. There is no limit on runtime duration. For more information about TPU Spot VMs, see Manage TPU Spot VMs.

Best used for: Lower-priority workloads, such as model pre-training, model fine-tuning, and simulation jobs, that can tolerate availability disruptions.

Supported TPU versions and zones: All TPU versions. See TPU regions and zones.

Quota type for Cloud TPU API: Preemptible quota
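A workload tolerates Spot preemption when it can resume from saved progress. The sketch below shows one generic way to do that with periodic checkpoints written to a local JSON file. The file layout, step counts, and the `train_step` placeholder are all assumptions for illustration and are not part of any Cloud TPU API. In practice, checkpoints would go to durable storage that outlives the VM.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the last saved step, or 0 if no checkpoint exists yet."""
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["step"]


def save_checkpoint(path, step):
    """Write to a temporary file, then rename it over the checkpoint.

    The rename keeps a shutdown in the middle of a write from leaving
    a half-written checkpoint behind.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump({"step": step}, f)
    os.replace(tmp_path, path)


def run(path, total_steps, checkpoint_every, train_step=lambda step: None):
    """Resume from the last checkpoint and run until total_steps is reached."""
    step = load_checkpoint(path)
    while step < total_steps:
        train_step(step)  # placeholder for the real work
        step += 1
        if step % checkpoint_every == 0 or step == total_steps:
            save_checkpoint(path, step)
    return step
```

If the VM is preempted partway through, calling `run` again with the same path picks up from the most recent checkpoint, so at most `checkpoint_every` steps of work are repeated.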

What's next

Get started with TPUs using: