This guide shows you how to optimize Tensor Processing Unit (TPU) provisioning by using future reservation in calendar mode. Future reservation in calendar mode is a built-in calendar advisor and recommender that helps you locate TPU capacity and plan ahead. You can request capacity for a specific start time and a duration of between 1 and 90 days, and the recommender suggests available dates.
This guide is intended for machine learning (ML) engineers, platform admins and operators, and data and AI specialists who are interested in using Kubernetes container orchestration capabilities for running batch workloads. For more information about common roles and example tasks that we reference in Google Cloud content, see Common GKE Enterprise user roles and tasks.
For more information, see About future reservation in calendar mode.
Use cases
Future reservation in calendar mode works best for workloads with scheduled, short-term, high-demand requests, such as model training or batch inference, that require high availability at the requested start time.
If your workload requires dynamically provisioned resources as needed, for up to 7 days without long-term reservations or complex quota management, consider using flex-start. For more information, see About GPU and TPU provisioning with flex-start.
Before you begin
Before you start, make sure you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
- Ensure that you have either of the following:
- An existing Standard cluster that's running version 1.28.3-gke.1098000 or later.
- An existing Autopilot cluster that's running version 1.30.3-gke.1451000 or later.
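To confirm that your cluster meets the minimum version, you can check the control plane version with the gcloud CLI. This is a quick sketch; CLUSTER_NAME and LOCATION are placeholders for your own values, and node pools that were created separately can run older versions, so check those too if needed.
gcloud container clusters describe CLUSTER_NAME \
    --location=LOCATION \
    --format="value(currentMasterVersion)"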
Request future reservation in calendar mode for TPUs
The process to request TPUs with future reservation in calendar mode involves the following steps:
- Ensure that you have sufficient quota for any resources that aren't part of a reservation, such as disks or IP addresses, which are consumed when VMs are created. Future reservation requests in calendar mode don't require Compute Engine quota.
- Complete the steps in Create a request in calendar mode. These steps include the following:
- View TPU future availability.
- Create and submit a future reservation request in calendar mode for TPUs.
- Wait for Google Cloud to approve your request.
- Create a TPU node pool that uses your reservation.
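After Google Cloud approves your request, it delivers the capacity as an automatically created reservation at the requested start time. Before you create the node pool, you can verify that the reservation exists. The following is a minimal sketch, assuming RESERVATION and ZONE are placeholders for your reservation name and zone:
gcloud compute reservations describe RESERVATION \
    --zone=ZONE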
Create a node pool
This section applies to Standard clusters only.
You can use your reservation when you create single-host or multi-host TPU slice node pools. For example, the following command creates a single-host TPU slice node pool by using the Google Cloud CLI:
gcloud container node-pools create NODE_POOL_NAME \
    --location=LOCATION \
    --cluster=CLUSTER_NAME \
    --node-locations=NODE_ZONES \
    --machine-type=MACHINE_TYPE \
    --reservation-affinity=specific \
    --reservation=RESERVATION
The --reservation-affinity=specific flag is required so that the node pool consumes only the named reservation.
Replace the following:
- NODE_POOL_NAME: the name of the new node pool.
- LOCATION: the name of the zone based on the TPU version that you want to use. To identify an available location, see TPU availability in GKE.
- CLUSTER_NAME: the name of the cluster.
- NODE_ZONES: the comma-separated list of one or more zones where GKE creates the node pool.
- MACHINE_TYPE: the type of machine to use for nodes. For more information about TPU-compatible machine types, see the table in Choose the TPU version.
- RESERVATION: the name of the calendar reservation to consume.
For a full list of all the flags that you can specify, see the gcloud container node-pools create reference.
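For a multi-host TPU slice node pool, you also specify the slice topology and a matching node count. The following is a minimal sketch rather than a definitive command, assuming a TPU v5e slice with the ct5lp-hightpu-4t machine type and a 4x4 topology (16 chips across four nodes); adjust the values to match your reservation:
gcloud container node-pools create NODE_POOL_NAME \
    --location=LOCATION \
    --cluster=CLUSTER_NAME \
    --node-locations=NODE_ZONES \
    --machine-type=ct5lp-hightpu-4t \
    --tpu-topology=4x4 \
    --num-nodes=4 \
    --reservation-affinity=specific \
    --reservation=RESERVATION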
After you create a node pool with the calendar reservation, you can deploy your workload like any other TPU node pool. For example, you can create a Job that specifies the TPU node pool that consumes the reserved TPUs.
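For example, the following Job manifest is a minimal sketch that assumes a single-host ct5lp-hightpu-4t node pool (TPU v5e, 2x2 topology, four chips per node); the IMAGE and COMMAND values are placeholders for your own workload:
apiVersion: batch/v1
kind: Job
metadata:
  name: tpu-job
spec:
  template:
    spec:
      # Schedule the Pod onto the TPU slice node pool that consumes the reservation.
      nodeSelector:
        cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice
        cloud.google.com/gke-tpu-topology: 2x2
      containers:
      - name: tpu-workload
        image: IMAGE
        command: ["COMMAND"]
        resources:
          limits:
            google.com/tpu: 4   # request all four chips on the single-host v5e node
      restartPolicy: Never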
What's next
Try GKE deployment examples for generative AI models that use the TPU resources that you reserved:
- Serve an LLM using TPU Trillium on GKE with vLLM
- Serve an LLM using TPUs on GKE with KubeRay
- Serve an LLM using TPUs on GKE with JetStream and PyTorch
- Serve Gemma using TPUs on GKE with JetStream
- Serve Stable Diffusion XL (SDXL) using TPUs on GKE with MaxDiffusion
- Serve open source models using TPUs on GKE with Optimum TPU
Explore experimental samples for leveraging GKE to accelerate your AI/ML initiatives in GKE AI Labs.