This document explains how to create a future reservation request in calendar mode. To learn more about this type of reservation, see Future reservation requests in calendar mode overview.
You can create a future reservation request in calendar mode to reserve up to 1,024 TPU chips for a maximum of 90 days. You can then use this capacity to create virtual machine (VM) instances with Tensor Processing Units (TPUs) attached. Future reservation requests in calendar mode help you obtain high-demand resources for the following workloads:
Model pre-training jobs
Model fine-tuning jobs
High performance computing (HPC) simulation workloads
Short-term increases in inference workloads
To reserve GPU VMs for long-running training and inference jobs, see Request capacity in the AI Hypercomputer documentation instead.
Limitations
Before you create a future reservation request in calendar mode, consider the following limitations:
You can only reserve TPUs as follows:
Supported TPU versions | Number of TPU chips per request | Reservation period | Supported zones
TPU v6e | 1, 4, 8, 16, 32, 64, 128, or 256 | 1 to 90 days | asia-northeast1-b, us-east5-a
TPU v5p | 4, 8, 16, 32, 64, 128, 256, 512, or 1,024 | 1 to 90 days | us-east5-a
TPU v5e | 1*, 4*, 8*, 16, 32, 64, 128, or 256 | 1 to 90 days | us-west4-b (BATCH), us-central1-a (SERVING)
* You can only reserve one, four, or eight TPU v5e chips for serving (SERVING) workload types.
You can't cancel, delete, or modify requests.
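The limits in the table can be checked before you submit a request. The following Python sketch is illustrative only: the names LIMITS, V5E_ZONE_WORKLOAD, and check_request are hypothetical, the version keys reuse the gcloud values (V6E, V5P, V5E), and the code assumes that each TPU v5e zone accepts only the workload type listed next to it in the table.

```python
# Limits transcribed from the table in this document; all names are hypothetical.
LIMITS = {
    "V6E": {
        "chips": {1, 4, 8, 16, 32, 64, 128, 256},
        "zones": {"asia-northeast1-b", "us-east5-a"},
    },
    "V5P": {
        "chips": {4, 8, 16, 32, 64, 128, 256, 512, 1024},
        "zones": {"us-east5-a"},
    },
    "V5E": {
        "chips": {1, 4, 8, 16, 32, 64, 128, 256},
        "zones": {"us-west4-b", "us-central1-a"},
    },
}
# Assumption: each TPU v5e zone accepts only the workload type listed beside it.
V5E_ZONE_WORKLOAD = {"us-west4-b": "BATCH", "us-central1-a": "SERVING"}
V5E_SERVING_ONLY_CHIPS = {1, 4, 8}
WORKLOAD_TYPES = ("BATCH", "SERVING")


def check_request(tpu_version, chips, zone, days, workload_type=None):
    """Return a list of problems; an empty list means the request fits the table."""
    limits = LIMITS.get(tpu_version)
    if limits is None:
        return [f"unsupported TPU version: {tpu_version}"]
    problems = []
    if chips not in limits["chips"]:
        problems.append(f"{chips} chips is not a supported size for {tpu_version}")
    if zone not in limits["zones"]:
        problems.append(f"{zone} is not a supported zone for {tpu_version}")
    if not 1 <= days <= 90:
        problems.append("reservation period must be 1 to 90 days")
    if tpu_version in ("V5P", "V5E") and workload_type not in WORKLOAD_TYPES:
        problems.append("workload type must be BATCH or SERVING")
    if tpu_version == "V5E" and workload_type in WORKLOAD_TYPES:
        if chips in V5E_SERVING_ONLY_CHIPS and workload_type != "SERVING":
            problems.append("1, 4, or 8 TPU v5e chips require SERVING")
        expected = V5E_ZONE_WORKLOAD.get(zone)
        if expected is not None and workload_type != expected:
            problems.append(f"{zone} is listed for {expected} workloads")
    return problems


if __name__ == "__main__":
    print(check_request("V5P", 1024, "us-east5-a", 90, "BATCH"))
    print(check_request("V6E", 512, "us-east5-a", 30))
```

A local check like this only mirrors the table; Google Cloud still decides whether a request can be created.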
Before you begin
- If you can't use future reservation requests in calendar mode, then you might not be eligible to access and use this feature. In this case, contact your account team or the sales team.
- To share your reserved capacity with other projects within your organization, ensure that the project in which you want to create future reservation requests in calendar mode is allowed to create shared reservations. Otherwise, you will encounter errors.
- If you haven't already, then set up authentication. Authentication is the process by which your identity is verified for access to Google Cloud services and APIs. To run code or samples from a local development environment, you can authenticate to Compute Engine by selecting one of the following options:
Console
When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.
gcloud
- After installing the Google Cloud CLI, initialize it by running the following command:
gcloud init
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
- Set a default region and zone.
REST
To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.
After installing the Google Cloud CLI, initialize it by running the following command:
gcloud init
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
For more information, see Authenticate for using REST in the Google Cloud authentication documentation.
Required roles
To get the permissions that you need to create a future reservation request in calendar mode, ask your administrator to grant you the Compute Future Reservation Admin (roles/compute.futureReservationAdmin) IAM role on the project.
For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to create a future reservation request in calendar mode. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to create a future reservation request in calendar mode:
- To create a future reservation request: compute.futureReservations.create on the project
- To let Compute Engine automatically create reservations: compute.reservations.create on the project
- To specify an instance template: compute.instanceTemplates.useReadOnly on the instance template
- To view the future availability of resources: compute.advice.calendarMode on the project
You might also be able to get these permissions with custom roles or other predefined roles.
Overview
To create a future reservation request in calendar mode, complete the following steps:
View TPU future availability. View the future availability of the TPUs that you want to reserve. Then, when you create a request, specify the number, type, and reservation duration of the TPUs that you confirmed as available. This action increases the chances that Google Cloud approves your request.
Create a reservation request for TPUs. Create and submit a future reservation request in calendar mode for TPUs. If your request is created successfully, then Google Cloud approves it within a minute.
View TPU future availability
You can view the future availability of TPUs in a region for up to 120 days in advance.
To view the future availability of four or eight TPU v5p chips in a region, use the Google Cloud CLI or REST API. Otherwise, select one of the following options:
Console
You can view future availability of TPUs in a region when creating a future reservation request in calendar mode. For more information, see Create a reservation request for TPUs in this document.
gcloud
To view the future availability of TPUs in a region, use the gcloud beta compute advice calendar-mode command with the --chip-count and --tpu-version flags:
gcloud beta compute advice calendar-mode \
--chip-count=NUMBER_OF_CHIPS \
--tpu-version=TPU_VERSION \
--region=REGION \
--start-time-range=from=FROM_START_TIME,to=TO_START_TIME \
--duration-range=max=MAXIMUM_DURATION,min=MINIMUM_DURATION
Replace the following:
- NUMBER_OF_CHIPS: the number of TPU chips to reserve.
- TPU_VERSION: the TPU version to reserve. Specify one of the following values:
  - For TPU v6e: V6E
  - For TPU v5p: V5P
  - For TPU v5e: V5E
  If you specify a TPU v5p or v5e, then you must include the --workload-type flag. Set this flag to the type of workloads that you want to run on the TPUs:
  - For workloads that handle large amounts of data in single or multiple operations, such as machine learning (ML) training workloads, specify BATCH.
  - For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads, specify SERVING.
- REGION: the region where you want to reserve TPUs. To review the supported regions and zones, see Limitations in this document.
- FROM_START_TIME and TO_START_TIME: the earliest and latest dates when you want to reserve capacity. Format these dates as RFC 3339 timestamps: YYYY-MM-DDTHH:MM:SSOFFSET
  Replace the following:
  - YYYY-MM-DD: a date formatted as a four-digit year, two-digit month, and two-digit day, separated by hyphens (-).
  - HH:MM:SS: a time formatted as a two-digit hour in 24-hour time, two-digit minutes, and two-digit seconds, separated by colons (:).
  - OFFSET: the time zone formatted as an offset from Coordinated Universal Time (UTC). For example, to use Pacific Standard Time (PST), specify -08:00. To use no offset, specify Z.
- MAXIMUM_DURATION and MINIMUM_DURATION: the maximum and minimum durations that you want to reserve resources for. Format these durations as the number of days, hours, minutes, or seconds followed by d, h, m, or s respectively. For example, specify 30m for 30 minutes or 1d2h3m4s for one day, two hours, three minutes, and four seconds. You can reserve resources for a minimum of 24 hours and a maximum of 90 days.
The output is similar to the following:
- recommendationsPerSpec:
    spec:
      endTime: '2025-09-07T00:00:00Z'
      location: zones/us-east5-a
      otherLocations:
        zones/us-east5-b:
          details: this machine family is not supported in this zone
          status: NOT_SUPPORTED
        zones/us-east5-c:
          details: this machine family is not supported in this zone
          status: NOT_SUPPORTED
      recommendationId: 0d3f005d-f952-4fce-96f2-6af25e1591eb
      recommendationType: FUTURE_RESERVATION
      startTime: '2025-06-09T00:00:00Z'
If your requested resources are available, then the output contains the startTime, endTime, and location fields. These fields specify the earliest start time, the latest end time, and the zone where resources are available.
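The values for the --start-time-range and --duration-range flags described in this section can be generated rather than typed by hand. The following Python sketch is one way to do that with the standard library; the function names rfc3339 and gcloud_duration are hypothetical, and the 24-hour and 90-day bounds come from the limits stated in this document.

```python
from datetime import datetime, timedelta, timezone


def rfc3339(moment):
    """Format an aware datetime as YYYY-MM-DDTHH:MM:SSOFFSET, using Z for UTC."""
    if moment.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    text = moment.replace(microsecond=0).isoformat()
    # isoformat() renders UTC as +00:00; the document uses Z for no offset.
    return text[:-6] + "Z" if text.endswith("+00:00") else text


def gcloud_duration(delta):
    """Format a timedelta with d, h, m, and s suffixes, such as 1d2h3m4s."""
    total = int(delta.total_seconds())
    if not 86_400 <= total <= 7_776_000:
        raise ValueError("duration must be between 24 hours and 90 days")
    days, rest = divmod(total, 86_400)
    hours, rest = divmod(rest, 3_600)
    minutes, seconds = divmod(rest, 60)
    parts = [(days, "d"), (hours, "h"), (minutes, "m"), (seconds, "s")]
    return "".join(f"{value}{unit}" for value, unit in parts if value)


if __name__ == "__main__":
    start = datetime(2025, 6, 9, tzinfo=timezone.utc)
    latest = start + timedelta(days=7)
    print(f"--start-time-range=from={rfc3339(start)},to={rfc3339(latest)}")
    print(
        "--duration-range="
        f"max={gcloud_duration(timedelta(days=90))},"
        f"min={gcloud_duration(timedelta(days=1))}"
    )
```

The sketch rejects naive datetimes because an RFC 3339 timestamp always carries an offset.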
REST
To view the future availability of TPUs in a region, send a POST request to the beta advice.calendarMode method. In the request body, include the acceleratorCount and vmFamily fields:
POST https://www.googleapis.com/compute/beta/projects/PROJECT_ID/regions/REGION/advice/calendarMode
{
  "futureResourcesSpecs": {
    "spec": {
      "targetResources": {
        "aggregateResources": {
          "acceleratorCount": "NUMBER_OF_CHIPS",
          "vmFamily": "TPU_VERSION"
        }
      },
      "timeRangeSpec": {
        "startTimeNotEarlierThan": "FROM_START_TIME",
        "startTimeNotLaterThan": "TO_START_TIME",
        "minDuration": "MINIMUM_DURATION",
        "maxDuration": "MAXIMUM_DURATION"
      }
    }
  }
}
Replace the following:
- PROJECT_ID: the ID of the project where you want to reserve resources.
- REGION: the region where you want to reserve TPUs. To review the supported regions and zones, see Limitations in this document.
- NUMBER_OF_CHIPS: the number of TPU chips to reserve.
- TPU_VERSION: the TPU version to reserve. Specify one of the following values:
  - For TPU v6e: VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT6E
  - For TPU v5p: VM_FAMILY_CLOUD_TPU_POD_SLICE_CT5P
  - For TPU v5e: VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT5LP
  If you specify a TPU v5p or v5e, then, in the aggregateResources field, you must include the workloadType field. Set this field to the type of workloads that you want to run on the TPUs:
  - For workloads that handle large amounts of data in single or multiple operations, such as machine learning (ML) training workloads, specify BATCH.
  - For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads, specify SERVING.
- FROM_START_TIME and TO_START_TIME: the earliest and latest dates when you want to reserve capacity. Format these dates as RFC 3339 timestamps: YYYY-MM-DDTHH:MM:SSOFFSET
  Replace the following:
  - YYYY-MM-DD: a date formatted as a four-digit year, two-digit month, and two-digit day, separated by hyphens (-).
  - HH:MM:SS: a time formatted as a two-digit hour in 24-hour time, two-digit minutes, and two-digit seconds, separated by colons (:).
  - OFFSET: the time zone formatted as an offset from Coordinated Universal Time (UTC). For example, to use Pacific Standard Time (PST), specify -08:00. To use no offset, specify Z.
- MAXIMUM_DURATION and MINIMUM_DURATION: the maximum and minimum durations, in seconds, that you want to reserve resources for, followed by s. For example, to specify 3,600 seconds, use 3600s. You can reserve resources for a minimum of 24 hours (86,400 seconds) and a maximum of 90 days (7,776,000 seconds).
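The request body described above can also be assembled programmatically. The following Python sketch builds it from the field names and VM family values listed in this section; the function name advice_body and the short keys v6e, v5p, and v5e are hypothetical, and the code does not send the request.

```python
import json

# VM family values listed in this document, keyed by hypothetical short names.
VM_FAMILIES = {
    "v6e": "VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT6E",
    "v5p": "VM_FAMILY_CLOUD_TPU_POD_SLICE_CT5P",
    "v5e": "VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT5LP",
}


def advice_body(tpu, chips, start_from, start_to, min_seconds, max_seconds,
                workload_type=None):
    """Build the JSON body for a future availability query."""
    aggregate = {"acceleratorCount": str(chips), "vmFamily": VM_FAMILIES[tpu]}
    if tpu in ("v5p", "v5e"):
        # TPU v5p and v5e require a workload type in aggregateResources.
        if workload_type not in ("BATCH", "SERVING"):
            raise ValueError("workload_type must be BATCH or SERVING")
        aggregate["workloadType"] = workload_type
    return {
        "futureResourcesSpecs": {
            "spec": {
                "targetResources": {"aggregateResources": aggregate},
                "timeRangeSpec": {
                    "startTimeNotEarlierThan": start_from,
                    "startTimeNotLaterThan": start_to,
                    # Durations are whole seconds followed by s.
                    "minDuration": f"{min_seconds}s",
                    "maxDuration": f"{max_seconds}s",
                },
            }
        }
    }


if __name__ == "__main__":
    body = advice_body(
        "v5p", 256,
        "2025-06-09T00:00:00Z", "2025-06-16T00:00:00Z",
        86_400, 7_776_000,
        workload_type="BATCH",
    )
    print(json.dumps(body, indent=2))
```

The resulting dictionary can be serialized with json.dumps and sent as the body of the POST request shown earlier in this section.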
The output is similar to the following:
{
  "recommendations": [
    {
      "recommendationsPerSpec": {
        "spec": {
          "recommendationId": "a21a2fa0-72c7-4105-8179-88de5409890b",
          "recommendationType": "FUTURE_RESERVATION",
          "startTime": "2025-06-09T00:00:00Z",
          "endTime": "2025-09-07T00:00:00Z",
          "otherLocations": {
            "zones/us-east5-b": {
              "status": "NOT_SUPPORTED",
              "details": "this machine family is not supported in this zone"
            },
            "zones/us-east5-c": {
              "status": "NOT_SUPPORTED",
              "details": "this machine family is not supported in this zone"
            }
          },
          "location": "zones/us-east5-a"
        }
      }
    }
  ]
}
If your requested resources are available, then the output contains the startTime, endTime, and location fields. These fields specify the earliest start time, the latest end time, and the zone where resources are available.
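A script that consumes this response needs only those three fields. The following Python sketch extracts them from a response shaped like the sample output above; the function name available_windows is hypothetical, and the code assumes that recommendationsPerSpec is a map keyed by spec name, as the sample suggests.

```python
import json


def available_windows(response):
    """Yield (location, startTime, endTime) for each complete recommendation."""
    for recommendation in response.get("recommendations", []):
        per_spec = recommendation.get("recommendationsPerSpec", {})
        for spec in per_spec.values():
            # Skip entries that lack any of the three availability fields.
            if all(key in spec for key in ("location", "startTime", "endTime")):
                yield spec["location"], spec["startTime"], spec["endTime"]


# Trimmed copy of the sample output shown in this document.
SAMPLE = json.loads("""
{
  "recommendations": [
    {
      "recommendationsPerSpec": {
        "spec": {
          "recommendationType": "FUTURE_RESERVATION",
          "startTime": "2025-06-09T00:00:00Z",
          "endTime": "2025-09-07T00:00:00Z",
          "location": "zones/us-east5-a"
        }
      }
    }
  ]
}
""")

if __name__ == "__main__":
    for location, start, end in available_windows(SAMPLE):
        print(f"{location}: {start} to {end}")
```

An empty result means that the response listed no complete availability window for the requested resources.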
Create a reservation request for TPUs
After Google Cloud approves a future reservation request in calendar mode, you can't cancel or delete your request. You must commit to pay for the requested capacity at the requested start time, regardless of whether you use it.
To create a request by using an existing TPU VM as reference, use the Google Cloud console. To reserve four or eight TPU v5p chips, use the gcloud CLI or REST API. Otherwise, select one of the following options:
Console
In the Google Cloud console, go to the Reservations page.
Click the Future reservations tab.
Click Create future reservation. The Create a future reservation page opens.
In the Hardware configuration section, specify the properties of the TPUs that you want to reserve by doing one of the following:
To specify TPU properties directly, complete the following steps:
Select Specify machine type.
Click the TPUs tab.
In the table, in the Series column, select the TPU version to reserve.
To specify TPU properties using an existing TPU VM as reference, select Use existing VM, and then select the VM.
If you specified a TPU v5p (CT5P) or v5e (CT5LP) in the previous step, then, in the TPU v5 workload type list, select one of the following options:
To run workloads on the TPUs that handle large amounts of data in single or multiple operations, such as ML training workloads, select Batch.
To run workloads on the TPUs that handle concurrent requests and require minimal network latency, such as ML inference workloads, select Serving.
In the Search for capacity section, complete the following steps:
In the Region and Zone lists, specify the region and zone where you want to reserve resources.
In the Number of chips field, select the number of TPUs to reserve.
In the Start time list, select the start time for your request. The start time must be at least 24 hours after you create and submit the request.
Optional: In the Choose your start date flexibility list, select how exact your start date needs to be.
In the Reservation duration field, specify for how long you want to reserve resources. The value must be between one day and 90 days.
Click Search for capacity. Then, in the Available capacity table, select one of the available options containing the type, number, and reservation period of the TPUs to reserve.
Click Next.
In the Share type section, select the projects to share your requested capacity with:
To use the reserved capacity only within your project, select Local.
To share the reserved capacity with other projects, select Shared, click Add projects, and then follow the prompts to select the projects.
Click Next.
In the Future reservation name field, enter a name for the request.
In the Reservation name field, enter the name of the reservation that Compute Engine automatically creates to provision your requested capacity.
Click Create.
gcloud
To create a future reservation request in calendar mode and submit it for review, use the following gcloud beta compute future-reservations create command. To reserve TPUs, include the --chip-count and --tpu-version flags:
gcloud beta compute future-reservations create FUTURE_RESERVATION_NAME \
--auto-delete-auto-created-reservations \
--chip-count=NUMBER_OF_CHIPS \
--tpu-version=TPU_VERSION \
--deployment-type=DENSE \
--planning-status=SUBMITTED \
--require-specific-reservation \
--reservation-mode=CALENDAR \
--reservation-name=RESERVATION_NAME \
--share-setting=SHARE_TYPE \
--start-time=START_TIME \
--duration=DURATION \
--zone=ZONE
Replace the following:
- FUTURE_RESERVATION_NAME: the name of the request.
- NUMBER_OF_CHIPS: the number of TPU chips to reserve.
- TPU_VERSION: the TPU version to reserve. Specify one of the following values:
  - For TPU v6e: V6E
  - For TPU v5p: V5P
  - For TPU v5e: V5E
  If you specify a TPU v5p or v5e, then you must include the --workload-type flag. Set the flag to the type of workloads that you want to run on the TPUs:
  - For workloads that handle large amounts of data in single or multiple operations, such as machine learning (ML) training workloads, specify BATCH.
  - For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads, specify SERVING.
- RESERVATION_NAME: the name of the reservation that Compute Engine automatically creates to provision your requested capacity.
- SHARE_TYPE: whether other projects in your organization can consume the reserved capacity. Specify one of the following values:
  - To use capacity only within your project: local
  - To share capacity with other projects: projects
  If you specify projects, then you must include the --share-with flag set to a comma-separated list of project IDs, for example project-1,project-2. You can specify up to 100 projects. Don't include your project ID in this list. Your project can consume the reserved capacity by default.
- START_TIME: the start time of the request, formatted as an RFC 3339 timestamp. Specify a start time that is at least 24 hours after you submit the request. Otherwise, creating the request fails.
- DURATION: the duration, in seconds, that you want to reserve the requested resources for, followed by s. For example, to specify 3,600 seconds, use 3600s. You can reserve resources for a minimum of 24 hours (86,400 seconds) and a maximum of 90 days (7,776,000 seconds).
- ZONE: the zone where you want to reserve resources.
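When requests are created from a script, the command line above can be assembled as an argument list. The following Python sketch uses only the flags shown in this section; the function name create_command and its parameters are hypothetical, and the code builds the command without running it.

```python
import shlex


def create_command(name, reservation_name, chips, tpu_version, zone,
                   start_time, duration_seconds, workload_type=None,
                   share_with=()):
    """Build the argument list for the create command shown in this document."""
    if not 86_400 <= duration_seconds <= 7_776_000:
        raise ValueError("duration must be between 24 hours and 90 days")
    if len(share_with) > 100:
        raise ValueError("you can share with at most 100 projects")
    share_setting = "projects" if share_with else "local"
    args = [
        "gcloud", "beta", "compute", "future-reservations", "create", name,
        "--auto-delete-auto-created-reservations",
        f"--chip-count={chips}",
        f"--tpu-version={tpu_version}",
        "--deployment-type=DENSE",
        "--planning-status=SUBMITTED",
        "--require-specific-reservation",
        "--reservation-mode=CALENDAR",
        f"--reservation-name={reservation_name}",
        f"--share-setting={share_setting}",
        f"--start-time={start_time}",
        f"--duration={duration_seconds}s",
        f"--zone={zone}",
    ]
    if tpu_version in ("V5P", "V5E"):
        # TPU v5p and v5e require the --workload-type flag.
        if workload_type not in ("BATCH", "SERVING"):
            raise ValueError("workload_type must be BATCH or SERVING")
        args.append(f"--workload-type={workload_type}")
    if share_with:
        args.append("--share-with=" + ",".join(share_with))
    return args


if __name__ == "__main__":
    command = create_command(
        "my-request", "my-reservation", 256, "V5P", "us-east5-a",
        "2025-06-09T00:00:00Z", 86_400,
        workload_type="BATCH", share_with=["project-1", "project-2"],
    )
    # Print a shell-quoted command for review before running it.
    print(shlex.join(command))
```

Printing the command first is deliberate: an approved request can't be canceled or deleted, so reviewing the arguments before submission is worthwhile.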
REST
To create a future reservation request in calendar mode and submit it for review, send the following POST request to the beta futureReservations.insert method. To reserve TPUs, include the acceleratorCount and vmFamily fields in the request body:
POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/zones/ZONE/futureReservations
{
  "name": "FUTURE_RESERVATION_NAME",
  "autoDeleteAutoCreatedReservations": true,
  "deploymentType": "DENSE",
  "planningStatus": "SUBMITTED",
  "reservationMode": "CALENDAR",
  "reservationName": "RESERVATION_NAME",
  "shareSettings": {
    "shareType": "SHARE_TYPE"
  },
  "specificReservationRequired": true,
  "aggregateReservation": {
    "reservedResources": [
      {
        "accelerator": {
          "acceleratorCount": NUMBER_OF_CHIPS
        }
      }
    ],
    "vmFamily": "TPU_VERSION"
  },
  "timeWindow": {
    "startTime": "START_TIME",
    "duration": {
      "seconds": DURATION
    }
  }
}
Replace the following:
- PROJECT_ID: the ID of the project where you want to create the request.
- ZONE: the zone where you want to reserve resources.
- FUTURE_RESERVATION_NAME: the name of the request.
- RESERVATION_NAME: the name of the reservation that Compute Engine automatically creates to provision your requested capacity.
- SHARE_TYPE: whether other projects in your organization can consume the reserved capacity. Specify one of the following values:
  - To use capacity only within your project: LOCAL
  - To share capacity with other projects: SPECIFIC_PROJECTS
  If you specify SPECIFIC_PROJECTS, then, in the shareSettings field, you must include the projectMap field to specify the projects to share the capacity with. You can specify up to 100 projects. Don't specify your project ID. Your project can consume the reserved capacity by default.
  For example, to share the requested capacity with two other projects, include the following:
  "shareSettings": {
    "shareType": "SPECIFIC_PROJECTS",
    "projectMap": {
      "CONSUMER_PROJECT_ID_1": {
        "projectId": "CONSUMER_PROJECT_ID_1"
      },
      "CONSUMER_PROJECT_ID_2": {
        "projectId": "CONSUMER_PROJECT_ID_2"
      }
    }
  }
  Replace CONSUMER_PROJECT_ID_1 and CONSUMER_PROJECT_ID_2 with the IDs of two projects that you want to allow to consume the requested capacity.
- NUMBER_OF_CHIPS: the number of TPU chips to reserve.
- TPU_VERSION: the TPU version to reserve. Specify one of the following values:
  - For TPU v6e: VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT6E
  - For TPU v5p: VM_FAMILY_CLOUD_TPU_POD_SLICE_CT5P
  - For TPU v5e: VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT5LP
  If you specify a TPU v5p or v5e, then, in the aggregateReservation field, you must include the workloadType field. Set the field to the type of workloads that you want to run on the TPUs:
  - For workloads that handle large amounts of data in single or multiple operations, such as ML training workloads, specify BATCH.
  - For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads, specify SERVING.
- START_TIME: the start time of the request, formatted as an RFC 3339 timestamp. Specify a start time that is at least 24 hours after you submit the request. Otherwise, creating the request fails.
- DURATION: the duration, in seconds, that you want to reserve the requested resources for. You can reserve resources for a minimum of 24 hours (86,400 seconds) and a maximum of 90 days (7,776,000 seconds).
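The constraints on START_TIME and DURATION can be checked locally before the request is sent. The following Python sketch builds the request body shown in this section and rejects values that break those constraints. The function name reservation_body and its parameters are hypothetical, and placing workloadType next to vmFamily inside aggregateReservation is an assumption based on the request body layout.

```python
from datetime import datetime, timedelta, timezone


def reservation_body(name, reservation_name, vm_family, chips, start_time,
                     duration_seconds, workload_type=None, share_with=(),
                     now=None):
    """Build the JSON body for a calendar-mode future reservation request."""
    now = now or datetime.now(timezone.utc)
    if start_time - now < timedelta(hours=24):
        raise ValueError("start time must be at least 24 hours from now")
    if not 86_400 <= duration_seconds <= 7_776_000:
        raise ValueError("duration must be between 24 hours and 90 days")
    if len(share_with) > 100:
        raise ValueError("you can share with at most 100 projects")

    share_settings = {"shareType": "LOCAL"}
    if share_with:
        share_settings = {
            "shareType": "SPECIFIC_PROJECTS",
            "projectMap": {p: {"projectId": p} for p in share_with},
        }

    aggregate = {
        "reservedResources": [{"accelerator": {"acceleratorCount": chips}}],
        "vmFamily": vm_family,
    }
    if workload_type is not None:
        # Assumption: workloadType sits alongside vmFamily.
        aggregate["workloadType"] = workload_type

    return {
        "name": name,
        "autoDeleteAutoCreatedReservations": True,
        "deploymentType": "DENSE",
        "planningStatus": "SUBMITTED",
        "reservationMode": "CALENDAR",
        "reservationName": reservation_name,
        "shareSettings": share_settings,
        "specificReservationRequired": True,
        "aggregateReservation": aggregate,
        "timeWindow": {
            "startTime": start_time.isoformat(timespec="seconds"),
            "duration": {"seconds": duration_seconds},
        },
    }


if __name__ == "__main__":
    start = datetime.now(timezone.utc) + timedelta(days=2)
    body = reservation_body(
        "my-request", "my-reservation",
        "VM_FAMILY_CLOUD_TPU_POD_SLICE_CT5P", 256, start, 86_400,
        workload_type="BATCH", share_with=["project-1"],
    )
    print(body["timeWindow"])
```

The now parameter exists only so that the 24-hour check can be exercised with a fixed clock; start_time must be a timezone-aware datetime.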