This document explains how to create a future reservation request in calendar mode. To learn more about this type of reservation, see Future reservation requests in calendar mode overview.
Create a future reservation request in calendar mode to reserve the following resources for up to 90 days:
Up to 80 virtual machine (VM) instances that have GPUs attached.
Up to 1,024 TPU chips.
At your chosen delivery date and time, you can create GPU or TPU VMs by consuming the reserved capacity. Use future reservation requests in calendar mode to obtain high-demand resources for the following workloads:
Model pre-training jobs
Model fine-tuning jobs
High performance computing (HPC) simulation workloads
Short-term increases in inference workloads
To reserve more than 80 GPU VMs, or to reserve resources for longer than 90 days, in a single request, see Reserve capacity in the AI Hypercomputer documentation instead.
Limitations
The following sections explain the limitations for future reservation requests in calendar mode.
Limitations for all requests
All future reservation requests in calendar mode have the following limitations:
You can reserve resources for a period between 1 and 90 days.
After you create and submit a request, you can't cancel, delete, or modify your request.
Limitations for requests for GPU VMs
You can only reserve GPU VMs as follows:
You can reserve between 1 and 80 GPU VMs per request.
You can reserve the following machine series:
You can reserve GPU VMs only in specific zones.
Limitations for requests for TPUs
You can only reserve TPUs as follows:
You can reserve 1, 4, 8, 16, 32, 64, 128, 256, 512, or 1,024 TPU chips per request.
You can reserve the following TPU versions:
You can only reserve 1, 4, or 8 TPU v5e chips for serving (`SERVING`) workload types.
You can only reserve TPUs in the following zones:
For TPU v6e:
asia-northeast1-b
us-east5-a
us-east5-b
For TPU v5p:
us-east5-a
For TPU v5e:
For batch (`BATCH`) workload types: us-west4-b
For serving (`SERVING`) workload types: us-central1-a
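The limits above can be checked mechanically before you submit a request. The following Python sketch is purely illustrative (no such helper exists in the Google Cloud SDK); it encodes the documented chip counts and zones:

```python
# Illustrative pre-flight check -- not part of any Google Cloud SDK. It encodes
# the documented calendar-mode limits so that a TPU request can be
# sanity-checked before you submit it.
ALLOWED_CHIP_COUNTS = {1, 4, 8, 16, 32, 64, 128, 256, 512, 1024}

# Supported zones per TPU version; for v5e they depend on the workload type.
TPU_ZONES = {
    "V6E": {"asia-northeast1-b", "us-east5-a", "us-east5-b"},
    "V5P": {"us-east5-a"},
    "V5E": {"BATCH": {"us-west4-b"}, "SERVING": {"us-central1-a"}},
}

def validate_tpu_request(tpu_version, chip_count, zone, workload_type=None):
    """Return a list of limit violations; an empty list means the request looks valid."""
    errors = []
    if chip_count not in ALLOWED_CHIP_COUNTS:
        errors.append(f"chip count {chip_count} is not an allowed value")
    if tpu_version == "V5E":
        if workload_type not in ("BATCH", "SERVING"):
            errors.append("TPU v5e requires a workload type of BATCH or SERVING")
        else:
            if workload_type == "SERVING" and chip_count not in (1, 4, 8):
                errors.append("SERVING workloads allow only 1, 4, or 8 v5e chips")
            if zone not in TPU_ZONES["V5E"][workload_type]:
                errors.append(f"zone {zone} is not supported for v5e {workload_type}")
    elif zone not in TPU_ZONES.get(tpu_version, set()):
        errors.append(f"zone {zone} is not supported for {tpu_version}")
    return errors
```

For example, `validate_tpu_request("V5E", 16, "us-central1-a", "SERVING")` reports that serving workloads allow at most 8 chips.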
Before you begin
- If you can't use future reservation requests in calendar mode, then you might not be eligible to access and use this feature. In this case, contact your account team or the sales team.
- To share your reserved capacity with other projects within your organization, ensure that the project in which you want to create future reservation requests in calendar mode is allowed to create shared reservations. Otherwise, you will encounter errors.
- If you haven't already, then set up authentication. Authentication is the process by which your identity is verified for access to Google Cloud services and APIs. To run code or samples from a local development environment, you can authenticate to Compute Engine by selecting one of the following options:
Select the tab for how you plan to use the samples on this page:
Console
When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.
gcloud
- After installing the Google Cloud CLI, initialize it by running the following command:

  ```
  gcloud init
  ```
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
- Set a default region and zone.
REST
To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.
After installing the Google Cloud CLI, initialize it by running the following command:
```
gcloud init
```
If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.
For more information, see Authenticate for using REST in the Google Cloud authentication documentation.
Required roles
To get the permissions that you need to create a future reservation request in calendar mode, ask your administrator to grant you the Compute Future Reservation Admin (`roles/compute.futureReservationAdmin`) IAM role on the project. For more information about granting roles, see Manage access to projects, folders, and organizations.
This predefined role contains the permissions required to create a future reservation request in calendar mode. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to create a future reservation request in calendar mode:
- To create a future reservation request: `compute.futureReservations.create` on the project
- To let Compute Engine automatically create reservations: `compute.reservations.create` on the project
- To specify an instance template: `compute.instanceTemplates.useReadOnly` on the instance template
- To view resource future availability: `compute.advice.calendarMode` on the project
You might also be able to get these permissions with custom roles or other predefined roles.
Overview
To create a future reservation request in calendar mode, complete the following steps:
View resource future availability. View future availability for the GPU VMs or TPUs that you want to reserve. Then, when you create a request, specify the number, type, and reservation duration of the resources that you confirmed as available. Google Cloud is more likely to approve your request if you supply this information.
Create a reservation request for GPU VMs or TPUs. Create and submit a future reservation request in calendar mode for GPU VMs or TPUs. If you successfully create a request, then Google Cloud approves it within a minute.
View resource future availability
You can view future availability for GPU VMs or TPUs in a region as follows:
For GPU VMs, up to 60 days in advance
For TPUs, up to 120 days in advance
To view GPU VM or TPU future availability in a region, select one of the following options:
Console
You can view GPU VM or TPU future availability in a region when creating a future reservation request in calendar mode. For more information, see Create a reservation request for GPU VMs or TPUs in this document.
gcloud
To view GPU VM or TPU future availability in a region, use one of the following `gcloud beta compute advice calendar-mode` commands. Based on the resources that you want to view, include the following flags:

- To view GPU VM availability, include the `--vm-count` and `--machine-type` flags:

  ```
  gcloud beta compute advice calendar-mode \
      --vm-count=NUMBER_OF_VMS \
      --machine-type=MACHINE_TYPE \
      --region=REGION \
      --start-time-range=from=FROM_START_TIME,to=TO_START_TIME \
      --end-time-range=from=FROM_END_TIME,to=TO_END_TIME
  ```

- To view TPU availability, include the `--chip-count` and `--tpu-version` flags:

  ```
  gcloud beta compute advice calendar-mode \
      --chip-count=NUMBER_OF_CHIPS \
      --tpu-version=TPU_VERSION \
      --region=REGION \
      --start-time-range=from=FROM_START_TIME,to=TO_START_TIME \
      --end-time-range=from=FROM_END_TIME,to=TO_END_TIME
  ```
Replace the following:

- `NUMBER_OF_VMS`: the number of GPU VMs to reserve.
- `MACHINE_TYPE`: the GPU machine type to reserve.
- `NUMBER_OF_CHIPS`: the number of TPU chips to reserve.
- `TPU_VERSION`: the TPU version to reserve. Specify one of the following values:
  - For TPU v6e: `V6E`
  - For TPU v5p: `V5P`
  - For TPU v5e: `V5E`

  If you specify a TPU v5e, then you must include the `--workload-type` flag. Set this flag to the type of workloads that you want to run on the TPUs:
  - For workloads that handle large amounts of data in single or multiple operations, such as machine learning (ML) training workloads, specify `BATCH`.
  - For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads, specify `SERVING`.
- `REGION`: the region where you want to reserve GPU VMs or TPUs. To check which regions and zones are supported, see Limitations in this document.
- `FROM_START_TIME` and `TO_START_TIME`: the earliest and latest dates that you want to reserve capacity on. Format these dates as RFC 3339 timestamps:

  ```
  YYYY-MM-DDTHH:MM:SSOFFSET
  ```

  Replace the following:
  - `YYYY-MM-DD`: a date formatted as a four-digit year, a two-digit month, and a two-digit day, separated by hyphens (`-`).
  - `HH:MM:SS`: a time formatted as a two-digit hour using 24-hour time, two-digit minutes, and two-digit seconds, separated by colons (`:`).
  - `OFFSET`: the time zone formatted as an offset of Coordinated Universal Time (UTC). For example, to use Pacific Standard Time (PST), specify `-08:00`. To use no offset, specify `Z`.
- `FROM_END_TIME` and `TO_END_TIME`: the earliest and latest dates that you want your capacity reservation to end on. Format these dates as RFC 3339 timestamps. If you want to specify a range of durations for your reservation period instead of end times, then replace the `--end-time-range` flag with the `--duration-range` flag.
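The RFC 3339 timestamp format described above matches what Python's standard library emits, which can be handy when scripting these commands. An illustrative sketch (the dates are placeholders):

```python
from datetime import datetime, timedelta, timezone

# Build RFC 3339 timestamps for a start-time range, in the
# YYYY-MM-DDTHH:MM:SSOFFSET shape expected by --start-time-range.
# The specific dates here are only examples.
pst = timezone(timedelta(hours=-8))  # the UTC-08:00 offset
from_start = datetime(2025, 6, 9, 0, 0, 0, tzinfo=timezone.utc)
to_start = datetime(2025, 6, 16, 9, 30, 0, tzinfo=pst)

# isoformat() emits the date, "T", the time, and the offset; replace
# "+00:00" with "Z" for the no-offset form.
print(from_start.isoformat().replace("+00:00", "Z"))  # 2025-06-09T00:00:00Z
print(to_start.isoformat())  # 2025-06-16T09:30:00-08:00
```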
The output is similar to the following:

```
- recommendationsPerSpec:
    spec:
      endTime: '2025-09-07T00:00:00Z'
      location: zones/us-east5-a
      otherLocations:
        zones/us-east5-b:
          details: this machine family is not supported in this zone
          status: NOT_SUPPORTED
        zones/us-east5-c:
          details: this machine family is not supported in this zone
          status: NOT_SUPPORTED
      recommendationId: 0d3f005d-f952-4fce-96f2-6af25e1591eb
      recommendationType: FUTURE_RESERVATION
      startTime: '2025-06-09T00:00:00Z'
```

If your requested resources are available, then the output contains the `startTime`, `endTime`, and `location` fields. These fields specify the earliest start time, the latest end time, and the zone where resources are available.
REST
To view GPU VM or TPU future availability in a region, make a POST request to the beta `advice.calendarMode` method. Based on the resources that you want to view, include the following fields in the request body:

- To view GPU VM availability, include the `instanceCount` and `machineType` fields:

  ```
  POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/regions/REGION/advice/calendarMode

  {
    "futureResourcesSpecs": {
      "spec": {
        "targetResources": {
          "specificSkuResources": {
            "instanceCount": "NUMBER_OF_VMS",
            "machineType": "MACHINE_TYPE"
          }
        },
        "timeRangeSpec": {
          "startTimeNotEarlierThan": "FROM_START_TIME",
          "startTimeNotLaterThan": "TO_START_TIME",
          "endTimeNotEarlierThan": "FROM_END_TIME",
          "endTimeNotLaterThan": "TO_END_TIME"
        }
      }
    }
  }
  ```

- To view TPU availability, include the `acceleratorCount` and `vmFamily` fields:

  ```
  POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/regions/REGION/advice/calendarMode

  {
    "futureResourcesSpecs": {
      "spec": {
        "targetResources": {
          "aggregateResources": {
            "acceleratorCount": "NUMBER_OF_CHIPS",
            "vmFamily": "TPU_VERSION"
          }
        },
        "timeRangeSpec": {
          "startTimeNotEarlierThan": "FROM_START_TIME",
          "startTimeNotLaterThan": "TO_START_TIME",
          "endTimeNotEarlierThan": "FROM_END_TIME",
          "endTimeNotLaterThan": "TO_END_TIME"
        }
      }
    }
  }
  ```
Replace the following:

- `PROJECT_ID`: the ID of the project where you want to reserve resources.
- `REGION`: the region where you want to reserve GPU VMs or TPUs. To check the regions and zones that are supported, see Limitations in this document.
- `NUMBER_OF_VMS`: the number of GPU VMs to reserve.
- `MACHINE_TYPE`: the GPU machine type to reserve.
- `NUMBER_OF_CHIPS`: the number of TPU chips to reserve.
- `TPU_VERSION`: the TPU version to reserve. Specify one of the following values:
  - For TPU v6e: `VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT6E`
  - For TPU v5p: `VM_FAMILY_CLOUD_TPU_POD_SLICE_CT5P`
  - For TPU v5e: `VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT5LP`

  If you specify a TPU v5e, then, in the `aggregateResources` field, you must include the `workloadType` field. Set this field to the type of workloads that you want to run on the TPUs:
  - For workloads that handle large amounts of data in single or multiple operations, such as machine learning (ML) training workloads, specify `BATCH`.
  - For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads, specify `SERVING`.
- `FROM_START_TIME` and `TO_START_TIME`: the earliest and latest dates that you want to reserve capacity on. Format these dates as RFC 3339 timestamps:

  ```
  YYYY-MM-DDTHH:MM:SSOFFSET
  ```

  Replace the following:
  - `YYYY-MM-DD`: a date formatted as a four-digit year, a two-digit month, and a two-digit day, separated by hyphens (`-`).
  - `HH:MM:SS`: a time formatted as a two-digit hour using 24-hour time, two-digit minutes, and two-digit seconds, separated by colons (`:`).
  - `OFFSET`: the time zone formatted as an offset of Coordinated Universal Time (UTC). For example, to use Pacific Standard Time (PST), specify `-08:00`. To use no offset, specify `Z`.
- `FROM_END_TIME` and `TO_END_TIME`: the earliest and latest dates that you want your capacity reservation to end on. Format these dates as RFC 3339 timestamps. If you want to specify a range of durations for your reservation period instead of end times, then replace the `endTimeNotEarlierThan` and `endTimeNotLaterThan` fields with the `minDuration` and `maxDuration` fields.
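If you build this request body in code, the nesting shown in the samples can be assembled from plain dictionaries. An illustrative Python sketch (the helper name and example values are mine, not part of the API):

```python
import json

def advice_request_body(number_of_chips, tpu_version,
                        from_start, to_start, from_end, to_end,
                        workload_type=None):
    """Assemble an advice.calendarMode request body for a TPU availability check.

    Field names mirror the REST samples in this document; the helper itself
    is only an illustration."""
    aggregate = {
        "acceleratorCount": str(number_of_chips),
        "vmFamily": tpu_version,
    }
    if workload_type:  # required when the version is TPU v5e
        aggregate["workloadType"] = workload_type
    return {
        "futureResourcesSpecs": {
            "spec": {
                "targetResources": {"aggregateResources": aggregate},
                "timeRangeSpec": {
                    "startTimeNotEarlierThan": from_start,
                    "startTimeNotLaterThan": to_start,
                    "endTimeNotEarlierThan": from_end,
                    "endTimeNotLaterThan": to_end,
                },
            }
        }
    }

body = advice_request_body(
    256, "VM_FAMILY_CLOUD_TPU_POD_SLICE_CT5P",
    "2025-06-01T00:00:00Z", "2025-06-15T00:00:00Z",
    "2025-07-01T00:00:00Z", "2025-09-01T00:00:00Z")
print(json.dumps(body, indent=2))
```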
The output is similar to the following:

```
{
  "recommendations": [
    {
      "recommendationsPerSpec": {
        "spec": {
          "recommendationId": "a21a2fa0-72c7-4105-8179-88de5409890b",
          "recommendationType": "FUTURE_RESERVATION",
          "startTime": "2025-06-09T00:00:00Z",
          "endTime": "2025-09-07T00:00:00Z",
          "otherLocations": {
            "zones/us-east5-b": {
              "status": "NOT_SUPPORTED",
              "details": "this machine family is not supported in this zone"
            },
            "zones/us-east5-c": {
              "status": "NOT_SUPPORTED",
              "details": "this machine family is not supported in this zone"
            }
          },
          "location": "zones/us-east5-a"
        }
      }
    }
  ]
}
```

If your requested resources are available, then the output contains the `startTime`, `endTime`, and `location` fields. These fields specify the earliest start time, the latest end time, and the zone where resources are available.
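When you call the method programmatically, these fields can be pulled out of the parsed JSON response. A minimal Python sketch, using a response shaped like the sample above:

```python
# Extract the recommended start time, end time, and zone from a parsed
# advice.calendarMode response. The dict below mirrors the sample output
# in this document.
response = {
    "recommendations": [{
        "recommendationsPerSpec": {
            "spec": {
                "recommendationId": "a21a2fa0-72c7-4105-8179-88de5409890b",
                "recommendationType": "FUTURE_RESERVATION",
                "startTime": "2025-06-09T00:00:00Z",
                "endTime": "2025-09-07T00:00:00Z",
                "location": "zones/us-east5-a",
            }
        }
    }]
}

windows = []
for rec in response.get("recommendations", []):
    spec = rec["recommendationsPerSpec"]["spec"]
    # Resources are available only if all three fields are present.
    if {"startTime", "endTime", "location"} <= spec.keys():
        windows.append((spec["location"].removeprefix("zones/"),
                        spec["startTime"], spec["endTime"]))

print(windows)  # [('us-east5-a', '2025-06-09T00:00:00Z', '2025-09-07T00:00:00Z')]
```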
Create a reservation request for GPU VMs or TPUs
When you create a future reservation request in calendar mode, you can only specify a reservation period as follows:
Start time: based on the resources that you want to reserve, you must specify a start time that is at least the following amount of time after you create and submit a request:
For GPU VMs, 87 hours (three days and 15 hours)
For TPUs, 24 hours
End time: you can reserve resources for a maximum of 90 days.
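The valid reservation window follows from these rules. A small illustrative Python sketch (the lead times and the 90-day maximum come from the limits above; the helper itself is hypothetical):

```python
from datetime import datetime, timedelta, timezone

# Lead times before the earliest allowed start, per the rules above:
# GPU VM requests need at least 87 hours, TPU requests at least 24 hours,
# and a reservation can run for at most 90 days.
LEAD_TIME = {"GPU": timedelta(hours=87), "TPU": timedelta(hours=24)}
MAX_DURATION = timedelta(days=90)

def reservation_window(resource_type, now):
    """Return (earliest start, latest possible end) for a request submitted at `now`."""
    earliest_start = now + LEAD_TIME[resource_type]
    latest_end = earliest_start + MAX_DURATION
    return earliest_start, latest_end

now = datetime(2025, 6, 1, 12, 0, tzinfo=timezone.utc)
start, end = reservation_window("GPU", now)
print(start.isoformat())  # 2025-06-05T03:00:00+00:00
```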
To create a request by using an existing GPU VM as reference, use the Google Cloud console. Otherwise, select one of the following options:
Console
In the Google Cloud console, go to the Reservations page.
Click the Future reservations tab.
Click Create future reservation. The Create a future reservation page appears, and the Hardware configuration pane is selected.
In the Configuration section, specify the properties of the GPU VMs or TPUs that you want to reserve by doing one of the following:
To specify GPU VM or TPU properties directly, complete the following steps:
Select Specify machine type.
Click the GPUs or TPUs tab, and then select the GPU machine type or TPU version to reserve.
To specify GPU VM properties by using an existing instance template, select Instance template, and then select the template.
To specify GPU VM properties by using an existing VM as reference, select Use existing VM, and then select the VM.
If you specified a TPU v5e (CT5LP) in the previous step, then, in the TPU v5 workload type list, select one of the following options:
To run workloads on the TPUs that handle large amounts of data in single or multiple operations, such as ML training workloads, select Batch.
To run workloads on the TPUs that handle concurrent requests and require minimal network latency, such as ML inference workloads, select Serving.
In the Search for capacity section, complete the following steps:
In the Region and Zone lists, specify the region and zone where you want to reserve resources. To review the supported regions and zones, see Limitations in this document.
In the Total capacity needed field (when reserving GPU VMs) or Number of chips list (when reserving TPUs), specify the number of GPU VMs or TPU chips to reserve.
In the Start time list, select the start time for your request.
Optional: In the Choose your start date flexibility list, select how exact your start date needs to be.
In the Reservation duration field, specify for how long you want to reserve resources.
Click Search for capacity. Then, in the Available capacity table, select one of the available options that contain the type, number, and reservation period of the GPU VMs or TPUs to reserve.
Click Next.
In the Share type section, select the projects to share your requested capacity with:
To use the reserved capacity only within your project, select Local.
To share the reserved capacity with other projects, select Shared, click Add projects, and then follow the prompts to select the projects.
Click Next.
In the Future reservation name field, enter a name for the request.
In the Reservation name field, enter the name of the reservation that Compute Engine automatically creates to provision your requested capacity.
Click Create.
gcloud
To create a future reservation request in calendar mode and submit it for
review, use one of the following
gcloud beta compute future-reservations create
commands.
Based on the resources that you want to reserve, include the following
flags:
To reserve GPU VMs, include the
--total-count
and--machine-type
flags:gcloud beta compute future-reservations create FUTURE_RESERVATION_NAME \ --auto-delete-auto-created-reservations \ --total-count=NUMBER_OF_VMS \ --machine-type=MACHINE_TYPE \ --deployment-type=DENSE \ --planning-status=SUBMITTED \ --require-specific-reservation \ --reservation-mode=CALENDAR \ --reservation-name=RESERVATION_NAME \ --share-setting=SHARE_TYPE \ --start-time=START_TIME \ --end-time=END_TIME \ --zone=ZONE
To reserve TPUs, include the
--chip-count
and--tpu-version
flags:gcloud beta compute future-reservations create FUTURE_RESERVATION_NAME \ --auto-delete-auto-created-reservations \ --chip-count=NUMBER_OF_CHIPS \ --tpu-version=TPU_VERSION \ --deployment-type=DENSE \ --planning-status=SUBMITTED \ --require-specific-reservation \ --reservation-mode=CALENDAR \ --reservation-name=RESERVATION_NAME \ --share-setting=SHARE_TYPE \ --start-time=START_TIME \ --end-time=END_TIME \ --zone=ZONE
Replace the following:

- `FUTURE_RESERVATION_NAME`: the name of the request.
- `NUMBER_OF_VMS`: the number of GPU VMs to reserve.
- `MACHINE_TYPE`: the GPU machine type to reserve.
- `NUMBER_OF_CHIPS`: the number of TPU chips to reserve.
- `TPU_VERSION`: the TPU version to reserve. Specify one of the following values:
  - For TPU v6e: `V6E`
  - For TPU v5p: `V5P`
  - For TPU v5e: `V5E`

  If you specify a TPU v5e, then you must include the `--workload-type` flag. Set the flag to the type of workloads that you want to run on the TPUs:
  - For workloads that handle large amounts of data in single or multiple operations, such as machine learning (ML) training workloads, specify `BATCH`.
  - For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads, specify `SERVING`.
- `RESERVATION_NAME`: the name of the reservation that Compute Engine automatically creates to provision your requested capacity.
- `SHARE_TYPE`: whether other projects in your organization can consume the reserved capacity. Specify one of the following values:
  - To use capacity only within your project: `local`
  - To share capacity with other projects: `projects`

  If you specify `projects`, then you must include the `--share-with` flag set to a comma-separated list of project IDs, for example, `project-1,project-2`. You can specify up to 100 projects within your organization. Don't include your own project ID in this list; your project can consume the reserved capacity by default.
- `START_TIME`: the start time of the request, formatted as an RFC 3339 timestamp.
- `END_TIME`: the end time of your reservation period, formatted as an RFC 3339 timestamp. If you want to specify a duration, in seconds, for your reservation period instead of an end time, then replace the `--end-time` flag with the `--duration` flag.
- `ZONE`: the zone where you want to reserve resources.
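As a quick reference for the `--duration` alternative mentioned above, the flag takes the reservation period in seconds. An illustrative conversion from days (the helper is hypothetical; the 90-day cap comes from the limits in this document):

```python
from datetime import timedelta

def duration_seconds(days):
    """Convert a reservation length in days to the seconds value for --duration."""
    if not 1 <= days <= 90:
        raise ValueError("calendar-mode reservations run between 1 and 90 days")
    return int(timedelta(days=days).total_seconds())

print(duration_seconds(30))  # 2592000
```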
REST
To create a future reservation request in calendar mode and submit it for
review, send the following POST
request to the
beta futureReservations.insert
method.
Based on the resources that you want to reserve, include the following
fields in the request body:
To reserve GPU VMs, include the
totalCount
andmachineType
fields:POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/zones/ZONE/futureReservations { "name": "FUTURE_RESERVATION_NAME", "autoDeleteAutoCreatedReservations": true, "deploymentType": "DENSE", "planningStatus": "SUBMITTED", "reservationMode": "CALENDAR", "reservationName": "RESERVATION_NAME", "shareSettings": { "shareType": "SHARE_TYPE" }, "specificReservationRequired": true, "specificSkuProperties": { "totalCount": NUMBER_OF_VMS, "instanceProperties": { "machineType": "MACHINE_TYPE" } }, "timeWindow": { "startTime": "START_TIME", "endTime": "END_TIME" } }
To reserve TPUs, include the
acceleratorCount
andvmFamily
fields:POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/zones/ZONE/futureReservations { "name": "FUTURE_RESERVATION_NAME", "autoDeleteAutoCreatedReservations": true, "deploymentType": "DENSE", "planningStatus": "SUBMITTED", "reservationMode": "CALENDAR", "reservationName": "RESERVATION_NAME", "shareSettings": { "shareType": "SHARE_TYPE" }, "specificReservationRequired": true, "aggregateReservation": { "reservedResources": [ { "accelerator": { "acceleratorCount": NUMBER_OF_CHIPS } } ], "vmFamily": "TPU_VERSION" }, "timeWindow": { "startTime": "START_TIME", "endTime": "END_TIME" } }
Replace the following:

- `PROJECT_ID`: the ID of the project where you want to create the request.
- `ZONE`: the zone where you want to reserve resources.
- `FUTURE_RESERVATION_NAME`: the name of the request.
- `RESERVATION_NAME`: the name of the reservation that Compute Engine automatically creates to provision your requested capacity.
- `SHARE_TYPE`: whether other projects in your organization can consume the reserved capacity. Specify one of the following values:
  - To use capacity only within your project: `LOCAL`
  - To share capacity with other projects: `SPECIFIC_PROJECTS`

  If you specify `SPECIFIC_PROJECTS`, then, in the `shareSettings` field, you must include the `projectMap` field to specify the projects to share the capacity with. You can specify up to 100 projects within your organization. Don't specify your own project ID; your project can consume the reserved capacity by default.

  For example, to share the requested capacity with two other projects, include the following:

  ```
  "shareSettings": {
    "shareType": "SPECIFIC_PROJECTS",
    "projectMap": {
      "CONSUMER_PROJECT_ID_1": {
        "projectId": "CONSUMER_PROJECT_ID_1"
      },
      "CONSUMER_PROJECT_ID_2": {
        "projectId": "CONSUMER_PROJECT_ID_2"
      }
    }
  }
  ```

  Replace `CONSUMER_PROJECT_ID_1` and `CONSUMER_PROJECT_ID_2` with the IDs of the two projects that you want to allow to consume the requested capacity.
- `NUMBER_OF_VMS`: the number of GPU VMs to reserve.
- `MACHINE_TYPE`: the GPU machine type to reserve.
- `NUMBER_OF_CHIPS`: the number of TPU chips to reserve.
- `TPU_VERSION`: the TPU version to reserve. Specify one of the following values:
  - For TPU v6e: `VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT6E`
  - For TPU v5p: `VM_FAMILY_CLOUD_TPU_POD_SLICE_CT5P`
  - For TPU v5e: `VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT5LP`

  If you specify a TPU v5e, then, in the `aggregateReservation` field, you must include the `workloadType` field. Set the field to the type of workloads that you want to run on the TPUs:
  - For workloads that handle large amounts of data in single or multiple operations, such as ML training workloads, specify `BATCH`.
  - For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads, specify `SERVING`.
- `START_TIME`: the start time of the request, formatted as an RFC 3339 timestamp.
- `END_TIME`: the end time of your reservation period, formatted as an RFC 3339 timestamp. If you want to specify a duration, in seconds, for your reservation period instead of an end time, then replace the `endTime` field with the `duration` field.
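If you construct this request body programmatically, the fields above can be assembled as follows. This Python sketch is illustrative only; the helper name and the `a3-highgpu-8g` machine type are example values, not requirements of the API:

```python
import json

def future_reservation_body(name, reservation_name, total_count, machine_type,
                            start_time, end_time, share_with=None):
    """Assemble a futureReservations.insert request body for GPU VMs.

    `share_with` is an optional list of consumer project IDs (at most 100);
    omit it for a local reservation. Field names mirror the REST samples
    in this document."""
    share_settings = {"shareType": "LOCAL"}
    if share_with:
        share_settings = {
            "shareType": "SPECIFIC_PROJECTS",
            "projectMap": {p: {"projectId": p} for p in share_with},
        }
    return {
        "name": name,
        "autoDeleteAutoCreatedReservations": True,
        "deploymentType": "DENSE",
        "planningStatus": "SUBMITTED",
        "reservationMode": "CALENDAR",
        "reservationName": reservation_name,
        "shareSettings": share_settings,
        "specificReservationRequired": True,
        "specificSkuProperties": {
            "totalCount": total_count,
            "instanceProperties": {"machineType": machine_type},
        },
        "timeWindow": {"startTime": start_time, "endTime": end_time},
    }

body = future_reservation_body(
    "my-request", "my-reservation", 8, "a3-highgpu-8g",
    "2025-06-09T00:00:00Z", "2025-07-09T00:00:00Z",
    share_with=["project-1", "project-2"])
print(json.dumps(body, indent=2))
```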
What's next
Consume an auto-created reservation for GPU VMs in Compute Engine
Consume an auto-created reservation by using Vertex AI prediction jobs
Consume an auto-created reservation by using Vertex AI training jobs