About future reservation requests in calendar mode

This document gives an overview of future reservation requests in calendar mode. To learn more about the different ways to reserve resources in Compute Engine, see Choose a reservation type.

To obtain capacity to create virtual machine (VM) instances that have GPUs or TPUs attached, use future reservation requests in calendar mode. If Google Cloud approves your request, then Compute Engine provisions your reserved resources at your chosen date and time, and keeps them reserved for up to 90 days. You can then use the reserved resources to create GPU VMs or TPU VMs to run the following workloads:

Model pre-training jobs
Model fine-tuning jobs
High performance computing (HPC) simulation workloads
Short-term expected increases in inference workloads

Create a request in calendar mode

The following sections explain how to view resource availability, as well as what details to specify when you create a future reservation request in calendar mode.

View future resource availability

Before you create a future reservation request in calendar mode, you can view the future availability in a region of the following resources:

For GPU VMs, up to 60 days in advance
For TPUs, up to 120 days in advance

Compute Engine uses the Dynamic Workload Scheduler (DWS) to determine when your requested resources are available. When you create a request, specify the number, type, and reservation period for the resources that you confirmed as available. Google Cloud is more likely to approve a request that matches the availability that you confirmed.
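
As a rough illustration of the availability windows listed above, the following Python sketch checks whether a target date falls inside the window that you can view. The function name and the "GPU" and "TPU" labels are hypothetical and aren't part of any Google Cloud API. Treating the 60-day and 120-day limits as inclusive is also an assumption.

```python
from datetime import date, timedelta

# Lookahead limits from the list above. The keys are illustrative labels,
# not values accepted by any API.
AVAILABILITY_LOOKAHEAD_DAYS = {"GPU": 60, "TPU": 120}


def can_view_availability(resource_kind: str, target: date, today: date) -> bool:
    """Return True if `target` is inside the viewable availability window.

    Assumption: the limit is inclusive, so a date exactly 60 days ahead
    still counts for GPU VMs.
    """
    limit = timedelta(days=AVAILABILITY_LOOKAHEAD_DAYS[resource_kind])
    return today <= target <= today + limit
```

For example, with this sketch a date 90 days ahead is outside the window for GPU VMs but inside the window for TPUs.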

Define request properties

When you create a future reservation request in calendar mode, you must specify the following properties:

Auto-delete. This property determines whether Compute Engine deletes the automatically created (auto-created) reservation for your request at the end time, even if the reservation isn't fully consumed. To create a request in calendar mode, you must enable the auto-delete option.
Consumption type. This property defines how VMs consume the auto-created reservation. When you create a request in calendar mode, you must specify that you want to create specifically-targeted reservations. This setting means that only VMs that target the reservation can consume it.
Deployment type. This property defines the collocation of your reserved resources. When you create a request in calendar mode, you must specify a dense deployment. In this kind of deployment, resources are located close to each other to minimize network latency.
Name. The name of your request, which must be unique within your project.
Number of resources. The number of GPU VMs or TPUs to reserve at your requested start time.
Planning status. This property defines whether you immediately submit your request to Google Cloud for review, or save it as a draft and submit it later. When you create a request in calendar mode, you must choose to submit the request for review immediately.
Reservation mode. This property defines the method to reserve resources, which you must set to CALENDAR for a request in calendar mode.
Reservation name. The name for the reservation that Compute Engine automatically creates if Google Cloud approves your request.
Share type. This property defines whether other projects in your organization can consume the auto-created reservation for your approved request. You can specify one of the following options:
- Single-project. Only your project can consume the reserved capacity.
- Shared. You can share the reserved capacity with up to 100 other projects in your organization. If you specify this option, then you must specify the projects to share the auto-created reservation with. For more information, see the best practices for shared reservations.
Important: You can only specify the share type and shared projects for the auto-created reservation when you create a request.
Reservation period. The period during which your requested capacity is provisioned and available for you to consume. The reservation period includes the following:
- Start time. When you want to start consuming your reserved capacity. Based on the resources that you reserve, the start time must be at least the following amount of time after you create and submit a request:
  - For GPU VMs, 87 hours (three days and 15 hours)
  - For TPUs, six hours
- End time. When your requested capacity is no longer reserved for you. At this time, Compute Engine deletes the auto-created reservation, and stops or deletes any VMs that consume the reservation based on the termination action that you specified for the VMs.
Resource properties. The hardware requirements of the GPU VMs or TPUs that you want to reserve. VMs can only use a reservation if their properties match the reservation's properties. For more information, see the requirements to consume reservations.
Workload type. If you reserve TPU v5e, then you must specify how to reserve capacity based on your workload type:
- Batch. For workloads that handle large amounts of data in single or multiple operations, such as machine learning (ML) training workloads.
- Serving. For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads.
Zone. The zone where you want to reserve capacity.
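
To make the fixed settings and timing rules above concrete, the following Python sketch models a request and checks it before submission. The class, its attribute names, and the "GPU" and "TPU" labels are hypothetical and aren't Compute Engine API fields. The lead times come from the start time property above, and the 1 to 90 day bound comes from the limitations for all requests.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Minimum lead times and duration bounds as stated in this document.
MIN_LEAD_TIME = {"GPU": timedelta(hours=87), "TPU": timedelta(hours=6)}
MIN_DURATION = timedelta(days=1)
MAX_DURATION = timedelta(days=90)


@dataclass
class CalendarRequest:
    """Hypothetical container; attribute names are not API field names."""
    name: str
    reservation_name: str
    resource_kind: str            # "GPU" or "TPU" (illustrative labels)
    count: int
    zone: str
    start_time: datetime
    end_time: datetime
    # Calendar mode requires the following settings.
    auto_delete: bool = True
    specifically_targeted: bool = True
    dense_deployment: bool = True
    submit_immediately: bool = True
    reservation_mode: str = "CALENDAR"


def validate(req: CalendarRequest, now: datetime) -> list[str]:
    """Return a list of problems; an empty list means none were found."""
    problems = []
    if not req.auto_delete:
        problems.append("auto-delete must be enabled")
    if not req.specifically_targeted:
        problems.append("reservation must be specifically targeted")
    if not req.dense_deployment:
        problems.append("deployment type must be dense")
    if not req.submit_immediately:
        problems.append("request must be submitted for review immediately")
    if req.reservation_mode != "CALENDAR":
        problems.append("reservation mode must be CALENDAR")
    if req.count < 1:
        problems.append("number of resources must be at least 1")
    if req.start_time - now < MIN_LEAD_TIME[req.resource_kind]:
        problems.append("start time is too soon after submission")
    duration = req.end_time - req.start_time
    if not (MIN_DURATION <= duration <= MAX_DURATION):
        problems.append("reservation period must be between 1 and 90 days")
    return problems
```

A check like this only catches mistakes on your side. It doesn't predict whether Google Cloud approves the request, which depends on resource availability in the zone.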

Request review process

To reserve capacity by using a future reservation request in calendar mode, you must create and submit the request to Google Cloud for review. After you create and submit a request, Google Cloud reviews it within a minute, and then one of the following occurs:

Google Cloud approves your request. Compute Engine reserves your requested resources and, within a minute after approval, automatically creates an empty reservation. At the request start time, Compute Engine provisions your requested capacity by increasing the number of GPU VMs or TPUs in the reservation.

Caution: After you create a request, you can't cancel, delete, or modify it. You commit to pay for the requested capacity at the request start time, regardless of whether you use the capacity.
You encounter an error. The request fails because the request's zone lacks sufficient resources. We recommend that you view the future availability of resources again, and then create and submit a new request for review.

Request lifecycle

The following diagram shows the different states that Compute Engine can set a future reservation request in calendar mode to:

The states and flow of events shown in the preceding diagram are as follows:

PENDING_APPROVAL: You created and submitted a request for review. Within a minute, Google Cloud reviews the request and, if the requested resources are available, approves it.
APPROVED: Google Cloud approved your request. Then, within a minute, Compute Engine automatically creates an empty reservation and changes the request state to PROCURING.
PROCURING: Compute Engine schedules the provisioning of your reserved resources. Before the request start time, the request state changes to PROVISIONING.
PROVISIONING: Compute Engine is provisioning your reserved resources by increasing the number of reserved GPU VMs or TPUs in the auto-created reservation. At the request start time, the request state changes to FULFILLED.
FULFILLED: Compute Engine has provisioned your reserved resources, and you're charged for them. You can consume the auto-created reservation by creating VMs until the request end time.

At the request end time, Compute Engine deletes the request and the auto-created reservation. It also stops or deletes any VMs that consume the reservation based on the termination action that you specified for the VMs.
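
The approval path described above can be read as a simple linear state machine. The following Python sketch encodes it, using the state names from this document. The helper functions are hypothetical, and the sketch doesn't model failed requests or the deletion of the request at its end time.

```python
from enum import Enum


class RequestState(Enum):
    PENDING_APPROVAL = "PENDING_APPROVAL"
    APPROVED = "APPROVED"
    PROCURING = "PROCURING"
    PROVISIONING = "PROVISIONING"
    FULFILLED = "FULFILLED"


# Forward transitions on the approval path described above.
NEXT_STATE = {
    RequestState.PENDING_APPROVAL: RequestState.APPROVED,
    RequestState.APPROVED: RequestState.PROCURING,
    RequestState.PROCURING: RequestState.PROVISIONING,
    RequestState.PROVISIONING: RequestState.FULFILLED,
}


def is_consumable(state: RequestState) -> bool:
    """Only a FULFILLED request has provisioned capacity that VMs can consume."""
    return state is RequestState.FULFILLED


def path_from(state: RequestState) -> list[RequestState]:
    """Return the remaining states, starting at `state`, up to FULFILLED."""
    path = [state]
    while path[-1] in NEXT_STATE:
        path.append(NEXT_STATE[path[-1]])
    return path
```

A helper like `is_consumable` could guard automation that creates VMs, so that it waits until the request reaches FULFILLED.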

Consume provisioned capacity

After Google Cloud approves a future reservation request in calendar mode, Compute Engine automatically creates a reservation with the following characteristics:

The auto-created reservation has no reserved GPU VMs or TPUs; you can't consume it yet.
The auto-created reservation inherits the VM or TPU properties specified in your request.

At the request start time, Compute Engine provisions your requested capacity by increasing the number of GPU VMs or TPUs in the auto-created reservation. You can then consume the reservation by creating GPU VMs or TPU VMs that meet all of the following conditions:

The VMs and the reservation have matching properties.
The VMs specifically target the reservation.
The VMs use the reservation-bound provisioning model.
The VMs specify a termination action, so that Compute Engine stops or deletes them at the reservation end time.

You can create VMs until the reservation is fully consumed or until the request end time. At the request end time, Compute Engine deletes the auto-created reservation, and stops or deletes any VMs that consume the reservation.
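
The conditions above can be combined into a single eligibility check. The following Python sketch is illustrative only: the classes, attribute names, and string values such as "reservation-bound" are hypothetical stand-ins rather than API fields, and a single `machine_type` attribute stands in for all of the properties that must match.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Reservation:
    name: str
    machine_type: str   # stands in for all reserved resource properties
    zone: str


@dataclass(frozen=True)
class VmConfig:
    machine_type: str
    zone: str
    targeted_reservation: Optional[str]   # name of the reservation to target
    provisioning_model: str               # illustrative label
    termination_action: Optional[str]     # "stop" or "delete"


def can_consume(vm: VmConfig, reservation: Reservation) -> bool:
    """Return True only if the VM meets every condition listed above."""
    return (
        vm.machine_type == reservation.machine_type
        and vm.zone == reservation.zone
        and vm.targeted_reservation == reservation.name
        and vm.provisioning_model == "reservation-bound"
        and vm.termination_action in ("stop", "delete")
    )
```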

Quota

Future reservation requests in calendar mode must use the reservation-bound provisioning model. This model doesn't require Compute Engine quota to reserve resources. However, before you create a request, verify that you have sufficient quota for any resources that aren't part of a reservation when you create VMs, such as disks or IP addresses.

Pricing

When you create a future reservation request in calendar mode, you aren't charged. Instead, you incur charges when the following occurs:

Compute Engine provisions your requested capacity. When a request reaches the FULFILLED state, you're charged for the provisioned resources according to DWS pricing. This pricing model offers vCPUs, GPUs, and TPUs at a discounted price.
You use resources not covered by the reservation. When you create VMs that consume an auto-created reservation, you aren't charged again for the consumed resources. You're only charged for resources that aren't part of the reservation, such as disks or IP addresses.

You stop incurring charges at the request end time. At this time, Compute Engine deletes the auto-created reservation, and stops or deletes any VMs that consume the reservation.
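
Because you're charged for provisioned capacity from the start time to the end time whether or not you use it, you can estimate the cost of a request from its reservation period alone. The following Python sketch shows the arithmetic. The hourly rate is a placeholder that you supply, not a real price, and the assumption that charges scale linearly with time and resource count is ours, not a statement of how DWS pricing is calculated.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal


def reserved_capacity_cost(
    start: datetime,
    end: datetime,
    resource_count: int,
    hourly_rate: Decimal,
) -> Decimal:
    """Estimate the charge for reserved capacity over the whole period.

    Assumption: a flat hourly rate per reserved resource. Usage doesn't
    appear in the formula because reserved capacity is charged whether
    or not VMs consume it.
    """
    seconds = int((end - start).total_seconds())
    hours = Decimal(seconds) / Decimal(3600)
    return hours * resource_count * hourly_rate
```

For example, with a placeholder rate of 1.50 per resource per hour, two resources reserved for seven days (168 hours) come to 168 × 2 × 1.50 = 504. Resources outside the reservation, such as disks or IP addresses, are charged separately and aren't included in this estimate.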

Limitations

The following sections explain the limitations for future reservation requests in calendar mode.

Limitations for all requests

All future reservation requests in calendar mode have the following limitations:

You can reserve resources for a period between 1 and 90 days.
After you create and submit a request, you can't cancel, delete, or modify your request.

Limitations for requests for GPU VMs

You can only reserve GPU VMs as follows:

You can reserve between 1 and 80 GPU VMs per request.
You can reserve the following machine series:
- A4
- A3 Ultra
You can reserve GPU VMs only in specific zones.

Limitations for requests for TPUs

You can only reserve TPUs as follows:

You can reserve 1, 4, 8, 16, 32, 64, 128, 256, 512, or 1,024 TPU chips per request.
You can reserve the following TPU versions:
- TPU v6e
- TPU v5p
- TPU v5e
You can only reserve 1, 4, or 8 TPU v5e chips for serving (SERVING) workload types.
You can reserve TPUs only in the following zones:
- For TPU v6e:
  - asia-northeast1-b
  - us-east5-a
  - us-east5-b
- For TPU v5p:
  - us-east5-a
- For TPU v5e:
  - For batch (BATCH) workload types: us-west4-b
  - For serving (SERVING) workload types: us-central1-a
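
The TPU limitations above lend themselves to a quick check before you create a request. The following Python sketch encodes the chip counts, versions, and zones from this section. The function name and the short version labels ("v6e", "v5p", "v5e") are hypothetical conventions for this example, and the zone list reflects only what this document states.

```python
from typing import Optional

ALLOWED_CHIP_COUNTS = {1, 4, 8, 16, 32, 64, 128, 256, 512, 1024}
SERVING_V5E_CHIP_COUNTS = {1, 4, 8}

# (version, workload type) -> zones. The workload type is None for
# versions where it doesn't apply.
TPU_ZONES = {
    ("v6e", None): {"asia-northeast1-b", "us-east5-a", "us-east5-b"},
    ("v5p", None): {"us-east5-a"},
    ("v5e", "BATCH"): {"us-west4-b"},
    ("v5e", "SERVING"): {"us-central1-a"},
}


def check_tpu_request(
    version: str,
    chips: int,
    zone: str,
    workload_type: Optional[str] = None,
) -> list[str]:
    """Return a list of limitation violations; empty means none were found."""
    if version == "v5e" and workload_type not in ("BATCH", "SERVING"):
        return ["TPU v5e requires a workload type of BATCH or SERVING"]
    # Only TPU v5e distinguishes zones by workload type.
    key = (version, workload_type if version == "v5e" else None)
    zones = TPU_ZONES.get(key)
    if zones is None:
        return [f"unsupported TPU version: {version}"]
    problems = []
    if chips not in ALLOWED_CHIP_COUNTS:
        problems.append(f"unsupported chip count: {chips}")
    if version == "v5e" and workload_type == "SERVING" and chips not in SERVING_V5E_CHIP_COUNTS:
        problems.append("serving workloads on TPU v5e allow only 1, 4, or 8 chips")
    if zone not in zones:
        problems.append(f"zone {zone} is not listed for this TPU version")
    return problems
```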

Limitations for all auto-created reservations

An auto-created reservation for a request has the following limitations:

You can only modify the reservation as follows:
- To allow Vertex AI jobs to consume it, or to prevent them from doing so.
- After the reservation start time.
You can't apply committed use discounts (CUDs) or sustained use discounts (SUDs) to the reservation.
You can't delete the reservation; Compute Engine deletes it at the end time for the reservation.

What's next

Create a future reservation request in calendar mode