This document gives an overview of future reservation requests in calendar mode. To learn more about the different ways to reserve resources in Compute Engine, see Choose a reservation type.
To help ensure that you can obtain Tensor Processing Units (TPUs), use future reservation requests in calendar mode. If Google Cloud approves your request, then Compute Engine provisions your reserved TPUs at your chosen date and time, and for your chosen duration. You can then use the reserved resources to create virtual machine (VM) instances with TPUs attached to run the following workloads:
Model pre-training jobs
Model fine-tuning jobs
High performance computing (HPC) simulation workloads
Short-term expected increases in inference workloads
Create a request in calendar mode
The following sections explain how to view resource availability, as well as what details to specify when you create a future reservation request in calendar mode.
View future resource availability
Before you create a future reservation request in calendar mode, you can view the future availability of the TPUs that you want to reserve. Compute Engine uses the Dynamic Workload Scheduler (DWS) to show when your requested number and type of TPUs are available, up to 120 days in the future.
When you create a request, specify the number, type, and reservation period of the TPUs that you confirmed as available. This action helps ensure that Google Cloud approves your request.
Define request properties
When you create a future reservation request in calendar mode, you must specify the following properties:
Auto-delete. This property determines if Compute Engine deletes the automatically created (auto-created) reservation for your request at the end time, even if the reservation isn't fully consumed. To create a request in calendar mode, you must enable the auto-delete option.
Consumption type. This property defines how VMs consume the auto-created reservation. When you create a request in calendar mode, you must specify that you want to create specifically-targeted reservations. This setting means that only VMs that target the reservation can consume it.
Deployment type. This property defines the collocation of your reserved resources. When you create a request in calendar mode, you must specify to densely deploy resources. In this kind of deployment, resources are located close to each other to minimize network latency.
Name. The name of your request, which must be unique within your project.
Number of resources. The number of TPUs to reserve at your request start time.
Planning status. This property defines if you immediately submit your request to Google Cloud for review, or save it as a draft and submit it later. When you create a request in calendar mode, you must specify to immediately submit the request for review.
Reservation mode. This property defines the method to reserve resources, which you must set to CALENDAR for a request in calendar mode.
Reservation name. The name for the reservation that Compute Engine automatically creates if Google Cloud approves your request.
Share type. This property defines if other projects in your organization can consume the auto-created reservation for your approved request. You can specify one of the following options:
Single-project. Only your project can consume the reserved capacity.
Shared. You can share the reserved capacity with up to 100 other projects in your organization. If you specify this option, then you must specify the projects to share the auto-created reservation with. For more information, see the best practices for shared reservations.
Reservation period. The period during which Compute Engine provisions your requested capacity and you can consume it. The reservation period includes the following:
Start time. When you want to start consuming your reserved capacity. The start time must be at least 24 hours from when you create and submit a request.
End time. When your requested capacity is no longer reserved for you. At this time, Compute Engine automatically deletes the auto-created reservation and any VMs consuming it.
Resource properties. The hardware requirements of the TPUs that you want to reserve. VMs can only use a reservation if their properties match the reservation's properties. For more information, see the requirements to consume reservations.
Workload type. If you reserve TPUs v5p or v5e, then you must specify how to reserve capacity based on the workload for your TPUs:
Batch. For workloads that handle large amounts of data in single or multiple operations, such as machine learning (ML) training workloads.
Serving. For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads.
Zone. The zone where you want to reserve capacity.
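The fixed values and constraints in the preceding property list can be sketched as a pre-submission check. The following is a minimal illustration only, not the Compute Engine API: the CalendarModeRequest class, its field names, the validate function, and the "BATCH" label are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import List, Optional

MAX_SHARED_PROJECTS = 100            # sharing limit from the property list
MIN_LEAD_TIME = timedelta(hours=24)  # minimum gap between submission and start


@dataclass
class CalendarModeRequest:
    """Hypothetical container for the request properties (not an API type)."""
    name: str
    reservation_name: str
    zone: str
    tpu_version: str                 # illustrative label, for example "v5p"
    chip_count: int
    start_time: datetime
    end_time: datetime
    workload_type: Optional[str] = None  # "BATCH" (assumed label) or "SERVING"
    shared_projects: List[str] = field(default_factory=list)
    # Values that calendar mode requires, per the property list.
    auto_delete: bool = True
    specifically_targeted: bool = True
    dense_deployment: bool = True
    submit_for_review: bool = True
    reservation_mode: str = "CALENDAR"


def validate(req: CalendarModeRequest, now: datetime) -> List[str]:
    """Return a list of problems; an empty list means none were found."""
    problems: List[str] = []
    if not req.auto_delete:
        problems.append("auto-delete must be enabled")
    if not req.specifically_targeted:
        problems.append("consumption type must be specifically targeted")
    if not req.dense_deployment:
        problems.append("deployment type must be dense")
    if not req.submit_for_review:
        problems.append("request must be submitted for review immediately")
    if req.reservation_mode != "CALENDAR":
        problems.append("reservation mode must be CALENDAR")
    if req.start_time - now < MIN_LEAD_TIME:
        problems.append("start time must be at least 24 hours away")
    if req.end_time <= req.start_time:
        problems.append("end time must be after start time")
    if len(req.shared_projects) > MAX_SHARED_PROJECTS:
        problems.append("reservation can be shared with at most 100 projects")
    if req.tpu_version in ("v5p", "v5e") and req.workload_type not in ("BATCH", "SERVING"):
        problems.append("workload type is required for TPU v5p and v5e")
    return problems
```

A check like this runs locally before submission; it doesn't replace the review that Google Cloud performs on the submitted request.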
Request review process
To reserve capacity using a future reservation request in calendar mode, you must create and submit the request to Google Cloud for review. After you create and submit the request, one of the following occurs:
Google Cloud approves your request. Compute Engine reserves your requested resources and, within a minute after approval, automatically creates an empty reservation. At the start time, Compute Engine provisions your requested capacity by increasing the number of TPUs in the reservation.
You encounter an error. Creating the request fails because the request's zone lacks sufficient resources. We recommend that you view future resource availability again, and then create and submit a new request for review.
Request lifecycle
The following diagram shows the different states that Compute Engine can set a future reservation request in calendar mode to:
The states and flow of events shown in the preceding diagram are as follows:
PENDING_APPROVAL: You created and submitted a request for review. Within a minute, Google Cloud approves the request.
APPROVED: Google Cloud approved your request. Then, within a minute, Compute Engine automatically creates an empty reservation and changes the request state to PROCURING.
PROCURING: Compute Engine schedules the provisioning of your reserved resources. 30 minutes before the request start time, the request state changes to PROVISIONING.
PROVISIONING: Compute Engine is provisioning your reserved resources by increasing the number of reserved TPUs in the auto-created reservation. At the request start time, the request state changes to FULFILLED.
FULFILLED: Compute Engine has provisioned your reserved resources, and you're charged for them. You can consume the auto-created reservation by creating VMs until the request end time.
At the request end time, Compute Engine automatically deletes the request, the auto-created reservation, and any VMs consuming the reservation.
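The state flow described above can be modeled as a small forward-only state machine. The following sketch uses the state names from this document, but the NEXT_STATE table and the expected_state helper are hypothetical and exist only for illustration. The helper assumes that the request was already approved and ignores the brief PENDING_APPROVAL and APPROVED windows.

```python
from datetime import datetime, timedelta, timezone
from enum import Enum
from typing import Optional


class RequestState(Enum):
    PENDING_APPROVAL = "PENDING_APPROVAL"
    APPROVED = "APPROVED"
    PROCURING = "PROCURING"
    PROVISIONING = "PROVISIONING"
    FULFILLED = "FULFILLED"


# Forward-only transitions, in the order described in the state list.
NEXT_STATE = {
    RequestState.PENDING_APPROVAL: RequestState.APPROVED,
    RequestState.APPROVED: RequestState.PROCURING,
    RequestState.PROCURING: RequestState.PROVISIONING,
    RequestState.PROVISIONING: RequestState.FULFILLED,
}

# Provisioning begins 30 minutes before the request start time.
PROVISIONING_LEAD = timedelta(minutes=30)


def expected_state(
    start_time: datetime, end_time: datetime, now: datetime
) -> Optional[RequestState]:
    """Estimate the state of an approved request at a given moment.

    Returns None at or after the end time, when the request is deleted.
    """
    if now >= end_time:
        return None
    if now >= start_time:
        return RequestState.FULFILLED
    if now >= start_time - PROVISIONING_LEAD:
        return RequestState.PROVISIONING
    return RequestState.PROCURING
```

For example, for a request that starts at 12:00, the helper returns PROCURING at 11:00, PROVISIONING at 11:45, and FULFILLED from 12:00 until the end time.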
Consume provisioned capacity
After Google Cloud approves a future reservation request in calendar mode, Compute Engine automatically creates a reservation as follows:
The auto-created reservation has no reserved TPUs, so you can't consume it yet.
The auto-created reservation inherits the same TPU properties specified in your request.
At the request start time, Compute Engine provisions your requested capacity by increasing the number of TPUs in the auto-created reservation. You can then start consuming the reservation by creating VMs with TPUs attached that meet all of the following conditions:
The VMs and the reservation have matching properties.
The VMs specifically target the reservation.
The VMs use the reservation-bound provisioning model.
You can create VMs until the reservation is fully consumed or until the request end time. At the request end time, Compute Engine automatically deletes the reservation and any VMs that are consuming it.
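The consumption conditions in this section can be expressed as a single predicate. The following sketch is illustrative only: the TpuProperties, Reservation, and VmSpec classes, their fields, and the "RESERVATION_BOUND" label are hypothetical names for this example, not Compute Engine API types.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TpuProperties:
    """Hypothetical subset of the properties that must match."""
    tpu_version: str
    zone: str


@dataclass
class Reservation:
    name: str
    properties: TpuProperties
    reserved_chips: int   # 0 until the request start time
    chips_in_use: int = 0


@dataclass
class VmSpec:
    properties: TpuProperties
    chip_count: int
    target_reservation: Optional[str]  # name of the specifically targeted reservation
    provisioning_model: str            # assumed label: "RESERVATION_BOUND"


def can_consume(vm: VmSpec, reservation: Reservation) -> bool:
    """Check the three conditions from this section, plus remaining capacity."""
    has_capacity = (
        reservation.reserved_chips - reservation.chips_in_use >= vm.chip_count
    )
    return (
        vm.properties == reservation.properties
        and vm.target_reservation == reservation.name
        and vm.provisioning_model == "RESERVATION_BOUND"
        and has_capacity
    )
```

Because an auto-created reservation holds zero TPUs before the request start time, the capacity check in this sketch also covers the case where the reservation exists but can't be consumed yet.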
Quota
Future reservation requests in calendar mode must use the reservation-bound provisioning model. This model doesn't require Compute Engine quota to reserve resources. However, before you create a request, ensure that you have sufficient quota for any resources that aren't part of a reservation when creating VMs, such as disks or IP addresses.
Pricing
You aren't charged when creating a future reservation request in calendar mode. Instead, you incur charges when the following occurs:
Compute Engine provisions your requested capacity. When a request reaches the FULFILLED state, you're charged for the provisioned resources according to DWS pricing. This pricing model offers TPUs at a discounted price.
You use resources not covered by the reservation. When you create VMs that consume an auto-created reservation, you aren't charged again for the consumed resources. You're only charged for resources that aren't part of the reservation, such as disks or IP addresses.
You stop incurring charges at the request end time when Compute Engine automatically deletes the auto-created reservation and any VMs consuming it.
Limitations
The following sections explain the limitations for future reservation requests in calendar mode.
Limitations on creation
When you create a future reservation request in calendar mode, the following limitations apply:
Supported TPU versions | Number of TPU chips per request | Reservation period | Supported zones |
---|---|---|---|
TPU v6e | 1, 4, 8, 16, 32, 64, 128, or 256 | 1 to 90 days | |
TPU v5p | 4, 8, 16, 32, 64, 128, 256, 512, or 1,024 | 1 to 90 days | us-east5-a |
TPU v5e | 1*, 4*, 8*, 16, 32, 64, 128, or 256 | 1 to 90 days | |
* You can only reserve one, four, or eight TPUs v5e for serving (SERVING) workload types.
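The chip-count and reservation-period limits in the preceding table can be checked before you create a request. The following is a minimal sketch built from the table values; the check_limits function, its parameters, and the version labels are hypothetical names for this example. Zones aren't checked here.

```python
from datetime import timedelta
from typing import List, Optional

# Allowed chip counts per request, copied from the table.
ALLOWED_CHIP_COUNTS = {
    "v6e": {1, 4, 8, 16, 32, 64, 128, 256},
    "v5p": {4, 8, 16, 32, 64, 128, 256, 512, 1024},
    "v5e": {1, 4, 8, 16, 32, 64, 128, 256},
}
# For TPU v5e, these counts are available only for SERVING workloads.
V5E_SERVING_ONLY_COUNTS = {1, 4, 8}

MIN_PERIOD = timedelta(days=1)
MAX_PERIOD = timedelta(days=90)


def check_limits(
    tpu_version: str,
    chip_count: int,
    period: timedelta,
    workload_type: Optional[str] = None,
) -> List[str]:
    """Return a list of violated creation limits; empty means none found."""
    if tpu_version not in ALLOWED_CHIP_COUNTS:
        return ["unsupported TPU version: " + tpu_version]
    problems: List[str] = []
    if chip_count not in ALLOWED_CHIP_COUNTS[tpu_version]:
        problems.append("unsupported chip count for TPU " + tpu_version)
    if not MIN_PERIOD <= period <= MAX_PERIOD:
        problems.append("reservation period must be 1 to 90 days")
    if (
        tpu_version == "v5e"
        and chip_count in V5E_SERVING_ONLY_COUNTS
        and workload_type != "SERVING"
    ):
        problems.append("1, 4, or 8 TPU v5e chips require the SERVING workload type")
    return problems
```

For example, under these assumptions a request for four TPU v5e chips with a batch workload type is flagged, while the same request with the SERVING workload type passes.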
Limitations after creation
After you create a future reservation request in calendar mode, and Google Cloud approves your request, the following limitations apply:
You can't cancel, delete, or modify the request.
An auto-created reservation for a request has the following limitations:
You can't apply committed use discounts (CUDs) or sustained use discounts (SUDs) to the reservation.
You can't modify or delete the reservation; Compute Engine automatically deletes it, along with any VMs that consume it, at the request end time.