Create a future reservation request in calendar mode


This document explains how to create a future reservation request in calendar mode. To learn more about this type of reservation, see Future reservation requests in calendar mode overview.

Create a future reservation request in calendar mode to reserve the following resources for up to 90 days:

  • Up to 80 virtual machine (VM) instances that have GPUs attached.

  • Up to 1,024 TPU chips.

At your chosen delivery date and time, you can create GPU or TPU VMs by consuming the reserved capacity. Use future reservation requests in calendar mode to obtain high-demand resources for the following workloads:

  • Model pre-training jobs

  • Model fine-tuning jobs

  • High performance computing (HPC) simulation workloads

  • Short-term increases in inference workloads

To reserve more than 80 GPU VMs in a single request, or to reserve capacity for longer than 90 days, see Reserve capacity in the AI Hypercomputer documentation instead.

Limitations

The following sections explain the limitations for future reservation requests in calendar mode.

Limitations for all requests

All future reservation requests in calendar mode have the following limitations:

  • You can reserve resources for a period between 1 and 90 days.

  • After you create and submit a request, you can't cancel, delete, or modify your request.

Limitations for requests for GPU VMs

You can only reserve GPU VMs as follows:

  • You can reserve between 1 and 80 GPU VMs per request.

  • You can reserve the following machine series:

  • You can reserve GPU VMs only in specific zones.

Limitations for requests for TPUs

You can only reserve TPUs as follows:

  • You can reserve 1, 4, 8, 16, 32, 64, 128, 256, 512, or 1,024 TPU chips per request.

  • You can reserve the following TPU versions:

  • You can only reserve 1, 4, or 8 TPU v5e chips for serving (SERVING) workload types.

  • You can only reserve TPUs in the following zones:

    • For TPU v6e:

      • asia-northeast1-b

      • us-east5-a

      • us-east5-b

    • For TPU v5p:

      • us-east5-a

    • For TPU v5e:

      • For batch (BATCH) workload types: us-west4-b

      • For serving (SERVING) workload types: us-central1-a
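The TPU limitations above can be collected into a small validity check. The following Python sketch is illustrative only; the allowed chip counts and zone lists come from this page and can change:

```python
# Illustrative summary of the TPU limitations above; counts and zones
# come from this page and can change.
ALLOWED_CHIP_COUNTS = {1, 4, 8, 16, 32, 64, 128, 256, 512, 1024}
TPU_ZONES = {
    ("V6E", None): ["asia-northeast1-b", "us-east5-a", "us-east5-b"],
    ("V5P", None): ["us-east5-a"],
    ("V5E", "BATCH"): ["us-west4-b"],
    ("V5E", "SERVING"): ["us-central1-a"],
}

def validate_tpu_request(version, chips, workload=None):
    """Return True if the combination is allowed by the limitations above."""
    if chips not in ALLOWED_CHIP_COUNTS:
        return False
    # Serving workloads on TPU v5e are limited to 1, 4, or 8 chips.
    if version == "V5E" and workload == "SERVING" and chips not in (1, 4, 8):
        return False
    key = (version, workload if version == "V5E" else None)
    return key in TPU_ZONES

print(validate_tpu_request("V5E", 8, "SERVING"))   # True
print(validate_tpu_request("V5E", 16, "SERVING"))  # False
```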

Before you begin

  • If you can't use future reservation requests in calendar mode, then you might not be eligible to access and use this feature. In this case, contact your account team or the sales team.

  • To share your reserved capacity with other projects within your organization, ensure that the project in which you want to create future reservation requests in calendar mode is allowed to create shared reservations. Otherwise, you encounter errors.

  • If you haven't already, then set up authentication. Authentication is the process by which your identity is verified for access to Google Cloud services and APIs. To run code or samples from a local development environment, you can authenticate to Compute Engine by selecting one of the following options:

    Select the tab for how you plan to use the samples on this page:

    Console

    When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.

    gcloud

      1. After installing the Google Cloud CLI, initialize it by running the following command:

        gcloud init

        If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

      2. Set a default region and zone.

    REST

    To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.

      After installing the Google Cloud CLI, initialize it by running the following command:

      gcloud init

      If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

    For more information, see Authenticate for using REST in the Google Cloud authentication documentation.

Required roles

To get the permissions that you need to create a future reservation request in calendar mode, ask your administrator to grant you the Compute Future Reservation Admin (roles/compute.futureReservationAdmin) IAM role on the project. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to create a future reservation request in calendar mode. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to create a future reservation request in calendar mode:

  • To create a future reservation request: compute.futureReservations.create on the project
  • To let Compute Engine automatically create reservations: compute.reservations.create on the project
  • To specify an instance template: compute.instanceTemplates.useReadOnly on the instance template
  • To view resource future availability: compute.advice.calendarMode on the project

You might also be able to get these permissions with custom roles or other predefined roles.

Overview

To create a future reservation request in calendar mode, complete the following steps:

  1. View resource future availability. View future availability for the GPU VMs or TPUs that you want to reserve. Then, when you create a request, specify the number, type, and reservation duration of the resources that you confirmed as available. Google Cloud is more likely to approve your request if you supply this information.

  2. Create a reservation request for GPU VMs or TPUs. Create and submit a future reservation request in calendar mode for GPU VMs or TPUs. If you can successfully create a request, then Google Cloud approves it within a minute.

View resource future availability

You can view future availability for GPU VMs or TPUs in a region as follows:

  • For GPU VMs, up to 60 days in advance

  • For TPUs, up to 120 days in advance

To view GPU VM or TPU future availability in a region, select one of the following options:

Console

You can view GPU VM or TPU future availability in a region when creating a future reservation request in calendar mode. For more information, see Create a reservation request for GPU VMs or TPUs in this document.

gcloud

To view GPU VM or TPU future availability in a region, use one of the following gcloud beta compute advice calendar-mode commands. Based on the resources that you want to view, include the following flags:

  • To view GPU VM availability, include the --vm-count and --machine-type flags:

    gcloud beta compute advice calendar-mode \
        --vm-count=NUMBER_OF_VMS \
        --machine-type=MACHINE_TYPE \
        --region=REGION \
        --start-time-range=from=FROM_START_TIME,to=TO_START_TIME \
        --end-time-range=from=FROM_END_TIME,to=TO_END_TIME
    
  • To view TPU availability, include the --chip-count and --tpu-version flags:

    gcloud beta compute advice calendar-mode \
        --chip-count=NUMBER_OF_CHIPS \
        --tpu-version=TPU_VERSION \
        --region=REGION \
        --start-time-range=from=FROM_START_TIME,to=TO_START_TIME \
        --end-time-range=from=FROM_END_TIME,to=TO_END_TIME
    

Replace the following:

  • NUMBER_OF_VMS: the number of GPU VMs to reserve.

  • MACHINE_TYPE: the GPU machine type to reserve.

  • NUMBER_OF_CHIPS: the number of TPU chips to reserve.

  • TPU_VERSION: the TPU version to reserve. Specify one of the following values:

    • For TPU v6e: V6E

    • For TPU v5p: V5P

    • For TPU v5e: V5E

    If you specify a TPU v5e, then you must include the --workload-type flag. Set this flag to the type of workloads that you want to run on the TPUs:

    • For workloads that handle large amounts of data in single or multiple operations, such as machine learning (ML) training workloads, specify BATCH.

    • For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads, specify SERVING.

  • REGION: the region where to reserve GPU VMs or TPUs. To check which regions and zones are supported, see Limitations in this document.

  • FROM_START_TIME and TO_START_TIME: the earliest and latest dates that you want to reserve capacity on. Format these dates as RFC 3339 timestamps:

    YYYY-MM-DDTHH:MM:SSOFFSET
    

    Replace the following:

    • YYYY-MM-DD: a date formatted as a four-digit year, two-digit month, and a two-digit day, separated by hyphens (-).

    • HH:MM:SS: a time formatted as a two-digit hour using a 24-hour time, two-digit minutes, and two-digit seconds, separated by colons (:).

    • OFFSET: the time zone formatted as an offset of Coordinated Universal Time (UTC). For example, to use Pacific Standard Time (PST), specify -08:00. To use no offset, specify Z.

  • FROM_END_TIME and TO_END_TIME: the earliest and latest dates that you want your capacity reservation to end on. Format these dates as RFC 3339 timestamps. If you want to specify a range of durations for your reservation period instead of end times, then replace the --end-time-range flag with the --duration-range flag.

The output is similar to the following:

- recommendationsPerSpec:
    spec:
      endTime: '2025-09-07T00:00:00Z'
      location: zones/us-east5-a
      otherLocations:
        zones/us-east5-b:
          details: this machine family is not supported in this zone
          status: NOT_SUPPORTED
        zones/us-east5-c:
          details: this machine family is not supported in this zone
          status: NOT_SUPPORTED
      recommendationId: 0d3f005d-f952-4fce-96f2-6af25e1591eb
      recommendationType: FUTURE_RESERVATION
      startTime: '2025-06-09T00:00:00Z'

If your requested resources are available, then the output contains the startTime, endTime, and location fields. These fields specify the earliest start time, the latest end time, and the zone where resources are available.
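Timestamps in the RFC 3339 format described above can be generated with Python's standard library; a minimal sketch with example values:

```python
from datetime import datetime, timedelta, timezone

# Build an RFC 3339 timestamp with a UTC offset (example values only).
pst = timezone(timedelta(hours=-8))  # UTC-08:00
start = datetime(2025, 6, 9, 9, 0, 0, tzinfo=pst)
print(start.isoformat())  # 2025-06-09T09:00:00-08:00

# For the no-offset form, replace "+00:00" with "Z".
utc_start = datetime(2025, 6, 9, tzinfo=timezone.utc)
print(utc_start.isoformat().replace("+00:00", "Z"))  # 2025-06-09T00:00:00Z
```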

REST

To view GPU VM or TPU future availability in a region, make a GET request to the beta advice.calendarMode method. Based on the resources that you want to view, include the following fields in the request body:

  • To view GPU VM availability, include the instanceCount and machineType fields:

    POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/regions/REGION/advice/calendarMode
    
    {
      "futureResourcesSpecs": {
        "spec": {
          "targetResources": {
            "specificSkuResources": {
              "instanceCount": "NUMBER_OF_VMS",
              "machineType": "MACHINE_TYPE"
            }
          },
          "timeRangeSpec": {
            "startTimeNotEarlierThan": "FROM_START_TIME",
            "startTimeNotLaterThan": "TO_START_TIME",
            "endTimeNotEarlierThan": "FROM_END_TIME",
            "endTimeNotLaterThan": "TO_END_TIME"
          }
        }
      }
    }
    
  • To view TPU availability, include the acceleratorCount and vmFamily fields:

    POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/regions/REGION/advice/calendarMode
    
    {
      "futureResourcesSpecs": {
        "spec": {
          "targetResources": {
            "aggregateResources": {
              "acceleratorCount": "NUMBER_OF_CHIPS",
              "vmFamily": "TPU_VERSION"
            }
          },
          "timeRangeSpec": {
            "startTimeNotEarlierThan": "FROM_START_TIME",
            "startTimeNotLaterThan": "TO_START_TIME",
            "endTimeNotEarlierThan": "FROM_END_TIME",
            "endTimeNotLaterThan": "TO_END_TIME"
          }
        }
      }
    }
    

Replace the following:

  • PROJECT_ID: the ID of the project where you want to reserve resources.

  • REGION: the region where you want to reserve GPU VMs or TPUs. To check the regions and zones that are supported, see Limitations in this document.

  • NUMBER_OF_VMS: the number of GPU VMs to reserve.

  • MACHINE_TYPE: the GPU machine type to reserve.

  • NUMBER_OF_CHIPS: the number of TPU chips to reserve.

  • TPU_VERSION: the TPU version to reserve. Specify one of the following values:

    • For TPU v6e: VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT6E

    • For TPU v5p: VM_FAMILY_CLOUD_TPU_POD_SLICE_CT5P

    • For TPU v5e: VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT5LP

    If you specify a TPU v5e, then, in the aggregateResources field, you must include the workloadType field. Set this field to the type of workloads that you want to run on the TPUs:

    • For workloads that handle large amounts of data in single or multiple operations, such as machine learning (ML) training workloads, specify BATCH.

    • For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads, specify SERVING.

  • FROM_START_TIME and TO_START_TIME: the earliest and latest dates that you want to reserve capacity on. Format these dates as RFC 3339 timestamps:

    YYYY-MM-DDTHH:MM:SSOFFSET
    

    Replace the following:

    • YYYY-MM-DD: a date formatted as a four-digit year, two-digit month, and a two-digit day, separated by hyphens (-).

    • HH:MM:SS: a time formatted as a two-digit hour using a 24-hour time, two-digit minutes, and two-digit seconds, separated by colons (:).

    • OFFSET: the time zone formatted as an offset of Coordinated Universal Time (UTC). For example, to use Pacific Standard Time (PST), specify -08:00. To use no offset, specify Z.

  • FROM_END_TIME and TO_END_TIME: the earliest and latest dates that you want your capacity reservation to end on. Format these dates as RFC 3339 timestamps. If you want to specify a range of durations for your reservation period instead of end times, then replace the endTimeNotEarlierThan and endTimeNotLaterThan fields with the minDuration and maxDuration fields.

The output is similar to the following:

{
  "recommendations": [
    {
      "recommendationsPerSpec": {
        "spec": {
          "recommendationId": "a21a2fa0-72c7-4105-8179-88de5409890b",
          "recommendationType": "FUTURE_RESERVATION",
          "startTime": "2025-06-09T00:00:00Z",
          "endTime": "2025-09-07T00:00:00Z",
          "otherLocations": {
            "zones/us-east5-b": {
              "status": "NOT_SUPPORTED",
              "details": "this machine family is not supported in this zone"
            },
            "zones/us-east5-c": {
              "status": "NOT_SUPPORTED",
              "details": "this machine family is not supported in this zone"
            }
          },
          "location": "zones/us-east5-a"
        }
      }
    }
  ]
}

If your requested resources are available, then the output contains the startTime, endTime, and location fields. These fields specify the earliest start time, the latest end time, and the zone where resources are available.
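If you script this REST call, you can assemble the request body before sending it with any HTTP client. The following Python sketch builds the TPU variant of the body; the helper name and all values are illustrative, not part of the API:

```python
import json

def build_tpu_availability_body(chip_count, tpu_version,
                                start_from, start_to, end_from, end_to):
    """Assemble the advice.calendarMode request body for a TPU query."""
    return {
        "futureResourcesSpecs": {
            "spec": {
                "targetResources": {
                    "aggregateResources": {
                        "acceleratorCount": str(chip_count),
                        "vmFamily": tpu_version,
                    }
                },
                "timeRangeSpec": {
                    "startTimeNotEarlierThan": start_from,
                    "startTimeNotLaterThan": start_to,
                    "endTimeNotEarlierThan": end_from,
                    "endTimeNotLaterThan": end_to,
                },
            }
        }
    }

body = build_tpu_availability_body(
    16, "VM_FAMILY_CLOUD_TPU_POD_SLICE_CT5P",
    "2025-06-01T00:00:00Z", "2025-06-15T00:00:00Z",
    "2025-07-01T00:00:00Z", "2025-08-30T00:00:00Z",
)
print(json.dumps(body, indent=2))
```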

Create a reservation request for GPU VMs or TPUs

When you create a future reservation request in calendar mode, you can only specify a reservation period as follows:

  • Start time: based on the resources that you want to reserve, you must specify a start time that is at least the following amount of time after you create and submit the request:

    • For GPU VMs, 87 hours (three days and 15 hours)

    • For TPUs, 24 hours

  • End time: you can reserve resources for a maximum of 90 days.
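You can sanity-check these constraints before filling in a request; a sketch with illustrative times:

```python
from datetime import datetime, timedelta, timezone

# Minimum lead time between submitting a request and its start time,
# and the maximum reservation length (per the rules above).
MIN_LEAD = {"GPU": timedelta(hours=87), "TPU": timedelta(hours=24)}
MAX_DURATION = timedelta(days=90)

submitted = datetime(2025, 6, 1, 12, 0, tzinfo=timezone.utc)  # example submission time
earliest_gpu_start = submitted + MIN_LEAD["GPU"]
latest_end = earliest_gpu_start + MAX_DURATION
print(earliest_gpu_start.isoformat())  # 2025-06-05T03:00:00+00:00
print(latest_end.isoformat())          # 2025-09-03T03:00:00+00:00
```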

To create a request by using an existing GPU VM as a reference, use the Google Cloud console. Otherwise, select one of the following options:

Console

  1. In the Google Cloud console, go to the Reservations page.

    Go to Reservations

  2. Click the Future reservations tab.

  3. Click Create future reservation. The Create a future reservation page appears and the Hardware configuration pane is selected.

  4. In the Configuration section, specify the properties of the GPU VMs or TPUs that you want to reserve by doing one of the following:

    • To specify GPU VM or TPU properties directly, complete the following steps:

      1. Select Specify machine type.

      2. Click the GPUs or TPUs tab, and then select the GPU machine type or TPU version to reserve.

    • To specify GPU VM properties by using an existing instance template, select Instance template, and then select the template.

    • To specify GPU VM properties by using an existing VM as a reference, select Use existing VM, and then select the VM.

  5. If you specified a TPU v5e (CT5LP) in the previous step, then, in the TPU v5 workload type list, select one of the following options:

    • To run workloads on the TPUs that handle large amounts of data in single or multiple operations, such as ML training workloads, select Batch.

    • To run workloads on the TPUs that handle concurrent requests and require minimal network latency, such as ML inference workloads, select Serving.

  6. In the Search for capacity section, complete the following steps:

    1. In the Region and Zone lists, specify the region and zone where you want to reserve resources. To review the supported regions and zones, see Limitations in this document.

    2. In the Total capacity needed field (when reserving GPU VMs) or Number of chips list (when reserving TPUs), specify the number of GPU VMs or TPU chips to reserve.

    3. In the Start time list, select the start time for your request.

    4. Optional: In the Choose your start date flexibility list, select how exact your start date needs to be.

    5. In the Reservation duration field, specify for how long you want to reserve resources.

    6. Click Search for capacity. Then, in the Available capacity table, select one of the available options that contain the type, number, and reservation period of the GPU VMs or TPUs to reserve.

  7. Click Next.

  8. In the Share type section, select the projects to share your requested capacity with:

    • To use the reserved capacity only within your project, select Local.

    • To share the reserved capacity with other projects, select Shared, click Add projects, and then follow the prompts to select the projects.

  9. Click Next.

  10. In the Future reservation name field, enter a name for the request.

  11. In the Reservation name field, enter the name of the reservation that Compute Engine automatically creates to provision your requested capacity.

  12. Click Create.

gcloud

To create a future reservation request in calendar mode and submit it for review, use one of the following gcloud beta compute future-reservations create commands. Based on the resources that you want to reserve, include the following flags:

  • To reserve GPU VMs, include the --total-count and --machine-type flags:

    gcloud beta compute future-reservations create FUTURE_RESERVATION_NAME \
        --auto-delete-auto-created-reservations \
        --total-count=NUMBER_OF_VMS \
        --machine-type=MACHINE_TYPE \
        --deployment-type=DENSE \
        --planning-status=SUBMITTED \
        --require-specific-reservation \
        --reservation-mode=CALENDAR \
        --reservation-name=RESERVATION_NAME \
        --share-setting=SHARE_TYPE \
        --start-time=START_TIME \
        --end-time=END_TIME \
        --zone=ZONE
    
  • To reserve TPUs, include the --chip-count and --tpu-version flags:

    gcloud beta compute future-reservations create FUTURE_RESERVATION_NAME \
        --auto-delete-auto-created-reservations \
        --chip-count=NUMBER_OF_CHIPS \
        --tpu-version=TPU_VERSION \
        --deployment-type=DENSE \
        --planning-status=SUBMITTED \
        --require-specific-reservation \
        --reservation-mode=CALENDAR \
        --reservation-name=RESERVATION_NAME \
        --share-setting=SHARE_TYPE \
        --start-time=START_TIME \
        --end-time=END_TIME \
        --zone=ZONE
    

Replace the following:

  • FUTURE_RESERVATION_NAME: the name of the request.

  • NUMBER_OF_VMS: the number of GPU VMs to reserve.

  • MACHINE_TYPE: the GPU machine type to reserve.

  • NUMBER_OF_CHIPS: the number of TPU chips to reserve.

  • TPU_VERSION: the TPU version to reserve. Specify one of the following values:

    • For TPU v6e: V6E

    • For TPU v5p: V5P

    • For TPU v5e: V5E

    If you specify a TPU v5e, then you must include the --workload-type flag. Set the flag to the type of workloads that you want to run on the TPUs:

    • For workloads that handle large amounts of data in single or multiple operations, such as machine learning (ML) training workloads, specify BATCH.

    • For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads, specify SERVING.

  • RESERVATION_NAME: the name of the reservation that Compute Engine automatically creates to provision your requested capacity.

  • SHARE_TYPE: whether other projects in your organization can consume the reserved capacity. Specify one of the following values:

    • To use capacity only within your project: local

    • To share capacity with other projects: projects

    If you specify projects, then you must include the --share-with flag set to a comma-separated list of project IDs, for example, project-1,project-2. You can specify up to 100 projects within your organization. Don't include your project ID in this list; your project can consume the reserved capacity by default.

  • START_TIME: the start time of the request, formatted as an RFC 3339 timestamp.

  • END_TIME: the end time of your reservation period, formatted as an RFC 3339 timestamp. If you want to specify a duration, in seconds, for your reservation period instead of an end time, then replace the --end-time flag with the --duration flag.

  • ZONE: the zone where you want to reserve resources.
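If you use the --duration flag, its value is the reservation length in seconds. A sketch of deriving that value from two RFC 3339 timestamps (dates are illustrative):

```python
from datetime import datetime

# Derive a --duration value (seconds) from start and end timestamps.
start = datetime.fromisoformat("2025-06-09T00:00:00+00:00")
end = datetime.fromisoformat("2025-09-07T00:00:00+00:00")
duration_seconds = int((end - start).total_seconds())
print(duration_seconds)  # 7776000, which is exactly 90 days
```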

REST

To create a future reservation request in calendar mode and submit it for review, send the following POST request to the beta futureReservations.insert method. Based on the resources that you want to reserve, include the following fields in the request body:

  • To reserve GPU VMs, include the totalCount and machineType fields:

    POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/zones/ZONE/futureReservations
    
    {
      "name": "FUTURE_RESERVATION_NAME",
      "autoDeleteAutoCreatedReservations": true,
      "deploymentType": "DENSE",
      "planningStatus": "SUBMITTED",
      "reservationMode": "CALENDAR",
      "reservationName": "RESERVATION_NAME",
      "shareSettings": {
        "shareType": "SHARE_TYPE"
      },
      "specificReservationRequired": true,
      "specificSkuProperties": {
        "totalCount": NUMBER_OF_VMS,
        "instanceProperties": {
          "machineType": "MACHINE_TYPE"
        }
      },
      "timeWindow": {
        "startTime": "START_TIME",
        "endTime": "END_TIME"
      }
    }
    
  • To reserve TPUs, include the acceleratorCount and vmFamily fields:

    POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/zones/ZONE/futureReservations
    
    {
      "name": "FUTURE_RESERVATION_NAME",
      "autoDeleteAutoCreatedReservations": true,
      "deploymentType": "DENSE",
      "planningStatus": "SUBMITTED",
      "reservationMode": "CALENDAR",
      "reservationName": "RESERVATION_NAME",
      "shareSettings": {
        "shareType": "SHARE_TYPE"
      },
      "specificReservationRequired": true,
      "aggregateReservation": {
        "reservedResources": [
          {
            "accelerator": {
              "acceleratorCount": NUMBER_OF_CHIPS
            }
          }
        ],
        "vmFamily": "TPU_VERSION"
      },
      "timeWindow": {
        "startTime": "START_TIME",
        "endTime": "END_TIME"
      }
    }
    

Replace the following:

  • PROJECT_ID: the ID of the project where you want to create the request.

  • ZONE: the zone where you want to reserve resources.

  • FUTURE_RESERVATION_NAME: the name of the request.

  • RESERVATION_NAME: the name of the reservation that Compute Engine automatically creates to provision your requested capacity.

  • SHARE_TYPE: whether other projects in your organization can consume the reserved capacity. Specify one of the following values:

    • To use capacity only within your project: LOCAL

    • To share capacity with other projects: SPECIFIC_PROJECTS

    If you specify SPECIFIC_PROJECTS, then, in the shareSettings field, you must include the projectMap field to specify the projects to share the capacity with. You can specify up to 100 projects within your organization. Don't specify your project ID; your project can consume the reserved capacity by default.

    For example, to share the requested capacity with two other projects, include the following:

    "shareSettings": {
      "shareType": "SPECIFIC_PROJECTS",
      "projectMap": {
        "CONSUMER_PROJECT_ID_1": {
          "projectId": "CONSUMER_PROJECT_ID_1"
        },
        "CONSUMER_PROJECT_ID_2": {
          "projectId": "CONSUMER_PROJECT_ID_2"
        }
      }
    }
    

    Replace CONSUMER_PROJECT_ID_1 and CONSUMER_PROJECT_ID_2 with the IDs of two projects that you want to allow to consume the requested capacity.

  • NUMBER_OF_VMS: the number of GPU VMs to reserve.

  • MACHINE_TYPE: the GPU machine type to reserve.

  • NUMBER_OF_CHIPS: the number of TPU chips to reserve.

  • TPU_VERSION: the TPU version to reserve. Specify one of the following values:

    • For TPU v6e: VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT6E

    • For TPU v5p: VM_FAMILY_CLOUD_TPU_POD_SLICE_CT5P

    • For TPU v5e: VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT5LP

    If you specify a TPU v5e, then, in the aggregateReservation field, you must include the workloadType field. Set the field to the type of workloads that you want to run on the TPUs:

    • For workloads that handle large amounts of data in single or multiple operations, such as ML training workloads, specify BATCH.

    • For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads, specify SERVING.

  • START_TIME: the start time of the request, formatted as an RFC 3339 timestamp.

  • END_TIME: the end time of your reservation period, formatted as an RFC 3339 timestamp. If you want to specify a duration, in seconds, for your reservation period instead of an end time, then replace the endTime field with the duration field.
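The projectMap structure shown earlier repeats each consumer project ID as both the key and the projectId value. A sketch of generating it from a list of IDs (the helper name and project IDs are illustrative):

```python
def build_share_settings(project_ids):
    """Build a SPECIFIC_PROJECTS shareSettings block from consumer project IDs."""
    if len(project_ids) > 100:
        raise ValueError("You can specify up to 100 projects")
    return {
        "shareType": "SPECIFIC_PROJECTS",
        "projectMap": {pid: {"projectId": pid} for pid in project_ids},
    }

print(build_share_settings(["consumer-project-1", "consumer-project-2"]))
```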

What's next