Create a future reservation request in calendar mode


This document explains how to create a future reservation request in calendar mode. To learn more about this type of reservation, see Future reservation requests in calendar mode overview.

You can create a future reservation request in calendar mode to reserve up to 1,024 TPU chips for a maximum of 90 days. You can then use this capacity to create virtual machine (VM) instances with Tensor Processing Units (TPUs) attached. Future reservation requests in calendar mode help you obtain high-demand resources for the following workloads:

  • Model pre-training jobs

  • Model fine-tuning jobs

  • High performance computing (HPC) simulation workloads

  • Short-term increases in inference workloads

To reserve GPU VMs for long-running training and inference jobs, see Request capacity in the AI Hypercomputer documentation instead.

Limitations

Before you create a future reservation request in calendar mode, consider the following limitations:

  • You can only reserve TPUs as follows:

    Supported TPU versions | Number of TPU chips per request | Reservation period | Supported zones
    TPU v6e | 1, 4, 8, 16, 32, 64, 128, or 256 | 1 to 90 days | asia-northeast1-b, us-east5-a
    TPU v5p | 4, 8, 16, 32, 64, 128, 256, 512, or 1,024 | 1 to 90 days | us-east5-a
    TPU v5e | 1*, 4*, 8*, 16, 32, 64, 128, or 256 | 1 to 90 days | us-west4-b (BATCH), us-central1-a (SERVING)

    * You can reserve one, four, or eight TPU v5e chips only for the serving (SERVING) workload type.

  • You can't cancel, delete, or modify requests.
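The constraints in the table above can be checked locally before you submit a request. The following Python sketch encodes the table as a dictionary and reports any mismatch. The names SUPPORTED, V5E_ZONE_BY_WORKLOAD, and check_request are hypothetical and don't come from any Google Cloud library, and the sketch assumes that the (BATCH) and (SERVING) labels in the table mean that each TPU v5e zone serves only that workload type.

```python
# Hypothetical pre-flight check built from the limitations table in this
# document. None of these names come from a Google Cloud library.
SUPPORTED = {
    "v6e": {
        "chips": {1, 4, 8, 16, 32, 64, 128, 256},
        "zones": {"asia-northeast1-b", "us-east5-a"},
    },
    "v5p": {
        "chips": {4, 8, 16, 32, 64, 128, 256, 512, 1024},
        "zones": {"us-east5-a"},
    },
    "v5e": {
        "chips": {1, 4, 8, 16, 32, 64, 128, 256},
        "zones": {"us-west4-b", "us-central1-a"},
    },
}
# Assumption: for TPU v5e, each zone in the table is tied to one workload type.
V5E_ZONE_BY_WORKLOAD = {"BATCH": "us-west4-b", "SERVING": "us-central1-a"}


def check_request(version, chips, days, zone, workload_type=None):
    """Return a list of problems; an empty list means the request fits the table."""
    spec = SUPPORTED.get(version)
    if spec is None:
        return [f"unsupported TPU version: {version}"]
    problems = []
    if chips not in spec["chips"]:
        problems.append(f"{chips} chips is not a supported size for {version}")
    if not 1 <= days <= 90:
        problems.append("reservation period must be 1 to 90 days")
    if zone not in spec["zones"]:
        problems.append(f"{zone} is not a supported zone for {version}")
    if version == "v5e":
        # The footnote limits 1, 4, and 8 chips to SERVING workloads.
        if chips in (1, 4, 8) and workload_type != "SERVING":
            problems.append("1, 4, or 8 v5e chips require the SERVING workload type")
        expected_zone = V5E_ZONE_BY_WORKLOAD.get(workload_type)
        if expected_zone is not None and zone != expected_zone:
            problems.append(f"{workload_type} v5e requests use zone {expected_zone}")
    return problems
```

For example, check_request("v5p", 512, 30, "us-east5-a") returns an empty list, while a request for 512 TPU v6e chips returns a message about the unsupported size. A passing check doesn't guarantee that capacity is available; it only confirms that the request matches the table.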

Before you begin

  • If you can't use future reservation requests in calendar mode, then you might not be eligible to access and use this feature. In this case, contact your account team or the sales team.
  • To share your reserved capacity with other projects within your organization, ensure that the project in which you want to create future reservation requests in calendar mode is allowed to create shared reservations. Otherwise, you will encounter errors.
  • If you haven't already, then set up authentication. Authentication is the process by which your identity is verified for access to Google Cloud services and APIs. To run code or samples from a local development environment, you can authenticate to Compute Engine by selecting one of the following options:

    Console

    When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.

    gcloud

    1. After installing the Google Cloud CLI, initialize it by running the following command:

      gcloud init

      If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

    2. Set a default region and zone.

    REST

    To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.

      After installing the Google Cloud CLI, initialize it by running the following command:

      gcloud init

      If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

    For more information, see Authenticate for using REST in the Google Cloud authentication documentation.

Required roles

To get the permissions that you need to create a future reservation request in calendar mode, ask your administrator to grant you the Compute Future Reservation Admin (roles/compute.futureReservationAdmin) IAM role on the project. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to create a future reservation request in calendar mode. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to create a future reservation request in calendar mode:

  • To create a future reservation request: compute.futureReservations.create on the project
  • To let Compute Engine automatically create reservations: compute.reservations.create on the project
  • To specify an instance template: compute.instanceTemplates.useReadOnly on the instance template
  • To view the future availability of resources: compute.advice.calendarMode on the project

You might also be able to get these permissions with custom roles or other predefined roles.

Overview

To create a future reservation request in calendar mode, complete the following steps:

  1. View TPU future availability. View the future availability of the TPUs that you want to reserve. Then, when you create a request, specify the number, type, and reservation duration of the TPUs that you confirmed as available. This action increases the chances that Google Cloud approves your request.

  2. Create a reservation request for TPUs. Create and submit a future reservation request in calendar mode for TPUs. If the request is created successfully, then Google Cloud approves it within a minute.

View TPU future availability

You can view the future availability of TPUs in a region for up to 120 days in advance.

To view the future availability of four or eight TPU v5p chips in a region, use the Google Cloud CLI or REST API. Otherwise, select one of the following options:

Console

You can view future availability of TPUs in a region when creating a future reservation request in calendar mode. For more information, see Create a reservation request for TPUs in this document.

gcloud

To view the future availability of TPUs in a region, use the gcloud beta compute advice calendar-mode command with the --chip-count and --tpu-version flags:

gcloud beta compute advice calendar-mode \
    --chip-count=NUMBER_OF_CHIPS \
    --tpu-version=TPU_VERSION \
    --region=REGION \
    --start-time-range=from=FROM_START_TIME,to=TO_START_TIME \
    --duration-range=max=MAXIMUM_DURATION,min=MINIMUM_DURATION

Replace the following:

  • NUMBER_OF_CHIPS: the number of TPU chips to reserve.

  • TPU_VERSION: the TPU version to reserve. Specify one of the following values:

    • For TPU v6e: V6E

    • For TPU v5p: V5P

    • For TPU v5e: V5E

    If you specify a TPU v5p or v5e, then you must include the --workload-type flag. Set this flag to the type of workloads that you want to run on the TPUs:

    • For workloads that handle large amounts of data in single or multiple operations, such as machine learning (ML) training workloads, specify BATCH.

    • For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads, specify SERVING.

  • REGION: the region in which to reserve TPUs. To review the supported regions and zones, see Limitations in this document.

  • FROM_START_TIME and TO_START_TIME: the earliest and latest date when you want to reserve capacity. Format these dates as RFC 3339 timestamps:

    YYYY-MM-DDTHH:MM:SSOFFSET
    

    Replace the following:

    • YYYY-MM-DD: a date formatted as a four-digit year, two-digit month, and a two-digit day, separated by hyphens (-).

    • HH:MM:SS: a time formatted as a two-digit hour using a 24-hour time, two-digit minutes, and two-digit seconds, separated by colons (:).

    • OFFSET: the time zone formatted as an offset of Coordinated Universal Time (UTC). For example, to use the Pacific Standard Time (PST), specify -08:00. To use no offset, specify Z.

  • MAXIMUM_DURATION and MINIMUM_DURATION: the maximum and minimum duration you want to reserve resources for. Format these durations as the number of days, hours, minutes, or seconds followed by d, h, m, or s respectively. For example, specify 30m for 30 minutes or 1d2h3m4s for one day, two hours, three minutes, and four seconds. You can reserve resources for a minimum of 24 hours and a maximum of 90 days.

The output is similar to the following:

- recommendationsPerSpec:
    spec:
      endTime: '2025-09-07T00:00:00Z'
      location: zones/us-east5-a
      otherLocations:
        zones/us-east5-b:
          details: this machine family is not supported in this zone
          status: NOT_SUPPORTED
        zones/us-east5-c:
          details: this machine family is not supported in this zone
          status: NOT_SUPPORTED
      recommendationId: 0d3f005d-f952-4fce-96f2-6af25e1591eb
      recommendationType: FUTURE_RESERVATION
      startTime: '2025-06-09T00:00:00Z'

If your requested resources are available, then the output contains the startTime, endTime, and location fields. These fields specify the earliest start time, the latest end time, and the zone in which the resources are available.
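The timestamp and duration formats that the gcloud command expects can be produced programmatically. The following Python sketch uses only the standard library; to_rfc3339 and to_gcloud_duration are hypothetical helper names, and the duration bounds are the 24-hour minimum and 90-day maximum stated in this document.

```python
from datetime import datetime, timedelta, timezone


def to_rfc3339(moment):
    """Format a timezone-aware datetime as YYYY-MM-DDTHH:MM:SSOFFSET."""
    if moment.tzinfo is None:
        raise ValueError("datetime must be timezone-aware")
    text = moment.replace(microsecond=0).isoformat()
    # isoformat() writes UTC as +00:00; this document uses Z for no offset.
    return text[:-6] + "Z" if text.endswith("+00:00") else text


def to_gcloud_duration(delta):
    """Format a timedelta with the d, h, m, and s suffixes, such as 1d2h3m4s."""
    total = int(delta.total_seconds())
    if not 86_400 <= total <= 7_776_000:
        raise ValueError("duration must be between 24 hours and 90 days")
    days, rest = divmod(total, 86_400)
    hours, rest = divmod(rest, 3_600)
    minutes, seconds = divmod(rest, 60)
    parts = [(days, "d"), (hours, "h"), (minutes, "m"), (seconds, "s")]
    # Skip zero-valued units so that 90 days is written as 90d.
    return "".join(f"{value}{unit}" for value, unit in parts if value)
```

For example, a start time of midnight on June 9, 2025 in Pacific Standard Time can be built with datetime(2025, 6, 9, tzinfo=timezone(timedelta(hours=-8))) and passed to to_rfc3339 to fill in FROM_START_TIME.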

REST

To view the future availability of TPUs in a region, send a POST request to the beta advice.calendarMode method. In the request body, include the acceleratorCount and vmFamily fields:

POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/regions/REGION/advice/calendarMode

{
  "futureResourcesSpecs": {
    "spec": {
      "targetResources": {
        "aggregateResources": {
          "acceleratorCount": "NUMBER_OF_CHIPS",
          "vmFamily": "TPU_VERSION"
        }
      },
      "timeRangeSpec": {
        "startTimeNotEarlierThan": "FROM_START_TIME",
        "startTimeNotLaterThan": "TO_START_TIME",
        "minDuration": "MINIMUM_DURATION",
        "maxDuration": "MAXIMUM_DURATION"
      }
    }
  }
}

Replace the following:

  • PROJECT_ID: the ID of the project where you want to reserve resources.

  • REGION: the region in which to reserve TPUs. To review the supported regions and zones, see Limitations in this document.

  • NUMBER_OF_CHIPS: the number of TPU chips to reserve.

  • TPU_VERSION: the TPU version to reserve. Specify one of the following values:

    • For TPU v6e: VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT6E

    • For TPU v5p: VM_FAMILY_CLOUD_TPU_POD_SLICE_CT5P

    • For TPU v5e: VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT5LP

    If you specify a TPU v5p or v5e, then, in the aggregateResources field, you must include the workloadType field. Set this field to the type of workloads that you want to run on the TPUs:

    • For workloads that handle large amounts of data in single or multiple operations, such as machine learning (ML) training workloads, specify BATCH.

    • For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads, specify SERVING.

  • FROM_START_TIME and TO_START_TIME: the earliest and latest date that you want to reserve capacity on. Format these dates as RFC 3339 timestamps:

    YYYY-MM-DDTHH:MM:SSOFFSET
    

    Replace the following:

    • YYYY-MM-DD: a date formatted as a four-digit year, two-digit month, and a two-digit day, separated by hyphens (-).

    • HH:MM:SS: a time formatted as a two-digit hour using a 24-hour time, two-digit minutes, and two-digit seconds, separated by colons (:).

    • OFFSET: the time zone formatted as an offset of Coordinated Universal Time (UTC). For example, to use the Pacific Standard Time (PST), specify -08:00. To use no offset, specify Z.

  • MAXIMUM_DURATION and MINIMUM_DURATION: the maximum and minimum durations, in seconds, that you want to reserve resources for, followed by s. For example, to specify 3,600 seconds, use 3600s. You can reserve resources for a minimum of 24 hours (86,400 seconds) and a maximum of 90 days (7,776,000 seconds).

The output is similar to the following:

{
  "recommendations": [
    {
      "recommendationsPerSpec": {
        "spec": {
          "recommendationId": "a21a2fa0-72c7-4105-8179-88de5409890b",
          "recommendationType": "FUTURE_RESERVATION",
          "startTime": "2025-06-09T00:00:00Z",
          "endTime": "2025-09-07T00:00:00Z",
          "otherLocations": {
            "zones/us-east5-b": {
              "status": "NOT_SUPPORTED",
              "details": "this machine family is not supported in this zone"
            },
            "zones/us-east5-c": {
              "status": "NOT_SUPPORTED",
              "details": "this machine family is not supported in this zone"
            }
          },
          "location": "zones/us-east5-a"
        }
      }
    }
  ]
}

If your requested resources are available, then the output contains the startTime, endTime, and location fields. These fields specify the earliest start time, the latest end time, and the zone in which the resources are available.
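The REST request body and response described above can be assembled and read with plain dictionaries. The following Python sketch mirrors the field names shown in this document; build_advice_body and available_windows are hypothetical helpers, and the sketch doesn't send the request or handle authentication.

```python
def build_advice_body(chips, vm_family, start_from, start_to,
                      min_seconds, max_seconds, workload_type=None):
    """Build the advice.calendarMode request body shown in this document."""
    if not 86_400 <= min_seconds <= max_seconds <= 7_776_000:
        raise ValueError("durations must fall between 86400 and 7776000 seconds")
    # The sample request in this document passes the chip count as a string.
    aggregate = {"acceleratorCount": str(chips), "vmFamily": vm_family}
    if workload_type is not None:
        # Required for TPU v5p and v5e: BATCH or SERVING.
        aggregate["workloadType"] = workload_type
    return {
        "futureResourcesSpecs": {
            "spec": {
                "targetResources": {"aggregateResources": aggregate},
                "timeRangeSpec": {
                    "startTimeNotEarlierThan": start_from,
                    "startTimeNotLaterThan": start_to,
                    "minDuration": f"{min_seconds}s",
                    "maxDuration": f"{max_seconds}s",
                },
            }
        }
    }


def available_windows(response):
    """Yield (startTime, endTime, location) for each complete recommendation."""
    for item in response.get("recommendations", []):
        for rec in item.get("recommendationsPerSpec", {}).values():
            # A recommendation without all three fields is skipped.
            if all(key in rec for key in ("startTime", "endTime", "location")):
                yield rec["startTime"], rec["endTime"], rec["location"]
```

Applied to the sample output above, available_windows yields one tuple with the start time, end time, and zones/us-east5-a. You can then reuse those values when you create the request.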

Create a reservation request for TPUs

After Google Cloud approves a future reservation request in calendar mode, you can't cancel or delete your request. You must commit to pay for the requested capacity at the requested start time, regardless of whether you use it.

To create a request by using an existing TPU VM as reference, use the Google Cloud console. To reserve four or eight TPU v5p chips, use the gcloud CLI or REST API. Otherwise, select one of the following options:

Console

  1. In the Google Cloud console, go to the Reservations page.

    Go to Reservations

  2. Click the Future reservations tab.

  3. Click Create future reservation. The Create a future reservation page opens.

  4. In the Hardware configuration section, specify the properties of the TPUs that you want to reserve by doing one of the following:

    • To specify TPU properties directly, complete the following steps:

      1. Select Specify machine type.

      2. Click the TPUs tab.

      3. In the table, in the Series column, select the TPU version to reserve.

    • To specify TPU properties using an existing TPU VM as reference, select Use existing VM, and then select the VM.

  5. If you specified a TPU v5p (CT5P) or v5e (CT5LP) in the previous step, then, in the TPU v5 workload type list, select one of the following options:

    • To run workloads on the TPUs that handle large amounts of data in single or multiple operations, such as ML training workloads, select Batch.

    • To run workloads on the TPUs that handle concurrent requests and require minimal network latency, such as ML inference workloads, select Serving.

  6. In the Search for capacity section, complete the following steps:

    1. In the Region and Zone lists, specify the region and zone in which to reserve resources.

    2. In the Number of chips field, select the number of TPUs to reserve.

    3. In the Start time list, select the start time for your request. The start time must be at least 24 hours after you create and submit the request.

    4. Optional: In the Choose your start date flexibility list, select how exact your start date needs to be.

    5. In the Reservation duration field, specify for how long you want to reserve resources. The value must be between one day and 90 days.

    6. Click Search for capacity. Then, in the Available capacity table, select one of the available options containing the type, number, and reservation period of the TPUs to reserve.

  7. Click Next.

  8. In the Share type section, select the projects to share your requested capacity with:

    • To use the reserved capacity only within your project, select Local.

    • To share the reserved capacity with other projects, select Shared, click Add projects, and then follow the prompts to select the projects.

  9. Click Next.

  10. In the Future reservation name field, enter a name for the request.

  11. In the Reservation name field, enter the name of the reservation that Compute Engine automatically creates to provision your requested capacity.

  12. Click Create.

gcloud

To create a future reservation request in calendar mode and submit it for review, use the following gcloud beta compute future-reservations create command. To reserve TPUs, include the --chip-count and --tpu-version flags:

gcloud beta compute future-reservations create FUTURE_RESERVATION_NAME \
    --auto-delete-auto-created-reservations \
    --chip-count=NUMBER_OF_CHIPS \
    --tpu-version=TPU_VERSION \
    --deployment-type=DENSE \
    --planning-status=SUBMITTED \
    --require-specific-reservation \
    --reservation-mode=CALENDAR \
    --reservation-name=RESERVATION_NAME \
    --share-setting=SHARE_TYPE \
    --start-time=START_TIME \
    --duration=DURATION \
    --zone=ZONE

Replace the following:

  • FUTURE_RESERVATION_NAME: the name of the request.

  • NUMBER_OF_CHIPS: the number of TPU chips to reserve.

  • TPU_VERSION: the TPU version to reserve. Specify one of the following values:

    • For TPU v6e: V6E

    • For TPU v5p: V5P

    • For TPU v5e: V5E

    If you specify a TPU v5p or v5e, then you must include the --workload-type flag. Set the flag to the type of workloads that you want to run on the TPUs:

    • For workloads that handle large amounts of data in single or multiple operations, such as machine learning (ML) training workloads, specify BATCH.

    • For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads, specify SERVING.

  • RESERVATION_NAME: the name of the reservation that Compute Engine automatically creates to provision your requested capacity.

  • SHARE_TYPE: whether other projects in your organization can consume the reserved capacity. Specify one of the following values:

    • To use capacity only within your project: local

    • To share capacity with other projects: projects

    If you specify projects, then you must include the --share-with flag set to a comma-separated list of project IDs, for example, project-1,project-2. You can specify up to 100 projects. Don't include your own project ID in this list, because your project can consume the reserved capacity by default.

  • START_TIME: the start time of the request, formatted as an RFC 3339 timestamp. Specify a start time that is at least 24 hours after you submit the request. Otherwise, creating the request fails.

  • DURATION: the duration, in seconds, you want to reserve the requested resources for, followed by s. For example, to specify 3,600 seconds, use 3600s. You can reserve resources for a minimum of 24 hours (86,400 seconds) and a maximum of 90 days (7,776,000 seconds).

  • ZONE: the zone where you want to reserve resources.
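If you script the request, the flags described above can be assembled as an argument list. The following Python sketch uses only the flags shown in this document; build_create_command is a hypothetical helper, and it builds the command without running it.

```python
def build_create_command(name, reservation_name, chips, tpu_version, zone,
                         start_time, duration_seconds,
                         workload_type=None, share_with=()):
    """Return the gcloud command from this document as an argument list."""
    if not 86_400 <= duration_seconds <= 7_776_000:
        raise ValueError("duration must be between 86400 and 7776000 seconds")
    if len(share_with) > 100:
        raise ValueError("a request can be shared with at most 100 projects")
    args = [
        "gcloud", "beta", "compute", "future-reservations", "create", name,
        "--auto-delete-auto-created-reservations",
        f"--chip-count={chips}",
        f"--tpu-version={tpu_version}",
        "--deployment-type=DENSE",
        "--planning-status=SUBMITTED",
        "--require-specific-reservation",
        "--reservation-mode=CALENDAR",
        f"--reservation-name={reservation_name}",
        # Use projects only when there are other projects to share with.
        f"--share-setting={'projects' if share_with else 'local'}",
        f"--start-time={start_time}",
        f"--duration={duration_seconds}s",
        f"--zone={zone}",
    ]
    if tpu_version in ("V5P", "V5E"):
        # This document requires --workload-type for TPU v5p and v5e.
        if workload_type not in ("BATCH", "SERVING"):
            raise ValueError("V5P and V5E need a workload type of BATCH or SERVING")
        args.append(f"--workload-type={workload_type}")
    if share_with:
        args.append("--share-with=" + ",".join(share_with))
    return args
```

The returned list can be passed to subprocess.run in an environment where the gcloud CLI is installed and initialized. Because an approved request can't be canceled or deleted, consider printing and reviewing the list before you run it.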

REST

To create a future reservation request in calendar mode and submit it for review, send the following POST request to the beta futureReservations.insert method. To reserve TPUs, include the acceleratorCount and vmFamily fields in the request body:

POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/zones/ZONE/futureReservations

{
  "name": "FUTURE_RESERVATION_NAME",
  "autoDeleteAutoCreatedReservations": true,
  "deploymentType": "DENSE",
  "planningStatus": "SUBMITTED",
  "reservationMode": "CALENDAR",
  "reservationName": "RESERVATION_NAME",
  "shareSettings": {
    "shareType": "SHARE_TYPE"
  },
  "specificReservationRequired": true,
  "aggregateReservation": {
    "reservedResources": [
      {
        "accelerator": {
          "acceleratorCount": NUMBER_OF_CHIPS
        }
      }
    ],
    "vmFamily": "TPU_VERSION"
  },
  "timeWindow": {
    "startTime": "START_TIME",
    "duration": {
      "seconds": DURATION
    }
  }
}

Replace the following:

  • PROJECT_ID: the ID of the project where you want to create the request.

  • ZONE: the zone where you want to reserve resources.

  • FUTURE_RESERVATION_NAME: the name of the request.

  • RESERVATION_NAME: the name of the reservation that Compute Engine automatically creates to provision your requested capacity.

  • SHARE_TYPE: whether other projects in your organization can consume the reserved capacity. Specify one of the following values:

    • To use capacity only within your project: LOCAL

    • To share capacity with other projects: SPECIFIC_PROJECTS

    If you specify SPECIFIC_PROJECTS, then, in the shareSettings field, you must include the projectMap field to specify the projects to share the capacity with. You can specify up to 100 projects. Don't specify your own project ID, because your project can consume the reserved capacity by default.

    For example, to share the requested capacity with two other projects, include the following:

    "shareSettings": {
      "shareType": "SPECIFIC_PROJECTS",
      "projectMap": {
        "CONSUMER_PROJECT_ID_1": {
          "projectId": "CONSUMER_PROJECT_ID_1"
        },
        "CONSUMER_PROJECT_ID_2": {
          "projectId": "CONSUMER_PROJECT_ID_2"
        }
      }
    }
    

    Replace CONSUMER_PROJECT_ID_1 and CONSUMER_PROJECT_ID_2 with the IDs of two projects that you want to allow to consume the requested capacity.

  • NUMBER_OF_CHIPS: the number of TPU chips to reserve.

  • TPU_VERSION: the TPU version to reserve. Specify one of the following values:

    • For TPU v6e: VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT6E

    • For TPU v5p: VM_FAMILY_CLOUD_TPU_POD_SLICE_CT5P

    • For TPU v5e: VM_FAMILY_CLOUD_TPU_LITE_POD_SLICE_CT5LP

    If you specify a TPU v5p or v5e, then, in the aggregateReservation field, you must include the workloadType field. Set the field to the type of workloads that you want to run on the TPUs:

    • For workloads that handle large amounts of data in single or multiple operations, such as ML training workloads, specify BATCH.

    • For workloads that handle concurrent requests and require minimal network latency, such as ML inference workloads, specify SERVING.

  • START_TIME: the start time of the request, formatted as an RFC 3339 timestamp. Specify a start time that is at least 24 hours after you submit the request. Otherwise, the creation of the request fails.

  • DURATION: the duration, in seconds, you want to reserve the requested resources for. You can reserve resources for a minimum of 24 hours (86,400 seconds) and a maximum of 90 days (7,776,000 seconds).
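The request body and its constraints can also be assembled in code. The following Python sketch builds the body shown above and checks the 24-hour lead time, the duration limits, and the 100-project sharing limit before you send it. The helper name build_future_reservation_body is hypothetical, and placing workloadType next to vmFamily inside aggregateReservation is an assumption based on the request body in this document.

```python
from datetime import datetime, timedelta, timezone


def build_future_reservation_body(name, reservation_name, chips, vm_family,
                                  start_time, duration_seconds,
                                  workload_type=None, consumer_projects=(),
                                  now=None):
    """Build the futureReservations.insert body shown in this document."""
    now = now or datetime.now(timezone.utc)
    # fromisoformat() before Python 3.11 doesn't accept a trailing Z.
    start = datetime.fromisoformat(start_time.replace("Z", "+00:00"))
    if start.tzinfo is None:
        raise ValueError("start time needs a UTC offset or Z")
    if start - now < timedelta(hours=24):
        raise ValueError("start time must be at least 24 hours from now")
    if not 86_400 <= duration_seconds <= 7_776_000:
        raise ValueError("duration must be between 86400 and 7776000 seconds")
    if len(consumer_projects) > 100:
        raise ValueError("a request can be shared with at most 100 projects")

    share = {"shareType": "LOCAL"}
    if consumer_projects:
        share = {
            "shareType": "SPECIFIC_PROJECTS",
            "projectMap": {p: {"projectId": p} for p in consumer_projects},
        }
    aggregate = {
        "reservedResources": [{"accelerator": {"acceleratorCount": chips}}],
        "vmFamily": vm_family,
    }
    if workload_type is not None:
        # Assumed placement; required for TPU v5p and v5e (BATCH or SERVING).
        aggregate["workloadType"] = workload_type
    return {
        "name": name,
        "autoDeleteAutoCreatedReservations": True,
        "deploymentType": "DENSE",
        "planningStatus": "SUBMITTED",
        "reservationMode": "CALENDAR",
        "reservationName": reservation_name,
        "shareSettings": share,
        "specificReservationRequired": True,
        "aggregateReservation": aggregate,
        "timeWindow": {
            "startTime": start_time,
            "duration": {"seconds": duration_seconds},
        },
    }
```

Serializing the returned dictionary with json.dumps gives a body that you can send with the POST request above. The now parameter exists only so that the lead-time check can be exercised with a fixed clock.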

What's next