Request Cloud TPUs using Flex-start

Flex-start for Cloud TPU, powered by Dynamic Workload Scheduler, provides a flexible and cost-effective way to access TPU resources for AI workloads. Flex-start lets you dynamically provision TPUs as needed, for up to 7 days, without long-term reservations or complex quota management. With Flex-start, you submit a TPU provisioning request that persists until capacity becomes available. Once available, Flex-start provisions the TPU VMs to run for the duration that you specified in your request.

Flex-start is a good fit for quick experimentation, small-scale testing, dynamic provisioning of TPUs for inference workloads, model fine-tuning, and workload runs that take less than 7 days. For more information about other TPU consumption options, see Cloud TPU consumption options.

You can delete your TPU resources at any time to stop billing. For more information about TPU pricing, see Cloud TPU pricing.

Limitations

Flex-start Cloud TPUs have the following limitations:

  • You can request Flex-start resources for a duration of up to 7 days.
  • You can request Flex-start TPUs only for TPU v6e and v5e, and only in the following zones:
    • TPU v6e for training and serving: asia-northeast1-b, us-east5-a
    • TPU v5e for training: us-west4-a
    • TPU v5e for serving: us-central1-a
  • You must use the queued resources API to use Flex-start with Cloud TPU.

Before you begin

Before requesting Flex-start TPUs, you must:

  • Install the Google Cloud CLI
  • Create a Google Cloud project
  • Enable the Cloud TPU API

For more information, see Set up the Cloud TPU environment.

You should also ensure you have sufficient preemptible quota to use Flex-start. The default preemptible quota for TPU v5e and v6e is 64 cores. If you need more TPU cores than the amount granted by the default quota, you need to request a higher quota allocation. For more information, see Cloud TPU quotas.

Request Flex-start TPUs

Flex-start uses the TPU queued resources API to request TPU resources in a queued manner. When the requested resource becomes available, it's assigned to your Google Cloud project for your immediate, exclusive use. After the requested run duration, the TPU VMs are deleted and the queued resource moves to the SUSPENDED state. For more information about queued resources, see Manage queued resources.

To request Flex-start TPUs, use the gcloud alpha compute tpus queued-resources create command with the --provisioning-model flag set to flex-start and the --max-run-duration flag set to the duration you want your TPUs to run.

gcloud alpha compute tpus queued-resources create QUEUED_RESOURCE_ID \
    --zone=ZONE \
    --accelerator-type=ACCELERATOR_TYPE \
    --runtime-version=RUNTIME_VERSION \
    --node-id=NODE_ID \
    --provisioning-model=flex-start \
    --max-run-duration=RUN_DURATION

Replace the following placeholders:

  • QUEUED_RESOURCE_ID: A user-assigned ID for the queued resource request.
  • ZONE: The zone in which to create the TPU VM.
  • ACCELERATOR_TYPE: Specifies the version and size of the Cloud TPU to create. For more information about supported accelerator types for each TPU version, see TPU versions.
  • RUNTIME_VERSION: The Cloud TPU software version.
  • NODE_ID: A user-assigned ID for the TPU that is created when the queued resource request is allocated.
  • RUN_DURATION: How long the TPUs should run. Format the duration as the number of days, hours, minutes, and seconds followed by d, h, m, and s, respectively. For example, specify 72h for a duration of 72 hours, or specify 1d2h3m4s for a duration of 1 day, 2 hours, 3 minutes, and 4 seconds. The maximum is 7 days.
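Before submitting a request, it can help to check locally that a RUN_DURATION value is well formed and within the 7-day limit. The following is a minimal sketch, not part of the gcloud CLI: the function names duration_to_seconds and within_flex_start_limit are hypothetical, and the sketch assumes that the parts appear in the order days, hours, minutes, seconds, as in the examples above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of gcloud) for checking a RUN_DURATION value.

# Convert a duration such as "72h" or "1d2h3m4s" into seconds.
# Assumption: parts appear in the order d, h, m, s; each part is optional.
duration_to_seconds() {
  local input="$1" total=0 part n unit
  if [[ -z "$input" || ! "$input" =~ ^([0-9]+d)?([0-9]+h)?([0-9]+m)?([0-9]+s)?$ ]]; then
    echo "invalid duration: $input" >&2
    return 1
  fi
  for part in "${BASH_REMATCH[@]:1}"; do
    if [[ -z "$part" ]]; then
      continue                      # this unit was not given
    fi
    n="${part%?}"                   # numeric value, for example "72"
    unit="${part: -1}"              # unit suffix, for example "h"
    case "$unit" in
      d) total=$(( total + 10#$n * 86400 )) ;;
      h) total=$(( total + 10#$n * 3600 )) ;;
      m) total=$(( total + 10#$n * 60 )) ;;
      s) total=$(( total + 10#$n )) ;;
    esac
  done
  echo "$total"
}

# Succeed only if the duration parses and does not exceed 7 days.
within_flex_start_limit() {
  local seconds
  seconds="$(duration_to_seconds "$1")" || return 1
  (( seconds <= 7 * 86400 ))
}
```

For example, duration_to_seconds 1d2h3m4s prints 93784, and within_flex_start_limit 8d returns a non-zero exit status because 8 days exceeds the limit.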

You can further customize your queued resource request to run at specific times with additional flags:

  • --valid-after-duration: The amount of time that must elapse before the TPU can be provisioned.
  • --valid-after-time: The earliest time at which the TPU can be provisioned.
  • --valid-until-duration: How long the request remains valid. If the request isn't fulfilled within this duration, it expires and moves to the FAILED state.
  • --valid-until-time: The time until which the request remains valid. If the request isn't fulfilled by this time, it expires and moves to the FAILED state.
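As an illustration of how the required flags and an optional scheduling flag fit together, the following sketch assembles the create command in a Bash array so that you can inspect it before running it. The function name build_flex_start_cmd and the variable FLEX_START_CMD are hypothetical, and the values in the usage comment (my-qr, my-tpu, v6e-8, v2-alpha-tpuv6e, 72h, 6h) are example assumptions that you should replace with values valid for your project.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: build a Flex-start create command without running it.
# Arguments: QUEUED_RESOURCE_ID ZONE ACCELERATOR_TYPE RUNTIME_VERSION NODE_ID
#            RUN_DURATION [VALID_UNTIL_DURATION]
build_flex_start_cmd() {
  local qr_id="$1" zone="$2" accel="$3" runtime="$4" node_id="$5"
  local run_duration="$6" valid_until="${7:-}"

  FLEX_START_CMD=(
    gcloud alpha compute tpus queued-resources create "$qr_id"
    --zone="$zone"
    --accelerator-type="$accel"
    --runtime-version="$runtime"
    --node-id="$node_id"
    --provisioning-model=flex-start
    --max-run-duration="$run_duration"
  )

  # Optional: let the request expire if it isn't fulfilled in time.
  if [[ -n "$valid_until" ]]; then
    FLEX_START_CMD+=( --valid-until-duration="$valid_until" )
  fi
}

# Example usage (values are placeholders, not recommendations):
#   build_flex_start_cmd my-qr us-east5-a v6e-8 v2-alpha-tpuv6e my-tpu 72h 6h
#   printf '%s ' "${FLEX_START_CMD[@]}"; echo    # review the command
#   "${FLEX_START_CMD[@]}"                       # submit the request
```

Keeping the command in an array preserves each argument exactly as given, which avoids quoting problems when the command is run later.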

For more information about optional flags, see the gcloud alpha compute tpus queued-resources create documentation.

Get the status of a Flex-start request

To monitor the status of your Flex-start request, use the gcloud alpha compute tpus queued-resources describe command:

gcloud alpha compute tpus queued-resources describe QUEUED_RESOURCE_ID \
    --zone ZONE

A queued resource can be in one of the following states:

  • WAITING_FOR_RESOURCES: The request has passed initial validation and has been added to the queue.
  • PROVISIONING: The request has been selected from the queue, and the TPU VMs are being created.
  • ACTIVE: The request has been fulfilled, and the TPU VMs are ready.
  • FAILED: The request couldn't be completed. Use the describe command for more details.
  • SUSPENDING: The resources associated with the request are being deleted.
  • SUSPENDED: The resources associated with the request have been deleted.

For more information, see Retrieve state and diagnostic information about a queued resource request.
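If you want a script to wait until the TPUs are ready, one approach is to poll the describe command and act on the state. The following is a sketch under stated assumptions: the function names are hypothetical, and get_state assumes that the describe output exposes the state at the state.state field path, which you should confirm against your own describe output.

```shell
#!/usr/bin/env bash
# Hypothetical polling helpers for a queued resource request.

# Map a queued resource state to the action a polling script might take.
classify_state() {
  case "$1" in
    WAITING_FOR_RESOURCES|PROVISIONING) echo wait ;;
    ACTIVE)                             echo ready ;;
    FAILED)                             echo failed ;;
    SUSPENDING|SUSPENDED)               echo ended ;;
    *)                                  echo unknown ;;
  esac
}

# Print the current state of a queued resource.
# Assumption: the state is available at state.state in the describe output.
get_state() {
  gcloud alpha compute tpus queued-resources describe "$1" \
    --zone "$2" --format='value(state.state)'
}

# Poll until the request is ACTIVE. Returns 0 when ready, 1 if the request
# reaches any state other than a waiting state, and 2 if the lookup fails.
# Arguments: QUEUED_RESOURCE_ID ZONE [POLL_INTERVAL_SECONDS]
wait_for_active() {
  local qr_id="$1" zone="$2" interval="${3:-60}" state
  while true; do
    state="$(get_state "$qr_id" "$zone")" || return 2
    case "$(classify_state "$state")" in
      ready) return 0 ;;
      wait)  sleep "$interval" ;;
      *)     echo "stopped in state: $state" >&2; return 1 ;;
    esac
  done
}
```

Because a Flex-start request can wait in the queue for a long time, a polling interval of a minute or more is usually enough.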

Monitor the run time of Flex-start TPUs

You can monitor the run time of Flex-start TPUs by checking the TPU's termination timestamp:

  1. Get the details of your queued resource request.
  2. Choose one of the following options depending on whether your TPUs have been created:

    • If the queued resource is waiting for resources: In the output, see the maxRunDuration field. This field specifies how long the TPUs will run once they're created.

    • If the TPUs associated with the queued resource have been created: In the output, see the terminationTimestamp field listed for each node in the queued resource. This field specifies when the TPU will be terminated.
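To turn a terminationTimestamp value into the time remaining, you can subtract the current time from it. The following sketch uses hypothetical helper names and assumes that the timestamp is an RFC 3339 string such as 2025-01-08T00:00:00Z and that GNU date is available (the -d option isn't supported by BSD or macOS date). Copy the timestamp from your describe output and pass it as the first argument.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for working out how long a Flex-start TPU has left.

# Print the number of seconds between "now" and a timestamp.
# Arguments: TIMESTAMP [NOW_EPOCH_SECONDS]
# Assumption: TIMESTAMP is RFC 3339; requires GNU date for the -d option.
seconds_until() {
  local ts="$1" now="${2:-$(date -u +%s)}" end
  end="$(date -u -d "$ts" +%s)" || return 1
  echo $(( end - now ))
}

# Format a number of seconds in the same d/h/m/s style as --max-run-duration.
# Negative values (termination already passed) are shown as zero.
format_remaining() {
  local s="$1"
  if (( s < 0 )); then
    s=0
  fi
  printf '%dd%dh%dm%ds\n' \
    $(( s / 86400 )) $(( s % 86400 / 3600 )) $(( s % 3600 / 60 )) $(( s % 60 ))
}

# Example usage:
#   format_remaining "$(seconds_until 2025-01-08T00:00:00Z)"
```

The optional second argument of seconds_until lets you supply a fixed reference time, which is useful when testing a script.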

Delete a queued resource

To delete a queued resource request together with the TPUs associated with it, pass the --force flag to the gcloud alpha compute tpus queued-resources delete command:

gcloud alpha compute tpus queued-resources delete QUEUED_RESOURCE_ID \
    --zone ZONE \
    --force

If you delete the TPU directly using the gcloud compute tpus tpu-vm delete command, you also need to delete the queued resource, as shown in the following example. When you delete the TPU, the queued resource request transitions to the SUSPENDED state, after which you can delete the queued resource request.

To delete a TPU, use the gcloud compute tpus tpu-vm delete command:

gcloud compute tpus tpu-vm delete NODE_ID \
    --zone ZONE

Then, to delete the queued resource, use the gcloud alpha compute tpus queued-resources delete command:

gcloud alpha compute tpus queued-resources delete QUEUED_RESOURCE_ID \
    --zone ZONE

For more information, see Delete a queued resource request.