Request Cloud TPUs using Flex-start
Flex-start for Cloud TPU, powered by Dynamic Workload Scheduler, provides a flexible and cost-effective way to access TPU resources for AI workloads. Flex-start lets you dynamically provision TPUs as needed, for up to 7 days, without long-term reservations or complex quota management. With Flex-start, you submit a TPU provisioning request that persists until capacity becomes available. Once capacity is available, Flex-start provisions the TPU VMs, which run for the duration that you specified in your request.
Flex-start is a good fit for quick experimentation, small-scale testing, dynamic provisioning of TPUs for inference workloads, model fine-tuning, and workload runs that take less than 7 days. For more information about other TPU consumption options, see Cloud TPU consumption options.
You can delete your TPU resources at any time to stop billing. For more information about TPU pricing, see Cloud TPU pricing.
Limitations
Flex-start Cloud TPUs have the following limitations:
- You can request Flex-start resources for a duration of up to 7 days.
- You can only request Flex-start TPU v6e and v5e in the specified zones:
- You must use the queued resources API to use Flex-start with Cloud TPU.
Before you begin
Before requesting Flex-start TPUs, you must:
- Install the Google Cloud CLI
- Create a Google Cloud project
- Enable the Cloud TPU API
For more information, see Set up the Cloud TPU environment.
You should also ensure you have sufficient preemptible quota to use Flex-start. The default preemptible quota for TPU v5e and v6e is 64 cores. If you need more TPU cores than the amount granted by the default quota, you need to request a higher quota allocation. For more information, see Cloud TPU quotas.
Request Flex-start TPUs
Flex-start uses the TPU queued resources API to queue requests for TPU resources. When the requested resource becomes available, it's assigned to your Google Cloud project for your immediate, exclusive use. After the requested run duration, the TPU VMs are deleted and the queued resource moves to the SUSPENDED state. For more information about queued resources, see Manage queued resources.
To request Flex-start TPUs, use the gcloud alpha compute tpus queued-resources create command with the --provisioning-model flag set to flex-start and the --max-run-duration flag set to the duration you want your TPUs to run:
gcloud alpha compute tpus queued-resources create QUEUED_RESOURCE_ID \
    --zone=ZONE \
    --accelerator-type=ACCELERATOR_TYPE \
    --runtime-version=RUNTIME_VERSION \
    --node-id=NODE_ID \
    --provisioning-model=flex-start \
    --max-run-duration=RUN_DURATION
Replace the following placeholders:
- QUEUED_RESOURCE_ID: A user-assigned ID for the queued resource request.
- ZONE: The zone in which to create the TPU VM.
- ACCELERATOR_TYPE: Specifies the version and size of the Cloud TPU to create. For more information about supported accelerator types for each TPU version, see TPU versions.
- RUNTIME_VERSION: The Cloud TPU software version.
- NODE_ID: A user-assigned ID for the TPU that is created when the queued resource request is allocated.
- RUN_DURATION: How long the TPUs should run. Format the duration as the number of days, hours, minutes, and seconds followed by d, h, m, and s, respectively. For example, specify 72h for a duration of 72 hours, or specify 1d2h3m4s for a duration of 1 day, 2 hours, 3 minutes, and 4 seconds. The maximum is 7 days.
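The duration format described for RUN_DURATION can be checked locally before you submit a request. The following is a minimal bash sketch, not part of gcloud: duration_to_seconds is a hypothetical helper name, and the sketch assumes that units appear in d, h, m, s order and that a zero-length duration is not useful. It converts a duration string to seconds and rejects values above the 7-day maximum.

```shell
#!/usr/bin/env bash
# duration_to_seconds is a hypothetical helper, not part of gcloud.
# It converts a duration such as 1d2h3m4s into seconds and rejects
# anything longer than the 7-day Flex-start maximum.
duration_to_seconds() {
  local input="$1"
  # Assumption: units appear in d, h, m, s order; each one is optional.
  local pattern='^(([0-9]+)d)?(([0-9]+)h)?(([0-9]+)m)?(([0-9]+)s)?$'
  if [[ -z "$input" || ! "$input" =~ $pattern ]]; then
    echo "invalid duration: $input" >&2
    return 1
  fi
  # 10# forces base 10 so that values like 08 are not read as octal.
  local days=$((10#${BASH_REMATCH[2]:-0}))
  local hours=$((10#${BASH_REMATCH[4]:-0}))
  local minutes=$((10#${BASH_REMATCH[6]:-0}))
  local seconds=$((10#${BASH_REMATCH[8]:-0}))
  local total=$((days * 86400 + hours * 3600 + minutes * 60 + seconds))
  local max=$((7 * 86400))
  # Assumption: a zero-length run is treated as an error in this sketch.
  if ((total == 0 || total > max)); then
    echo "duration must be between 1 second and 7 days: $input" >&2
    return 1
  fi
  echo "$total"
}
```

For example, duration_to_seconds 72h prints 259200 and duration_to_seconds 1d2h3m4s prints 93784, while duration_to_seconds 8d prints an error and returns a non-zero status.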
You can further customize your queued resource request to run at specific times with additional flags:
- --valid-after-duration: The duration before which the TPU must not be provisioned.
- --valid-after-time: The time before which the TPU must not be provisioned.
- --valid-until-duration: The duration for which the request is valid. If the request hasn't been fulfilled by this duration, the request expires and moves to the FAILED state.
- --valid-until-time: The time until which the request is valid. If the request hasn't been fulfilled by this time, the request expires and moves to the FAILED state.
For more information about optional flags, see the gcloud alpha compute tpus queued-resources create documentation.
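The time-based flags take an absolute timestamp rather than a duration. As a rough sketch, the hypothetical helper below (rfc3339_after, not part of gcloud) builds a UTC timestamp a given number of seconds after a base time. It assumes GNU date and assumes that gcloud accepts an RFC 3339 style timestamp for --valid-until-time and --valid-after-time; run gcloud topic datetimes to confirm the accepted formats.

```shell
#!/usr/bin/env bash
# rfc3339_after is a hypothetical helper, not part of gcloud.
# It prints the UTC timestamp that lies offset_seconds after base_epoch.
# Requires GNU date, which accepts "@<epoch>" with the -d option.
rfc3339_after() {
  local base_epoch="$1" offset_seconds="$2"
  date -u -d "@$((base_epoch + offset_seconds))" +%Y-%m-%dT%H:%M:%SZ
}

# Example (not run here): add this flag to the create command to let the
# request expire if it has not been fulfilled within 6 hours from now.
#   --valid-until-time="$(rfc3339_after "$(date +%s)" 21600)"
```

If a relative limit is all you need, --valid-until-duration with a value such as 6h avoids the timestamp arithmetic entirely.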
Get the status of a Flex-start request
To monitor the status of your Flex-start request, use the gcloud alpha compute tpus queued-resources describe command:
gcloud alpha compute tpus queued-resources describe QUEUED_RESOURCE_ID \
    --zone ZONE
A queued resource can be in one of the following states:
- WAITING_FOR_RESOURCES: The request has passed initial validation and has been added to the queue.
- PROVISIONING: The request has been selected from the queue, and the TPU VMs are being created.
- ACTIVE: The request has been fulfilled, and the TPU VMs are ready.
- FAILED: The request couldn't be completed. Use the describe command for more details.
- SUSPENDING: The resources associated with the request are being deleted.
- SUSPENDED: The resources associated with the request have been deleted.
For more information, see Retrieve state and diagnostic information about a queued resource request.
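Because a Flex-start request can wait in the queue for some time, it can be convenient to poll the state from a script. The following bash sketch is one possible approach, not an official tool: get_state, wait_for_active, and POLL_SECONDS are hypothetical names, and the --format='value(state.state)' projection assumes that the describe output nests the state under state.state. Verify the field path against your own describe output before relying on it.

```shell
#!/usr/bin/env bash
# get_state prints the current state of a queued resource.
# Assumption: the describe output exposes the state at state.state.
get_state() {
  gcloud alpha compute tpus queued-resources describe "$1" \
    --zone "$2" --format='value(state.state)'
}

# wait_for_active polls until the request is ACTIVE (returns 0) or has
# reached FAILED, SUSPENDING, or SUSPENDED (returns 1). It returns 2 if
# the state can't be read. Any other state keeps the loop waiting.
# POLL_SECONDS is a hypothetical knob; it defaults to 30 seconds.
wait_for_active() {
  local id="$1" zone="$2" state
  while true; do
    state="$(get_state "$id" "$zone")" || return 2
    echo "state: $state" >&2
    case "$state" in
      ACTIVE) return 0 ;;
      FAILED | SUSPENDING | SUSPENDED) return 1 ;;
    esac
    sleep "${POLL_SECONDS:-30}"
  done
}
```

A typical call is wait_for_active QUEUED_RESOURCE_ID ZONE, followed by a check of the exit status before you start your workload. The loop has no timeout of its own, so consider pairing it with --valid-until-duration so that the request eventually moves to FAILED if capacity never becomes available.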
Monitor the run time of Flex-start TPUs
You can monitor the run time of Flex-start TPUs by checking the TPU's termination timestamp:
- Get the details of your queued resource request.
- Choose one of the following options depending on whether your TPUs have been created:
  - If the queued resource is waiting for resources: In the output, see the maxRunDuration field. This field specifies how long the TPUs will run once they're created.
  - If the TPUs associated with the queued resource have been created: In the output, see the terminationTimestamp field listed for each node in the queued resource. This field specifies when the TPU will be terminated.
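Once you have a terminationTimestamp value, the remaining run time is the difference between that timestamp and the current time. The bash sketch below illustrates the arithmetic. seconds_until and format_remaining are hypothetical helpers, the timestamp is assumed to be in RFC 3339 form (for example 2025-01-01T00:00:00Z), and parsing relies on GNU date. Copy the timestamp from the describe output yourself; the sketch doesn't extract it.

```shell
#!/usr/bin/env bash
# seconds_until prints the number of seconds from now_epoch (default: the
# current time) to an RFC 3339 timestamp. Requires GNU date for -d.
seconds_until() {
  local timestamp="$1" now_epoch="${2:-$(date +%s)}"
  local end_epoch
  end_epoch="$(date -u -d "$timestamp" +%s)" || return 1
  echo $((end_epoch - now_epoch))
}

# format_remaining renders a second count in the same d/h/m/s style that
# --max-run-duration uses, for example 1d2h3m4s.
format_remaining() {
  local s="$1"
  if ((s <= 0)); then
    echo "0s"
    return
  fi
  printf '%dd%dh%dm%ds\n' \
    $((s / 86400)) $((s % 86400 / 3600)) $((s % 3600 / 60)) $((s % 60))
}
```

For example, format_remaining "$(seconds_until TERMINATION_TIMESTAMP)" prints the time left before the TPU is terminated, where TERMINATION_TIMESTAMP stands for the value you copied from the output.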
Delete a queued resource
To delete a queued resource request and the TPUs associated with it, pass the --force flag to the queued-resources delete command:
gcloud alpha compute tpus queued-resources delete QUEUED_RESOURCE_ID \
    --force
If you delete the TPU directly using the gcloud compute tpus tpu-vm delete command, you also need to delete the queued resource, as shown in the following example. When you delete the TPU, the queued resource request transitions to the SUSPENDED state, after which you can delete the queued resource request.
To delete a TPU, use the gcloud compute tpus tpu-vm delete command:
gcloud compute tpus tpu-vm delete NODE_ID \
    --zone ZONE
Then, to delete the queued resource, use the gcloud alpha compute tpus queued-resources delete command:
gcloud alpha compute tpus queued-resources delete QUEUED_RESOURCE_ID \
    --zone ZONE
For more information, see Delete a queued resource request.