[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Translation issue","translationIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated (UTC): 2025-08-19."],[],[],null,["# Request Cloud TPUs using Flex-start\n===================================\n\n|\n| **Preview**\n|\n|\n| This product or feature is subject to the \"Pre-GA Offerings Terms\" in the General Service Terms section\n| of the [Service Specific Terms](/terms/service-terms#1).\n|\n| Pre-GA products and features are available \"as is\" and might have limited support.\n|\n| For more information, see the\n| [launch stage descriptions](/products#product-launch-stages).\n\nFlex-start for Cloud TPU, powered by\n[Dynamic Workload Scheduler](https://cloud.google.com/blog/products/compute/introducing-dynamic-workload-scheduler),\nprovides a flexible and cost-effective way to access TPU resources for AI\nworkloads. Flex-start lets you dynamically provision TPUs as needed,\nfor up to 7 days, without long-term reservations or complex quota management.\nWith Flex-start, you submit a TPU provisioning request that persists\nuntil capacity becomes available. When capacity is available, Flex-start provisions\nthe TPU VMs to run for the duration that you specified in your request.\n\nFlex-start is a good fit for quick experimentation, small-scale\ntesting, dynamic provisioning of TPUs for inference workloads, model\nfine-tuning, and workload runs that take less than 7 days. For more information\nabout other TPU consumption options, see [Cloud TPU consumption\noptions](/tpu/docs/consumption-options).\n\nYou can delete your TPU resources at any time to stop billing. 
For more\ninformation about TPU pricing, see [Cloud TPU\npricing](/tpu/pricing).\n\nLimitations\n-----------\n\nFlex-start Cloud TPUs have the following limitations:\n\n- You can request Flex-start resources for a duration of up to 7 days.\n- Flex-start supports the following Cloud TPU versions and zones:\n - **[TPU v6e](/tpu/docs/v6e)**: asia-northeast1-b, us-east5-a\n - **[TPU v5p](/tpu/docs/v5p)**: us-east5-a\n - **[TPU v5e](/tpu/docs/v5e)**: us-west4-a\n- You must use the [queued resources API](/tpu/docs/queued-resources) to use Flex-start with Cloud TPU.\n\nBefore you begin\n----------------\n\nBefore requesting Flex-start TPUs, you must:\n\n- Install the Google Cloud CLI\n- Create a Google Cloud project\n- Enable the Cloud TPU API\n\nFor more information, see [Set up the Cloud TPU\nenvironment](/tpu/docs/setup-gcp-account).\n\nYou should also make sure you have sufficient preemptible quota to use\nFlex-start. If you need more TPU cores than the amount granted by the\ndefault quota, you need to request a higher quota allocation. For more\ninformation about defaults and requesting more quota, see [Cloud TPU\nquotas](/tpu/docs/quota).\n\nRequest Flex-start TPUs\n-----------------------\n\nFlex-start uses the TPU queued resources API to request TPU resources in a\nqueued manner. When the requested resource becomes available, it's assigned to\nyour Google Cloud project for your immediate, exclusive use. After the requested\nrun duration, the TPU VMs are deleted and the queued resource moves to the\n`SUSPENDED` state. For more information about queued resources, see [Manage\nqueued resources](/tpu/docs/queued-resources).\n\nTo request Flex-start TPUs, use the [`gcloud alpha compute tpus queued-resources\ncreate`](/sdk/gcloud/reference/alpha/compute/tpus/queued-resources/create)\ncommand with the `--provisioning-model` flag set to `flex-start` and the\n`--max-run-duration` flag set to the duration you want your TPUs to run. 
\n\n```bash\ngcloud alpha compute tpus queued-resources create QUEUED_RESOURCE_ID \\\n --zone=ZONE \\\n --accelerator-type=ACCELERATOR_TYPE \\\n --runtime-version=RUNTIME_VERSION \\\n --node-id=NODE_ID \\\n --provisioning-model=flex-start \\\n --max-run-duration=RUN_DURATION\n```\n\nReplace the following placeholders:\n\n- \u003cvar translate=\"no\"\u003eQUEUED_RESOURCE_ID\u003c/var\u003e: A user-assigned ID for the queued resource request.\n- \u003cvar translate=\"no\"\u003eZONE\u003c/var\u003e: The [zone](/tpu/docs/regions-zones) in which to create the TPU VM.\n- \u003cvar translate=\"no\"\u003eACCELERATOR_TYPE\u003c/var\u003e: Specifies the version and size of the Cloud TPU to create. For more information about supported accelerator types for each TPU version, see [TPU\n versions](/tpu/docs/system-architecture-tpu-vm#versions).\n- \u003cvar translate=\"no\"\u003eRUNTIME_VERSION\u003c/var\u003e: The Cloud TPU [software\n version](/tpu/docs/runtimes).\n- \u003cvar translate=\"no\"\u003eNODE_ID\u003c/var\u003e: A user-assigned ID for the TPU that is created when the queued resource request is allocated.\n- \u003cvar translate=\"no\"\u003eRUN_DURATION\u003c/var\u003e: How long the TPUs should run. Format the duration as the number of days, hours, minutes, and seconds followed by `d`, `h`, `m`, and `s`, respectively. For example, specify `72h` for a duration of 72 hours, or specify `1d2h3m4s` for a duration of 1 day, 2 hours, 3 minutes, and 4 seconds. The maximum is 7 days.\n\nYou can further customize your queued resource request to run at specific times\nwith additional flags:\n\n- `--valid-after-duration`: The duration before which the TPU must not be provisioned.\n- `--valid-after-time`: The time before which the TPU must not be provisioned.\n- `--valid-until-duration`: The duration for which the request is valid. 
If the request hasn't been fulfilled within this duration, the request expires and moves to the `FAILED` state.\n- `--valid-until-time`: The time until which the request is valid. If the request hasn't been fulfilled by this time, the request expires and moves to the `FAILED` state.\n\nFor more information about optional flags, see the\n[`gcloud alpha compute tpus queued-resources\ncreate`](/sdk/gcloud/reference/alpha/compute/tpus/queued-resources/create) documentation.\n\nGet the status of a Flex-start request\n--------------------------------------\n\nTo monitor the status of your Flex-start request, use the\n[`gcloud alpha compute tpus queued-resources describe`](/sdk/gcloud/reference/alpha/compute/tpus/queued-resources/describe)\ncommand: \n\n```bash\ngcloud alpha compute tpus queued-resources describe QUEUED_RESOURCE_ID \\\n --zone ZONE\n```\n\nA queued resource can be in one of the following states:\n\n- `WAITING_FOR_RESOURCES`: The request has passed initial validation and has been added to the queue.\n- `PROVISIONING`: The request has been selected from the queue, and the TPU VMs are being created.\n- `ACTIVE`: The request has been fulfilled, and the TPU VMs are ready.\n- `FAILED`: The request couldn't be completed. Use the `describe` command for more details.\n- `SUSPENDING`: The resources associated with the request are being deleted.\n- `SUSPENDED`: The resources associated with the request have been deleted.\n\nFor more information, see [Retrieve state and diagnostic information about a\nqueued resource\nrequest](/tpu/docs/queued-resources#retrieve_state_and_diagnostic_information_about_a_queued_resource_request).\n\nMonitor the run time of Flex-start TPUs\n---------------------------------------\n\nYou can monitor the run time of Flex-start TPUs by checking the TPU's\ntermination timestamp:\n\n1. [Get the details of your queued resource request](#get-status).\n2. 
Choose one of the following options depending on whether your TPUs have been\n created:\n\n - **If the queued resource is waiting for resources**: In the output, see the\n `maxRunDuration` field. This field specifies how long the TPUs will run once\n they're created.\n\n - **If the TPUs associated with the queued resource have been created**: In\n the output, see the `terminationTimestamp` field listed for each node in the\n queued resource. This field specifies when the TPU will be terminated.\n\nDelete a queued resource\n------------------------\n\n| **Important:** Queued resources consume quota regardless of their state. Delete queued resources after use to avoid blocking future requests on quota limits.\n\nYou can delete a queued resource request and the TPUs associated with the\nrequest by passing the `--force` flag\nto the [`queued-resources\ndelete`](/sdk/gcloud/reference/alpha/compute/tpus/queued-resources/delete)\ncommand: \n\n```bash\ngcloud alpha compute tpus queued-resources delete QUEUED_RESOURCE_ID \\\n --force\n```\n\nIf you delete the TPU directly using the `gcloud compute tpus tpu-vm delete` command,\nyou also need to delete the queued resource, as shown in the following example.\nWhen you delete the TPU, the queued resource request transitions to the\n`SUSPENDED` state, after which you can delete the queued resource request.\n\nTo delete a TPU, use the [`gcloud compute tpus tpu-vm\ndelete`](/sdk/gcloud/reference/compute/tpus/tpu-vm/delete) command: \n\n```bash\ngcloud compute tpus tpu-vm delete NODE_ID \\\n --zone ZONE\n```\n\nThen, to delete the queued resource, use the\n[`gcloud alpha compute tpus queued-resources delete`](/sdk/gcloud/reference/alpha/compute/tpus/queued-resources/delete)\ncommand: \n\n```bash\ngcloud alpha compute tpus queued-resources delete QUEUED_RESOURCE_ID \\\n --zone ZONE\n```\n\nFor more information, see [Delete a queued 
resource\nrequest](/tpu/docs/queued-resources#delete_a_queued_resource_request)."]]