This page explains how to run an Apache Beam pipeline on Dataflow with TPUs. Jobs that use TPUs incur charges as specified in the Dataflow pricing page.
For more information about using TPUs with Dataflow, see Dataflow support for TPUs.
Optional: Make a specific reservation to use accelerators
While you can use TPUs on-demand, we strongly recommend that you use Dataflow TPUs with a specifically targeted Google Cloud reservation. A reservation helps ensure that you have access to available accelerators and that workers start quickly. Pipelines that consume a TPU reservation don't require additional TPU quota.
If you don't make a reservation and choose to use TPUs on-demand, provision TPU quota before you run your pipeline.
Optional: Provision TPU quota
You can use TPUs on-demand or by using a reservation. If you want to use TPUs on-demand, you must provision TPU quota first. If you use a specifically targeted reservation, you can skip this section.
To use TPUs on-demand without a reservation, check the limit and current usage of your Compute Engine API quota for TPUs as follows:
Console
Go to the Quotas page in the Google Cloud console.
In the Filter box, do the following:
- Select and copy the property of the quota based on the TPU version and machine type, using the table after this list. For example, if you plan to create on-demand TPU v5e nodes whose machine type begins with ct5lp-, enter Name: TPU v5 Lite PodSlice chips.
- Select the Dimensions (e.g. location) property and enter region: followed by the name of the region in which you plan to start your pipeline. For example, enter region:us-west4 if you plan to use the zone us-west4-a. TPU quota is regional, so all zones within the same region consume the same TPU quota.
TPU version, machine type begins with | Property and name of the quota for on-demand instances |
---|---|
TPU v5e, ct5lp- | Name: TPU v5 Lite PodSlice chips |
TPU v5p, ct5p- | Name: TPU v5p chips |
TPU v6e, ct6e- | Dimensions (e.g. location): tpu_family:CT6E |
Configure a custom container image
To interact with TPUs in Dataflow pipelines, you need to provide software that can operate on XLA devices in your pipeline runtime environment. This requires installing TPU libraries based on your pipeline needs and configuring environment variables based on the TPU device you use.
To customize the container image, install Apache Beam into an off-the-shelf base image that has the necessary TPU libraries. Alternatively, install the TPU software into the images published with Apache Beam SDK releases.
To provide a custom container image, use the sdk_container_image pipeline option. For more information, see Use custom containers in Dataflow.
When you use a TPU accelerator, you need to set the following environment variables in the container image.
ENV TPU_SKIP_MDS_QUERY=1 # Don't query metadata
ENV TPU_HOST_BOUNDS=1,1,1 # There's only one host
ENV TPU_WORKER_HOSTNAMES=localhost
ENV TPU_WORKER_ID=0 # Always 0 for single-host TPUs
Depending on the accelerator you use, the variables in the following table also need to be set.
TPU type | topology | Required Dataflow worker_machine_type | Additional environment variables |
---|---|---|---|
tpu-v5-lite-podslice | 1x1 | ct5lp-hightpu-1t | TPU_ACCELERATOR_TYPE=v5litepod-1 |
tpu-v5-lite-podslice | 2x2 | ct5lp-hightpu-4t | TPU_ACCELERATOR_TYPE=v5litepod-4 |
tpu-v5-lite-podslice | 2x4 | ct5lp-hightpu-8t | TPU_ACCELERATOR_TYPE=v5litepod-8 |
tpu-v6e-slice | 1x1 | ct6e-standard-1t | TPU_ACCELERATOR_TYPE=v6e-1 |
tpu-v6e-slice | 2x2 | ct6e-standard-4t | TPU_ACCELERATOR_TYPE=v6e-4 |
tpu-v6e-slice | 2x4 | ct6e-standard-8t | TPU_ACCELERATOR_TYPE=v6e-8 |
tpu-v5p-slice | 2x2x1 | ct5p-hightpu-4t | TPU_ACCELERATOR_TYPE=v5p-8 |
A sample Dockerfile for the custom container image might look like the following example:
FROM python:3.11-slim
COPY --from=apache/beam_python3.11_sdk:2.66.0 /opt/apache/beam /opt/apache/beam
# Configure the environment to access the TPU device.
ENV TPU_SKIP_MDS_QUERY=1
ENV TPU_HOST_BOUNDS=1,1,1
ENV TPU_WORKER_HOSTNAMES=localhost
ENV TPU_WORKER_ID=0
# Configure the environment for the chosen accelerator.
# Adjust according to the accelerator you use.
ENV TPU_ACCELERATOR_TYPE=v5litepod-1
ENV TPU_CHIPS_PER_HOST_BOUNDS=1,1,1
# Install TPU software stack.
RUN pip install jax[tpu] apache-beam[gcp]==2.66.0 -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
ENTRYPOINT ["/opt/apache/beam/boot"]
Run your job with TPUs
The considerations for running a Dataflow job with TPUs include the following:
- Because TPU containers can be large, to avoid running out of disk space, increase the default boot disk size to 50 gigabytes, or to an appropriate size as required by your container image, by using the --disk_size_gb pipeline option.
- Limit intra-worker parallelism, as described in the following section.
TPUs and worker parallelism
In the default configuration, Dataflow Python pipelines launch one Apache Beam SDK process per VM core. TPU machine types have a large number of vCPU cores, but only one process may perform computations on a TPU device. Additionally, a TPU device might be reserved by a process for the lifetime of the process. Therefore, you must limit intra-worker parallelism when running a Dataflow TPU pipeline. To limit worker parallelism, use the following guidance:
- If your use case involves running inference on a model, use the Beam RunInference API. For more information, see Large Language Model Inference in Beam. A minimal sketch appears after the table that follows this list.
- If you cannot use the Beam RunInference API, use Beam's multi-process shared objects to restrict certain operations to a single process.
- If you cannot use the preceding recommendations and prefer to launch only one Python process per worker, set the --experiments=no_use_multiple_sdk_containers pipeline option.
- For workers with more than 100 vCPUs, reduce the number of threads by using the --number_of_worker_harness_threads pipeline option. Use the following table to see whether your TPU type uses more than 100 vCPUs.
The following table lists the total compute resources per worker for each TPU configuration.
TPU type | topology | machine type | TPU chips | vCPU | RAM (GB) |
---|---|---|---|---|---|
tpu-v5-lite-podslice | 1x1 | ct5lp-hightpu-1t | 1 | 24 | 48 |
tpu-v5-lite-podslice | 2x2 | ct5lp-hightpu-4t | 4 | 112 | 192 |
tpu-v5-lite-podslice | 2x4 | ct5lp-hightpu-8t | 8 | 224 | 384 |
tpu-v6e-slice | 1x1 | ct6e-standard-1t | 1 | 44 | 176 |
tpu-v6e-slice | 2x2 | ct6e-standard-4t | 4 | 180 | 720 |
tpu-v6e-slice | 2x4 | ct6e-standard-8t | 8 | 360 | 1440 |
tpu-v5p-slice | 2x2x1 | ct5p-hightpu-4t | 4 | 208 | 448 |
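If you use the RunInference approach from the preceding list, the TPU work stays in a single transform and the model is loaded once per process. The following is a minimal sketch rather than a complete recipe: it assumes the JAX-based custom container from the earlier Dockerfile, and the JaxModelHandler class, the jitted toy function, and the dummy inputs are illustrative stand-ins, not part of the Beam API.
import apache_beam as beam
import jax
import jax.numpy as jnp
import numpy as np
from apache_beam.ml.inference.base import ModelHandler, PredictionResult, RunInference
from apache_beam.options.pipeline_options import PipelineOptions

class JaxModelHandler(ModelHandler):
    # Illustrative handler: load_model runs once per process, so the TPU is
    # initialized a single time and reused for every batch.
    def load_model(self):
        # Replace this jitted toy function with your real model. jax.jit
        # compiles for the default backend, which is the TPU inside the
        # custom container.
        return jax.jit(lambda x: jnp.dot(x, x))

    def run_inference(self, batch, model, inference_args=None):
        return [PredictionResult(example, model(example)) for example in batch]

def run(argv=None):
    options = PipelineOptions(argv)
    with beam.Pipeline(options=options) as pipeline:
        _ = (
            pipeline
            | beam.Create([np.ones((8, 8), dtype=np.float32) for _ in range(4)])
            | RunInference(JaxModelHandler())
            | beam.Map(print)
        )

if __name__ == '__main__':
    run()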
Run a pipeline with TPUs
To run a Dataflow job with TPUs, use the following command.
python PIPELINE \
--runner "DataflowRunner" \
--project "PROJECT" \
--temp_location "gs://BUCKET/tmp" \
--region "REGION" \
--dataflow_service_options "worker_accelerator=type:TPU_TYPE;topology:TPU_TOPOLOGY" \
--worker_machine_type "MACHINE_TYPE" \
--disk_size_gb "DISK_SIZE_GB" \
--sdk_container_image "IMAGE" \
--number_of_worker_harness_threads NUMBER_OF_THREADS
Replace the following:
- PIPELINE: Your pipeline source code file.
- PROJECT: The Google Cloud project name.
- BUCKET: The Cloud Storage bucket.
- REGION: A Dataflow region, for example, us-central1.
- TPU_TYPE: A supported TPU type, for example, tpu-v5-lite-podslice. For a full list of types and topologies, see Supported TPU accelerators.
- TPU_TOPOLOGY: The TPU topology, for example, 1x1.
- MACHINE_TYPE: The corresponding machine type, for example, ct5lp-hightpu-1t.
- DISK_SIZE_GB: The size of the boot disk for each worker VM, for example, 100.
- IMAGE: The Artifact Registry path for your Docker image.
- NUMBER_OF_THREADS: Optional. The number of worker harness threads.
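If you prefer to configure the job in code instead of on the command line, you can pass the same flags to PipelineOptions. The following is a minimal sketch that reuses the placeholder values from the command above; replace them as described in the preceding list.
from apache_beam.options.pipeline_options import PipelineOptions

# The placeholder values (PROJECT, BUCKET, and so on) are the same as in the
# command above; replace them before running.
options = PipelineOptions([
    "--runner=DataflowRunner",
    "--project=PROJECT",
    "--temp_location=gs://BUCKET/tmp",
    "--region=REGION",
    "--dataflow_service_options=worker_accelerator=type:TPU_TYPE;topology:TPU_TOPOLOGY",
    "--worker_machine_type=MACHINE_TYPE",
    "--disk_size_gb=DISK_SIZE_GB",
    "--sdk_container_image=IMAGE",
    "--number_of_worker_harness_threads=NUMBER_OF_THREADS",
])
You can then pass these options to beam.Pipeline(options=options), as in the RunInference sketch shown earlier.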
Verify your Dataflow job
To confirm that the job uses worker VMs with TPUs, follow these steps:
In the Google Cloud console, go to the Dataflow > Jobs page.
Select a job.
Click the Job metrics tab.
In the Autoscaling section, confirm that there's at least one Current workers VM.
In the side Job info pane, check that the machine_type starts with ct, for example, ct6e-standard-1t. This indicates TPU usage.
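As an additional sanity check, you can log the devices that the TPU runtime exposes from inside the pipeline and read the output in the worker logs. The following is a minimal sketch, assuming the JAX-based container shown earlier; the log_tpu_devices name is illustrative.
import logging

def log_tpu_devices(element):
    # Import JAX on the worker, where libtpu and the TPU software stack are installed.
    import jax
    logging.info("JAX devices visible on this worker: %s", jax.devices())
    return element

# Example usage inside a pipeline:
#     ... | beam.Map(log_tpu_devices) | ...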
Troubleshoot your Dataflow job
If you run into problems running your Dataflow job with TPUs, see Troubleshoot your Dataflow TPU job.
What's next
- Learn more about TPU support on Dataflow.
- Learn about Large model inference in Beam.