Run TensorFlow code on TPU Pod slices

This document shows you how to perform a simple calculation using TensorFlow on a TPU Pod. You will perform the following steps:

  1. Create a TPU Pod slice with TensorFlow software
  2. Connect to the TPU VM using SSH
  3. Create and run a simple script

The TPU VM relies on a Service Accounts for permissions to call Cloud TPU API. By default, your TPU VM will use the default Compute Engine service account which includes all needed Cloud TPU permissions. If you use your own service account you need to add the TPU Viewer role to your service account. For more information on Google Cloud roles, see Understanding roles. You can specify your own service account using the --service-account flag when creating your TPU VM.

Create a v3-32 TPU Pod slice with TensorFlow runtime

$ gcloud compute tpus tpu-vm create tpu-name \
  --zone=europe-west4-a \
  --accelerator-type=v3-32 \
  --version=tpu-vm-tf-2.16.1-pod-pjrt

Command flag descriptions

zone
The zone where you plan to create your Cloud TPU.
accelerator-type
The accelerator type specifies the version and size of the Cloud TPU you want to create. For more information about supported accelerator types for each TPU version, see TPU versions.
version
The Cloud TPU software version.

Connect to your Cloud TPU VM using SSH

$ gcloud compute tpus tpu-vm ssh tpu-name \
      --zone europe-west4-a

Create and run a simple calculation script

  1. Set the following environment variables.

    (vm)$ export TPU_NAME=tpu-name
    (vm)$ export TPU_LOAD_LIBRARY=0
    
  2. Create a file named tpu-test.pyin the current directory and copy and paste the following script into it.

    import tensorflow as tf
    print("Tensorflow version " + tf.__version__)
    
    cluster_resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
    print('Running on TPU ', cluster_resolver.cluster_spec().as_dict()['worker'])
    
    tf.config.experimental_connect_to_cluster(cluster_resolver)
    tf.tpu.experimental.initialize_tpu_system(cluster_resolver)
    strategy = tf.distribute.experimental.TPUStrategy(cluster_resolver)
    
    @tf.function
    def add_fn(x,y):
    z = x + y
    return z
    
    x = tf.constant(1.)
    y = tf.constant(1.)
    z = strategy.run(add_fn, args=(x,y))
    print(z)
    
  3. Run this script with the following command:

    (vm)$ python3 tpu-test.py

    This script performs a simple computation on a each TensorCore of a TPU. The output will look similar to the following:

    PerReplica:{
    0: tf.Tensor(2.0, shape=(), dtype=float32),
    1: tf.Tensor(2.0, shape=(), dtype=float32),
    2: tf.Tensor(2.0, shape=(), dtype=float32),
    3: tf.Tensor(2.0, shape=(), dtype=float32),
    4: tf.Tensor(2.0, shape=(), dtype=float32),
    5: tf.Tensor(2.0, shape=(), dtype=float32),
    6: tf.Tensor(2.0, shape=(), dtype=float32),
    7: tf.Tensor(2.0, shape=(), dtype=float32),
    8: tf.Tensor(2.0, shape=(), dtype=float32),
    9: tf.Tensor(2.0, shape=(), dtype=float32),
    10: tf.Tensor(2.0, shape=(), dtype=float32),
    11: tf.Tensor(2.0, shape=(), dtype=float32),
    12: tf.Tensor(2.0, shape=(), dtype=float32),
    13: tf.Tensor(2.0, shape=(), dtype=float32),
    14: tf.Tensor(2.0, shape=(), dtype=float32),
    15: tf.Tensor(2.0, shape=(), dtype=float32),
    16: tf.Tensor(2.0, shape=(), dtype=float32),
    17: tf.Tensor(2.0, shape=(), dtype=float32),
    18: tf.Tensor(2.0, shape=(), dtype=float32),
    19: tf.Tensor(2.0, shape=(), dtype=float32),
    20: tf.Tensor(2.0, shape=(), dtype=float32),
    21: tf.Tensor(2.0, shape=(), dtype=float32),
    22: tf.Tensor(2.0, shape=(), dtype=float32),
    23: tf.Tensor(2.0, shape=(), dtype=float32),
    24: tf.Tensor(2.0, shape=(), dtype=float32),
    25: tf.Tensor(2.0, shape=(), dtype=float32),
    26: tf.Tensor(2.0, shape=(), dtype=float32),
    27: tf.Tensor(2.0, shape=(), dtype=float32),
    28: tf.Tensor(2.0, shape=(), dtype=float32),
    29: tf.Tensor(2.0, shape=(), dtype=float32),
    30: tf.Tensor(2.0, shape=(), dtype=float32),
    31: tf.Tensor(2.0, shape=(), dtype=float32)
    }
    

Clean up

When you are done with your TPU VM follow these steps to clean up your resources.

  1. Disconnect from the Compute Engine:

    (vm)$ exit
    
  2. Delete your Cloud TPU.

    $ gcloud compute tpus tpu-vm delete tpu-name \
      --zone europe-west4-a
    
  3. Verify the resources have been deleted by running the following command. Make sure your TPU is no longer listed. The deletion might take several minutes.

    $ gcloud compute tpus tpu-vm list \
      --zone europe-west4-a