You can use the Accelerated Processing Kit (XPK)
to create pre-configured Google Kubernetes Engine (GKE) clusters for
Pathways-based workloads. You can also use gcloud to manually create
GKE clusters for Pathways-based workloads.
Before you begin
Make sure you have:
- Installed Kubernetes tools
- Installed XPK
- Enabled the TPU API
- Enabled the Google Kubernetes Engine API
- Confirmed that your Google Cloud project is allowlisted for Pathways
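The tool prerequisites above can be checked from a shell before you continue. The following is a minimal sketch rather than part of the official setup: `missing_tools` is a hypothetical helper name, and it assumes the three command-line tools are invoked as `gcloud`, `kubectl`, and `xpk`, which may differ on your machine.

```shell
# Hypothetical helper: print the name of every command in the argument
# list that cannot be found on PATH. Prints nothing when all are present.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
}

# Assumed command names for the tools used in this guide.
missing=$(missing_tools gcloud kubectl xpk)
if [ -n "$missing" ]; then
  printf 'Not found on PATH:\n%s\n' "$missing"
else
  echo "All required tools found."
fi
```

If the APIs are not yet enabled, `gcloud services enable tpu.googleapis.com container.googleapis.com --project=PROJECT_ID` is one way to enable both in a single call.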
Set up your local environment
Log in with your Google Cloud credentials.
gcloud auth application-default login
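If you run this setup repeatedly, you may prefer to skip the interactive login when credentials already exist. The sketch below assumes that a successful `gcloud auth application-default print-access-token` call means usable Application Default Credentials are present; `ensure_adc` is a hypothetical helper name.

```shell
# Hypothetical helper: run the interactive login only when no usable
# Application Default Credentials are found.
ensure_adc() {
  if gcloud auth application-default print-access-token >/dev/null 2>&1; then
    echo "Application Default Credentials already available."
  else
    gcloud auth application-default login
  fi
}
```

Call `ensure_adc` once at the start of a session. The block only defines the function, so nothing contacts Google Cloud until you invoke it.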
Define the environment variables with values appropriate to your workload. The required variables are listed with each cluster creation method in the next section.
Create a GKE cluster
In the following example, you create a cluster with two v5e 2x4 node pools.
You can create a cluster using XPK or the gcloud command.
XPK
- Set the following environment variables:

      CLUSTER_NODEPOOL_COUNT=CLUSTER_NODEPOOL_COUNT
      PROJECT=PROJECT_ID
      ZONE=ZONE
      CLUSTER=GKE_CLUSTER_NAME
      TPU_TYPE="v5litepod-8"
      PW_CPU_MACHINE_TYPE="n2-standard-64"
      NETWORK=NETWORK
      SUBNETWORK=SUB_NETWORK

  Replace the following:
  - CLUSTER_NODEPOOL_COUNT: the maximum number of node pools a workload can use
  - PROJECT_ID: your Google Cloud project name
  - ZONE: the zone where you are creating resources
  - GKE_CLUSTER_NAME: the GKE cluster name
  - TPU_TYPE: the TPU type. For more information, see supported types in XPK
  - PW_CPU_MACHINE_TYPE: the CPU node type for the Pathways controller
  - NETWORK: [Optional] the name of a Virtual Private Cloud network. The network must exist before you create your cluster
  - SUB_NETWORK: [Optional] the name of a subnetwork. The subnetwork must exist before you create your cluster
 
- Use XPK to create a GKE Pathways cluster. This command can take several minutes to provision the capacity. Once completed, your capacity is allocated and you will start incurring charges.

      xpk cluster create-pathways \
        --num-slices=${CLUSTER_NODEPOOL_COUNT} \
        --tpu-type=${TPU_TYPE} \
        --pathways-gce-machine-type=${PW_CPU_MACHINE_TYPE} \
        --on-demand \
        --project=${PROJECT} \
        --zone=${ZONE} \
        --cluster=${CLUSTER} \
        --custom-cluster-arguments="--network=${NETWORK} --subnetwork=${SUBNETWORK} --enable-ip-alias"
Once the cluster is created, you can create and delete workloads as needed. You don't need to re-provision the TPU capacity.
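Because NETWORK and SUBNETWORK are optional, the `--custom-cluster-arguments` value in the XPK command expands to empty `--network=` and `--subnetwork=` flags when those variables are unset. One possible way to avoid that is to build the value conditionally. This is only a sketch: `custom_cluster_args` is a hypothetical helper, and omitting every custom argument when either variable is missing is an assumption about the behavior you want, not documented XPK behavior.

```shell
# Hypothetical helper: print the value for --custom-cluster-arguments.
# Prints an empty string unless both NETWORK and SUBNETWORK are set.
custom_cluster_args() {
  if [ -n "${NETWORK:-}" ] && [ -n "${SUBNETWORK:-}" ]; then
    printf '%s' "--network=${NETWORK} --subnetwork=${SUBNETWORK} --enable-ip-alias"
  fi
}

# Example values (placeholders, not real resources).
NETWORK="my-vpc"
SUBNETWORK="my-subnet"
echo "custom arguments: $(custom_cluster_args)"
```

You could then add `--custom-cluster-arguments="$(custom_cluster_args)"` to the `xpk cluster create-pathways` command only when the helper prints a non-empty value.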
gcloud
- Set the following environment variables:

      CLUSTER=GKE_CLUSTER_NAME
      PROJECT=PROJECT_ID
      ZONE=ZONE
      REGION=REGION
      CLUSTER_VERSION=GKE_CLUSTER_VERSION
      PW_CPU_MACHINE_TYPE="n2-standard-64"
      NETWORK=NETWORK
      SUBNETWORK=SUB_NETWORK
      CLUSTER_NODEPOOL_COUNT=3
      TPU_MACHINE_TYPE="ct5lp-hightpu-4t"
      WORKERS_PER_SLICE=2
      TOPOLOGY="2x4"
      NUM_CPU_NODES=1

  Replace the following:
  - GKE_CLUSTER_NAME: the GKE cluster name
  - PROJECT_ID: your Google Cloud project name
  - ZONE: the zone where you are creating resources
  - REGION: the region where you are creating resources
  - GKE_CLUSTER_VERSION: [Optional] the GKE cluster version. Use 1.32.2-gke.1475000 or later
  - PW_CPU_MACHINE_TYPE: the CPU node type for the Pathways controller
  - NETWORK: [Optional] the name of a Virtual Private Cloud network. The network must exist before you create your cluster
  - SUB_NETWORK: [Optional] the name of a subnetwork. The subnetwork must exist before you create your cluster
  - CLUSTER_NODEPOOL_COUNT: the maximum number of node pools a workload can use
  - TPU_MACHINE_TYPE: the TPU machine type you want to use
  - WORKERS_PER_SLICE: the number of nodes per node pool
  - GKE_ACCELERATOR_TYPE: the Google Kubernetes Engine accelerator type, see Choose a TPU version
  - TOPOLOGY: the TPU topology
  - NUM_CPU_NODES: the Pathways CPU node pool size
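The values of TOPOLOGY, TPU_MACHINE_TYPE, and WORKERS_PER_SLICE have to agree with each other: a 2x4 topology contains 8 chips, so with 4 chips per node each node pool needs 2 nodes. The arithmetic can be sketched as follows. The helper names are hypothetical, and the chips-per-node value of 4 is an assumption read from the `-4t` suffix of `ct5lp-hightpu-4t`; check it against the machine type you actually use.

```shell
# Hypothetical helper: multiply the dimensions of a topology string
# such as "2x4" or "2x2x4" to get the total chip count.
topology_chips() {
  local chips=1 dim
  for dim in $(printf '%s' "$1" | tr 'x' ' '); do
    chips=$((chips * dim))
  done
  echo "$chips"
}

# Hypothetical helper: nodes per slice = total chips / chips per node.
nodes_per_slice() {
  echo $(( $(topology_chips "$1") / $2 ))
}

TOPOLOGY="2x4"
CHIPS_PER_NODE=4   # assumption: taken from the "-4t" suffix of ct5lp-hightpu-4t
echo "WORKERS_PER_SLICE should be $(nodes_per_slice "$TOPOLOGY" "$CHIPS_PER_NODE")"
```

For the example values this prints `WORKERS_PER_SLICE should be 2`, which matches the value set in the variable list.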
 
The following steps explain how to create a GKE cluster and set it up for running Pathways workloads.
- Create a GKE cluster:

      gcloud beta container clusters create ${CLUSTER} \
        --project=${PROJECT} \
        --zone=${ZONE} \
        --cluster-version=${CLUSTER_VERSION} \
        --scopes=storage-full,gke-default,cloud-platform \
        --machine-type ${PW_CPU_MACHINE_TYPE} \
        --network=${NETWORK} \
        --subnetwork=${SUBNETWORK}

- Create TPU node pools:

      for i in $(seq 1 ${CLUSTER_NODEPOOL_COUNT}); do
        gcloud container node-pools create "tpu-np-${i}" \
          --project=${PROJECT} \
          --zone=${ZONE} \
          --cluster=${CLUSTER} \
          --machine-type=${TPU_MACHINE_TYPE} \
          --num-nodes=${WORKERS_PER_SLICE} \
          --placement-type=COMPACT \
          --tpu-topology=${TOPOLOGY} \
          --scopes=storage-full,gke-default,cloud-platform \
          --workload-metadata=GCE_METADATA
      done

- Create a CPU node pool:

      gcloud container node-pools create "cpu-pathways-np" \
        --project ${PROJECT} \
        --zone ${ZONE} \
        --cluster ${CLUSTER} \
        --machine-type ${PW_CPU_MACHINE_TYPE} \
        --num-nodes ${NUM_CPU_NODES} \
        --scopes=storage-full,gke-default,cloud-platform \
        --workload-metadata=GCE_METADATA
- Install the JobSet and PathwaysJob APIs:
  - Get credentials for the cluster and add them to your local kubectl context. Pass either --zone or --region, depending on whether the cluster is zonal or regional.

        gcloud container clusters get-credentials ${CLUSTER} \
          [--zone=${ZONE} | --region=${REGION}] \
          --project=${PROJECT} \
          && kubectl config set-context --current --namespace=default

  - To use the Pathways architecture on your GKE cluster, you need to install the JobSet API and the PathwaysJob API.

        kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.8.0/manifests.yaml
        kubectl apply --server-side -f https://github.com/google/pathways-job/releases/download/v0.1.2/install.yaml
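After applying both manifests, you may want to confirm that the cluster actually serves the two new resource kinds. The following is a sketch under stated assumptions, not an official verification step: `has_kind` and `check_pathways_apis` are hypothetical helpers, and they assume the installed kinds are named `JobSet` and `PathwaysJob` and appear in the last column of `kubectl api-resources` output.

```shell
# Hypothetical helper: succeed if the cluster's API server lists a
# resource whose KIND column (the last column) equals the argument.
has_kind() {
  kubectl api-resources --no-headers 2>/dev/null \
    | awk -v kind="$1" '$NF == kind { found = 1 } END { exit !found }'
}

# Hypothetical helper: report the status of both APIs used by Pathways.
# Returns non-zero if either kind is missing.
check_pathways_apis() {
  local kind status=0
  for kind in JobSet PathwaysJob; do
    if has_kind "$kind"; then
      echo "$kind: installed"
    else
      echo "$kind: missing"
      status=1
    fi
  done
  return "$status"
}
```

Run `check_pathways_apis` with your kubectl context pointing at the new cluster. A non-zero exit status suggests that one of the `kubectl apply` commands did not complete.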
What's next
- Run a batch workload with Pathways
- Pathways interactive mode
- Multihost inference with Pathways
- Resilient training with Pathways
- Porting JAX workloads to Pathways
- Troubleshoot Pathways