Create a GKE Cluster with Pathways

You can use the Accelerated Processing Kit (XPK) to create pre-configured Google Kubernetes Engine (GKE) clusters for Pathways-based workloads. Alternatively, you can create the clusters manually with gcloud.

Before you begin

Make sure you have:

Set up your local environment

Log in with your Google Cloud credentials.

gcloud auth application-default login

Define the following environment variables with values appropriate to your workload.

Required variables

Create a GKE cluster

In the following example, you create a cluster with two v5e 2x4 node pools. You can create a cluster using XPK or the gcloud command.

XPK

  1. Set the following environment variables:

    CLUSTER_NODEPOOL_COUNT=CLUSTER_NODEPOOL_COUNT
    PROJECT=PROJECT_ID
    ZONE=ZONE
    CLUSTER=GKE_CLUSTER_NAME
    TPU_TYPE="v5litepod-8"
    PW_CPU_MACHINE_TYPE="n2-standard-64"
    NETWORK=NETWORK
    SUBNETWORK=SUB_NETWORK

    Replace the following:

    • CLUSTER_NODEPOOL_COUNT: the maximum number of node pools a workload can use
    • PROJECT_ID: your Google Cloud project ID
    • ZONE: the zone where you are creating resources
    • GKE_CLUSTER_NAME: the GKE cluster name
    • TPU_TYPE: the TPU type. For more information, see supported types in XPK
    • PW_CPU_MACHINE_TYPE: the CPU node type for the Pathways controller
    • NETWORK: [Optional] the name of a Virtual Private Cloud network. The network must exist before you create the cluster
    • SUB_NETWORK: [Optional] the name of a subnetwork. The subnetwork must exist before you create the cluster
  2. Use XPK to create a GKE Pathways cluster. The command can take several minutes to provision the capacity. After it completes, the capacity is allocated to you and you start incurring charges.

    xpk cluster create-pathways \
    --num-slices=${CLUSTER_NODEPOOL_COUNT} \
    --tpu-type=${TPU_TYPE} \
    --pathways-gce-machine-type=${PW_CPU_MACHINE_TYPE} \
    --on-demand \
    --project=${PROJECT} \
    --zone=${ZONE} \
    --cluster=${CLUSTER} \
    --custom-cluster-arguments="--network=${NETWORK} --subnetwork=${SUBNETWORK} --enable-ip-alias"

Once the cluster is created, you can create and delete workloads as needed. You don't need to re-provision the TPU capacity.
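Because NETWORK and SUBNETWORK are optional, passing them unconditionally would produce empty --network= and --subnetwork= flags when they are unset. The following is a minimal sketch of one way to build the --custom-cluster-arguments value so that each flag is included only when its variable is set. build_custom_cluster_args is a hypothetical helper and is not part of XPK. The sketch assumes that omitting both flags lets GKE fall back to the project's default network.

```shell
# Sketch: build the value for --custom-cluster-arguments.
# build_custom_cluster_args is a hypothetical helper, not an XPK command.
# Assumption: when a flag is omitted, GKE uses the project's default network.
build_custom_cluster_args() {
  local network="$1" subnetwork="$2"
  # --enable-ip-alias is always passed, as in the command above.
  local args="--enable-ip-alias"
  if [ -n "${subnetwork}" ]; then
    args="--subnetwork=${subnetwork} ${args}"
  fi
  if [ -n "${network}" ]; then
    args="--network=${network} ${args}"
  fi
  printf '%s\n' "${args}"
}
```

To use the helper, pass its output to XPK, for example --custom-cluster-arguments="$(build_custom_cluster_args "${NETWORK:-}" "${SUBNETWORK:-}")".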

gcloud

  1. Set the following environment variables:

    CLUSTER=GKE_CLUSTER_NAME
    PROJECT=PROJECT_ID
    ZONE=ZONE
    REGION=REGION
    CLUSTER_VERSION=GKE_CLUSTER_VERSION
    PW_CPU_MACHINE_TYPE="n2-standard-64"
    NETWORK=NETWORK
    SUBNETWORK=SUB_NETWORK
    CLUSTER_NODEPOOL_COUNT=2
    TPU_MACHINE_TYPE="ct5lp-hightpu-4t"
    WORKERS_PER_SLICE=2
    TOPOLOGY="2x4"
    NUM_CPU_NODES=1

    Replace the following:

    • GKE_CLUSTER_NAME: the GKE cluster name
    • PROJECT_ID: your Google Cloud project ID
    • ZONE: the zone where you are creating resources
    • REGION: the region where you are creating resources
    • GKE_CLUSTER_VERSION: [Optional] the GKE cluster version. Use 1.32.2-gke.1475000 or later
    • PW_CPU_MACHINE_TYPE: the CPU node type for the Pathways controller
    • NETWORK: [Optional] the name of a Virtual Private Cloud network. The network must exist before you create the cluster
    • SUB_NETWORK: [Optional] the name of a subnetwork. The subnetwork must exist before you create the cluster
    • CLUSTER_NODEPOOL_COUNT: the maximum number of node pools a workload can use
    • TPU_MACHINE_TYPE: the TPU machine type you want to use
    • WORKERS_PER_SLICE: the number of nodes per node pool
    • GKE_ACCELERATOR_TYPE: the Google Kubernetes Engine accelerator type, see Choose a TPU version
    • TOPOLOGY: the TPU topology
    • NUM_CPU_NODES: the Pathways CPU node pool size
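The values of TOPOLOGY, TPU_MACHINE_TYPE, and WORKERS_PER_SLICE depend on each other: the number of nodes in a slice is the number of chips in the topology divided by the number of chips per node. The following sketch shows that arithmetic for the example values. nodes_per_slice is a hypothetical helper, and the figure of 4 chips per node is an assumption based on the "4t" suffix of ct5lp-hightpu-4t.

```shell
# Sketch: derive the nodes per slice from a TPU topology and chips per node.
# nodes_per_slice is a hypothetical helper for illustration only.
nodes_per_slice() {
  local topology="$1" chips_per_node="$2"
  local chips=1 dim
  # Multiply the topology dimensions: "2x4" -> 2 * 4 = 8 chips.
  for dim in $(printf '%s' "${topology}" | tr 'x' ' '); do
    chips=$((chips * dim))
  done
  echo $((chips / chips_per_node))
}

# Assumption: ct5lp-hightpu-4t exposes 4 TPU chips per node.
nodes_per_slice "2x4" 4   # prints 2, matching WORKERS_PER_SLICE=2
```

With CLUSTER_NODEPOOL_COUNT=2, these values give two node pools of two nodes each.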

The following steps explain how to create a GKE cluster and set it up for running Pathways workloads.

  1. Create a GKE cluster:

    gcloud beta container clusters create ${CLUSTER} \
    --project=${PROJECT} \
    --zone=${ZONE} \
    --cluster-version=${CLUSTER_VERSION} \
    --scopes=storage-full,gke-default,cloud-platform \
    --machine-type=${PW_CPU_MACHINE_TYPE} \
    --network=${NETWORK} \
    --subnetwork=${SUBNETWORK}
    
  2. Create TPU node pools:

    # Create one TPU node pool per slice: tpu-np-1, tpu-np-2, ...
    for i in $(seq 1 ${CLUSTER_NODEPOOL_COUNT}); do
      gcloud container node-pools create "tpu-np-${i}" \
        --project=${PROJECT} \
        --zone=${ZONE} \
        --cluster=${CLUSTER} \
        --machine-type=${TPU_MACHINE_TYPE} \
        --num-nodes=${WORKERS_PER_SLICE} \
        --placement-type=COMPACT \
        --tpu-topology=${TOPOLOGY} \
        --scopes=storage-full,gke-default,cloud-platform \
        --workload-metadata=GCE_METADATA
    done
    
  3. Create a CPU node pool:

    gcloud container node-pools create "cpu-pathways-np" \
    --project=${PROJECT} \
    --zone=${ZONE} \
    --cluster=${CLUSTER} \
    --machine-type=${PW_CPU_MACHINE_TYPE} \
    --num-nodes=${NUM_CPU_NODES} \
    --scopes=storage-full,gke-default,cloud-platform \
    --workload-metadata=GCE_METADATA
    
  4. Install the JobSet and PathwaysJob APIs:

    Get credentials for the cluster and add them to your local kubectl context.

    # The cluster created above is zonal, so pass --zone.
    # For a regional cluster, pass --region=${REGION} instead.
    gcloud container clusters get-credentials ${CLUSTER} \
        --zone=${ZONE} \
        --project=${PROJECT} \
        && kubectl config set-context --current --namespace=default
    

    To use the Pathways architecture on your GKE cluster, you need to install the JobSet API and the PathwaysJob API.

    kubectl apply --server-side -f https://github.com/kubernetes-sigs/jobset/releases/download/v0.8.0/manifests.yaml
    kubectl apply --server-side -f https://github.com/google/pathways-job/releases/download/v0.1.1/install.yaml
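After applying the manifests, you might want to confirm that both APIs are registered before you submit a workload. The following is a minimal sketch of such a check. missing_apis is a hypothetical helper that reads custom resource definition names on standard input. Matching on the substrings "jobset" and "pathwaysjob" is an assumption about how the installed definitions are named.

```shell
# Sketch: report which of the two APIs is missing from a list of CRD names.
# missing_apis is a hypothetical helper; it reads CRD names on stdin.
# Assumption: the installed CRD names contain "jobset" and "pathwaysjob".
missing_apis() {
  local crds api missing=""
  crds="$(cat)"
  for api in jobset pathwaysjob; do
    if ! printf '%s\n' "${crds}" | grep -qi "${api}"; then
      missing="${missing} ${api}"
    fi
  done
  # Print the missing API names; empty output means both were found.
  printf '%s\n' "${missing# }"
  # Return success only when nothing is missing.
  [ -z "${missing}" ]
}

# Usage against a live cluster (requires kubectl and cluster credentials):
#   kubectl get crd -o name | missing_apis
```

The helper exits with a non-zero status when an API is missing, so it can gate a later step in a script.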
    

What's next