This guide shows you how to optimize GPU provisioning for medium- and small-scale training workloads by using flex-start provisioning mode. In this guide, you use flex-start provisioning mode to deploy a workload that consists of two Kubernetes Jobs, each requiring one GPU. GKE automatically provisions a single node with two A100 GPUs to run both Jobs.
If your workload requires multi-node distributed processing, consider using flex-start provisioning mode with queued provisioning. For more information, see Run a large-scale workload with flex-start with queued provisioning.
This guide is intended for machine learning (ML) engineers, platform admins and operators, and data and AI specialists who are interested in using Kubernetes container orchestration capabilities for running batch workloads. For more information about common roles and example tasks that we reference in Google Cloud content, see Common GKE Enterprise user roles and tasks.
Before you begin
Before you start, make sure you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running `gcloud components update`.
- Ensure that you have an Autopilot cluster or a Standard cluster running version 1.32.2-gke.1652000 or later. You can check the cluster version with the command shown after this list.
- Ensure that you're familiar with limitations of flex-start provisioning mode.
- When using a Standard cluster, ensure that you maintain at least one node pool without flex-start provisioning mode enabled for the cluster to function correctly.
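To check which version your cluster's control plane is running, you can query it with gcloud; for example:

```sh
gcloud container clusters describe CLUSTER_NAME \
    --location LOCATION_NAME \
    --format="value(currentMasterVersion)"
```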
Create a node pool with flex-start provisioning mode
To create a node pool with flex-start provisioning mode enabled on an existing Standard cluster, you can use the gcloud CLI or Terraform.
If you use a cluster in Autopilot mode, skip this section and go to the Run a training workload section.
gcloud
Create a node pool with flex-start provisioning mode:
```sh
gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location LOCATION_NAME \
    --project CLUSTER_PROJECT_ID \
    --accelerator=type=nvidia-a100-80gb,count=2 \
    --machine-type=a2-ultragpu-2g \
    --flex-start \
    --num-nodes=0 \
    --enable-autoscaling \
    --total-min-nodes=0 \
    --total-max-nodes=5 \
    --location-policy=ANY \
    --reservation-affinity=none \
    --no-enable-autorepair
```
Replace the following:

- NODE_POOL_NAME: the name that you choose for your node pool.
- CLUSTER_NAME: the name of the Standard cluster that you want to modify.
- LOCATION_NAME: the compute region for the cluster control plane.
- CLUSTER_PROJECT_ID: the ID of the project that contains the cluster.
In this command, the `--flex-start` flag instructs `gcloud` to create a node pool with flex-start provisioning mode enabled.

GKE creates a node pool with nodes that contain two A100 GPUs (`a2-ultragpu-2g`). The node pool automatically scales from zero nodes to a maximum of five nodes.

Verify the status of flex-start provisioning mode in the node pool:
```sh
gcloud container node-pools describe NODE_POOL_NAME \
    --cluster CLUSTER_NAME \
    --location LOCATION_NAME \
    --format="get(config.flexStart)"
```
If flex-start provisioning mode is enabled in the node pool, the `flexStart` field is set to `True`.
Terraform
You can enable flex-start provisioning mode with GPUs by using Terraform.
Add the following block to your Terraform configuration:
resource "google_container_node_pool" " "gpu_dws_pool" { name = "gpu-dws-pool" queued_provisioning { enabled = false } } node_config { machine_type = "a3-highgpu-8g" flex_start = true }
Terraform calls Google Cloud APIs to create a node pool that uses flex-start provisioning mode with GPUs. The node pool initially has zero nodes and autoscaling is enabled. To learn more about Terraform, see the `google_container_node_pool` resource spec on terraform.io.
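If you apply this configuration yourself, the standard Terraform workflow applies; for example, from the directory that contains your configuration:

```sh
# Download the Google provider and initialize the working directory.
terraform init

# Preview the changes, then create the node pool.
terraform plan
terraform apply
```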
Run a training workload
In this section, you create two Kubernetes Jobs that require one GPU each. A Job controller in Kubernetes creates one or more Pods and ensures that they successfully execute a specific task.
In the Google Cloud console, launch a Cloud Shell session by clicking Activate Cloud Shell. A session opens in the bottom pane of the console.
Create a file named `dws-flex-start.yaml`:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-job-1
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-flex-start: "true"
      containers:
      - name: gpu-container-1
        image: gcr.io/k8s-staging-perf-tests/sleep:latest
        args: ["10s"] # Sleep for 10 seconds
        resources:
          requests:
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1
      restartPolicy: OnFailure
---
apiVersion: batch/v1
kind: Job
metadata:
  name: gpu-job-2
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-flex-start: "true"
      containers:
      - name: gpu-container-2
        image: gcr.io/k8s-staging-perf-tests/sleep:latest
        args: ["10s"] # Sleep for 10 seconds
        resources:
          requests:
            nvidia.com/gpu: 1
          limits:
            nvidia.com/gpu: 1
      restartPolicy: OnFailure
```
Apply the `dws-flex-start.yaml` manifest:

```sh
kubectl apply -f dws-flex-start.yaml
```
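While GKE provisions capacity for the flex-start node, the Pods remain in the Pending state. To watch the Pods transition to Running, you can use:

```sh
kubectl get pods -w
```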
Verify that the Jobs are running on the same node:
```sh
kubectl get pods -l "job-name in (gpu-job-1,gpu-job-2)" -o wide
```
The output is similar to the following:
```
NAME        READY   STATUS      RESTARTS   AGE   IP        NODE                NOMINATED NODE   READINESS GATES
gpu-job-1   0/1     Completed   0          19m   10.(...)  gke-flex-zonal-a2   <none>           <none>
gpu-job-2   0/1     Completed   0          19m   10.(...)  gke-flex-zonal-a2   <none>           <none>
```
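Optionally, to block until both Jobs have finished (for example, in a script), you can use `kubectl wait`:

```sh
kubectl wait --for=condition=complete job/gpu-job-1 job/gpu-job-2 --timeout=600s
```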
Clean up
To avoid incurring charges to your Google Cloud account for the resources that you used on this page, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Delete the individual resources
Delete the Jobs:

```sh
kubectl delete job -l "job-name in (gpu-job-1,gpu-job-2)"
```
Delete the node pool:

```sh
gcloud container node-pools delete NODE_POOL_NAME \
    --cluster CLUSTER_NAME \
    --location LOCATION_NAME
```
Delete the cluster:

```sh
gcloud container clusters delete CLUSTER_NAME \
    --location LOCATION_NAME
```
What's next
- Learn more about GPUs in GKE.
- Learn more about node auto-provisioning.
- Read Best practices for running batch workloads on GKE.