This tutorial uses Kueue to show you how to implement a Job queueing system on Google Kubernetes Engine (GKE), configure workload resource and quota sharing between different namespaces, and maximize the utilization of your cluster.
Background
As an infrastructure engineer or cluster administrator, maximizing utilization across namespaces matters. A batch of Jobs in one namespace might not fully use the quota assigned to that namespace, while another namespace has multiple pending Jobs. To use cluster resources efficiently across Jobs in different namespaces and to increase the flexibility of quota management, you can configure cohorts in Kueue. A cohort is a group of ClusterQueues that can borrow unused quota from one another. A ClusterQueue governs a pool of resources such as CPU, memory, and hardware accelerators.
You can find a more detailed definition of all these concepts in the Kueue documentation.
Objectives
This tutorial is for infrastructure engineers or cluster administrators who want to implement a Job queueing system on Kubernetes using Kueue with quota sharing. The tutorial mimics two teams in two different namespaces, where each team has dedicated resources but can borrow from the other. A third set of resources can be used as spillover when Jobs accumulate.
You also use the Prometheus operator to monitor Jobs and resource allocation in different namespaces.
This tutorial covers the following steps:
- Create a GKE cluster
- Create the ResourceFlavors
- For each team, create a ClusterQueue and LocalQueue
- (Optional) Deploy kube-prometheus and monitor Workloads using Prometheus
- Create Jobs and observe the admitted workloads
- Borrow unused quota with cohorts
- Add a spillover ClusterQueue governing spot VMs
Costs
This tutorial uses the following billable component of Google Cloud: Google Kubernetes Engine (GKE). Use the Pricing Calculator to generate a cost estimate based on your projected usage.
When you finish this tutorial, you can avoid continued billing by deleting the resources you created. For more information, see Clean up.
Before you begin
Set up your project
- Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.
- In the Google Cloud console, on the project selector page, click Create project to begin creating a new Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the GKE API.
Set defaults for the Google Cloud CLI
In the Google Cloud console, start a Cloud Shell instance:
Open Cloud Shell
Download the source code for this sample app:
git clone https://github.com/GoogleCloudPlatform/kubernetes-engine-samples
Set the default environment variables:
gcloud config set project PROJECT_ID
gcloud config set compute/region COMPUTE_REGION
Replace the following values:
- PROJECT_ID: your Google Cloud project ID.
- COMPUTE_REGION: the Compute Engine region.
Create a GKE cluster
Create a GKE cluster named kueue-cohort. You will create a cluster with six nodes (two per zone) in the default pool and no autoscaling. Those nodes are all the resources available to the teams at the beginning, so they have to compete for them.
You will see later how Kueue manages the workloads that both teams send to their respective queues.
gcloud container clusters create kueue-cohort --region COMPUTE_REGION \
  --release-channel rapid --machine-type e2-standard-4 --num-nodes 2
The output is similar to the following once the cluster is created:
kubeconfig entry generated for kueue-cohort.
NAME: kueue-cohort
LOCATION: us-central1
MASTER_VERSION: 1.26.2-gke.1000
MASTER_IP: 35.224.108.58
MACHINE_TYPE: e2-medium
NODE_VERSION: 1.26.2-gke.1000
NUM_NODES: 6
STATUS: RUNNING
Here the STATUS is RUNNING for the kueue-cohort cluster.
Create a node pool named spot. This node pool uses Spot VMs and has autoscaling enabled. It starts with 0 nodes, but later you will make it available to the teams as spillover capacity.
gcloud container node-pools create spot --cluster=kueue-cohort --region COMPUTE_REGION \
  --spot --enable-autoscaling --max-nodes 20 --num-nodes 0 \
  --machine-type e2-standard-4
Install the release version of Kueue to the cluster:
VERSION=VERSION
kubectl apply -f \
  https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
Replace VERSION with the latest Kueue version prefixed with the letter v, for example v0.4.0. For more information about Kueue versions, see Kueue releases.
Wait until the Kueue controller is ready:
watch kubectl -n kueue-system get pods
The output should be similar to the following before you can continue:
NAME                                        READY   STATUS    RESTARTS   AGE
kueue-controller-manager-6cfcbb5dc5-rsf8k   2/2     Running   0          3m
Create two new namespaces called team-a and team-b:
kubectl create namespace team-a
kubectl create namespace team-b
Jobs will be generated in each namespace.
Create the ResourceFlavors
A ResourceFlavor represents resource variations in your cluster nodes, such as different VMs (for example, Spot versus on-demand), architectures (for example, x86 versus ARM CPUs), and brands and models (for example, NVIDIA A100 versus T4 GPUs).
ResourceFlavors use node labels and taints to match with a set of nodes in the cluster.
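The flavors.yaml manifest from the sample repository defines both flavors. A minimal sketch, assuming the kueue.x-k8s.io/v1beta1 API used by recent Kueue releases, might look like the following:
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: on-demand
spec:
  nodeLabels:
    cloud.google.com/gke-provisioning: standard   # matches on-demand GKE nodes
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: spot
spec:
  nodeLabels:
    cloud.google.com/gke-provisioning: spot       # matches nodes in the Spot VM node pool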
In this manifest:
- The ResourceFlavor on-demand has its label set to cloud.google.com/gke-provisioning: standard.
- The ResourceFlavor spot has its label set to cloud.google.com/gke-provisioning: spot.
When a workload is assigned a ResourceFlavor, Kueue assigns the Pods of the workload to nodes that match the node labels defined for the ResourceFlavor.
Deploy the ResourceFlavor:
kubectl apply -f flavors.yaml
Create the ClusterQueue and LocalQueue
Create two ClusterQueues, cq-team-a and cq-team-b, and their corresponding LocalQueues, lq-team-a and lq-team-b, namespaced to team-a and team-b respectively.
ClusterQueues are cluster-scoped objects that govern a pool of resources such as CPU, memory, and hardware accelerators. Batch administrators can restrict the visibility of these objects to batch users.
LocalQueues are namespaced objects that batch users can list. They point to ClusterQueues, from which resources are allocated to run the LocalQueue workloads.
ClusterQueues allow resources to have multiple flavors. In this case, both ClusterQueues have two flavors, on-demand and spot, each providing cpu resources. The quota of the ResourceFlavor spot is set to 0 and will not be used for now.
Both ClusterQueues share the same cohort called all-teams, defined in .spec.cohort. When two or more ClusterQueues share the same cohort, they can borrow unused quota from each other.
You can learn more about how cohorts work and the borrowing semantics in the Kueue documentation.
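The cq-team-a.yaml manifest is not reproduced on this page. A minimal sketch, assuming the kueue.x-k8s.io/v1beta1 API and illustrative nominal quotas of 10 CPU and 10Gi of memory for the on-demand flavor (values consistent with the Borrowed/Total figures shown later, but assumed here), might look like the following; cq-team-b.yaml mirrors it with the team-b names:
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cq-team-a
spec:
  cohort: all-teams            # shared with cq-team-b so unused quota can be borrowed
  namespaceSelector: {}        # match all namespaces; the LocalQueue scopes usage to team-a
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: on-demand
      resources:
      - name: "cpu"
        nominalQuota: 10       # illustrative value (assumption)
      - name: "memory"
        nominalQuota: 10Gi     # illustrative value (assumption)
    - name: spot
      resources:
      - name: "cpu"
        nominalQuota: 0        # spot quota is 0 for now, as described above
      - name: "memory"
        nominalQuota: 0Gi
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  namespace: team-a
  name: lq-team-a
spec:
  clusterQueue: cq-team-a      # workloads submitted to lq-team-a are admitted by cq-team-a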
Deploy the ClusterQueues and LocalQueues:
kubectl apply -f cq-team-a.yaml
kubectl apply -f cq-team-b.yaml
(Optional) Deploy kube-prometheus and monitor Workloads using Prometheus
You can use Prometheus to monitor Kueue pending workloads and active workloads. To monitor the workloads being brought up and observe the load on each ClusterQueue, deploy Prometheus to the cluster in the monitoring namespace.
Download the source code of the Prometheus operator for monitoring:
cd
git clone https://github.com/prometheus-operator/kube-prometheus.git
Create the CustomResourceDefinitions(CRDs):
kubectl create -f kube-prometheus/manifests/setup
Create the monitoring components:
kubectl create -f kube-prometheus/manifests
Allow prometheus-operator to scrape metrics from Kueue components:
kubectl apply -f https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/prometheus.yaml
Change to the working directory:
cd kubernetes-engine-samples/batch/kueue-cohort
Start a new terminal to access Prometheus by port forwarding the service:
kubectl --namespace monitoring port-forward svc/prometheus-k8s 9090
Open Prometheus at localhost:9090 in the browser.
If you are using Cloud Shell, click Web Preview, select Change port, set the port number to 9090, and select Change and Preview.
Enter the query for the first panel that monitors the active ClusterQueue cq-team-a:
kueue_pending_workloads{cluster_queue="cq-team-a", status="active"} or kueue_admitted_active_workloads{cluster_queue="cq-team-a"}
Add another panel and enter the query that monitors the active ClusterQueue cq-team-b:
kueue_pending_workloads{cluster_queue="cq-team-b", status="active"} or kueue_admitted_active_workloads{cluster_queue="cq-team-b"}
Add another panel and enter the query that monitors the number of nodes in the cluster:
count(kube_node_info)
Create Jobs and observe the admitted workloads
Generate Jobs for both ClusterQueues. Each Job sleeps for 10 seconds, runs three Pods in parallel, and completes after three completions. Each Job is then cleaned up 60 seconds after it finishes.
job-team-a.yaml creates Jobs in the team-a namespace and points to the LocalQueue lq-team-a and the ClusterQueue cq-team-a. Similarly, job-team-b.yaml creates Jobs in the team-b namespace and points to the LocalQueue lq-team-b and the ClusterQueue cq-team-b.
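The job-team-a.yaml manifest is not shown on this page. A minimal sketch, assuming the Job targets the LocalQueue through the kueue.x-k8s.io/queue-name label (recent Kueue versions; older versions use an annotation of the same name) and using an illustrative busybox container and resource requests, might look like the following:
apiVersion: batch/v1
kind: Job
metadata:
  namespace: team-a
  generateName: sample-job-team-a-        # generateName lets many Jobs be created from one manifest
  labels:
    kueue.x-k8s.io/queue-name: lq-team-a  # route the Job to the team-a LocalQueue
spec:
  ttlSecondsAfterFinished: 60             # clean up the Job 60 seconds after it finishes
  parallelism: 3                          # three Pods run in parallel
  completions: 3                          # the Job completes after three successful Pods
  suspend: true                           # created suspended; Kueue unsuspends it on admission
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: sleep
        image: busybox                    # illustrative image (assumption)
        command: ["sleep", "10"]
        resources:
          requests:
            cpu: "1"                      # illustrative request (assumption)
            memory: 1Gi                   # illustrative request (assumption)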
Start a new terminal and run this script to generate a Job every second:
./create_jobs.sh job-team-a.yaml 1
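The create_jobs.sh helper script ships with the sample repository and is not reproduced here. A rough sketch of what it does, assuming the second argument is the interval in seconds between Jobs, is:
#!/bin/bash
# Usage: ./create_jobs.sh <job-manifest> <interval-seconds>
# Repeatedly creates Jobs from the manifest (which uses generateName)
# until the script is interrupted with CTRL+C.
MANIFEST="$1"
INTERVAL="$2"
while true; do
  kubectl create -f "$MANIFEST"
  sleep "$INTERVAL"
done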
Start another terminal and create Jobs for the team-b namespace:
./create_jobs.sh job-team-b.yaml 1
Observe the Jobs being queued up in Prometheus. Or with this command:
watch -n 2 kubectl get clusterqueues -o wide
The output should be similar to the following:
NAME COHORT STRATEGY PENDING WORKLOADS ADMITTED WORKLOADS
cq-team-a all-teams BestEffortFIFO 0 5
cq-team-b all-teams BestEffortFIFO 0 4
Borrow unused quota with cohorts
ClusterQueues might not be at full capacity at all times. Quota usage is not maximized when workloads are not evenly spread among ClusterQueues. If ClusterQueues share the same cohort, they can borrow quota from one another to maximize quota utilization.
Once there are Jobs queued up for both ClusterQueues cq-team-a and cq-team-b, stop the script for the team-b namespace by pressing CTRL+c in the corresponding terminal.
Once all the pending Jobs from the team-b namespace are processed, the Jobs from the team-a namespace can borrow the available resources in cq-team-b:
kubectl describe clusterqueue cq-team-a
Because cq-team-a and cq-team-b share the same cohort called all-teams, these ClusterQueues are able to share resources that are not utilized.
Flavors Usage:
  Name:  on-demand
  Resources:
    Borrowed:  5
    Name:      cpu
    Total:     15
    Borrowed:  5Gi
    Name:      memory
    Total:     15Gi
Resume the script for the team-b namespace:
./create_jobs.sh job-team-b.yaml 3
Observe how the borrowed resources from cq-team-a go back to 0, while the resources from cq-team-b are used for its own workloads:
kubectl describe clusterqueue cq-team-a
Flavors Usage:
  Name:  on-demand
  Resources:
    Borrowed:  0
    Name:      cpu
    Total:     9
    Borrowed:  0
    Name:      memory
    Total:     9Gi
Increase quota with Spot VMs
When quota needs to be temporarily increased, for example to meet high demand in pending workloads, you can configure Kueue to accommodate the demand by adding more ClusterQueues to the cohort. ClusterQueues with unused resources can share those resources with other ClusterQueues that belong to the same cohort.
At the beginning of the tutorial, you created a node pool named spot using Spot VMs and a ResourceFlavor named spot with the label set to cloud.google.com/gke-provisioning: spot. Create a ClusterQueue to use this node pool and the ResourceFlavor that represents it:
Create a new ClusterQueue called cq-spot with its cohort set to all-teams:
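The cq-spot.yaml manifest is not reproduced on this page. A minimal sketch, again assuming the kueue.x-k8s.io/v1beta1 API and an illustrative nominal quota sized for the autoscaling spot node pool, might look like the following:
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: cq-spot
spec:
  cohort: all-teams            # same cohort as cq-team-a and cq-team-b
  namespaceSelector: {}        # match all namespaces
  resourceGroups:
  - coveredResources: ["cpu", "memory"]
    flavors:
    - name: spot               # the ResourceFlavor backed by the Spot VM node pool
      resources:
      - name: "cpu"
        nominalQuota: 40       # illustrative value (assumption)
      - name: "memory"
        nominalQuota: 144Gi    # illustrative value (assumption)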
Because this ClusterQueue shares the same cohort with cq-team-a and cq-team-b, both cq-team-a and cq-team-b can borrow resources up to 15 CPU requests and 15 Gi of memory.
kubectl apply -f cq-spot.yaml
In Prometheus, observe how the admitted workloads spike for both cq-team-a and cq-team-b thanks to the added quota from cq-spot, which shares the same cohort. Or with this command:
watch -n 2 kubectl get clusterqueues -o wide
In Prometheus, observe the number of nodes in the cluster. Or with this command:
watch -n 2 kubectl get nodes -o wide
Stop both scripts by pressing CTRL+c for the team-a and team-b namespaces.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.
Delete the project
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, type the project ID, and then click Shut down to delete the project.
Delete the individual resources
Delete the Kueue quota system:
kubectl delete -n team-a localqueue lq-team-a
kubectl delete -n team-b localqueue lq-team-b
kubectl delete clusterqueue cq-team-a
kubectl delete clusterqueue cq-team-b
kubectl delete clusterqueue cq-spot
kubectl delete resourceflavor default
kubectl delete resourceflavor on-demand
kubectl delete resourceflavor spot
Delete the Kueue manifest:
VERSION=VERSION
kubectl delete -f \
  https://github.com/kubernetes-sigs/kueue/releases/download/$VERSION/manifests.yaml
Delete the cluster:
gcloud container clusters delete kueue-cohort --region=COMPUTE_REGION
What's next
Learn how to Deploy a batch system using Kueue.
Learn more about Jobs on GKE.