Deploy GPU-accelerated Ray for AI workloads on GKE
This page shows you how to set up and run a GPU-enabled Ray cluster on Google Kubernetes Engine (GKE) to scale your AI/ML workloads.
Why use Ray?
Using Ray on GKE, facilitated by KubeRay, lets you scale Python and AI applications. Ray handles the application scaling, and GKE manages the underlying servers and resources.
This combination makes it easier to manage resources, automatically adjust capacity, ensure reliability, and deploy your applications across various environments. By connecting your distributed Ray applications with your existing cloud infrastructure and tools on GKE, you can simplify operations and provision a robust, scalable platform for complex tasks by using standard Kubernetes methods.
For more information, see the blog post Why GKE for your Ray AI workloads.
Before you begin
Before you start, make sure you have performed the following tasks:
- In the Google Cloud console, on the project selector page, select or create a Google Cloud project.
- Make sure that billing is enabled for your Google Cloud project.
- Enable the GKE API.
- Install the Google Cloud CLI.
- Configure the gcloud CLI to use your federated identity. For more information, see Sign in to the gcloud CLI with your federated identity.
- To initialize the gcloud CLI, run the following command:

      gcloud init
- Install `kubectl`: `kubectl` is the primary command line tool for interacting with GKE clusters. If you use Cloud Shell, `kubectl` is already installed. Otherwise, install it in your terminal by running the `gcloud components install kubectl` command. To verify your installation, run this command:

      kubectl version --client
- Install Ray: we highly recommend that you create and enter a Python virtual environment (`venv`) before you install Ray so that you can isolate its dependencies. An optional Python sanity check follows this list.
  - Install Python.
  - Create and activate a new virtual environment.
  - Install the latest stable version of Ray by running this command inside your virtual environment: `pip install -U "ray[default]"`.
  - To verify your installation, run this command: `ray --version`. Note the Ray version (for example, 2.44.1) in the output.
- Ensure your Google Cloud project has sufficient quota for L4 GPUs. For more information, see About GPUs and Allocation quotas.
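Optionally, you can also confirm that the local Ray installation can start a throwaway Ray instance, not just report its version. The following minimal Python sketch assumes you run it inside the activated virtual environment; the filename `check_ray.py` is only an illustrative choice and is not used elsewhere in this guide.

```python
# check_ray.py - optional sanity check for a local Ray installation.
# Assumes Ray was installed with `pip install -U "ray[default]"` in the
# active virtual environment.
import ray

ray.init()                      # start a throwaway local Ray instance
print(ray.cluster_resources())  # for example: {'CPU': 8.0, 'memory': ..., ...}
ray.shutdown()                  # stop the local instance again
```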
Create a cluster in GKE Autopilot mode
In Autopilot mode, Google manages your cluster configuration, including scaling, security, and other preconfigured settings. Autopilot clusters are optimized to run most production workloads and provision compute resources based on your Kubernetes manifests.
In the terminal, run the following command:
    gcloud container clusters create-auto my-ray-enabled-cluster \
        --enable-ray-operator \
        --enable-ray-cluster-monitoring \
        --enable-ray-cluster-logging \
        --location=us-central1
This Google Cloud CLI command creates an Autopilot cluster named `my-ray-enabled-cluster`, with these settings:

- `--enable-ray-operator`: installs the Ray Operator on the cluster, which simplifies managing Ray clusters within Kubernetes.
- `--enable-ray-cluster-monitoring`: sets up integration with Cloud Monitoring for Ray cluster metrics.
- `--enable-ray-cluster-logging`: configures integration with Cloud Logging to collect logs from your Ray cluster.
- `--location=us-central1`: specifies the geographical region where the GKE cluster is created. Accelerator availability and supported types vary by location. For more information, see GPU regions and zones.
Autopilot clusters automatically provision and scale nodes (including GPU-enabled nodes that use compute classes) based on workload demands, optimizing both utilization and cost. This automatic provisioning and scaling simplifies resource management and lets you focus on your models instead of infrastructure.
If you use a GKE Standard cluster, the flags and settings for enabling Ray might be different. For more information, see Enable the Ray Operator on GKE.
Configure `kubectl` to communicate with your cluster:

    gcloud container clusters get-credentials my-ray-enabled-cluster \
        --region=us-central1
Install the KubeRay `kubectl ray` plugin
The `kubectl ray` plugin simplifies common Ray on Kubernetes workflows. For the latest installation instructions, see the KubeRay documentation.
Go to the releases page and download the binary for your platform. For example, to install the `kubectl ray` plugin version 1.3.2 on Linux amd64, run the following commands:

    curl -LO https://github.com/ray-project/kuberay/releases/download/v1.3.2/kubectl-ray_v1.3.2_linux_amd64.tar.gz
    tar -xvf kubectl-ray_v1.3.2_linux_amd64.tar.gz
    cp kubectl-ray ~/.local/bin
Replace `~/.local/bin` with the directory in your `PATH`.

Verify your installation:

    kubectl ray version
If you see a warning like `KubeRay operator installation cannot be found: no KubeRay operator deployments found in any namespace.`, you can safely ignore it. If the installation was successful, the output is the version of the `kubectl ray` plugin.
Create the custom compute class in your cluster
A custom compute class in GKE lets you define specific hardware requirements, like the GPU type and count, by priority. Autopilot uses this priority when provisioning nodes for your workloads.
To create the example custom compute class for this guide, follow these steps:
Save the following compute class specification as `nvidia-l4-compute-class.yaml`:

    apiVersion: cloud.google.com/v1
    kind: ComputeClass
    metadata:
      name: gpu-compute-class
    spec:
      priorities:
      - gpu:
          type: nvidia-l4
          count: 1
        spot: true
      - gpu:
          type: nvidia-l4
          count: 4
        spot: false
      nodePoolAutoCreation:
        enabled: true
      whenUnsatisfiable: DoNotScaleUp
Create the compute class by running this command:

    kubectl apply -f nvidia-l4-compute-class.yaml
After you apply the compute class YAML file, Autopilot automatically provisions nodes with one NVIDIA L4 GPU when your Ray cluster requests them, prioritizing Spot VMs as defined in your manifest. The `whenUnsatisfiable: DoNotScaleUp` setting instructs GKE not to create nodes that don't meet the specified requirements.
Create and verify the Ray cluster
You can now create your Ray cluster on your Autopilot cluster.
Create the Ray cluster. Use the `kubectl ray create cluster` command to define and create your Ray cluster. This command simplifies the process by handling the underlying Kubernetes resource creation.

    kubectl ray create cluster my-ray-cluster \
        --worker-replicas=1 \
        --worker-cpu=2 \
        --worker-memory=4Gi \
        --worker-gpu=1 \
        --worker-node-selectors="cloud.google.com/compute-class=gpu-compute-class"
The `kubectl ray create cluster` command uses the KubeRay plugin to translate your specifications into a RayCluster custom resource and sends it to the Kubernetes API to create the RayCluster object. This object acts as a blueprint, telling the Ray Operator how to provision and manage the actual Ray cluster components (head and worker nodes) within GKE. By default, GKE schedules the head Pod on a node from the E2 machine series.

The example command uses these settings:

- `kubectl ray create cluster my-ray-cluster`: specifies the name of the Ray cluster you want to create.
- `--worker-node-selectors`: tells Kubernetes to schedule worker Pods on nodes that are labeled with `cloud.google.com/compute-class=gpu-compute-class`.
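If you want to see exactly what the plugin created, you can read the generated RayCluster object back from the API, for example with `kubectl get raycluster my-ray-cluster -o yaml`. The Python sketch below does the same thing programmatically; it assumes the `kubernetes` Python package (`pip install kubernetes`), the `default` namespace, and a KubeRay release that serves the `ray.io/v1` API, none of which are prerequisites of this guide.

```python
# Sketch: inspect the RayCluster object generated by `kubectl ray create cluster`.
# Assumes the `kubernetes` Python package and the `default` namespace.
from kubernetes import client, config

config.load_kube_config()  # reuses the credentials fetched by get-credentials
api = client.CustomObjectsApi()

ray_cluster = api.get_namespaced_custom_object(
    group="ray.io", version="v1", namespace="default",
    plural="rayclusters", name="my-ray-cluster",
)

# The worker group spec should carry the compute class node selector
# that was passed through --worker-node-selectors.
worker_group = ray_cluster["spec"]["workerGroupSpecs"][0]
print(worker_group["template"]["spec"].get("nodeSelector"))
```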
Verify the Ray cluster status. Use these commands to verify that your Ray cluster is up and running.
Check the RayCluster resource by running this command:

    kubectl ray get cluster

This command lists all the Ray clusters in your Kubernetes namespace. You should see `my-ray-cluster` listed. It might take a few minutes for the cluster to initialize.

Check the Kubernetes Pods by running this command:

    kubectl get pods
This command lists all the Pods that are running in your Kubernetes namespace. You should see Pods related to your Ray cluster:
- A Pod for the Ray head node, with a name like `my-ray-cluster-head-0`.
- One or more Pods for the Ray worker nodes, with names like `my-ray-cluster-worker-group-0-xxxxx`. The number of worker Pods depends on the initial configuration of your Ray cluster and any autoscaling that might have occurred.
Submit a Ray job
After your Ray cluster is ready, you can submit a Ray job to your running Ray cluster on GKE.
To do this, use the `kubectl ray session` command to start an interactive session, and the `ray job submit` command to start job execution.
Start a Ray interactive session. To establish a local connection to your Ray cluster, run this command:
    kubectl ray session my-ray-cluster
This command initiates port forwarding between your local machine and the Ray head node in your GKE cluster. Note that your terminal will be occupied while this session is active; to proceed, open a separate terminal instance.
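Optionally, before submitting anything, you can confirm from another terminal that the forwarded dashboard port is reachable. This short check is a sketch that uses the Ray Job Submission SDK, which is included in the `ray[default]` installation from the prerequisites.

```python
# Optional check that the `kubectl ray session` port forward is working.
from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://localhost:8265")
print(client.list_jobs())  # prints [] until you submit your first job
```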
Create the sample code to run. Save the following example code to a file named `sample_code.py`:

    import ray
    import os
    import requests

    ray.init()

    @ray.remote
    class Counter:
        def __init__(self):
            # Used to verify runtimeEnv
            self.name = os.getenv("counter_name")
            assert self.name == "test_counter"
            self.counter = 0

        def inc(self):
            self.counter += 1

        def get_counter(self):
            return "{} got {}".format(self.name, self.counter)

    counter = Counter.remote()

    for _ in range(5):
        ray.get(counter.inc.remote())

    print(ray.get(counter.get_counter.remote()))

    # Verify that the correct runtime env was used for the job.
    assert requests.__version__ == "2.26.0"
Open a new terminal and submit the Ray job.
- Open a new terminal on your local machine. If you created a virtual environment for your Ray installation, ensure you activate it in this new terminal as well.
- Navigate to the directory where you saved the `sample_code.py` file. Run the following command:
      ray job submit \
          --working-dir=. \
          --runtime-env-json='{"pip": ["requests==2.26.0"], "env_vars": {"counter_name": "test_counter"}}' \
          --address http://localhost:8265 python sample_code.py
The example command uses these settings:

- `--working-dir=.`: specifies the current directory as the working directory to be made available to the Ray job on the cluster. Ideally, this directory contains just the application source code you want to run on Ray. In our example, this is the `sample_code.py` file.
- `--runtime-env-json='{"pip": ["requests==2.26.0"], "env_vars": {"counter_name": "test_counter"}}'`: defines the runtime environment for the job. It specifies the required Python library version (`requests==2.26.0`) and sets an environment variable (`counter_name`) for the job's execution.
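If you prefer to submit the job from Python instead of the CLI, the Ray Job Submission SDK accepts the same settings. The following is a minimal sketch of an equivalent submission, not a separate step in this guide; it assumes you run it from the same directory and through the same forwarded address.

```python
# Sketch: programmatic equivalent of the `ray job submit` command above.
from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://localhost:8265")
job_id = client.submit_job(
    entrypoint="python sample_code.py",
    runtime_env={
        "working_dir": ".",  # same directory that contains sample_code.py
        "pip": ["requests==2.26.0"],
        "env_vars": {"counter_name": "test_counter"},
    },
)
print(f"Submitted job: {job_id}")
```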
View the Ray job. You can monitor the job in one of these ways:

- Ray dashboard: open your web browser and go to `http://localhost:8265`. This URL opens the Ray dashboard, where you can see information about your running job, including its status, logs, and resource usage.
- Metrics Explorer: use the prebuilt metrics dashboards in the Google Cloud console. For more information, see Collect and view logs and metrics for Ray clusters on GKE.
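You can also poll the job from Python in the second terminal. The sketch below uses the Job Submission SDK; `JOB_ID` is a placeholder for the submission ID that `ray job submit` printed.

```python
# Sketch: poll job status and fetch logs through the Job Submission SDK.
import time
from ray.job_submission import JobSubmissionClient, JobStatus

client = JobSubmissionClient("http://localhost:8265")
job_id = "JOB_ID"  # placeholder: use the submission ID from `ray job submit`

while True:
    status = client.get_job_status(job_id)
    print(f"Job status: {status}")
    if status in {JobStatus.SUCCEEDED, JobStatus.FAILED, JobStatus.STOPPED}:
        break
    time.sleep(5)

print(client.get_job_logs(job_id))  # full driver logs for the job
```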
Terminate the Ray session. After you finish interacting with the Ray cluster or monitoring your job, you can terminate the interactive session by pressing `Ctrl+C` in the terminal where you ran the `kubectl ray session` command.
Clean up
To avoid incurring charges to your Google Cloud account for the resources used on this page, follow these steps.
If you deployed the quickstart in a new Google Cloud project, and if you no longer need the project, delete it by completing the following steps:
- In the Google Cloud console, go to the Manage resources page.
- In the project list, select the project that you want to delete, and then click Delete.
- In the dialog, enter the project ID, and then click Shut down to delete the project.
If you did not create a new Google Cloud project, and if you no longer need the Autopilot cluster, you can delete it by running this command:
    gcloud container clusters delete my-ray-enabled-cluster \
        --location=us-central1
What's next
- Explore additional examples, best practices, and tools for Ray on GKE: For more information, see the ai-on-gke GitHub repository.
- Learn more about using accelerators on GKE: Use GPUs for parallel processing in machine learning and data analytics tasks. Use TPUs for accelerating large-scale deep learning model training and inference.
- Learn more about the Ray Operator: For more information about how to use the Ray Operator to manage Ray Deployments, including Ray jobs, see About Ray on GKE.
- Utilize Kueue for job queuing: Discover how Kueue, a Kubernetes-native job queuing system, can help manage and prioritize your AI/ML workloads, which can improve resource utilization. For more information, see Optimize GKE resource utilization for mixed AI/ML training and inference workloads.