Enable the Ray operator on Google Kubernetes Engine (GKE)


This page shows you how to enable the Ray operator for Google Kubernetes Engine (GKE), letting you use the KubeRay APIs to scale and manage Ray clusters. For more information on Ray and KubeRay, see Ray on GKE overview.

Before you begin

Before you start, make sure you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

Enable the Ray operator for a cluster

You can enable the Ray operator on new or existing Autopilot or Standard GKE clusters using the Google Cloud CLI, the Google Cloud console, or Terraform.

Console

  1. Go to the Google Kubernetes Engine page in the Google Cloud console.

  2. Click Create, then in the Standard or Autopilot section, click Configure.

  3. For Standard, from the navigation pane, under Cluster, click Features. For Autopilot, click Advanced Settings.

  4. In the AI and Machine Learning section, select the Enable Ray Operator checkbox.

  5. Click Create.

gcloud

Create a cluster using the --addons=RayOperator option:

gcloud container clusters create CLUSTER_NAME \
    --cluster-version=VERSION \
    --addons=RayOperator

Replace the following:

  • CLUSTER_NAME: the name of the new cluster.
  • VERSION: the GKE version, which must be 1.30.0-gke.1747000 or later. You can also use the --release-channel option to select a release channel. The release channel must have a default version of 1.30.0-gke.1747000 or later.
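
The minimum-version requirement above can be checked before you create a cluster. The following is a minimal sketch, not an official tool: the helper name meets_min_version and the sample version string are illustrative assumptions, and the comparison relies on a sort implementation that supports version ordering through the -V flag (for example, GNU coreutils).

```shell
# Minimum GKE version for the Ray operator add-on, as stated on this page.
MIN_VERSION="1.30.0-gke.1747000"

# Hypothetical helper: succeeds when the version passed as $1 is the same as
# or later than MIN_VERSION. It sorts both versions and checks that
# MIN_VERSION comes first.
meets_min_version() {
  lowest="$(printf '%s\n%s\n' "$MIN_VERSION" "$1" | sort -V | head -n 1)"
  [ "$lowest" = "$MIN_VERSION" ]
}

# Illustrative version string; replace it with the version you plan to use.
CANDIDATE="1.31.0-gke.1000000"
if meets_min_version "$CANDIDATE"; then
  echo "$CANDIDATE meets the minimum version"
else
  echo "$CANDIDATE is older than $MIN_VERSION"
fi
```

The check only compares version strings locally. It does not confirm that the version is actually offered in your location or release channel.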

You can enable the Ray operator on an existing cluster by using the gcloud container clusters update command with the --update-addons=RayOperator=ENABLED option.
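
As a sketch of the update path for an existing cluster, the following script prints the full command for review and runs it only when you set RUN=1. The cluster name my-ray-cluster, the helper name ray_update_cmd, and the RUN variable are illustrative assumptions, not values from this page. The command relies on your default gcloud project and location settings.

```shell
# Hypothetical cluster name; replace it with the name of your existing cluster.
CLUSTER_NAME="${CLUSTER_NAME:-my-ray-cluster}"

# Print the update command so that it can be reviewed before it is run.
ray_update_cmd() {
  echo "gcloud container clusters update $1 --update-addons=RayOperator=ENABLED"
}

ray_update_cmd "$CLUSTER_NAME"

# Run the command only when RUN=1 is set; by default this is a dry run.
if [ "${RUN:-0}" = "1" ]; then
  gcloud container clusters update "$CLUSTER_NAME" \
      --update-addons=RayOperator=ENABLED
fi
```

Keeping the dry run as the default avoids changing a cluster by accident while you adapt the cluster name.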

Terraform

The following Terraform example creates and configures a Standard cluster:

resource "google_container_cluster" "ray-cluster" {
  name     = "gke-standard-regional-ray-operator"
  location = "us-west1"

  initial_node_count = 1

  release_channel {
    channel = "RAPID"
  }

  addons_config {
    ray_operator_config {
      enabled = true
      ray_cluster_logging_config {
        enabled = true
      }
      ray_cluster_monitoring_config {
        enabled = true
      }
    }
  }

  # Setting `deletion_protection` to `true` ensures that the cluster cannot
  # be accidentally deleted by using Terraform.
  deletion_protection = false
}

To learn more about using Terraform, see Terraform support for GKE.

Versioning

The Ray operator is available in the following GKE minor versions, each with a corresponding KubeRay version:

GKE minor version    KubeRay version
1.31                 v1.2
1.30                 v1.1
1.29                 v1.1

Verify the Ray operator is enabled

You can verify that the Ray operator is enabled on an Autopilot or Standard GKE cluster using the gcloud CLI or the Google Cloud console.

Console

  1. Go to the Google Kubernetes Engine page in the Google Cloud console.

  2. In the cluster list, click the name of the cluster that you want to verify.

  3. In the Features section, confirm that the Ray Operator checkbox is selected.

gcloud

Describe your cluster:

gcloud container clusters describe CLUSTER_NAME

Replace CLUSTER_NAME with the name of the cluster.

The output is similar to the following:

# Several lines omitted
addonsConfig:
  rayOperatorConfig:
    enabled: true

This output indicates that the Ray operator is enabled for the cluster.
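
If you want to script this check, the following sketch scans the describe output for the fields shown above. The helper name ray_operator_enabled is hypothetical, and the parsing assumes the YAML indentation matches the sample output on this page (two spaces for rayOperatorConfig, four spaces for enabled). The sample text stands in for real command output so that the block runs without a cluster.

```shell
# Hypothetical helper: reads `gcloud container clusters describe` output on
# stdin and succeeds only if rayOperatorConfig contains `enabled: true`.
ray_operator_enabled() {
  awk '
    /^  rayOperatorConfig:/            { in_block = 1; next }
    in_block && /^    enabled: true/   { found = 1 }
    in_block && !/^    /               { in_block = 0 }
    END                                { exit !found }
  '
}

# Sample output copied from this page; with a real cluster you would instead
# pipe the command output:
#   gcloud container clusters describe CLUSTER_NAME | ray_operator_enabled
sample_output='addonsConfig:
  rayOperatorConfig:
    enabled: true'

if printf '%s\n' "$sample_output" | ray_operator_enabled; then
  echo "Ray operator is enabled"
else
  echo "Ray operator is not enabled"
fi
```

The helper treats a missing rayOperatorConfig block the same as a disabled add-on, so either case results in a non-zero exit status.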

What's next