You can connect to a Ray cluster on Vertex AI and develop an application using the following methods:
- Connect to the Ray cluster on Vertex AI through the Ray Client, using the version of the Vertex AI SDK for Python that includes the Ray Client functionality. Use this option if you prefer an interactive Python development environment.
  - Use the Vertex AI SDK for Python within the Colab Enterprise notebook in the Google Cloud console.
  - Use the Vertex AI SDK for Python within a Python session, shell, or Jupyter notebook.
- Write a Python script and submit the script to the Ray cluster on Vertex AI using the Ray Jobs API. Use this option if you'd rather submit jobs programmatically.
Connect to a Ray cluster through Ray Client
To use the interactive Ray Client, connect to your Ray cluster on Vertex AI. The network requirements for the connecting environment depend on the cluster's network configuration. If the cluster has public internet access (that is, a VPC network wasn't specified during cluster creation), there are no restrictions on the connecting environment. If, however, the cluster is on a private VPC network that is peered with Vertex AI, the connecting environment must be on the same VPC network as the cluster.
The Ray version on the client side must match the cluster's Ray version. By default, pip install "google-cloud-aiplatform[ray]" installs Ray version 2.33 on the client side. If the cluster's Ray version is 2.9, you must run pip install ray==2.9.3 to match the client side's Ray version to the cluster's Ray version.
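To confirm the versions match before connecting, you can check the Ray version installed on the client side. This is a minimal sketch, not part of the official setup steps:

import ray

# Print the Ray version installed on the client; it must match the cluster's
# Ray version (for example, 2.33.0).
print(ray.__version__)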
Console
In accordance with OSS Ray best practices, the logical CPU count on the Ray head node is set to 0 to avoid running any workload on the head node.
In the Google Cloud console, go to the Ray on Vertex AI page.
In the row for the cluster you created, click Open in Colab Enterprise.
The Colab Enterprise notebook opens. Follow the instructions on how to use the Vertex AI SDK for Python to connect to the Ray cluster on Vertex AI.
If a dialog screen asks you to enable APIs, click Enable.
Click Connect if you're connecting to the cluster for the first time, or Reconnect if you're reconnecting to the cluster. The notebook takes a few minutes to connect to the runtime.
Click +Create to create a new notebook.
Open the Ray on Vertex AI panel.
A list of existing clusters appears. Select a cluster and click Connect.
Code that connects to your chosen cluster appears in your open notebook.

Other actions (optional):

- To open the Ray on Vertex AI cluster list page, click Manage clusters in the Ray on Vertex AI panel.
- Select a cluster and click its more actions menu to see more options.
Run the Getting started code cell to import the Vertex AI SDK for Python and connect to the Ray cluster on Vertex AI.
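The generated Getting started cell typically resembles the connection example shown in the Python section that follows. As a minimal, hedged sketch (the exact code that Colab Enterprise generates may differ by SDK version):

import ray
import vertexai
import vertex_ray

vertexai.init()

# CLUSTER_RESOURCE_NAME identifies your Ray cluster on Vertex AI; the generated
# cell fills in your project, location, and cluster name.
CLUSTER_RESOURCE_NAME = 'projects/{}/locations/{}/persistentResources/{}'.format(PROJECT_ID, LOCATION, CLUSTER_NAME)

ray.init('vertex_ray://{}'.format(CLUSTER_RESOURCE_NAME))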
Python
In accordance with OSS Ray best practices, the logical CPU count on the Ray head node is set to 0 to avoid running any workload on the head node.
From an interactive Python environment:
import ray
# Necessary even if aiplatform.* symbol is not directly used in your program.
from google.cloud import aiplatform
import vertex_ray
import vertexai

vertexai.init()

# The CLUSTER_RESOURCE_NAME is the one returned from vertex_ray.create_ray_cluster.
CLUSTER_RESOURCE_NAME = 'projects/{}/locations/{}/persistentResources/{}'.format(PROJECT_ID, LOCATION, CLUSTER_NAME)

ray.init('vertex_ray://{}'.format(CLUSTER_RESOURCE_NAME))
Where:
LOCATION: The location you specified for your Ray cluster on Vertex AI.
PROJECT_ID: Your Google Cloud project ID. You can find the project ID in the Google Cloud console welcome page.
CLUSTER_NAME: The name of your Ray cluster on Vertex AI, specified when you created the cluster. Go to the Google Cloud console to view the list of cluster names for a project.
You should get output similar to the following:
Python version: 3.10.12
Ray version: 2.33
Vertex SDK version: 1.46.0
Dashboard: xxxx-dot-us-central1.aiplatform-training.googleusercontent.com
You can use the Dashboard URL to access the Ray dashboard from a browser. The URL is in the format https://xxxx-dot-us-central1.aiplatform-training.googleusercontent.com/. The dashboard shows submitted jobs, the number of GPUs or CPUs, and the disk space of each machine in the cluster.
Once you're connected to the Ray cluster on Vertex AI, you can develop a Ray program the same way you would for a standard OSS Ray backend.
@ray.remote
def square(x):
    print(x)
    return x * x

# Launch four parallel square tasks.
futures = [square.remote(i) for i in range(4)]
print(ray.get(futures))  # Returns [0, 1, 4, 9]
Develop an application using the Ray Jobs API
This section describes how to submit a Python program to the Ray cluster on Vertex AI using the Ray Jobs API.
Write a Python script
Develop your application as a Python script in any text editor. For example, place the following script in a my_script.py file:
import ray
import time

@ray.remote
def hello_world():
    return "hello world"

@ray.remote
def square(x):
    print(x)
    time.sleep(100)
    return x * x

ray.init()  # No need to specify address="vertex_ray://...."

print(ray.get(hello_world.remote()))
print(ray.get([square.remote(i) for i in range(4)]))
Submit a Ray job using the Ray Jobs API
You can submit a Ray job using Python, the Ray Jobs CLI, or the public Ray dashboard address.
Python - cluster resource name
Submit a Ray job using a Python environment:
import ray
import vertex_ray
from ray.job_submission import JobSubmissionClient
from google.cloud import aiplatform  # Necessary even if aiplatform.* symbol is not directly used in your program.

CLUSTER_RESOURCE_NAME = 'projects/{}/locations/REGION/persistentResources/{}'.format(PROJECT_ID, CLUSTER_NAME)

client = JobSubmissionClient("vertex_ray://{}".format(CLUSTER_RESOURCE_NAME))

job_id = client.submit_job(
    # Entrypoint shell command to execute
    entrypoint="python my_script.py",
    # Path to the local directory that contains the my_script.py file.
    runtime_env={
        "working_dir": "./directory-containing-my-script",
        "pip": [
            "numpy",
            "setuptools<70.0.0",
            "xgboost",
            "ray==CLUSTER_RAY_VERSION",  # Pin the Ray version to the same version as the cluster.
        ],
    },
)

# Ensure that the Ray job has been created.
print(job_id)
Where:
REGION: The region you specified for your Ray cluster on Vertex AI.
PROJECT_ID: Your Google Cloud project ID. You can find the project ID in the Google Cloud console welcome page.
CLUSTER_NAME: The name of your Ray cluster on Vertex AI, specified when you created the cluster. Go to the Google Cloud console to view the list of cluster names for a project.
CLUSTER_RAY_VERSION: Pin the Ray version to the same version as the cluster. For example, 2.33.0.
Python - Ray dashboard
The Ray dashboard address is accessible from outside the VPC, including the public internet.
Note that importing vertex_ray is required to obtain authentication automatically.
from ray.job_submission import JobSubmissionClient
import vertex_ray

DASHBOARD_ADDRESS = DASHBOARD_ADDRESS

client = JobSubmissionClient(
    "vertex_ray://{}".format(DASHBOARD_ADDRESS),
)

job_id = client.submit_job(
    # Entrypoint shell command to execute
    entrypoint="python my_script.py",
    # Path to the local directory that contains the my_script.py file
    runtime_env={
        "working_dir": "./directory-containing-my-script",
        "pip": [
            "numpy",
            "setuptools<70.0.0",
            "xgboost",
            "ray==CLUSTER_RAY_VERSION",  # Pin the Ray version to the same version as the cluster.
        ],
    },
)

print(job_id)
Where:
DASHBOARD_ADDRESS: The Ray dashboard address for your cluster. You can find the dashboard address using the Vertex AI SDK for Python.
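As a hedged sketch of one way to look up the address with the Vertex AI SDK for Python (the get_ray_cluster() call and the dashboard_address field are assumptions; check your SDK version):

from google.cloud import aiplatform
import vertex_ray

aiplatform.init(project=PROJECT_ID, location=REGION)

# Look up the cluster and read its dashboard address.
cluster = vertex_ray.get_ray_cluster(CLUSTER_RESOURCE_NAME)
print(cluster.dashboard_address)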
Ray Jobs CLI
Note that you can only use the Ray Jobs CLI commands within the peered VPC network.
$ ray job submit --working-dir ./ --address vertex_ray://{CLUSTER_RESOURCE_NAME} -- python my_script.py
After submitting a long-running Ray job, if you want to monitor the job status using client.get_job_status(job_id), you might have to re-instantiate the JobSubmissionClient (client = JobSubmissionClient("vertex_ray://{}".format(CLUSTER_RESOURCE_NAME))) to refresh the authentication token.
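For example, a minimal polling sketch (assuming the CLUSTER_RESOURCE_NAME and job_id from the earlier submission example) that re-creates the client on each check:

import time
from ray.job_submission import JobSubmissionClient, JobStatus
import vertex_ray  # Required to obtain authentication automatically.

# Poll until the job reaches a terminal state, re-creating the client on each
# check so the authentication token stays fresh.
while True:
    client = JobSubmissionClient("vertex_ray://{}".format(CLUSTER_RESOURCE_NAME))
    status = client.get_job_status(job_id)
    print(status)
    if status in (JobStatus.SUCCEEDED, JobStatus.FAILED, JobStatus.STOPPED):
        break
    time.sleep(60)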
Support for VPC peering and custom service account
On a public network, Ray on Vertex AI supports the Ray Client and the Ray Jobs API (JobSubmissionClient) for both the default service agent and custom service accounts.
The following table shows Ray on Vertex AI support for VPC peering when the Ray cluster is created with a VPC network:
| VPC peering | Default service agent | Custom service account |
|---|---|---|
| Ray Client (interactive mode) | Yes | No |
| Ray JobSubmissionClient | Yes | Yes |
VPC Service Controls (VPC-SC) require additional configurations. See Private and public connectivity for more details.
Use Network File System (NFS) in your Ray code
If you set an NFS mount when creating the Ray cluster, you can read and write those NFS volumes in your application code.
RayClient
This section shows you how to use Network File System (NFS) in your Ray code.
Initialize the RayClient in a Python environment
import ray
from google.cloud import aiplatform
import vertex_ray

aiplatform.init(project=PROJECT_ID, location=REGION)

ray.init(address='vertex_ray://projects/{}/locations/us-central1/persistentResources/{}'.format(PROJECT_NUMBER, PERSISTENT_RESOURCE_ID))
Run job script
import ray
import logging
import os
import sys

@ray.remote
def main():
    logging.info("list all files in mounted folder")
    return os.listdir("/mnt/nfs/test")

print(''.join(ray.get(main.remote())))
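Writing works the same way as reading. As a hedged sketch (it assumes the ray.init() connection shown earlier, /mnt/nfs/test is assumed to be your cluster's mount point, and the hello.txt file name is only for illustration):

import ray

@ray.remote
def write_file():
    # Write a small file to the mounted NFS volume.
    path = "/mnt/nfs/test/hello.txt"
    with open(path, "w") as f:
        f.write("hello from a Ray task")
    return path

print(ray.get(write_file.remote()))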
You can submit a Ray job using Python, the Ray Jobs CLI, or the public Ray dashboard address. For more information, see Develop an application on the Ray cluster on Vertex AI.