Vertex AI custom training
Vertex AI custom training provides the following strengths and capabilities:
- Custom training: Lets you run your own training code in a managed environment, using either prebuilt or custom containers. You have control over the training process, libraries, and infrastructure.
- Hyperparameter tuning: Helps optimize model performance by automatically searching for the best hyperparameter configurations.
- Training pipelines: Lets you define and run complex ML workflows with multiple steps.
- Distributed training: Supports training machine learning models across multiple machines (nodes) in a managed training cluster to accelerate the training process for large datasets and complex models.
- Lifecycle: Provides a managed platform for the entire model training lifecycle, from data preparation to model evaluation.
- Scalability: Offers scalable compute resources, including CPUs, GPUs, and TPUs. You can configure the number and type of machines for your training jobs.
- Integration: Integrates with other Vertex AI services such as Vertex AI Datasets and Vertex AI Experiments.
- Flexibility: Provides high flexibility over both your training code and its runtime environment.
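To make the resource-configuration point concrete, the following is a minimal sketch of assembling the `worker_pool_specs` payload that a Vertex AI custom training job takes. The image URI, machine type, and arguments are placeholders; in a real project you would pass this list to `google.cloud.aiplatform.CustomJob` and call `job.run()`.

```python
# Sketch: build the worker_pool_specs list for a single-replica GPU
# custom container job. All names and URIs below are placeholders.

def build_worker_pool_specs(image_uri, machine_type="n1-standard-8",
                            accelerator_type="NVIDIA_TESLA_T4",
                            accelerator_count=1, replica_count=1,
                            args=None):
    """Assemble a worker pool spec for a custom-container training job."""
    return [
        {
            "machine_spec": {
                "machine_type": machine_type,
                "accelerator_type": accelerator_type,
                "accelerator_count": accelerator_count,
            },
            "replica_count": replica_count,
            "container_spec": {
                "image_uri": image_uri,
                "args": args or [],
            },
        }
    ]

specs = build_worker_pool_specs(
    "us-docker.pkg.dev/my-project/my-repo/trainer:latest",  # placeholder
    args=["--epochs", "10"],
)
print(specs[0]["machine_spec"]["machine_type"])  # n1-standard-8
```

Because the spec is plain data, you can vary machine count and accelerator type per job without changing your training code.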
Ray on Vertex AI
Ray on Vertex AI uses the open-source Ray framework to scale AI and Python applications within Vertex AI. It provides managed infrastructure for distributed computing and parallel processing, and can be used to scale training workloads, run distributed applications, and even serve models.
Ray on Vertex AI includes the following:
- Focus: Provides a managed environment for running distributed applications using the Ray framework. It simplifies the management of Ray clusters on Google Cloud.
- Scalability: Designed for high scalability using Ray's distributed computing capabilities. You can create Ray clusters with up to 2,000 nodes. Ray on Vertex AI supports both manual and autoscaling of Ray clusters based on resource needs.
- Ease of use: If you are already familiar with Ray, you can often use your existing Ray code with minimal changes. Vertex AI manages the underlying infrastructure for the Ray cluster. However, it might require a deeper understanding of Ray concepts compared to basic custom training jobs.
- Integration: Integrates with other Google Cloud services like Vertex AI Prediction and BigQuery. You can read and write data with BigQuery from your Ray cluster.
- Flexibility: Offers high flexibility for building distributed applications beyond just model training. Ray on Vertex AI supports various frameworks like TensorFlow, PyTorch, and scikit-learn. You can also run Spark on Ray using RayDP.
Key differences between Vertex AI custom training and Ray on Vertex AI
Vertex AI custom training is a broader service managing various training methods, while Ray on Vertex AI specifically uses the Ray distributed computing framework.
While Ray on Vertex AI can be used for scaling training, it's also designed for more general-purpose distributed Python applications, including data processing and model serving. Vertex AI training is primarily focused on model development and training.
Vertex AI training offers different levels of abstraction (AutoML being the highest, and custom training the lowest). Ray on Vertex AI provides a specific framework that requires understanding Ray's concepts.
With general Vertex AI custom training, you configure resources for individual training jobs. With Ray on Vertex AI, you manage Ray clusters, and Ray handles the distribution of tasks within the cluster.
Compare the distributed training options
The following compares Vertex AI custom training and Ray on Vertex AI with respect to distributed training.
Framework-centric versus Ray-centric
- Vertex AI distributed training is typically tied to the distributed capabilities of a specific ML framework (for example, TensorFlow or PyTorch). You configure Vertex AI to provide the necessary infrastructure for that framework to manage the distribution.
- Ray on Vertex AI uses Ray as the central distributed computing framework. You structure your application using Ray's primitives, and Ray handles the distribution of work across the cluster, regardless of the underlying ML framework used within the Ray tasks.
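As an illustration of the framework-centric approach, a training script on Vertex AI typically discovers its role in the cluster from an environment variable such as `CLUSTER_SPEC` that the service sets on each replica. The JSON shape used below is an assumption based on Vertex AI's documented format; verify the field names against the current docs before relying on them.

```python
import json
import os

# Assumed CLUSTER_SPEC-style payload (field names are an assumption,
# modeled on Vertex AI's documented distributed training format).
sample = {
    "cluster": {
        "workerpool0": ["host0:2222"],                 # primary replica
        "workerpool1": ["host1:2222", "host2:2222"],   # workers
    },
    "task": {"type": "workerpool1", "index": 1},
}
os.environ["CLUSTER_SPEC"] = json.dumps(sample)

def describe_role(env=os.environ):
    """Return (role, rank-within-pool, total node count) for this replica."""
    spec = json.loads(env["CLUSTER_SPEC"])
    task = spec["task"]
    world_size = sum(len(hosts) for hosts in spec["cluster"].values())
    role = "chief" if task["type"] == "workerpool0" else "worker"
    return role, task["index"], world_size

print(describe_role())  # ('worker', 1, 3)
```

The ML framework (TensorFlow, PyTorch) then uses this topology to set up its own collective communication; Vertex AI only provides the infrastructure and the topology description.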
Configuration of distribution
- In Vertex AI distributed training, you configure the number and types of replicas for a specific training job.
- With Ray on Vertex AI, you configure the size and composition of the Ray cluster, and then Ray's scheduler dynamically distributes tasks and actors across the available nodes.
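The two configuration styles can be contrasted as plain data. This is a hypothetical sketch (not the actual SDK APIs): with custom training you size one job's replicas, while with Ray on Vertex AI you size a cluster and let Ray's scheduler place work on it.

```python
# Hypothetical configuration sketch (not the actual SDK APIs).

# Per-job configuration: replicas belong to a single training job and
# exist only for that job's lifetime.
training_job_config = {
    "display_name": "my-training-job",   # placeholder name
    "replica_count": 4,
    "machine_type": "n1-standard-16",
}

# Cluster configuration: head and worker nodes outlive any single
# workload; Ray schedules tasks and actors onto them dynamically.
ray_cluster_config = {
    "head_node": {"machine_type": "n1-standard-16", "node_count": 1},
    "worker_nodes": [{"machine_type": "n1-standard-16", "node_count": 8}],
}

def total_nodes(cluster):
    """Count the nodes Ray's scheduler can place work on."""
    return cluster["head_node"]["node_count"] + sum(
        pool["node_count"] for pool in cluster["worker_nodes"]
    )

print(total_nodes(ray_cluster_config))  # 9
```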
Scope of distribution
- Vertex AI distributed training is generally focused on a single, potentially long-running training job.
- Ray on Vertex AI provides a more persistent and general-purpose distributed computing environment where you can run multiple distributed tasks and applications over the lifecycle of the Ray cluster.
Summary
If you want to use the Ray framework for distributed computing within Google Cloud, Ray on Vertex AI is the service to use. It can be considered a specific tool within the larger Vertex AI ecosystem, particularly useful for highly scalable and distributed workloads.
If you need a more general-purpose managed platform for various model training approaches, including automated options, custom code execution, and hyperparameter tuning, the broader Vertex AI custom training services are a better fit.