This page shows how to configure Google Kubernetes Engine (GKE) to collect logs and metrics for Ray clusters running on GKE, and how to view Ray logs and metrics in Cloud Logging and Cloud Monitoring. For more information on Ray and KubeRay, see Ray on Google Kubernetes Engine (GKE) overview.
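Before you begin
Before you start, make sure that you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
- Enable the Ray operator for Google Kubernetes Engine (GKE).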
Requirements and limitations
- You must enable system and workload logging on an existing GKE cluster before you enable log collection for Ray clusters.
- If you enable log collection for Ray clusters on an existing GKE cluster, GKE collects logs only from newly created Ray Pods, not from existing Ray Pods.
- For GKE Standard clusters, you must enable Google Cloud Managed Service for Prometheus before you can enable metrics collection for Ray clusters. For Autopilot clusters, Google Cloud Managed Service for Prometheus is enabled by default.
- You must not specify a volume named ray-logs in any Ray container in the Ray cluster. Otherwise, GKE doesn't collect logs.
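The following is a minimal sketch for an existing Standard cluster. It assumes a bash shell with the gcloud CLI installed and authenticated, and it covers the logging and Managed Service for Prometheus requirements with two separate gcloud container clusters update calls. --logging=SYSTEM,WORKLOAD turns on system and workload logging, and --enable-managed-prometheus turns on Managed Service for Prometheus. The helper names run_or_print and prepare_standard_cluster are hypothetical. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
#!/usr/bin/env bash
# Sketch: prepare an existing GKE Standard cluster for Ray log and metrics
# collection. Helper names are hypothetical; DRY_RUN=1 prints the commands.
set -euo pipefail

# Run a command, or print it on one line when DRY_RUN=1.
run_or_print() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: prepare_standard_cluster CLUSTER_NAME LOCATION
prepare_standard_cluster() {
  local cluster="$1" location="$2"

  # Requirement: system and workload logging must be enabled
  # before Ray log collection is enabled.
  run_or_print gcloud container clusters update "$cluster" \
    --location="$location" \
    --logging=SYSTEM,WORKLOAD

  # Requirement (Standard clusters only): Managed Service for Prometheus
  # must be enabled before Ray metrics collection.
  run_or_print gcloud container clusters update "$cluster" \
    --location="$location" \
    --enable-managed-prometheus
}
```

For example, DRY_RUN=1 prepare_standard_cluster my-cluster us-central1 prints both commands without changing the cluster. The two settings use separate calls on the assumption that gcloud container clusters update may reject a single invocation that combines unrelated updates.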
Enable log collection for a Ray cluster
You can enable log collection for Ray clusters on new or existing GKE Autopilot or Standard clusters. The Ray logs that GKE collects from Ray clusters are classified as container logs. These include all logs produced by the Ray cluster head node and worker nodes.
You can enable log collection for Ray clusters using the Google Cloud console or the gcloud CLI.
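Console
1. Go to the Google Kubernetes Engine page in the Google Cloud console.
2. Click Create, then in the Standard or Autopilot section, click Configure.
3. From the navigation pane, under Cluster, click Features.
4. In the Operations section, ensure that the System and Workloads checkbox is selected.
5. In the AI and Machine Learning section, select Enable Ray Operator and then select Enable log collection for Ray clusters.
6. Click Create.
For Standard clusters, you must also enable Google Cloud Managed Service for Prometheus.
gcloud
Create a cluster using the --addons=RayOperator option and the --enable-ray-cluster-logging option:

    gcloud container clusters create CLUSTER_NAME \
        --location=LOCATION \
        --addons=RayOperator \
        --enable-ray-cluster-logging

Replace the following:
- CLUSTER_NAME: the name of the new cluster.
- LOCATION: the location of the new cluster, for example, us-central1.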
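You can enable log collection for Ray clusters on an existing cluster by using the gcloud container clusters update command with the --addons=RayOperator option and the --enable-ray-cluster-logging option.
Note: You might see logs from Ray head and worker Pods (standard output and standard error) in GKE even when the Enable log collection for Ray clusters option isn't selected. This is expected, because GKE collects all workload logs written to standard output or standard error by default. The Enable log collection for Ray clusters option controls only the collection of additional Ray-specific logs, which are separate from these default workload logs. To manage which logs GKE sends to Cloud Logging by default and to reduce logging volume, see About GKE logs.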
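The --addons=RayOperator and --enable-ray-cluster-logging options apply both when you create a cluster and when you update an existing one. A small wrapper can therefore cover both cases. The following sketch assumes bash and an installed gcloud CLI. enable_ray_logging and run_or_print are hypothetical helpers, and the cluster name and location are passed as arguments. Setting DRY_RUN=1 prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: enable Ray log collection on a new or an existing GKE cluster.
# enable_ray_logging and run_or_print are hypothetical helpers.
set -euo pipefail

# Run a command, or print it on one line when DRY_RUN=1.
run_or_print() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: enable_ray_logging create|update CLUSTER_NAME LOCATION
enable_ray_logging() {
  local mode="$1" cluster="$2" location="$3"
  case "$mode" in
    create | update) ;;
    *)
      echo "mode must be 'create' or 'update', got '${mode}'" >&2
      return 2
      ;;
  esac
  # The same two options are used with both subcommands.
  run_or_print gcloud container clusters "$mode" "$cluster" \
    --location="$location" \
    --addons=RayOperator \
    --enable-ray-cluster-logging
}
```

For example, enable_ray_logging update my-cluster us-central1 turns on Ray log collection for an existing cluster. Remember that only Ray Pods created after this change have their logs collected.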
View Ray logs
You can view logs collected from Ray clusters running on GKE by using Logging.
1. Go to the Cloud Logging page in the Google Cloud console.
2. Open the query editor and paste your query expression into it.
3. Click Run query.
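Ray logs collected by GKE are container logs, so a query can start from the k8s_container resource type. The following sketch builds such a query and can also run it with gcloud logging read. It makes several assumptions. First, the KubeRay Pod labels ray.io/cluster and ray.io/node-type are assumed to appear in log entries as labels."k8s-pod/ray_io/cluster" and labels."k8s-pod/ray_io/node-type", which follows GKE's usual mapping for Pod labels. Second, the helper names and example names are hypothetical. Third, the --freshness and --limit values are arbitrary. Setting DRY_RUN=1 prints the query instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: build a Logging query for one Ray cluster and optionally run it.
# Helper names are hypothetical; the Pod-label mapping is an assumption.
set -euo pipefail

# Usage: ray_log_filter CLUSTER_NAME RAY_CLUSTER_NAME [head|worker]
ray_log_filter() {
  local cluster="$1" ray_cluster="$2" node_type="${3:-}"
  printf '%s\n' \
    'resource.type="k8s_container"' \
    "resource.labels.cluster_name=\"${cluster}\"" \
    "labels.\"k8s-pod/ray_io/cluster\"=\"${ray_cluster}\""
  # Optionally restrict the query to head or worker Pods.
  if [[ -n "$node_type" ]]; then
    printf 'labels."k8s-pod/ray_io/node-type"="%s"\n' "$node_type"
  fi
}

# Print the query when DRY_RUN=1; otherwise read the last hour of matching logs.
read_ray_logs() {
  local filter
  filter="$(ray_log_filter "$1" "$2" "${3:-}")"
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$filter"
  else
    gcloud logging read "$filter" --freshness=1h --limit=50
  fi
}
```

You can paste the printed query directly into the Logs Explorer query editor. Separate lines are combined with an implicit AND.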
Enable metrics collection for a Ray cluster
You can enable metrics collection for Ray clusters on new or existing GKE Autopilot or Standard clusters.
After you enable metrics collection for Ray clusters, GKE collects metrics from both existing and new Ray clusters. GKE collects all system metrics that Ray exports in Prometheus format.
You can enable metrics collection for Ray clusters using the Google Cloud console or the gcloud CLI.
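Console
1. Go to the Google Kubernetes Engine page in the Google Cloud console.
2. Click Create, then in the Standard or Autopilot section, click Configure.
3. From the navigation pane, under Cluster, click Features.
4. In the Operations section, ensure that the System and Workloads checkbox is selected.
5. In the AI and Machine Learning section, select Enable Ray Operator and then select Enable metrics collection for Ray clusters.
6. Click Create.
For Standard clusters, you must also enable Google Cloud Managed Service for Prometheus.
gcloud
Create a cluster using the --addons=RayOperator option and the --enable-ray-cluster-monitoring option:

    gcloud container clusters create CLUSTER_NAME \
        --location=LOCATION \
        --addons=RayOperator \
        --enable-ray-cluster-monitoring

Replace the following:
- CLUSTER_NAME: the name of the new cluster.
- LOCATION: the location of the new cluster, for example, us-central1.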
You can enable metrics collection for Ray clusters on an existing cluster by using the gcloud container clusters update command with the --addons=RayOperator option and the --enable-ray-cluster-monitoring option.
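The following sketch wraps the metrics options for both new and existing clusters. It assumes bash and an installed gcloud CLI, and enable_ray_monitoring and run_or_print are hypothetical helpers. On the create path it also passes --enable-managed-prometheus. This explicitly satisfies the Standard-cluster requirement and is an assumption you can drop if Managed Service for Prometheus is already on. The update path leaves that flag out, on the assumption that you enable Managed Service for Prometheus in a separate update call. Setting DRY_RUN=1 prints the command instead of running it.

```shell
#!/usr/bin/env bash
# Sketch: enable Ray metrics collection on a new or an existing GKE cluster.
# enable_ray_monitoring and run_or_print are hypothetical helpers.
set -euo pipefail

# Run a command, or print it on one line when DRY_RUN=1.
run_or_print() {
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: enable_ray_monitoring create|update CLUSTER_NAME LOCATION
enable_ray_monitoring() {
  local mode="$1" cluster="$2" location="$3"
  local -a args=(gcloud container clusters "$mode" "$cluster"
    --location="$location"
    --addons=RayOperator
    --enable-ray-cluster-monitoring)
  case "$mode" in
    # New Standard clusters: also turn on Managed Service for Prometheus.
    create) args+=(--enable-managed-prometheus) ;;
    # Existing clusters: enable Managed Service for Prometheus separately.
    update) ;;
    *)
      echo "mode must be 'create' or 'update', got '${mode}'" >&2
      return 2
      ;;
  esac
  run_or_print "${args[@]}"
}
```

Unlike log collection, metrics collection also applies to Ray clusters that already exist, so no Ray Pods need to be recreated after the update.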
View Ray metrics
Google Cloud Managed Service for Prometheus provides a preconfigured Ray on GKE Overview dashboard that offers a centralized view of key Ray metrics. This is the recommended way to get started quickly with monitoring your Ray clusters on GKE. The dashboard is populated automatically when you enable metrics collection for your Ray cluster.
Alternatively, to explore individual metrics collected from Ray clusters running on GKE, follow these steps:
1. Go to the Metrics Explorer page in the Google Cloud console.
2. In the Select a metric field, search for Ray-specific metrics. These metrics are typically prefixed with prometheus/ray_, for example, prometheus/ray_worker_cpu_seconds_total or prometheus/ray_memory_bytes_max.
3. To refine your search, select the appropriate resource type (for example, k8s_pod or k8s_container) and filter by labels relevant to your Ray cluster (for example, ray.io/cluster).
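Outside the console, you can also query the same metrics with PromQL through the Prometheus-compatible HTTP API of Managed Service for Prometheus. The following sketch assumes curl and an authenticated gcloud CLI. The project ID is a placeholder, and gmp_query_url and query_ray_metric are hypothetical helpers. The example metric name comes from the Metrics Explorer examples, written without the prometheus/ prefix that Metrics Explorer shows, and it might not exist in your project. Setting DRY_RUN=1 prints the endpoint and query instead of sending the request.

```shell
#!/usr/bin/env bash
# Sketch: run a PromQL query against the Managed Service for Prometheus
# HTTP API. Helper names are hypothetical; PROJECT_ID is a placeholder.
set -euo pipefail

# Build the Prometheus-compatible instant-query endpoint for a project.
gmp_query_url() {
  local project_id="$1"
  printf 'https://monitoring.googleapis.com/v1/projects/%s/location/global/prometheus/api/v1/query\n' "$project_id"
}

# Usage: query_ray_metric PROJECT_ID 'PROMQL_EXPRESSION'
query_ray_metric() {
  local project_id="$1" promql="$2" url
  url="$(gmp_query_url "$project_id")"
  if [[ "${DRY_RUN:-0}" == "1" ]]; then
    printf '%s %s\n' "$url" "$promql"
  else
    # POST the query as a URL-encoded form field, authenticated with the
    # access token of the active gcloud account.
    curl -sS --fail "$url" \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      --data-urlencode "query=${promql}"
  fi
}
```

For example, query_ray_metric PROJECT_ID 'sum by (namespace) (rate(ray_worker_cpu_seconds_total[5m]))' returns the API's JSON response. The exact metric names and labels available depend on what your Ray version exports.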
What's next
- Learn about Ray on Kubernetes (https://docs.ray.io/en/latest/cluster/kubernetes/index.html).
- Explore the KubeRay documentation (https://docs.ray.io/en/latest/cluster/kubernetes/getting-started.html).

Last updated 2025-08-04 UTC.