This page covers how to explore NVIDIA Data Center GPU Manager (DCGM) metrics associated with your Vertex AI Inference endpoints.
What is DCGM
NVIDIA Data Center GPU Manager (DCGM) is a set of tools from NVIDIA for managing and monitoring NVIDIA GPUs. If your endpoints use supported GPUs, Vertex AI Inference automatically exports Vertex AI DCGM metrics to Cloud Monitoring. These metrics provide a comprehensive view of GPU utilization, performance, and health.
Prerequisites
Before you start, make sure that Cloud Monitoring is enabled in your project. For more information, see Enable the Monitoring API.
Use DCGM metrics
To view DCGM metrics in Metrics Explorer, do the following:
1. In the Google Cloud console, go to the Metrics Explorer page.
2. Under Select a metric, select Prometheus Target.
3. Under Active metric categories, select Vertex.
4. Under Active metrics, select the metric that you want.
5. Click Apply.
You can also query the metrics by using Grafana, or by using the Prometheus API or UI.
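As a sketch of the Prometheus API route, the following builds a PromQL query against Cloud Monitoring's Prometheus-compatible HTTP endpoint. The project ID is a placeholder, and actually sending the request requires an OAuth access token (for example, from gcloud auth print-access-token), which is omitted here:

```python
# Sketch: building a PromQL query URL for Cloud Monitoring's
# Prometheus-compatible HTTP API. PROJECT_ID is a placeholder, and the
# request itself (with an Authorization header) is left out.
from urllib.parse import urlencode

PROJECT_ID = "my-project"  # placeholder: your Google Cloud project ID

# Average GPU utilization per endpoint over the last five minutes.
promql = (
    "avg by (endpoint_id) ("
    "avg_over_time(vertex_dcgm_fi_dev_gpu_util[5m])"
    ")"
)

base = (
    "https://monitoring.googleapis.com/v1/projects/"
    f"{PROJECT_ID}/location/global/prometheus/api/v1/query"
)
url = f"{base}?{urlencode({'query': promql})}"
print(url)
```

The same query string works unchanged in Grafana or the Prometheus UI once they are pointed at the project's Prometheus data source.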
Quota
DCGM metrics consume the Time series ingestion requests per minute quota of the Cloud Monitoring API. Before enabling the metrics packages, check your recent peak usage of that quota. If you are already approaching that quota limit, you can request a quota-limit increase.
Vertex AI DCGM metrics
The Cloud Monitoring metric names in the following table must be prefixed with prometheus.googleapis.com/. That prefix is omitted from the entries in the table.
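The prefix rule can be sketched as a small helper; the helper name is my own, not part of any API:

```python
# Sketch: turning a table entry into the full Cloud Monitoring metric
# type by restoring the prefix that the table omits.
PREFIX = "prometheus.googleapis.com/"

def full_metric_type(table_entry: str) -> str:
    """Prepend the Cloud Monitoring prefix to a metric name from the table."""
    return PREFIX + table_entry

print(full_metric_type("vertex_dcgm_fi_dev_gpu_util/gauge"))
# -> prometheus.googleapis.com/vertex_dcgm_fi_dev_gpu_util/gauge
```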
Along with the labels on the prometheus_target monitored resource, all DCGM metrics collected on Vertex AI have the following labels attached to them:
GPU labels:
- gpu_model: the GPU device model, such as NVIDIA L4.
- gpu_uuid: the GPU device UUID.
- gpu_i_id: the NVIDIA Multi-Instance GPU (MIG) instance ID.
Vertex AI labels:
- deployed_model_id: the ID of the deployed model that serves inference requests.
- model_display_name: the display name of the deployed model.
- replica_id: the unique ID of the deployed model replica (the pod name).
- endpoint_id: the ID of the model endpoint.
- endpoint_display_name: the display name of the model endpoint.
- product: the name of the feature under Vertex AI. This is always Online Inference.
| PromQL metric name | Cloud Monitoring metric name | Kind, Type, Unit | Monitored resources | Description |
|---|---|---|---|---|
| vertex_dcgm_fi_dev_fb_free | vertex_dcgm_fi_dev_fb_free/gauge | GAUGE, DOUBLE, 1 | prometheus_target | Free frame buffer in MB. |
| vertex_dcgm_fi_dev_fb_total | vertex_dcgm_fi_dev_fb_total/gauge | GAUGE, DOUBLE, 1 | prometheus_target | Total frame buffer of the GPU in MB. |
| vertex_dcgm_fi_dev_fb_used | vertex_dcgm_fi_dev_fb_used/gauge | GAUGE, DOUBLE, 1 | prometheus_target | Used frame buffer in MB. |
| vertex_dcgm_fi_dev_gpu_temp | vertex_dcgm_fi_dev_gpu_temp/gauge | GAUGE, DOUBLE, 1 | prometheus_target | Current temperature readings for the device (in °C). |
| vertex_dcgm_fi_dev_gpu_util | vertex_dcgm_fi_dev_gpu_util/gauge | GAUGE, DOUBLE, 1 | prometheus_target | GPU utilization (in %). |
| vertex_dcgm_fi_dev_mem_copy_util | vertex_dcgm_fi_dev_mem_copy_util/gauge | GAUGE, DOUBLE, 1 | prometheus_target | Memory utilization (in %). |
| vertex_dcgm_fi_dev_memory_temp | vertex_dcgm_fi_dev_memory_temp/gauge | GAUGE, DOUBLE, 1 | prometheus_target | Memory temperature for the device (in °C). |
| vertex_dcgm_fi_dev_power_usage | vertex_dcgm_fi_dev_power_usage/gauge | GAUGE, DOUBLE, 1 | prometheus_target | Power usage for the device (in Watts). |
| vertex_dcgm_fi_dev_sm_clock | vertex_dcgm_fi_dev_sm_clock/gauge | GAUGE, DOUBLE, 1 | prometheus_target | SM clock frequency (in MHz). |
| vertex_dcgm_fi_dev_total_energy_consumption | vertex_dcgm_fi_dev_total_energy_consumption/counter | CUMULATIVE, DOUBLE, 1 | prometheus_target | Total energy consumption for the GPU in mJ since the driver was last reloaded. |
| vertex_dcgm_fi_prof_dram_active | vertex_dcgm_fi_prof_dram_active/gauge | GAUGE, DOUBLE, 1 | prometheus_target | The ratio of cycles the device memory interface is active sending or receiving data. |
| vertex_dcgm_fi_prof_gr_engine_active | vertex_dcgm_fi_prof_gr_engine_active/gauge | GAUGE, DOUBLE, 1 | prometheus_target | The ratio of time the graphics engine is active. |
| vertex_dcgm_fi_prof_nvlink_rx_bytes | vertex_dcgm_fi_prof_nvlink_rx_bytes/gauge | GAUGE, DOUBLE, 1 | prometheus_target | The rate of active NVLink rx (read) data in bytes, including both header and payload. |
| vertex_dcgm_fi_prof_nvlink_tx_bytes | vertex_dcgm_fi_prof_nvlink_tx_bytes/gauge | GAUGE, DOUBLE, 1 | prometheus_target | The rate of active NVLink tx (transmit) data in bytes, including both header and payload. |
| vertex_dcgm_fi_prof_pcie_rx_bytes | vertex_dcgm_fi_prof_pcie_rx_bytes/gauge | GAUGE, DOUBLE, 1 | prometheus_target | The rate of active PCIe rx (read) data in bytes, including both header and payload. |
| vertex_dcgm_fi_prof_pcie_tx_bytes | vertex_dcgm_fi_prof_pcie_tx_bytes/gauge | GAUGE, DOUBLE, 1 | prometheus_target | The rate of active PCIe tx (transmit) data in bytes, including both header and payload. |
| vertex_dcgm_fi_prof_pipe_fp16_active | vertex_dcgm_fi_prof_pipe_fp16_active/gauge | GAUGE, DOUBLE, 1 | prometheus_target | The ratio of cycles that the fp16 pipe is active. |
| vertex_dcgm_fi_prof_pipe_fp32_active | vertex_dcgm_fi_prof_pipe_fp32_active/gauge | GAUGE, DOUBLE, 1 | prometheus_target | The ratio of cycles that the fp32 pipe is active. |
| vertex_dcgm_fi_prof_pipe_fp64_active | vertex_dcgm_fi_prof_pipe_fp64_active/gauge | GAUGE, DOUBLE, 1 | prometheus_target | The ratio of cycles that the fp64 pipe is active. |
| vertex_dcgm_fi_prof_pipe_tensor_active | vertex_dcgm_fi_prof_pipe_tensor_active/gauge | GAUGE, DOUBLE, 1 | prometheus_target | The ratio of cycles that any tensor pipe is active. |
| vertex_dcgm_fi_prof_sm_active | vertex_dcgm_fi_prof_sm_active/gauge | GAUGE, DOUBLE, 1 | prometheus_target | The ratio of cycles an SM has at least one warp assigned. |
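The frame buffer metrics in the table combine naturally into derived signals. For example, GPU memory usage as a percentage can be computed from fb_used and fb_total; the sample values below are made-up numbers for illustration only:

```python
# Sketch: deriving GPU memory usage (%) from the frame buffer metrics.
# This is the PromQL expression you would run against the metrics above.
promql = "100 * vertex_dcgm_fi_dev_fb_used / vertex_dcgm_fi_dev_fb_total"

# Mirror the same arithmetic on hypothetical sample values (in MB),
# chosen only to illustrate the calculation.
fb_used_mb = 18432.0
fb_total_mb = 24576.0
usage_percent = 100 * fb_used_mb / fb_total_mb
print(f"{usage_percent:.1f}%")  # -> 75.0%
```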
Supported GPUs
All NVIDIA GPUs are supported except the following, due to resource constraints:
What's next
- Learn more about the Metrics Explorer.