Customize the Dataflow monitoring dashboard

This page shows how to customize the Dataflow project monitoring dashboard by adding a chart that queries Cloud Monitoring metrics.

The project monitoring dashboard lets you monitor a collection of jobs and see their overall health and performance. The dashboard contains a default set of charts that are useful for most workloads. By customizing the dashboard, you can add charts that are specific to your business requirements.

Example scenario

To show how you might customize the dashboard, assume that an organization wants to track the estimated cost of the top 25 jobs in its project.

For this scenario, assume that the jobs have the following characteristics:

  • The jobs are streaming jobs that use Streaming Engine.
  • Each job's cost is dominated by worker vCPUs, memory, Persistent Disk, and data processed during shuffle.

Choose metrics

The following metrics are needed to estimate the cost of a job, given the assumptions listed in the previous section:

Resource                                               | Relevant metric
CPU (vCPU per hour)                                    | job/total_vcpu_time
Memory (GB per hour)                                   | job/total_memory_usage_time
Data processed during shuffle (GB)                     | job/total_streaming_data_processed
Persistent Disk usage (GB per hour, Standard and SSD)  | job/total_pd_usage_time

For more information, see Dataflow pricing.

To learn which resources are available in Monitoring, see Monitored resource types.

Write a PromQL query

To query metrics from Cloud Monitoring, use Prometheus Query Language (PromQL). This section shows how to write a PromQL query for the scenario by building it up from smaller clauses. This section doesn't assume any prior knowledge of PromQL. For more information, see PromQL in Cloud Monitoring.
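
All of the queries in this section follow the same general shape: apply a windowing function to a metric selector, and then aggregate the result by a label. The following sketch shows that shape with placeholder names (METRIC_NAME and LABEL_NAME) instead of real metrics and labels:

# General query shape used throughout this section:
# window the metric over 1 minute, then aggregate by a label.
sum(max_over_time(METRIC_NAME[1m])) by (LABEL_NAME)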

Map metric names to PromQL strings

To use Monitoring metrics in PromQL queries, you must map the metric name to a PromQL string, as follows:

Metric name                         | PromQL string
job/total_vcpu_time                 | dataflow_googleapis_com:job_total_vcpu_time
job/total_memory_usage_time         | dataflow_googleapis_com:job_total_memory_usage_time
job/total_streaming_data_processed  | dataflow_googleapis_com:job_total_streaming_data_processed
job/total_pd_usage_time             | dataflow_googleapis_com:job_total_pd_usage_time

For more information, see Mapping Cloud Monitoring metrics to PromQL.
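
For example, after mapping, you can query one of these metrics directly by its PromQL string. The following sketch filters on the job_id label; the value my-job-id is a hypothetical placeholder for one of your own job IDs:

# Raw vCPU-time series for a single job.
# "my-job-id" is a hypothetical placeholder; replace it with a real job ID.
dataflow_googleapis_com:job_total_vcpu_time{job_id="my-job-id"}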

Build the query

To get the estimated cost, calculate the prices for each component based on the most current value of each metric. The metrics are sampled every 60 seconds, so to get the latest value, use a 1-minute windowing function and take the maximum value within each window.

  • To get the estimated CPU cost, first convert job/total_vcpu_time from seconds to hours. Multiply by the CPU price per vCPU per hour.

    # ((vCPU time)[Bucket 1m] / Seconds per hour) * vCPU Price
    max_over_time(dataflow_googleapis_com:job_total_vcpu_time[1m]) / 3600 * CPU_PRICE
    

    This formula gives the estimated CPU cost for all jobs in the project. To get the estimated CPU cost per job, use the sum aggregation operator and group by job ID.

    sum(
      max_over_time(dataflow_googleapis_com:job_total_vcpu_time[1m]) / 3600 * CPU_PRICE
    ) by (job_id)
    
  • To get the estimated memory cost, convert job/total_memory_usage_time from seconds to hours. Multiply by the memory price per GB per hour and group by job ID.

    # ((Memory time)[Bucket 1m] / Seconds per hour) * Memory Price
    sum(
      max_over_time(dataflow_googleapis_com:job_total_memory_usage_time[1m]) / 3600 * MEMORY_PRICE
    ) by (job_id)
    
  • To get the estimated shuffle cost, convert job/total_streaming_data_processed from bytes to GB. Multiply by the price of data processed during shuffle per GB and group by job ID.

    # Shuffle Billing. Reported once every 60 seconds, measured in bytes.
    # Formula: (Shuffle Data)[Bucket 1m] / (Bytes in GB) * (Shuffle Price)
    sum(
      max_over_time(
        dataflow_googleapis_com:job_total_streaming_data_processed[1m]
      ) / 1000000000 * SHUFFLE_PRICE
    ) by (job_id)
    
  • To get the estimated Persistent Disk usage cost, index on the storage_type label to separate the values by disk type (Standard or SSD). Convert each value from GB-seconds to GB-hours and group by job ID.

    # Formula: ((Standard PD time)[Bucket 1m] / Seconds per hour) * Standard PD price
    sum(
      max_over_time(
        dataflow_googleapis_com:job_total_pd_usage_time{storage_type="HDD"}[1m]
      ) / 3600 * STANDARD_PD_PRICE
    ) by (job_id)
    
    # Formula: ((SSD PD time)[Bucket 1m] / Seconds per hour) * SSD PD price
    sum(
      max_over_time(
        dataflow_googleapis_com:job_total_pd_usage_time{storage_type="SSD"}[1m]
      ) / 3600 * SSD_PD_PRICE
    ) by (job_id)
    
  • Summing the previous values gives the estimated cost per job. To get the top 25 jobs, use a top K filter:

    topk(25,
      # Sum the individual values.
    )
    

Write the complete query

The following shows the complete query:

topk(25,
  sum(
    max_over_time(dataflow_googleapis_com:job_total_vcpu_time[1m]) / 3600 * CPU_PRICE
  ) by (job_id)
  +
  sum(
    max_over_time(dataflow_googleapis_com:job_total_memory_usage_time[1m]) / 3600 * MEMORY_PRICE
  ) by (job_id)
  +
  sum(
    max_over_time(dataflow_googleapis_com:job_total_streaming_data_processed[1m]) / 1000000000 * SHUFFLE_PRICE
  ) by (job_id)
  +
  sum(
    max_over_time(dataflow_googleapis_com:job_total_pd_usage_time{storage_type="HDD"}[1m]) / 3600 * STANDARD_PD_PRICE
  ) by (job_id)
  +
  sum(
    max_over_time(dataflow_googleapis_com:job_total_pd_usage_time{storage_type="SSD"}[1m]) / 3600 * SSD_PD_PRICE
  ) by (job_id)
)

Replace the following variables with values from the Dataflow pricing page for your region:

  • CPU_PRICE: CPU price, per vCPU per hour
  • MEMORY_PRICE: Memory price, per GB per hour
  • SHUFFLE_PRICE: Shuffle price, per GB
  • STANDARD_PD_PRICE: Standard Persistent Disk price, per GB per hour
  • SSD_PD_PRICE: SSD Persistent Disk price, per GB per hour
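
For example, if the CPU price in your region were 0.069 USD per vCPU per hour (a hypothetical value used only for illustration, not an actual Dataflow rate), the CPU term of the query would become:

# Example only: 0.069 is a hypothetical CPU price, not an actual Dataflow rate.
sum(
  max_over_time(dataflow_googleapis_com:job_total_vcpu_time[1m]) / 3600 * 0.069
) by (job_id)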

Edit the dashboard

If you didn't previously create a custom dashboard for Dataflow in this project, create one as follows:

  1. In the Google Cloud console, go to the Dataflow > Monitoring page.

    Go to Monitoring

  2. In the Predefined drop-down, click Customize dashboard.

  3. Optional: Enter a name for the dashboard in the Custom view name box.

If you previously created a custom dashboard, perform the following steps to edit the dashboard:

  1. In the Google Cloud console, go to the Dataflow > Monitoring page.

    Go to Monitoring

  2. In the Predefined drop-down, select the custom dashboard.

  3. Click Edit dashboard.

Add a metrics chart

To add a metrics chart to the dashboard, perform the following steps:

  1. Click Add widget.
  2. In the Add widget pane, select Metric.
  3. In the Widget title box, enter a title for the chart, such as Estimated cost (top 25).
  4. Click PROMQL.
  5. Paste in the PromQL query shown previously.
  6. Click Run query.
  7. Click Apply.
  8. Optional: Drag the chart to position it on the dashboard.

What's next