You can view charts in the Metrics tabs of the BigQuery Engine for Apache Flink deployments and jobs pages in the Google Cloud console.
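Deployment metrics:
- CPU utilization
- Memory utilization
Job metrics:
- Number of records in per second
- Number of records out per second
- Input watermark
- Output watermark
- Autoscaling
Autoscaling metrics:
- Current parallelism
- Recommended parallelism
- Maximum parallelism
- Minimum parallelism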
Support and limitations
To export custom metrics from your BigQuery Engine for Apache Flink job to Cloud Monitoring, the Managed Flink Default Workload Identity must have the IAM role roles/monitoring.metricWriter.
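As a rough illustration of what granting this role means, the following sketch adds a roles/monitoring.metricWriter binding to an IAM policy held as a Python dictionary, in the JSON shape that IAM policies use (a list of bindings, each with a role and members). The member string is a hypothetical placeholder, not the real address of the Managed Flink Default Workload Identity, and the helper name add_metric_writer is invented for this example. It only edits the dictionary; it does not call any Google Cloud API.

```python
import copy

METRIC_WRITER_ROLE = "roles/monitoring.metricWriter"


def add_metric_writer(policy: dict, member: str) -> dict:
    """Return a copy of an IAM policy with `member` bound to the metric writer role."""
    updated = copy.deepcopy(policy)
    bindings = updated.setdefault("bindings", [])
    for binding in bindings:
        # Reuse an existing unconditional binding for the role if there is one.
        if binding.get("role") == METRIC_WRITER_ROLE and "condition" not in binding:
            if member not in binding.setdefault("members", []):
                binding["members"].append(member)
            return updated
    bindings.append({"role": METRIC_WRITER_ROLE, "members": [member]})
    return updated


# Hypothetical placeholder: substitute the identity shown for your own project.
member = "serviceAccount:EXAMPLE_WORKLOAD_IDENTITY@example-project.iam.gserviceaccount.com"
policy = {"bindings": [{"role": "roles/viewer", "members": ["user:alice@example.com"]}]}
new_policy = add_metric_writer(policy, member)
print(new_policy["bindings"][-1]["role"])  # prints roles/monitoring.metricWriter
```

The function copies the policy rather than mutating it, so the original can be kept for comparison before anything is applied.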
Access job metrics
- Sign in to the Google Cloud console.
- Select your Google Cloud project.
- Open the navigation menu and select BigQuery Engine for Apache Flink.
- Click Deployments or Jobs.
- In the deployments or jobs list, click the name of the deployment or job.
- Click the Metrics tab.
To access additional information in the job metrics charts, click Explore data.
Use Cloud Monitoring
BigQuery Engine for Apache Flink is fully integrated with Cloud Monitoring. Use Metrics Explorer to build queries and adjust the timespan of the metrics.
For instructions about using Metrics Explorer, see Use Cloud Monitoring with BigQuery Engine for Apache Flink.
View in Metrics Explorer
You can view the BigQuery Engine for Apache Flink metrics charts in Metrics Explorer, where you can build queries and adjust the timespan of the metrics.
To view the BigQuery Engine for Apache Flink charts in Metrics Explorer, in the Metrics view, open More chart options, and then click View in Metrics Explorer.
When you adjust the timespan of the metrics, you can select a predefined duration or select a custom time interval to analyze your job.
Deployment metrics
The deployment details Metrics tab includes the following charts.
CPU utilization
CPU utilization is the amount of CPU used by the deployment at each point in time. Use this chart to track changes in CPU usage over time.
Memory utilization
Memory utilization is the amount of memory used by the deployment at each point in time. Use this chart to track changes in memory usage over time.
Job metrics
The job details Metrics tab includes the following charts. Use these metrics to monitor and debug your BigQuery Engine for Apache Flink jobs.
Number of records in per second
Number of records in per second is the number of records each operator in the job is receiving at each point in time. The data is split based on the operator, with each operator having a separate line on the graph.
This metric shows whether the job is running and processing records.
- Refer to this chart when you don't see data in downstream systems or if you have a stale input watermark.
- Use this metric to verify whether the job is ingesting records at the expected rate.
Number of records out per second
Number of records out per second is the number of records each operator in the job is sending at each point in time. The data is split based on the operator, with each operator having a separate line on the graph.
This metric shows whether the job is outputting records.
- Refer to this chart when you don't see data in downstream systems or if you have a stale output watermark.
- Use this metric to verify whether the job is processing records at the expected rate.
Input watermark
Input watermark is the most recent watermark received by each operator, in milliseconds since the Unix epoch (00:00:00 UTC on January 1, 1970), ignoring leap seconds. The data is split based on the operator, with each operator having a separate line on the graph.
This metric confirms whether the job is making progress. A healthy watermark increases with time.
- If the input and output watermarks are stale, the job processing might be stuck.
- This metric indicates when a job is stuck and where the job becomes stuck.
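Because the watermark is reported in milliseconds since the Unix epoch, the raw value is hard to read on its own. The following sketch converts a watermark to a UTC timestamp and computes how far it trails a reference time. The function names and sample values are illustrative only and are not part of any BigQuery Engine for Apache Flink API.

```python
from datetime import datetime, timezone


def watermark_to_utc(watermark_ms: int) -> datetime:
    """Convert a watermark in milliseconds since the Unix epoch to a UTC datetime."""
    return datetime.fromtimestamp(watermark_ms / 1000, tz=timezone.utc)


def watermark_lag_seconds(watermark_ms: int, now_ms: int) -> float:
    """Return how many seconds the watermark trails the reference time."""
    return (now_ms - watermark_ms) / 1000


wm = 1_700_000_000_000  # sample value read from the chart
print(watermark_to_utc(wm).isoformat())       # 2023-11-14T22:13:20+00:00
print(watermark_lag_seconds(wm, wm + 90_000))  # 90.0
```

A lag that keeps growing between readings suggests that the watermark is not advancing as fast as wall-clock time.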
Output watermark
Output watermark is the most recent watermark outputted by each operator, in milliseconds since the Unix epoch (00:00:00 UTC on January 1, 1970), ignoring leap seconds. The data is split based on the operator, with each operator having a separate line on the graph.
This metric confirms whether the job is making progress. A healthy watermark increases with time.
- This metric indicates when a job is stuck and where the job becomes stuck.
- If the input and output watermarks are stale, the job processing might be stuck.
- If the input is progressing but the output watermark is stale, the job is ingesting data but not outputting data.
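The rules above can be sketched as a small classifier over sampled watermark values for one operator. This is a minimal illustration, assuming you have copied a few consecutive readings from the charts. The staleness test (no increase across the window) and the function names are assumptions made for this example, not the service's own logic.

```python
def is_stale(samples: list[int]) -> bool:
    """Return True if the watermark did not advance across the sampled window."""
    return len(samples) >= 2 and samples[-1] <= samples[0]


def diagnose(input_wm: list[int], output_wm: list[int]) -> str:
    """Classify an operator from its input and output watermark samples."""
    in_stale, out_stale = is_stale(input_wm), is_stale(output_wm)
    if in_stale and out_stale:
        return "possibly stuck"
    if out_stale:
        return "ingesting but not outputting"
    if in_stale:
        # This combination is not described in the text above.
        return "unclassified"
    return "progressing"


print(diagnose([1000, 2000, 3000], [900, 1900, 2900]))  # progressing
print(diagnose([1000, 2000, 3000], [900, 900, 900]))    # ingesting but not outputting
print(diagnose([1000, 1000, 1000], [900, 900, 900]))    # possibly stuck
```

Running the check per operator, in graph order, shows where in the job progress stops.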
Vertex metrics
You can view metrics for individual vertices (nodes) in the job graph. To view vertex metrics:
- In the jobs list, click the job name. The Graph tab displays the job graph.
- In the job graph, click the vertex. Vertex metrics are shown in the Vertex info panel.
The following metrics are shown.
Current parallelism
The number of task slots assigned to this vertex.
Input watermark
The last watermark this operator received, in milliseconds since the Unix epoch, ignoring leap seconds.
Backlog elements
The number of elements in the operator's backlog. This metric is defined only for Apache Kafka sources.
State milliseconds per second
The number of milliseconds within the last second that this vertex was in each of the following states:
- backpressured. The vertex is waiting for downstream vertices to finish.
- busy. The vertex is processing data.
- idle. The vertex has no work to perform.
Because a vertex can contain subtasks, the total value might sum to more than 1000 milliseconds.
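Since the combined total can exceed 1000 milliseconds, it can help to normalize the three values into shares of the total before comparing vertices. The sketch below assumes the displayed values are summed across subtasks, which the text implies but does not state outright. The sample numbers are made up.

```python
def state_fractions(state_ms: dict[str, float]) -> dict[str, float]:
    """Normalize state milliseconds per second into fractions of the total."""
    total = sum(state_ms.values())
    if total == 0:
        return {state: 0.0 for state in state_ms}
    return {state: ms / total for state, ms in state_ms.items()}


# Made-up reading for a vertex whose states sum to 4000 ms in one second.
reading = {"backpressured": 1000, "busy": 2600, "idle": 400}
fractions = state_fractions(reading)
for state, share in fractions.items():
    print(f"{state}: {share:.0%}")
# backpressured: 25%
# busy: 65%
# idle: 10%
```

A high backpressured share points at a slow downstream vertex, while a high busy share points at the vertex itself.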
Input metrics
If the vertex has inputs, you can select from the following charts:
- Records in. The total number of records ingested by this vertex.
- Input bytes. The total number of bytes ingested by this vertex.
Output metrics
If the vertex has outputs, you can select from the following charts:
- Records out. The total number of records output by this vertex.
- Output bytes. The total number of bytes output by this vertex.
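The vertex charts report running totals, whereas the job-level charts report per-second rates. If you record a total at two or more points in time, you can derive an average rate for each interval, as in the following sketch. The sample format (timestamp in seconds, cumulative total) is an assumption for this example. A total that drops, for instance after a restart, is treated here as a rate of zero, which is a simplification.

```python
def per_second_rates(samples: list[tuple[float, int]]) -> list[float]:
    """Derive average per-second rates from (timestamp_seconds, cumulative_total) pairs."""
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        elapsed = t1 - t0
        if elapsed <= 0:
            raise ValueError("samples must be in strictly increasing time order")
        # Clamp negative deltas (counter went backwards) to zero.
        rates.append(max(c1 - c0, 0) / elapsed)
    return rates


records_in = [(0, 0), (10, 5000), (20, 12000)]
print(per_second_rates(records_in))  # [500.0, 700.0]
```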
Autoscaling metrics
On the job details Metrics tab, the autoscaling charts provide information about the autoscaling behavior of the job.
Current parallelism
The Current parallelism chart shows the number of task slots the job is using at any point in time. You can use this chart to understand whether the job is scaling up or down.
Recommended parallelism
The Recommended parallelism chart shows the number of task slots that the autoscaler recommends. When autoscaling is enabled for a job, BigQuery Engine for Apache Flink tries to allocate a number of task slots equal to the recommended parallelism. The current number of task slots is shown by the current parallelism metric. This value might lag behind recommended parallelism.
Recommended parallelism is always greater than or equal to the minimum parallelism, and always less than or equal to the maximum parallelism.
If the recommended parallelism consistently stays close to the maximum, consider updating the job with a higher maximum parallelism. In response, the autoscaler might raise the recommended parallelism, to take advantage of the additional slots. For more information, see Update autoscaling.
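The bounds and the "consistently close to the maximum" guidance can be illustrated with a short sketch. It does not reproduce the autoscaler's actual algorithm, which is not described here. The margin and share thresholds are hypothetical values chosen for the example, as are the function names.

```python
def clamp_parallelism(desired: int, minimum: int, maximum: int) -> int:
    """Keep a parallelism value within the [minimum, maximum] bounds."""
    return max(minimum, min(desired, maximum))


def near_maximum(recommended: list[int], maximum: int,
                 margin: int = 1, share: float = 0.8) -> bool:
    """Return True if most samples are within `margin` slots of the maximum."""
    if not recommended:
        return False
    close = sum(1 for value in recommended if value >= maximum - margin)
    return close / len(recommended) >= share


print(clamp_parallelism(14, minimum=2, maximum=10))    # 10
print(clamp_parallelism(1, minimum=2, maximum=10))     # 2
print(near_maximum([9, 10, 10, 9, 10], maximum=10))    # True
print(near_maximum([4, 5, 10, 6, 5], maximum=10))      # False
```

When a check like near_maximum returns True over a long window, that is the situation in which raising the maximum parallelism is worth considering.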
Maximum parallelism
The Maximum parallelism chart shows the maximum number of task slots available to the job at any point in time.
Minimum parallelism
The Minimum parallelism chart shows the minimum number of task slots available to the job at any point in time.