You can view charts in the Metrics tabs of the BigQuery Engine for Apache Flink deployments and jobs pages in the Google Cloud console.
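Deployment metrics:
- CPU utilization
- Memory utilization
Job metrics:
- Number of records in per second
- Number of records out per second
- Input watermark
- Output watermark
- Autoscaling
Autoscaling metrics:
- Current parallelism
- Recommended parallelism
- Maximum parallelism
- Minimum parallelism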
Support and limitations
To export custom metrics from your BigQuery Engine for Apache Flink job to Cloud Monitoring, the Managed Flink Default Workload Identity must have the IAM role roles/monitoring.metricWriter.
Access job metrics
- Sign in to the Google Cloud console.
- Select your Google Cloud project.
- Open the navigation menu and select BigQuery Engine for Apache Flink.
- Click Deployments or Jobs.
- In the deployments or jobs list, click the name of the deployment or job.
- Click the Metrics tab.
To access additional information in the job metrics charts, click Explore data.
Use Cloud Monitoring
BigQuery Engine for Apache Flink is fully integrated with Cloud Monitoring. Use Metrics Explorer to build queries and adjust the timespan of the metrics.
For instructions about using Metrics Explorer, see Use Cloud Monitoring with BigQuery Engine for Apache Flink.
View in Metrics Explorer
You can view the BigQuery Engine for Apache Flink metrics charts in Metrics Explorer, where you can build queries and adjust the timespan of the metrics.
To view the BigQuery Engine for Apache Flink charts in Metrics Explorer, in the Metrics view, open More chart options, and then click View in Metrics Explorer.
When you adjust the timespan of the metrics, you can select a predefined duration or select a custom time interval to analyze your job.
Deployment metrics
The deployment details Metrics tab includes the following charts.
CPU utilization
CPU utilization is the amount of CPU used by the deployment at each point in time. Use this chart to track changes in CPU usage over time.
Memory utilization
Memory utilization is the amount of memory used by the deployment at each point in time. Use this chart to track changes in memory usage over time.
Job metrics
The job details Metrics tab includes the following charts. Use these metrics to monitor and debug your BigQuery Engine for Apache Flink jobs.
Number of records in per second
Number of records in per second is the number of records each operator in the job is receiving at each point in time. The data is split based on the operator, with each operator having a separate line on the graph.
This metric shows whether the job is running and processing records.
- Refer to this chart when you don't see data in downstream systems or if you have a stale input watermark.
- Use this metric to verify whether the job is ingesting records at the expected rate.
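To check ingestion against an expected rate programmatically, a minimal Python sketch might average the per-operator samples and flag any operator that falls below a threshold. It assumes you have already collected the records-in-per-second samples into a dictionary keyed by operator name; the operator names, the expected_min_rate value, and the operators_below_rate helper are all hypothetical.

```python
from statistics import mean
from typing import List, Mapping, Sequence


def operators_below_rate(
    records_in_per_s: Mapping[str, Sequence[float]],
    expected_min_rate: float,
) -> List[str]:
    """Return, sorted by name, operators whose mean records-in rate is below the threshold.

    Operators with no samples are also returned, because missing data means
    ingestion cannot be confirmed.
    """
    return sorted(
        op
        for op, samples in records_in_per_s.items()
        if not samples or mean(samples) < expected_min_rate
    )


# Hypothetical operator names and samples.
print(operators_below_rate({"Source": [100, 120], "Map": [0, 0]}, expected_min_rate=50))  # ['Map']
```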
Number of records out per second
Number of records out per second is the number of records each operator in the job is sending at each point in time. The data is split based on the operator, with each operator having a separate line on the graph.
This metric shows whether the job is outputting records.
- Refer to this chart when you don't see data in downstream systems or if you have a stale output watermark.
- Use this metric to verify whether the job is processing records at the expected rate.
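When data is missing downstream, comparing records in and records out for each operator can narrow down where records stop flowing. This Python sketch is a heuristic, not a product feature: it assumes you pass intermediate operators in pipeline order with their current rates, and that you leave out sources and sinks, which might report zero records in or out by design. The operator names and the first_blocking_operator helper are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def first_blocking_operator(
    pipeline: Sequence[Tuple[str, float, float]],
) -> Optional[str]:
    """Return the first operator, in pipeline order, that receives records but emits none.

    Each entry is (operator_name, records_in_per_s, records_out_per_s).
    Pass only intermediate operators: sources and sinks might report zero
    records in or out by design. An operator that legitimately drops every
    record (for example, a strict filter) also matches, so treat the result
    as a starting point for investigation.
    """
    for name, rate_in, rate_out in pipeline:
        if rate_in > 0 and rate_out == 0:
            return name
    return None


# Hypothetical operators: "Window" receives records but emits none.
print(first_blocking_operator([("Parse", 100.0, 100.0), ("Window", 100.0, 0.0)]))  # Window
```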
Input watermark
Input watermark is the most recent watermark received by each operator, in milliseconds since the Unix epoch (00:00:00 UTC on January 1, 1970), ignoring leap seconds. The data is split based on the operator, with each operator having a separate line on the graph.
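Because the watermark is reported in milliseconds since the Unix epoch, converting it to a UTC timestamp makes it easier to compare against the wall clock. The following Python sketch assumes you already have a raw watermark value; watermark_to_utc and watermark_lag_seconds are hypothetical helpers, not part of any BigQuery Engine for Apache Flink API. Like the metric, Python's datetime arithmetic ignores leap seconds.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch in UTC; watermarks are offsets from this instant in milliseconds.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def watermark_to_utc(watermark_ms: int) -> datetime:
    """Convert a watermark (milliseconds since the Unix epoch) to a UTC datetime.

    Uses exact integer timedelta arithmetic rather than float division.
    """
    return EPOCH + timedelta(milliseconds=watermark_ms)


def watermark_lag_seconds(watermark_ms: int, now: datetime) -> float:
    """Seconds between `now` (timezone-aware) and the watermark."""
    return (now - watermark_to_utc(watermark_ms)).total_seconds()


# Example with a hypothetical watermark value.
print(watermark_to_utc(1_700_000_000_000).isoformat())  # 2023-11-14T22:13:20+00:00
```

A lag that keeps growing from one sample to the next is one sign that the operator's watermark is not keeping pace with time.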
This metric confirms whether the job is making progress. A healthy watermark increases with time.
- If the input and output watermarks are stale, the job processing might be stuck.
- This metric indicates when and where a job becomes stuck.
Output watermark
Output watermark is the most recent watermark outputted by each operator, in milliseconds since the Unix epoch (00:00:00 UTC on January 1, 1970), ignoring leap seconds. The data is split based on the operator, with each operator having a separate line on the graph.
This metric confirms whether the job is making progress. A healthy watermark increases with time.
- This metric indicates when and where a job becomes stuck.
- If the input and output watermarks are stale, the job processing might be stuck.
- If the input watermark is progressing but the output watermark is stale, the job is ingesting data but not outputting data.
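The staleness rules for input and output watermarks can be expressed as a small check. This Python sketch assumes you have collected recent input and output watermark samples for one operator, oldest first, at a regular interval. The helper names and the min_samples threshold (how many consecutive samples without progress count as stale) are hypothetical and should be tuned to your sampling interval.

```python
from typing import Sequence


def is_stale(watermarks_ms: Sequence[int], min_samples: int = 3) -> bool:
    """True if the watermark did not advance across the last `min_samples` samples.

    `min_samples` is a hypothetical threshold. With fewer samples than that,
    the function returns False because there is not enough data to judge.
    """
    if len(watermarks_ms) < min_samples:
        return False
    recent = watermarks_ms[-min_samples:]
    # Watermarks should only move forward, so "no progress" means the newest
    # sample is not later than the oldest sample in the window.
    return recent[-1] <= recent[0]


def diagnose(input_wm: Sequence[int], output_wm: Sequence[int], min_samples: int = 3) -> str:
    """Map input and output watermark staleness for one operator to a likely cause."""
    input_stale = is_stale(input_wm, min_samples)
    output_stale = is_stale(output_wm, min_samples)
    if input_stale and output_stale:
        return "input and output watermarks stale: job processing might be stuck"
    if output_stale:
        return "input progressing, output stale: ingesting data but not outputting data"
    if input_stale:
        return "input watermark stale: check the records in per second chart"
    return "watermarks progressing"
```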
Autoscaling metrics
On the job Metrics tab, the autoscaling charts provide information about the autoscaling behavior of the job.
Current parallelism
The Current parallelism chart shows the number of task slots the job is using at any point in time. You can use this chart to understand whether the job is scaling up or down.
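To list when a job scaled up or down from collected current parallelism samples, a sketch like the following compares consecutive values. It assumes the samples are integers ordered oldest first; the scaling_events helper is hypothetical.

```python
from typing import List, Sequence, Tuple


def scaling_events(parallelism: Sequence[int]) -> List[Tuple[int, str, int, int]]:
    """Return (sample_index, direction, old_value, new_value) for each change."""
    events = []
    for i in range(1, len(parallelism)):
        prev, cur = parallelism[i - 1], parallelism[i]
        if cur != prev:
            events.append((i, "up" if cur > prev else "down", prev, cur))
    return events


print(scaling_events([2, 2, 4, 4, 3]))  # [(2, 'up', 2, 4), (4, 'down', 4, 3)]
```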
Recommended parallelism
The Recommended parallelism chart shows the number of task slots that the autoscaler recommends. When autoscaling is enabled for a job, BigQuery Engine for Apache Flink tries to allocate a number of task slots equal to the recommended parallelism. The current parallelism metric shows the number of task slots the job is actually using; this value might lag behind the recommended parallelism.
Recommended parallelism is always greater than or equal to the minimum parallelism, and always less than or equal to the maximum parallelism.
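The bound on recommended parallelism can be written as a clamp. The raw_estimate input in this Python sketch is hypothetical: it stands in for whatever the autoscaler computes internally, which this page does not describe, so the sketch illustrates only the minimum and maximum bounds, not the autoscaling algorithm itself.

```python
def clamp_recommendation(raw_estimate: int, min_parallelism: int, max_parallelism: int) -> int:
    """Bound a hypothetical raw parallelism estimate to [min_parallelism, max_parallelism]."""
    if min_parallelism > max_parallelism:
        raise ValueError("min_parallelism must not exceed max_parallelism")
    return max(min_parallelism, min(raw_estimate, max_parallelism))


print(clamp_recommendation(12, 1, 8))  # 8
```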
If the recommended parallelism consistently stays close to the maximum, consider updating the job with a higher maximum parallelism. In response, the autoscaler might raise the recommended parallelism to take advantage of the additional slots. For more information, see Update autoscaling.
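This page does not define "consistently close to the maximum", so the Python sketch below uses hypothetical thresholds: a sample counts as close when the recommended parallelism is at least 90% of the maximum, and the pattern counts as consistent when at least 80% of samples are close. Both series are assumed to be aligned samples, oldest first; the consistently_near_max helper is hypothetical.

```python
from typing import Sequence


def consistently_near_max(
    recommended: Sequence[int],
    maximum: Sequence[int],
    near_ratio: float = 0.9,
    min_fraction: float = 0.8,
) -> bool:
    """True if recommended >= near_ratio * maximum in at least min_fraction of samples.

    Both thresholds are hypothetical defaults. The maximum is passed as a
    series because maximum parallelism can change when the job is updated.
    """
    if not recommended or len(recommended) != len(maximum):
        raise ValueError("series must be non-empty and of equal length")
    near = sum(1 for r, m in zip(recommended, maximum) if r >= near_ratio * m)
    return near / len(recommended) >= min_fraction
```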
Maximum parallelism
The Maximum parallelism chart shows the maximum number of task slots available to the job at any point in time.
Minimum parallelism
The Minimum parallelism chart shows the minimum number of task slots available to the job at any point in time.