Troubleshoot high database load with AI assistance

This document describes how you can use AI assistance in AlloyDB for PostgreSQL to troubleshoot high database load in AlloyDB. You can use the AI assistance capabilities of AlloyDB and Gemini Cloud Assist to investigate, analyze, obtain recommendations, and finally implement those recommendations to optimize your queries in AlloyDB.

By accessing the Query insights dashboard in the Google Cloud console, you can analyze your database and troubleshoot events when your system experiences a higher database load than average. AlloyDB uses the 24 hours of data that occurs prior to your selected time range to calculate the expected load of your database. You can look into the reasons for the higher load events and analyze the evidence behind reduced performance. Finally, AlloyDB provides recommendations for optimizing your database to improve performance.

Before you begin

To troubleshoot high database load with AI assistance, do the following:

  1. Review limitations with AI-assisted troubleshooting.
  2. Enable AI-assisted troubleshooting.

Required roles and permissions

For the roles and permissions required to troubleshoot high database load with AI assistance, see Monitor and troubleshoot with AI.

Use AI assistance

To use AI assistance with troubleshooting high database load, go to the Instance Overview page or the Query insights dashboard in the Google Cloud console.

Instance overview page

Troubleshoot high database load with AI assistance in the Instance overview page by using the following steps:

  1. In the Google Cloud console, go to the Clusters page.
  2. Go to Clusters
  3. From the list of clusters and instances, click an instance.
  4. In the Overview page, from the Chart menu, select a metric for the database. You can select any metric.
  5. Optional: To select a specific analysis time period, use the Time range filter to select either 1 hour, 6 hours, 1 day, 7 days, 30 days or a custom range .
  6. Database performance chart on the Overview page that shows CPU utilization over a 24 hour time period and
    Analyze database performance.

    You can zoom in to specific sections of the chart where you notice areas of high load that you want to analyze. For example, an area of high load might display CPU utilization levels closer to 100%. To zoom in, click and select a portion of the chart.

    Zoom in to database performance chart.
  7. Click Analyze database performance to start troubleshooting high database load with AI assistance. This generates the Analyzing database load page.

Query insights dashboard

Troubleshoot high database load with AI assistance in the Query insights dashboard using the following steps:

  1. In the Google Cloud console, go to the Clusters page.
  2. Go to Clusters
  3. From the list of clusters and instances, click an instance.
  4. Click Query insights to open the Query insights dashboard.
  5. Optional: Use the Time range filter to select either 1 hour, 6 hours, 1 day, 7 days, 30 days or a custom range.
  6. Database Load chart on the Query insights chart that shows query latency over a 24 hour time period and an Analyze database performance button.

    You can zoom in to specific sections of the chart where you notice areas of higher database load by query execution time. To zoom in, click and select a portion of the chart.

  7. In the Database load chart, click the Analyze database performance to start troubleshooting high database load with AI assistance. This generates the Analyzing database load page.

Analyze high database load

Using AI assistance, you can analyze and troubleshoot the details of your database load.

In the Analyzing database load page, you can view following details for your AlloyDB instance:

  • Analysis time period
  • CPU utilization (p99)
  • Memory utilization (p99)

AlloyDB displays a Transactions/sec chart where you can look at the transactional activity during the selected time period. You can check for sudden spikes in activity during a particular time period.

Analysis time period

AlloyDB analyzes your database for the time period that you select in your database load chart from the Query insights dashboard or the Instance overview page. If you select a time period of less than 24 hours, then AlloyDB analyzes the entire time period. If you select a time period greater than 24 hours, then AlloyDB selects only the last 24 hours of the time period for analysis.

To calculate the baseline performance analysis of your database, AlloyDB includes 24 hours of a baseline time period in its analysis time period. If your selected time period occurs on a day other than Monday, then AlloyDB uses a baseline time period of the 24 hours previous to your selected time period. If your selected time period occurs on a Monday, then AlloyDB uses a baseline time period of the 7th day previous to your selected time period.

Situation

When AlloyDB starts the analysis, AlloyDB checks for significant changes in the following key metrics:

  • Queries per second (QPS)
  • CPU
  • Memory
  • Disk I/O

AlloyDB compares the baseline aggregated data for your database within the performance data of your analysis time window. If AlloyDB detects a significant change in threshold for a key metric, then AlloyDB indicates a possible situation with your database. The identified situation might explain a root cause for the high load on your database over the selected time period.

For example, one situation for why your database is experiencing high load might be identified as Lock contention.

During analysis, AlloyDB might determine there's been a significant increase in lock-wait ratio. AlloyDB might list other situations where key metrics indicate a significant increase. For example, you might also see the following situations listed:

  • Contention on system resources
  • Insufficient buffer
  • Excessive logging

Situation and evidence for Cloud SQL for PostgreSQL

Evidence

For each situation, AlloyDB provides a list of evidence to support the finding. AlloyDB bases evidence on metrics gathered from the instance.

Each situation has supporting evidence that's used to detect anomalies in system performance. AlloyDB detects an anomaly when system performance surpasses certain thresholds or meets specific time-bound criteria. AlloyDB defines these thresholds or criteria for each situation.

To support the situation of Lock contention, you might see the following pieces of evidence:

  • Lock wait ratio: A 40786.04% increase in lock wait ratio compared to baseline observation period detected.

For additional information retrieved during analysis, click each situation.

Recommendations

Based on all of the situations analyzed, AlloyDB provides you with one or more actionable recommendations to help remediate the problems of your high database load. AlloyDB presents the recommendations with a cost-benefit analysis so you can make an informed decision on whether to implement the recommendation.

For some situations, based on the analysis, there might not a recommendation.

Table of recommendations

For example, you might receive the following recommendation:

  • Identify blockers: Identify potential blocking queries and review them for optimization opportunities.

To find out how to implement this first recommendation, click the Learn more link.

If you want to continue troubleshooting or get more assistance with system performance, then you can also open Gemini Cloud Assist. For more information, see Monitor and troubleshoot with AI assistance.

What's next