Monitor your SQL query results with an alerting policy

This document explains how to create an alerting policy to monitor the results of a query you run in Log Analytics. These queries are written in SQL and they must query a log view. The alerting policy notifies you when the query result satisfies conditions that you specify. For example, you could configure an alerting policy so that you're notified when at least 25% of the log entries in a certain period have a severity of ERROR.

Alerting policies that you create from the Log Analytics page run on a BigQuery engine. Therefore, the data being queried must be accessible through a linked BigQuery dataset. For this reason, these SQL queries can only query log views. They can't query analytics views.

For general information about Log Analytics, see Query and analyze logs with Log Analytics.

How alerting policies work

An alerting policy describes the circumstances under which you want to be alerted and how you want to be notified about an incident. There are three different approaches that you can use to get notified when content or patterns appear in your log data:

To scan individual log entries for a specific phrase, create a log-based alerting policy. Use these alerting policies when you want to be notified about things like security-related events.
To monitor events in your log entry data, you can create a log-based metric and then create an alerting policy to monitor the metric. These types of alerting policies are effective when you want to monitor trends in log entry data over time. However, they aren't as effective if you expect only a few events.
To monitor aggregate analysis of your log entry data, combine Log Analytics with alerting policies. In this scenario, you upgrade a log bucket to use Log Analytics and create a linked BigQuery dataset for that log bucket. Next, you use Log Analytics, which supports SQL queries, to query a log view on the log bucket. Finally, you create the alerting policy to monitor the SQL-query results. This type of alerting policy is called a SQL-based alerting policy.

SQL-based alerting policies are most effective for evaluating exact values over multiple log entries. If you want to evaluate individual log entries, then create a log-based alerting policy.

The remainder of this document describes how to use SQL-based alerting policies.

Alerting policy components

A SQL-based alerting policy contains a condition and a schedule:

The condition contains the query, which is a SQL query that queries a log view. The condition also defines the circumstances under which the query result causes Monitoring to create an incident.
The schedule defines how frequently the alerting policy runs its query. The schedule also defines the size of the lookback window, which is a filter that selects only those log entries that have been received since the previous time the query was evaluated. For example, if you set the schedule to 60 minutes, then the query is run every 60 minutes using a lookback window that selects the most-recent 60 minutes of log entries.

Alerting policies also contain a list of notification channels. When the condition of the alerting policy is met, Cloud Monitoring creates an incident and then sends notifications about the incident through these channels. An incident is a record of the data that caused the condition to be met along with other relevant information. This information can help you troubleshoot the issues that caused the incident. You can view the incident by using the Google Cloud console.

Evaluation types for SQL-based alerting policies

Conditions that monitor a SQL query result support two types of evaluation:

Row count threshold: The condition is met when the number of rows in the query result is greater than, equal to, or less than a threshold value.

For example, suppose you want to get notified when more than 50 log entries in the lookback window have a severity greater than 200. You create a query that reports log entries whose severity is greater than 200. You then configure a condition, select the Row count threshold, and set the threshold to 50.
Boolean: The condition is met when a specific boolean column in the query result table contains any row with a value of true.

For example, suppose you want to get notified when more than 25% of the log entries in the lookback window have a severity of ERROR. You create a query that computes the percentage of log entries whose severity level is ERROR. The query results writes true to the notify column when that percentage exceeds 25%. Next, you create a condition, set the type to Boolean, and configure the condition to monitor the notify column.

Alerting policies that monitor a SQL query result must have only one condition.

Alerting policies and BigQuery

When an alerting policy runs a SQL query, that query is run using reserved BigQuery slots in the Google Cloud project where the alerting policy is defined. For more information, see Work with slot reservations.

For an alerting policy to use reserved BigQuery slots to query a log view, the log bucket that hosts the log view must be configured to have a linked BigQuery dataset. Linked datasets let BigQuery read the data in the log bucket, and they let you perform BigQuery functions on the data returned by your SQL query.

Evaluated log entries

For a log entry to be evaluated by the SQL query of an alerting policy, both of the following must be true:

The log entry's received timestamp, which records when the log entry was received by Cloud Logging, must be within the lookback window of the alerting policy.
The log entry's timestamp, which records when the log entry was generated, must be within 15 minutes of the lookback window.

For example, your SQL-based alerting policy has a 60-minute lookback window. Log Analytics runs the SQL query of the alerting policy at 1:30 PM. To be included in the query, a log entry must match both of the following criteria:

The received timestamp must be between 12:30 PM and 1:30 PM.
The timestamp must be between 12:15 PM and 1:45 PM.

When you run a query from the Log Analytics interface, all log entries in the selected time range are evaluated based on the log entry's timestamp.

Lookback window and incident propagation time

When an alerting policy is scheduled to evaluate its condition, Log Analytics delays the SQL query execution by five minutes to provide time for Cloud Logging to index the log entries received during the lookback window. For example, if the alerting policy uses a lookback window that ends at 2:00 PM, then Log Analytics doesn't execute the SQL query until 2:05 PM.

If the alert condition is met after the query is executed, then it can take up to two additional minutes for the incident to propagate through the system.

Query failures

Queries issued by SQL-based alerting policies can fail for various reasons, including the following:

The Monitoring Service Account no longer exists or it no longer has the necessary permissions to read the log data that is being queried.
The query execution time exceeds fives minutes.
An internal error occurs.

A failed query generates a log entry containing the alert policy ID and the error status. You can use a log-based alerting policy to create an alert when an error is logged.

Before you begin

This section assumes that you have upgraded your log bucket to use Log Analytics and that you can query and view your log data by using the Log Analytics page. It also assumes that you have already created a linked BigQuery dataset for your log bucket.

Before you create a SQL-based alerting policy, complete the following steps:

To get the permissions that you need to create and manage SQL-based alerting policies, ask your administrator to grant you the following IAM roles:
- Monitoring Editor (roles/monitoring.editor) on your project
- Logging SqlAlertWriter (roles/logging.sqlAlertWriter) on your project
Verify that the Monitoring Service Account exists and that it has the following roles:
1. Monitoring Service Agent (roles/monitoring.notificationServiceAgent) on your project.
2. BigQuery Data Viewer (roles/bigquery.dataViewer) on your linked dataset.
If the Monitoring Service Account doesn't exist, then see Troubleshoot: No Monitoring Service Account.
To enable your SQL-based alerting policies to run on reserved BigQuery slots, do the following:
1. Configure reserved BigQuery slots.
2. Assign reserved slots to your Google Cloud project.
Configure the notification channels that you want to use to receive any notifications for incidents. For redundancy purposes, we recommend that you create multiple types of notification channels. For more information, see Create and manage notification channels.

Create a SQL-based alerting policy

To create a SQL-based alerting policy, do the following:

Google Cloud console

In the Google Cloud console, go to the Log Analytics page:
Go to Log Analytics

If you use the search bar to find this page, then select the result whose subheading is Logging.
On the Log Analytics page, in the query editor, enter a SQL query for a log view.

Note: When you configure a query for an alerting policy, don't use the WHERE clause to filter log entries by timestamp. When Log Analytics evaluates the query, it automatically applies the correct timestamp filter to each evaluated log entry.

For more information about writing SQL queries for log views, see Query a log view.
On the toolbar, click Run on BigQuery.

Log Analytics runs your query on the BigQuery engine and displays the results in the Results table.

If Run on BigQuery isn't shown, then click Select query engine and then click BigQuery. The Run query button changes to Run on BigQuery.
On the Results table of the Log Analytics page, click Create alert.

The Log Analytics page shows the Create sql alert policy window, which shows your query under the SQL query section.
In the Alert condition section, configure the condition and schedule of your alerting policy.
Configure the alert details of your alerting policy.
1. Add notification channels and configure notification content, such as a custom subject line.
2. Optional: Add alerting policy labels and documentation.
3. Click Next.
Review your alerting policy and then create it by clicking Save.

Cloud Monitoring API

Use the alertPolicies.create method to programmatically create alerting policies. The Condition type of your alerting policy must be conditionSql, which is an instance of SqlCondition. This condition type allows the conditions of your alerting policy to be defined with SQL.

To define the schedule, set a periodicity value for one of the minutes, hours, or days fields. For example, if you want the query to run every 12 hours, then set the periodicity of the hours field to 12.

To define the condition, use the following fields:

boolean_test: Configures the alerting policy so that its condition is met when a row of a boolean column in the query result table contains a true value.
row_count_test: Configures the alerting policy so that its condition is met when the number of rows in the query result table meets a certain threshold.

For a complete list of fields and definitions, see SqlCondition in the Cloud Monitoring API documentation.

For more information about the Monitoring API for alerting policies, see Managing alerting policies by API.

Terraform

Install and configure Terraform for you project. For App Hub configurations, select the App Hub host project or the app-enabled folder's management project.
In the Cloud Shell, go to the directory that contains your Terraform configuration.
In your Terraform configuration, configure an instance of the google_monitoring_alert_policy resource, including condition_sql.
In the Cloud Shell, enter terraform apply.

To modify your alerting policy, make your edits and then re-apply the Terraform configuration. For more information, see Manage alerting policies with Terraform.

For general information about using Google Cloud with Terraform, see Terraform with Google Cloud.

Limitations

You can have one condition per SQL-based alerting policy.
SQL-based alerting policies can't query an analytics view.
Queries issued by SQL-based alerting policies fail when their execution time exceeds five minutes.
There is a delay of up to seven minutes, plus the query execution time, between when a query is scheduled and when an incident is created.

For a full list of limits associated with alerting policies, see Monitoring limits.

Pricing

For information about pricing, see the following documents:

BigQuery pricing
The Cloud Logging and Cloud Monitoring sections of the Google Cloud Observability pricing page.

What's next

For information about creating charts from your Log Analytics data, see Chart query results with Log Analytics.
For information about managing incidents for these alerting policies, see Incidents for SQL-based alerting policies.