This document contains sample queries over log entries that are stored in
log buckets that are upgraded to use Log Analytics.
On these buckets you can run SQL queries from the
Log Analytics page in the Google Cloud console. For more samples, see the
logging-analytics-samples
and the
security-analytics
GitHub repositories.
This document doesn't describe SQL or how to route and store log entries. For information about those topics, see the What's next section.
Before you begin
To use the queries shown in this document on the Log Analytics page, replace TABLE_NAME_OF_LOG_VIEW with table name for the log view that you want to query. This name has the format
project_ID.region.bucket_ID.view_ID
.To identify this name, go to the Log views list on the Log Analytics page, locate the log view, and then select Query. The Query pane is populated with a default query, which includes the table name for the log view that is queried. For information about how to access the default query, see Query a log view.
To use the queries shown in this document on the BigQuery Studio page, replace TABLE_NAME_OF_LOG_VIEW with the path to the table in the linked dataset. For example, to query the view
_AllLogs
on the linked datasetmydataset
that is in the projectmyproject
, set this field tomyproject.mydataset._AllLogs
:In the Google Cloud console, go to the BigQuery page:
You can also find this page by using the search bar.
To open the Log Analytics page, do the following:
-
In the Google Cloud console, go to the Log Analytics page:
If you use the search bar to find this page, then select the result whose subheading is Logging.
Optional: To identify the schema for log view, in the Log views list, find the view, and then select the name of the view.
The schema is displayed. You can use the Filter field to locate specific fields. You can't modify the schema.
-
Filter logs
SQL queries determine which entries in the log view to process, then they group these entries and perform aggregate operations. When no grouping and aggregation operation are listed, the result of the query includes the rows selected by the filter operation. The samples in this section illustrate filtering.
Filter by time
To set the time range of your query, we recommend that you
use the time-range selector. This selector is used automatically when a query
doesn't specify a timestamp
field in the WHERE
clause.
For example, to view the data for the past week, select Last 7 days from
the time-range selector. You can also use the time-range
selector to specify a start and end time, specify a time to view around, and
change time zones.
If you include a timestamp
field in the WHERE
clause, then the time-range
selector setting isn't used. The following example filters the data by
using the TIMESTAMP_SUB
function, which lets you specify a look-back
interval from the current time:
WHERE
timestamp > TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 1 HOUR)
For more information about how to filter by time, see Time functions and Timestamp functions.
Filter by resource
To filter by resource, add a resource.type
restriction.
For example, the following query reads the most recent hour of data, then
retains those rows whose resource type matches gce_instance
, and then sorts
and displays up to 100 entries:
SELECT
timestamp, log_name, severity, json_payload, resource, labels
FROM
`TABLE_NAME_OF_LOG_VIEW`
WHERE
resource.type = "gce_instance"
ORDER BY timestamp ASC
LIMIT 100
Filter by severity
You can filter by a specific severity with a restriction like
severity = 'ERROR'
. Another option is to use the IN
statement
and specify a set of valid values.
For example, the following query reads the most recent hour of data, and
then retains only those rows that contain a severity
field whose value is
either 'INFO'
or 'ERROR'
:
SELECT
timestamp, log_name, severity, json_payload, resource, labels
FROM
`TABLE_NAME_OF_LOG_VIEW`
WHERE
severity IS NOT NULL AND
severity IN ('INFO', 'ERROR')
ORDER BY timestamp ASC
LIMIT 100
The previous query filters by the value of the severity
field. However,
you can also write queries that filter by the numeric value of the log severity.
For example, if you replace the severity
lines with the following lines,
the query returns all log entries whose severity level is at least NOTICE
:
severity_number IS NOT NULL AND
severity_number > 200
For information about the enumerated values, see
LogSeverity
.
Filter by log name
To filter by a log name, you can add a restriction on the value of the
log_name
or the log_id
field. The log_name
field includes the resource
path. That is, this field has values like projects/myproject/logs/mylog
.
The log_id
field only stores the log name such as mylog
.
For example, the following query reads the most recent hour of data, then
retains those rows where the value in the log_id
field is
cloudaudit.googleapis.com/data_access
, and then sorts and displays the
results:
SELECT
timestamp, log_id, severity, json_payload, resource, labels
FROM
`TABLE_NAME_OF_LOG_VIEW`
WHERE
log_id = "cloudaudit.googleapis.com/data_access"
ORDER BY timestamp ASC
LIMIT 100
Filter by resource label
Most monitored resource descriptors define labels that are used to identify the specific resource. For example, the descriptor for a Compute Engine instance includes labels for the zone, project ID, and instance ID. When the log entry is written, values are assigned to each field. The following is such an example:
{
type: "gce_instance"
labels: {
instance_id: "1234512345123451"
project_id: "my-project"
zone: "us-central1-f"
}
}
Because the data type of the labels
field is JSON, including a restriction
like resource.labels.zone = "us-centra1-f"
in a query results in a syntax
error. To get the value of a field with a data type of JSON, use the function
JSON_VALUE
.
For example, the following query reads the most recent data and then retains
those rows where the resource is a Compute Engine instance that is
located in the us-central1-f
zone:
SELECT
timestamp, log_name, severity, JSON_VALUE(resource.labels.zone) AS zone, json_payload, resource, labels
FROM
`TABLE_NAME_OF_LOG_VIEW`
WHERE
resource.type = "gce_instance" AND
JSON_VALUE(resource.labels.zone) = "us-central1-f"
ORDER BY timestamp ASC
LIMIT 100
For information about all functions that can retrieve and transform JSON data, see JSON functions.
Filter by HTTP request
To filter the log view to only include log entries that correspond to a HTTP
request or reply, add a http_request IS NOT NULL
restriction:
SELECT
timestamp, log_name, severity, http_request, resource, labels
FROM
`TABLE_NAME_OF_LOG_VIEW`
WHERE
http_request IS NOT NULL
ORDER BY timestamp
LIMIT 100
The following query only includes rows that correspond to GET
or POST
requests:
SELECT
timestamp, log_name, severity, http_request, resource, labels
FROM
`TABLE_NAME_OF_LOG_VIEW`
WHERE
http_request IS NOT NULL AND
http_request.request_method IN ('GET', 'POST')
ORDER BY timestamp ASC
LIMIT 100
Filter by HTTP status
To filter by HTTP status, modify the WHERE
clause to require the
http_request.status
field be defined:
SELECT
timestamp, log_name, http_request.status, http_request, resource, labels
FROM
`TABLE_NAME_OF_LOG_VIEW`
WHERE
http_request IS NOT NULL AND
http_request.status IS NOT NULL
ORDER BY timestamp ASC
LIMIT 100
To determine the type of data stored in a field, view the schema or display
the field. The results of the previous query show that the
http_request.status
field stores integer values.
Filter by a field with a JSON type
To extract a value from a column whose data type is JSON, use the function
JSON_VALUE
.
Consider the following queries:
SELECT
json_payload
FROM
`TABLE_NAME_OF_LOG_VIEW`
WHERE
json_payload.status IS NOT NULL
and
SELECT
json_payload
FROM
`TABLE_NAME_OF_LOG_VIEW`
WHERE
JSON_VALUE(json_payload.status) IS NOT NULL
The previous queries test the value of the json_payload
field in the
log entry. Both queries
discard log entries that don't contain a field labeled json_payload
.
The difference between these two queries is the final line, which defines
what is tested against NULL
. Now, consider a log view that that has
two log entries. For one log entry, the json_payload
field has the following
form:
{
status: {
measureTime: "1661517845"
}
}
For the other log entry, the json_payload
field has a different structure:
{
@type: "type.googleapis.com/google.cloud.scheduler.logging.AttemptFinished"
jobName: "projects/my-project/locations/us-central1/jobs/test1"
relativeUrl: "/food=cake"
status: "NOT_FOUND"
targetType: "APP_ENGINE_HTTP"
}
Both of the previous log entries satisfy the restriction
json_payload.status IS NOT NULL
.
That is, the result of the first query includes both log entries.
However, when the restriction is JSON_VALUE(json_payload.status) IS NOT NULL
,
only the second log entry is included in the query result.
Filter by regular expression
To return the substring that matches a regular expression, use the function
REGEXP_EXTRACT
. The return type of this function is
either a STRING
or BYTES
.
The following query display the most recent log entries received, retains
those entries with a json_payload.jobName
field, and then displays the
portion of the name that starts with test
:
SELECT
timestamp, REGEXP_EXTRACT(JSON_VALUE(json_payload.jobName), r".*(test.*)$") AS name,
FROM
`TABLE_NAME_OF_LOG_VIEW`
WHERE
json_payload.jobName IS NOT NULL
ORDER BY timestamp DESC
LIMIT 20
For additional examples, see the
REGEXP_EXTRACT
documentation.
For examples of other regular expressions that
you can use, see Functions, operators, and conditionals.
The query shown in this example isn't efficient. For a substring match, like
the one illustrated, use the CONTAINS_SUBSTR
function.
Group and aggregate log entries
This section builds upon the previous samples and illustrates how you can
group and aggregate log entries. If you don't specify a grouping but do
specify an aggregation, a single result is printed because SQL treats all
rows that satisfy the WHERE
clause as a single group.
Every SELECT
expression must be included in the group fields or be aggregated.
Group by time
To group data by time, use the function TIMESTAMP_TRUNC
,
which truncates a timestamp to a specified granularity like MINUTE
. For
example a timestamp of 15:30:11
, which is formatted as
hours:minutes:seconds
, becomes 15:30:00
when the granularity is set to
MINUTE
.
The following query reads the data received in the interval specified by
the time-range picker and then retains
those rows where the value of the json_payload.status
field isn't NULL.
The query truncates the timestamp on each row by hour, and then groups the
rows by the truncated timestamp and status:
SELECT
TIMESTAMP_TRUNC(timestamp, HOUR) AS hour,
JSON_VALUE(json_payload.status) AS status,
COUNT(*) AS count
FROM
`TABLE_NAME_OF_LOG_VIEW`
WHERE
json_payload IS NOT NULL AND
JSON_VALUE(json_payload.status) IS NOT NULL
GROUP BY hour,status
ORDER BY hour ASC
For additional samples, see the
TIMESTAMP_TRUNC
documentation.
For information about other time-based functions, see
Datetime functions.
Group by resource
The following query reads the most recent hour of data, and then groups the log entries by the resource type. It then counts the number of row for each resource type, and returns a table with two columns. The first column lists the resource type while the second column is the number of rows for that resource type:
SELECT
resource.type, COUNT(*) AS count
FROM
`TABLE_NAME_OF_LOG_VIEW`
GROUP BY resource.type
LIMIT 100
Group by severity
The following query reads the most recent hour of data and then retains rows that have a severity field. The query then groups the rows by severity and counts the number of rows for each group:
SELECT
severity, COUNT(*) AS count
FROM
`TABLE_NAME_OF_LOG_VIEW`
WHERE
severity IS NOT NULL
GROUP BY severity
ORDER BY severity
LIMIT 100
Group by log_id
The result of the following query is a table with two columns. The first column lists the log names and the second column lists the number of log entries that were written to the log. The query sorts the results by the count of entries:
SELECT
log_id, COUNT(*) AS count
FROM
`TABLE_NAME_OF_LOG_VIEW`
GROUP BY log_id
ORDER BY count DESC
LIMIT 100
Compute average latency for HTTP request
The following query illustrates grouping by multiple columns, and computing
an average value. The query groups rows by the URL contained in the HTTP
request and by the value of the labels.checker_location
field. After
grouping the rows, the query computes the average latency for each group:
SELECT
JSON_VALUE(labels.checker_location) AS location,
AVG(http_request.latency.seconds) AS secs, http_request.request_url
FROM
`TABLE_NAME_OF_LOG_VIEW`
WHERE
http_request IS NOT NULL AND
http_request.request_method IN ('GET')
GROUP BY http_request.request_url, location
ORDER BY location
LIMIT 100
In the previous expression, JSON_VALUE
is required to extract the value
of the labels.checker_location
field because the data type for
labels
is JSON.
However, you don't use this function to extract the value from the
http_request.latency.seconds
field. The latter field has a data type of
integer.
Compute average bytes sent for a subnetwork test
The following query illustrates how you might display the average number of bytes sent by location.
The query reads the most recent hour of data and then retains only those rows
whose resource type column is gce_subnetwork
and whose json_payload
column isn't NULL. Next, the query groups the rows by the location of the
resource. Unlike the previous example where the data is stored as a numeric
value, the value of the bytes_sent
field is a string and therefore you must
convert the value to a FLOAT64
before computing the average:
SELECT JSON_VALUE(resource.labels.location) AS location,
AVG(CAST(JSON_VALUE(json_payload.bytes_sent) AS FLOAT64)) AS bytes
FROM
`TABLE_NAME_OF_LOG_VIEW`
WHERE
resource.type = "gce_subnetwork" AND
json_payload IS NOT NULL
GROUP BY location
LIMIT 100
The result of the previous query is a table where each row lists a location and the average bytes sent for that location.
For information about all functions that can retrieve and transform JSON data, see JSON functions.
For information about CAST
and other conversion functions, see
Conversion functions.
Count the log entries with a field that match a pattern
To return the substring that matches a regular expression, use the function
REGEXP_EXTRACT
. The return type of this function is
either a STRING
or BYTES
.
The following query retains the log entries for which the value
of the json_payload.jobName
field is not NULL.
Then, it groups the entries by the name suffix that begins
with test
. Lastly, the query counts the numbers of entries in each group:
SELECT
REGEXP_EXTRACT(JSON_VALUE(json_payload.jobName), r".*(test.*)$") AS name,
COUNT(*) AS count
FROM
`TABLE_NAME_OF_LOG_VIEW`
WHERE
json_payload.jobName IS NOT NULL
GROUP BY name
ORDER BY count
LIMIT 20
For additional examples, see the
REGEXP_EXTRACT
documentation.
For examples of other regular expressions that
you can use, see Functions, operators, and conditionals.
Cross-column search
This section describes two different approaches that you can use to search multiple columns of a table.
Token-based search
To search a log view for entries that match a set of search terms,
use the function SEARCH
. This function requires two parameters:
where to search, and the search query.
Because the SEARCH
function has specific rules on how the data is searched,
we recommend that you read the SEARCH
documentation.
The following query retains only those rows that have a field that exactly matches "35.193.12.15":
SELECT
timestamp, log_id, proto_payload, severity, resource.type, resource, labels
FROM
`TABLE_NAME_OF_LOG_VIEW` AS t
WHERE
proto_payload IS NOT NULL AND
log_id = "cloudaudit.googleapis.com/data_access" AND
SEARCH(t,"`35.193.12.15`")
ORDER BY timestamp ASC
LIMIT 20
In the previous query, backticks wrap the value to be searched for. This
ensures that the SEARCH
function searches for an exact match between
a field value and the value between the backticks.
When backticks are omitted in the query string, the query string is split
based on rules defined in the SEARCH
documentation.
For example, when the following statement is run,
the query string is split into four tokens: "35", "193", "12", and "15":
SEARCH(t,"35.193.12.15")
The previous SEARCH
statement matches a row when a single field
matches all four tokens. The order of the tokens doesn't matter.
You can include multiple SEARCH
statements in a query. For example, in the
previous query, you could replace the filter on the log ID with a
statement like the following:
SEARCH(t,"`cloudaudit.googleapis.com/data_access`")
The previous statement searches every field of the log entries in the log view
while the original statement searches only the log_id
field of the
log entries.
To perform multiple searches on multiple fields, separate the individual strings with a space. For example, the following statement matches rows where a field contains "Hello World", "happy", and "days":
SEARCH(t,"`Hello World` happy days")
Lastly, you can search specific fields instead of searching an
entire table. For example, the following statement only searches
the columns named text_payload
and json_payload
:
SEARCH((text_payload, json_payload) ,"`35.222.132.245`")
For information about how the parameters of the SEARCH
function are processed,
see the BigQuery reference page Search functions.
Substring search
To perform a case-insensitive test to determine whether a value exists in an
expression, use the function CONTAINS_SUBSTR
.
This function returns TRUE
when the value exists and
FALSE
otherwise. The search value must be a STRING
literal, but not the
literal NULL
.
For example, the following query fetches all Data Access audit log entries with a specific IP address whose timestamps are in a specific time range. Lastly, the query sorts the results and then displays the 20 oldest results:
SELECT
timestamp, log_id, proto_payload, severity, resource.type, resource, labels
FROM
`TABLE_NAME_OF_LOG_VIEW` AS t
WHERE
proto_payload IS NOT NULL AND
log_id = "cloudaudit.googleapis.com/data_access" AND
CONTAINS_SUBSTR(t,"35.193.12.15")
ORDER BY timestamp ASC
LIMIT 20
The previous query performs a substring test. Therefore, a row that contains
"35.193.12.152" matches the CONTAINS_SUBSTR
statement.
Combine data from multiple sources
Query statements scan one or more tables or expressions and return the
computed result rows. For example, you can use query statements to merge the
results of SELECT
statements on different tables or datasets in a
variety of ways and then select the columns from the combined data.
Combine data from two tables with joins
To combine information from two tables, use the one of the join operators. The type of join and the conditional clause you use determines how rows are combined and discarded.
The following query gives you the json_payload
fields from rows in
two different tables written by the same trace span. The query performs an
inner JOIN
over two tables for rows where the values of
the span_id
and trace
columns in both tables match. From this result,
the query then selects the timestamp
, severity
, and json_payload
fields
that came from TABLE_NAME_OF_LOG_VIEW_1, the json_payload
field from
TABLE_NAME_OF_LOG_VIEW_2, and the values of the span_id
and trace
fields on which the two tables were joined, and returns up to 100
rows:
SELECT
a.timestamp, a.severity, a.json_payload, b.json_payload, a.span_id, a.trace
FROM `TABLE_NAME_OF_LOG_VIEW_1` a
JOIN `TABLE_NAME_OF_LOG_VIEW_2` b
ON
a.span_id = b.span_id AND
a.trace = b.trace
LIMIT 100
Combine multiple selections with unions
To combine the results of two or more SELECT
statements and discard
duplicate rows, use the UNION
operator. To retain duplicate
rows, use the UNION ALL
operator.
The following query reads the most recent hour of data from TABLE_NAME_OF_LOG_VIEW_1, merges the result with the most recent hour of data from TABLE_NAME_OF_LOG_VIEW_2, sorts the merged data by increasing timestamp, and then displays the oldest 100 entries:
SELECT
timestamp, log_name, severity, json_payload, resource, labels
FROM(
SELECT * FROM `TABLE_NAME_OF_LOG_VIEW_1`
UNION ALL
SELECT * FROM `TABLE_NAME_OF_LOG_VIEW_2`
)
ORDER BY timestamp ASC
LIMIT 100
Remove duplicate log entries
Log Analytics doesn't remove duplicate log entries before a query is run. This behavior is different than when you query log entries by using the Logs Explorer, which removes duplicate entries by comparing the log names, timestamps, and insert ID fields.
You can use row-level validation to remove duplicate log entries.
For more information, see Troubleshoot: There are duplicate log entries in my Log Analytics results.
Limitations
Queries used in the Log Analytics page support GoogleSQL functions with some exceptions.
The following SQL commands aren't supported for SQL queries issued by using the Log Analytics page:
- DDL and DML commands
- Javascript user-defined functions
- BigQuery ML functions
- SQL variables
The following are supported only when you are querying a linked dataset using the BigQuery Studio and Looker Studio pages, and the bq command-line tool:
- Javascript user-defined functions
- BigQuery ML functions
- SQL variables
What's next
For information about how to route and store log entries, see the following documents:
- Create a log bucket
- Upgrade a bucket to use Log Analytics
- Link a log bucket to a BigQuery dataset
- Configure and manage sinks
For SQL reference documentation, see the following documents: