Use deduplication in search and dashboards
This document explains how to reduce duplicate results when you search data in Google Security Operations. Duplicates often occur because enterprise infrastructure generates logs for the same event from multiple systems. For example, both your authentication and security systems might log a single login event.
To reduce duplicate results, add UDM fields to the dedup section of your YARA-L query. The query then returns a single result for each distinct combination of values in those fields.
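To make "a single result for each distinct combination of values" concrete, the following Python sketch models the behavior on in-memory records. It only illustrates the semantics and is not how Google SecOps implements deduplication. The dedup_rows helper, the sample events, and the choice to keep the first row seen for each combination are all assumptions made for this example.

```python
from typing import Any, Dict, Iterable, List, Sequence, Tuple


def dedup_rows(rows: Iterable[Dict[str, Any]], fields: Sequence[str]) -> List[Dict[str, Any]]:
    """Keep one row per distinct combination of the given field values.

    Hypothetical helper: keeping the first row seen is an assumption made
    for this sketch, not documented product behavior.
    """
    seen: set = set()
    kept: List[Dict[str, Any]] = []
    for row in rows:
        # The dedup key is the tuple of values for the chosen fields.
        key: Tuple[Any, ...] = tuple(row.get(f) for f in fields)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Made-up sample: one login by alice recorded by two different systems.
events = [
    {"principal.user.userid": "alice", "target.hostname": "vpn01", "metadata.log_type": "SYSTEM_A"},
    {"principal.user.userid": "alice", "target.hostname": "vpn01", "metadata.log_type": "SYSTEM_B"},
    {"principal.user.userid": "bob", "target.hostname": "vpn01", "metadata.log_type": "SYSTEM_A"},
]

# Deduplicate on two fields, then on one field.
unique_logins = dedup_rows(events, ["principal.user.userid", "target.hostname"])
unique_hosts = dedup_rows(events, ["target.hostname"])
```

In this sketch, deduplicating on both fields leaves two rows (one for alice, one for bob), while deduplicating on target.hostname alone leaves a single row. Adding more fields to the key makes the deduplication less aggressive.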
Deduplication in queries
Deduplication applies to the following types of search and dashboard queries:
- Aggregated search queries: Include match, match and outcome, or aggregated outcome sections. Deduplication occurs after outcomes are determined.
  For aggregated search queries, add these fields to the dedup section:
  - Fields from the match section
  - Fields from the outcome section
- UDM search queries: Exclude the match, outcome, and aggregated outcome sections. The exception is that a UDM search query can include an outcome section as long as it contains no aggregates and the query has no match section.
  For UDM search queries, add these fields to the dedup section:
  - Any non-repeated, non-array, and non-grouped event fields
  - Placeholder fields from the events section
  - Outcome variables from the outcome section
Deduplication examples in Search
The examples in this section use YARA-L syntax and can be run in Search.
Example: Simple search for unique IP addresses
The following example search displays network connection events in which an IP address within your enterprise (principal.ip) connects to an IP address outside of your enterprise (target.ip). The results are deduplicated on principal.ip, so each internal IP address appears only once.
events:
  metadata.event_type = "NETWORK_CONNECTION"
  target.ip != ""
  principal.ip != ""
match:
  target.ip, principal.ip
dedup:
  principal.ip
Example: Unique IP addresses
Similar to the previous example, the following example search displays network connection events with unique IP addresses. Applying dedup to principal.ip narrows results to events associated with unique IPs. The outcome section computes the total bytes sent between principal.ip and target.ip, and the order section sorts results from highest to lowest traffic volume.
events:
  metadata.event_type = "NETWORK_CONNECTION"
  target.ip != ""
  principal.ip != ""
match:
  target.ip, principal.ip
outcome:
  $total_bytes = sum(network.sent_bytes)
dedup:
  principal.ip
order:
  $total_bytes desc
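The order of operations in this kind of aggregated query can be modeled with a short Python sketch: group by the match fields, compute the outcome, and only then deduplicate. This is a simplified model under stated assumptions, not the product's implementation. The sample events are invented, the surviving row for each principal.ip is assumed to be the first group encountered, and ordering is assumed to be applied after deduplication.

```python
from typing import Dict, List, Tuple

# Invented sample events: (principal.ip, target.ip, network.sent_bytes).
events: List[Tuple[str, str, int]] = [
    ("10.0.0.1", "203.0.113.5", 100),
    ("10.0.0.1", "203.0.113.5", 250),
    ("10.0.0.1", "198.51.100.7", 40),
    ("10.0.0.2", "203.0.113.5", 900),
]

# match + outcome: sum sent bytes per (target.ip, principal.ip) group.
totals: Dict[Tuple[str, str], int] = {}
for principal_ip, target_ip, sent_bytes in events:
    key = (target_ip, principal_ip)
    totals[key] = totals.get(key, 0) + sent_bytes

# dedup: keep one aggregated row per principal.ip.
# Assumption: the first group seen for each principal.ip survives.
deduped: Dict[str, Tuple[str, int]] = {}
for (target_ip, principal_ip), total_bytes in totals.items():
    deduped.setdefault(principal_ip, (target_ip, total_bytes))

# order: sort the remaining rows by total bytes, descending.
results = sorted(
    ((p, t, b) for p, (t, b) in deduped.items()),
    key=lambda row: row[2],
    reverse=True,
)
```

In this model, the three aggregated groups collapse to two rows after deduplication, one per principal.ip. Because the dedup key is a subset of the match fields, the second group for 10.0.0.1 is dropped rather than merged into the first.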
Example: Simple search for unique hostnames
The following example searches for each unique hostname accessed from your enterprise. Applying dedup to target.hostname narrows results to events associated with unique external hostnames.
metadata.log_type != ""
dedup:
  target.hostname
The following is an equivalent example without the dedup option. It typically returns substantially more events.
metadata.log_type != "" AND target.hostname != ""
Example: Unique hostnames
Similar to the previous example, this search displays network connection events with unique hostnames. Applying the dedup option to principal.hostname narrows results to events associated with unique hosts:
events:
  metadata.event_type = "NETWORK_CONNECTION"
  target.hostname != ""
  principal.hostname != ""
match:
  target.hostname, principal.hostname
outcome:
  $total_bytes = sum(network.sent_bytes)
dedup:
  principal.hostname
order:
  $total_bytes desc