Use deduplication in search and dashboards

This document explains how to deduplicate results when you search data in Google Security Operations. Search results can include duplicates, often because enterprise infrastructure generates logs for the same event from multiple systems. For example, both your authentication and security systems might log a single login event.

To reduce duplicate results, add UDM fields to the dedup section of your YARA-L query. The query then returns a single result for each distinct combination of values in those fields.
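
The effect of deduplicating on a set of fields can be sketched in Python. This is only an illustration of the semantics, not how Google SecOps implements it. The dedup helper, the flattened event dictionaries, the sample log types, and the choice to keep the first event seen for each combination are all assumptions made for this sketch.

```python
def dedup(events, fields):
    """Keep one event per distinct combination of the given field values.

    Assumption: the first event seen for each combination is the one kept.
    """
    seen = set()
    unique = []
    for event in events:
        key = tuple(event.get(field) for field in fields)
        if key not in seen:
            seen.add(key)
            unique.append(event)
    return unique


# Hypothetical events: one login reported by two different log sources,
# plus a second login from a different IP address.
events = [
    {"metadata.log_type": "AUTH", "principal.ip": "10.0.0.5", "target.hostname": "vpn"},
    {"metadata.log_type": "EDR", "principal.ip": "10.0.0.5", "target.hostname": "vpn"},
    {"metadata.log_type": "AUTH", "principal.ip": "10.0.0.9", "target.hostname": "vpn"},
]

unique_events = dedup(events, ["principal.ip", "target.hostname"])
print(len(unique_events))  # 2
```

Deduplicating on fewer fields collapses more events: with only target.hostname in the field list, the three sample events reduce to one.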

Deduplication in queries

Deduplication applies to the following types of search and dashboard queries:

  • Aggregated search queries: Queries that include a match section, both match and outcome sections, or an aggregated outcome section. Deduplication occurs after outcomes are determined.

    For aggregated search queries, add these fields to the dedup section:

    • Fields from the match section
    • Fields from the outcome section
  • UDM search queries: Queries without a match section or an aggregated outcome section. A UDM search query can still include an outcome section, as long as that section contains no aggregates and the query has no match section.

    For UDM queries, add these fields to the dedup section:

    • Any non-repeated, non-array, and non-grouped event fields
    • Placeholder fields from the events section
    • Outcome variables from the outcome section

The following examples show YARA-L syntax that you can run in Search.

Example: Simple search for unique IP addresses

The following example search returns network connection events in which an IP address inside your enterprise (principal.ip) connects to an IP address outside your enterprise (target.ip). The results are deduplicated on principal.ip, so each internal IP address appears only once.

events:
   metadata.event_type = "NETWORK_CONNECTION"
   target.ip != ""
   principal.ip != ""
match:
   target.ip, principal.ip
dedup:
   principal.ip

Example: Unique IP addresses

Similar to the previous example, the following example search displays network connection events with unique IP addresses. Applying dedup to principal.ip narrows results to events associated with unique IPs. The outcome section displays the total bytes sent between principal.ip and target.ip, ordering results from highest to lowest traffic volume.

events:
   metadata.event_type = "NETWORK_CONNECTION"
   target.ip != ""
   principal.ip != ""
match:
   target.ip, principal.ip
outcome:
   $total_bytes = sum(network.sent_bytes)
dedup:
   principal.ip
order:
   $total_bytes desc
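
The order of operations in an aggregated query (match, then outcome, then dedup, then order) can be sketched in Python. This is a rough model under stated assumptions, not the product's implementation: the aggregate_then_dedup function, the flattened event dictionaries, and the sample values are hypothetical, and which row survives deduplication is assumed here to be the first one encountered.

```python
from collections import defaultdict


def aggregate_then_dedup(events):
    # match: group events by (target.ip, principal.ip).
    totals = defaultdict(int)
    for event in events:
        key = (event["target.ip"], event["principal.ip"])
        # outcome: $total_bytes = sum(network.sent_bytes)
        totals[key] += event["network.sent_bytes"]

    # dedup: keep one row per principal.ip, after outcomes are computed.
    # Assumption: the first row encountered for each principal.ip is kept.
    seen = set()
    rows = []
    for (target_ip, principal_ip), total_bytes in totals.items():
        if principal_ip not in seen:
            seen.add(principal_ip)
            rows.append({
                "target.ip": target_ip,
                "principal.ip": principal_ip,
                "total_bytes": total_bytes,
            })

    # order: $total_bytes desc
    return sorted(rows, key=lambda row: row["total_bytes"], reverse=True)


# Hypothetical sample events.
events = [
    {"principal.ip": "10.0.0.5", "target.ip": "203.0.113.7", "network.sent_bytes": 100},
    {"principal.ip": "10.0.0.5", "target.ip": "203.0.113.7", "network.sent_bytes": 50},
    {"principal.ip": "10.0.0.5", "target.ip": "198.51.100.2", "network.sent_bytes": 900},
    {"principal.ip": "10.0.0.9", "target.ip": "203.0.113.7", "network.sent_bytes": 400},
]

results = aggregate_then_dedup(events)
print([(row["principal.ip"], row["total_bytes"]) for row in results])
# [('10.0.0.9', 400), ('10.0.0.5', 150)]
```

In this sketch, 10.0.0.5 has two aggregated rows (150 bytes and 900 bytes), and deduplication keeps only one of them. Because dedup runs on the aggregated rows rather than on raw events, the surviving total reflects a single (target.ip, principal.ip) pair, not all traffic from that principal.ip.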

Example: Simple search for unique hostnames

The following example searches for each unique hostname accessed from your enterprise. Applying dedup to target.hostname narrows results to events associated with unique external hostnames.

metadata.log_type != ""
dedup:
    target.hostname

The following is an equivalent example without the dedup option. It typically returns substantially more events.

metadata.log_type != "" AND target.hostname != ""

Example: Unique hostnames

Similar to the previous example, this search displays network connection events with unique hostnames. Applying the dedup option to principal.hostname narrows results to events associated with unique hosts:

events:
   metadata.event_type = "NETWORK_CONNECTION"
   target.hostname != ""
   principal.hostname != ""
match:
   target.hostname, principal.hostname
outcome:
   $total_bytes = sum(network.sent_bytes)
dedup:
   principal.hostname
order:
   $total_bytes desc
