Auto Extraction overview
This document provides an overview of how Google Security Operations automatically extracts data from logs, making that data easier to ingest, process, and analyze.
Google Security Operations uses prebuilt parsers to extract and structure log data using the Unified Data Model (UDM) schema. Managing and maintaining these parsers can be challenging due to several limitations: incomplete data extraction, the growing number of parsers to manage, and the requirement for frequent updates as log formats evolve.
To address these challenges, you can use the auto extraction feature. This feature supports JSON-formatted logs: it automatically extracts key-value pairs from JSON logs ingested into Google SecOps. The extracted data is stored in a map-type UDM field called extracted. You can then use this data in UDM search queries, Preview Dashboards, and YARA-L rules.
As a best practice, UDM searches that use extracted fields should include metadata.log_type in the query to improve search performance.
Auto extraction reduces reliance on parsers: data remains available even when a parser is not present or fails to parse a log.
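To make the idea of key-value extraction concrete, the following Python sketch flattens a nested JSON log into a flat map. It is an illustration only, not the Google SecOps implementation: the function name flatten_json, the dotted-path and [index] key naming, and the sample log are all assumptions made for this example. The source does not state how keys in the extracted field are actually named.

```python
import json
from typing import Any, Dict


def flatten_json(value: Any, prefix: str = "") -> Dict[str, str]:
    """Flatten nested JSON into a flat map of path keys to string values.

    The key naming (dots for objects, [i] for arrays) is an assumption
    made for this sketch, not a documented Google SecOps convention.
    """
    flat: Dict[str, str] = {}
    if isinstance(value, dict):
        for key, child in value.items():
            path = f"{prefix}.{key}" if prefix else key
            flat.update(flatten_json(child, path))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            flat.update(flatten_json(child, f"{prefix}[{index}]"))
    else:
        # Leaf value: keep strings as-is, serialize everything else.
        flat[prefix] = value if isinstance(value, str) else json.dumps(value)
    return flat


# Hypothetical JSON log used only for illustration.
raw_log = '{"user": {"name": "alice", "id": 42}, "action": "login", "tags": ["vpn", "mfa"]}'
extracted = flatten_json(json.loads(raw_log))

for key, val in extracted.items():
    print(f"{key} = {val}")
```

Every leaf value in the log ends up in the map, regardless of whether a parser knows about the field. This is the property that keeps data available when a parser is missing or fails.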
Parse and extract data from the raw log
Parsing: Google SecOps attempts to parse logs using a parser specific to the log type, if available. If no specific parser exists, or if parsing fails, Google SecOps uses a general parser to extract basic information like ingested timestamp, log type, and metadata labels.
Data Extraction: All data points are automatically extracted from the logs.
Event Enrichment: Google SecOps combines the parsed data and any custom-formatted fields to create enriched events, providing more context and detail.
Downstream Data Transfer: These enriched events are then sent to other systems for further analysis and processing.
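The first three stages above can be sketched in Python as follows. This is a simplified model under stated assumptions, not the actual pipeline: the parser registry SPECIFIC_PARSERS, the EXAMPLE_AUTH log type, the helper names, and the output field names are all hypothetical. The sketch models only the fallback behavior described in the Parsing step, and it extracts only top-level keys.

```python
import json
from typing import Callable, Dict

Parser = Callable[[dict], Dict[str, str]]


def parse_example_auth(record: dict) -> Dict[str, str]:
    """Hypothetical log-type-specific parser; raises KeyError on unexpected input."""
    return {"log_type": "EXAMPLE_AUTH", "user": record["user"]}


# Hypothetical registry mapping log types to specific parsers.
SPECIFIC_PARSERS: Dict[str, Parser] = {"EXAMPLE_AUTH": parse_example_auth}


def general_parse(log_type: str, ingested_at: str) -> Dict[str, str]:
    """Fallback parser: keeps only basic information about the log."""
    return {"log_type": log_type, "ingested_timestamp": ingested_at}


def parse(record: dict, log_type: str, ingested_at: str) -> Dict[str, str]:
    """Use the specific parser if one exists and succeeds, else the general one."""
    parser = SPECIFIC_PARSERS.get(log_type)
    if parser is not None:
        try:
            return parser(record)
        except (KeyError, ValueError):
            pass  # Specific parsing failed; fall through to the general parser.
    return general_parse(log_type, ingested_at)


def process(raw_log: str, log_type: str, ingested_at: str) -> dict:
    """Parse, extract, and combine into one event (top-level keys only)."""
    record = json.loads(raw_log)
    parsed = parse(record, log_type, ingested_at)
    extracted = {key: str(value) for key, value in record.items()}
    return {"parsed": parsed, "extracted": extracted}


ts = "2024-01-01T00:00:00Z"
ok = process('{"user": "alice", "src": "10.0.0.1"}', "EXAMPLE_AUTH", ts)
fallback = process('{"src": "10.0.0.1"}', "EXAMPLE_AUTH", ts)  # parser fails
unknown = process('{"msg": "hi"}', "UNKNOWN_TYPE", ts)  # no specific parser
```

In the fallback and unknown cases the parsed portion contains only the basic fields, but the extracted map still carries every top-level key from the raw log.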