This page lists example SQL queries to analyze your assets' data after you've exported it to BigQuery. See standard SQL Query syntax for more information.
Query directly on available columns
To find the quantity of each asset type, run the following query:
SELECT asset_type, COUNT(*) AS asset_count
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME`
GROUP BY asset_type
ORDER BY asset_count DESC
Query on repeated fields
To find Identity and Access Management (IAM) policies that grant access to Gmail accounts,
run the following query. BigQuery uses UNNEST
to flatten
repeated fields into a table
that you can query directly:
SELECT name, asset_type, bindings.role
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME`
JOIN UNNEST(iam_policy.bindings) AS bindings
JOIN UNNEST(bindings.members) AS principals
WHERE principals like "%@gmail.com"
To find an organization, folder, or project that allows creating using a public IP, run the following query. This query is useful because allowing public IP addresses with Cloud SQL instances can introduce vulnerabilities unless SSL or a proxy is configured:
SELECT name
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME`
JOIN UNNEST(org_policy) AS op
WHERE
op.constraint = "constraints/sql.restrictPublicIp"
AND (op.boolean_policy IS NULL OR op.boolean_policy.enforced = FALSE);
To find an organization, folder, or project in the same VPC Service Controls service perimeter of a project, run the following query:
SELECT service_perimeter.title, service_perimeter.status.resources
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME`
CROSS JOIN UNNEST(service_perimeter.status.resources) as resource
WHERE resource = "projects/PROJECT_ID";
Query on extracted JSON string
To find open firewall rules, run the following query. Learn more about JSON functions used in BigQuery.
CREATE TEMP FUNCTION json2array(json STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(json).map(x=>JSON.stringify(x));
""";
SELECT firewall.name, firewall.resource.parent, JSON_EXTRACT(firewall.resource.data, '$.sourceRanges') AS sourceRanges
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME` AS firewall
JOIN UNNEST(json2array(JSON_EXTRACT(firewall.resource.data, '$.sourceRanges'))) AS source_ranges
WHERE asset_type="compute.googleapis.com/Firewall" AND JSON_EXTRACT(firewall.resource.data, '$.sourceRanges') IS NOT NULL AND JSON_EXTRACT_SCALAR(source_ranges, '$') = "0.0.0.0/0"
By separating tables per resource type, we can find open firewall rules with an easier and faster query.
SELECT firewall.name, firewall.resource.parent, sourceRanges
FROM `PROJECT_ID.DATASET_ID.STRUCTURED_INSTANCE_TABLE_NAME` AS firewall
JOIN UNNEST(firewall.resource.data.sourceRanges) AS sourceRanges
WHERE sourceRanges = "0.0.0.0/0";
Query on joined resource types
To join tables of different resource types, run the following query. The
following example shows how to find all subnetworks that have no VM attached.
First, the query finds all subnetworks. Then, from that list, it selects those
subnetworks whose selfLinks
are not present.
CREATE TEMP FUNCTION json2array(json STRING)
RETURNS ARRAY<STRING>
LANGUAGE js AS """
return JSON.parse(json).map(x=>JSON.stringify(x));
""";
SELECT name, JSON_EXTRACT(subnetwork.resource.data, '$.selfLink') AS selflink
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME` AS subnetwork
WHERE asset_type = "compute.googleapis.com/Subnetwork" AND (JSON_EXTRACT(subnetwork.resource.data, '$.selfLink') NOT IN
(SELECT DISTINCT JSON_EXTRACT(network_interfaces, '$.subnetwork')
FROM `PROJECT_ID.DATASET_ID.TABLE_NAME` as instance
JOIN UNNEST(json2array(JSON_EXTRACT(instance.resource.data, '$.networkInterfaces'))) AS network_interfaces
WHERE asset_type ="compute.googleapis.com/Instance"
AND JSON_EXTRACT(instance.resource.data, '$.networkInterfaces') IS NOT NULL
)) IS NULL
By separating tables per resource type, we can find all subnetworks that have no VM attached with an easier and faster query.
SELECT name, subnetwork.resource.data.selfLink
FROM `PROJECT_ID.DATASET_ID.STRUCTURED_SUBNETWORK_TABLE_NAME` AS subnetwork
WHERE
(
subnetwork.resource.data.selfLink
NOT IN (
SELECT DISTINCT networkInterface.subnetwork
FROM `PROJECT_ID.DATASET_ID.STRUCTURED_INSTANCE_TABLE_NAME` as instance
JOIN
UNNEST(instance.resource.data.networkInterfaces) AS networkInterface
WHERE
networkInterface IS NOT NULL
)
) IS NULL;
Query to find vulnerable Dataproc clusters due to CVE-2021-44228
CREATE TEMP FUNCTION vulnerable_version(imageVersion STRING)
RETURNS BOOL
LANGUAGE js AS r"""
const version_regexp = /(?<major>\d+)(?:\.)(?<minor>\d+)(?:\.)?(?<sub>\d+)?/g;
let match = version_regexp.exec(imageVersion);
if(match.groups.major < 1){
return true;
}
if (match.groups.major == 1){
if (match.groups.minor < 3){
return true;
}
if(match.groups.minor == 3 && match.groups.sub < 95){
return true;
}
if(match.groups.minor == 4 && match.groups.sub < 77){
return true;
}
if(match.groups.minor == 5 && match.groups.sub < 53){
return true;
}
}
if (match.groups.major == 2 && match.groups.minor == 0 && match.groups.sub < 27){
return true;
}
return false;
""";
SELECT
c.name,
c.resource.data.projectId AS project_id,
c.resource.data.clusterName AS cluster_name,
c.resource.data.config.softwareConfig.imageVersion AS image_version,
c.resource.data.status.state AS cluster_state,
vulnerable_version(c.resource.data.config.softwareConfig.imageVersion) AS is_vulnerable
FROM
`PROJECT_ID.DATASET_ID.TABLE_NAME_PREFIX_dataproc_googleapis_com_Cluster` c
INNER JOIN `PROJECT_ID.DATASET_ID.TABLE_NAME_PREFIX_cloudresourcemanager_googleapis_com_Project` p
ON p.resource.data.projectId = c.resource.data.projectId
WHERE
c.resource.data.config.softwareConfig.imageVersion IS NOT NULL
AND c.resource.data.status.state = "RUNNING"
AND p.resource.data.lifecycleState = "ACTIVE";