BigQuery queries

This guide explains how to query data in BigQuery for typical Manufacturing Data Engine (MDE) use-cases.

Records join with cloud metadata

If cloud metadata materialization is disabled, you can access cloud metadata instances by joining relevant record table with the metadata-store on the metadata instance_id with the following SQL query:

SELECT
  dnr.*,
  ms.instance
FROM
  mde_data.`RECORD_TABLE_NAME` AS dnr
LEFT JOIN
  mde_dimension.`metadata-store` AS ms
ON
  ms.instance_id = JSON_VALUE(cloud_metadata_ref, "$.BUCKET_NAME.instance_id")
WHERE
  DATE(event_timestamp) = 'EVENT_TIMESTAMP'
LIMIT 100

Replace the following:

  • RECORD_TABLE_NAME: the name of the record table.
  • BUCKET_NAME: the name of the cloud metadata bucket.
  • EVENT_TIMESTAMP: the timestamp of the event.

To improve query performance and since the metadata-store is partitioned on bucket number, you can optionally specify the bucket number in the ON clause, as the following SQL query:

SELECT
  dnr.*,
  ms.instance
FROM
  mde_data.`<RECORD_TABLE_NAME>` AS dnr
LEFT JOIN
  mde_dimension.`metadata-store` AS ms
ON
  ms.instance_id = JSON_VALUE(cloud_metadata_ref, "$.BUCKET_NAME.instance_id")
  AND ms.bucket_number = <BUCKET_NUMBER>
WHERE
  DATE(event_timestamp) = 'EVENT_TIMESTAMP'
LIMIT 100

Replace the following:

  • BUCKET_NAME: the name of the cloud metadata bucket.
  • EVENT_TIMESTAMP: the timestamp of the event.

Cloud metadata instance attributes access

You can access metadata instance attributes using the JSON dot notation which always returns a JSON object, or using one of the BigQuery JSON functions, such as JSON_VALUE to extract strings or other data types. See the following example:

SELECT
  dnr.*,
  ms.instance.deviceName                    -- this returns a double quoted JSON string
  JSON_VALUE(ms.instance, '$.deviceName')   -- this returns a string
FROM
  mde_data.`example-record-tbl` AS dnr
LEFT JOIN
  mde_dimension.`metadata-store` AS ms
ON
  ms.instance_id = JSON_VALUE(cloud_metadata_ref, "$.bucket.instance_id")
WHERE
  DATE(event_timestamp) = '2023-01-01'
LIMIT 100

Similarly, if cloud metadata materialization is enabled, you can access metadata instance attributes directly from the record. See the following example:

SELECT
  *   (EXCEPT materialized_cloud_metadata),
  materialized_cloud_metadata.device.deviceName                     -- this returns a double quoted JSON string
  JSON_VALUE(materialized_cloud_metadata., '$.device.deviceName')   -- this returns a string
FROM
  mde_data.`example-record-tbl`
WHERE
  DATE(event_timestamp) = '2023-01-01'
LIMIT 100

Obtaining a list of all instance IDs contained in cloud_metadata_ref

To obtain an array of all metadata instance IDs contained in the cloud_metadata_ref field of a record follow these steps:

  1. Create the user defined function (UDF) with the following SQL query:

    CREATE OR REPLACE FUNCTION `mde_data.get_instance_ids`(input JSON) RETURNS ARRAY<STRING> LANGUAGE js AS R"""
    return input ? Object.keys(input).map(bucketName => input[bucketName].instance_id).filter(instance_id => instance_id != null) : [];
    """;
    
  2. Execute the function in a query:

    SELECT
     mde_data.get_instance_ids(cloud_metadata_ref) as metadata_instance_ids,
     *,
    FROM
     mde_data.`RECORD_TABLE_NAME`
    WHERE
    DATE(event_timestamp) = 'EVENT_TIMESTAMP'
    LIMIT 100
    

    Replace the following:

  • RECORD_TABLE_NAME: the name of the record table.
  • EVENT_TIMESTAMP: the timestamp of the event.