BigQuery schemas

Manufacturing Data Engine (MDE) distinguishes between three different types of tables in BigQuery:

  1. Record tables
  2. Metadata tables
  3. System tables (which include configuration, logs, and raw messages)

Record tables

MDE creates a record table in BigQuery for each Type. Record tables are stored in the mde_data dataset. The BigQuery schema of record tables depends on the Archetype.

Nr Archetype* Field Data Type Description
1 All id string Row identifier
2 All tag_name string Name of the tag configuration entity/ natural key in tag buckets
3 All type_version integer Type version
4 All embedded_metadata JSON Stores rapidly changing data in a record
5 All cloud_metadata_ref JSON Stores a foreign key (UUIDs) to a metadata instance for bucket
6 All materialized_cloud_metadata JSON Stores metadata instances in the record that are retrieved from the foreign keys stored in cloud_metadata_ref
7 All ingest_timestamp timestamp The timestamp when a record was written to BigQuery
8 All source_message_id string Source message Id (Pub/Sub assigned)
9 DDS-S, DDS-C, CDS-S, CDS-C data JSON Object holding the data payload
10 NDS value numeric Measurement value
11 DDS-S, DDS-C, NDS-S, NDS-C event_timestamp timestamp Event timestamp
12 CDS-S, CDS-C event_timestamp_start timestamp Event timestamp start
13 CDS-S, CDS-C event_timestamp_end timestamp Event timestamp end
14 CDS-S, CDS-C duration integer Duration
15 DDS-C, NDS-C, CDS-C clustered_column_1 string Generic cluster column
16 DDS-C, NDS-C, CDS-C clustered_column_2 string Generic cluster column
17 DDS-C, NDS-C, CDS-C clustered_column_3 string Generic cluster column
18 DDS-C, NDS-C, CDS-C clustered_column_4 string Generic cluster column

*Archetype reference:

  • DDS-S: Discrete Data Series Standard.
  • DDS-C: Discrete Data Series Clustered.
  • DDS-S: Discrete Data Series Standard.
  • DDS-C: Discrete Data Series Clustered.
  • NDS-S: Numeric Data Series Standard.

Partitioning

Record tables in the Discrete Data Series and Numeric Data Series archetype families are partitioned by day on the event_timestamp field.

Record tables in the Continuous Data Series archetype family are partitioned by day on the event_timestamp_start field.

Clustering

Record tables of clustered archetypes are automatically clustered by clustered_column_1, clustered_column_2, clustered_column_3, and clustered_column_4 in that order upon creation. You can populate these fields in the parser by assigning values to the corresponding fields in the proto record.

Metadata tables

Metadata instances for all buckets are stored in a single metadata-store table in the metadata_dimension dataset. The following describes the BigQuery schema of table:

Nr Field Data Type Description
1 instance_id string Unique metadata instance identifier - UUID (PK)
2 bucket_number integer Bucket identifier
3 bucket_name string Bucket name
4 bucket_version integer Bucket version
5 natural_key string Natural key of the metadata
6 instance JSON Metadata instance object
7 created_timestamp timestamp Timestamp when the instance was created

Partitioning

The metadata-store table is partitioned on bucket_number.

System tables

The mde_system dataset exposes the following sets of tables:

  • Type and bucket configuration tables for the MDE configuration model
  • Operations logs
  • Raw messages

Bucket specification

The bucket-specification table exposes bucket configuration data. The table has the following schema:

Nr Field Data Type
1 id string
2 bucket_type string
3 bucket_number integer
4 bucket_name string
5 schema JSON
6 bucket_version integer
7 created_timestamp timestamp
8 provider string
8 bucket_attributes JSON

Type specification

The type-specification table exposes bucket configuration data. The table has the following schema:

Nr Field Data Type
1 id string
2 type_name string
3 type_version integer
4 data_schema JSON
5 storage_specs JSON
6 metadata_bucket_references JSON
7 created_timestamp timestamp

Operations logs

The operations-log table exposes MDE operations log data. The logging level can be configured. See the Operations guide for more details. The table has the following schema:

Nr Field Data Type Description
1 message_id string Row identifier
2 source_message_id string Source message Id (PubSub assigned)
3 step string Processing pipeline step
4 error_message STRING Error cause
5 event_timestamp timestamp Event timestamp
6 step_timestamp timestamp Step timestamp
7 headers JSON Message headers at the processing step
8 payload JSON Message payload at the processing step

Raw messages

MDE stores every incoming message in the raw-messages table. The table has the following schema:

Nr Field Data Type
1 subscription_name string
2 message_id string
3 publish_time timestamp
4 data string
5 attributes timestamp