Message monitoring
This guide describes how to monitor messages sent to Manufacturing Data Engine (MDE), how they flow through the processing pipeline, and how to diagnose issues caused by configuration errors or system problems.
To monitor messages flowing through MDE, use the operations-log
BigQuery table. Whenever a step of the MDE pipeline fails to process a
message, it writes the message to the operations-log table, indicating
the step and the failure reason.
All failed messages are written to this table. You can also configure
MDE to write successful messages to the operations-log table. This is
useful when debugging issues, but shouldn't be left enabled in production,
because it generates a lot of additional traffic in the system and can
degrade performance.
REST
Execute the following REST API request to configure the operations-log
table:
POST /configuration/v1/environment

{
  "operationsLogLevel": "ALL"
}
- ALL: All messages are sent to the operations-log table for each step in the processing pipeline.
- ERROR: Only messages that failed any of the processing steps are sent to the operations-log table.
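If you script this configuration call, it can help to validate the request body before sending it, since only the two levels above are documented. A minimal sketch; the helper function is hypothetical and not part of MDE, and only builds the JSON body for the request shown above:

```python
import json

# The two operationsLogLevel values documented for the operations-log table.
ALLOWED_LEVELS = {"ALL", "ERROR"}

def build_environment_payload(level: str) -> str:
    """Build the JSON body for POST /configuration/v1/environment.

    Rejects anything other than the documented levels so a typo
    doesn't silently produce an invalid configuration request.
    """
    if level not in ALLOWED_LEVELS:
        raise ValueError(
            f"operationsLogLevel must be one of {sorted(ALLOWED_LEVELS)}, got {level!r}"
        )
    return json.dumps({"operationsLogLevel": level})

print(build_environment_payload("ERROR"))  # {"operationsLogLevel": "ERROR"}
```

Remember to switch the level back to ERROR after debugging, for the performance reasons described above.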
Processing steps
To diagnose why a message is rejected by the processing pipeline, it helps to know the different processing steps and what they do. For more information, see MDE Architecture.
- message-mapper: Reads the original JSON, matches it to the corresponding message class, and processes it using Whistle to emit one or more records.
- configuration-manager: Creates a new tag in the system if it doesn't already exist and adds to the record all the properties defined in the appropriate Type (metadata buckets, sinks, and transformations).
- metadata-manager: Resolves all metadata references of the record, updates the system metadata instances if a new one is received, and adds materialized metadata to the record if configured to do so.
- bigquery-sink: Maps the record to the appropriate type structure and sends it to the corresponding Pub/Sub topic to be written into BigQuery.
- pubsub-sink: Maps the record to the Pub/Sub Proto or JSON structure and sends it to the corresponding topic.
- GCSWriter: Writes both raw data as it's received from the input-messages topic and processed data after it goes through the metadata-manager.
- BigtableWriter: Writes data to Bigtable.
- GCSReader: Reads files from Cloud Storage and sends the messages to input-messages.
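Conceptually, a record passes through these steps in order, and a failure at any step routes the record, together with the step name and error, to the operations-log table instead of the configured sinks. The following toy sketch illustrates that routing model only; the step names match the list above, but the step bodies and record shape are illustrative assumptions, not MDE's implementation:

```python
import json

# Illustrative stand-ins for two of the pipeline steps listed above.
def message_mapper(record):
    msg = json.loads(record["raw"])  # raises on invalid JSON
    if not msg.get("tagName"):
        raise ValueError("Message is missing tagName field or it is blank.")
    record["parsed"] = msg
    return record

def metadata_manager(record):
    record["metadata"] = {"resolved": True}  # placeholder for metadata resolution
    return record

PIPELINE = [("message-mapper", message_mapper),
            ("metadata-manager", metadata_manager)]

def process(record, operations_log):
    """Run a record through the steps; on failure, append an
    operations-log-style row (step, error, payload) and stop."""
    for step_name, step in PIPELINE:
        try:
            record = step(record)
        except Exception as err:
            operations_log.append({"step": step_name,
                                   "error_message": str(err),
                                   "payload": record})
            return None
    return record

log = []
ok = process({"raw": '{"tagName": "motor-1", "value": 7}'}, log)
bad = process({"raw": '{"value": 7}'}, log)
print(ok is not None, bad is None, log[0]["step"])  # True True message-mapper
```

The key point for monitoring is the last part: a rejected record never reaches the sinks, so the operations-log table is the place to look for it.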
Diagnose messages not showing in their configured sink
If you send a message to MDE and it doesn't appear in the configured sink, first verify that the type is configured with that sink (as explained in the Type section) and that you're querying the correct table in BigQuery. Remember that the table is named after the Type.
If that is configured correctly, use the operations-log table to diagnose
the issue. You can start with a general query, either ordering by
event_timestamp or filtering to the time when the message was sent, as the
following example shows:
SELECT
*
FROM
`mde_system.operations-log`
WHERE
DATE(event_timestamp) = CURRENT_DATE()
ORDER BY
event_timestamp DESC
LIMIT 100;
You can also use the source_message_id to filter on a specific message. This
ID is assigned by Pub/Sub when the message is published. If you use the
gcloud CLI to publish a message from the command line, it returns the
messageId of the published message.
SELECT
*
FROM
`mde_system.operations-log`
WHERE
DATE(event_timestamp) <= CURRENT_DATE()
AND source_message_id = 'PubSubMessageId';
If you can't find the message, or there are too many results, you can filter
on an attribute of the original message. The message is always logged in the
payload field in the state that the last step left it. Because payload is a
JSON field, the easiest approach is to convert it with TO_JSON_STRING and
use the % wildcard to look for any messages that contain the text you want.
SELECT
*
FROM
`mde_system.operations-log`
WHERE
DATE(event_timestamp) = CURRENT_DATE()
AND TO_JSON_STRING(payload) LIKE "%TEXT-TO-FIND%"
ORDER BY
event_timestamp DESC;
Once you find your message in the operations-log
table, look at the
error-message
column to find the reason the message was rejected. The most
common errors are the following:
- Couldn't match the incoming message with any of the registered message classes in the configuration manager.
- No parsers found for message class <MESSAGE_CLASS_NAME>.
- Skipping the message processing as it couldn't be deserialized (the message is not a valid JSON).
- Couldn't construct a valid message with the parser <PARSER_NAME>. Message is missing timestamps field.
- Couldn't construct a valid message with the parser <PARSER_NAME>. Message is missing tagName field or it is blank.
- Couldn't construct a valid message with the parser <PARSER_NAME>. Message timestamp is out of the supported bounds.
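Several of these errors can be caught before publishing by checking the payload on the client side. The sketch below is a hypothetical helper, not part of MDE; the field names (timestamps, tagName) are taken from the error messages above, and the timestamp-bounds check is omitted because the supported bounds aren't listed here:

```python
import json

def prevalidate(raw: str) -> list[str]:
    """Return the operations-log-style errors this payload would likely trigger."""
    errors = []
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        # Mirrors: "the message is not a valid JSON"
        return ["the message is not a valid JSON"]
    if "timestamps" not in msg:
        errors.append("Message is missing timestamps field.")
    if not msg.get("tagName"):
        errors.append("Message is missing tagName field or it is blank.")
    return errors

good = '{"tagName": "motor-1", "timestamps": {"eventTimestamp": "2024-01-01T00:00:00Z"}}'
print(prevalidate(good))  # []
```

Running this on outgoing messages during development can save a round trip to the operations-log table for the most common rejection reasons.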
After you find and fix the error that is causing the message to fail, the messages will start landing in their respective sinks.