Send records to error

This page explains how to remove common errors from a dataset when you prepare data in the Wrangler workspace of the Cloud Data Fusion Studio.

The following types of errors occur in datasets:

  • Systemic errors, such as service or instance failures
  • Logical errors, such as pipeline run failures
  • Data errors, such as invalid credit card numbers, invalid date formats, or invalid zip codes

Wrangler provides a set of over 50 directives to help you remove common errors from a dataset.

To send records to error, follow these steps:

  1. Go to the Wrangler workspace in Cloud Data Fusion.
  2. On the Data tab, go to a column name and click the expander arrow.
  3. Select Send to error, and then select the condition that sends bad records to error.

Wrangler removes values that match the specified condition from the sample and adds the send to error directive to the recipe. When you run the data pipeline, the transformation is applied to all values in the column.
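The behavior of a send to error condition can be sketched in Python: records whose column value matches the condition are routed to an error stream, and the remaining records continue downstream. The function and field names here are illustrative, not Wrangler syntax:

```python
import re

def send_to_error(records, column, condition):
    """Split records: those whose column value matches the condition
    go to the error stream; the rest continue downstream."""
    clean, errors = [], []
    for record in records:
        if condition(record.get(column)):
            errors.append(record)
        else:
            clean.append(record)
    return clean, errors

# Example: flag records whose zip code is not a 5-digit value.
records = [
    {"id": 1, "zip": "94043"},
    {"id": 2, "zip": "ABCDE"},
]
is_bad_zip = lambda value: not re.fullmatch(r"\d{5}", value or "")
clean, errors = send_to_error(records, "zip", is_bad_zip)
```

When the pipeline runs, the same condition is evaluated against every value in the column, not only the values in the sample.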

Add an error collector plugin to a data pipeline

When you add a Wrangler transformation with a recipe that includes the send to error directive to a data pipeline, you can choose to connect it to the Error Collector plugin. The Error Collector plugin is usually connected to a downstream sink plugin, such as a BigQuery sink.

When you run the pipeline, the records flagged by the send to error directive flow from the Wrangler transformation step in your pipeline to the Error Collector step, and then to the sink step. When the run finishes, you can examine the flagged records that were written to the sink.

If your recipe includes the send to error transformation, but the pipeline doesn't include the Error Collector plugin, the records flagged by the send to error directive are dropped during the pipeline run.
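The Error Collector step can be sketched as annotating each flagged record with error metadata before the record reaches the sink. This is a minimal illustration of that flow, assuming the collector adds a message field and a stage field to each record; the field names `errMsg` and `errStage` are illustrative, not the plugin's documented schema:

```python
def error_collector(error_records, stage_name,
                    msg_field="errMsg", stage_field="errStage"):
    """Annotate each flagged record with the error reason and the
    pipeline stage that emitted it, then pass it on to the sink.
    Field names are illustrative assumptions."""
    collected = []
    for record, message in error_records:
        annotated = dict(record)
        annotated[msg_field] = message
        annotated[stage_field] = stage_name
        collected.append(annotated)
    return collected

# Records flagged by the send to error directive, paired with a reason.
flagged = [({"id": 2, "zip": "ABCDE"}, "Invalid zip code")]
sink_rows = error_collector(flagged, stage_name="Wrangler")
```

Without an Error Collector step on this path, the flagged records have nowhere to go and are dropped during the run.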

What's next