Encode and decode rows

This page explains how to encode and decode rows when you prepare data in the Wrangler workspace of the Cloud Data Fusion Studio.

Encode a row

You can use base encoding to store or transfer data in environments that, for legacy reasons, are restricted to US-ASCII data. You might also use it in new applications without those legacy restrictions, because encoded data can be manipulated with text editors.

You can apply the following encoding schemes, which are based on RFC 4648, to all values in a column:

  • Base32
  • Base64
  • Hex
  • URL
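
As a rough illustration of what each scheme produces, the following Python sketch encodes one sample value with the standard library. It isn't Wrangler code: the sample value and variable names are assumptions, and Wrangler's exact output (for example, letter case for hex or how spaces are URL-encoded) might differ.

```python
import base64
from urllib.parse import quote

# Hypothetical sample value; Wrangler applies the scheme to every value in a column.
value = "hello world"
raw = value.encode("utf-8")

encoded = {
    "base32": base64.b32encode(raw).decode("ascii"),
    "base64": base64.b64encode(raw).decode("ascii"),
    # RFC 4648 base16 is hexadecimal; lowercased here only for readability.
    "hex": base64.b16encode(raw).decode("ascii").lower(),
    # Percent-encoding; safe="" means no reserved characters are left as-is.
    "url": quote(value, safe=""),
}

for scheme, text in encoded.items():
    print(f"{scheme}: {text}")
```

Every result contains only US-ASCII characters, which is why these schemes suit ASCII-restricted environments.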

When you encode, Wrangler generates a new column with a name in the following format: <column>_encode_<type>. The exception is url-encode, which encodes the values in the current column.

Cloud Data Fusion uses the following rules for the column values:

  • If the column is null, the resulting column is also null.
  • If the chosen column isn't found in the row, the row is skipped.
  • If the column value isn't a string or byte array, the transformation fails and an error is displayed.
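
The rules above can be sketched in Python as follows. This is a minimal model of the described behavior, not the Wrangler implementation: the function name encode_column, the dictionary representation of a row, and the use of None to signal a skipped row are all assumptions made for illustration.

```python
import base64

# Hypothetical mapping from scheme name to a standard-library encoder.
ENCODERS = {
    "base32": base64.b32encode,
    "base64": base64.b64encode,
    "hex": base64.b16encode,
}


def encode_column(row, column, scheme="base64"):
    """Return a copy of row with an encoded column added, or None to skip the row."""
    if column not in row:
        return None  # column not found in the row: the row is skipped

    result = dict(row)
    target = f"{column}_encode_{scheme}"  # naming format described above
    value = row[column]

    if value is None:
        result[target] = None  # null in, null out
        return result

    if isinstance(value, str):
        raw = value.encode("utf-8")
    elif isinstance(value, (bytes, bytearray)):
        raw = bytes(value)
    else:
        # Any other data type makes the transformation fail.
        raise TypeError(f"column {column!r} must be a string or byte array")

    result[target] = ENCODERS[scheme](raw).decode("ascii")
    return result
```

For example, encode_column({"name": "abc"}, "name") returns the original row plus a name_encode_base64 column, while a row that lacks the name column yields None.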

Supported encoding options

Wrangler supports the following encoding options:

Encode base64
The Base64 option adds the encode64 directive as a transformation step to the recipe and creates a new column with encoded values.
Encode base32
The Base32 option adds the encode32 directive as a transformation step to the recipe and creates a new column with encoded values.
Encode hex
The Hex option adds the encode_hex directive as a transformation step to the recipe and creates a new column with encoded values.
Encode URL
The URL option adds the url-encode directive as a transformation step to the recipe and encodes the current column.

Decode a row

You can decode data that was base-encoded for storage or transfer in environments that, for legacy reasons, are restricted to US-ASCII data. Decoding restores the original values from their encoded text form.

You can apply the following decoding schemes, which are based on RFC 4648, to each value in a column:

  • Base32
  • Base64
  • Hex
  • URL
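
As a rough illustration, the following Python sketch decodes one sample value per scheme with the standard library. It isn't Wrangler code: the encoded inputs and variable names are assumptions, and the sketch assumes the decoded bytes are UTF-8 text.

```python
import base64
from urllib.parse import unquote

# Hypothetical encoded inputs; each one is an encoding of the same text.
decoded = {
    "base32": base64.b32decode("NBSWY3DPEB3W64TMMQ======").decode("utf-8"),
    "base64": base64.b64decode("aGVsbG8gd29ybGQ=").decode("utf-8"),
    "hex": bytes.fromhex("68656c6c6f20776f726c64").decode("utf-8"),
    # Percent-decoding reverses URL encoding.
    "url": unquote("hello%20world"),
}

for scheme, text in decoded.items():
    print(f"{scheme}: {text}")
```

Each scheme recovers the same original string, which shows that decoding is the inverse of the matching encoding.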

When you decode, Wrangler generates a new column with a name in the following format: <column>_decode_<type>. The exception is url-decode, which decodes the values in the current column.

Cloud Data Fusion uses the following rules for the column values:

  • If the column is null, the resulting column is also null.
  • If the chosen column isn't found in the row, the row is skipped.
  • If the column value isn't a string or byte array, the transformation fails and an error is displayed.

Supported decoding options

Wrangler supports the following decoding options:

Decode base64
The Base64 option adds the decode64 directive as a transformation step to the recipe and creates a new column with the decoded values.
Decode base32
The Base32 option adds the decode32 directive as a transformation step to the recipe and creates a new column with the decoded values.
Decode hex
The Hex option adds the decode_hex directive as a transformation step to the recipe and creates a new column with the decoded values.
Decode URL
The URL option adds the url-decode directive as a transformation step to the recipe and decodes the current column.

What's next