This page explains how to separate data from a field (a cell) into multiple rows when you prepare data in the Wrangler workspace of the Cloud Data Fusion Studio.
Separate delimited text
You can separate the values from a cell into new rows if the values are separated by the following delimiters:
- Comma
- Tab
- Pipe
- Whitespace
- Custom separator
If a cell doesn't contain the chosen delimiter, no new row is inserted.
To split values based on a delimiter, follow these steps:
- Go to Wrangler workspace in Cloud Data Fusion.
- On the Data tab, go to a column name and click the arrow_drop_down expander arrow.
- Click Explode > Delimited text.
- Choose a delimiter—for example Pipe.
Click Extract.
Wrangler splits the fields based on the selected delimiter and adds the
split-to-row
directive to the recipe. When you run the data pipeline,
Cloud Data Fusion applies the transformation to all values in the column.
In this example, a dataset has a column of string values containing the comma delimiter:
ID | Name |
---|---|
1 | Lee,Lucian,Luka |
2 | Mahan,Noam |
To divide the value into separate rows, Wrangler deletes the original column and creates a new column with one row for each value. The other column values from the original row are copied into the new rows:
ID | Name_1 |
---|---|
1 | Lee |
1 | Lucian |
1 | Luka |
2 | Mahan |
2 | Noam |
Separate arrays
The flatten
directive separates items in arrays, such as ["ELEMENT_1",
"ELEMENT_2", "ELEMENT_3"]
, into new rows. The other column values from the
original record are copied into the new records.
What's next
- Learn more about Wrangler directives.