Work with numbers

This page explains how to perform numeric calculations when you prepare data in the Wrangler workspace of the Cloud Data Fusion Studio.

You can perform these transformations in columns of the following data types:

  • Integer
  • Decimal
  • Double
  • Float
  • Long
  • Short

For more information, see the set-column directive.

You can use numeric functions on one or more columns. The available numeric functions vary by the number of columns you select. You can choose to create a new column with the results of the calculations, or you can have the results appear in the column where you add the transformation.

If you apply a numeric function to a single column, the behavior is different than if you apply a numeric function to multiple columns. For example:

  • If you use the Multiply() function on a single column, you specify the decimal value to multiply each value in the column by. Wrangler performs the multiplication on the sample data and displays the new values in the same column or in a new column.
  • If you use the Multiply() function on multiple columns, Wrangler multiplies the values in each row for the selected columns and displays the new values in the first column of the transformation.

Perform numeric calculations on values in one column

To apply a numeric calculation to one column, follow these steps:

  1. Go to the Wrangler workspace in Cloud Data Fusion.
  2. On the Data tab, go to a column name and click the arrow_drop_down expander arrow.
  3. Select Calculate and select an option—for example, Add.

    To complete the calculation, some functions require you to enter a decimal value. For example, if you select Subtract, you must enter the value to subtract from each row.

  4. Optional: to create a new column for the calculated values, select the Copy to a new column checkbox.

  5. Click Apply.

The values change based on the calculation. Wrangler adds the corresponding directive as a step in the recipe. For example, if you subtract two from each value in the Price column, Wrangler adds the following transformation to the recipe:

set-column :Price Price - 2

When you run the data pipeline, the transformation is applied to all values in the column.

Supported calculations for one column

You can perform the following calculations on all values in one column:

Function Description
Absolute value Returns the absolute value for each value in a column. For example, the absolute value of -10 is 10.
Add Adds a positive or negative decimal to a column or adds multiple columns to output the sum of each row.
Arccos Returns the inverse of cosine for each value in a column.
Arcsin Returns the inverse of sin for each value in a column.
Arctan Returns the inverse of tangent for each value in a column.
Ceil Returns the smallest integer greater than or equal to the numeric value in each cell of a column. For example, if the value is 2.21, Ceil returns 3. If the value is 5.88, Ceil returns 6. If the value is -5.15, Ceil returns -5.
Cos Returns cosine of each column value.
Cube Raises each value in the column to the power of 3.
Cube root Returns the cube root of each value in the column.
Divide Divides each value in the column by a positive or negative decimal or divides each value in two columns.
Floor Returns the largest integer greater than or equal to the numeric value in each cell of a column. For example, if the value is 2.21, FLOOR returns 2. If the value is 5.88, FLOOR returns 5. Likewise, if the value is -5.15, FLOOR returns -6.
Log Returns the logarithm of each value in a column.
Modulo Returns the remainder when dividing each value in the column by a positive decimal or returns the remainder when dividing each value in two columns.
Multiply Multiplies each value in the column by a positive or negative decimal or multiplies each value in two columns.
Natural Log Returns the natural logarithm of each value in a column.
Power of Raises the column value to the specified power of value.
Random Generates random, unpredictable Double values in a specified column.
Round Rounds each value in a column.
Sin Returns the sin for each value in a column.
Square Raises each value in the column to the power of 2.
Square root Returns the square root of each value in a column.
Substract Subtracts a positive or negative decimal to each value in the column.
Tan Returns the tangent for each value in a column.

Perform numeric calculations on values in two columns

To apply a numeric calculation to two columns, follow these steps:

  1. Go to the Wrangler workspace in Cloud Data Fusion.
  2. On the Data tab, select the checkbox by two column names.
  3. Go to a column name and click the arrow_drop_down expander arrow.
  4. Select Calculate and select an option—for example, Add.
  5. Optional: to create a new column for the calculated values, select the Copy to a new column checkbox. Otherwise, the calculated values override existing values.
  6. Click Apply.

The values change based on the calculation. Wrangler adds the corresponding directive as a step in the recipe. For example, if you add the values in each row of the Q1_Sales and Q2_Sales columns and create a new column called H1_Sales, Wrangler adds the following transformation to the recipe:

set-column :H1_Sales arithmetic:add(Q1_Sales, Q2_Sales)

When you run the data pipeline, Wrangler performs the transformation and creates a new column called H1_Sales with the total of Q1_Sales and Q2_Sales.

Supported calculations in two columns

You can perform the following numeric calculations on values in each row in two columns:

Function Description
Add Adds a positive or negative decimal to a column or adds multiple columns to output the sum of each row.
Average Returns the average of each row in multiple columns.
Divide Divides each value in the column by a positive or negative decimal or divides each value in two columns.
Equal Boolean function that compares two columns to see if values match, row by row. Returns true or false.
LCM Returns the least common multiple of each number in two columns.
Max Returns the maximum value for each row in two columns.
Min Returns the minimum value for each row in two columns.
Modulo Returns the remainder when dividing each value in the column by a positive decimal or returns the remainder when dividing each value in two columns.
Multiply Multiplies each value in the column by a positive or negative decimal or multiplies each value in two columns.
Subtract Subtracts a positive or negative decimal to each value in the column.

Perform numeric calculations on values in three or more columns

Cloud Data Fusion supports performing numeric calculations on values in three or more columns in version 6.8.0 and later.

To apply a numeric calculation to three or more columns, follow these steps:

  1. Go to the Wrangler workspace in Cloud Data Fusion.
  2. On the Data tab, select the checkbox by two column names.
  3. Go to a column name and click the arrow_drop_down expander arrow.
  4. Select Calculate and select an option—for example, Add.
  5. Optional: to create a new column for the calculated values, select the Copy to a new column checkbox. Otherwise, the calculated values override existing values.
  6. Click Apply.

The values change based on the calculation. Wrangler adds the corresponding directive as a step in the recipe. For example, if you add the values in each row of the Q1_Sales, Q2_Sales, Q3_Sales, and Q4_Sales columns and create a new column called 2022_Sales, Wrangler adds the following transformation to the recipe:

set-column :2022_Sales arithmetic:add(Q1_Sales, Q2_Sales, Q3_Sales, Q4_Sales)

When you run the data pipeline, Wrangler performs the transformation and creates a new column called 2022_Sales with the total of Q1_Sales, Q2_Sales, Q3_Sales, and Q4_Sales.

Supported calculations in three or more columns

You can perform the following calculations on values in each row in three or more columns:

Function Description
Add Adds a positive or negative decimal to a column or adds multiple columns to output the sum of each row.
Average Returns the average of each row in multiple columns.
Equal Boolean function that compares two columns to see if values match, row by row. Returns true or false.
Max Returns the maximum value for each row in two columns.
Min Returns the minimum value for each row in two columns.
Multiply Multiplies each value in the column by a positive or negative decimal or multiplies each value in two columns.

What's next