The ML.IMPUTER function

This document describes the ML.IMPUTER function, which lets you replace NULL values in a string or numerical expression. You can replace NULL values with the most frequently used value for string expressions, or the mean or median value for numerical expressions.

When used in the TRANSFORM clause, the values calculated during training for mean, median, and most frequently used value are automatically used in prediction.

You can use this function with models that support manual feature preprocessing. For more information, see the following documents:

Syntax

ML.IMPUTER(expression, strategy) OVER()

Arguments

ML.IMPUTER takes the following arguments:

expression: the numerical or STRING expression to impute.
strategy: a STRING value that specifies how to replace NULL values. Valid values are as follows:
- mean: the mean of expression. You can only use this value with numerical expressions.
- median: the median of expression. You can only use this value with numerical expressions.
- most_frequent: the most frequent value in expression.

Output

ML.IMPUTER returns a FLOAT64 (for numerical expressions) or STRING (for string expressions) value that contains the replacement for the NULL value.

Examples

Example 1

The following example imputes numerical expressions:

SELECT f, ML.IMPUTER(f, 'mean') OVER () AS output
FROM
  UNNEST([NULL, -3, -3, -3, 1, 2, 3, 4, 5]) AS f
ORDER BY f;

The output looks similar to the following:

+------+--------+
|  f   | output |
+------+--------+
| NULL |   0.75 |
|   -3 |   -3.0 |
|   -3 |   -3.0 |
|   -3 |   -3.0 |
|    1 |    1.0 |
|    2 |    2.0 |
|    3 |    3.0 |
|    4 |    4.0 |
|    5 |    5.0 |
+------+--------+

Example 2

The following example imputes string expressions:

SELECT f, ML.IMPUTER(f, 'most_frequent') OVER () AS output
FROM
  UNNEST([NULL, NULL, NULL, NULL, 'a', 'a', 'b', 'b', 'c', 'c', 'c']) AS f
ORDER BY f;

The output looks similar to the following:

+------+--------+
|  f   | output |
+------+--------+
| NULL | c      |
| NULL | c      |
| NULL | c      |
| NULL | c      |
| a    | a      |
| a    | a      |
| b    | b      |
| b    | b      |
| c    | c      |
| c    | c      |
| c    | c      |
+------+--------+

What's next

For information about feature preprocessing, see Feature preprocessing overview.