Stay organized with collections
Save and categorize content based on your preferences.
The ML.IMPUTER function
This document describes the ML.IMPUTER function, which lets you replace
NULL values in a string or numerical expression. You can replace NULL values
with the most frequently used value for string expressions, or the
mean or
median value for numerical expressions.
When used in the
TRANSFORM clause,
the values calculated during training for mean, median, and most frequently
used value are automatically used in prediction.
Syntax
ML.IMPUTER(expression, strategy) OVER()
Arguments
ML.IMPUTER takes the following arguments:
expression: the
numerical
or STRING expression to impute.
strategy: a STRING value that specifies how to replace NULL values.
Valid values are as follows:
mean: the mean of expression. You can only use this value with
numerical expressions.
median: the median of expression. You can only use this value with
numerical expressions.
most_frequent: the most frequent value in expression.
Output
ML.IMPUTER returns a FLOAT64 (for numerical expressions) or STRING
(for string expressions) value that contains the replacement for the
NULL value.
Examples
Example 1
The following example imputes numerical expressions:
+------+--------+
| f | output |
+------+--------+
| NULL | c |
| NULL | c |
| NULL | c |
| NULL | c |
| a | a |
| a | a |
| b | b |
| b | b |
| c | c |
| c | c |
| c | c |
+------+--------+
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-25 UTC."],[[["\u003cp\u003eThe \u003ccode\u003eML.IMPUTER\u003c/code\u003e function replaces \u003ccode\u003eNULL\u003c/code\u003e values in string or numerical expressions.\u003c/p\u003e\n"],["\u003cp\u003eFor numerical expressions, \u003ccode\u003eNULL\u003c/code\u003e values can be replaced with the mean or median value.\u003c/p\u003e\n"],["\u003cp\u003eFor string expressions, \u003ccode\u003eNULL\u003c/code\u003e values are replaced with the most frequently occurring value.\u003c/p\u003e\n"],["\u003cp\u003eThe \u003ccode\u003eML.IMPUTER\u003c/code\u003e function can be used with the \u003ccode\u003eTRANSFORM\u003c/code\u003e clause, allowing the values calculated during training to be automatically applied during prediction.\u003c/p\u003e\n"],["\u003cp\u003eThe function outputs either a \u003ccode\u003eFLOAT64\u003c/code\u003e value for numerical expressions or a \u003ccode\u003eSTRING\u003c/code\u003e value for string expressions, representing the imputed value.\u003c/p\u003e\n"]]],[],null,["# The ML.IMPUTER function\n=======================\n\nThis document describes the `ML.IMPUTER` function, which lets you replace\n`NULL` values in a string or numerical expression. You can replace `NULL` values\nwith the most frequently used value for string expressions, or the\n[mean](https://en.wikipedia.org/wiki/Mean) or\n[median](https://en.wikipedia.org/wiki/Median) value for numerical expressions.\n\nWhen used in the\n[`TRANSFORM` clause](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create#transform),\nthe values calculated during training for mean, median, and most frequently\nused value are automatically used in prediction.\n\nSyntax\n------\n\n```sql\nML.IMPUTER(expression, strategy) OVER()\n```\n\nArguments\n---------\n\n`ML.IMPUTER` takes the following arguments:\n\n- `expression`: the [numerical](/bigquery/docs/reference/standard-sql/data-types#numeric_types) or `STRING` expression to impute.\n- `strategy`: a `STRING` value that specifies how to replace `NULL` values. Valid values are as follows:\n - `mean`: the mean of `expression`. You can only use this value with numerical expressions.\n - `median`: the median of `expression`. You can only use this value with numerical expressions.\n - `most_frequent`: the most frequent value in `expression`.\n\nOutput\n------\n\n`ML.IMPUTER` returns a `FLOAT64` (for numerical expressions) or `STRING`\n(for string expressions) value that contains the replacement for the\n`NULL` value.\n\nExamples\n--------\n\n**Example 1**\n\nThe following example imputes numerical expressions: \n\n```sql\nSELECT f, ML.IMPUTER(f, 'mean') OVER () AS output\nFROM\n UNNEST([NULL, -3, -3, -3, 1, 2, 3, 4, 5]) AS f\nORDER BY f;\n```\n\nThe output looks similar to the following: \n\n```\n+------+--------+\n| f | output |\n+------+--------+\n| NULL | 0.75 |\n| -3 | -3.0 |\n| -3 | -3.0 |\n| -3 | -3.0 |\n| 1 | 1.0 |\n| 2 | 2.0 |\n| 3 | 3.0 |\n| 4 | 4.0 |\n| 5 | 5.0 |\n+------+--------+\n```\n\n**Example 2**\n\nThe following example imputes string expressions: \n\n```sql\nSELECT f, ML.IMPUTER(f, 'most_frequent') OVER () AS output\nFROM\n UNNEST([NULL, NULL, NULL, NULL, 'a', 'a', 'b', 'b', 'c', 'c', 'c']) AS f\nORDER BY f;\n```\n\nThe output looks similar to the following: \n\n```\n+------+--------+\n| f | output |\n+------+--------+\n| NULL | c |\n| NULL | c |\n| NULL | c |\n| NULL | c |\n| a | a |\n| a | a |\n| b | b |\n| b | b |\n| c | c |\n| c | c |\n| c | c |\n+------+--------+\n```\n\nWhat's next\n-----------\n\n- For information about feature preprocessing, see [Feature preprocessing overview](/bigquery/docs/preprocess-overview).\n- For information about the supported SQL statements and functions for each model type, see [End-to-end user journey for each model](/bigquery/docs/e2e-journey)."]]