The ML.ROBUST_SCALER function
This document describes the ML.ROBUST_SCALER function, which lets you scale a
numerical expression by using statistics that are robust to outliers. The
function centers the data by removing the median and then scales it according
to the quantile range.
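The following is a sketch of the transformation applied to each value. It assumes the usual robust-scaling definition and follows the argument descriptions on this page. The symbols x, m, q_lo, and q_hi are introduced here for illustration only. x is a value of numerical_expression and m is the median over the window. q_lo and q_hi are the values at the lower and upper percentiles given by quantile_range, which by default are the 25th and 75th.

$$
\text{output} =
\begin{cases}
\dfrac{x - m}{q_{\text{hi}} - q_{\text{lo}}} & \text{with\_median} = \text{TRUE},\ \text{with\_quantile\_range} = \text{TRUE} \\[6pt]
\dfrac{x}{q_{\text{hi}} - q_{\text{lo}}} & \text{with\_median} = \text{FALSE},\ \text{with\_quantile\_range} = \text{TRUE} \\[6pt]
x - m & \text{with\_median} = \text{TRUE},\ \text{with\_quantile\_range} = \text{FALSE} \\[6pt]
x & \text{with\_median} = \text{FALSE},\ \text{with\_quantile\_range} = \text{FALSE}
\end{cases}
$$

The example on this page is consistent with the default case. Its outputs correspond to m = 2 and q_hi - q_lo = 3. For example, f = 5 maps to (5 - 2) / 3 = 1.0, and f = -3 maps to (-3 - 2) / 3, which is about -1.667.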
When you use the function in the TRANSFORM clause of a CREATE MODEL statement,
the median and quantile range calculated during training are automatically
reused at prediction time.
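The following query is a minimal sketch of that TRANSFORM pattern. The dataset, table, model, and column names are hypothetical placeholders (mydataset, mymodel, training_data, new_data, feature_1, label), and so is the linear_reg model type. Substitute your own. At prediction time you pass the raw column, and the TRANSFORM clause rescales it with the statistics computed during training.

```sql
-- Hypothetical names: mydataset.mymodel, mydataset.training_data,
-- mydataset.new_data, feature_1, and label are placeholders.
CREATE OR REPLACE MODEL `mydataset.mymodel`
  TRANSFORM (
    -- The median and quantile range of feature_1 are computed over the
    -- training data and reused when the model makes predictions.
    ML.ROBUST_SCALER(feature_1) OVER () AS scaled_feature_1,
    label
  )
  OPTIONS (
    model_type = 'linear_reg',
    input_label_cols = ['label']
  )
AS
SELECT feature_1, label
FROM `mydataset.training_data`;

-- Pass the unscaled feature_1 column; the TRANSFORM clause applies the
-- training-time median and quantile range automatically.
SELECT *
FROM ML.PREDICT(
  MODEL `mydataset.mymodel`,
  (SELECT feature_1 FROM `mydataset.new_data`));
```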
Syntax
ML.ROBUST_SCALER(numerical_expression [, quantile_range] [, with_median] [, with_quantile_range]) OVER()
Arguments
ML.ROBUST_SCALER takes the following arguments:
- numerical_expression: the numerical expression to scale.
- quantile_range: an array of two INT64 elements that specifies the lower and
  upper percentiles that bound the quantile range. The first element is the
  lower boundary of the range and must be greater than 0. The second element
  is the upper boundary of the range and must be greater than the first
  element but less than 100. The default value is [25, 75].
- with_median: a BOOL value that specifies whether the data is centered. If
  TRUE, the function centers the data by removing the median before scaling.
  The default value is TRUE.
- with_quantile_range: a BOOL value that specifies whether the data is scaled
  by the quantile range. If TRUE, the function scales the data by the quantile
  range after any centering. The default value is TRUE.
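The following query is a sketch of passing the optional arguments positionally, in the order quantile_range, with_median, with_quantile_range. Because the arguments are positional, setting with_median or with_quantile_range also means passing quantile_range explicitly. The column aliases are illustrative, and the output values are not shown here.

```sql
SELECT
  f,
  -- Defaults: center on the median, scale by the 25th-75th percentile range.
  ML.ROBUST_SCALER(f) OVER () AS default_scaled,
  -- Wider range: scale by the 10th-90th percentile range.
  ML.ROBUST_SCALER(f, [10, 90]) OVER () AS wide_range_scaled,
  -- with_median = FALSE: scale by the quantile range without centering.
  ML.ROBUST_SCALER(f, [25, 75], FALSE, TRUE) OVER () AS not_centered,
  -- with_quantile_range = FALSE: center on the median without scaling.
  ML.ROBUST_SCALER(f, [25, 75], TRUE, FALSE) OVER () AS centered_only
FROM
  UNNEST([NULL, -3, 1, 2, 3, 4, 5]) AS f
ORDER BY f;
```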
Output
ML.ROBUST_SCALER returns a FLOAT64 value that represents the scaled
numerical expression.
Example
The following example centers a set of numerical values on their median and
then scales them by the range between their 25th and 75th percentiles (the
default quantile_range of [25, 75]):
```sql
SELECT f, ML.ROBUST_SCALER(f) OVER () AS output
FROM
  UNNEST([NULL, -3, 1, 2, 3, 4, 5]) AS f
ORDER BY f;
```

The output looks similar to the following:

```
+------+---------------------+
| f    | output              |
+------+---------------------+
| NULL | NULL                |
| -3   | -1.6666666666666667 |
| 1    | -0.3333333333333333 |
| 2    | 0.0                 |
| 3    | 0.3333333333333333  |
| 4    | 0.6666666666666666  |
| 5    | 1.0                 |
+------+---------------------+
```

What's next
- For information about feature preprocessing, see Feature preprocessing overview.
- For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.

Last updated 2025-08-25 UTC.