`ML.QUANTILE_BUCKETIZE` takes the following arguments:

- `numerical_expression`: the [numerical](/bigquery/docs/reference/standard-sql/data-types#numeric_types) expression to bucketize.
- `num_buckets`: an `INT64` value that specifies the number of buckets to split `numerical_expression` into.
- `output_format`: an optional `STRING` value that specifies the output format of the bucket. Valid output formats are as follows:
    - `bucket_names`: returns a `STRING` value in the format `bin_<bucket_index>`, for example `bin_3`. The `bucket_index` value starts at 1. This is the default output format.
    - `bucket_ranges`: returns a `STRING` value in the format `[lower_bound, upper_bound)` in [interval notation](https://en.wikipedia.org/wiki/Interval_(mathematics)). For example, `(-inf, 2.5)`, `[2.5, 4.6)`, `[4.6, +inf)`. The lower bound is inclusive and the upper bound is exclusive, so a value that equals a boundary falls into the higher bucket.
    - `bucket_ranges_json`: returns a JSON-formatted `STRING` value in the format `{"start": "lower_bound", "end": "upper_bound"}`. For example, `{"start": "-Infinity", "end": "2.5"}`, `{"start": "2.5", "end": "4.6"}`, `{"start": "4.6", "end": "Infinity"}`. As with `bucket_ranges`, `start` is inclusive and `end` is exclusive.

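
The `bucket_ranges_json` format stores both bounds as JSON strings. If you need them as numbers, for example to filter on bucket boundaries, you can extract and convert them. The following is a minimal sketch that assumes a column of numeric values. The inline `UNNEST` array stands in for your table, and `bucket_json`, `bucket_start`, and `bucket_end` are illustrative column names. It uses `JSON_VALUE` to extract each field and `SAFE_CAST` to convert it. Under the GoogleSQL conversion rules, `SAFE_CAST` accepts infinity spellings such as `-Infinity`, and it returns `NULL` rather than an error for any string it cannot parse.

```sql
-- Sketch: convert the bucket_ranges_json output into numeric bounds.
-- The UNNEST array is illustrative data; replace it with your own table.
SELECT
  f,
  -- "start" is the inclusive lower bound ("-Infinity" for the first bucket).
  SAFE_CAST(JSON_VALUE(bucket_json, '$.start') AS FLOAT64) AS bucket_start,
  -- "end" is the exclusive upper bound ("Infinity" for the last bucket).
  SAFE_CAST(JSON_VALUE(bucket_json, '$.end') AS FLOAT64) AS bucket_end
FROM (
  SELECT
    f,
    ML.QUANTILE_BUCKETIZE(f, 3, 'bucket_ranges_json') OVER() AS bucket_json
  FROM UNNEST([1, 2, 3, 4, 5]) AS f
)
ORDER BY f;
```

Because the lower bound is inclusive and the upper bound is exclusive, each row should satisfy `bucket_start <= f < bucket_end`.
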

Output
------

`ML.QUANTILE_BUCKETIZE` returns a `STRING` value that identifies the bucket the input value falls into, in the format specified by the `output_format` argument.

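
The output is an ordinary `STRING`, so you can aggregate on it like any other column. For example, you can check how evenly rows are spread across buckets. The following sketch assumes the same illustrative `UNNEST` data, and `bucket` and `row_count` are hypothetical column names. `ML.QUANTILE_BUCKETIZE` is called with `OVER()`, so the bucketing runs in a subquery and the `GROUP BY` runs in the outer query.

```sql
-- Sketch: count the rows in each bucket (default bucket_names format).
SELECT
  bucket,
  COUNT(*) AS row_count
FROM (
  -- The analytic call is evaluated here, before the outer aggregation.
  SELECT ML.QUANTILE_BUCKETIZE(f, 3) OVER() AS bucket
  FROM UNNEST([1, 2, 3, 4, 5]) AS f
)
GROUP BY bucket
-- Bucket names compare as strings, so "bin_10" would sort before "bin_2".
-- Sort by the numeric bucket_index that follows the "bin_" prefix instead.
ORDER BY CAST(SUBSTR(bucket, 5) AS INT64);
```

The `ORDER BY` expression assumes the default `bucket_names` format. For the range formats, sort on the parsed bounds instead.
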

Example
-------

The following example splits five values into three buckets and returns the result in each output format:

```sql
SELECT
  f,
  ML.QUANTILE_BUCKETIZE(f, 3) OVER() AS bucket,
  ML.QUANTILE_BUCKETIZE(f, 3, "bucket_ranges") OVER() AS bucket_ranges,
  ML.QUANTILE_BUCKETIZE(f, 3, "bucket_ranges_json") OVER() AS bucket_ranges_json
FROM
  UNNEST([1,2,3,4,5]) AS f
ORDER BY f;
```

The output looks similar to the following:

```
+---+--------+---------------+------------------------------------+
| f | bucket | bucket_ranges | bucket_ranges_json                 |
|---|--------|---------------|------------------------------------|
| 1 | bin_1  | (-inf, 2)     | {"start": "-Infinity", "end": "2"} |
| 2 | bin_2  | [2, 4)        | {"start": "2", "end": "4"}         |
| 3 | bin_2  | [2, 4)        | {"start": "2", "end": "4"}         |
| 4 | bin_3  | [4, +inf)     | {"start": "4", "end": "Infinity"}  |
| 5 | bin_3  | [4, +inf)     | {"start": "4", "end": "Infinity"}  |
+---+--------+---------------+------------------------------------+
```

What's next
-----------

- For information about feature preprocessing, see [Feature preprocessing overview](/bigquery/docs/preprocess-overview).
- For information about the supported SQL statements and functions for each model type, see [End-to-end user journey for each model](/bigquery/docs/e2e-journey).

Last updated 2025-08-25 UTC.