numerical_expression: the numerical expression to bucketize.
array_split_points: an array of numerical values that provide the
points at which to split the numerical_expression value. The
numerical values in the array must be finite; -inf, inf, and NaN aren't allowed.
Provide the numerical values in ascending order. The range of
possible buckets is determined by the upper and lower boundaries of the array.
For example, if the array_split_points value is [1, 2, 3, 4], then there
are five potential buckets that the numerical_expression value can be
bucketized into: one below 1, three between adjacent split points, and one
at or above 4.
exclude_boundaries: a BOOL value that determines whether
the lowest and highest values in array_split_points are used as split points.
If TRUE, then those two boundary values aren't used to create buckets. For
example, if the array_split_points value is [1, 2, 3, 4] and
exclude_boundaries is TRUE, then there are three potential buckets
that the numerical_expression value can be bucketized into.
The default value is FALSE.
output_format: a STRING value that specifies the output format of the bucket. Valid output formats are as follows:
bucket_names: returns a STRING value in the format bin_<bucket_index>. For example, bin_3. The bucket_index value starts at 1. This is the default bucket format.
bucket_ranges: returns a STRING value in the format [lower_bound, upper_bound) in interval notation. For example, (-inf, 2.5), [2.5, 4.6), [4.6, +inf).
bucket_ranges_json: returns a JSON-formatted STRING value in the format {"start": "lower_bound", "end": "upper_bound"}. For example, {"start": "-Infinity", "end": "2.5"}, {"start": "2.5", "end": "4.6"}, {"start": "4.6", "end": "Infinity"}. The inclusivity and exclusivity of the lower and upper bound follow the same pattern as the bucket_ranges option.
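To see how the arguments above interact, the following query is a minimal sketch that runs a handful of sample values through the same split points with and without boundaries. The sample values and the column aliases are made up for illustration and don't come from the function's documentation.

```sql
-- Illustrative sketch: the sample values in UNNEST and the column aliases
-- are hypothetical. The split points [1, 2, 3, 4] match the example in the
-- argument descriptions.
SELECT
  x,
  -- Default behavior: all four split points are used.
  ML.BUCKETIZE(x, [1, 2, 3, 4]) AS bucket,
  -- Boundaries excluded: the lowest and highest split points are dropped.
  ML.BUCKETIZE(x, [1, 2, 3, 4], TRUE) AS bucket_without_boundaries
FROM
  UNNEST([0.5, 1.0, 1.5, 2.5, 3.5, 4.5]) AS x
ORDER BY
  x;
```

If the behavior matches the argument descriptions, the bucket column should take at most five distinct values and the bucket_without_boundaries column at most three.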
Output
ML.BUCKETIZE returns a STRING value that contains the name of the bucket, in the format specified by the output_format argument.
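Because the result is always a STRING, any numeric use of the bucket bounds needs a parsing step. The following query is a sketch of one way to do that. It assumes that the bucket_ranges_json output can be read by the standard JSON_VALUE function, which is consistent with the description of a JSON-formatted STRING but isn't confirmed here. The sample values and column aliases are hypothetical.

```sql
-- Illustrative sketch: the sample values and aliases are hypothetical.
-- Assumption: the bucket_ranges_json string is valid JSON, so JSON_VALUE
-- can extract its "start" and "end" fields (as strings).
SELECT
  x,
  ML.BUCKETIZE(x, [2.5, 4.6], FALSE, "bucket_names") AS bucket_name,
  ML.BUCKETIZE(x, [2.5, 4.6], FALSE, "bucket_ranges") AS bucket_range,
  JSON_VALUE(
    ML.BUCKETIZE(x, [2.5, 4.6], FALSE, "bucket_ranges_json"),
    "$.start") AS range_start,
  JSON_VALUE(
    ML.BUCKETIZE(x, [2.5, 4.6], FALSE, "bucket_ranges_json"),
    "$.end") AS range_end
FROM
  UNNEST([1.0, 3.0, 5.0]) AS x;
```

The extracted range_start and range_end values are still strings, and the open-ended buckets report -Infinity or Infinity, so cast them with care before doing arithmetic.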
Example
The following example bucketizes a numerical expression both with and without
boundaries:
```sql
SELECT
  ML.BUCKETIZE(2.5, [1, 2, 3]) AS bucket,
  ML.BUCKETIZE(2.5, [1, 2, 3], TRUE) AS bucket_without_boundaries,
  ML.BUCKETIZE(2.5, [1, 2, 3], FALSE, "bucket_ranges") AS bucket_ranges,
  ML.BUCKETIZE(2.5, [1, 2, 3], FALSE, "bucket_ranges_json") AS bucket_ranges_json;
```

The output looks similar to the following:

```
+--------+---------------------------+---------------+----------------------------+
| bucket | bucket_without_boundaries | bucket_ranges | bucket_ranges_json         |
|--------|---------------------------|---------------|----------------------------|
| bin_3  | bin_2                     | [2, 3)        | {"start": "2", "end": "3"} |
+--------+---------------------------+---------------+----------------------------+
```

What's next
For information about feature preprocessing, see Feature preprocessing overview.
For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model.

Last updated 2025-08-25 UTC.