The ML.NGRAMS function
This document describes the ML.NGRAMS function, which lets you create
n-grams of the input values.
Syntax
ML.NGRAMS(array_input, range [, separator])
Arguments
ML.NGRAMS takes the following arguments:
array_input: an ARRAY<STRING> value that represents the tokens to be
merged.
range: an ARRAY of two INT64 elements or a single INT64 value. If
you specify an ARRAY value, the INT64 elements provide the range
of n-gram sizes to return. Provide the numerical values in order, lower to
higher. If you specify a single INT64 value of x, the
range of n-gram sizes to return is [x, x].
separator: an optional STRING value that specifies the separator used to
connect adjacent tokens in the output. The default value is
whitespace.
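To make the two accepted forms of the range argument concrete, here is a minimal Python sketch of how a range value maps to n-gram sizes. The helper name ngram_sizes is hypothetical and is not part of BigQuery; the sketch assumes valid input and treats both bounds as inclusive, which matches the [2,3] example in this document.

```python
def ngram_sizes(range_arg):
    """Expand a range argument into the list of n-gram sizes it covers.

    Hypothetical helper for illustration only; not a BigQuery API.
    Assumes range_arg is either an int or a two-element sequence
    given in lower-to-higher order.
    """
    if isinstance(range_arg, int):
        # A single value x is treated as the range [x, x].
        low, high = range_arg, range_arg
    else:
        # A two-element array gives the bounds, lower first.
        low, high = range_arg
    # Both bounds are inclusive.
    return list(range(low, high + 1))


print(ngram_sizes(2))       # [2]
print(ngram_sizes([2, 3]))  # [2, 3]
print(ngram_sizes([1, 3]))  # [1, 2, 3]
```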
Output
ML.NGRAMS returns an ARRAY<STRING> value that contains the n-grams.
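The following Python sketch shows one way such an array of n-grams can be built: slide a window over the tokens and join each run of adjacent tokens with the separator. It is an illustration of the idea, not BigQuery's implementation. The function name ngrams, the single-space default separator, and the ordering of results are assumptions of the sketch; the ordering was chosen so that it reproduces the output of the example in this document.

```python
def ngrams(tokens, low, high, separator=" "):
    """Join adjacent tokens into n-grams of every size from low to high.

    Illustrative sketch only; assumes 1 <= low <= high and a
    single-space default separator.
    """
    result = []
    for start in range(len(tokens)):
        for size in range(low, high + 1):
            end = start + size
            # Keep only windows that fit entirely inside the input.
            if end <= len(tokens):
                result.append(separator.join(tokens[start:end]))
    return result


print(ngrams(["a", "b", "c"], 2, 3, "#"))  # ['a#b', 'a#b#c', 'b#c']
print(ngrams(["a", "b", "c"], 2, 2))       # ['a b', 'b c']
```

Only runs of adjacent tokens are produced, so a pairing such as a#c never appears.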
Example
The following example outputs all 2-token and 3-token n-grams (sequences of
adjacent tokens) for an array of three input strings:
```sql
SELECT
  ML.NGRAMS(['a', 'b', 'c'], [2,3], '#') AS output;
```

The output looks similar to the following:

```
+-----------------------+
| output                |
+-----------------------+
| ["a#b","a#b#c","b#c"] |
+-----------------------+
```

What's next
For information about feature preprocessing, see Feature preprocessing overview (/bigquery/docs/preprocess-overview).
For information about the supported SQL statements and functions for each model type, see End-to-end user journey for each model (/bigquery/docs/e2e-journey).

Last updated 2025-08-25 UTC.