The ML.NGRAMS function
This document describes the ML.NGRAMS function, which lets you create
n-grams of the input values.
Syntax
ML.NGRAMS(array_input, range [, separator])
Arguments
ML.NGRAMS takes the following arguments:
array_input: an ARRAY<STRING> value that represents the tokens to be
merged.
range: an ARRAY of two INT64 elements or a single INT64 value. If
you specify an ARRAY value, the INT64 elements provide the range
of n-gram sizes to return. Provide the numerical values in order, lower to
higher. If you specify a single INT64 value of x, the
range of n-gram sizes to return is [x, x].
separator: an optional STRING value that specifies the separator used to
connect adjacent tokens in the output. The default value is
whitespace.
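As a rough illustration of the two forms of range and of the optional separator, the following query is a sketch rather than an excerpt from the reference. The token values and column aliases are made up for illustration.

```sql
-- Hypothetical tokens; the column aliases are illustrative only.
SELECT
  -- Single INT64 value: equivalent to the range [2, 2].
  -- No separator is passed, so the default separator is used.
  ML.NGRAMS(['a', 'b', 'c'], 2) AS bigrams_default_separator,
  -- ARRAY of two INT64 values, lower to higher: n-gram sizes 1 through 2,
  -- with adjacent tokens joined by '-'.
  ML.NGRAMS(['a', 'b', 'c'], [1, 2], '-') AS unigrams_and_bigrams;
```

The first column should contain only the adjacent token pairs, joined by the default separator. The second should contain the single tokens as well as the pairs a-b and b-c.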
Output
ML.NGRAMS returns an ARRAY<STRING> value that contains the n-grams.
Example
The following example outputs all possible 2-token and 3-token combinations
for a set of three input strings:
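A minimal query consistent with that description might look like the following. The three tokens and the '#' separator are assumptions chosen for illustration.

```sql
-- Illustrative inputs: three tokens, n-gram sizes 2 through 3, '#' as separator.
SELECT
  ML.NGRAMS(['a', 'b', 'c'], [2, 3], '#') AS output;
```

With these inputs, the result should contain the 2-grams a#b and b#c and the 3-gram a#b#c.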
Last updated 2025-08-07 UTC.