The ML.NGRAMS function

This document describes the ML.NGRAMS function, which lets you create n-grams of the input values.

You can use this function with models that support manual feature preprocessing.

Syntax

ML.NGRAMS(array_input, range [, separator])

ML.NGRAMS takes the following arguments:

array_input: an ARRAY<STRING> value that represents the tokens to be merged.
range: an ARRAY of two INT64 elements or a single INT64 value. If you specify an ARRAY value, the INT64 elements provide the range of n-gram sizes to return. Provide the numerical values in order, lower to higher. If you specify a single INT64 value of x, the range of n-gram sizes to return is [x, x].
separator: a STRING value that specifies the separator used to connect two adjacent tokens in the output. The default value is a single space (' ').

ML.NGRAMS returns an ARRAY<STRING> value that contains the n-grams.

The following example returns every 2-token and 3-token n-gram (each a sequence of adjacent tokens) from an array of three input strings, joining the tokens with '#':

SELECT
  ML.NGRAMS(['a', 'b', 'c'], [2,3], '#') AS output;

The output looks similar to the following:

+-----------------------+
|        output         |
+-----------------------+
| ["a#b","a#b#c","b#c"] |
+-----------------------+
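
As a further sketch of the argument forms described above, the following query passes a single INT64 value as the range and omits the separator, then builds array_input from a sentence by using the GoogleSQL SPLIT function. The input strings and column aliases are hypothetical and chosen only for illustration:

```sql
SELECT
  -- A single INT64 range of 2 is treated as [2, 2], so only 2-grams
  -- are returned. No separator is given, so the default is used.
  ML.NGRAMS(['a', 'b', 'c'], 2) AS bigrams_only,
  -- SPLIT returns an ARRAY<STRING>, so its result can be passed as
  -- array_input. The range [1, 2] requests 1-grams and 2-grams,
  -- joined with '_'.
  ML.NGRAMS(SPLIT('the quick brown fox', ' '), [1, 2], '_')
    AS unigrams_and_bigrams;
```

Splitting on a single space is only a minimal tokenization; text with punctuation or repeated spaces would likely need additional cleanup before it is passed to ML.NGRAMS.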

For information about feature preprocessing, see Feature preprocessing overview.