The ML.HASH_BUCKETIZE function
This document describes the ML.HASH_BUCKETIZE function, which lets you convert a string expression to a deterministic hash and then bucketize it by the modulo value of that hash.
You can use this function with models that support manual feature preprocessing. For more information, see the following documents:
Syntax
ML.HASH_BUCKETIZE(string_expression, hash_bucket_size)
Arguments
ML.HASH_BUCKETIZE takes the following arguments:
- string_expression: the STRING expression to bucketize.
- hash_bucket_size: an INT64 value that specifies the number of buckets to create. This value must be greater than or equal to 0. If hash_bucket_size equals 0, the function only hashes the string without bucketizing the hashed value.
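As a minimal sketch of the hash_bucket_size = 0 case, the following query returns the hash of each string instead of a bucket number. The inline array values are illustrative only. No output is shown because the hash values depend on the function's internal hash, which this page doesn't specify.

```sql
-- Passing 0 as hash_bucket_size skips bucketizing:
-- each row gets the INT64 hash of f instead of a bucket ID.
SELECT
  f,
  ML.HASH_BUCKETIZE(f, 0) AS hash_value
FROM UNNEST(['a', 'b', 'c', 'd']) AS f;
```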
Output
ML.HASH_BUCKETIZE returns an INT64 value that identifies the bucket. If hash_bucket_size is 0, the returned value is the hash of the string rather than a bucket ID.
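To get a rough sense of how evenly your values spread across buckets, you can count rows per bucket ID. This is a sketch that assumes a hypothetical table mydataset.customers with a STRING column city. Both names are placeholders for your own data.

```sql
-- Hypothetical table and column: replace mydataset.customers and city
-- with your own. Counts how many rows land in each of 10 buckets.
SELECT
  ML.HASH_BUCKETIZE(city, 10) AS bucket,
  COUNT(*) AS row_count
FROM `mydataset.customers`
GROUP BY bucket
ORDER BY bucket;
```

Because the hash is deterministic, rerunning this query on unchanged data returns the same counts.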
Example
The following example bucketizes string expressions into three buckets:
SELECT f, ML.HASH_BUCKETIZE(f, 3) AS bucket FROM UNNEST(['a', 'b', 'c', 'd']) AS f;
The output looks similar to the following:
+---+--------+
| f | bucket |
+---+--------+
| a |      0 |
| b |      1 |
| c |      1 |
| d |      2 |
+---+--------+
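In a model, you would typically call ML.HASH_BUCKETIZE inside the TRANSFORM clause of a CREATE MODEL statement, so that BigQuery ML applies the same bucketization automatically at evaluation and prediction time. The following is a minimal sketch. The model name mydataset.customer_model, the table mydataset.customers, and the columns city, tenure_months, and spend are all hypothetical placeholders.

```sql
-- Hypothetical names: mydataset.customer_model, mydataset.customers,
-- and the columns city, tenure_months, and spend are placeholders.
CREATE OR REPLACE MODEL `mydataset.customer_model`
  TRANSFORM(
    -- Hash the high-cardinality city column into 100 buckets, then cast
    -- the bucket ID to STRING so it is treated as a category.
    CAST(ML.HASH_BUCKETIZE(city, 100) AS STRING) AS city_bucket,
    tenure_months,
    spend  -- label column, passed through unchanged
  )
  OPTIONS (
    model_type = 'linear_reg',
    input_label_cols = ['spend']
  )
AS
SELECT city, tenure_months, spend
FROM `mydataset.customers`;
```

The CAST is a design choice, not a requirement. Without it, a linear model would see the INT64 bucket ID as a numeric magnitude. Whether that matters depends on the model type you train.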
What's next
- For information about feature preprocessing, see Feature preprocessing overview.