GoogleSQL for BigQuery supports KLL functions.
The KLL16 algorithm estimates quantiles from sketches. If you don't want to work with sketches and don't need customized precision, consider using approximate aggregate functions with system-defined precision.
KLL functions are approximate aggregate functions. Approximate aggregation requires significantly less memory than an exact quantiles computation, but also introduces statistical error. This makes approximate aggregation appropriate for large data streams for which linear memory usage is impractical, as well as for data that is already approximate.
Due to the non-deterministic nature of the KLL algorithm, sketches created on the same set of data with the same precision might not be identical, leading to variation in the approximate quantile results.
Function list
Name | Summary |
---|---|
KLL_QUANTILES.EXTRACT_INT64
|
Gets a selected number of quantiles from an
INT64 -initialized KLL sketch.
|
KLL_QUANTILES.EXTRACT_FLOAT64
|
Gets a selected number of quantiles from a
FLOAT64 -initialized KLL sketch.
|
KLL_QUANTILES.EXTRACT_POINT_INT64
|
Gets a specific quantile from an
INT64 -initialized KLL sketch.
|
KLL_QUANTILES.EXTRACT_POINT_FLOAT64
|
Gets a specific quantile from a
FLOAT64 -initialized KLL sketch.
|
KLL_QUANTILES.INIT_INT64
|
Aggregates values into an
INT64 -initialized KLL sketch.
|
KLL_QUANTILES.INIT_FLOAT64
|
Aggregates values into a
FLOAT64 -initialized KLL sketch.
|
KLL_QUANTILES.MERGE_INT64
|
Merges INT64 -initialized KLL sketches into a new sketch, and
then gets the quantiles from the new sketch.
|
KLL_QUANTILES.MERGE_FLOAT64
|
Merges FLOAT64 -initialized KLL sketches
into a new sketch, and then gets the quantiles from the new sketch.
|
KLL_QUANTILES.MERGE_PARTIAL
|
Merges KLL sketches of the same underlying type into a new sketch. |
KLL_QUANTILES.MERGE_POINT_INT64
|
Merges INT64 -initialized KLL sketches into a new sketch, and
then gets a specific quantile from the new sketch.
|
KLL_QUANTILES.MERGE_POINT_FLOAT64
|
Merges FLOAT64 -initialized KLL sketches
into a new sketch, and then gets a specific quantile from the new sketch.
|
KLL_QUANTILES.EXTRACT_INT64
KLL_QUANTILES.EXTRACT_INT64(sketch, num_quantiles)
Description
Gets a selected number of approximate quantiles from an
INT64
-initialized KLL sketch, including the minimum value and the
maximum value in the input set.
Supported Argument Types
sketch
:BYTES
KLL sketch initialized on theINT64
data type. If this isn't a valid KLL quantiles sketch, or if the underlying data type is different fromINT64
, an error is produced.num_quantiles
: A positiveINT64
value that represents the number of roughly equal-sized subsets that the quantiles partition the sketch-captured input values into. The maximum value is 100,000.
Details
The number of returned values produced is always num_quantiles + 1
as
an array in this order:
- minimum value in input set
- each approximate quantile
- maximum value in input set
For example, if num_quantiles
is 3
, and the result of this function
is [0, 34, 67, 100]
, this means that 0
is the minimum value,
34
and 67
are the approximate quantiles, and 100
is the maximum value.
In addition, the result represents the following three segments:
0 to 34
, 34 to 67
, and 67 to 100
.
Return Type
ARRAY<INT64>
Example
The following query initializes a KLL sketch, kll_sketch
, from Data
, and
then extracts the minimum value (0
), the maximum value (100
), and
approximate quantiles in between.
WITH Data AS (
SELECT x FROM UNNEST(GENERATE_ARRAY(1, 100)) AS x
)
SELECT
KLL_QUANTILES.EXTRACT_INT64(kll_sketch, 2) AS halves,
KLL_QUANTILES.EXTRACT_INT64(kll_sketch, 3) AS terciles,
KLL_QUANTILES.EXTRACT_INT64(kll_sketch, 4) AS quartiles,
KLL_QUANTILES.EXTRACT_INT64(kll_sketch, 6) AS sextiles,
FROM (SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch FROM Data);
/*------------+---------------+------------------+------------------------*
| halves | terciles | quartiles | sextiles |
+------------+---------------+------------------+------------------------+
| [1,50,100] | [1,34,67,100] | [1,25,50,75,100] | [1,17,34,50,67,84,100] |
*------------+---------------+------------------+------------------------*/
KLL_QUANTILES.EXTRACT_FLOAT64
KLL_QUANTILES.EXTRACT_FLOAT64(sketch, num_quantiles)
Description
Like KLL_QUANTILES.EXTRACT_INT64
,
but accepts KLL sketches initialized on data of type
FLOAT64
.
Return Type
ARRAY<FLOAT64>
KLL_QUANTILES.EXTRACT_POINT_INT64
KLL_QUANTILES.EXTRACT_POINT_INT64(sketch, phi)
Description
Takes a single KLL sketch as BYTES
and returns a single quantile.
The phi
argument specifies the quantile to return as a fraction of the total
number of rows in the input, normalized between 0 and 1. This means that the
function will return a value v such that approximately Φ * n inputs are less
than or equal to v, and a (1-Φ) * n inputs are greater than or equal to v.
This is a scalar function.
Returns an error if the underlying type of the input sketch isn't compatible
with type INT64
.
Returns an error if the input isn't a valid KLL quantiles sketch.
Supported Argument Types
sketch
:BYTES
KLL sketch initialized onINT64
data typephi
:FLOAT64
between 0 and 1
Return Type
INT64
Example
The following query initializes a KLL sketch from five rows of data. Then it returns the value of the eighth decile or 80th percentile of the sketch.
SELECT KLL_QUANTILES.EXTRACT_POINT_INT64(kll_sketch, .8) AS quintile
FROM (SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch
FROM (SELECT 1 AS x UNION ALL
SELECT 2 AS x UNION ALL
SELECT 3 AS x UNION ALL
SELECT 4 AS x UNION ALL
SELECT 5 AS x));
/*----------*
| quintile |
+----------+
| 4 |
*----------*/
KLL_QUANTILES.EXTRACT_POINT_FLOAT64
KLL_QUANTILES.EXTRACT_POINT_FLOAT64(sketch, phi)
Description
Like KLL_QUANTILES.EXTRACT_POINT_INT64
,
but accepts KLL sketches initialized on data of type
FLOAT64
.
Supported Argument Types
sketch
:BYTES
KLL sketch initialized onFLOAT64
data typephi
:FLOAT64
between 0 and 1
Return Type
FLOAT64
KLL_QUANTILES.INIT_INT64
KLL_QUANTILES.INIT_INT64(
input
[, precision [, weight => input_weight ]]
)
Description
Takes one or more input
values and aggregates them into a
KLL sketch. This function represents the output sketch
using the BYTES
data type. This is an
aggregate function.
Supported Argument Types
input
:INT64
precision
: AnINT64
value that defines the exactness of the returned approximate quantile q. The default value is 1000. For more information about precision, see Precision for KLL sketches. The value ofprecision
must be between 1 and 100,000.input_weight
:INT64
. By default, values in an initialized KLL sketch are weighted equally as1
. To weight values differently, use the named argument,weight
, which assigns a weight to each input in the resulting KLL sketch.weight
is a multiplier. For example, if you assign a weight of3
to an input value, it's as if three instances of the input value are included in the generation of the KLL sketch. The minimum value forweight
is1
and the maximum value is2,147,483,647
.
Return Type
KLL sketch as BYTES
Examples
The following query takes a column of type INT64
and outputs a sketch as
BYTES
that allows you to retrieve values whose ranks are within
±1/1000 * 5 = ±1/200 ≈ 0 ranks of their exact quantile.
SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch
FROM (SELECT 1 AS x UNION ALL
SELECT 2 AS x UNION ALL
SELECT 3 AS x UNION ALL
SELECT 4 AS x UNION ALL
SELECT 5 AS x);
The following examples illustrate how weight works when you initialize a KLL sketch. The results are converted to quantiles.
WITH points AS (
SELECT 1 AS x, 1 AS y UNION ALL
SELECT 2 AS x, 1 AS y UNION ALL
SELECT 3 AS x, 1 AS y UNION ALL
SELECT 4 AS x, 1 AS y UNION ALL
SELECT 5 AS x, 1 AS y)
SELECT KLL_QUANTILES.EXTRACT_INT64(kll_sketch, 2) AS halves
FROM
(
SELECT KLL_QUANTILES.INIT_INT64(x, 1000, weight=>y) AS kll_sketch
FROM points
);
/*---------*
| halves |
+---------+
| [1,3,5] |
*---------*/
WITH points AS (
SELECT 1 AS x, 1 AS y UNION ALL
SELECT 2 AS x, 3 AS y UNION ALL
SELECT 3 AS x, 1 AS y UNION ALL
SELECT 4 AS x, 1 AS y UNION ALL
SELECT 5 AS x, 1 AS y)
SELECT KLL_QUANTILES.EXTRACT_INT64(kll_sketch, 2) AS halves
FROM
(
SELECT KLL_QUANTILES.INIT_INT64(x, 1000, weight=>y) AS kll_sketch
FROM points
);
/*---------*
| halves |
+---------+
| [1,2,5] |
*---------*/
KLL_QUANTILES.INIT_FLOAT64
KLL_QUANTILES.INIT_FLOAT64(input[, precision[, weight => input_weight]])
Description
Like KLL_QUANTILES.INIT_INT64
,
but accepts input
of type FLOAT64
.
KLL_QUANTILES.INIT_FLOAT64
orders values according to the GoogleSQL
floating point sort order. For example, NaN
orders before
‑inf
.
Supported Argument Types
input
:FLOAT64
precision
:INT64
input_weight
:INT64
Return Type
KLL sketch as BYTES
KLL_QUANTILES.MERGE_INT64
KLL_QUANTILES.MERGE_INT64(sketch, num_quantiles)
Description
Takes KLL sketches as BYTES
and merges them into
a new sketch, then returns the quantiles that divide the input into
num_quantiles
equal-sized groups, along with the minimum and maximum values of the
input. The output is an ARRAY
containing the exact minimum value from
the input data that you used to initialize the sketches, each
approximate quantile, and the exact maximum value from the initial input data.
This is an aggregate function.
If the merged sketches were initialized with different precisions, the precision is downgraded to the lowest precision involved in the merge — except if the aggregations are small enough to still capture the input exactly — then the mergee's precision is maintained.
Returns an error if the underlying type of one or more input sketches isn't
compatible with type INT64
.
Returns an error if the input isn't a valid KLL quantiles sketch.
Supported Argument Types
sketch
:BYTES
KLL sketch initialized onINT64
data typenum_quantiles
: A positiveINT64
value that represents the number of roughly equal-sized groups to divide the merged sketches into. The maximum value is 100,000.
Return Type
ARRAY<INT64>
Example
The following query initializes two KLL sketches from five rows of data each.
Then it merges these two sketches and returns an ARRAY
containing the minimum,
median, and maximum values in the input sketches.
SELECT KLL_QUANTILES.MERGE_INT64(kll_sketch, 2) AS halves
FROM (SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch
FROM (SELECT 1 AS x UNION ALL
SELECT 2 AS x UNION ALL
SELECT 3 AS x UNION ALL
SELECT 4 AS x UNION ALL
SELECT 5)
UNION ALL
SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch
FROM (SELECT 6 AS x UNION ALL
SELECT 7 AS x UNION ALL
SELECT 8 AS x UNION ALL
SELECT 9 AS x UNION ALL
SELECT 10 AS x));
/*----------*
| halves |
+----------+
| [1,5,10] |
*----------*/
KLL_QUANTILES.MERGE_FLOAT64
KLL_QUANTILES.MERGE_FLOAT64(sketch, num_quantiles)
Description
Like KLL_QUANTILES.MERGE_INT64
,
but accepts KLL sketches initialized on data of type
FLOAT64
.
KLL_QUANTILES.MERGE_FLOAT64
orders values according to the GoogleSQL
floating point sort order. For example, NaN
orders before
‑inf
.
Supported Argument Types
sketch
:BYTES
KLL sketch initialized onFLOAT64
data typenum_quantiles
:INT64
Return Type
ARRAY<FLOAT64>
KLL_QUANTILES.MERGE_PARTIAL
KLL_QUANTILES.MERGE_PARTIAL(sketch)
Description
Takes KLL sketches of the same underlying type and merges them to return a new sketch of the same underlying type. This is an aggregate function.
If the merged sketches were initialized with different precisions, the precision is downgraded to the lowest precision involved in the merge — except if the aggregations are small enough to still capture the input exactly — then the mergee's precision is maintained.
Returns an error if two or more sketches don't have compatible underlying types,
such as one sketch of INT64
values and another of
FLOAT64
values.
Returns an error if one or more inputs aren't a valid KLL quantiles sketch.
Ignores NULL
sketches. If the input contains zero rows or only NULL
sketches, the function returns NULL
.
Supported Argument Types
sketch
:BYTES
KLL sketch
Return Type
KLL sketch as BYTES
Example
The following query initializes two KLL sketches from five rows of data each.
Then it merges these two sketches into a new sketch, also as BYTES
. Both
input sketches have the same underlying data type and precision.
SELECT KLL_QUANTILES.MERGE_PARTIAL(kll_sketch) AS merged_sketch
FROM (SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch
FROM (SELECT 1 AS x UNION ALL
SELECT 2 AS x UNION ALL
SELECT 3 AS x UNION ALL
SELECT 4 AS x UNION ALL
SELECT 5)
UNION ALL
SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch
FROM (SELECT 6 AS x UNION ALL
SELECT 7 AS x UNION ALL
SELECT 8 AS x UNION ALL
SELECT 9 AS x UNION ALL
SELECT 10 AS x));
KLL_QUANTILES.MERGE_POINT_INT64
KLL_QUANTILES.MERGE_POINT_INT64(sketch, phi)
Description
Takes KLL sketches as BYTES
and merges them, then extracts a single
quantile from the merged sketch. The phi
argument specifies the quantile
to return as a fraction of the total number of rows in the input, normalized
between 0 and 1. This means that the function will return a value v such that
approximately Φ * n inputs are less than or equal to v, and a (1-Φ) * n
inputs are greater than or equal to v. This is an aggregate function.
If the merged sketches were initialized with different precisions, the precision is downgraded to the lowest precision involved in the merge — except if the aggregations are small enough to still capture the input exactly — then the mergee's precision is maintained.
Returns an error if the underlying type of one or more input sketches isn't
compatible with type INT64
.
Returns an error if the input isn't a valid KLL quantiles sketch.
Supported Argument Types
sketch
:BYTES
KLL sketch initialized onINT64
data typephi
:FLOAT64
between 0 and 1
Return Type
INT64
Example
The following query initializes two KLL sketches from five rows of data each. Then it merges these two sketches and returns the value of the ninth decile or 90th percentile of the merged sketch.
SELECT KLL_QUANTILES.MERGE_POINT_INT64(kll_sketch, .9) AS quantile
FROM (SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch
FROM (SELECT 1 AS x UNION ALL
SELECT 2 AS x UNION ALL
SELECT 3 AS x UNION ALL
SELECT 4 AS x UNION ALL
SELECT 5)
UNION ALL
SELECT KLL_QUANTILES.INIT_INT64(x, 1000) AS kll_sketch
FROM (SELECT 6 AS x UNION ALL
SELECT 7 AS x UNION ALL
SELECT 8 AS x UNION ALL
SELECT 9 AS x UNION ALL
SELECT 10 AS x));
/*----------*
| quantile |
+----------+
| 9 |
*----------*/
KLL_QUANTILES.MERGE_POINT_FLOAT64
KLL_QUANTILES.MERGE_POINT_FLOAT64(sketch, phi)
Description
Like KLL_QUANTILES.MERGE_POINT_INT64
,
but accepts KLL sketches initialized on data of type
FLOAT64
.
KLL_QUANTILES.MERGE_POINT_FLOAT64
orders values according to the
GoogleSQL floating point sort order. For example, NaN
orders before ‑inf
.
Supported Argument Types
sketch
:BYTES
KLL sketch initialized onFLOAT64
data typephi
:FLOAT64
between 0 and 1
Return Type
FLOAT64