BigQuery ML supports ARRAY<numerical> as dense vector input
during model training. An embedding feature is a special type of dense vector.
See the ML.GENERATE_EMBEDDING function for more information.
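
As a minimal sketch, a dense vector feature can be written as an array column of a numeric type in the query that feeds model training. The column names embedding and label and the literal values below are hypothetical, chosen only for illustration. Every row here uses the same array length.

```sql
-- Hypothetical training rows: each row carries an ARRAY<FLOAT64>
-- dense vector feature (embedding) and a label column.
SELECT [0.12, -0.40, 0.88] AS embedding, 'positive' AS label
UNION ALL
SELECT [0.05, 0.31, -0.27] AS embedding, 'negative' AS label;
```

In practice the array would usually come from a table column or from an embedding function rather than from literals.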
Sparse input
BigQuery ML supports ARRAY<STRUCT> as sparse input during
model training. Each struct contains an INT64 field that holds the zero-based
index of an element and a numeric-type field that holds the element's value.
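
One way to build such a value from an existing dense array is to unnest it with its offsets and keep only the non-zero entries. This is a sketch rather than a prescribed method: the input array, the field names k and v, and the output column name sparse_feature are illustrative assumptions.

```sql
-- Convert a dense array into (index, value) structs, dropping zeros.
-- WITH OFFSET yields the zero-based position of each element.
SELECT
  ARRAY(
    SELECT AS STRUCT k, v
    FROM UNNEST([0.0, 2.5, 0.0, 4.0]) AS v WITH OFFSET AS k
    WHERE v != 0
    ORDER BY k
  ) AS sparse_feature;
```

For this input, the resulting array holds two structs: index 1 with value 2.5 and index 3 with value 4.0.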
Below is an example of a sparse tensor input for the integer array
[0,1,0,0,0,0,1]:
    ARRAY<STRUCT<k INT64, v INT64>>[(1, 1), (6, 1)] AS f1

Last updated 2025-08-29 UTC.