Manual feature preprocessing
You can use the TRANSFORM clause of the CREATE MODEL statement together with manual preprocessing functions to define custom data preprocessing. You can also use these manual preprocessing functions outside of the TRANSFORM clause.
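As a minimal sketch of how the TRANSFORM clause fits into a CREATE MODEL statement, the following example scales one column and buckets another before training a linear regression model. The dataset, table, and column names (`mydataset.trips`, `distance_km`, `pickup_hour`, `fare`) are hypothetical and only serve as illustration.

```sql
-- Hypothetical names: mydataset.fare_model, mydataset.trips,
-- distance_km, pickup_hour, fare.
CREATE OR REPLACE MODEL `mydataset.fare_model`
  TRANSFORM(
    -- Analytic function: statistics are computed over all training rows.
    ML.STANDARD_SCALER(distance_km) OVER() AS distance_scaled,
    -- Scalar function: each row is bucketed independently.
    ML.BUCKETIZE(pickup_hour, [6, 12, 18]) AS pickup_period,
    -- The label column is passed through unchanged.
    fare
  )
  OPTIONS(
    model_type = 'LINEAR_REG',
    input_label_cols = ['fare']
  )
AS
SELECT distance_km, pickup_hour, fare
FROM `mydataset.trips`;
```

Only the columns listed in the TRANSFORM clause are used for training, so the raw `distance_km` and `pickup_hour` values do not reach the model in this sketch.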
If you want to decouple data preprocessing from model training, create a transform-only model that only performs data transformations by using the TRANSFORM clause.
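A transform-only model might look like the following sketch. It assumes the `TRANSFORM_ONLY` model type from the CREATE MODEL reference for transform-only models; the model, table, and column names are hypothetical.

```sql
-- Hypothetical names: mydataset.trip_preprocessor, mydataset.trips,
-- distance_km, pickup_hour.
CREATE OR REPLACE MODEL `mydataset.trip_preprocessor`
  TRANSFORM(
    ML.MIN_MAX_SCALER(distance_km) OVER() AS distance_scaled,
    ML.BUCKETIZE(pickup_hour, [6, 12, 18]) AS pickup_period
  )
  -- No label or training options: the model only stores the transformations.
  OPTIONS(model_type = 'TRANSFORM_ONLY')
AS
SELECT distance_km, pickup_hour
FROM `mydataset.trips`;
```

Keeping the preprocessing in its own model lets you reuse the same transformations across several downstream models without repeating the TRANSFORM clause in each one.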
You can use the ML.TRANSFORM function to increase the transparency of feature preprocessing. This function returns the preprocessed data from a model's TRANSFORM clause, so you can see the actual training data that goes into model training, as well as the actual prediction data that goes into model serving.
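For illustration, a query along these lines returns the transformed columns for a handful of input rows. It assumes a model that was created with a TRANSFORM clause; `mydataset.fare_model`, `mydataset.trips`, and the column names are hypothetical.

```sql
-- Hypothetical names: mydataset.fare_model, mydataset.trips.
-- The input query must supply the columns that the model's
-- TRANSFORM clause reads.
SELECT *
FROM ML.TRANSFORM(
  MODEL `mydataset.fare_model`,
  (
    SELECT distance_km, pickup_hour, fare
    FROM `mydataset.trips`
    LIMIT 10
  )
);
```

Inspecting this output is a quick way to confirm that scaling and bucketing behave as intended before you rely on the model's predictions.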
There are several types of manual preprocessing functions:
Scalar functions operate on a single row (for example, ML.BUCKETIZE).
Table-valued functions operate on all rows and output a table (for example, ML.FEATURES_AT_TIME).
Analytic functions operate on all rows and output a result for each row based on statistics collected across all rows (for example, ML.QUANTILE_BUCKETIZE).
Always use an empty OVER() clause with ML analytic functions.
When you use ML analytic functions inside the TRANSFORM clause during training, the same statistics are automatically applied to the input at prediction time.
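The difference between a scalar function and an analytic function can be sketched with a standalone query outside of any TRANSFORM clause. The inline values are made up for illustration; ML.BUCKETIZE uses split points that you supply, while ML.QUANTILE_BUCKETIZE derives its boundaries from all rows and therefore needs the empty OVER() clause.

```sql
-- Illustrative inline data; no table is required.
SELECT
  x,
  -- Scalar: each row is compared against the fixed split points 10 and 20.
  ML.BUCKETIZE(x, [10, 20]) AS fixed_bucket,
  -- Analytic: bucket boundaries come from quantiles computed over all rows.
  ML.QUANTILE_BUCKETIZE(x, 2) OVER() AS quantile_bucket
FROM UNNEST([1, 5, 12, 18, 25, 40]) AS x;
```

Adding or removing input rows can change the `quantile_bucket` assignments but never the `fixed_bucket` ones, because only the analytic function depends on statistics of the whole input.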
The following sections describe the available preprocessing functions.
General functions
Use the following function on string or numerical expressions to clean up data:
ML.IMPUTER
Numerical functions
Use the following functions on numerical expressions to regularize data:
ML.BUCKETIZE
ML.MAX_ABS_SCALER
ML.MIN_MAX_SCALER
ML.NORMALIZER
ML.POLYNOMIAL_EXPAND
ML.QUANTILE_BUCKETIZE
ML.ROBUST_SCALER
ML.STANDARD_SCALER
Categorical functions
Use the following functions to categorize data:
ML.FEATURE_CROSS
ML.HASH_BUCKETIZE
ML.LABEL_ENCODER
ML.MULTI_HOT_ENCODER
ML.ONE_HOT_ENCODER
Text functions
Use the following functions on text string expressions:
ML.NGRAMS
ML.BAG_OF_WORDS
ML.TF_IDF
Image functions
Use the following functions on image data:
ML.CONVERT_COLOR_SPACE
ML.CONVERT_IMAGE_TYPE
ML.DECODE_IMAGE
ML.RESIZE_IMAGE
Known limitations
BigQuery ML supports both automatic preprocessing and manual preprocessing in model export. See the supported data types and functions for exporting models trained with the BigQuery ML TRANSFORM clause.
Last updated 2025-08-17 UTC.