The ML.TRAINING_INFO function
This document describes the ML.TRAINING_INFO function, which lets you see
information about the training iterations of a model.
You can run ML.TRAINING_INFO while the CREATE MODEL
statement for the target model is running, or you can wait until after the
CREATE MODEL statement completes. If you run ML.TRAINING_INFO before the
first training iteration of the CREATE MODEL statement completes, the query
returns a Not found error.
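For example, you can poll the function while training is in progress to see the most recent completed iteration. The following is a minimal sketch, assuming a hypothetical model named mydataset.sample_model whose CREATE MODEL statement is still running:
-- Check the most recent completed iteration of a model that is still training.
-- mydataset.sample_model is a hypothetical model name.
SELECT
  training_run,
  iteration,
  loss,
  duration_ms
FROM
  ML.TRAINING_INFO(MODEL `mydataset.sample_model`)
ORDER BY training_run DESC, iteration DESC
LIMIT 1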
Syntax
ML.TRAINING_INFO(
MODEL `PROJECT_ID.DATASET.MODEL_NAME`
)
Arguments
ML.TRAINING_INFO takes the following arguments:
PROJECT_ID: your project ID.
DATASET: the BigQuery dataset that contains
the model.
MODEL_NAME: the name of the model.
Output
ML.TRAINING_INFO returns the following columns:
training_run: an INT64 value that contains the training run identifier
for the model. The value in this column is 0 for a newly created model. If
you retrain the model using the warm_start argument of the CREATE MODEL
statement, this value is incremented.
iteration: an INT64 value that contains the iteration number of the
training run. The value for the first iteration is 0. This value is
incremented for each additional iteration within the training run.
loss: a FLOAT64 value that contains the loss metric calculated after
an iteration on the training data:
For logistic regression models, this is log loss.
For linear regression models, this is mean squared error.
For multiclass logistic regressions, this is cross-entropy log loss.
For explicit matrix factorization models, this is mean squared error calculated over the seen input ratings.
For implicit matrix factorization models, the loss is calculated using the following formula:
$$ Loss = \sum_{u,i} c_{ui}\,(p_{ui} - x_u^T y_i)^2 + \lambda \left( \sum_u \lVert x_u \rVert^2 + \sum_i \lVert y_i \rVert^2 \right) $$
For more information about what the variables mean, see
Feedback types.
eval_loss: a FLOAT64 value that contains the loss metric calculated on the
holdout data. For k-means models, ML.TRAINING_INFO doesn't return an
eval_loss column. If the
DATA_SPLIT_METHOD
argument is NO_SPLIT, then all entries in the eval_loss column are NULL.
learning_rate: a FLOAT64 value that contains the
learning rate
in this iteration.
duration_ms: an INT64 value that contains how long the iteration took,
in milliseconds.
cluster_info: an ARRAY<STRUCT> value that contains the
fields centroid_id, cluster_radius, and cluster_size.
ML.TRAINING_INFO computes cluster_radius and cluster_size with
standardized features. Only returned for k-means models.
Note: For linear and logistic regression models, the learning_rate value can
increase over the course of training if the LEARN_RATE_STRATEGY argument isn't
set to CONSTANT. This increase occurs because when LEARN_RATE_STRATEGY is set
to LINE_SEARCH, four learning rates are generated from LS_INIT_LEARN_RATE. For
example, if LS_INIT_LEARN_RATE is 0.1, then the first iteration of training
compares the respective loss from a set of four different models trained with
the learning rate set to 0.2, 0.1, 0.05, and 0.025. If LEARN_RATE=0.2 generates
the model with the smallest loss, then the next iteration generates four models
with the learning rate set to 0.4, 0.2, 0.1, and 0.05 from the previous best
fit model, and this process continues until the model converges.
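For example, you can query several of these columns together to check how training converged. The following is a minimal sketch that restricts the results to the most recent training run; mydataset.mymodel is a placeholder model name:
-- Compare training and holdout loss per iteration for the latest training run.
-- mydataset.mymodel is a placeholder model name.
SELECT
  iteration,
  loss,
  eval_loss,
  learning_rate,
  duration_ms
FROM
  ML.TRAINING_INFO(MODEL `mydataset.mymodel`)
WHERE
  training_run = (
    SELECT MAX(training_run)
    FROM ML.TRAINING_INFO(MODEL `mydataset.mymodel`)
  )
ORDER BY iteration
An eval_loss value that stops improving while loss keeps decreasing can indicate that the model is starting to overfit the training data.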
Permissions
You must have the bigquery.models.create and bigquery.models.getData
Identity and Access Management (IAM) permissions in order to run ML.TRAINING_INFO.
Limitations
ML.TRAINING_INFO is subject to the following limitations:
ML.TRAINING_INFO doesn't support imported TensorFlow models.
For time series models, ML.TRAINING_INFO only returns three columns:
training_run, iteration, and duration_ms. It doesn't expose the training
information per iteration, or per time series if multiple time series are
forecasted at once. The duration_ms value is the total time cost for the
entire process.
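For models that do report per-iteration rows, you can aggregate duration_ms to estimate the total training time of each run. The following is a minimal sketch; mydataset.mymodel is a placeholder model name:
-- Total training time and iteration count per training run.
-- mydataset.mymodel is a placeholder model name.
SELECT
  training_run,
  COUNT(*) AS iterations,
  SUM(duration_ms) AS total_duration_ms
FROM
  ML.TRAINING_INFO(MODEL `mydataset.mymodel`)
GROUP BY training_run
ORDER BY training_run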
Example
The following example retrieves training information from the model
mydataset.mymodel in your default project:
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-29 UTC."],[[["\u003cp\u003eThe \u003ccode\u003eML.TRAINING_INFO\u003c/code\u003e function provides details about the training iterations of a BigQuery ML model, allowing users to monitor model training progress.\u003c/p\u003e\n"],["\u003cp\u003eIt can be run during or after the \u003ccode\u003eCREATE MODEL\u003c/code\u003e statement, but it will return a \u003ccode\u003eNot found\u003c/code\u003e error if run before the first training iteration completes.\u003c/p\u003e\n"],["\u003cp\u003eThe function returns columns such as \u003ccode\u003etraining_run\u003c/code\u003e, \u003ccode\u003eiteration\u003c/code\u003e, \u003ccode\u003eloss\u003c/code\u003e, \u003ccode\u003eeval_loss\u003c/code\u003e, \u003ccode\u003elearning_rate\u003c/code\u003e, \u003ccode\u003eduration_ms\u003c/code\u003e, and \u003ccode\u003ecluster_info\u003c/code\u003e (for k-means models).\u003c/p\u003e\n"],["\u003cp\u003eThe \u003ccode\u003eML.TRAINING_INFO\u003c/code\u003e function has limitations including not supporting imported TensorFlow models and limited information for time series models.\u003c/p\u003e\n"],["\u003cp\u003eThe user needs the \u003ccode\u003ebigquery.models.create\u003c/code\u003e and \u003ccode\u003ebigquery.models.getData\u003c/code\u003e IAM permissions to run the \u003ccode\u003eML.TRAINING_INFO\u003c/code\u003e function.\u003c/p\u003e\n"]]],[],null,["# The ML.TRAINING_INFO function\n=============================\n\nThis document describes the `ML.TRAINING_INFO` function, which lets you see\ninformation about the training iterations of a model.\n\nYou can run `ML.TRAINING_INFO` while the `CREATE MODEL`\nstatement for the target model is running, or you can wait until after the\n`CREATE MODEL` statement completes. If you run `ML.TRAINING_INFO` before the\nfirst training iteration of the `CREATE MODEL` statement completes, the query\nreturns a `Not found` error.\n\nSyntax\n------\n\n```sql\nML.TRAINING_INFO(\n MODEL `PROJECT_ID.DATASET.MODEL_NAME`,\n)\n```\n\n### Arguments\n\n`ML.TRAINING_INFO` takes the following arguments:\n\n- \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: your project ID.\n- \u003cvar translate=\"no\"\u003eDATASET\u003c/var\u003e: the BigQuery dataset that contains the model.\n- \u003cvar translate=\"no\"\u003eMODEL_NAME\u003c/var\u003e: the name of the model.\n\nOutput\n------\n\n`ML.TRAINING_INFO` returns the following columns:\n\n- `training_run`: an `INT64` value that contains the training run identifier for the model. The value in this column is `0` for a newly created model. If you retrain the model using the `warm_start` argument of the `CREATE MODEL` statement, this value is incremented.\n- `iteration`: an `INT64` value that contains the iteration number of the training run. The value for the first iteration is `0`. 
This value is incremented for each additional training run.\n- `loss`: a `FLOAT64` value that contains the loss metric calculated after\n an iteration on the training data:\n\n - For logistic regression models, this is [log loss](https://developers.google.com/machine-learning/glossary/#Log_Loss).\n - For linear regression models, this is [mean squared error](https://developers.google.com/machine-learning/glossary/#MSE).\n - For multiclass logistic regressions, this is [cross-entropy log loss](https://developers.google.com/machine-learning/glossary/#cross-entropy).\n - For explicit matrix factorization models this is mean squared error calculated over the seen input ratings.\n - For implicit matrix factorization models, the loss is calculated using the following formula:\n\n $$ Loss = \\\\sum_{u, i} c_{ui}(p_{ui} - x\\^T_uy_i)\\^2 + \\\\lambda(\\\\sum_u\\|\\|x_u\\|\\|\\^2 + \\\\sum_i\\|\\|y_i\\|\\|\\^2) $$\n\n For more information about what the variables mean, see\n [Feedback types](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-matrix-factorization#feedback-info).\n- `eval_loss`: a `FLOAT64` value that contains the loss metric calculated on the\n holdout data. For k-means models, `ML.TRAINING_INFO` doesn't return an\n `eval_loss` column. If the\n [`DATA_SPLIT_METHOD`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create#data_split_method)\n argument is `NO_SPLIT`, then all entries in the `eval_loss` column are `NULL`.\n\n- `learning_rate`: a `FLOAT64` value that contains the\n [learning rate](https://developers.google.com/machine-learning/glossary/#learning_rate)\n in this iteration.\n\n- `duration_ms`: an `INT64` value that contains how long the iteration took,\n in milliseconds.\n\n- `cluster_info`: an `ARRAY\u003cSTRUCT\u003e` value that contains the\n fields `centroid_id`, `cluster_radius`, and `cluster_size`.\n `ML.TRAINING_INFO` computes `cluster_radius` and `cluster_size` with\n standardized features. Only returned for k-means models.\n\n| **Note:** For linear and logistic regression models, the `learning_rate` value can increase over the course of training if the [`LEARN_RATE_STRATEGY`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create#learn_rate_strategy) argument isn't set to `CONSTANT`. This increase is due to the fact that when the `LEARN_RATE_STRATEGY` is set to `LINE_SEARCH`, four learning rates are generated from [`LS_INIT_LEARN_RATE`](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create#ls_init_learn_rate). For example, if `LS_INIT_LEARN_RATE` is `0.1`, then the first iteration of training compares the respective loss from a set of four different models trained from setting the learn rate to `0.2`, `0.1`, `0.05`, and `0.025`. 
If `LEARN_RATE=0.2` generates the model with the smallest loss, then the next iteration generates four models with the learning rate set to `0.4`, `0.2`, `0.1`, `0.05` from the previous best fit model, and this process continues until the model converges.\n\nPermissions\n-----------\n\nYou must have the `bigquery.models.create` and `bigquery.models.getData`\n[Identity and Access Management (IAM) permissions](/bigquery/docs/access-control#bq-permissions)\nin order to run `ML.TRAINING_INFO`.\n\nLimitations\n-----------\n\n`ML.TRAINING_INFO` is subject to the following limitations:\n\n- `ML.TRAINING_INFO` doesn't support [imported TensorFlow models](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-tensorflow).\n- For [time series models](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-create-time-series), `ML.TRAINING_INFO` only returns three columns: `training_run`, `iteration`, and `duration_ms`. It doesn't expose the training information per iteration, or per time series if multiple time series are forecasted at once. The `duration_ms` is the total time cost for the entire process.\n\nExample\n-------\n\nThe following example retrieves training information from the model\n`mydataset.mymodel` in your default project: \n\n```sql\nSELECT\n *\nFROM\n ML.TRAINING_INFO(MODEL `mydataset.mymodel`)\n```\n\nWhat's next\n-----------\n\n- For information about model evaluation, see [BigQuery ML model evaluation overview](/bigquery/docs/evaluate-overview).\n- For information about the supported SQL statements and functions for each model type, see [End-to-end user journey for each model](/bigquery/docs/e2e-journey)."]]