The AI.FORECAST function

This document describes the AI.FORECAST function, which lets you forecast a time series by using BigQuery ML's built-in TimesFM model.

Using the AI.FORECAST function with the built-in TimesFM model lets you perform forecasting without having to create and train your own model, so you can avoid the need for model management.

Syntax

SELECT
  *
FROM
  AI.FORECAST(
    { TABLE table_name | (query_statement) },
    data_col => string_value,
    timestamp_col => string_value
    [, model => string_value]
    [, id_cols => string_array]
    [, horizon => int64_value]
    [, confidence_level => float64_value]
  )

Arguments

AI.FORECAST takes the following arguments:

  • table: the name of the table that contains the data that you want to forecast. For example, \`mydataset.mytable\`.

    If the table is in a different project, then you must prepend the project ID to the table name in the following format, including backticks:

    \`[PROJECT_ID].[DATASET].[TABLE]\`

    For example, `myproject.mydataset.mytable`.

  • query_statement: the GoogleSQL query that generates the data that you want to forecast. See the GoogleSQL query syntax page for the supported SQL syntax of the query_statement clause.

  • data_col: aSTRING value that specifies the name of the data column. The data column contains the data to forecast. The data column must use one of the following data types:

    • INT64
    • NUMERIC
    • FLOAT64
  • timestamp_col: a STRING value that specified the name of the time points column. The time points column provides the time points used to generate the forecast. The time points column must use one of the following data types:

    • TIMESTAMP
    • DATE
    • DATETIME
  • model: a STRING value that specifies the name of the model. TimesFM 2.0 is the only supported value, and is the default value.

  • id_cols: An ARRAY<STRING> value that specifies the names of one or more ID columns. Each ID identifies a unique time series to forecast. Specify one or more values for this argument in order to forecast multiple time series using a single query. The columns that you specify must use one of the following data types:

    • STRING
    • INT64
    • ARRAY<STRING>
    • ARRAY<INT64>
  • horizon: an INT64 value that specifies the number of time points to forecast. The default value is 10. The valid input range is [1, 10,000].

  • confidence_level: a FLOAT64 value that specifies the percentage of the future values that fall in the prediction interval. The default value is 0.95. The valid input range is [0, 1).

Output

AI.FORECAST returns the following columns:

  • forecast_timestamp: a TIMESTAMP value that contains the timestamps of the time series.
  • forecast_value: a FLOAT64 value that contains the average of the prediction_interval_lower_bound and prediction_interval_upper_bound values.
  • confidence_level: a FLOAT64 value that contains the confidence_level value that you specified in the function input, or 0.95 if you didn't specify a confidence_level value. This value is the same across all rows.
  • prediction_interval_lower_bound: a FLOAT64 value that contains the lower bound of the prediction interval for each forecasted point.
  • prediction_interval_upper_bound: a FLOAT64 value that contains the upper bound of the prediction interval for each forecasted point.
  • ai_forecast_status: a STRING value that contains the forecast status. This value is empty if the operation was successful. If the operation wasn't successful, the value is the error string. A common error is The time series length is less than 3, which is the minimum required length to produce a forecast. This error indicates that there wasn't enough historical data in the time series to generate a forecast.

Example

The following example forecasts units sold by sales date for each city and store number. The query forecasts 50 time points with a confidence level of 0.75:

SELECT
  *
FROM
  AI.FORECAST(
    `mydataset.sales`,
    data_col => 'units_sold',
    timestamp_col => 'sales_date',
    model => 'TimesFM 2.0',
    id_cols => ['store_number', 'city']
    horizon => 50,
    confidence_level => .75
  );

Limitations

Only the most recent 512 timepoints are used when generating the forecast.

What's next