Stay organized with collections
Save and categorize content based on your preferences.
The ML.FEATURES_AT_TIME function
This document describes the ML.FEATURES_AT_TIME function, which lets you use
a point-in-time cutoff for all entities when retrieving features, because
features can have time dependencies if they include time-sensitive data. To
avoid data leakage,
use point-in-time features when training models and running inference.
Use this function to use the same point-in-time cutoff for all entities when
retrieving features. Use the
ML.ENTITY_FEATURES_AT_TIME function
to retrieve features from multiple points in time for multiple entities.
ML.FEATURES_AT_TIME takes the following arguments:
PROJECT_ID: the project that contains the
table.
DATASET: the BigQuery dataset that
contains the table.
TABLE_NAME: is the name of the BigQuery
table that contains the feature data. The feature table must contain the
following columns:
entity_id: a STRING column that contains the ID of the entity
related to the features.
One or more feature columns.
feature_timestamp: a TIMESTAMP column that identifies when the
feature data was last updated.
The column names are case-insensitive. For example, you can use a column
named Entity_ID instead of entity_id.
The feature table must be in
wide format,
with one column for each feature.
QUERY_STATEMENT: a STRING value that specifies a
GoogleSQL query that returns the feature data.
This query must return the same columns as the TABLE_NAME argument. See
GoogleSQL query syntax
for the supported SQL syntax of the QUERY_STATEMENT clause.
TIME: a TIMESTAMP value that specifies the point in
time to use as a cutoff for feature data. Only rows where the value in the
feature_timestamp column is equal to or earlier than the TIME value are
returned. Defaults to the value of the
CURRENT_TIMESTAMP function.
NUM_ROWS: an INT64 value that specifies the number
of rows to return for each entity ID. Defaults to 1.
IGNORE_FEATURE_NULLS: a BOOL value that indicates
whether to replace a NULL value in a feature column with the feature column
value from the row for the same entity that immediately precedes it in time.
For example, for the following feature table:
Results in the following output, where the f2 value from the
row for entity ID 2 that is timestamped '2022-06-10 12:00:00+00' is
substituted for the NULL value in the row timestamped
'2022-06-11 10:00:00+00':
If there is no available replacement value, for example, where there is no
earlier row for that entity ID, a NULL value is returned.
Defaults to FALSE.
Output
The ML.FEATURES_AT_TIME function returns the input table rows that meet the
point-in-time cutoff criteria, with the feature_timestamp column showing the
timestamp that was input in the time argument.
Examples
Example 1
This example shows a how to retrain a model using only features that were
created or updated before 2023-01-01 12:00:00+00:
[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated 2025-08-25 UTC."],[[["\u003cp\u003eThe \u003ccode\u003eML.FEATURES_AT_TIME\u003c/code\u003e function retrieves features with a point-in-time cutoff for all entities, which is essential for avoiding data leakage when dealing with time-sensitive data.\u003c/p\u003e\n"],["\u003cp\u003eThis function uses a \u003ccode\u003efeature_table\u003c/code\u003e or a \u003ccode\u003equery_statement\u003c/code\u003e with required columns like \u003ccode\u003eentity_id\u003c/code\u003e and \u003ccode\u003efeature_timestamp\u003c/code\u003e to determine feature data, along with optional arguments to filter rows and specify the cutoff point.\u003c/p\u003e\n"],["\u003cp\u003eYou can use the \u003ccode\u003etime\u003c/code\u003e argument to specify a timestamp, ensuring that only data from or before that point is considered, or you can also have it default to \u003ccode\u003eCURRENT_TIMESTAMP\u003c/code\u003e.\u003c/p\u003e\n"],["\u003cp\u003eThe function allows specifying the number of rows per entity using the \u003ccode\u003enum_rows\u003c/code\u003e argument and the handling of \u003ccode\u003eNULL\u003c/code\u003e values in features using the \u003ccode\u003eignore_feature_nulls\u003c/code\u003e argument, replacing them with earlier values when available.\u003c/p\u003e\n"],["\u003cp\u003e\u003ccode\u003eML.FEATURES_AT_TIME\u003c/code\u003e is crucial for both training and inference when working with models that rely on time-dependent features, providing a consistent and accurate view of data at a specific point.\u003c/p\u003e\n"]]],[],null,["# The ML.FEATURES_AT_TIME function\n================================\n\nThis document describes the `ML.FEATURES_AT_TIME` function, which lets you use\na point-in-time cutoff for all entities when retrieving features, because\nfeatures can have time dependencies if they include time-sensitive data. To\navoid [data leakage](https://en.wikipedia.org/wiki/Leakage_(machine_learning)),\nuse point-in-time features when training models and running inference.\n\nUse this function to use the same point-in-time cutoff for all entities when\nretrieving features. Use the\n[`ML.ENTITY_FEATURES_AT_TIME` function](/bigquery/docs/reference/standard-sql/bigqueryml-syntax-entity-feature-time)\nto retrieve features from multiple points in time for multiple entities.\n\nSyntax\n------\n\n```sql\nML.FEATURES_AT_TIME(\n { TABLE `PROJECT_ID.DATASET.TABLE_NAME` | (QUERY_STATEMENT) }\n [, TIME =\u003e TIMESTAMP][, NUM_ROWS =\u003e INT64][, IGNORE_FEATURE_NULLS =\u003e BOOL])\n```\n\n### Arguments\n\n`ML.FEATURES_AT_TIME` takes the following arguments:\n\n- \u003cvar translate=\"no\"\u003ePROJECT_ID\u003c/var\u003e: the project that contains the table.\n- \u003cvar translate=\"no\"\u003eDATASET\u003c/var\u003e: the BigQuery dataset that contains the table.\n- \u003cvar translate=\"no\"\u003eTABLE_NAME\u003c/var\u003e: is the name of the BigQuery\n table that contains the feature data. The feature table must contain the\n following columns:\n\n - `entity_id`: a `STRING` column that contains the ID of the entity related to the features.\n - One or more feature columns.\n - `feature_timestamp`: a `TIMESTAMP` column that identifies when the feature data was last updated.\n\n The column names are case-insensitive. For example, you can use a column\n named `Entity_ID` instead of `entity_id`.\n\n The feature table must be in\n [wide](https://en.wikipedia.org/wiki/Wide_and_narrow_data#Wide) format,\n with one column for each feature.\n- \u003cvar translate=\"no\"\u003eQUERY_STATEMENT\u003c/var\u003e: a `STRING` value that specifies a\n GoogleSQL query that returns the feature data.\n This query must return the same columns as the `TABLE_NAME` argument. See\n [GoogleSQL query syntax](/bigquery/docs/reference/standard-sql/query-syntax#sql_syntax)\n for the supported SQL syntax of the `QUERY_STATEMENT` clause.\n\n- \u003cvar translate=\"no\"\u003eTIME\u003c/var\u003e: a `TIMESTAMP` value that specifies the point in\n time to use as a cutoff for feature data. Only rows where the value in the\n `feature_timestamp` column is equal to or earlier than the `TIME` value are\n returned. Defaults to the value of the\n [`CURRENT_TIMESTAMP` function](/bigquery/docs/reference/standard-sql/timestamp_functions#current_timestamp).\n\n- \u003cvar translate=\"no\"\u003eNUM_ROWS\u003c/var\u003e: an `INT64` value that specifies the number\n of rows to return for each entity ID. Defaults to `1`.\n\n- \u003cvar translate=\"no\"\u003eIGNORE_FEATURE_NULLS\u003c/var\u003e: a `BOOL` value that indicates\n whether to replace a `NULL` value in a feature column with the feature column\n value from the row for the same entity that immediately precedes it in time.\n For example, for the following feature table:\n\n ```\n +-----------+------+------+--------------------------+\n | entity_id | f1 | f2 | feature_timestamp |\n +-----------+------+------+--------------------------+\n | '2' | 5.0 | 8.0 | '2022-06-10 09:00:00+00' |\n +-----------+------+------+--------------------------+\n | '2' | 2.0 | 4.0 | '2022-06-10 12:00:00+00' |\n +-----------+------+------+--------------------------+\n | '2' | 7.0 | NULL | '2022-06-11 10:00:00+00' |\n +-----------+------+------+--------------------------+\n ```\n\n Running this query: \n\n ```sql\n SELECT *\n FROM\n ML.FEATURES_AT_TIME(\n TABLE mydataset.feature_table,\n time =\u003e '2022-06-11 10:00:00+00',\n num_rows =\u003e 1,\n ignore_feature_nulls =\u003e TRUE);\n ```\n\n Results in the following output, where the `f2` value from the\n row for entity ID 2 that is timestamped `'2022-06-10 12:00:00+00'` is\n substituted for the `NULL` value in the row timestamped\n `'2022-06-11 10:00:00+00'`: \n\n ```\n +-----------+------+------+--------------------------+\n | entity_id | f1 | f2 | feature_timestamp |\n +-----------+------+------+--------------------------+\n | '2' | 7.0 | 4.0 | '2022-06-11 10:00:00+00' |\n +-----------+------+------+--------------------------+\n ```\n\n If there is no available replacement value, for example, where there is no\n earlier row for that entity ID, a `NULL` value is returned.\n\n Defaults to `FALSE`.\n\nOutput\n------\n\nThe `ML.FEATURES_AT_TIME` function returns the input table rows that meet the\npoint-in-time cutoff criteria, with the `feature_timestamp` column showing the\ntimestamp that was input in the `time` argument.\n\nExamples\n--------\n\n**Example 1**\n\nThis example shows a how to retrain a model using only features that were\ncreated or updated before `2023-01-01 12:00:00+00`: \n\n```sql\nCREATE OR REPLACE\n `mydataset.mymodel` OPTIONS (WARM_START = TRUE)\nAS\nSELECT * EXCEPT (feature_timestamp, entity_id)\nFROM\n ML.FEATURES_AT_TIME(\n TABLE `mydataset.feature_table`,\n time =\u003e '2023-01-01 12:00:00+00',\n num_rows =\u003e 1,\n ignore_feature_nulls =\u003e TRUE);\n```\n\n**Example 2**\n\nThis example shows how to get predictions from a model based on features\nthat were created or updated before `2023-01-01 12:00:00+00`: \n\n```sql\nSELECT\n *\nFROM\n ML.PREDICT(\n MODEL `mydataset.mymodel`,\n (\n SELECT * EXCEPT (feature_timestamp, entity_id)\n FROM\n ML.FEATURES_AT_TIME(\n TABLE `mydataset.feature_table`,\n time =\u003e '2023-01-01 12:00:00+00',\n num_rows =\u003e 1,\n ignore_feature_nulls =\u003e TRUE)\n )\n );\n```\n\n**Example 3**\n\nThis is a contrived example that you can use to see the output of the\nfunction: \n\n```sql\nWITH\n feature_table AS (\n SELECT * FROM UNNEST(\n ARRAY\u003cSTRUCT\u003centity_id STRING, f_1 FLOAT64, f_2 FLOAT64, feature_timestamp TIMESTAMP\u003e\u003e[\n ('id1', 1.0, 1.0, TIMESTAMP '2022-06-10 12:00:00+00'),\n ('id2', 12.0, 24.0, TIMESTAMP '2022-06-11 12:00:00+00'),\n ('id1', 11.0, NULL, TIMESTAMP '2022-06-11 12:00:00+00'),\n ('id1', 6.0, 12.0, TIMESTAMP '2022-06-11 10:00:00+00'),\n ('id2', 2.0, 4.0, TIMESTAMP '2022-06-10 12:00:00+00'),\n ('id2', 7.0, NULL, TIMESTAMP '2022-06-11 10:00:00+00')])\n )\nSELECT *\nFROM\n ML.FEATURES_AT_TIME(\n TABLE feature_table,\n time =\u003e '2022-06-12 10:00:00+00',\n num_rows =\u003e 1,\n ignore_feature_nulls =\u003e TRUE);\n```"]]