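After you load the feature data into a BigQuery table or view, you need to make this data source available to Vertex AI Feature Store for online serving. You can connect the data source to online serving resources, such as online stores and feature view instances, in either of the following two ways:
Register the data source by creating feature groups and features: You can associate feature groups and features with feature view instances in your online store. In this scenario, you format the data as a time series by adding a feature_timestamp column. Based on the feature timestamp, Vertex AI Feature Store serves only the latest non-null value for each unique entity ID. For information about how to create feature groups, see Create a feature group. For information about how to create features within a feature group, see Create a feature.
Serve features directly from the data source without creating feature groups and features: You can specify the URI of the data source in the feature view. In this scenario, you can't format the data as a time series or include historical data in the BigQuery source. Each row must contain the latest feature values corresponding to a unique ID. Multiple occurrences of the same entity ID in different rows aren't supported.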
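Because the direct-serving option rejects repeated entity IDs, it can help to check for duplicates before pointing a feature view at the source. The following is a minimal sketch in plain Python, not a Vertex AI or BigQuery API call. It assumes the rows have already been read into dictionaries, and the column name `entity_id` and the sample values are illustrative assumptions.

```python
from collections import Counter


def find_duplicate_entity_ids(rows, id_column="entity_id"):
    """Return the entity IDs that occur in more than one row, sorted."""
    counts = Counter(row[id_column] for row in rows)
    return sorted(entity_id for entity_id, n in counts.items() if n > 1)


# Hypothetical rows standing in for a BigQuery table or view.
rows = [
    {"entity_id": "user_1", "clicks": 10},
    {"entity_id": "user_2", "clicks": 3},
    {"entity_id": "user_1", "clicks": 12},  # second row for user_1
]

duplicates = find_duplicate_entity_ids(rows)
print(duplicates)
```

A non-empty result means the source still holds historical rows and would need to be reduced to one row per entity ID, or registered through feature groups and features instead.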
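Because Vertex AI Feature Store lets you maintain feature data in BigQuery and serves features from the BigQuery data source, you don't need to import or copy the features to an offline store.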
Data source preparation guidelines
Follow these guidelines on schema and constraints when you prepare the data source in BigQuery:
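The data source must contain the following columns:
At least one entity ID column with string values. Each value in this column must be smaller than 4 KB.
If you register the data source by using feature groups and features, add a feature_timestamp column and format the data as a time series. The feature_timestamp column contains values of type timestamp. During online serving, Vertex AI Feature Store uses this timestamp to serve the latest non-null value of each feature.
If you directly associate a BigQuery data source with a feature view, the feature_timestamp column isn't required. In this scenario, you must include only the latest feature values in the data source, and Vertex AI Feature Store doesn't look up the timestamp.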
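The column requirements above can be expressed as a small pre-flight check. This is a sketch under stated assumptions rather than anything the service provides: rows are plain dictionaries, the entity ID column is named `entity_id`, "4 KB" is read as 4096 bytes of UTF-8, and the timestamp check is skipped when no timestamp column is passed (the direct-association case).

```python
from datetime import datetime, timezone

MAX_ENTITY_ID_BYTES = 4 * 1024  # assumption: "4 KB" means 4096 bytes


def validate_row(row, id_column="entity_id", timestamp_column=None):
    """Return a list of problems found in one row (empty if none)."""
    problems = []
    entity_id = row.get(id_column)
    if not isinstance(entity_id, str):
        problems.append(f"{id_column} must be a string")
    elif len(entity_id.encode("utf-8")) >= MAX_ENTITY_ID_BYTES:
        problems.append(f"{id_column} must be smaller than 4 KB")
    # Only time-series sources (feature groups and features) need a timestamp.
    if timestamp_column is not None and not isinstance(
        row.get(timestamp_column), datetime
    ):
        problems.append(f"{timestamp_column} must be a timestamp")
    return problems


good = {
    "entity_id": "user_1",
    "feature_timestamp": datetime(2024, 7, 1, tzinfo=timezone.utc),
    "clicks": 10,
}
bad = {"entity_id": "x" * 5000, "feature_timestamp": "2024-07-01", "clicks": 10}

good_problems = validate_row(good, timestamp_column="feature_timestamp")
bad_problems = validate_row(bad, timestamp_column="feature_timestamp")
print(good_problems, bad_problems)
```

Calling `validate_row(bad)` without a timestamp column reports only the oversized entity ID, which mirrors the case where the source is associated directly with a feature view.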
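If you want to use embedding management in an online store created for Optimized online serving, the data source must contain the following columns:
An embedding column containing arrays of type float.
Optional: One or more filtering columns of type string or string array.
Optional: A crowding column of type int.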
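The embedding-related columns can be checked in the same spirit. The sketch below is illustrative only: it works on in-memory dictionaries, and the filtering column name `genre` and crowding column name `crowding_tag` are hypothetical. Only the `embedding` column name comes from the guidelines above.

```python
def validate_embedding_row(
    row, embedding_column="embedding", filter_columns=(), crowding_column=None
):
    """Return a list of type problems for the embedding-related columns."""
    problems = []

    embedding = row.get(embedding_column)
    if not isinstance(embedding, list) or not all(
        isinstance(x, float) for x in embedding
    ):
        problems.append(f"{embedding_column} must be an array of floats")

    for name in filter_columns:
        value = row.get(name)
        is_string = isinstance(value, str)
        is_string_array = isinstance(value, list) and all(
            isinstance(v, str) for v in value
        )
        if not (is_string or is_string_array):
            problems.append(f"{name} must be a string or an array of strings")

    if crowding_column is not None:
        value = row.get(crowding_column)
        # bool is a subclass of int in Python, so exclude it explicitly.
        if not isinstance(value, int) or isinstance(value, bool):
            problems.append(f"{crowding_column} must be an int")

    return problems


good = {"entity_id": "doc_1", "embedding": [0.1, 0.2, 0.3],
        "genre": ["news", "sports"], "crowding_tag": 7}
bad = {"entity_id": "doc_2", "embedding": [1, 2, 3],
       "genre": 5, "crowding_tag": "7"}

good_problems = validate_embedding_row(
    good, filter_columns=("genre",), crowding_column="crowding_tag")
bad_problems = validate_embedding_row(
    bad, filter_columns=("genre",), crowding_column="crowding_tag")
print(good_problems, bad_problems)
```

The optional columns are checked only when they are named in the call, matching their optional status in the guidelines.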
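Each row in the data source is a complete record of the feature values associated with an entity ID. If a feature value is missing in one of the columns, it's treated as a null value. Depending on how you define the feature view, Vertex AI Feature Store selects the feature values to serve in one of two ways:
If the feature view is defined from feature groups and features, Vertex AI Feature Store uses the feature timestamp to serve the latest non-null feature values. For example, if the value of a specific feature at the latest timestamp is null, Vertex AI Feature Store serves the latest non-null value from that feature's historical values.
If the feature view is defined by directly specifying a BigQuery data source, each row must contain a unique value in the entity ID column. In this scenario, Vertex AI Feature Store serves all the feature values from the associated data source.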
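The "latest non-null value" rule for time-series sources can be illustrated with a small simulation. This is a sketch of the selection logic as described above, not the service's implementation. The column names `entity_id` and `feature_timestamp` and the sample rows are assumptions made for the example.

```python
from datetime import datetime


def latest_non_null(rows, id_column="entity_id",
                    timestamp_column="feature_timestamp"):
    """For each entity ID, keep the most recent non-null value per feature."""
    served = {}
    # Walk rows from oldest to newest so later non-null values overwrite
    # earlier ones, while later nulls leave the earlier value in place.
    for row in sorted(rows, key=lambda r: r[timestamp_column]):
        record = served.setdefault(row[id_column], {})
        for name, value in row.items():
            if name in (id_column, timestamp_column):
                continue
            if value is not None:
                record[name] = value
            else:
                record.setdefault(name, None)
    return served


rows = [
    {"entity_id": "user_1", "feature_timestamp": datetime(2024, 7, 1),
     "clicks": 10, "country": "DE"},
    {"entity_id": "user_1", "feature_timestamp": datetime(2024, 7, 2),
     "clicks": None, "country": "FR"},
    {"entity_id": "user_2", "feature_timestamp": datetime(2024, 7, 1),
     "clicks": 3, "country": None},
]

served = latest_non_null(rows)
print(served)
```

For `user_1`, `clicks` falls back to the older value because the newest row holds a null, while `country` takes the newest value. A feature that has never had a non-null value, such as `country` for `user_2`, stays null.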