You can create data stores from BigQuery tables in two ways:
One-time ingestion: You import data from a BigQuery table into a data store. The data in the data store does not change unless you manually refresh the data.
Periodic ingestion: You import data from one or more BigQuery tables, and you set a sync frequency that determines how often the data stores are updated with the most recent data from the BigQuery dataset.
The following table compares the two ways that you can import BigQuery data into Agentspace Enterprise data stores.
| One-time ingestion | Periodic ingestion |
|---|---|
| Generally available (GA). | Public preview. |
| Data must be refreshed manually. | Data updates automatically every 1, 3, or 5 days. Data cannot be manually refreshed. |
| Agentspace Enterprise creates a single data store from one table in a BigQuery dataset. | Agentspace Enterprise creates a data connector for a BigQuery dataset and a data store (called an entity data store) for each table specified. For each data connector, the tables must have the same data type (for example, structured) and be in the same BigQuery dataset. |
| Data from multiple tables can be combined in one data store by first ingesting data from one table and then more data from another source or BigQuery table. | Because manual data import is not supported, the data in an entity data store can be sourced from only one BigQuery table. |
| Data source access control is supported. | Data source access control is not supported. The imported data can contain access controls, but these controls won't be respected. |
| You can create a data store using either the Google Cloud console or the API. | You must use the console to create data connectors and their entity data stores. |
| CMEK-compliant. | CMEK-compliant. |
Import once from BigQuery
To ingest data from a BigQuery table, use the following steps to create a data store and ingest data using either the Google Cloud console or the API.
Before importing your data, review Prepare data for ingesting.
Console
To use the Google Cloud console to ingest data from BigQuery, follow these steps:
In the Google Cloud console, go to the Agentspace page.
Go to the Data Stores page.
Click New data store.
On the Source page, select BigQuery.
Select what kind of data you are importing.
Click One time.
In the BigQuery path field, click Browse, select a table that you have prepared for ingesting, and then click Select. Alternatively, enter the table location directly in the BigQuery path field.
Click Continue.
If you are doing one-time import of structured data:
Map fields to key properties.
If there are important fields missing from the schema, use Add new field to add them.
For more information, see About auto-detect and edit.
Click Continue.
Choose a region for your data store.
Enter a name for your data store.
Click Create.
To check the status of your ingestion, go to the Data Stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.
Depending on the size of your data, ingestion can take several minutes to several hours.
REST
To use the command line to create a data store and import data from BigQuery, follow these steps.
Create a data store.
```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
  -d '{
    "displayName": "DATA_STORE_DISPLAY_NAME",
    "industryVertical": "GENERIC",
    "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
  }'
```
Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of the data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
- DATA_STORE_DISPLAY_NAME: the display name of the data store that you want to create.
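For example, with the placeholders filled in, the request might look like the following. The project ID, data store ID, and display name are illustrative values only:

```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: my-project" \
  "https://discoveryengine.googleapis.com/v1/projects/my-project/locations/global/collections/default_collection/dataStores?dataStoreId=my-bigquery-data-store" \
  -d '{
    "displayName": "My BigQuery data store",
    "industryVertical": "GENERIC",
    "solutionTypes": ["SOLUTION_TYPE_SEARCH"]
  }'
```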
Optional: If you're uploading unstructured data and want to configure document parsing or to turn on document chunking for RAG, specify the documentProcessingConfig object and include it in your data store creation request. Configuring an OCR parser for PDFs is recommended if you're ingesting scanned PDFs. For how to configure parsing or chunking options, see Parse and chunk documents.

Import data from BigQuery.
If you defined a schema, make sure the data conforms to that schema.
```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
  -d '{
    "bigquerySource": {
      "projectId": "PROJECT_ID",
      "datasetId": "DATASET_ID",
      "tableId": "TABLE_ID",
      "dataSchema": "DATA_SCHEMA",
      "aclEnabled": "BOOLEAN"
    },
    "reconciliationMode": "RECONCILIATION_MODE",
    "autoGenerateIds": "AUTO_GENERATE_IDS",
    "idField": "ID_FIELD",
    "errorConfig": {
      "gcsPrefix": "ERROR_DIRECTORY"
    }
  }'
```
Replace the following:
- PROJECT_ID: the ID of your project.
- DATA_STORE_ID: the ID of the data store.
- DATASET_ID: the ID of the BigQuery dataset.
- TABLE_ID: the ID of the BigQuery table.
  - If the BigQuery table is not under PROJECT_ID, you need to give the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com "BigQuery Data Viewer" permission for the BigQuery table. For example, if you are importing a BigQuery table from source project "123" to destination project "456", give service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permissions for the BigQuery table under project "123".
- DATA_SCHEMA: optional. Values are document and custom. The default is document.
  - document: the BigQuery table that you use must conform to the default BigQuery schema provided in Prepare data for ingesting. You can define the ID of each document yourself, while wrapping all the data in the jsonData string.
  - custom: Any BigQuery table schema is accepted, and Agentspace Enterprise automatically generates the IDs for each document that is imported.
- ERROR_DIRECTORY: optional. A Cloud Storage directory for error information about the import, for example, gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty to let Agentspace Enterprise automatically create a temporary directory.
- RECONCILIATION_MODE: optional. Values are FULL and INCREMENTAL. Default is INCREMENTAL. Specifying INCREMENTAL causes an incremental refresh of data from BigQuery to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents with the same ID. Specifying FULL causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in BigQuery are removed from your data store. The FULL mode is helpful if you want to automatically delete documents that you no longer need.
- AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to true, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to FULL to maintain consistent document IDs. Specify autoGenerateIds only when bigquerySource.dataSchema is set to custom. Otherwise an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds or set it to false, you must specify idField. Otherwise the documents fail to import.
- ID_FIELD: optional. Specifies which fields are the document IDs. For BigQuery source files, idField indicates the name of the column in the BigQuery table that contains the document IDs. Specify idField only when (1) bigquerySource.dataSchema is set to custom, and (2) auto_generate_ids is set to false or is unspecified. Otherwise an INVALID_ARGUMENT error is returned. The value of the BigQuery column name must be of string type, must be between 1 and 63 characters, and must conform to RFC-1034. Otherwise, the documents fail to import.
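For example, a request that imports a table with its own (custom) schema and takes document IDs from a column named id might look like the following. The project, dataset, table, and column names are illustrative values only:

```
curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  "https://discoveryengine.googleapis.com/v1/projects/my-project/locations/global/collections/default_collection/dataStores/my-bigquery-data-store/branches/0/documents:import" \
  -d '{
    "bigquerySource": {
      "projectId": "my-project",
      "datasetId": "my_dataset",
      "tableId": "my_table",
      "dataSchema": "custom"
    },
    "reconciliationMode": "INCREMENTAL",
    "idField": "id"
  }'
```

The import call returns a long-running operation. One way to check progress from the command line is to poll the operation resource whose name is returned in the import response. The following is a sketch, where OPERATION_NAME stands for whatever name your response contains:

```
# OPERATION_NAME is the "name" value from the import response, for example:
# projects/my-project/locations/global/collections/default_collection/dataStores/my-bigquery-data-store/branches/0/operations/import-documents-123
# The operation is complete when the response includes "done": true.
curl -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://discoveryengine.googleapis.com/v1/OPERATION_NAME"
```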
Connect to BigQuery with periodic syncing
Before importing your data, review Prepare data for ingesting.
The following procedure describes how to create a data connector that associates a BigQuery dataset with Agentspace Enterprise, and how to specify a table in the dataset for each data store that you want to create. Data stores that are children of data connectors are called entity data stores.
Data from the dataset is synced periodically to the entity data stores. You can specify that syncs occur daily, every three days, or every five days.
Console
To use the Google Cloud console to create a connector that periodically syncs data from a BigQuery dataset to Agentspace Enterprise, follow these steps:
In the Google Cloud console, go to the Agentspace page.
In the navigation menu, click Data Stores.
Click Create data store.
On the Source page, select BigQuery.
Select the kind of data that you are importing.
Click Periodic.
Select the Sync frequency, that is, how often you want the Agentspace Enterprise connector to sync with the BigQuery dataset. You can change the frequency later.
In the BigQuery dataset path field, click Browse, and then select the dataset that contains the tables that you have prepared for ingesting. Alternatively, enter the dataset location directly in the BigQuery dataset path field. The format for the path is projectname.datasetname.
In the Tables to sync field, click Browse, and then select a table that contains the data that you want for your data store.
If there are additional tables in the dataset that you want to use for data stores, click Add table and specify those tables too.
Click Continue.
Choose a region for your data store, enter a name for your data connector, and click Create.
You have now created a data connector that periodically syncs data with the BigQuery dataset, along with one or more entity data stores. The data stores have the same names as the BigQuery tables.
To check the status of your ingestion, go to the Data Stores page and click your data connector name to see details about it on its Data page > Data ingestion activity tab. When the status column on the Data ingestion activity tab changes from In progress to Succeeded, the first ingestion is complete.
Depending on the size of your data, ingestion can take several minutes to several hours.
After you set up your data source and import data the first time, the data store syncs data from that source at a frequency that you select during setup. About an hour after the data connector is created, the first sync occurs. The next sync then occurs around 24 hours, 72 hours, or 120 hours later.
Next steps
To attach your data store to an app, create an app and select your data store following the steps in Create a search app.
To preview how your search results appear after your app and data store are set up, see Preview search results.