Import data into a media data store

This page explains how to add data to a data store that is connected to a media app.

Ideally, update your data store daily by importing fresh data. Scheduled, periodic imports keep model quality from degrading over time. You can use Google Cloud Scheduler to automate imports.

You can update only new or changed documents, or you can import the entire data store. If you import documents that are already in your data store, they are not added again. Any document that has changed is updated.

Before you begin

Make sure that you do the following:

Import documents and user events

To import documents and user events, go to the section for the source that you plan to use:

BigQuery

Console

To use the Google Cloud console to import documents and user events from BigQuery, follow these steps:

  1. In the Google Cloud console, go to the Agent Builder page.

  2. To import documents, do the following:

    1. In the navigation menu, make sure Apps is selected.

    2. Click the app that you previously created and connected to an empty data store.

    3. In the Requirements tab, click Import documents.

    4. On the Select a data source page, select BigQuery.

    5. In the BigQuery path field, click Browse, select a documents table that you have prepared for ingesting, and then click Select. Alternatively, enter the table location directly in the BigQuery path field.

    6. Click Import. In the Activity tab, the Status column indicates Import in progress. Import complete displays when all of your documents are imported. This can take hours or days, depending on the amount of data.

  3. To import user events, do the following:

    1. Click Events.

    2. Click Import events.

    3. On the Select a data source page, select BigQuery.

    4. In the BigQuery path field, click Browse, select a user events table that you have prepared for ingesting, and then click Select. Alternatively, enter the table location directly in the BigQuery path field.

    5. Click Import. In the Activity tab, the Status column indicates Import in progress. Import complete displays when all of your user events are imported. This can take hours or days, depending on the amount of data.

  4. Click Requirements to see if the data requirements are met. It can take up to one hour after completing data imports for the data requirements to refresh. If you still have data requirements that are not met, import documents or user events as required.

Cloud Storage

Console

To use the Google Cloud console to import documents and user events from Cloud Storage, follow these steps:

  1. In the Google Cloud console, go to the Agent Builder page.

  2. To import documents, do the following:

    1. In the navigation menu, make sure Apps is selected.

    2. Click the app that you previously created and connected to an empty data store.

    3. In the Requirements tab, click Import documents.

    4. On the Select a data source page, select Cloud Storage.

    5. In the Import data from Cloud Storage pane, select Folder or File, depending on whether you are importing multiple files or a single file.

    6. In the gs:// field, click Browse, select the documents that you have prepared for ingesting, and then click Select. Alternatively, enter the path directly in the Cloud Storage path field.

    7. Click Import. In the Activity tab, the Status column indicates Import in progress. Import complete displays when all of your documents are imported. This can take hours or days, depending on the amount of data.

  3. To import user events, do the following:

    1. Click Events.

    2. Click Import events.

    3. On the Select a data source page, select Cloud Storage.

    4. In the Import data from Cloud Storage pane, select Folder or File, depending on whether you are importing multiple files or a single file.

    5. In the gs:// field, click Browse, select the user events that you have prepared for ingesting, and then click Select. Alternatively, enter the path directly in the Cloud Storage path field.

    6. Click Import. In the Activity tab, the Status column indicates Import in progress. Import complete displays when all of your user events are imported. This can take hours or days, depending on the amount of data.

  4. Click Requirements to see if the data requirements are met. It can take up to one hour after completing data imports for the data requirements to refresh. If you still have data requirements that are not met, import documents or user events as required.

Import documents using the API

To import documents, make a POST request to the documents:import REST method and use the InlineSource object to specify your data.

For an example of the JSON document format, see JSON document format.

Import requirements

Here are the requirements for importing media documents using the API:

  • Each document must be on its own line.

  • The maximum number of documents in a single import is 100.
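Because a single import accepts at most 100 documents, a larger catalog has to be split across several requests. The following is a minimal sketch of that batching, assuming the documents are already in memory as Python dictionaries. The names `batch_documents` and `MAX_DOCS_PER_IMPORT` are hypothetical helpers for illustration and are not part of any Google client library.

```python
from typing import Iterator

MAX_DOCS_PER_IMPORT = 100  # limit stated in the import requirements above


def batch_documents(
    documents: list[dict], batch_size: int = MAX_DOCS_PER_IMPORT
) -> Iterator[dict]:
    """Yield one request body per batch of at most batch_size documents."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(documents), batch_size):
        # Each body uses the inlineSource shape that the API import expects.
        yield {"inlineSource": {"documents": documents[start:start + batch_size]}}
```

Each yielded dictionary can be serialized with `json.dumps` and sent as the body of one import request.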

Procedure

To import media documents using the API, do the following:

  1. Create a JSON file for your documents and name it ./data.json. DOCUMENT_1 and DOCUMENT_2 stand for the fields of each document entry:

    {
      "inlineSource": {
        "documents": [
          { DOCUMENT_1 },
          { DOCUMENT_2 }
        ]
      }
    }
    
  2. Call the POST method:

    curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json; charset=utf-8" \
      --data @./data.json \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/dataStores/DATA_STORE_ID/branches/0/documents:import"

    Replace the following:

    • PROJECT_ID: The ID of your project.
    • DATA_STORE_ID: The ID of your data store.
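If you prefer to issue the same request from a script, the curl call above could be mirrored in Python roughly as follows. This is a sketch that uses only the standard library. The helper name `build_import_request` is hypothetical, and the access token is assumed to be obtained separately, for example from the output of `gcloud auth print-access-token`.

```python
import json
import urllib.request

API_ROOT = "https://discoveryengine.googleapis.com/v1beta"


def build_import_request(
    project_id: str, data_store_id: str, body: dict, access_token: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request shown in the curl example."""
    url = (
        f"{API_ROOT}/projects/{project_id}/locations/global"
        f"/dataStores/{data_store_id}/branches/0/documents:import"
    )
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Building and sending are kept separate here so that the URL and headers can be inspected before any network call is made.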

JSON document format

The following examples show Document entries in JSON format.

Provide each document in its entirety on a single line, with one document per line. The examples below are split across several lines only for readability.

Minimum required fields:

{
   "id": "sample-01",
   "schemaId": "default_schema",
   "jsonData": "{\"title\":\"Test document title\",\"categories\":[\"sports > clip\",\"sports > highlight\"],\"uri\":\"http://www.example.com\",\"media_type\":\"sports-game\",\"available_time\":\"2022-08-26T23:00:17Z\"}"
}

Complete object:

{
   "id": "child-sample-0",
   "schemaId": "default_schema",
   "jsonData": "{\"title\":\"Test document title\",\"description\":\"Test document description\",\"language_code\":\"en-US\",\"categories\":[\"sports > clip\",\"sports > highlight\"],\"uri\":\"http://www.example.com\",\"images\":[{\"uri\":\"http://example.com/img1\",\"name\":\"image_1\"}],\"media_type\":\"sports-game\",\"in_languages\":[\"en-US\"],\"country_of_origin\":\"US\",\"content_index\":0,\"persons\":[{\"name\":\"sports person\",\"role\":\"player\",\"rank\":0,\"uri\":\"http://example.com/person\"}],\"organizations\":[{\"name\":\"sports team\",\"role\":\"team\",\"rank\":0,\"uri\":\"http://example.com/team\"}],\"hash_tags\":[\"tag1\"],\"filter_tags\":[\"filter_tag\"],\"production_year\":1900,\"duration\":\"100s\",\"content_rating\":[\"PG-13\"],\"aggregate_ratings\":[{\"rating_source\":\"imdb\",\"rating_score\":4.5,\"rating_count\":1250}],\"available_time\":\"2022-08-26T23:00:17Z\"}"
}
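In these entries, jsonData is a JSON-encoded string and not a nested object, so escaping it by hand is error-prone. One way to produce an entry is to let a JSON library do the encoding, as in the following sketch. The helper name `make_document` is hypothetical, and the field values are taken from the minimum example above.

```python
import json


def make_document(doc_id: str, fields: dict, schema_id: str = "default_schema") -> dict:
    """Wrap media fields in a document entry with jsonData as an encoded string."""
    return {
        "id": doc_id,
        "schemaId": schema_id,
        # Inner encoding: the media fields become one escaped JSON string.
        "jsonData": json.dumps(fields, separators=(",", ":")),
    }


fields = {
    "title": "Test document title",
    "categories": ["sports > clip", "sports > highlight"],
    "uri": "http://www.example.com",
    "media_type": "sports-game",
    "available_time": "2022-08-26T23:00:17Z",
}
document = make_document("sample-01", fields)

# Outer encoding: json.dumps emits no newlines by default, so the
# whole document ends up on a single line.
line = json.dumps(document)
```

Decoding `line` and then decoding its jsonData value returns the original fields, which is a quick way to confirm that the escaping is valid before importing.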