About apps and data stores

This page describes Vertex AI Search apps and data stores. For information about Vertex AI Agents data stores, see Vertex AI Agents data stores.

With Vertex AI Search, you create a search or recommendations app and connect it to a data store. A Google Cloud project can contain multiple apps.

Relationship between apps and data stores

The relationship between apps and data stores depends on the type of app:

  • Generic search apps have a many-to-many relationship with data stores. When multiple data stores are connected to a single generic search app, this is referred to as blended search. For information about limitations of connecting a search app to more than one data store, see About connecting multiple data stores.

  • A generic recommendations app has a one-to-one relationship with its data store.

  • A media app has a many-to-one relationship with its data store. An app can only connect to one data store, whereas a given data store can be connected to several apps. For example, a media search app and a media recommendations app can share a data store.

  • A healthcare search app has a many-to-one relationship with its data store. An app can only connect to one data store, whereas a given data store can be connected to several apps. For example, a patient-facing app and a provider-facing app can connect to the same data store.

    For a batch data import of healthcare data, data is imported into a data store that's within an app. For streaming data import (Preview) of healthcare data, data is imported into an entity, which is a type of data store that's within a data connector. A data connector is also a type of data store that's within an app.

After a data store is connected to an app, it can't be disconnected.
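The connection rules above can be sketched as a small in-memory model. This is an illustration only: the class names, the app-type labels, and the `connect` method are hypothetical and are not part of the Vertex AI Search API. The sketch checks only the cardinalities described on this page, not whether the data store type matches the app type.

```python
from dataclasses import dataclass, field

# Hypothetical app-type labels for this sketch; not API enum values.
MULTI_STORE_APP_TYPES = {"generic_search"}        # may use blended search
EXCLUSIVE_APP_TYPES = {"generic_recommendations"}  # one-to-one with its store


@dataclass(eq=False)
class DataStore:
    name: str
    connected_apps: list = field(default_factory=list)


@dataclass(eq=False)
class App:
    name: str
    app_type: str
    data_stores: list = field(default_factory=list)

    def connect(self, store: DataStore) -> None:
        """Connect a data store, enforcing the cardinality rules above."""
        if store in self.data_stores:
            raise ValueError(f"{store.name} is already connected to {self.name}")
        # Only generic search apps can connect to more than one data store.
        if self.data_stores and self.app_type not in MULTI_STORE_APP_TYPES:
            raise ValueError(f"{self.app_type} apps connect to one data store only")
        # A generic recommendations app does not share its data store.
        exclusive = self.app_type in EXCLUSIVE_APP_TYPES or any(
            app.app_type in EXCLUSIVE_APP_TYPES for app in store.connected_apps
        )
        if store.connected_apps and exclusive:
            raise ValueError(f"{store.name} can't be shared in a one-to-one relationship")
        self.data_stores.append(store)
        store.connected_apps.append(self)

    # There is deliberately no disconnect method: after a data store is
    # connected to an app, it can't be disconnected.
```

In this model, a media search app and a media recommendations app can both connect to the same `DataStore`, while a second `connect` call on either app raises `ValueError`.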

Order of creation of apps and data stores

The order in which you create a data store and an app depends on the type of app:

  • For generic search and recommendations, you can either create a data store along with an app, or create a data store separately and then connect it when you create the app.

  • For media search and recommendations, data stores can only be created during app creation; media data stores can't be created from the Data stores page.

  • For healthcare search, you can either create a data store along with an app, or create a data store separately and then connect it when you create the app.

Method of app creation and data ingestion

How you create an app and ingest data depends on the type of data you have:

  • For website data, you use the Google Cloud console, not the API, to create your app and ingest data.

  • For structured or unstructured data, you can use either the Google Cloud console or the API.

  • For healthcare data, you can use either the Google Cloud console or the API.
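The creation-order and creation-method rules in the two preceding sections can be summarized as lookup tables. The following is a minimal sketch for planning purposes. The keys, table names, and function names are assumptions made for this example and do not correspond to API values.

```python
# Whether a data store can be created separately, before the app exists.
# Keys are hypothetical labels for the app types described on this page.
STANDALONE_DATA_STORE_CREATION = {
    "generic_search": True,
    "generic_recommendations": True,
    "media_search": False,           # created only during app creation
    "media_recommendations": False,  # created only during app creation
    "healthcare_search": True,
}

# Tools that can create the app and ingest data, keyed by data type.
CREATION_TOOLS = {
    "website": {"console"},
    "structured": {"console", "api"},
    "unstructured": {"console", "api"},
    "healthcare": {"console", "api"},
}


def can_create_data_store_first(app_type: str) -> bool:
    """Return True if the data store may be created before the app."""
    return STANDALONE_DATA_STORE_CREATION[app_type]


def can_use_api(data_type: str) -> bool:
    """Return True if the API can be used to create the app and ingest data."""
    return "api" in CREATION_TOOLS[data_type]
```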

Documents

Each data store has one or more data records, called documents. What a document represents varies depending on the type of data in the data store:

  • Website. A document is a web page.

  • Structured data. A document is a row in a table or a JSON record that follows a particular schema. You can provide this schema yourself or you can let Vertex AI Agent Builder derive the schema from the ingested data.

  • Structured data for media. A document is a row in a table or a JSON record that follows a predefined schema, which is specific to media. The documents are records pertaining to media content, such as videos, news articles, music files, and podcasts. A document contains information that describes the media item, such as title, URI to the content location, description, category, language, year, and rating.

  • Structured data for third-party data sources (Preview with allowlist). A document is an entity specific to the third-party data source, such as a Jira issue or Confluence space.

  • Unstructured data. A document is a file in HTML, PDF with embedded text, or TXT format. PPTX and DOCX formats are available in Preview.

  • Healthcare FHIR data. A document is a supported FHIR R4 resource. For a list of FHIR R4 resources that Vertex AI Search supports, see Healthcare FHIR R4 data schema reference.

Data stores and apps

In Vertex AI Agent Builder, there are various kinds of data stores. A data store can contain only one type of data.

Website data

A data store with website data uses data indexed from public websites. You can provide a set of domains and set up search or recommendations over data crawled from the domains. This data includes text and images tagged with metadata.

For example, you can provide domains such as yourexamplewebsite.com/faq and yourexamplewebsite.com/events and enable search or recommendations over the content at those domains.

There are two types of indexing:

  • Basic website indexing
  • Advanced website indexing

For more information, see Advanced website indexing.

You can specify URL patterns to include in the index, from the same domain or from different domains, and URL patterns to exclude from it. The maximum number of included and excluded patterns depends on the indexing type:

Indexing type             | Included sites              | Excluded sites
Basic website indexing    | Maximum of 50 URL patterns  | Maximum of 50 URL patterns
Advanced website indexing | Maximum of 500 URL patterns | Maximum of 500 URL patterns
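As an illustration of these limits, the following sketch checks a planned set of URL patterns before you enter them. The function name and the "basic" and "advanced" labels are hypothetical. Only the limits of 50 and 500 come from the table above.

```python
# Maximum number of URL patterns per list, taken from the table above.
URL_PATTERN_LIMITS = {"basic": 50, "advanced": 500}


def check_pattern_counts(indexing_type, included, excluded):
    """Return a list of problems; an empty list means both lists fit."""
    limit = URL_PATTERN_LIMITS[indexing_type]
    problems = []
    if len(included) > limit:
        problems.append(f"{len(included)} included patterns exceed the limit of {limit}")
    if len(excluded) > limit:
        problems.append(f"{len(excluded)} excluded patterns exceed the limit of {limit}")
    return problems


included = ["yourexamplewebsite.com/faq", "yourexamplewebsite.com/events"]
print(check_pattern_counts("basic", included, []))  # prints []
```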

A website contains unstructured data, but you can add metadata in the form of meta tags and edit the website's schema as explained in Add metadata for advanced website indexing.

Structured data

A data store with structured data enables semantic search or recommendations over structured data. You can import data from BigQuery or Cloud Storage. You can also submit structured data as an HTTP request payload.

For example, you can enable search or recommendations over a product catalog for your ecommerce experience, a movie catalog for movie search or recommendations, or a directory of doctors for provider search or recommendations.

Vertex AI Agent Builder auto-detects the schema from the data that you import. Optionally, you can provide a schema for your data. Providing a schema for your data typically improves the quality of results.
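To give a sense of what deriving a schema from imported records involves, here is a toy sketch that collects the JSON types seen for each field. It is not the algorithm that Vertex AI Agent Builder uses, which this page doesn't describe. The function name and the sample records are made up for the example.

```python
import json

# JSON type names keyed by the Python type that json.loads produces.
JSON_TYPE_NAMES = {
    str: "string",
    bool: "boolean",
    int: "integer",
    float: "number",
    list: "array",
    dict: "object",
    type(None): "null",
}


def infer_field_types(records):
    """Map each field name to the sorted JSON type names seen for it."""
    seen = {}
    for record in records:
        for key, value in record.items():
            seen.setdefault(key, set()).add(JSON_TYPE_NAMES[type(value)])
    return {key: sorted(types) for key, types in seen.items()}


# Hypothetical newline-delimited JSON records.
lines = [
    '{"title": "A", "year": 1999}',
    '{"title": "B", "year": null, "rating": 4.5}',
]
schema = infer_field_types(json.loads(line) for line in lines)
```

In this sample, `year` is seen as both an integer and a null, and `rating` appears in only one record. Ambiguities like these are one plausible reason why an explicit schema can give better results than an inferred one.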

Structured data for media

Media apps can only be connected to media data stores. Media data stores are structured data stores with a specific, predefined schema. For more information about the schema, see About media documents and data stores.

For example, you can create a media recommendations app for a movie catalog or a news site so that your users receive suitable, personalized suggestions.

In addition to media documents, media data stores also contain the user event information that allows Vertex AI Search to customize recommendations and search for your users. User events are required for media recommendations apps and are recommended for media search apps. For information about user events, see Record real-time user events.

Structured data for third-party data stores

The following third-party data source connectors are available in Preview with allowlist:

  • Confluence
  • Jira
  • Salesforce
  • SharePoint Online
  • Slack

Data from these third parties is considered structured data.

When you set up a new connector, you select a synchronization frequency. You also select which entities to sync. Entities vary depending on the source, such as issues for Jira and content and spaces for Confluence. A unique data store is created for each entity. Entity data stores are grouped by connector instance.

Unstructured data

An unstructured data store enables semantic search or recommendations over data such as documents and images.

Unstructured data stores support documents in HTML, PDF with embedded text, and TXT formats. PPTX and DOCX formats are available in Preview.
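Before uploading files, it can help to sort them by format. The sketch below classifies object names by file extension only. The function name and return labels are hypothetical, and the extension sets assume the conventional extension for each format named above. An extension check can't confirm that a PDF contains embedded text.

```python
from pathlib import PurePosixPath

# Assumed conventional extensions for the formats listed above.
SUPPORTED_EXTENSIONS = {".html", ".pdf", ".txt"}
PREVIEW_EXTENSIONS = {".pptx", ".docx"}


def classify_object(name: str) -> str:
    """Return 'supported', 'preview', or 'unsupported' for an object name."""
    suffix = PurePosixPath(name).suffix.lower()
    if suffix in SUPPORTED_EXTENSIONS:
        return "supported"
    if suffix in PREVIEW_EXTENSIONS:
        return "preview"
    return "unsupported"
```

For example, `classify_object("reports/q1.PDF")` returns `"supported"` because the comparison is case-insensitive.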

Search provides results in the form of 10 URLs and summarized answers for natural language queries. Documents must be uploaded to a Cloud Storage bucket with appropriate access permissions. For example, a financial institution can enable search or recommendations over its private corpus of financial research publications, or a biotech company can enable search or recommendations over its private repository of medical research.

Healthcare FHIR data

A healthcare search app uses FHIR R4 data imported from a Cloud Healthcare API FHIR store. For a list of FHIR R4 resources that Vertex AI Search supports, see Healthcare FHIR R4 data schema reference. A FHIR R4 store must meet certain requirements before it can be used as a data source for a Vertex AI Search data store. For more information, see how to prepare healthcare FHIR data for ingestion.

Create an app and ingest data

The process for creating an app and ingesting data varies depending on the kind of search or recommendations app you want. See the following documentation:

For search:

For generic recommendations:

For media data:

For healthcare FHIR R4 search: