About apps and data stores

This page describes Vertex AI Search apps and data stores. For information about Vertex AI Agents data stores, see Vertex AI Agents data stores.

With Vertex AI Search, you create a search or recommendations app and connect it to a data store. A Google Cloud project can contain multiple apps.

Relationship between apps and data stores

The relationship between apps and data stores depends on the type of app:

  • Generic search apps have a many-to-many relationship with data stores. When multiple data stores are connected to a single generic search app, this is referred to as blended search. For information about limitations of connecting a search app to more than one data store, see About blended search.

  • A generic recommendations app has a one-to-one connection with its data store.

  • A media app has a many-to-one relationship with its data store. An app can connect to only one data store, but a given data store can be connected to several apps. For example, a media search app and a media recommendations app can share a data store.

  • A healthcare search app has a many-to-one relationship with its data store. An app can connect to only one data store, but a given data store can be connected to several apps. For example, a patient-facing app and a provider-facing app can connect to the same data store.

    For a batch data import of healthcare data, data is imported into a data store that's within an app. For streaming data import (Preview) of healthcare data, data is imported into an entity, which is a type of data store that's within a data connector. A data connector is also a type of data store that's within an app.

After a data store is connected to an app, it can't be disconnected.
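
The relationship rules above can be summarized as a small lookup table. The following is a minimal sketch, not part of any Vertex AI Search API: the app-type labels, the RELATIONSHIPS table, and the can_connect helper are hypothetical names chosen for illustration.

```python
# Hypothetical labels for the app types described above (not API enum values).
# Each entry records the two facts the list states for that app type:
#   multiple_stores_per_app: can one app connect to more than one data store?
#   multiple_apps_per_store: can one data store be connected to several apps?
RELATIONSHIPS = {
    "generic_search": {"multiple_stores_per_app": True, "multiple_apps_per_store": True},
    "generic_recommendations": {"multiple_stores_per_app": False, "multiple_apps_per_store": False},
    "media": {"multiple_stores_per_app": False, "multiple_apps_per_store": True},
    "healthcare_search": {"multiple_stores_per_app": False, "multiple_apps_per_store": True},
}


def can_connect(app_type: str, data_store_count: int) -> bool:
    """Return True if an app of this type may connect to this many data stores."""
    if data_store_count < 1:
        return False
    if data_store_count == 1:
        return True
    return RELATIONSHIPS[app_type]["multiple_stores_per_app"]


print(can_connect("generic_search", 3))  # True: this is blended search
print(can_connect("media", 2))           # False: one data store per media app
```

Only generic search apps accept more than one data store, which is why blended search is limited to that app type.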

Method of app creation and data ingestion

How you create an app and ingest data depends on the type of data you have:

  • For website data, you use the Google Cloud console, not the API, to create your app and ingest data.

  • For structured or unstructured data, you can use either the Google Cloud console or the API.

  • For healthcare data, you can use either the Google Cloud console or the API.

Documents

Each data store has one or more data records, called documents. What a document represents varies depending on the type of data in the data store:

  • Website. A document is a web page.

  • Structured data. A document is a row in a table or a JSON record that follows a particular schema. You can provide this schema yourself or you can let Vertex AI Agent Builder derive the schema from the ingested data.

  • Structured data for media. A document is a row in a table or a JSON record that follows a schema that is specific to media. The documents are records pertaining to media content, such as videos, news articles, music files, and podcasts. A document contains information that describes the media item, at minimum: title, URI to the content location, categories, duration, and available date.

  • Structured data for third-party data sources (Preview with allowlist). A document is an entity specific to the third-party data source, such as a Jira issue or Confluence space.

  • Unstructured data. A document is a file in HTML, PDF with embedded text, or TXT format. PPTX and DOCX formats are available in Preview.

  • Healthcare FHIR data. A document is a supported FHIR R4 resource. For a list of FHIR R4 resources that Vertex AI Search supports, see Healthcare FHIR R4 data schema reference.

Data stores and apps

In Vertex AI Agent Builder, there are various kinds of data stores. A data store can contain only one type of data.

Website data

A data store with website data uses data indexed from public websites. You can provide a set of URL patterns that you want to include in your data store. The web pages that fit the URL patterns are called included web pages. You can then set up search or recommendations over data crawled from the included web pages.

For example, you can provide URL patterns such as yourexamplewebsite.com/faq/* and yourexamplewebsite.com/events/* and enable search or recommendations over the data crawled from these web pages that fit the pattern. This data includes text, images tagged with metadata, and other structured data such as meta tags, PageMap attributes, and schema.org data.
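
To get a feel for which pages a set of URL patterns would cover, you can approximate the matching locally. The sketch below assumes that the trailing * behaves like a shell-style wildcard and that the scheme (https://) is ignored; the matching that Vertex AI Search actually performs may differ, and is_included is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# The URL patterns from the example above.
PATTERNS = [
    "yourexamplewebsite.com/faq/*",
    "yourexamplewebsite.com/events/*",
]


def is_included(url: str, patterns: list[str]) -> bool:
    """Approximate check: does the URL fit any of the patterns?"""
    # Drop the scheme, if present, so the URL lines up with the patterns.
    bare_url = url.split("://", 1)[-1]
    return any(fnmatchcase(bare_url, pattern) for pattern in patterns)


for candidate in (
    "https://yourexamplewebsite.com/faq/billing",
    "https://yourexamplewebsite.com/events/2024/launch",
    "https://yourexamplewebsite.com/about",
):
    print(candidate, is_included(candidate, PATTERNS))
```

Under these assumptions, the first two URLs count as included web pages and the third does not.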

There are two types of website data stores:

  • Basic website search:

    • Provides search capabilities over the existing Google Search index for the included websites.
    • Doesn't require domain verification.
  • Advanced website indexing:

    • Provides advanced search capabilities over an index that's generated in either of the following ways:
      • The Vertex AI Search app owners can control which web pages are indexed by submitting sitemaps and maintaining them. This process keeps the index fresh without manual intervention. For more information, see Index and refresh web pages using sitemaps.
      • The Vertex AI Search app owners can perform an initial indexing that mirrors the Google Search index and then expand the index's coverage by recrawling the websites whenever necessary to keep the index fresh. For more information, see Refresh web pages.

      The advanced capabilities of advanced website indexing are listed in Advanced website indexing.
    • Requires Vertex AI Search data store owners to verify the domains to which the included websites belong. For more information, see Verify website domains.
    • Provides the capability to add structured data to the data store schema. A website contains unstructured data, but you can add structured data in the form of meta tags, PageMap attributes, and schema.org data to your web pages. You can then use this structured data to edit the data store schema as explained in Use structured data for advanced website indexing.

Structured data

A data store with structured data enables semantic search or recommendations over structured data. You can import data from BigQuery or Cloud Storage. You can also manually upload structured JSON data through the API.

For example, you can enable search or recommendations over a product catalog for your ecommerce experience or a directory of doctors for provider search or recommendations.

Vertex AI Agent Builder auto-detects the schema from the data that you import. Optionally, you can provide a schema for your data. Providing a schema for your data typically improves the quality of results.
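
To illustrate why supplying a schema can help, the toy sketch below derives a field-to-type mapping from a few JSON-like records. It is not the detection logic that Vertex AI Agent Builder uses; infer_field_types and the sample product records are hypothetical and only show how ambiguity can arise when types are guessed from data.

```python
def infer_field_types(records: list[dict]) -> dict[str, list[str]]:
    """Collect the Python type names observed for each field across records."""
    observed: dict[str, set[str]] = {}
    for record in records:
        for field_name, value in record.items():
            observed.setdefault(field_name, set()).add(type(value).__name__)
    # Sort so the result is stable regardless of record order.
    return {name: sorted(types) for name, types in observed.items()}


# Hypothetical product catalog records.
catalog = [
    {"id": "sku-1", "title": "Trail shoe", "price": 89.5},
    {"id": "sku-2", "title": "Rain jacket", "price": 120, "in_stock": True},
]

for field_name, types in infer_field_types(catalog).items():
    print(field_name, types)
```

Here price is seen as both a float and an int, and in_stock appears in only one record. An explicit schema removes that kind of guesswork.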

Structured data for media

Media apps can only be connected to media data stores. Media data stores are structured data stores with a Google-defined schema or with your own custom schema that contains a specific set of five media-related fields. For more information about the schema, see About media documents and data stores.
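
As a rough illustration, a media document that carries the five minimum fields (title, URI, categories, duration, and available date) might look like the record below. The key names and value formats are assumptions made for this sketch, and missing_media_fields is a hypothetical helper; check About media documents and data stores for the authoritative schema.

```python
# Assumed key names for the five minimum media fields (not confirmed here).
REQUIRED_MEDIA_FIELDS = ("title", "uri", "categories", "duration", "available_time")


def missing_media_fields(document: dict) -> list[str]:
    """Return the required fields that are absent or empty in the document."""
    return [
        name
        for name in REQUIRED_MEDIA_FIELDS
        if document.get(name) in (None, "", [])
    ]


# Hypothetical movie record; the value formats are also assumptions.
movie = {
    "title": "Example Movie",
    "uri": "https://example.com/movies/example-movie",
    "categories": ["Drama"],
    "duration": "5400s",
    "available_time": "2024-01-01T00:00:00Z",
}

print(missing_media_fields(movie))                     # []
print(missing_media_fields({"title": "Untitled clip"}))
```

A check like this can catch incomplete records before you import them into a media data store.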

For example, you can create a media recommendations app for a movie catalog or a news site so that your users get suitable, personalized suggestions.

In addition to media documents, media data stores also contain the user event information that allows Vertex AI Search to customize recommendations and search for your users. User events are required for media recommendations apps and are recommended for media search apps. For information about user events, see Record real-time user events.

Structured data for third-party data stores

The following third-party data source connectors are available in Preview with allowlist:

  • Confluence
  • Jira
  • Salesforce
  • SharePoint Online
  • Slack

Data from these third parties is considered structured data.

When you set up a new connector, you select a synchronization frequency. You also select which entities to sync. Entities vary depending on the source, such as issues for Jira and content and spaces for Confluence. A unique data store is created for each entity. Entity data stores are grouped by connector instance.

Unstructured data

An unstructured data store enables semantic search or recommendations over data such as documents and images.

Unstructured data stores support documents in HTML, PDF with embedded text, and TXT format. PPTX and DOCX formats are available in Preview.

Search provides results in the form of 10 URLs and summarized answers for natural language queries. Documents must be uploaded to a Cloud Storage bucket with appropriate access permissions. For example, a financial institution can enable search or recommendations over their private corpus of financial research publications, or a biotech company can enable search or recommendations over their private repository of medical research.

Healthcare FHIR data

A healthcare search app uses FHIR R4 data imported from a Cloud Healthcare API FHIR store. For a list of FHIR R4 resources that Vertex AI Search supports, see Healthcare FHIR R4 data schema reference. A FHIR R4 data store must satisfy some requirements before it can be used as a data source for a Vertex AI Search data store. For more information, see how to prepare healthcare FHIR data for ingestion.

About blended search

You can create a blended search app, where multiple data stores can be connected to a single generic search app. This feature lets you use one app to search across multiple sources and types of data.

To make a blended search app, select multiple data stores when creating a new generic search app. If you don't select multiple data stores during creation, then you can't add more data stores later.

When getting search results, you can either search across all data stores, or filter for results from a single data store.
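
The two modes can be sketched as JSON request bodies that differ only in their dataStoreSpecs list. This is a minimal sketch under stated assumptions: the dataStore key inside each spec, the resource name layout, and the project and data store IDs are placeholders, and build_search_request is a hypothetical helper rather than a client library call.

```python
import json

# Placeholder resource names; substitute your own project and data store IDs.
PARENT = "projects/my-project/locations/global/collections/default_collection"
FAQ_STORE = f"{PARENT}/dataStores/faq-site"
DOCS_STORE = f"{PARENT}/dataStores/research-docs"


def build_search_request(query: str, data_stores: list[str], page_size: int = 10) -> dict:
    """Build a search request body limited to the given data stores."""
    return {
        "query": query,
        "pageSize": page_size,
        # One spec per data store to search; the "dataStore" key is an assumption.
        "dataStoreSpecs": [{"dataStore": name} for name in data_stores],
    }


# Search across both connected data stores.
blended = build_search_request("refund policy", [FAQ_STORE, DOCS_STORE])
# Filter for results from a single data store.
single = build_search_request("refund policy", [FAQ_STORE])

print(json.dumps(blended, indent=2))
print(json.dumps(single, indent=2))
```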

The following limitations apply:

  • Adding and removing data stores:
    • To turn on blended search for an app, you must connect at least two data stores to it during app creation.
    • You can add or remove data stores from a blended search app, but the app can't have fewer than two data stores connected to it at any time.
    • If you connect a single data store to a search app during app creation, then you can't add or remove that data store.
  • Website data stores must have advanced website indexing turned on to be used for blended search. For more information, see Advanced website indexing.
  • Data stores that contain unstructured data imported using BigQuery are not supported.
  • Blended search allows the following fields in search requests:
    • query
    • pageSize
    • offset
    • dataStoreSpecs
    • pageToken
    • filter
    • spellCorrectionSpec
    • session
    • contentSearchSpec
      • summarySpec
      • extractiveContentSpec
      • searchResultMode
      • chunkSpec
  • In addition to the fields listed previously, the following fields are supported for blended search apps only when search requests are filtered to get results from a single data store. They aren't supported when getting results from more than one data store:
    • facetSpec
  • Blended search allows the following fields in dataStoreSpecs:
    • boostSpec
    • filter: If filters are specified for both SearchRequest and dataStoreSpecs, both filters are applied to search results.
  • Create, Read, Update, and Delete (CRUD) operations on serving configs are supported for blended apps. Only the following fields can be added or updated in a serving config:
    • name
    • displayName
    • solutionType
    • genericConfig:
      • contentSearchSpec:
        • summarySpec
        • extractiveContentSpec
        • searchResultMode
        • chunkSpec
    • boostControlIds
    • synonymsControlIds
    • onewaySynonymsControlIds
  • CRUD operations on the following controls are supported for blended search apps:
    • boostAction
    • synonymsAction
  • Blended search apps don't support the following features:
    • Filter, redirect, ignore, replacement, and dissociate serving controls
    • Snippets using contentSearchSpec.snippetSpec in either search requests or serving configs
    • Search with follow-ups
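
The search request field limitations above can be checked before a request is sent. The sketch below encodes the allowed field lists from this section as plain sets. It assumes that a request with exactly one dataStoreSpecs entry counts as filtered to a single data store, and unsupported_fields is a hypothetical helper, not an API call.

```python
# Field lists copied from the limitations above.
ALLOWED_TOP_LEVEL = {
    "query", "pageSize", "offset", "dataStoreSpecs", "pageToken",
    "filter", "spellCorrectionSpec", "session", "contentSearchSpec",
}
ALLOWED_CONTENT_SEARCH = {
    "summarySpec", "extractiveContentSpec", "searchResultMode", "chunkSpec",
}
# Supported only when results come from a single data store.
SINGLE_STORE_ONLY = {"facetSpec"}


def unsupported_fields(request: dict) -> list[str]:
    """List the request fields that blended search would not accept."""
    # Assumption: exactly one spec means the request targets a single data store.
    single_store = len(request.get("dataStoreSpecs", [])) == 1
    problems = []
    for name in request:
        if name in ALLOWED_TOP_LEVEL:
            continue
        if name in SINGLE_STORE_ONLY and single_store:
            continue
        problems.append(name)
    for name in request.get("contentSearchSpec", {}):
        if name not in ALLOWED_CONTENT_SEARCH:
            problems.append(f"contentSearchSpec.{name}")
    return problems


request = {
    "query": "refund policy",
    "dataStoreSpecs": [{"dataStore": "store-a"}, {"dataStore": "store-b"}],
    "facetSpec": {},
    "contentSearchSpec": {"snippetSpec": {}, "summarySpec": {}},
}
print(unsupported_fields(request))
```

For this request, the helper reports facetSpec (two data stores are targeted) and contentSearchSpec.snippetSpec (snippets aren't supported for blended search apps).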