This page describes Vertex AI Search apps and data stores. For information about Vertex AI Agents data stores, see Vertex AI Agents data stores.
With Vertex AI Search, you create a search or recommendations app and connect it to a data store. A Google Cloud project can contain multiple apps.
Relationship between apps and data stores
The relationship between apps and data stores depends on the type of app:
Generic search apps have a many-to-many relationship with data stores. When multiple data stores are connected to a single generic search app, this is referred to as blended search. For information about limitations of connecting a search app to more than one data store, see About blended search.
A generic recommendations app has a one-to-one connection with its data store.
A media app has a many-to-one relationship with its data store. An app can only connect to one data store, whereas a given data store can be connected to several apps. For example, a media search app and a media recommendations app can share a data store.
A healthcare search app has a many-to-one relationship with its data store. An app can only connect to one data store, whereas a given data store can be connected to several apps. For example, a patient-facing app and a provider-facing app can connect to the same data store.
For a batch data import of healthcare data, data is imported into a data store that's within an app. For streaming data import (Preview) of healthcare data, data is imported into an entity, which is a type of data store that's within a data connector. A data connector is also a type of data store that's within an app.
After a data store is connected to an app, it can't be disconnected.
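The cardinalities above can be summarized as a small lookup table. The following Python sketch is illustrative only: the app-type labels and the connection_allowed helper are hypothetical names, not part of any Vertex AI Search API, and the sketch ignores the blended search rule that multiple data stores must be selected when the app is created.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cardinality:
    many_stores_per_app: bool  # can one app connect to several data stores?
    many_apps_per_store: bool  # can one data store serve several apps?


# Hypothetical labels for the app types described on this page.
RELATIONSHIPS = {
    "generic_search": Cardinality(True, True),             # many-to-many
    "generic_recommendations": Cardinality(False, False),  # one-to-one
    "media": Cardinality(False, True),                     # many apps, one store
    "healthcare_search": Cardinality(False, True),         # many apps, one store
}


def connection_allowed(app_type, stores_on_app, apps_on_store):
    """Check whether one more app-to-data-store connection fits the table.

    stores_on_app: data stores already connected to the app.
    apps_on_store: apps already connected to the data store.
    """
    rule = RELATIONSHIPS[app_type]
    if stores_on_app >= 1 and not rule.many_stores_per_app:
        return False
    if apps_on_store >= 1 and not rule.many_apps_per_store:
        return False
    return True
```

For example, connection_allowed("media", 0, 1) is true because a media search app and a media recommendations app can share a data store, while connection_allowed("media", 1, 0) is false because a media app connects to only one data store.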
Method of app creation and data ingestion
How you create an app and ingest data depends on the type of data you have:
For website data, you use the Google Cloud console, not the API, to create your app and ingest data.
For structured or unstructured data, you can use either the Google Cloud console or the API.
For healthcare data, you can use either the Google Cloud console or the API.
Documents
Each data store has one or more data records, called documents. What a document represents varies depending on the type of data in the data store:
Website. A document is a web page.
Structured data. A document is a row in a table or a JSON record that follows a particular schema. You can provide this schema yourself or you can let Vertex AI Agent Builder derive the schema from the ingested data.
Structured data for media. A document is a row in a table or a JSON record that follows a schema that is specific to media. The documents are records pertaining to media content, such as videos, news articles, music files, and podcasts. A document contains information that describes the media item, at minimum: title, URI to the content location, categories, duration, and available date.
Structured data for third-party data sources (Preview with allowlist). A document is an entity specific to the third-party data source, such as a Jira issue or Confluence space.
Unstructured data. A document is a file in HTML, PDF with embedded text, or TXT format. PPTX and DOCX formats are available in Preview.
Healthcare FHIR data. A document is a supported FHIR R4 resource. For a list of FHIR R4 resources that Vertex AI Search supports, see Healthcare FHIR R4 data schema reference.
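As an illustration of the minimum information a media document carries (title, URI, categories, duration, and available date), the following Python sketch checks a record for those five items. The key names used here are assumptions chosen for readability, and the sample values are invented; see About media documents and data stores for the actual schema.

```python
# Assumed key names for the five minimum items listed above.
# The real media schema may name or structure these fields differently.
REQUIRED_MEDIA_KEYS = ("title", "uri", "categories", "duration", "available_time")


def missing_media_keys(document):
    """Return the required keys that are absent or empty in the record."""
    return [key for key in REQUIRED_MEDIA_KEYS if not document.get(key)]


# Invented sample record, for illustration only.
sample = {
    "title": "Example episode",
    "uri": "https://example.com/podcasts/episode-1",
    "categories": ["podcast"],
    "duration": "1800s",
}

print(missing_media_keys(sample))
```

Here the sample record lacks an available date, so the helper reports that one key as missing.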
Data stores and apps
In Vertex AI Agent Builder, there are various kinds of data stores. A data store can contain only one type of data.
- Website data
- Structured data
- Structured content (media)
- Structured content for third-party data sources
- Unstructured data
- Healthcare FHIR data
Website data
A data store with website data uses data indexed from public websites. You can provide a set of URL patterns that you want to include in your data store. The web pages that fit the URL patterns are called included web pages. You can then set up search or recommendations over data crawled from the included web pages.
For example, you can provide URL patterns such as yourexamplewebsite.com/faq/* and yourexamplewebsite.com/events/* and enable search or recommendations over the data crawled from these web pages that fit the pattern. This data includes text, images tagged with metadata, and other structured data such as meta tags, PageMap attributes, and schema.org data.
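To get a feel for which pages such patterns would include, the following Python sketch approximates the matching with shell-style globs from the standard library. This is an assumption for illustration: this page doesn't specify the exact matching rules that Vertex AI Search applies, and is_included is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# The two example patterns from the text above.
PATTERNS = [
    "yourexamplewebsite.com/faq/*",
    "yourexamplewebsite.com/events/*",
]


def is_included(url, patterns=PATTERNS):
    """Return True if the URL fits at least one pattern (glob approximation)."""
    return any(fnmatchcase(url, pattern) for pattern in patterns)


for url in (
    "yourexamplewebsite.com/faq/shipping",
    "yourexamplewebsite.com/events/2024/launch",
    "yourexamplewebsite.com/blog/news",
):
    print(url, is_included(url))
```

Under this approximation, the FAQ and events pages count as included web pages and the blog page doesn't.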
There are two types of website data stores:
Basic website search:
- Provides search capabilities over the existing Google Search index for the included websites.
- Doesn't require domain verification.
Advanced website indexing:
- Provides advanced search capabilities over an index that's generated in either of the following ways:
  - The Vertex AI Search app owners can control which web pages are indexed by submitting sitemaps and maintaining them. This process keeps the index fresh without manual intervention. For more information, see Index and refresh web pages using sitemaps.
  - The Vertex AI Search app owners can perform an initial indexing that mirrors the Google Search index for the included websites and then expand the index's coverage by recrawling the websites whenever necessary, keeping it fresh. For more information, see Refresh web pages.
- Requires Vertex AI Search data store owners to verify the domains to which the included websites belong. For more information, see Verify website domains.
- Provides the capability to add structured data to the data store schema. A website contains unstructured data, but you can add structured data in the form of meta tags, PageMap attributes, and schema.org data to your web pages. You can then use this structured data to edit the data store schema as explained in Use structured data for advanced website indexing.
The advanced capabilities of advanced website indexing are listed in Advanced website indexing.
What's next
For website search:
- To understand the indexing prerequisites, see how to prepare data for website search.
- Create a data store using website content.
- Create a search app.
Structured data
A data store with structured data enables semantic search or recommendations over structured data. You can import data from BigQuery or Cloud Storage. You can also manually upload structured JSON data through the API.
For example, you can enable search or recommendations over a product catalog for your ecommerce experience or a directory of doctors for provider search or recommendations.
Vertex AI Agent Builder auto-detects the schema from the data that you import. Optionally, you can provide a schema for your data. Providing a schema for your data typically improves the quality of results.
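To make the idea of deriving a schema from imported records concrete, the following Python sketch infers a field-to-type mapping from a few JSON records. It is a deliberately naive stand-in, not a description of how Vertex AI Agent Builder detects schemas; infer_schema and json_type are hypothetical helpers, and the sample records are invented.

```python
def json_type(value):
    """Map a Python value decoded from JSON to a JSON type name."""
    if isinstance(value, bool):  # check bool before int: bool is an int subclass
        return "boolean"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, str):
        return "string"
    if value is None:
        return "null"
    if isinstance(value, list):
        return "array"
    return "object"


def infer_schema(records):
    """Collect, for each field, the set of JSON types seen across records."""
    schema = {}
    for record in records:
        for field, value in record.items():
            schema.setdefault(field, set()).add(json_type(value))
    return schema


# Invented provider-directory records, for illustration only.
providers = [
    {"name": "Dr. A", "specialty": "cardiology", "rating": 4.5},
    {"name": "Dr. B", "specialty": "dermatology", "accepting_patients": True},
]

print(infer_schema(providers))
```

A field that appears with more than one type, or in only some records, is the kind of ambiguity that an explicitly provided schema removes.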
What's next
For generic search:
- Prepare structured data for ingestion.
- Create a search data store.
- Create a search app.
Structured data for media
Media apps can only be connected to media data stores. Media data stores are structured data stores with a Google-defined schema or with your own custom schema that contains a specific set of five media-related fields. For more information about the schema, see About media documents and data stores.
For example, you can create a media recommendations app for a movie catalog or a news site so that your users receive suitable, personalized suggestions.
In addition to media documents, media data stores also contain the user event information that allows Vertex AI Search to customize recommendations and search for your users. User events are required for media recommendations apps and are recommended for media search apps. For information about user events, see Record real-time user events.
Structured data for third-party data stores
The following third-party data source connectors are available in Preview with allowlist:
- Confluence
- Jira
- Salesforce
- SharePoint Online
- Slack
Data from these third parties is considered structured data.
When you set up a new connector, you select a synchronization frequency. You also select which entities to sync. Entities vary depending on the source, such as issues for Jira and content and spaces for Confluence. A unique data store is created for each entity. Entity data stores are grouped by connector instance.
Unstructured data
An unstructured data store enables semantic search or recommendations over data such as documents and images.
Unstructured data stores support documents in HTML, PDF with embedded text, and TXT format. PPTX and DOCX formats are available in Preview.
Search provides results in the form of 10 URLs and summarized answers for natural language queries. Documents must be uploaded to a Cloud Storage bucket with appropriate access permissions. For example, a financial institution can enable search or recommendations over their private corpus of financial research publications, or a biotech company can enable search or recommendations over their private repository of medical research.
What's next
For search:
- Prepare unstructured data for ingestion.
- Create a search data store using one of these methods:
- Create a search data store for your unstructured data.
- Create a search app.
Healthcare FHIR data
A healthcare search app uses FHIR R4 data imported from a Cloud Healthcare API FHIR store. For a list of FHIR R4 resources that Vertex AI Search supports, see Healthcare FHIR R4 data schema reference. The FHIR R4 store must meet certain requirements before it can be used as a data source for a Vertex AI Search data store. For more information, see how to prepare healthcare FHIR data for ingestion.
What's next
- Prepare FHIR R4 data for ingestion.
- Create a healthcare search data store.
- Create a healthcare search app.
About blended search
You can create a blended search app, where multiple data stores can be connected to a single generic search app. This feature lets you use one app to search across multiple sources and types of data.
To make a blended search app, select multiple data stores when creating a new generic search app. If you don't select multiple data stores during creation, then you can't add additional data stores later.
When getting search results, you can either search across all data stores, or filter for results from a single data store.
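As a rough sketch of the two modes, the following Python snippet builds a search request body that lists either all connected data stores or a single one in dataStoreSpecs. The field names query, pageSize, and dataStoreSpecs come from this page; the dataStore key inside each spec, the resource name pattern, and the data store IDs are assumptions and placeholders that you should check against the API reference. The helper build_blended_search_request is hypothetical.

```python
import json


def build_blended_search_request(query, data_store_names, page_size=10):
    """Return a request body that targets the given data stores."""
    return {
        "query": query,
        "pageSize": page_size,
        # Assumption: each spec identifies its data store by resource name.
        "dataStoreSpecs": [{"dataStore": name} for name in data_store_names],
    }


# Placeholder resource names; replace PROJECT_ID and the IDs with your own.
PARENT = "projects/PROJECT_ID/locations/global/collections/default_collection"
FAQ_SITE = f"{PARENT}/dataStores/faq-site"
POLICY_DOCS = f"{PARENT}/dataStores/policy-docs"

# Search across both connected data stores.
all_stores = build_blended_search_request("refund policy", [FAQ_SITE, POLICY_DOCS])

# Filter for results from a single data store.
one_store = build_blended_search_request("refund policy", [FAQ_SITE])

print(json.dumps(one_store, indent=2))
```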
The following limitations apply:
- Adding and removing data stores:
  - To turn on blended search for an app, you must connect at least two data stores to it during app creation.
  - You can add or remove data stores from a blended search app, but the app can't have fewer than two data stores connected to it at any time.
  - If you connect a single data store to a search app during app creation, then you can't add more data stores or remove that data store later.
- Website data stores need to have advanced website indexing turned on in order to be used for blended search. For more information, see Advanced website indexing.
- Data stores that contain unstructured data imported using BigQuery are not supported.
- Blended search allows the following fields in search requests:
  - query
  - pageSize
  - offset
  - dataStoreSpecs
  - pageToken
  - filter
  - spellCorrectionSpec
  - session
  - contentSearchSpec
    - summarySpec
    - extractiveContentSpec
    - searchResultMode
    - chunkSpec
- In addition to the fields listed previously, the following fields are supported for blended search apps only when search requests are filtered to get results from a single data store. They aren't supported when getting results from more than one data store:
  - facetSpec
- Blended search allows the following fields in dataStoreSpecs:
  - boostSpec
  - filter: If filters are specified for both SearchRequest and dataStoreSpecs, both filters are applied to search results.
- Create, Read, Update, and Delete (CRUD) operations on serving configs are supported for blended search apps. Only the following fields can be added or updated in a serving config:
  - name
  - displayName
  - solutionType
  - genericConfig:
    - contentSearchSpec:
      - summarySpec
      - extractiveContentSpec
      - searchResultMode
      - chunkSpec
  - boostControlIds
  - synonymsControlIds
  - onewaySynonymsControlIds
- CRUD operations on the following controls are supported for blended search apps:
  - boostAction
  - synonymsAction
- Blended search apps don't support the following features:
  - Filter, redirect, ignore, replacement, and dissociate serving controls
  - Snippets using contentSearchSpec.snippetSpec in either search requests or serving configs
  - Search with follow-ups
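The request-field limitations above lend themselves to a quick client-side check before a request is sent. The following Python sketch is one possible way to encode them, under two assumptions: that summarySpec, extractiveContentSpec, searchResultMode, and chunkSpec are nested inside contentSearchSpec, and that a request counts as filtered to a single data store when dataStoreSpecs has exactly one entry. The function blended_request_problems is a hypothetical helper, not part of any client library.

```python
# Top-level request fields listed as allowed for blended search.
ALLOWED_TOP_LEVEL = {
    "query", "pageSize", "offset", "dataStoreSpecs", "pageToken",
    "filter", "spellCorrectionSpec", "session", "contentSearchSpec",
}
# Allowed only when results come from a single data store.
SINGLE_STORE_ONLY = {"facetSpec"}
# Assumed sub-fields of contentSearchSpec (snippetSpec is unsupported).
ALLOWED_CONTENT_SEARCH = {
    "summarySpec", "extractiveContentSpec", "searchResultMode", "chunkSpec",
}


def blended_request_problems(request):
    """Return a list of human-readable problems; empty if none are found."""
    problems = []
    # Assumption: exactly one spec means "filtered to a single data store".
    single_store = len(request.get("dataStoreSpecs", [])) == 1
    for field in request:
        if field in ALLOWED_TOP_LEVEL:
            continue
        if field in SINGLE_STORE_ONLY:
            if not single_store:
                problems.append(f"{field} needs a single data store filter")
            continue
        problems.append(f"{field} is not in the allowed list")
    for sub_field in request.get("contentSearchSpec", {}):
        if sub_field not in ALLOWED_CONTENT_SEARCH:
            problems.append(f"contentSearchSpec.{sub_field} is not in the allowed list")
    return problems


# Invented request: facets and snippets across two data stores.
request = {
    "query": "refund policy",
    "dataStoreSpecs": [{"dataStore": "store-a"}, {"dataStore": "store-b"}],
    "facetSpec": [{}],
    "contentSearchSpec": {"snippetSpec": {}},
}
for problem in blended_request_problems(request):
    print(problem)
```

A check like this only mirrors the lists on this page; the service remains the authority on which requests it accepts.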