Introduction to connectors and data stores

Connectors get data from Google and third-party data sources into Agentspace Enterprise, storing it in dedicated data stores. This document provides an overview of these connectors. Centralizing your data in Agentspace Enterprise enhances data accessibility, search functionality, and analytical capabilities.

An overview image that shows how the connectors get data into Agentspace Enterprise.
Connector overview

Connector and data store concepts

Data stores
Each connector supports a set of entity types. For example, Jira Cloud has entities such as issues, attachments, comments, and worklogs, which are unique to the data source. Agentspace Enterprise creates a separate data store for each entity. Therefore, when you create a data store using the Google Cloud console, you get a collection of data stores representing these ingested data entities.
Data federation versus ingestion (indexing)
Data federation directly retrieves information from the specified data source. Because data isn't copied into the Vertex AI Search index, you don't need to worry about data storage. However, because the data is not indexed, search quality might be lower.

Data ingestion (indexing) copies data into the Vertex AI Search index. This can result in improved search quality. However, this process consumes more storage and time.
Unstructured data
The supported data format is connector and entity specific. If the content in an entity is stored in an unstructured format, such as HTML, PDF, TXT, PPTX, or DOCX, Vertex AI Search creates an unstructured data store. For more information and supported file types, see Unstructured search.
Structured data
The supported data format is connector and entity specific. If the content in an entity is stored in a structured format, a structured data store is created by Vertex AI Search. For more information, see Structured search.
Data schemas
The data schema defines the data structure. When you import structured data using Agentspace Enterprise, the system auto-detects the schema. You can use the auto-detected schema or define the schema using the API. For more information, see Provide or auto-detect a schema.
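
To make the idea of schema auto-detection concrete, the following is a minimal sketch of inferring field types from sample structured records. It is an illustration only and not how Agentspace Enterprise detects schemas: the function names, the type labels, and the fallback to "string" on conflicting types are all assumptions made for this example.

```python
from typing import Any, Dict, List


def infer_type(value: Any) -> str:
    """Map a Python value to a simple, hypothetical type label."""
    # bool must be checked before int because bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "number"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return "null"


def detect_schema(records: List[Dict[str, Any]]) -> Dict[str, str]:
    """Build a field -> type mapping from every record in the sample."""
    schema: Dict[str, str] = {}
    for record in records:
        for field, value in record.items():
            detected = infer_type(value)
            previous = schema.setdefault(field, detected)
            if previous != detected:
                # Assumption: conflicting types fall back to "string".
                schema[field] = "string"
    return schema


sample = [
    {"id": 1, "title": "Login fails", "resolved": False},
    {"id": 2, "title": "Crash on save", "hours": 1.5},
]
schema = detect_schema(sample)
```

If the detected types don't match what you intend, that is the case in which you would define the schema yourself instead of accepting the auto-detected one.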
Data store regions
When ingesting data, you need to select the region where you want to store the data, such as global, the US, or the EU. For more information, see Agentspace Locations. Data stored in the US or EU regions requires data encryption. The default encryption uses Google-owned and Google-managed encryption keys, but you can use customer-managed encryption keys instead.
Sync frequency

Sync frequency defines how often identity data (such as roles, permissions, and users) and entity data (the data related to a specific connector) are pulled and updated from the original data source. For example, a system like Jira that manages and tracks bug fixes might need a more frequent entity sync, such as every 3 hours.

Data sync types:
  • A full sync captures the entire state of the third-party app, including additions, updates, and deletions, and replaces the existing contents of the data store.
  • An incremental sync periodically captures entity data that has been added or updated since the last sync. It does not sync identity data or deletions of entity data.
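
The difference between the two sync types can be sketched with a small in-memory model. This is an illustration under simplifying assumptions, not the connector's implementation: the data store is modeled as a plain dictionary, and the function names and record IDs are hypothetical.

```python
from typing import Dict

# Hypothetical in-memory model: record ID -> record body.
Store = Dict[str, str]


def full_sync(source: Store) -> Store:
    """Return a data store that mirrors the source exactly.

    Replacing the contents wholesale is what propagates deletions.
    """
    return dict(source)


def incremental_sync(store: Store, changed: Store) -> Store:
    """Upsert records that were added or updated since the last sync.

    Records deleted at the source are not part of `changed`, so they
    remain in the store until the next full sync.
    """
    merged = dict(store)
    merged.update(changed)
    return merged


initial = full_sync({"ISSUE-1": "open", "ISSUE-2": "open"})

# At the source: ISSUE-1 is deleted, ISSUE-2 is updated, ISSUE-3 is added.
after_incremental = incremental_sync(
    initial, {"ISSUE-2": "closed", "ISSUE-3": "open"}
)

# Only a full sync removes the deleted record.
after_full = full_sync({"ISSUE-2": "closed", "ISSUE-3": "open"})
```

In this model, `after_incremental` still contains the deleted `ISSUE-1`, while `after_full` does not, which is why incremental syncs are typically paired with periodic full syncs.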
Data sync frequency options:
  • Full syncs of all identity data and entity data can be scheduled simultaneously for every 3 hours, 6 hours, 12 hours, 1 day, or every 3 days.
  • Independent full syncs of all identity data, and independent full syncs of all entity data, can be scheduled separately using any of the following custom sync frequencies:
    • Entity data: Every 3 hours, 6 hours, 12 hours, 1 day, 3 days, 5 days, and every 7 days.
    • Identity data: Every 30 minutes, 1 hour, 3 hours, 6 hours, 12 hours, 1 day, 3 days, 5 days, and every 7 days.
  • Incremental syncs of updated or added entity data can be scheduled for every 3 hours, 6 hours, 12 hours, 1 day, 3 days, 5 days, or every 7 days. By default, an incremental sync is performed every 3 hours.
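
The frequency options above can be captured as a small lookup table, which is handy for validating a planned schedule before you configure a connector. This is a sketch only: the names (`ALLOWED_HOURS`, `is_allowed`, and the sync-kind keys) are hypothetical and are not part of any Agentspace API. Intervals are expressed in hours.

```python
# Allowed sync intervals in hours, transcribed from the options listed above.
ENTITY_HOURS = {3, 6, 12, 24, 72, 120, 168}

ALLOWED_HOURS = {
    # Identity and entity data synced together on one schedule.
    "combined_full": {3, 6, 12, 24, 72},
    # Independent full syncs.
    "entity_full": ENTITY_HOURS,
    "identity_full": {0.5, 1} | ENTITY_HOURS,
    # Incremental syncs of added or updated entity data.
    "incremental": ENTITY_HOURS,
}

# Incremental syncs run every 3 hours unless configured otherwise.
DEFAULT_INCREMENTAL_HOURS = 3


def is_allowed(kind: str, hours: float) -> bool:
    """Return True if `hours` is a listed interval for the sync kind."""
    if kind not in ALLOWED_HOURS:
        raise ValueError(f"unknown sync kind: {kind!r}")
    return hours in ALLOWED_HOURS[kind]


identity_every_30_min = is_allowed("identity_full", 0.5)
entity_every_30_min = is_allowed("entity_full", 0.5)
combined_every_5_days = is_allowed("combined_full", 120)
```

Note that only identity data supports the 30-minute and 1-hour intervals, and that the combined schedule stops at 3 days.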

You can pause and resume both full syncs and incremental syncs:

  • When you pause a sync type, the connector cancels ongoing syncs of that type and stops scheduling new syncs of that type.
  • When you resume a sync type, the connector schedules the new sync based on the last scheduled sync time, but does not continue the previously interrupted sync.

For example, if you pause full sync while a full sync is occurring, the connector cancels that sync; if you later resume full sync, the connector automatically schedules a new full sync according to the full sync schedule.
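
The pause and resume behavior can be modeled as a small state object. The sketch below is one possible reading, not the connector's actual scheduler: the `SyncSchedule` class is hypothetical, and it assumes that "based on the last scheduled sync time" means the next sync lands on the first slot of the original cadence that falls after the resume time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SyncSchedule:
    """Hypothetical model of one sync type (full or incremental)."""

    interval: timedelta
    last_scheduled: datetime
    running: bool = False
    paused: bool = False

    def pause(self) -> None:
        # Pausing cancels any in-progress sync and stops new scheduling.
        self.running = False
        self.paused = True

    def resume(self, now: datetime) -> datetime:
        # Resuming never continues the cancelled sync. Assumption: the next
        # sync uses the first slot on the original cadence after `now`.
        self.paused = False
        next_run = self.last_scheduled
        while next_run <= now:
            next_run += self.interval
        self.last_scheduled = next_run
        return next_run


schedule = SyncSchedule(
    interval=timedelta(hours=3),
    last_scheduled=datetime(2024, 1, 1, 0, 0),
    running=True,
)
schedule.pause()
next_run = schedule.resume(now=datetime(2024, 1, 1, 7, 30))
```

Under these assumptions, a 3-hour schedule that started at midnight and is resumed at 07:30 runs its next sync at 09:00, and the interrupted sync is never picked up again.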

Data sync frequency recommendations

Choose a data sync frequency that aligns with the volume of records fetched and the recommended queries per second (QPS).

The following table shows the typical number of records retrieved for one-, three-, five-, and seven-day syncs at a given QPS. The actual number of records might vary depending on the data source and its configuration.

QPS    Records, 1-day sync    Records, 3-day sync    Records, 5-day sync    Records, 7-day sync
5      432k                   1.296M                 2.16M                  3M
10     864k                   2.592M                 4.32M                  6M
20     1.7M                   5.1M                   8.5M                   11.9M
50     4.3M                   12.9M                  21.5M                  30.1M
100    8.6M                   25.8M                  43M                    60.2M
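
The 5 QPS and 10 QPS rows are consistent with a simple estimate of QPS × 86,400 seconds per day × number of days. The source doesn't state this formula, so treat the following as an inference from the numbers. It assumes one record per query at a sustained rate, and the function name is hypothetical. The higher-QPS rows appear to be rounded, so they differ slightly from the exact product.

```python
SECONDS_PER_DAY = 86_400


def estimated_records(qps: float, days: int) -> int:
    """Estimate records fetched, assuming one record per query
    at a sustained rate of `qps` for the whole sync interval."""
    return int(qps * SECONDS_PER_DAY * days)


one_day_at_5 = estimated_records(5, 1)      # 432,000 (table: 432k)
three_day_at_5 = estimated_records(5, 3)    # 1,296,000 (table: 1.296M)
five_day_at_10 = estimated_records(10, 5)   # 4,320,000 (table: 4.32M)
seven_day_at_5 = estimated_records(5, 7)    # 3,024,000 (table: 3M)
```

If the volume of records you expect to change within a sync interval is well above the estimate for your QPS, consider a longer interval or a higher QPS where the data source allows it.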

Third-party connectors

Third-party connectors ingest third-party application data into Agentspace Enterprise.

Checklist for third-party connectors

Before creating a connector to Agentspace Enterprise, go through the following checklist:

  • Connector-specific scopes and permissions must be configured for certain connectors. An administrator of the third-party application must review the required credentials to connect a data source and set up authentication and permissions. For information about the specific scopes and permissions, see the respective connector documentation.

  • Set up access control for your data source. For more information, see Identity and permissions.

  • Decide whether data should be federated or ingested (indexed).

  • Decide how often the data should be synced.

  • If you are using customer-managed encryption keys (CMEK), create multi-region and single-region keys. For more information, see About single-region keys for third-party connectors.

  • If your data contains personally identifiable information (PII) and you intend to use autocomplete for query suggestions, see Protect against PII leaks.

Supported third-party connectors

  • Microsoft Entra ID
  • Box
  • Confluence Cloud
  • Confluence Data Center
  • Dropbox
  • Dynamics 365
  • Jira Cloud
  • Jira Data Center
  • OneDrive
  • Microsoft Outlook
  • Salesforce
  • ServiceNow
  • SharePoint
  • Slack
  • Microsoft Teams
  • Trello

First-party connectors

First-party connectors retrieve data from Google data sources, such as BigQuery, Spanner, Google Drive, and many more.

Checklist for first-party connectors

Before sending data to Agentspace Enterprise, go through the following checklist:

Supported first-party connectors

Shows the first-party connectors.
First-party connectors