This page lists the data sources supported by Vertex AI RAG Engine and shows you how to use data connectors to import data from those sources, such as Cloud Storage, Google Drive, Slack, Jira, or SharePoint. The Import RagFiles API provides the data connectors to these data sources.
Data sources supported for RAG
The following data sources are supported:
- Upload a local file: A single-file upload using upload_file (up to 25 MB), which is a synchronous call.
- Cloud Storage: Import files from Cloud Storage.
- Google Drive: Import a directory from Google Drive.
The service account must be granted the correct permissions to import files. Otherwise, no files are imported and no error message is displayed. For more information about file size limits, see Supported document types.
To authenticate and grant permissions, do the following:
- Go to the IAM page of your Google Cloud project.
- Select Include Google-provided role grant.
- Search for the Vertex AI RAG Data Service Agent service account.
- Click Share on the drive folder, and share with the service account.
- Grant Viewer permission to the service account on your Google Drive folder or file. The Google Drive resource ID can be found in the web URL.
- Slack: Import files from Slack by using a data connector.
- Jira: Import files from Jira by using a data connector.
For more information, see the RAG API reference.
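As noted in the Google Drive steps above, the resource ID is embedded in the folder or file web URL. The following sketch shows one way to pull it out; the helper name and the two URL shapes it handles are illustrative assumptions, not part of any API:

```python
import re

def drive_resource_id(url: str) -> str:
    """Extract the Drive resource ID from a folder or file web URL.

    Handles the two common URL shapes:
      https://drive.google.com/drive/folders/<ID>
      https://drive.google.com/file/d/<ID>/view
    """
    match = re.search(r"/(?:folders|d)/([\w-]+)", url)
    if not match:
        raise ValueError(f"No Drive resource ID found in: {url}")
    return match.group(1)
```

For example, a folder URL ending in `/folders/1AbC_dEf-123` yields the ID `1AbC_dEf-123`.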
Import files from Cloud Storage or Google Drive
To import files from Cloud Storage or Google Drive into your corpus, do the following:
- Create a corpus by following the instructions at Create a RAG corpus.
- Import your files from Cloud Storage or Google Drive by using the template.
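Both sources are imported through the same ragFiles:import REST endpoint that the curl samples on this page call. As a minimal sketch of how that URL is assembled (the helper name and example values are hypothetical; the endpoint host is typically LOCATION-aiplatform.googleapis.com):

```python
def rag_import_url(endpoint: str, project_id: str, location: str, corpus_id: str) -> str:
    """Build the v1beta1 ragFiles:import URL used by the curl samples."""
    return (
        f"https://{endpoint}/v1beta1/projects/{project_id}"
        f"/locations/{location}/ragCorpora/{corpus_id}/ragFiles:import"
    )

url = rag_import_url(
    "us-central1-aiplatform.googleapis.com", "my-project", "us-central1", "1234"
)
```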
Import files from Slack
To import files from Slack into your corpus, do the following:
- Create a corpus, which is an index that structures and optimizes your data for searching. Follow the instructions at Create a RAG corpus.
- Get the CHANNEL_ID of the Slack channel that you want to import from.
- Create and set up an app to use with Vertex AI RAG Engine.
- From the Slack UI, in the Add features and functionality section, click Permissions.
- Add the following permissions:
  - channels:history
  - groups:history
  - im:history
  - mpim:history
- Click Install to Workspace to install the app into your Slack workspace.
- Click Copy to get your API token, which authenticates your identity and grants you access to an API.
- Add your API token to your Secret Manager.
- To view the stored secret, grant the Secret Manager Secret Accessor role to your project's Vertex AI RAG Engine service account.
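The apiKeySecretVersion field in the import request expects the full Secret Manager resource name of the stored token, including a version. A minimal sketch of building that name (the helper is illustrative; the resource-name format is Secret Manager's standard projects/*/secrets/*/versions/* shape):

```python
def secret_version_name(project_id: str, secret_id: str, version: str = "latest") -> str:
    """Build the Secret Manager resource name expected by apiKeySecretVersion."""
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"

# For example, a secret named "slack-token" in project "my-project":
name = secret_version_name("my-project", "slack-token")
```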
The following curl and Python code samples demonstrate how to import files from your Slack resources.
curl
If you want to get messages from a specific channel, change the CHANNEL_ID.
API_KEY_SECRET_VERSION=SLACK_API_KEY_SECRET_VERSION
CHANNEL_ID=SLACK_CHANNEL_ID
PROJECT_ID=GCP_PROJECT_ID
LOCATION=us-central1
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${ENDPOINT}/v1beta1/projects/${PROJECT_ID}/locations/${LOCATION}/ragCorpora/${RAG_CORPUS_ID}/ragFiles:import \
-d '{
  "import_rag_files_config": {
    "slack_source": {
      "channels": [
        {
          "apiKeyConfig": {
            "apiKeySecretVersion": "'"${API_KEY_SECRET_VERSION}"'"
          },
          "channels": [
            {
              "channel_id": "'"${CHANNEL_ID}"'"
            }
          ]
        }
      ]
    }
  }
}'
Python
If you want to get messages for a given range of time or from a specific channel, change any of the following fields:
- start_time and end_time
- CHANNEL1 or CHANNEL2
from google.protobuf import timestamp_pb2
from vertexai.preview import rag

# Slack example
start_time = timestamp_pb2.Timestamp()
start_time.GetCurrentTime()
end_time = timestamp_pb2.Timestamp()
end_time.GetCurrentTime()
source = rag.SlackChannelsSource(
    channels=[
        rag.SlackChannel("CHANNEL1", "api_key1"),
        rag.SlackChannel("CHANNEL2", "api_key2", start_time, end_time),
    ],
)
response = rag.import_files(
corpus_name="projects/my-project/locations/us-central1/ragCorpora/my-corpus-1",
source=source,
chunk_size=512,
chunk_overlap=100,
)
Import files from Jira
To import files from Jira into your corpus, do the following:
- Create a corpus, which is an index that structures and optimizes your data for searching. Follow the instructions at Create a RAG corpus.
- To create an API token, sign in to the Atlassian site.
- Use {YOUR_ORG_ID}.atlassian.net as the SERVER_URI in the request.
- Use your Atlassian email as the EMAIL in the request.
- Provide projects or customQueries with your request. To learn more about custom queries, see Use advanced search with Jira Query Language (JQL). When you import projects, each project name is expanded into a corresponding query that retrieves the entire project. For example, MyProject is expanded to project = MyProject.
. - Click Copy to get your API token, which authenticates your identity and grants you access to an API.
- Add your API token to your Secret Manager.
- Grant the Secret Manager Secret Accessor role to your project's Vertex AI RAG Engine service account.
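The project-to-query expansion described in the steps above can be sketched as follows; the helper is illustrative only, not part of the SDK or the API:

```python
def jira_queries(projects, custom_queries=()):
    """Expand Jira project names into JQL, mirroring the documented behavior:
    a project named MyProject becomes the query `project = MyProject`.
    Custom JQL queries are passed through unchanged."""
    return [f"project = {p}" for p in projects] + list(custom_queries)

queries = jira_queries(["MyProject"], ["status = Done"])
```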
curl
EMAIL=JIRA_EMAIL
API_KEY_SECRET_VERSION=JIRA_API_KEY_SECRET_VERSION
SERVER_URI=JIRA_SERVER_URI
CUSTOM_QUERY=JIRA_CUSTOM_QUERY
JIRA_PROJECT=JIRA_PROJECT
REGION="us-central1"
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${ENDPOINT}/v1beta1/projects/${PROJECT_ID}/locations/${REGION}/ragCorpora/${RAG_CORPUS_ID}/ragFiles:import \
-d '{
  "import_rag_files_config": {
    "jiraSource": {
      "jiraQueries": [{
        "projects": ["'"${JIRA_PROJECT}"'"],
        "customQueries": ["'"${CUSTOM_QUERY}"'"],
        "email": "'"${EMAIL}"'",
        "serverUri": "'"${SERVER_URI}"'",
        "apiKeyConfig": {
          "apiKeySecretVersion": "'"${API_KEY_SECRET_VERSION}"'"
        }
      }]
    }
  }
}'
Python
from vertexai.preview import rag

# Jira Example
jira_query = rag.JiraQuery(
    email="xxx@yyy.com",
    jira_projects=["project1", "project2"],
    custom_queries=["query1", "query2"],
    api_key="api_key",
    server_uri="server.atlassian.net",
)
source = rag.JiraSource(
queries=[jira_query],
)
response = rag.import_files(
corpus_name="projects/my-project/locations/REGION/ragCorpora/my-corpus-1",
source=source,
chunk_size=512,
chunk_overlap=100,
)
Import files from SharePoint
To import files from your SharePoint site into your corpus, do the following:
- Create a corpus, which is an index that structures and optimizes your data for searching. Follow the instructions at Create a RAG corpus.
- Create an Azure app to access your SharePoint site.
  - To create a registration, go to App Registrations.
    - Provide a name for the application.
    - Choose the option Accounts in this organizational directory only.
    - Verify that the redirect URIs are empty.
  - In the Overview section, use your Application (client) ID as the CLIENT_ID, and use your Directory (tenant) ID as the TENANT_ID.
  - In the Manage section, update the API permissions by doing the following:
    - Add the SharePoint Sites.Read.All permission.
    - Add the Microsoft Graph Files.Read.All and Browser SiteLists.Read.All permissions.
    - Grant admin consent for these permission changes to take effect.
  - In the Manage section, do the following:
    - Update Certificates and Secrets with a new client secret.
    - Use the API_KEY_SECRET_VERSION to add the secret value to the Secret Manager.
- Grant the Secret Manager Secret Accessor role to your project's Vertex AI RAG Engine service account.
- Use {YOUR_ORG_ID}.sharepoint.com as the SHAREPOINT_SITE_NAME.
- A drive name or drive ID in the SharePoint site must be specified in the request.
- Optional: A folder path or folder ID on the drive can be specified. If the folder path or folder ID isn't specified, all of the folders and files on the drive are imported.
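The drive and folder rules above can be sketched as a small validator; the helper is illustrative only and simply returns keyword arguments in the style of the Python sample later on this page:

```python
def sharepoint_scope(drive_name=None, drive_id=None, folder_path=None, folder_id=None):
    """Illustrative sketch: select the drive/folder fields for a SharePoint source.

    Exactly one of drive_name or drive_id is required. folder_path and
    folder_id are optional; omitting both imports the whole drive.
    """
    if (drive_name is None) == (drive_id is None):
        raise ValueError("Specify exactly one of drive_name or drive_id")
    if folder_path is not None and folder_id is not None:
        raise ValueError("Specify at most one of folder_path or folder_id")
    scope = {"drive_name": drive_name} if drive_name is not None else {"drive_id": drive_id}
    if folder_path is not None:
        scope["folder_path"] = folder_path
    if folder_id is not None:
        scope["folder_id"] = folder_id
    return scope
```

For example, passing only a drive name imports every folder and file on that drive, while adding a folder path narrows the import to that folder.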
curl
CLIENT_ID=SHAREPOINT_CLIENT_ID
API_KEY_SECRET_VERSION=SHAREPOINT_API_KEY_SECRET_VERSION
TENANT_ID=SHAREPOINT_TENANT_ID
SITE_NAME=SHAREPOINT_SITE_NAME
FOLDER_PATH=SHAREPOINT_FOLDER_PATH
DRIVE_NAME=SHAREPOINT_DRIVE_NAME
curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${ENDPOINT}/v1beta1/projects/${PROJECT_ID}/locations/${REGION}/ragCorpora/${RAG_CORPUS_ID}/ragFiles:import \
-d '{
  "import_rag_files_config": {
    "sharePointSources": {
      "sharePointSource": [{
        "clientId": "'"${CLIENT_ID}"'",
        "apiKeyConfig": {
          "apiKeySecretVersion": "'"${API_KEY_SECRET_VERSION}"'"
        },
        "tenantId": "'"${TENANT_ID}"'",
        "sharepointSiteName": "'"${SITE_NAME}"'",
        "sharepointFolderPath": "'"${FOLDER_PATH}"'",
        "driveName": "'"${DRIVE_NAME}"'"
      }]
    }
  }
}'
Python
from vertexai.preview import rag
from vertexai.preview.rag.utils import resources
CLIENT_ID="SHAREPOINT_CLIENT_ID"
API_KEY_SECRET_VERSION="SHAREPOINT_API_KEY_SECRET_VERSION"
TENANT_ID="SHAREPOINT_TENANT_ID"
SITE_NAME="SHAREPOINT_SITE_NAME"
FOLDER_PATH="SHAREPOINT_FOLDER_PATH"
DRIVE_NAME="SHAREPOINT_DRIVE_NAME"
# SharePoint Example.
source = resources.SharePointSources(
share_point_sources=[
resources.SharePointSource(
client_id=CLIENT_ID,
client_secret=API_KEY_SECRET_VERSION,
tenant_id=TENANT_ID,
sharepoint_site_name=SITE_NAME,
folder_path=FOLDER_PATH,
drive_name=DRIVE_NAME,
)
]
)
response = rag.import_files(
corpus_name="projects/my-project/locations/REGION/ragCorpora/my-corpus-1",
source=source,
chunk_size=512,
chunk_overlap=100,
)