Use data connectors with LlamaIndex on Vertex AI for RAG

This page shows you how to use data connectors to access your data stored in Cloud Storage, Google Drive, Slack, or Jira and how to use that data with LlamaIndex on Vertex AI for RAG. The Import RagFiles API provides data connectors to these data sources.

Import files from Cloud Storage or Google Drive

To import files from Cloud Storage or Google Drive into your corpus, do the following:

  1. Create a corpus by following the instructions at Create a RAG corpus.
  2. Import your files from Cloud Storage or Google Drive by using the template.
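
As a rough illustration of step 2, the sketch below assembles a request body for a Cloud Storage import, mirroring the shape of the Slack and Jira requests on this page. The `gcs_source`, `uris`, and `rag_file_chunking_config` field names, the `build_gcs_import_body` helper, and the bucket name are assumptions made for illustration; check them against the Import RagFiles API reference before relying on them.

```python
import json


def build_gcs_import_body(uris, chunk_size=512, chunk_overlap=100):
    """Return a JSON-serializable import body for Cloud Storage files.

    The field names used here are assumed, not taken from this page.
    """
    # Reject anything that is not a Cloud Storage URI before building the body.
    bad = [uri for uri in uris if not uri.startswith("gs://")]
    if bad:
        raise ValueError(f"Not Cloud Storage URIs: {bad}")
    return {
        "import_rag_files_config": {
            "gcs_source": {"uris": list(uris)},
            "rag_file_chunking_config": {
                "chunk_size": chunk_size,
                "chunk_overlap": chunk_overlap,
            },
        }
    }


if __name__ == "__main__":
    # "my-bucket" is a hypothetical bucket name.
    print(json.dumps(build_gcs_import_body(["gs://my-bucket/docs/"]), indent=2))
```

The resulting dictionary could be serialized and sent as the `-d` payload of an import request like the curl samples later on this page.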

Import files from Slack

To import files from Slack into your corpus, do the following:

  1. Create a corpus, which is an index that structures and optimizes your data for searching. Follow the instructions at Create a RAG corpus.
  2. Get the CHANNEL_ID of the Slack channel that you want to import from.
  3. Create and set up an app to use with LlamaIndex on Vertex AI for RAG.
    1. From the Slack UI, in the Add features and functionality section, click Permissions.
    2. Add the following permissions:
      • channels:history
      • groups:history
      • im:history
      • mpim:history
    3. Click Install to Workspace to install the app into your Slack workspace.
  4. Click Copy to get your API token, which authenticates your identity and grants you access to an API.
  5. Add your API token to your Secret Manager.
  6. To view the stored secret, grant the Secret Manager Secret Accessor role to your project's LlamaIndex on Vertex AI for RAG service account.
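
The import requests refer to the stored token by its Secret Manager secret version, not by the token value itself. The sketch below assembles that resource name, assuming the usual `projects/.../secrets/.../versions/...` format; the `secret_version_name` helper and the example values are hypothetical.

```python
def secret_version_name(project_id: str, secret_id: str, version: str) -> str:
    """Build a Secret Manager secret version resource name."""
    parts = {"project_id": project_id, "secret_id": secret_id, "version": version}
    for label, value in parts.items():
        # Each component is a single path segment, so it must be non-empty
        # and must not contain a slash.
        if not value or "/" in value:
            raise ValueError(f"Invalid {label}: {value!r}")
    return f"projects/{project_id}/secrets/{secret_id}/versions/{version}"


if __name__ == "__main__":
    # Hypothetical project and secret names.
    print(secret_version_name("my-project", "slack-api-token", "1"))
```

The returned string is the kind of value that would go in place of SLACK_API_KEY_SECRET_VERSION in the samples that follow.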

The following curl and Python code samples demonstrate how to import files from your Slack resources.

curl

If you want to get messages from a specific channel, change the CHANNEL_ID.

API_KEY_SECRET_VERSION=SLACK_API_KEY_SECRET_VERSION
CHANNEL_ID=SLACK_CHANNEL_ID
PROJECT_ID=YOUR_PROJECT_ID
REGION=us-central1
# Regional Vertex AI endpoint (assumed; adjust if yours differs).
ENDPOINT=${REGION}-aiplatform.googleapis.com
RAG_CORPUS_ID=YOUR_RAG_CORPUS_ID

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${ENDPOINT}/v1beta1/projects/${PROJECT_ID}/locations/${REGION}/ragCorpora/${RAG_CORPUS_ID}/ragFiles:import \
-d '{
  "import_rag_files_config": {
    "slack_source": {
      "channels": [
        {
          "apiKeyConfig": {
            "apiKeySecretVersion": "'"${API_KEY_SECRET_VERSION}"'"
          },
          "channels": [
            {
              "channel_id": "'"${CHANNEL_ID}"'"
            }
          ]
        }
      ]
    }
  }
}'
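
The request URL combines the project, the region, and the corpus ID, and each belongs in its own path segment. The sketch below builds the URL from those parts; the `import_url` helper is hypothetical, and the `REGION-aiplatform.googleapis.com` hostname is an assumption about the regional endpoint.

```python
def import_url(project_id: str, region: str, rag_corpus_id: str,
               api_version: str = "v1beta1") -> str:
    """Build the ragFiles:import URL for one corpus."""
    # Assumed regional endpoint pattern; adjust if your endpoint differs.
    host = f"{region}-aiplatform.googleapis.com"
    return (
        f"https://{host}/{api_version}/projects/{project_id}"
        f"/locations/{region}/ragCorpora/{rag_corpus_id}/ragFiles:import"
    )


if __name__ == "__main__":
    # Hypothetical project and corpus identifiers.
    print(import_url("my-project", "us-central1", "my-corpus-1"))
```

Keeping the project and the region as separate arguments avoids placing one value in both the `projects/` and `locations/` segments.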

Python

If you want to get messages for a given range of time or from a specific channel, change any of the following fields:

  • start_time
  • end_time
  • CHANNEL1 or CHANNEL2

    # Slack example
    from google.protobuf import timestamp_pb2
    from vertexai.preview import rag

    # Both timestamps are set to the current time as placeholders;
    # replace them with the range you want to import.
    start_time = timestamp_pb2.Timestamp()
    start_time.GetCurrentTime()
    end_time = timestamp_pb2.Timestamp()
    end_time.GetCurrentTime()

    source = rag.SlackChannelsSource(
        channels=[
            # Arguments: channel ID, API key secret version,
            # then an optional start time and end time.
            rag.SlackChannel("CHANNEL1", "api_key1"),
            rag.SlackChannel("CHANNEL2", "api_key2", start_time, end_time),
        ],
    )

    response = rag.import_files(
        corpus_name="projects/my-project/locations/us-central1/ragCorpora/my-corpus-1",
        source=source,
        chunk_size=512,
        chunk_overlap=100,
    )
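
The sample above sets both timestamps to the current time, so in practice you would substitute a real window. The sketch below computes a trailing window in UTC with only the standard library; `trailing_window` and `to_rfc3339` are hypothetical helper names, not part of the SDK.

```python
from datetime import datetime, timedelta, timezone


def trailing_window(days, now=None):
    """Return (start, end) UTC datetimes covering the last `days` days."""
    if days <= 0:
        raise ValueError("days must be positive")
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=days)
    return start, end


def to_rfc3339(dt):
    """Format an aware datetime as an RFC 3339 UTC string."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    start, end = trailing_window(7)
    print(to_rfc3339(start), to_rfc3339(end))
```

Each datetime can then be loaded into a protobuf `Timestamp` with its `FromDatetime` method before being passed as the start or end time of a channel.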

Import files from Jira

To import files from Jira into your corpus, do the following:

  1. Create a corpus, which is an index that structures and optimizes your data for searching. Follow the instructions at Create a RAG corpus.
  2. To create an API token, sign in to the Atlassian site.
  3. Use {YOUR_ORG_ID}.atlassian.net as the SERVER_URI in the request.
  4. Use your Atlassian email as the EMAIL in the request.
  5. Provide projects or customQueries with your request. To learn more about custom queries, see Use advanced search with Jira Query Language (JQL).

    When you import projects, projects is expanded into the corresponding queries to get the entire project. For example, MyProject is expanded to project = MyProject.

  6. Click Copy to get your API token, which authenticates your identity and grants you access to an API.
  7. Add your API token to your Secret Manager.
  8. Grant the Secret Manager Secret Accessor role to your project's LlamaIndex on Vertex AI for RAG service account.
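
Step 5 notes that each entry in projects is expanded into a query that covers the whole project. The sketch below illustrates that expansion as this page describes it; `expand_jira_queries` is a hypothetical helper, and the service itself may quote or escape project names differently.

```python
def expand_jira_queries(projects=(), custom_queries=()):
    """Return the JQL strings implied by project names plus custom queries."""
    # Each project name becomes a query that matches the entire project.
    expanded = [f"project = {name}" for name in projects]
    # Custom queries are already JQL, so they pass through unchanged.
    return expanded + list(custom_queries)


if __name__ == "__main__":
    for query in expand_jira_queries(["MyProject"], ["status = Done"]):
        print(query)
```

Seen this way, projects is shorthand for the simplest custom query, and customQueries is for anything narrower.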

curl

EMAIL=JIRA_EMAIL
API_KEY_SECRET_VERSION=JIRA_API_KEY_SECRET_VERSION
SERVER_URI=JIRA_SERVER_URI
CUSTOM_QUERY=JIRA_CUSTOM_QUERY
JIRA_PROJECT_KEY=JIRA_PROJECT
PROJECT_ID=YOUR_PROJECT_ID
REGION="us-central1"
# Regional Vertex AI endpoint (assumed; adjust if yours differs).
ENDPOINT=${REGION}-aiplatform.googleapis.com
RAG_CORPUS_ID=YOUR_RAG_CORPUS_ID

curl -X POST \
-H "Authorization: Bearer $(gcloud auth print-access-token)" \
-H "Content-Type: application/json" \
https://${ENDPOINT}/v1beta1/projects/${PROJECT_ID}/locations/${REGION}/ragCorpora/${RAG_CORPUS_ID}/ragFiles:import \
-d '{
  "import_rag_files_config": {
    "jiraSource": {
      "jiraQueries": [{
        "projects": ["'"${JIRA_PROJECT_KEY}"'"],
        "customQueries": ["'"${CUSTOM_QUERY}"'"],
        "email": "'"${EMAIL}"'",
        "serverUri": "'"${SERVER_URI}"'",
        "apiKeyConfig": {
          "apiKeySecretVersion": "'"${API_KEY_SECRET_VERSION}"'"
        }
      }]
    }
  }
}'

Python

    # Jira example
    from vertexai.preview import rag

    jira_query = rag.JiraQuery(
        email="xxx@yyy.com",
        jira_projects=["project1", "project2"],
        custom_queries=["query1", "query2"],
        # Secret Manager secret version that stores the Jira API token.
        api_key="api_key",
        server_uri="server.atlassian.net",
    )
    source = rag.JiraSource(
        queries=[jira_query],
    )

    # Replace REGION with the location of your corpus, for example us-central1.
    response = rag.import_files(
        corpus_name="projects/my-project/locations/REGION/ragCorpora/my-corpus-1",
        source=source,
        chunk_size=512,
        chunk_overlap=100,
    )

What's next