RAG quickstart

This page shows you how to use the Vertex AI SDK to run Vertex AI RAG Engine tasks.

You can also follow along using the notebook Intro to Vertex AI RAG Engine.

Required roles

Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/aiplatform.user

gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE

Replace the following:

  • PROJECT_ID: Your project ID.
  • USER_IDENTIFIER: The identifier for your user account. For example, myemail@example.com.
  • ROLE: The IAM role that you grant to your user account.
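
If you grant more than one role, it can help to generate the commands instead of editing them by hand. The following is a minimal sketch that only builds and prints the `gcloud` command shown above for each role; it doesn't run anything. The helper name `build_grant_role_command` and the example project ID are hypothetical.

```python
import shlex


def build_grant_role_command(project_id, user_identifier, role):
    """Build the argument list for the add-iam-policy-binding command above."""
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=user:{user_identifier}",
        f"--role={role}",
    ]


# Roles listed in this section; extend the list if your setup needs more.
roles = ["roles/aiplatform.user"]

# Hypothetical example values; replace them with your own.
commands = [
    shlex.join(build_grant_role_command("my-project", "myemail@example.com", role))
    for role in roles
]

for command in commands:
    print(command)
```

Review each printed command, then run it in your shell.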

Prepare your Google Cloud console

To use Vertex AI RAG Engine, do the following:

  1. Install the Vertex AI SDK for Python.

  2. Run this command in the Google Cloud console to set up your project.

    gcloud config set project PROJECT_ID

  3. Run this command to authorize your login.

    gcloud auth application-default login
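
Before you run the sample, you might want to confirm that step 1 succeeded. The following is a small sketch that checks whether the `vertexai` module is importable in the current Python environment. The helper name `sdk_is_installed` is hypothetical, and the check says nothing about your project or credentials.

```python
import importlib.util


def sdk_is_installed(module_name="vertexai"):
    """Return True if the named top-level module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


if sdk_is_installed():
    print("The vertexai module is available.")
else:
    print("The vertexai module was not found. Install the Vertex AI SDK for Python first.")
```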

Run Vertex AI RAG Engine

Copy and paste this sample code into the Google Cloud console to run Vertex AI RAG Engine.

Python

To learn how to install the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

from vertexai import rag
from vertexai.generative_models import GenerativeModel, Tool
import vertexai

# Create a RAG Corpus, Import Files, and Generate a response

# TODO(developer): Update and un-comment below lines
# PROJECT_ID = "your-project-id"
# display_name = "test_corpus"
# paths = ["https://drive.google.com/file/d/123", "gs://my_bucket/my_files_dir"]  # Supports Google Cloud Storage and Google Drive Links

# Initialize Vertex AI API once per session
vertexai.init(project=PROJECT_ID, location="us-central1")

# Create RagCorpus
# Configure embedding model, for example "text-embedding-005".
embedding_model_config = rag.RagEmbeddingModelConfig(
    vertex_prediction_endpoint=rag.VertexPredictionEndpoint(
        publisher_model="publishers/google/models/text-embedding-005"
    )
)

rag_corpus = rag.create_corpus(
    display_name=display_name,
    backend_config=rag.RagVectorDbConfig(
        rag_embedding_model_config=embedding_model_config
    ),
)

# Import Files to the RagCorpus
rag.import_files(
    rag_corpus.name,
    paths,
    # Optional
    transformation_config=rag.TransformationConfig(
        chunking_config=rag.ChunkingConfig(
            chunk_size=512,
            chunk_overlap=100,
        ),
    ),
    max_embedding_requests_per_min=1000,  # Optional
)

# Direct context retrieval
rag_retrieval_config = rag.RagRetrievalConfig(
    top_k=3,  # Optional
    filter=rag.Filter(vector_distance_threshold=0.5),  # Optional
)
response = rag.retrieval_query(
    rag_resources=[
        rag.RagResource(
            rag_corpus=rag_corpus.name,
            # Optional: supply IDs from `rag.list_files()`.
            # rag_file_ids=["rag-file-1", "rag-file-2", ...],
        )
    ],
    text="What is RAG and why it is helpful?",
    rag_retrieval_config=rag_retrieval_config,
)
print(response)

# Enhance generation
# Create a RAG retrieval tool
rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(
                    rag_corpus=rag_corpus.name,  # Currently only 1 corpus is allowed.
                    # Optional: supply IDs from `rag.list_files()`.
                    # rag_file_ids=["rag-file-1", "rag-file-2", ...],
                )
            ],
            rag_retrieval_config=rag_retrieval_config,
        ),
    )
)

# Create a Gemini model instance
rag_model = GenerativeModel(
    model_name="gemini-2.0-flash-001", tools=[rag_retrieval_tool]
)

# Generate response
response = rag_model.generate_content("What is RAG and why it is helpful?")
print(response.text)
# Example response:
#   RAG stands for Retrieval-Augmented Generation.
#   It's a technique used in AI to enhance the quality of responses
# ...
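
The sample above imports files with `chunk_size=512` and `chunk_overlap=100`. To build intuition for how those two values interact, here is a simplified sketch of sliding-window chunking over a list of tokens. It is an illustration only: this page doesn't describe how RAG Engine tokenizes or splits documents, and the function `chunk_tokens` is hypothetical.

```python
def chunk_tokens(tokens, chunk_size, chunk_overlap):
    """Split tokens into windows of chunk_size that share chunk_overlap tokens."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")

    # Each new chunk starts this many tokens after the previous one.
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once a chunk reaches the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


# Ten placeholder tokens, small sizes so the overlap is easy to see.
words = [f"w{i}" for i in range(10)]
chunks = chunk_tokens(words, chunk_size=4, chunk_overlap=1)
for chunk in chunks:
    print(chunk)
```

Under this simple model, `chunk_size=512` with `chunk_overlap=100` would start each new chunk 412 tokens after the previous one. A larger overlap keeps more shared context between neighboring chunks at the cost of producing more chunks to embed.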

curl

  1. Create a RAG corpus.

      export LOCATION=LOCATION
      export PROJECT_ID=PROJECT_ID
      export CORPUS_DISPLAY_NAME=CORPUS_DISPLAY_NAME
    
      # CreateRagCorpus
      # Output: CreateRagCorpusOperationMetadata
      curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/ragCorpora \
      -d '{
            "display_name": "CORPUS_DISPLAY_NAME"
        }'
    

    For more information, see the Create a RAG corpus example.

  2. Import a RAG file.

      # ImportRagFiles
      # Import a single Cloud Storage file or all files in a Cloud Storage bucket.
      # Input: LOCATION, PROJECT_ID, RAG_CORPUS_ID, GCS_URIS
      export RAG_CORPUS_ID=RAG_CORPUS_ID
      export GCS_URIS=GCS_URIS
      export CHUNK_SIZE=CHUNK_SIZE
      export CHUNK_OVERLAP=CHUNK_OVERLAP
      export EMBEDDING_MODEL_QPM_RATE=EMBEDDING_MODEL_QPM_RATE
    
      # Output: ImportRagFilesOperationMetadataNumber
      # Use ListRagFiles or import_result_sink to get the correct rag_file_id.
      curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID/ragFiles:import \
      -d '{
        "import_rag_files_config": {
          "gcs_source": {
            "uris": "GCS_URIS"
          },
          "rag_file_chunking_config": {
            "chunk_size": CHUNK_SIZE,
            "chunk_overlap": CHUNK_OVERLAP
          },
          "max_embedding_requests_per_min": EMBEDDING_MODEL_QPM_RATE
        }
      }'
    

    For more information, see the Import RAG files example.

  3. Run a RAG retrieval query.

      export RAG_CORPUS_RESOURCE=RAG_CORPUS_RESOURCE
      export VECTOR_DISTANCE_THRESHOLD=VECTOR_DISTANCE_THRESHOLD
      export SIMILARITY_TOP_K=SIMILARITY_TOP_K
    
      # Save the following request body in a file named request.json.
      {
        "vertex_rag_store": {
          "rag_resources": {
            "rag_corpus": "RAG_CORPUS_RESOURCE"
          },
          "vector_distance_threshold": VECTOR_DISTANCE_THRESHOLD
        },
        "query": {
          "text": "TEXT",
          "similarity_top_k": SIMILARITY_TOP_K
        }
      }
    
      curl -X POST \
          -H "Authorization: Bearer $(gcloud auth print-access-token)" \
          -H "Content-Type: application/json; charset=utf-8" \
          -d @request.json \
          "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:retrieveContexts"
    

    For more information, see the RAG Engine API.

  4. Generate content.

    # Save the following request body in a file named request.json.
    {
      "contents": {
        "role": "USER",
        "parts": {
          "text": "INPUT_PROMPT"
        }
      },
      "tools": {
        "retrieval": {
          "disable_attribution": false,
          "vertex_rag_store": {
            "rag_resources": {
              "rag_corpus": "RAG_CORPUS_RESOURCE"
            },
            "similarity_top_k": SIMILARITY_TOP_K,
            "vector_distance_threshold": VECTOR_DISTANCE_THRESHOLD
          }
        }
      }
    }
    
    curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json; charset=utf-8" \
        -d @request.json \
        "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATION_METHOD"
    

    For more information, see the RAG Engine API.
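
Hand-edited JSON is easy to get wrong (a missing comma or an unquoted string is enough to make the request fail), so you might prefer to generate the `request.json` bodies for steps 3 and 4. The following is a sketch that mirrors the field names used in those steps. The helper names and the example values are hypothetical, and the corpus resource name format is an assumption inferred from the URL in step 2.

```python
import json


def build_retrieve_contexts_body(rag_corpus_resource, text,
                                 similarity_top_k, vector_distance_threshold):
    """Return the step 3 (retrieveContexts) request body as a dict."""
    return {
        "vertex_rag_store": {
            "rag_resources": {"rag_corpus": rag_corpus_resource},
            "vector_distance_threshold": vector_distance_threshold,
        },
        "query": {"text": text, "similarity_top_k": similarity_top_k},
    }


def build_generate_content_body(rag_corpus_resource, input_prompt,
                                similarity_top_k, vector_distance_threshold):
    """Return the step 4 (generate content) request body as a dict."""
    return {
        "contents": {"role": "USER", "parts": {"text": input_prompt}},
        "tools": {
            "retrieval": {
                "disable_attribution": False,
                "vertex_rag_store": {
                    "rag_resources": {"rag_corpus": rag_corpus_resource},
                    "similarity_top_k": similarity_top_k,
                    "vector_distance_threshold": vector_distance_threshold,
                },
            }
        },
    }


# Hypothetical example values; the resource name format is an assumption.
corpus = "projects/my-project/locations/us-central1/ragCorpora/1234567890"
question = "What is RAG and why it is helpful?"

retrieve_body = build_retrieve_contexts_body(corpus, question, 3, 0.5)
generate_body = build_generate_content_body(corpus, question, 3, 0.5)

# json.dumps handles quoting, commas, and booleans for you.
request_json = json.dumps(retrieve_body, indent=2)
print(request_json)
```

Save the printed text as `request.json` before running the matching `curl` command, and use `generate_body` in the same way for step 4.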

What's next