
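Vector Search quickstart

In this Vertex AI Vector Search quickstart, you learn how to create an index from a sample dataset taken from a fictitious ecommerce clothing site. To keep the quickstart short, the embeddings have already been created for you. The goal is to get you creating and deploying an index in under 30 minutes.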

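Before you begin

This tutorial requires a Google Cloud project that is linked to a billing account. To create a new project and set up billing, see "Set up a project and a development environment".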

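Choose a runtime environment

You can run this tutorial on either Colab or Vertex AI Workbench.

Colab: Open this tutorial in Colab.
Vertex AI Workbench: Open this tutorial in Vertex AI Workbench. If this is the first time you use Vertex AI Workbench in your Google Cloud project, go to the Vertex AI Workbench section of the Google Cloud console and click Enable to enable the Notebooks API.

This notebook is also available on GitHub.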

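Cost to complete this quickstart

Completing this tutorial costs roughly a few US dollars. For the pricing of the Google Cloud services used in this tutorial, see the pricing pages for Vertex AI and Cloud Storage.

You can also use the Pricing Calculator to generate a cost estimate based on your projected usage.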

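Setup

Before you get started with Vertex AI, set up the following:

Install the Vertex AI SDK for Python
Set environment variables
Authenticate (Colab only)
Set IAM permissions
Enable APIs

Install the Vertex AI SDK for Python

There are several ways to access the Vertex AI and Cloud Storage APIs, including the REST API and the Vertex AI SDK for Python. This tutorial uses the Vertex AI SDK for Python.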

!pip install --upgrade --user "google-cloud-aiplatform>=1.29.0" google-cloud-storage

To use the newly installed packages in this Jupyter runtime, restart the runtime, as shown in the following code snippet.

# Restart kernel after installs so that your environment can access the new packages
import IPython

app = IPython.Application.instance()
app.kernel.do_shutdown(True)

Environment variables

Set the environment variables. If prompted, replace your-project-id with your project ID and run the cell.

# get project ID
PROJECT_ID = ! gcloud config get-value project
PROJECT_ID = PROJECT_ID[0]
LOCATION = "us-central1"
if PROJECT_ID == "(unset)":
    print("Please set the project ID manually below")

# define project information
if PROJECT_ID == "(unset)":
    PROJECT_ID = "[your-project-id]"

# generate a unique id for this session
from datetime import datetime
UID = datetime.now().strftime("%m%d%H%M")

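Authentication (Colab only)

If you are running this notebook on Colab, run the following cell to authenticate. This step isn't required if you are using Vertex AI Workbench, because it is pre-authenticated.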

import sys

# if it's Colab runtime, authenticate the user with Google Cloud
if 'google.colab' in sys.modules:
    from google.colab import auth
    auth.authenticate_user()

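Set IAM permissions

You need to grant access permissions to the default service account to use the services.

In the Google Cloud console, go to the IAM page.
Find the principal for the default compute service account. Its address starts with your project number and ends in -compute@developer.gserviceaccount.com
Click the edit button and grant the default compute service account the following roles: Vertex AI User, Storage Admin, and Service Usage Admin.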

Enable APIs

Run the following command to enable the Compute Engine, Vertex AI, and Cloud Storage APIs for this Google Cloud project.

! gcloud services enable compute.googleapis.com aiplatform.googleapis.com storage.googleapis.com --project {PROJECT_ID}

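Prepare the sample data

This tutorial uses the TheLook dataset, which includes a products table with about 5,000 rows of synthetic product data for a fictitious ecommerce clothing site.

From this table, we prepared the product-embs.json file.

The file is in JSONL format. Each row has id for the product ID, name for the product name, and embedding for the 768-dimensional embedding of the product name, generated earlier with Vertex AI text embeddings.

The text embeddings represent the meaning of the clothing product names. In this tutorial, we use Vector Search to run a semantic search over the items. The sample code can also serve as the basis for a quick recommendation system that finds "other products similar to this one".

For more information about creating embeddings from data in a BigQuery table and storing them in a JSON file, see "Getting Started with Text Embeddings + Vertex AI Vector Search".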
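To make the record layout concrete, here is a minimal sketch that parses one JSONL line and checks its shape. The sample record is synthetic (a made-up ID and name, with a zero vector standing in for a real embedding), and parse_record is a hypothetical helper that is not part of the tutorial.

```python
import json

EXPECTED_DIMENSIONS = 768  # dimensionality stated for product-embs.json


def parse_record(line: str) -> tuple[str, str, list[float]]:
    """Parse one JSONL line into (id, name, embedding) and check its length."""
    record = json.loads(line)
    embedding = record["embedding"]
    if len(embedding) != EXPECTED_DIMENSIONS:
        raise ValueError(
            f"expected {EXPECTED_DIMENSIONS} dimensions, got {len(embedding)}"
        )
    return record["id"], record["name"], embedding


# Synthetic record for illustration only; real embeddings are not all zeros.
sample_line = json.dumps(
    {"id": "0000", "name": "example product", "embedding": [0.0] * 768}
)
product_id, product_name, product_embedding = parse_record(sample_line)
print(product_id, product_name, len(product_embedding))
```

Running the same check over every line of the real file before uploading it is one way to catch a truncated or mismatched embedding early.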
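As a rough illustration of what "similar" means for embeddings, the sketch below ranks a few toy vectors by cosine similarity against a query, comparing every item exhaustively. This is a brute-force stand-in for intuition only, not how Vector Search works internally; the 3-dimensional vectors and product names are invented for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query: list[float], items: dict[str, list[float]], k: int):
    """Return the k (score, name) pairs with the highest similarity to query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in items.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy data: invented names and 3-dimensional vectors, not real embeddings.
toy_embeddings = {
    "women's hiking short": [0.9, 0.1, 0.0],
    "women's running short": [0.8, 0.3, 0.1],
    "men's wool sweater": [0.0, 0.2, 0.9],
}
query = [1.0, 0.0, 0.0]
top = most_similar(query, toy_embeddings, k=2)
for score, name in top:
    print(f"{score:.2f} {name}")
```

An exhaustive scan like this grows linearly with the number of items, which is the cost an approximate index is meant to avoid.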

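Prepare the data on Cloud Storage

To build an index with Vertex AI, place the embedding file in a Cloud Storage bucket. The following code does two things:

Creates a Cloud Storage bucket.
Copies the sample file to the Cloud Storage bucket.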

BUCKET_URI = f"gs://{PROJECT_ID}-vs-quickstart-{UID}"

! gcloud storage buckets create $BUCKET_URI --location=$LOCATION --project=$PROJECT_ID
! gcloud storage cp "gs://github-repo/data/vs-quickstart/product-embs.json" $BUCKET_URI
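Because the bucket name is assembled from the project ID and a timestamp, it can be worth checking the result before calling gcloud. The sketch below applies a subset of the Cloud Storage naming rules for names without dots (3 to 63 characters; lowercase letters, digits, dashes, and underscores; starting and ending with a letter or digit). The helper is_valid_bucket_name and the sample project ID are hypothetical.

```python
import re
from datetime import datetime

# Subset of the bucket naming rules, for names that contain no dots.
_BUCKET_NAME_RE = re.compile(r"[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]")


def is_valid_bucket_name(name: str) -> bool:
    """Return True if name passes the basic length and character checks."""
    return _BUCKET_NAME_RE.fullmatch(name) is not None


project_id = "my-sample-project"  # hypothetical project ID
uid = datetime.now().strftime("%m%d%H%M")  # same format as the tutorial's UID
bucket_name = f"{project_id}-vs-quickstart-{uid}"
print(bucket_name, is_valid_bucket_name(bucket_name))
```

A domain-scoped project ID that contains a colon, for example, would fail this check, and the bucket name would need to be chosen by hand.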

To run queries with Vector Search, you also need to copy the embedding file to a local directory:

! gcloud storage cp "gs://github-repo/data/vs-quickstart/product-embs.json" . # for query tests

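Build and deploy a Vector Search index

Learn how to create an index, create an index endpoint, and then deploy the index to the endpoint.

Create an index

Next, load the embeddings into Vector Search. The APIs are available in the aiplatform package of the SDK.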

# init the aiplatform package
from google.cloud import aiplatform
aiplatform.init(project=PROJECT_ID, location=LOCATION)

Create a MatchingEngineIndex with the create_tree_ah_index function (Matching Engine is the previous name of Vector Search).

# create Index
my_index = aiplatform.MatchingEngineIndex.create_tree_ah_index(
    display_name = f"vs-quickstart-index-{UID}",
    contents_delta_uri = BUCKET_URI,
    dimensions = 768,
    approximate_neighbors_count = 100,
)

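The MatchingEngineIndex.create_tree_ah_index() method builds the index. This takes under 10 minutes for a small dataset and about 60 minutes or more for a larger one, depending on its size. You can check the index creation status in the Vector Search section of the Google Cloud console.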

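The parameters for creating the index:

contents_delta_uri: The URI of the Cloud Storage directory where the embedding JSON files are stored.
dimensions: The dimension size of each embedding. Here it is 768, because the embeddings come from the text embeddings API.
approximate_neighbors_count: How many similar items you want to retrieve in typical cases.

For more information about creating an index and the available parameters, see "Create and manage your index".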

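Create an index endpoint and deploy the index

To use the index, you need to create an index endpoint. It works as a server instance that accepts query requests for your index.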

## create `IndexEndpoint`
my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint.create(
    display_name = f"vs-quickstart-index-endpoint-{UID}",
    public_endpoint_enabled = True
)

With the index endpoint, deploy the index by specifying a unique deployed index ID.

DEPLOYED_INDEX_ID = f"vs_quickstart_deployed_{UID}"

# deploy the Index to the Index Endpoint
my_index_endpoint.deploy_index(
    index = my_index, deployed_index_id = DEPLOYED_INDEX_ID
)

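If this is the first time you deploy this index to an index endpoint, it can take around 30 minutes to automatically build and start the backend. To see the status of the index deployment, go to the Vertex AI section of the Google Cloud console, open the Deploy and Use section, and select Indexes.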

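Run a query with Vector Search

The following code finds the embedding for a specified product name and uses Vector Search to find similar product names.

Get an embedding to run a query

First, load the embedding JSON file to build dicts of product names and embeddings.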

import json

# build dicts for product names and embs
product_names = {}
product_embs = {}
with open('product-embs.json') as f:
    for line in f:
        product = json.loads(line)
        product_id = product['id']
        product_names[product_id] = product['name']
        product_embs[product_id] = product['embedding']

With the product_embs dictionary, you can specify a product ID to get its embedding.

# Get the embedding for ID 6523 "cloudveil women's excursion short"
# You can also try with other IDs such as 12711, 18090, 19536 and 11863
query_emb = product_embs['6523']

Run a query

Pass the embedding to the index endpoint's find_neighbors() method to find similar product names.

# run query
response = my_index_endpoint.find_neighbors(
    deployed_index_id = DEPLOYED_INDEX_ID,
    queries = [query_emb],
    num_neighbors = 10
)

# show the results
for neighbor in response[0]:
    print(f"{neighbor.distance:.2f} {product_names[neighbor.id]}")

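The find_neighbors() method takes only milliseconds to fetch similar items, even when the index holds billions of items, thanks to the ScaNN algorithm. Vector Search also supports autoscaling, which adjusts the number of nodes automatically based on the demands of your workload.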
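Because the query embedding here comes from a product that is itself in the index, that product may appear among its own neighbors. The sketch below shows one way to format results while skipping a given ID and tolerating IDs that are missing from the name lookup. Neighbor is a hypothetical stand-in that exposes only the id and distance attributes read by the loop above, and the sample values are invented.

```python
from dataclasses import dataclass


@dataclass
class Neighbor:
    """Hypothetical stand-in with the two attributes the tutorial's loop reads."""
    id: str
    distance: float


def format_neighbors(neighbors, names, exclude_id=None):
    """Return one 'distance name' line per neighbor, skipping exclude_id."""
    lines = []
    for neighbor in neighbors:
        if neighbor.id == exclude_id:
            continue
        name = names.get(neighbor.id, "<unknown id>")
        lines.append(f"{neighbor.distance:.2f} {name}")
    return lines


# Invented data shaped like the response: one result list per query.
names = {"1": "item one", "2": "item two", "3": "item three"}
fake_response = [[Neighbor("1", 1.0), Neighbor("2", 0.87), Neighbor("9", 0.5)]]
for line in format_neighbors(fake_response[0], names, exclude_id="1"):
    print(line)
```

With the real response, the same helper could be called as format_neighbors(response[0], product_names, exclude_id='6523').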

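Cleaning up

If you are using your own Cloud project rather than a temporary project on Qwiklabs, make sure to delete all of the indexes, index endpoints, and Cloud Storage buckets after you finish this tutorial. Otherwise, the remaining resources might incur unexpected costs.

If you used Workbench, you might also need to delete the notebooks from the console.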


# wait for a confirmation
input("Press Enter to delete Index Endpoint, Index and Cloud Storage bucket:")

# delete Index Endpoint
my_index_endpoint.undeploy_all()
my_index_endpoint.delete(force = True)

# delete Index
my_index.delete()

# delete Cloud Storage bucket
! gcloud storage rm {BUCKET_URI} --recursive
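If one deletion step fails partway through, the remaining resources keep incurring cost. The sketch below wraps the same three SDK calls used above (undeploy_all, delete(force=True), delete) so that a failure in one step does not prevent the later steps from being attempted. clean_up and FakeResource are hypothetical; the fake objects only record calls so the sketch can run without a Google Cloud project.

```python
def clean_up(index_endpoint, index):
    """Run the deletion steps in order and collect failures instead of stopping."""
    steps = [
        ("undeploy indexes", index_endpoint.undeploy_all),
        ("delete index endpoint", lambda: index_endpoint.delete(force=True)),
        ("delete index", index.delete),
    ]
    failures = []
    for label, step in steps:
        try:
            step()
        except Exception as exc:  # keep going so later resources are still attempted
            failures.append((label, exc))
    return failures


class FakeResource:
    """Hypothetical stand-in that records calls; pass the real objects instead."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def undeploy_all(self):
        self.log.append(f"{self.name}.undeploy_all")

    def delete(self, force=False):
        self.log.append(f"{self.name}.delete(force={force})")


calls = []
failures = clean_up(FakeResource("endpoint", calls), FakeResource("index", calls))
print(calls, failures)
```

In the notebook, the call would be clean_up(my_index_endpoint, my_index), followed by a check that the returned list is empty.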

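Utilities

Creating or deploying an index can take some time, and you might lose the connection to the Colab runtime in the meantime. If that happens, check the Vector Search section of the Google Cloud console and continue with the existing index instead of creating or deploying a new one.

Get an existing index

To get an index object that already exists, replace your-index-id in the following code with the index ID and run the cell. To find the index ID, go to the Vertex AI section of the Google Cloud console, open the Deploy and Use section, and select Indexes.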

my_index_id = "[your-index-id]"
my_index = aiplatform.MatchingEngineIndex(my_index_id)

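Get an existing index endpoint

To get an index endpoint object that already exists, replace your-index-endpoint-id in the following code with the index endpoint ID and run the cell. To find the index endpoint ID, go to the Vertex AI section of the Google Cloud console, open the Deploy and Use section, and select Index Endpoints.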

my_index_endpoint_id = "[your-index-endpoint-id]"
my_index_endpoint = aiplatform.MatchingEngineIndexEndpoint(my_index_endpoint_id)
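The constructors above are given a bare ID, which is resolved against the project and location passed to aiplatform.init(). Vertex AI resources also have fully qualified resource names, which can be less ambiguous when you work across projects or regions. The sketch below only builds those strings; the helper names and sample values are hypothetical, and the path layout is assumed to follow the standard Vertex AI resource name pattern.

```python
def index_resource_name(project: str, location: str, index_id: str) -> str:
    """Build the fully qualified resource name of an index."""
    return f"projects/{project}/locations/{location}/indexes/{index_id}"


def index_endpoint_resource_name(project: str, location: str, endpoint_id: str) -> str:
    """Build the fully qualified resource name of an index endpoint."""
    return f"projects/{project}/locations/{location}/indexEndpoints/{endpoint_id}"


# Hypothetical sample values for illustration.
index_name = index_resource_name("my-sample-project", "us-central1", "1234567890")
endpoint_name = index_endpoint_resource_name(
    "my-sample-project", "us-central1", "9876543210"
)
print(index_name)
print(endpoint_name)
```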
