Create a custom recommendations data store

To create a data store and ingest data for custom recommendations, go to the section for the source that you plan to use:

BigQuery

You can create a data store from BigQuery tables in two ways:

  • One-time ingestion: You import data from a BigQuery table into a data store. The data in the data store doesn't change unless you manually refresh the data.

  • Periodic ingestion: You import data from one or more BigQuery tables, and you set a sync frequency that determines how often the data stores are updated with the most recent data from the BigQuery dataset.

The following table compares the two ways that you can import BigQuery data into Vertex AI Search data stores.

| One-time ingestion | Periodic ingestion |
| --- | --- |
| Generally available (GA). | Public preview. |
| You must refresh the data manually. | Data updates automatically every one, three, or five days. Data can't be refreshed manually. |
| Vertex AI Search creates a single data store from one table in BigQuery. | Vertex AI Search creates a data connector for the BigQuery dataset and a data store (called an entity data store) for each table that is specified. For each data connector, the tables must have the same data type (for example, structured data) and be in the same BigQuery dataset. |
| Data from multiple tables can be combined in one data store by first ingesting data from one table and then more data from another source or BigQuery table. | Because manual data import isn't supported, the data in an entity data store can come from only one BigQuery table. |
| Data source access control is supported. | Data source access control isn't supported. Imported data can contain access controls, but these controls aren't respected. |
| You can create a data store by using the Google Cloud console or the API. | You must use the console to create data connectors and their entity data stores. |
| CMEK-compliant. | CMEK-compliant. |

Import once from BigQuery

To ingest data from a BigQuery table, use the following steps to create a data store and ingest data by using either the Google Cloud console or the API.

Before importing your data, review Prepare data for ingesting.

Console

To use the Google Cloud console to ingest data from BigQuery, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. Go to the Data stores page.

  3. Click Create data store.

  4. On the Source page, select BigQuery.

  5. Select the kind of data that you are importing.

  6. In the Synchronization frequency section, select One time.

  7. In the BigQuery path field, click Browse, select a table that you have prepared for ingesting, and then click Select. Alternatively, enter the table location directly in the BigQuery path field.

  8. Click Continue.

  9. If you're doing a one-time import of structured data:

    1. Map fields to key properties.

    2. If there are important fields missing from the schema, use Add new field to add them.

      For more information, see About auto-detect and edit.

    3. Click Continue.

  10. Choose a region for your data store.

  11. Enter a name for your data store.

  12. Click Create.

  13. To check the status of your ingestion, go to the Data stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes to several hours.

REST

To use the command line to create a data store and import data from BigQuery, follow these steps.

  1. Create a data store.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
    -d '{
      "displayName": "DATA_STORE_DISPLAY_NAME",
      "industryVertical": "GENERIC",
      "solutionTypes": ["SOLUTION_TYPE_RECOMMENDATION"]
    }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
    • DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.
  2. Import data from BigQuery.

    If you defined a schema, make sure the data conforms to that schema.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
    -d '{
      "bigquerySource": {
        "projectId": "PROJECT_ID",
        "datasetId":"DATASET_ID",
        "tableId": "TABLE_ID",
        "dataSchema": "DATA_SCHEMA",
        "aclEnabled": "BOOLEAN"
      },
      "reconciliationMode": "RECONCILIATION_MODE",
      "autoGenerateIds": "AUTO_GENERATE_IDS",
      "idField": "ID_FIELD",
      "errorConfig": {
        "gcsPrefix": "ERROR_DIRECTORY"
      }
    }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store.
    • DATASET_ID: the ID of the BigQuery dataset.
    • TABLE_ID: the ID of the BigQuery table.
      • If the BigQuery table isn't under PROJECT_ID, you need to give the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com the BigQuery Data Viewer permission for the BigQuery table. For example, if you are importing a BigQuery table from source project "123" to destination project "456", give service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permission for the BigQuery table in project "123".
    • DATA_SCHEMA: optional. Values are `document` and `custom`. The default is `document`.
      • `document`: the BigQuery table that you use must conform to the default BigQuery schema provided in Prepare data for ingesting. You can define the ID of each document yourself, while wrapping all of the data in the jsonData string.
      • `custom`: any BigQuery table schema is accepted, and Vertex AI Search automatically generates an ID for each document that is imported.
    • ERROR_DIRECTORY: optional. A Cloud Storage directory for error information about the import, for example gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty to let Vertex AI Search automatically create a temporary directory.
    • RECONCILIATION_MODE: optional. Values are `FULL` and `INCREMENTAL`. The default is `INCREMENTAL`. Specifying `INCREMENTAL` causes an incremental refresh of data from BigQuery to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying `FULL` causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in BigQuery are removed from it. The `FULL` mode is helpful if you want to automatically delete documents that you no longer need.
    • AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to `true`, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to `FULL` to maintain consistent document IDs (see the sketch after this list).

      Specify autoGenerateIds only when bigquerySource.dataSchema is set to `custom`. Otherwise, an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds, or if you set it to `false`, you must specify idField. Otherwise, the documents fail to import.

    • ID_FIELD: optional. Specifies which fields are the document IDs. For BigQuery source files, idField indicates the name of the column in the BigQuery table that contains the document IDs.

      Specify idField only when both of the following are true: (1) bigquerySource.dataSchema is set to `custom`, and (2) auto_generate_ids is set to `false` or is unspecified. Otherwise, an INVALID_ARGUMENT error is returned.

      The values in the BigQuery column must be of string type, must be between 1 and 63 characters, and must conform to RFC-1034. Otherwise, the documents fail to import.
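
As an illustration of that last point, the following minimal Python sketch combines a `custom` schema, auto-generated IDs, and `FULL` reconciliation by using the same google-cloud-discoveryengine client library that appears in the Python samples later in this section. This is a sketch, not a complete sample; the project, data store, dataset, and table names are placeholders that you would replace with your own.

from google.cloud import discoveryengine

# Placeholder values; replace with your own.
project_id = "my-project"
data_store_id = "my-data-store"
bigquery_dataset = "my_dataset"
bigquery_table = "my_table"

client = discoveryengine.DocumentServiceClient()

# The full resource name of the data store branch.
parent = client.branch_path(
    project=project_id,
    location="global",
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    bigquery_source=discoveryengine.BigQuerySource(
        project_id=project_id,
        dataset_id=bigquery_dataset,
        table_id=bigquery_table,
        # autoGenerateIds is only accepted with the custom schema.
        data_schema="custom",
    ),
    # Generated IDs are hashed from the payload and can change between
    # imports, so FULL reconciliation keeps the store free of stale copies.
    auto_generate_ids=True,
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.FULL,
)

operation = client.import_documents(request=request)
print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()
print(response)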

C#

For more information, see the AI Applications C# API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataStoreServiceClientSnippets
{
    /// <summary>Snippet for CreateDataStore</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataStoreRequestObject()
    {
        // Create client
        DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.Create();
        // Initialize request argument(s)
        CreateDataStoreRequest request = new CreateDataStoreRequest
        {
            ParentAsCollectionName = CollectionName.FromProjectLocationCollection("[PROJECT]", "[LOCATION]", "[COLLECTION]"),
            DataStore = new DataStore(),
            DataStoreId = "",
            CreateAdvancedSiteSearch = false,
            CmekConfigNameAsCmekConfigName = CmekConfigName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            SkipDefaultSchemaCreation = false,
        };
        // Make the request
        Operation<DataStore, CreateDataStoreMetadata> response = dataStoreServiceClient.CreateDataStore(request);

        // Poll until the returned long-running operation is complete
        Operation<DataStore, CreateDataStoreMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataStore result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataStore, CreateDataStoreMetadata> retrievedResponse = dataStoreServiceClient.PollOnceCreateDataStore(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataStore retrievedResult = retrievedResponse.Result;
        }
    }
}

Import documents

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDocumentServiceClientSnippets
{
    /// <summary>Snippet for ImportDocuments</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ImportDocumentsRequestObject()
    {
        // Create client
        DocumentServiceClient documentServiceClient = DocumentServiceClient.Create();
        // Initialize request argument(s)
        ImportDocumentsRequest request = new ImportDocumentsRequest
        {
            ParentAsBranchName = BranchName.FromProjectLocationDataStoreBranch("[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]"),
            InlineSource = new ImportDocumentsRequest.Types.InlineSource(),
            ErrorConfig = new ImportErrorConfig(),
            ReconciliationMode = ImportDocumentsRequest.Types.ReconciliationMode.Unspecified,
            UpdateMask = new FieldMask(),
            AutoGenerateIds = false,
            IdField = "",
            ForceRefreshContent = false,
        };
        // Make the request
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> response = documentServiceClient.ImportDocuments(request);

        // Poll until the returned long-running operation is complete
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        ImportDocumentsResponse result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> retrievedResponse = documentServiceClient.PollOnceImportDocuments(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            ImportDocumentsResponse retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

For more information, see the AI Applications Go API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDataStoreClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.CreateDataStoreRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#CreateDataStoreRequest.
	}
	op, err := c.CreateDataStore(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Import documents


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDocumentClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.ImportDocumentsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#ImportDocumentsRequest.
	}
	op, err := c.ImportDocuments(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

For more information, see the AI Applications Java API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

import com.google.cloud.discoveryengine.v1.CollectionName;
import com.google.cloud.discoveryengine.v1.CreateDataStoreRequest;
import com.google.cloud.discoveryengine.v1.DataStore;
import com.google.cloud.discoveryengine.v1.DataStoreServiceClient;

public class SyncCreateDataStore {

  public static void main(String[] args) throws Exception {
    syncCreateDataStore();
  }

  public static void syncCreateDataStore() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.create()) {
      CreateDataStoreRequest request =
          CreateDataStoreRequest.newBuilder()
              .setParent(CollectionName.of("[PROJECT]", "[LOCATION]", "[COLLECTION]").toString())
              .setDataStore(DataStore.newBuilder().build())
              .setDataStoreId("dataStoreId929489618")
              .setCreateAdvancedSiteSearch(true)
              .setSkipDefaultSchemaCreation(true)
              .build();
      DataStore response = dataStoreServiceClient.createDataStoreAsync(request).get();
    }
  }
}

Import documents

import com.google.cloud.discoveryengine.v1.BranchName;
import com.google.cloud.discoveryengine.v1.DocumentServiceClient;
import com.google.cloud.discoveryengine.v1.ImportDocumentsRequest;
import com.google.cloud.discoveryengine.v1.ImportDocumentsResponse;
import com.google.cloud.discoveryengine.v1.ImportErrorConfig;
import com.google.protobuf.FieldMask;

public class SyncImportDocuments {

  public static void main(String[] args) throws Exception {
    syncImportDocuments();
  }

  public static void syncImportDocuments() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DocumentServiceClient documentServiceClient = DocumentServiceClient.create()) {
      ImportDocumentsRequest request =
          ImportDocumentsRequest.newBuilder()
              .setParent(
                  BranchName.ofProjectLocationDataStoreBranchName(
                          "[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]")
                      .toString())
              .setErrorConfig(ImportErrorConfig.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              .setAutoGenerateIds(true)
              .setIdField("idField1629396127")
              .setForceRefreshContent(true)
              .build();
      ImportDocumentsResponse response = documentServiceClient.importDocumentsAsync(request).get();
    }
  }
}

Node.js

For more information, see the AI Applications Node.js API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Resource name of the CmekConfig to use for protecting this DataStore.
 */
// const cmekConfigName = 'abc123'
/**
 *  DataStore without CMEK protections. If a default CmekConfig is set for
 *  the project, setting this field will override the default CmekConfig as
 *  well.
 */
// const disableCmek = true
/**
 *  Required. The parent resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}`.
 */
// const parent = 'abc123'
/**
 *  Required. The DataStore google.cloud.discoveryengine.v1.DataStore  to
 *  create.
 */
// const dataStore = {}
/**
 *  Required. The ID to use for the
 *  DataStore google.cloud.discoveryengine.v1.DataStore, which will become
 *  the final component of the
 *  DataStore google.cloud.discoveryengine.v1.DataStore's resource name.
 *  This field must conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  standard with a length limit of 63 characters. Otherwise, an
 *  INVALID_ARGUMENT error is returned.
 */
// const dataStoreId = 'abc123'
/**
 *  A boolean flag indicating whether user want to directly create an advanced
 *  data store for site search.
 *  If the data store is not configured as site
 *  search (GENERIC vertical and PUBLIC_WEBSITE content_config), this flag will
 *  be ignored.
 */
// const createAdvancedSiteSearch = true
/**
 *  A boolean flag indicating whether to skip the default schema creation for
 *  the data store. Only enable this flag if you are certain that the default
 *  schema is incompatible with your use case.
 *  If set to true, you must manually create a schema for the data store before
 *  any documents can be ingested.
 *  This flag cannot be specified if `data_store.starting_schema` is specified.
 */
// const skipDefaultSchemaCreation = true

// Imports the Discoveryengine library
const {DataStoreServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DataStoreServiceClient();

async function callCreateDataStore() {
  // Construct request
  const request = {
    parent,
    dataStore,
    dataStoreId,
  };

  // Run request
  const [operation] = await discoveryengineClient.createDataStore(request);
  const [response] = await operation.promise();
  console.log(response);
}

callCreateDataStore();

Import documents

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  The Inline source for the input content for documents.
 */
// const inlineSource = {}
/**
 *  Cloud Storage location for the input content.
 */
// const gcsSource = {}
/**
 *  BigQuery input source.
 */
// const bigquerySource = {}
/**
 *  FhirStore input source.
 */
// const fhirStoreSource = {}
/**
 *  Spanner input source.
 */
// const spannerSource = {}
/**
 *  Cloud SQL input source.
 */
// const cloudSqlSource = {}
/**
 *  Firestore input source.
 */
// const firestoreSource = {}
/**
 *  AlloyDB input source.
 */
// const alloyDbSource = {}
/**
 *  Cloud Bigtable input source.
 */
// const bigtableSource = {}
/**
 *  Required. The parent branch resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`.
 *  Requires create/update permission.
 */
// const parent = 'abc123'
/**
 *  The desired location of errors incurred during the Import.
 */
// const errorConfig = {}
/**
 *  The mode of reconciliation between existing documents and the documents to
 *  be imported. Defaults to
 *  ReconciliationMode.INCREMENTAL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL.
 */
// const reconciliationMode = {}
/**
 *  Indicates which fields in the provided imported documents to update. If
 *  not set, the default is to update all fields.
 */
// const updateMask = {}
/**
 *  Whether to automatically generate IDs for the documents if absent.
 *  If set to `true`,
 *  Document.id google.cloud.discoveryengine.v1.Document.id s are
 *  automatically generated based on the hash of the payload, where IDs may not
 *  be consistent during multiple imports. In which case
 *  ReconciliationMode.FULL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.FULL 
 *  is highly recommended to avoid duplicate contents. If unset or set to
 *  `false`, Document.id google.cloud.discoveryengine.v1.Document.id s have
 *  to be specified using
 *  id_field google.cloud.discoveryengine.v1.ImportDocumentsRequest.id_field,
 *  otherwise, documents without IDs fail to be imported.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const autoGenerateIds = true
/**
 *  The field indicates the ID field or column to be used as unique IDs of
 *  the documents.
 *  For GcsSource google.cloud.discoveryengine.v1.GcsSource  it is the key of
 *  the JSON field. For instance, `my_id` for JSON `{"my_id": "some_uuid"}`.
 *  For others, it may be the column name of the table where the unique ids are
 *  stored.
 *  The values of the JSON field or the table column are used as the
 *  Document.id google.cloud.discoveryengine.v1.Document.id s. The JSON field
 *  or the table column must be of string type, and the values must be set as
 *  valid strings conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  with 1-63 characters. Otherwise, documents without valid IDs fail to be
 *  imported.
 *  Only set this field when
 *  auto_generate_ids google.cloud.discoveryengine.v1.ImportDocumentsRequest.auto_generate_ids 
 *  is unset or set as `false`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  If it is unset, a default value `_id` is used when importing from the
 *  allowed data sources.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const idField = 'abc123'
/**
 *  Optional. Whether to force refresh the unstructured content of the
 *  documents.
 *  If set to `true`, the content part of the documents will be refreshed
 *  regardless of the update status of the referencing content.
 */
// const forceRefreshContent = true

// Imports the Discoveryengine library
const {DocumentServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DocumentServiceClient();

async function callImportDocuments() {
  // Construct request
  const request = {
    parent,
  };

  // Run request
  const [operation] = await discoveryengineClient.importDocuments(request);
  const [response] = await operation.promise();
  console.log(response);
}

callImportDocuments();

Python

For more information, see the AI Applications Python API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name

Import documents


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"
# bigquery_dataset = "YOUR_BIGQUERY_DATASET"
# bigquery_table = "YOUR_BIGQUERY_TABLE"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    bigquery_source=discoveryengine.BigQuerySource(
        project_id=project_id,
        dataset_id=bigquery_dataset,
        table_id=bigquery_table,
        data_schema="custom",
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)

Ruby

For more information, see the AI Applications Ruby API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

require "google/cloud/discovery_engine/v1"

##
# Snippet for the create_data_store call in the DataStoreService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client#create_data_store.
#
def create_data_store
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::CreateDataStoreRequest.new

  # Call the create_data_store method.
  result = client.create_data_store request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

Import documents

require "google/cloud/discovery_engine/v1"

##
# Snippet for the import_documents call in the DocumentService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DocumentService::Client#import_documents.
#
def import_documents
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DocumentService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::ImportDocumentsRequest.new

  # Call the import_documents method.
  result = client.import_documents request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

Connect to BigQuery with periodic syncing

Before importing your data, review Prepare data for ingesting.

The following procedure describes how to create a data connector that associates a BigQuery dataset with Vertex AI Search, and how to specify a table in the dataset for each data store that you want to create. Data stores that are children of data connectors are called entity data stores.

Data from the dataset is synced periodically to the entity data stores. You can specify synchronization daily, every three days, or every five days.

Console

To use the Google Cloud console to create a connector that periodically syncs data from a BigQuery dataset to Vertex AI Search, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. In the navigation menu, click Data stores.

  3. Click Create data store.

  4. On the Source page, select BigQuery.

  5. Select the kind of data that you are importing.

  6. Click Periodic.

  7. Select the Sync frequency, which is how often you want the Vertex AI Search connector to sync with the BigQuery dataset. You can change the frequency later.

  8. In the BigQuery dataset path field, click Browse, and select the dataset that contains the tables that you have prepared for ingesting. Alternatively, enter the location directly in the BigQuery path field. The format for the path is projectname.datasetname.

  9. In the Tables to sync field, click Browse, and then select a table that contains the data that you want for your data store.

  10. If there are additional tables in the dataset that you want to use for data stores, click Add table and specify those tables too.

  11. Click Continue.

  12. Choose a region for your data store, enter a name for your data connector, and then click Create.

    You have now created a data connector, which will periodically sync data with the BigQuery dataset, and you have created one or more entity data stores. The data stores have the same names as the BigQuery tables.

  13. To check the status of your ingestion, go to the Data stores page and click your data connector name to see details about it on its Data page > Data ingestion activity tab. When the status column on the Activity tab changes from In progress to Succeeded, the first ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes to several hours.

After you set up your data source and import data the first time, the data store syncs data from that source at the frequency that you selected during setup. The first sync occurs about an hour after the data connector is created. The next sync then occurs around 24, 72, or 120 hours later.

Next steps

  • To attach your data store to an app, create an app and select your data store by following the steps in Create a custom recommendations app.

  • To preview or get recommendations after your app and data store are set up, see Get recommendations.

Cloud Storage

You can create a data store from Cloud Storage data in two ways:

  • One-time ingestion: You import data from a Cloud Storage folder or file into a data store. The data in the data store doesn't change unless you manually refresh the data.

  • Periodic ingestion: You import data from a Cloud Storage folder or file, and you set a sync frequency that determines how often the data store is updated with the most recent data from that Cloud Storage location.

The following table compares the two ways that you can import Cloud Storage data into Vertex AI Search data stores.

| One-time ingestion | Periodic ingestion |
| --- | --- |
| Generally available (GA). | Public preview. |
| You must refresh the data manually. | Data updates automatically every one, three, or five days. Data can't be refreshed manually. |
| Vertex AI Search creates a single data store from one folder or file in Cloud Storage. | Vertex AI Search creates a data connector and associates a data store (called an entity data store) with it for the file or folder that is specified. Each Cloud Storage data connector can have a single entity data store. |
| Data from multiple files, folders, and buckets can be combined in one data store by first ingesting data from one Cloud Storage location and then more data from another location. | Because manual data import isn't supported, the data in an entity data store can come from only one Cloud Storage file or folder. |
| Data source access control is supported. For more information, see Data source access control. | Data source access control isn't supported. Imported data can contain access controls, but these controls aren't respected. |
| You can create a data store by using the Google Cloud console or the API. | You must use the console to create data connectors and their entity data stores. |
| CMEK-compliant. | CMEK-compliant. |

Import once from Cloud Storage

To ingest data from Cloud Storage, use the following steps to create a data store and ingest data by using either the Google Cloud console or the API.

Before importing your data, review Prepare data for ingesting.

Console

To use the console to ingest data from a Cloud Storage bucket, follow these steps:

  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. Go to the Data stores page.

  3. Click Create data store.

  4. On the Source page, select Cloud Storage.

  5. In the Select a folder or file you want to import section, select Folder or File.

  6. Click Browse, choose the data that you have prepared for ingesting, and then click Select. Alternatively, enter the location directly in the gs:// field.

  7. Select the kind of data that you are importing.

  8. Click Continue.

  9. If you're doing a one-time import of structured data:

    1. Map fields to key properties.

    2. If there are important fields missing from the schema, use Add new field to add them.

      For more information, see About auto-detect and edit.

    3. Click Continue.

  10. Choose a region for your data store.

  11. Enter a name for your data store.

  12. Optional: If you selected unstructured documents, you can select parsing and chunking options for your documents. To compare parsers, see Parse documents. For information about chunking, see Chunk documents for RAG.

    The OCR parser and the layout parser can incur additional costs. See Document AI feature pricing.

    To select a parser, expand Document processing options and specify the parser options that you want to use.

  13. Click Create.

  14. To check the status of your ingestion, go to the Data stores page and click your data store name to see details about it on its Data page. When the status column on the Activity tab changes from In progress to Import completed, the ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes or several hours.

REST

To use the command line to create a data store and ingest data from Cloud Storage, follow these steps.

  1. Create a data store.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
    -d '{
      "displayName": "DATA_STORE_DISPLAY_NAME",
      "industryVertical": "GENERIC",
      "solutionTypes": ["SOLUTION_TYPE_RECOMMENDATION"]
    }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
    • DATA_STORE_DISPLAY_NAME: the display name of the Vertex AI Search data store that you want to create.
  2. Import data from Cloud Storage.

      curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents:import" \
      -d '{
        "gcsSource": {
          "inputUris": ["INPUT_FILE_PATTERN_1", "INPUT_FILE_PATTERN_2"],
          "dataSchema": "DATA_SCHEMA",
        },
        "reconciliationMode": "RECONCILIATION_MODE",
        "autoGenerateIds": "AUTO_GENERATE_IDS",
        "idField": "ID_FIELD",
        "errorConfig": {
          "gcsPrefix": "ERROR_DIRECTORY"
        }
      }'
    

    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • DATA_STORE_ID: the ID of the Vertex AI Search data store.
    • INPUT_FILE_PATTERN: a file pattern in Cloud Storage containing your documents.

      For structured data, or for unstructured data with metadata, an example of the input file pattern is gs://<your-gcs-bucket>/directory/object.json, and an example of a pattern matching one or more files is gs://<your-gcs-bucket>/directory/*.json.

      For unstructured documents, an example is gs://<your-gcs-bucket>/directory/*.pdf. Each file that is matched by the pattern becomes a document.

      If <your-gcs-bucket> isn't under PROJECT_ID, you need to give the service account service-<project number>@gcp-sa-discoveryengine.iam.gserviceaccount.com the Storage Object Viewer permission for the Cloud Storage bucket. For example, if you are importing a Cloud Storage bucket from source project "123" to destination project "456", give service-456@gcp-sa-discoveryengine.iam.gserviceaccount.com permission for the Cloud Storage bucket in project "123".

    • DATA_SCHEMA: optional. Values are `document`, `custom`, `csv`, and `content`. The default is `document`.

      • `document`: upload unstructured data with metadata for unstructured documents. Each line in the file has to follow one of the following formats. You can define the ID of each document:

        • { "id": "<your-id>", "jsonData": "<JSON string>", "content": { "mimeType": "<application/pdf or text/html>", "uri": "gs://<your-gcs-bucket>/directory/filename.pdf" } }
        • { "id": "<your-id>", "structData": <JSON object>, "content": { "mimeType": "<application/pdf or text/html>", "uri": "gs://<your-gcs-bucket>/directory/filename.pdf" } }
      • `custom`: upload JSON for structured documents. The data is organized according to a schema, which you can specify; otherwise, it is auto-detected. You can put the JSON string of each document in a consistent format directly on each line, and Vertex AI Search automatically generates an ID for each imported document.

      • `content`: upload unstructured documents (PDF, HTML, DOC, TXT, PPTX). The ID of each document is automatically generated as the first 128 bits of SHA256(GCS_URI) encoded as a hex string. You can specify multiple input file patterns as long as they don't match more than 100,000 files (see the sketch after this list).

      • `csv`: include a header row in your CSV file, with each header mapped to a document field. Specify the path to the CSV file by using the inputUris field.

    • ERROR_DIRECTORY: optional. A Cloud Storage directory for error information about the import, for example gs://<your-gcs-bucket>/directory/import_errors. Google recommends leaving this field empty to let Vertex AI Search automatically create a temporary directory.

    • RECONCILIATION_MODE: optional. Values are `FULL` and `INCREMENTAL`. The default is `INCREMENTAL`. Specifying `INCREMENTAL` causes an incremental refresh of data from Cloud Storage to your data store. This does an upsert operation, which adds new documents and replaces existing documents with updated documents that have the same ID. Specifying `FULL` causes a full rebase of the documents in your data store. In other words, new and updated documents are added to your data store, and documents that are not in Cloud Storage are removed from it. The `FULL` mode is helpful if you want to automatically delete documents that you no longer need.

    • AUTO_GENERATE_IDS: optional. Specifies whether to automatically generate document IDs. If set to `true`, document IDs are generated based on a hash of the payload. Note that generated document IDs might not remain consistent over multiple imports. If you auto-generate IDs over multiple imports, Google highly recommends setting reconciliationMode to `FULL` to maintain consistent document IDs.

      Specify autoGenerateIds only when gcsSource.dataSchema is set to `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is returned. If you don't specify autoGenerateIds, or if you set it to `false`, you must specify idField. Otherwise, the documents fail to import.

    • ID_FIELD: optional. Specifies which fields are the document IDs. For Cloud Storage source documents, idField specifies the name of the JSON field that contains the document IDs. For example, if {"my_id":"some_uuid"} is the document ID field in one of your documents, specify "idField":"my_id". This identifies all JSON fields with the name "my_id" as document IDs.

      Specify this field only when both of the following are true: (1) gcsSource.dataSchema is set to `custom` or `csv`, and (2) auto_generate_ids is set to `false` or is unspecified. Otherwise, an INVALID_ARGUMENT error is returned.

      Note that the value of the Cloud Storage JSON field must be of string type, must be between 1 and 63 characters, and must conform to RFC-1034. Otherwise, the documents fail to import.

      Note that the JSON field name specified by id_field must be of string type, must be between 1 and 63 characters, and must conform to RFC-1034. Otherwise, the documents fail to import.
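
To make the `content` schema concrete, the following minimal Python sketch imports every PDF matched by a wildcard pattern into an existing data store, using the same google-cloud-discoveryengine client library as the Python samples later in this section. The bucket, project, and data store names are placeholders that you would replace with your own.

from google.cloud import discoveryengine

# Placeholder values; replace with your own.
project_id = "my-project"
data_store_id = "my-data-store"

client = discoveryengine.DocumentServiceClient()

# The full resource name of the data store branch.
parent = client.branch_path(
    project=project_id,
    location="global",
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    gcs_source=discoveryengine.GcsSource(
        # Each matched PDF becomes one document; its ID is derived
        # from a hash of the file's Cloud Storage URI.
        input_uris=["gs://my-bucket/directory/*.pdf"],
        data_schema="content",
    ),
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

operation = client.import_documents(request=request)
print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()
print(response)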

C#

For more information, see the AI Applications C# API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;

public sealed partial class GeneratedDataStoreServiceClientSnippets
{
    /// <summary>Snippet for CreateDataStore</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void CreateDataStoreRequestObject()
    {
        // Create client
        DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.Create();
        // Initialize request argument(s)
        CreateDataStoreRequest request = new CreateDataStoreRequest
        {
            ParentAsCollectionName = CollectionName.FromProjectLocationCollection("[PROJECT]", "[LOCATION]", "[COLLECTION]"),
            DataStore = new DataStore(),
            DataStoreId = "",
            CreateAdvancedSiteSearch = false,
            CmekConfigNameAsCmekConfigName = CmekConfigName.FromProjectLocation("[PROJECT]", "[LOCATION]"),
            SkipDefaultSchemaCreation = false,
        };
        // Make the request
        Operation<DataStore, CreateDataStoreMetadata> response = dataStoreServiceClient.CreateDataStore(request);

        // Poll until the returned long-running operation is complete
        Operation<DataStore, CreateDataStoreMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        DataStore result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<DataStore, CreateDataStoreMetadata> retrievedResponse = dataStoreServiceClient.PollOnceCreateDataStore(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            DataStore retrievedResult = retrievedResponse.Result;
        }
    }
}

Import documents

using Google.Cloud.DiscoveryEngine.V1;
using Google.LongRunning;
using Google.Protobuf.WellKnownTypes;

public sealed partial class GeneratedDocumentServiceClientSnippets
{
    /// <summary>Snippet for ImportDocuments</summary>
    /// <remarks>
    /// This snippet has been automatically generated and should be regarded as a code template only.
    /// It will require modifications to work:
    /// - It may require correct/in-range values for request initialization.
    /// - It may require specifying regional endpoints when creating the service client as shown in
    ///   https://cloud.google.com/dotnet/docs/reference/help/client-configuration#endpoint.
    /// </remarks>
    public void ImportDocumentsRequestObject()
    {
        // Create client
        DocumentServiceClient documentServiceClient = DocumentServiceClient.Create();
        // Initialize request argument(s)
        ImportDocumentsRequest request = new ImportDocumentsRequest
        {
            ParentAsBranchName = BranchName.FromProjectLocationDataStoreBranch("[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]"),
            InlineSource = new ImportDocumentsRequest.Types.InlineSource(),
            ErrorConfig = new ImportErrorConfig(),
            ReconciliationMode = ImportDocumentsRequest.Types.ReconciliationMode.Unspecified,
            UpdateMask = new FieldMask(),
            AutoGenerateIds = false,
            IdField = "",
            ForceRefreshContent = false,
        };
        // Make the request
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> response = documentServiceClient.ImportDocuments(request);

        // Poll until the returned long-running operation is complete
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> completedResponse = response.PollUntilCompleted();
        // Retrieve the operation result
        ImportDocumentsResponse result = completedResponse.Result;

        // Or get the name of the operation
        string operationName = response.Name;
        // This name can be stored, then the long-running operation retrieved later by name
        Operation<ImportDocumentsResponse, ImportDocumentsMetadata> retrievedResponse = documentServiceClient.PollOnceImportDocuments(operationName);
        // Check if the retrieved long-running operation has completed
        if (retrievedResponse.IsCompleted)
        {
            // If it has completed, then access the result
            ImportDocumentsResponse retrievedResult = retrievedResponse.Result;
        }
    }
}

Go

For more information, see the AI Applications Go API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDataStoreClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.CreateDataStoreRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#CreateDataStoreRequest.
	}
	op, err := c.CreateDataStore(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Import documents


package main

import (
	"context"

	discoveryengine "cloud.google.com/go/discoveryengine/apiv1"
	discoveryenginepb "cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb"
)

func main() {
	ctx := context.Background()
	// This snippet has been automatically generated and should be regarded as a code template only.
	// It will require modifications to work:
	// - It may require correct/in-range values for request initialization.
	// - It may require specifying regional endpoints when creating the service client as shown in:
	//   https://pkg.go.dev/cloud.google.com/go#hdr-Client_Options
	c, err := discoveryengine.NewDocumentClient(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	defer c.Close()

	req := &discoveryenginepb.ImportDocumentsRequest{
		// TODO: Fill request struct fields.
		// See https://pkg.go.dev/cloud.google.com/go/discoveryengine/apiv1/discoveryenginepb#ImportDocumentsRequest.
	}
	op, err := c.ImportDocuments(ctx, req)
	if err != nil {
		// TODO: Handle error.
	}

	resp, err := op.Wait(ctx)
	if err != nil {
		// TODO: Handle error.
	}
	// TODO: Use resp.
	_ = resp
}

Java

For more information, see the AI Applications Java API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

import com.google.cloud.discoveryengine.v1.CollectionName;
import com.google.cloud.discoveryengine.v1.CreateDataStoreRequest;
import com.google.cloud.discoveryengine.v1.DataStore;
import com.google.cloud.discoveryengine.v1.DataStoreServiceClient;

public class SyncCreateDataStore {

  public static void main(String[] args) throws Exception {
    syncCreateDataStore();
  }

  public static void syncCreateDataStore() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DataStoreServiceClient dataStoreServiceClient = DataStoreServiceClient.create()) {
      CreateDataStoreRequest request =
          CreateDataStoreRequest.newBuilder()
              .setParent(CollectionName.of("[PROJECT]", "[LOCATION]", "[COLLECTION]").toString())
              .setDataStore(DataStore.newBuilder().build())
              .setDataStoreId("dataStoreId929489618")
              .setCreateAdvancedSiteSearch(true)
              .setSkipDefaultSchemaCreation(true)
              .build();
      DataStore response = dataStoreServiceClient.createDataStoreAsync(request).get();
    }
  }
}

Import documents

import com.google.cloud.discoveryengine.v1.BranchName;
import com.google.cloud.discoveryengine.v1.DocumentServiceClient;
import com.google.cloud.discoveryengine.v1.ImportDocumentsRequest;
import com.google.cloud.discoveryengine.v1.ImportDocumentsResponse;
import com.google.cloud.discoveryengine.v1.ImportErrorConfig;
import com.google.protobuf.FieldMask;

public class SyncImportDocuments {

  public static void main(String[] args) throws Exception {
    syncImportDocuments();
  }

  public static void syncImportDocuments() throws Exception {
    // This snippet has been automatically generated and should be regarded as a code template only.
    // It will require modifications to work:
    // - It may require correct/in-range values for request initialization.
    // - It may require specifying regional endpoints when creating the service client as shown in
    // https://cloud.google.com/java/docs/setup#configure_endpoints_for_the_client_library
    try (DocumentServiceClient documentServiceClient = DocumentServiceClient.create()) {
      ImportDocumentsRequest request =
          ImportDocumentsRequest.newBuilder()
              .setParent(
                  BranchName.ofProjectLocationDataStoreBranchName(
                          "[PROJECT]", "[LOCATION]", "[DATA_STORE]", "[BRANCH]")
                      .toString())
              .setErrorConfig(ImportErrorConfig.newBuilder().build())
              .setUpdateMask(FieldMask.newBuilder().build())
              .setAutoGenerateIds(true)
              .setIdField("idField1629396127")
              .setForceRefreshContent(true)
              .build();
      ImportDocumentsResponse response = documentServiceClient.importDocumentsAsync(request).get();
    }
  }
}

Node.js

For more information, see the AI Applications Node.js API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  Resource name of the CmekConfig to use for protecting this DataStore.
 */
// const cmekConfigName = 'abc123'
/**
 *  DataStore without CMEK protections. If a default CmekConfig is set for
 *  the project, setting this field will override the default CmekConfig as
 *  well.
 */
// const disableCmek = true
/**
 *  Required. The parent resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}`.
 */
// const parent = 'abc123'
/**
 *  Required. The DataStore google.cloud.discoveryengine.v1.DataStore  to
 *  create.
 */
// const dataStore = {}
/**
 *  Required. The ID to use for the
 *  DataStore google.cloud.discoveryengine.v1.DataStore, which will become
 *  the final component of the
 *  DataStore google.cloud.discoveryengine.v1.DataStore's resource name.
 *  This field must conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  standard with a length limit of 63 characters. Otherwise, an
 *  INVALID_ARGUMENT error is returned.
 */
// const dataStoreId = 'abc123'
/**
 *  A boolean flag indicating whether user want to directly create an advanced
 *  data store for site search.
 *  If the data store is not configured as site
 *  search (GENERIC vertical and PUBLIC_WEBSITE content_config), this flag will
 *  be ignored.
 */
// const createAdvancedSiteSearch = true
/**
 *  A boolean flag indicating whether to skip the default schema creation for
 *  the data store. Only enable this flag if you are certain that the default
 *  schema is incompatible with your use case.
 *  If set to true, you must manually create a schema for the data store before
 *  any documents can be ingested.
 *  This flag cannot be specified if `data_store.starting_schema` is specified.
 */
// const skipDefaultSchemaCreation = true

// Imports the Discoveryengine library
const {DataStoreServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DataStoreServiceClient();

async function callCreateDataStore() {
  // Construct request
  const request = {
    parent,
    dataStore,
    dataStoreId,
  };

  // Run request
  const [operation] = await discoveryengineClient.createDataStore(request);
  const [response] = await operation.promise();
  console.log(response);
}

callCreateDataStore();

Import documents

/**
 * This snippet has been automatically generated and should be regarded as a code template only.
 * It will require modifications to work.
 * It may require correct/in-range values for request initialization.
 * TODO(developer): Uncomment these variables before running the sample.
 */
/**
 *  The Inline source for the input content for documents.
 */
// const inlineSource = {}
/**
 *  Cloud Storage location for the input content.
 */
// const gcsSource = {}
/**
 *  BigQuery input source.
 */
// const bigquerySource = {}
/**
 *  FhirStore input source.
 */
// const fhirStoreSource = {}
/**
 *  Spanner input source.
 */
// const spannerSource = {}
/**
 *  Cloud SQL input source.
 */
// const cloudSqlSource = {}
/**
 *  Firestore input source.
 */
// const firestoreSource = {}
/**
 *  AlloyDB input source.
 */
// const alloyDbSource = {}
/**
 *  Cloud Bigtable input source.
 */
// const bigtableSource = {}
/**
 *  Required. The parent branch resource name, such as
 *  `projects/{project}/locations/{location}/collections/{collection}/dataStores/{data_store}/branches/{branch}`.
 *  Requires create/update permission.
 */
// const parent = 'abc123'
/**
 *  The desired location of errors incurred during the Import.
 */
// const errorConfig = {}
/**
 *  The mode of reconciliation between existing documents and the documents to
 *  be imported. Defaults to
 *  ReconciliationMode.INCREMENTAL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL.
 */
// const reconciliationMode = {}
/**
 *  Indicates which fields in the provided imported documents to update. If
 *  not set, the default is to update all fields.
 */
// const updateMask = {}
/**
 *  Whether to automatically generate IDs for the documents if absent.
 *  If set to `true`,
 *  Document.id google.cloud.discoveryengine.v1.Document.id s are
 *  automatically generated based on the hash of the payload, where IDs may not
 *  be consistent during multiple imports. In which case
 *  ReconciliationMode.FULL google.cloud.discoveryengine.v1.ImportDocumentsRequest.ReconciliationMode.FULL 
 *  is highly recommended to avoid duplicate contents. If unset or set to
 *  `false`, Document.id google.cloud.discoveryengine.v1.Document.id s have
 *  to be specified using
 *  id_field google.cloud.discoveryengine.v1.ImportDocumentsRequest.id_field,
 *  otherwise, documents without IDs fail to be imported.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const autoGenerateIds = true
/**
 *  The field indicates the ID field or column to be used as unique IDs of
 *  the documents.
 *  For GcsSource google.cloud.discoveryengine.v1.GcsSource  it is the key of
 *  the JSON field. For instance, `my_id` for JSON `{"my_id": "some_uuid"}`.
 *  For others, it may be the column name of the table where the unique ids are
 *  stored.
 *  The values of the JSON field or the table column are used as the
 *  Document.id google.cloud.discoveryengine.v1.Document.id s. The JSON field
 *  or the table column must be of string type, and the values must be set as
 *  valid strings conform to RFC-1034 (https://tools.ietf.org/html/rfc1034)
 *  with 1-63 characters. Otherwise, documents without valid IDs fail to be
 *  imported.
 *  Only set this field when
 *  auto_generate_ids google.cloud.discoveryengine.v1.ImportDocumentsRequest.auto_generate_ids 
 *  is unset or set as `false`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  If it is unset, a default value `_id` is used when importing from the
 *  allowed data sources.
 *  Supported data sources:
 *  * GcsSource google.cloud.discoveryengine.v1.GcsSource.
 *  GcsSource.data_schema google.cloud.discoveryengine.v1.GcsSource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * BigQuerySource google.cloud.discoveryengine.v1.BigQuerySource.
 *  BigQuerySource.data_schema google.cloud.discoveryengine.v1.BigQuerySource.data_schema 
 *  must be `custom` or `csv`. Otherwise, an INVALID_ARGUMENT error is thrown.
 *  * SpannerSource google.cloud.discoveryengine.v1.SpannerSource.
 *  * CloudSqlSource google.cloud.discoveryengine.v1.CloudSqlSource.
 *  * FirestoreSource google.cloud.discoveryengine.v1.FirestoreSource.
 *  * BigtableSource google.cloud.discoveryengine.v1.BigtableSource.
 */
// const idField = 'abc123'
/**
 *  Optional. Whether to force refresh the unstructured content of the
 *  documents.
 *  If set to `true`, the content part of the documents will be refreshed
 *  regardless of the update status of the referencing content.
 */
// const forceRefreshContent = true

// Imports the Discoveryengine library
const {DocumentServiceClient} = require('@google-cloud/discoveryengine').v1;

// Instantiates a client
const discoveryengineClient = new DocumentServiceClient();

async function callImportDocuments() {
  // Construct request
  const request = {
    parent,
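    // Note: this auto-generated template sets only `parent`. A real import
    // also needs a source (for example, `bigquerySource` or `gcsSource`) and
    // typically `reconciliationMode`; see the commented parameters above.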
  };

  // Run request
  const [operation] = await discoveryengineClient.importDocuments(request);
  const [response] = await operation.promise();
  console.log(response);
}

callImportDocuments();

Python

For more information, see the AI Applications Python API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store


from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"


def create_data_store_sample(
    project_id: str,
    location: str,
    data_store_id: str,
) -> str:
    #  For more information, refer to:
    # https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
    client_options = (
        ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
        if location != "global"
        else None
    )

    # Create a client
    client = discoveryengine.DataStoreServiceClient(client_options=client_options)

    # The full resource name of the collection
    # e.g. projects/{project}/locations/{location}/collections/default_collection
    parent = client.collection_path(
        project=project_id,
        location=location,
        collection="default_collection",
    )

    data_store = discoveryengine.DataStore(
        display_name="My Data Store",
        # Options: GENERIC, MEDIA, HEALTHCARE_FHIR
        industry_vertical=discoveryengine.IndustryVertical.GENERIC,
        # Options: SOLUTION_TYPE_RECOMMENDATION, SOLUTION_TYPE_SEARCH, SOLUTION_TYPE_CHAT, SOLUTION_TYPE_GENERATIVE_CHAT
        solution_types=[discoveryengine.SolutionType.SOLUTION_TYPE_SEARCH],
        # TODO(developer): Update content_config based on data store type.
        # Options: NO_CONTENT, CONTENT_REQUIRED, PUBLIC_WEBSITE
        content_config=discoveryengine.DataStore.ContentConfig.CONTENT_REQUIRED,
    )

    request = discoveryengine.CreateDataStoreRequest(
        parent=parent,
        data_store_id=data_store_id,
        data_store=data_store,
        # Optional: For Advanced Site Search Only
        # create_advanced_site_search=True,
    )

    # Make the request
    operation = client.create_data_store(request=request)

    print(f"Waiting for operation to complete: {operation.operation.name}")
    response = operation.result()

    # After the operation is complete,
    # get information from operation metadata
    metadata = discoveryengine.CreateDataStoreMetadata(operation.metadata)

    # Handle the response
    print(response)
    print(metadata)

    return operation.operation.name
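
As a usage sketch (the values below are placeholders, not values from this guide), you might call the sample function like this:

```python
# Placeholder values; replace them with your own before running.
operation_name = create_data_store_sample(
    project_id="my-project",
    location="global",
    data_store_id="my-recommendations-data-store",
)
print(f"Started operation: {operation_name}")
```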

Import documents

from google.api_core.client_options import ClientOptions
from google.cloud import discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"
# location = "YOUR_LOCATION" # Values: "global"
# data_store_id = "YOUR_DATA_STORE_ID"

# Examples:
# - Unstructured documents
#   - `gs://bucket/directory/file.pdf`
#   - `gs://bucket/directory/*.pdf`
# - Unstructured documents with JSONL Metadata
#   - `gs://bucket/directory/file.json`
# - Unstructured documents with CSV Metadata
#   - `gs://bucket/directory/file.csv`
# gcs_uri = "YOUR_GCS_PATH"

#  For more information, refer to:
# https://cloud.google.com/generative-ai-app-builder/docs/locations#specify_a_multi-region_for_your_data_store
client_options = (
    ClientOptions(api_endpoint=f"{location}-discoveryengine.googleapis.com")
    if location != "global"
    else None
)

# Create a client
client = discoveryengine.DocumentServiceClient(client_options=client_options)

# The full resource name of the search engine branch.
# e.g. projects/{project}/locations/{location}/dataStores/{data_store_id}/branches/{branch}
parent = client.branch_path(
    project=project_id,
    location=location,
    data_store=data_store_id,
    branch="default_branch",
)

request = discoveryengine.ImportDocumentsRequest(
    parent=parent,
    gcs_source=discoveryengine.GcsSource(
        # Multiple URIs are supported
        input_uris=[gcs_uri],
        # Options:
        # - `content` - Unstructured documents (PDF, HTML, DOC, TXT, PPTX)
        # - `custom` - Unstructured documents with custom JSONL metadata
        # - `document` - Structured documents in the discoveryengine.Document format.
        # - `csv` - Unstructured documents with CSV metadata
        data_schema="content",
    ),
    # Options: `FULL`, `INCREMENTAL`
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)

# Make the request
operation = client.import_documents(request=request)

print(f"Waiting for operation to complete: {operation.operation.name}")
response = operation.result()

# After the operation is complete,
# get information from operation metadata
metadata = discoveryengine.ImportDocumentsMetadata(operation.metadata)

# Handle the response
print(response)
print(metadata)
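
The import sample above reads from Cloud Storage. To import from a BigQuery table instead, the same request type also accepts a `bigquery_source`. A minimal sketch (the project, dataset, and table IDs are placeholders):

```python
# Placeholder values; replace them with your own before running.
bq_request = discoveryengine.ImportDocumentsRequest(
    parent=parent,  # Same branch resource name as in the sample above.
    bigquery_source=discoveryengine.BigQuerySource(
        project_id="YOUR_PROJECT_ID",
        dataset_id="YOUR_DATASET_ID",
        table_id="YOUR_TABLE_ID",
        # `custom` imports each row as custom structured data.
        data_schema="custom",
    ),
    # FULL replaces existing documents; INCREMENTAL upserts.
    reconciliation_mode=discoveryengine.ImportDocumentsRequest.ReconciliationMode.INCREMENTAL,
)
operation = client.import_documents(request=bq_request)
```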

Ruby

For more information, see the AI Applications Ruby API reference documentation.

To authenticate to AI Applications, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

Create a data store

require "google/cloud/discovery_engine/v1"

##
# Snippet for the create_data_store call in the DataStoreService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client#create_data_store.
#
def create_data_store
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DataStoreService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::CreateDataStoreRequest.new
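  # For example, the request might be populated with hypothetical values:
  # request = Google::Cloud::DiscoveryEngine::V1::CreateDataStoreRequest.new(
  #   parent: "projects/PROJECT_ID/locations/global/collections/default_collection",
  #   data_store_id: "DATA_STORE_ID",
  #   data_store: { display_name: "My Data Store", industry_vertical: :GENERIC }
  # )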

  # Call the create_data_store method.
  result = client.create_data_store request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

Import documents

require "google/cloud/discovery_engine/v1"

##
# Snippet for the import_documents call in the DocumentService service
#
# This snippet has been automatically generated and should be regarded as a code
# template only. It will require modifications to work:
# - It may require correct/in-range values for request initialization.
# - It may require specifying regional endpoints when creating the service
# client as shown in https://cloud.google.com/ruby/docs/reference.
#
# This is an auto-generated example demonstrating basic usage of
# Google::Cloud::DiscoveryEngine::V1::DocumentService::Client#import_documents.
#
def import_documents
  # Create a client object. The client can be reused for multiple calls.
  client = Google::Cloud::DiscoveryEngine::V1::DocumentService::Client.new

  # Create a request. To set request fields, pass in keyword arguments.
  request = Google::Cloud::DiscoveryEngine::V1::ImportDocumentsRequest.new
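  # For example, the request might be populated with hypothetical values:
  # request = Google::Cloud::DiscoveryEngine::V1::ImportDocumentsRequest.new(
  #   parent: "projects/PROJECT_ID/locations/global/dataStores/DATA_STORE_ID/branches/default_branch",
  #   bigquery_source: { project_id: "PROJECT_ID", dataset_id: "DATASET_ID",
  #                      table_id: "TABLE_ID", data_schema: "custom" },
  #   reconciliation_mode: :INCREMENTAL
  # )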

  # Call the import_documents method.
  result = client.import_documents request

  # The returned object is of type Gapic::Operation. You can use it to
  # check the status of an operation, cancel it, or wait for results.
  # Here is how to wait for a response.
  result.wait_until_done! timeout: 60
  if result.response?
    p result.response
  else
    puts "No response received."
  end
end

Connect to Cloud Storage with periodic syncing

Before you import your data, review Prepare data for ingesting.

The following procedure describes how to create a data connector that associates a Cloud Storage location with a Vertex AI Search data connector, and how to specify a folder or a file in that location for the data store that you want to create. Data stores that are children of data connectors are called entity data stores.

Data is synced periodically to the entity data store. You can specify synchronization daily, every three days, or every five days.

Console

  1. In the Google Cloud console, go to the AI Applications page.

    AI Applications

  2. Go to the Data stores page.

  3. Click Create data store.

  4. On the Source page, select Cloud Storage.

  5. Select which kind of data you are importing.

  6. Click Periodic.

  7. Select the Sync frequency, which is how often you want the Vertex AI Search connector to sync with the Cloud Storage location. You can change the frequency later.

  8. In the Select a folder or a file you want to import section, select Folder or File.

  9. Click Browse, choose the data that you have prepared for ingesting, and then click Select. Alternatively, enter the location directly in the gs:// field.

  10. Click Continue.

  11. Choose a region for your data connector.

  12. Enter a name for your data connector.

  13. Optional: If you selected unstructured documents, you can select parsing and chunking options for your documents. To compare parsers, see Parse documents. For information about chunking, see Chunk documents for RAG.

    The OCR parser and the layout parser can incur additional costs. See Document AI feature pricing.

    To select a parser, expand Document processing options and then specify the parser options that you want to use.

  14. Click Create.

    You have now created a data connector that periodically syncs data with the Cloud Storage location, and you have created an entity data store named gcs_store.

  15. To check the status of your ingestion, go to the Data stores page and click your data connector name to see details about it on its Data page, on the Data ingestion activity tab. When the status column on the Data ingestion activity tab changes from In progress to Succeeded, the first ingestion is complete.

    Depending on the size of your data, ingestion can take several minutes to several hours.

After you set up your data source and import data the first time, data is synced from that source at the frequency that you selected during setup. The first sync occurs about an hour after the data connector is created. The next sync then occurs around 24 hours, 72 hours, or 120 hours later.

What's next

  • To attach your data store to an app, create an app and select your data store by following the steps in Create a custom recommendations app.

  • To preview or get recommendations after your app and data store are set up, see Get recommendations.

Upload structured JSON data with the API

To directly upload a JSON document or JSON object using the API, follow these steps.

Before you import your data, see Prepare data for ingesting.

REST

To use the command line to create a data store and import structured JSON data, follow these steps:

  1. Create a data store.

    curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    -H "X-Goog-User-Project: PROJECT_ID" \
    "https://discoveryengine.googleapis.com/v1alpha/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores?dataStoreId=DATA_STORE_ID" \
    -d '{
      "displayName": "DATA_STORE_DISPLAY_NAME",
      "industryVertical": "GENERIC",
      "solutionTypes": ["SOLUTION_TYPE_RECOMMENDATION"]
    }'
    

    Replace the following:

    • PROJECT_ID: The ID of your Google Cloud project.
    • DATA_STORE_ID: The ID of the recommendations data store that you want to create. This ID can contain only lowercase letters, digits, underscores, and hyphens.
    • DATA_STORE_DISPLAY_NAME: The display name of the recommendations data store that you want to create.
  2. Optional: Provide your own schema. When you provide a schema, you usually get better results. For more information, see Provide or auto-detect a schema.

    curl -X PATCH \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/schemas/default_schema" \
    -d '{
      "structSchema": JSON_SCHEMA_OBJECT
    }'
    

    Replace the following:

    • PROJECT_ID: The ID of your Google Cloud project.
    • DATA_STORE_ID: The ID of the recommendations data store.
    • JSON_SCHEMA_OBJECT: Your JSON schema as a JSON object; for example:

      {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "type": "object",
        "properties": {
          "title": {
            "type": "string",
            "keyPropertyMapping": "title"
          },
          "categories": {
            "type": "array",
            "items": {
              "type": "string",
              "keyPropertyMapping": "category"
            }
          },
          "uri": {
            "type": "string",
            "keyPropertyMapping": "uri"
          }
        }
      }
      
  3. Import structured data that conforms to the defined schema.

    There are a few approaches that you can use to upload data, including the following. A Python client-library sketch follows this list.

    • Upload a JSON document.

      curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
      -d '{
        "jsonData": "JSON_DOCUMENT_STRING"
      }'
      

      Replace JSON_DOCUMENT_STRING with the JSON document as a single string. This must conform to the JSON schema that you provided in the previous step; for example:

      ```none
      { \"title\": \"test title\", \"categories\": [\"cat_1\", \"cat_2\"], \"uri\": \"test uri\"}
      ```
      
    • Upload a JSON object.

      curl -X POST \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents?documentId=DOCUMENT_ID" \
      -d '{
        "structData": JSON_DOCUMENT_OBJECT
      }'
      

      Replace JSON_DOCUMENT_OBJECT with the JSON document as a JSON object. This must conform to the JSON schema that you provided in the previous step; for example:

      ```json
      {
        "title": "test title",
        "categories": [
          "cat_1",
          "cat_2"
        ],
        "uri": "test uri"
      }
      ```
      
    • Update with a JSON document.

      curl -X PATCH \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
      -d '{
        "jsonData": "JSON_DOCUMENT_STRING"
      }'
      
    • Update with a JSON object.

      curl -X PATCH \
      -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1beta/projects/PROJECT_ID/locations/global/collections/default_collection/dataStores/DATA_STORE_ID/branches/0/documents/DOCUMENT_ID" \
      -d '{
        "structData": JSON_DOCUMENT_OBJECT
      }'
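
    If you prefer a client library to raw REST calls, the Python client can perform the same one-document upload with `DocumentServiceClient.create_document`. The following is a minimal sketch, not this guide's own sample; the project, data store, and document IDs are placeholders, and the document body matches the example schema above:

    ```python
    import json

    from google.cloud import discoveryengine

    client = discoveryengine.DocumentServiceClient()

    # Placeholder values; replace them with your own.
    parent = client.branch_path(
        project="YOUR_PROJECT_ID",
        location="global",
        data_store="YOUR_DATA_STORE_ID",
        branch="default_branch",
    )

    document = discoveryengine.Document(
        # `json_data` takes the document as a single JSON string.
        json_data=json.dumps(
            {
                "title": "test title",
                "categories": ["cat_1", "cat_2"],
                "uri": "test uri",
            }
        )
    )

    created = client.create_document(
        parent=parent,
        document=document,
        document_id="test-doc-1",  # Hypothetical document ID.
    )
    print(created.name)
    ```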
      

What's next

  • To attach your data store to an app, create an app and select your data store by following the steps in Create a custom recommendations app.

  • To preview or get recommendations after your app and data store are set up, see Get recommendations.

Create a data store using Terraform

You can use Terraform to create an empty data store. After the empty data store is created, you can ingest data into the data store using the Google Cloud console or API commands.

To learn how to apply or remove a Terraform configuration, see Basic Terraform commands.

To create an empty data store using Terraform, see google_discovery_engine_data_store.
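
As an illustrative sketch only (the argument names follow the Terraform provider documentation for that resource, and the IDs are placeholders, not values from this guide), a minimal configuration might look like this:

```hcl
resource "google_discovery_engine_data_store" "recommendations" {
  # Placeholder values; replace them with your own.
  location          = "global"
  data_store_id     = "my-recommendations-data-store"
  display_name      = "my-recommendations-data-store"
  industry_vertical = "GENERIC"
  solution_types    = ["SOLUTION_TYPE_RECOMMENDATION"]
  # Creates an empty data store; ingest data after creation.
  content_config    = "NO_CONTENT"
}
```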