パブリックエンドポイントを作成する

gcloud CLI または Vertex AI API を使用してモデルをデプロイするには、まずパブリックエンドポイントを作成する必要があります。

既存のパブリックエンドポイントがある場合は、この手順をスキップして、gcloud CLI または Vertex AI API を使用してモデルをデプロイするに進んでください。

このドキュメントでは、新しいパブリックエンドポイントを作成するプロセスについて説明します。

専用パブリックエンドポイントを作成する（推奨）

専用パブリックエンドポイントのデフォルトのリクエストタイムアウトは 10 分です。Vertex AI API と Vertex AI SDK for Python では、次の例に示すように、新しい inferenceTimeout 値を含む clientConnectionConfig オブジェクトを追加して、別のリクエストタイムアウトを指定できます。タイムアウトの最大値は 3,600 秒（1 時間）です。

Google Cloud コンソール

Google Cloud コンソールの [Vertex AI] セクションで、[オンライン予測] ページに移動します。
[オンライン予測] ページに移動
[ 作成] をクリックします。
[新しいエンドポイント] ペインで、次の操作を行います。

[エンドポイント名] を入力します。
アクセスタイプの [標準] を選択します。
[専用 DNS を有効にする] チェックボックスをオンにします。
[続行] をクリックします。

[完了] をクリックします。

REST

リクエストのデータを使用する前に、次のように置き換えます。

LOCATION_ID: 使用するリージョン。
PROJECT_ID: 実際のプロジェクト ID。
ENDPOINT_NAME: エンドポイントの表示名。
INFERENCE_TIMEOUT_SECS: （省略可）省略可能な inferenceTimeout フィールドの秒数。

HTTP メソッドと URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints

リクエストの本文（JSON）:

{
  "display_name": "ENDPOINT_NAME"
  "dedicatedEndpointEnabled": true,
  "clientConnectionConfig": {
    "inferenceTimeout": {
      "seconds": INFERENCE_TIMEOUT_SECS
    }
  }
}

リクエストを送信するには、次のいずれかのオプションを展開します。

curl（Linux、macOS、Cloud Shell）

注: 次のコマンドは、gcloud init または gcloud auth login を実行して、ユーザーアカウントで gcloud CLI にログインしているか、Cloud Shell を使用して自動的に gcloud CLI にログインしていることを前提としています。gcloud auth list を実行すると、現在アクティブなアカウントを確認できます。

リクエスト本文を request.json という名前のファイルに保存して、次のコマンドを実行します。

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints"

PowerShell（Windows）

注: 次のコマンドは、gcloud init または gcloud auth login を実行して、ご自分のユーザーアカウントで gcloud CLI にログインしていることを前提としています。gcloud auth list を実行すると、現在アクティブなアカウントを確認できます。

リクエスト本文を request.json という名前のファイルに保存して、次のコマンドを実行します。

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints" | Select-Object -Expand Content

次のような JSON レスポンスが返されます。

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-11-05T17:45:42.812656Z",
      "updateTime": "2020-11-05T17:45:42.812656Z"
    }
  }
}

レスポンスに

"done":
true

が含まれるまで、オペレーションのステータスをポーリングできます。

Python

このサンプルを試す前に、Vertex AI クイックスタート: クライアントライブラリの使用にある Python の設定手順を完了してください。詳細については、Vertex AI Python API のリファレンスドキュメントをご覧ください。

Vertex AI に対する認証を行うには、アプリケーションのデフォルト認証情報を設定します。詳細については、ローカル開発環境の認証を設定するをご覧ください。

次のように置き換えます。

PROJECT_ID: 実際のプロジェクト ID。
LOCATION_ID: Vertex AI を使用するリージョン。
ENDPOINT_NAME: エンドポイントの表示名。
INFERENCE_TIMEOUT_SECS: （省略可）省略可能な inference_timeout 値の秒数。

from google.cloud import aiplatform

PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION_ID"
ENDPOINT_NAME = "ENDPOINT_NAME"
INFERENCE_TIMEOUT_SECS = "INFERENCE_TIMEOUT_SECS"

aiplatform.init(
    project=PROJECT_ID,
    location=LOCATION,
    api_endpoint=ENDPOINT_NAME,
)

dedicated_endpoint = aiplatform.Endpoint.create(
    display_name=DISPLAY_NAME,
    dedicated_endpoint_enabled=True,
    sync=True,
    inference_timeout=INFERENCE_TIMEOUT_SECS,
)

推論のタイムアウト構成

推論リクエストのデフォルトのタイムアウト時間は 600 秒（10 分）です。このタイムアウトは、エンドポイントの作成時に明示的な推論タイムアウトが指定されていない場合に適用されます。最大許容タイムアウト値は 1 時間です。

エンドポイントの作成時に推論タイムアウトを構成するには、次のコードスニペットに示すように、inference_timeout パラメータを使用します。

timeout_endpoint = aiplatform.Endpoint.create(
    display_name="dedicated-endpoint-with-timeout",
    dedicated_endpoint_enabled=True,
    inference_timeout=1800,  # Unit: Seconds
)

エンドポイントの作成後に推論タイムアウト設定を変更するには、EndpointService.UpdateEndpointLongRunning メソッドを使用します。EndpointService.UpdateEndpoint メソッドはこの変更をサポートしていません。

リクエスト / レスポンスロギング

リクエスト / レスポンスロギング機能は、API インタラクションをキャプチャします。ただし、BigQuery の制限に準拠するため、サイズが 10 MB を超えるペイロードはログから除外されます。

エンドポイントの作成時にリクエスト / レスポンスロギングを有効にして構成するには、次のコードスニペットに示すように、次のパラメータを使用します。

logging_endpoint = aiplatform.Endpoint.create(
    display_name="dedicated-endpoint-with-logging",
    dedicated_endpoint_enabled=True,
    enable_request_response_logging=True,
    request_response_logging_sampling_rate=1.0,  # Default: 0.0
    request_response_logging_bq_destination_table="bq://test_logging",
    # If not set, a new BigQuery table will be created with the name:
    # bq://{project_id}.logging_{endpoint_display_name}_{endpoint_id}.request_response_logging
)

エンドポイントの作成後にリクエスト / レスポンスロギング設定を変更するには、EndpointService.UpdateEndpointLongRunning メソッドを使用します。EndpointService.UpdateEndpoint メソッドはこの変更をサポートしていません。

共有パブリックエンドポイントを作成する

Google Cloud コンソール

Google Cloud コンソールの [Vertex AI] セクションで、[オンライン予測] ページに移動します。
[オンライン予測] ページに移動
[ 作成] をクリックします。
[新しいエンドポイント] ペインで、次の操作を行います。

[エンドポイント名] を入力します。
アクセスタイプの [標準] を選択します。
[続行] をクリックします。

[完了] をクリックします。

gcloud

次の例では、gcloud ai endpoints create コマンドを使用します。

gcloud ai endpoints create \
    --region=LOCATION_ID \
    --display-name=ENDPOINT_NAME

次のように置き換えます。

LOCATION_ID: Vertex AI を使用するリージョン。
ENDPOINT_NAME: エンドポイントの表示名。

Google Cloud CLI ツールがエンドポイントを作成するまでに数秒かかる場合があります。

REST

リクエストのデータを使用する前に、次のように置き換えます。

LOCATION_ID: 使用するリージョン。
PROJECT_ID: 実際のプロジェクト ID。
ENDPOINT_NAME: エンドポイントの表示名。

HTTP メソッドと URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints

リクエストの本文（JSON）:

{
  "display_name": "ENDPOINT_NAME"
}

リクエストを送信するには、次のいずれかのオプションを展開します。

curl（Linux、macOS、Cloud Shell）

リクエスト本文を request.json という名前のファイルに保存して、次のコマンドを実行します。

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints"

PowerShell（Windows）

リクエスト本文を request.json という名前のファイルに保存して、次のコマンドを実行します。

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints" | Select-Object -Expand Content

次のような JSON レスポンスが返されます。

{
  "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.CreateEndpointOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-11-05T17:45:42.812656Z",
      "updateTime": "2020-11-05T17:45:42.812656Z"
    }
  }
}

レスポンスに

"done":
true

が含まれるまで、オペレーションのステータスをポーリングできます。

Terraform

次のサンプルでは、google_vertex_ai_endpoint Terraform リソースを使用してエンドポイントを作成します。

Terraform 構成を適用または削除する方法については、基本的な Terraform コマンドをご覧ください。

# Endpoint name must be unique for the project
resource "random_id" "endpoint_id" {
  byte_length = 4
}

resource "google_vertex_ai_endpoint" "default" {
  name         = substr(random_id.endpoint_id.dec, 0, 10)
  display_name = "sample-endpoint"
  description  = "A sample Vertex AI endpoint"
  location     = "us-central1"
  labels = {
    label-one = "value-one"
  }
}

Java

このサンプルを試す前に、Vertex AI クイックスタート: クライアントライブラリの使用にある Java の設定手順を完了してください。詳細については、Vertex AI Java API のリファレンスドキュメントをご覧ください。


import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.CreateEndpointOperationMetadata;
import com.google.cloud.aiplatform.v1.Endpoint;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.LocationName;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreateEndpointSample {

  public static void main(String[] args)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "YOUR_PROJECT_ID";
    String endpointDisplayName = "YOUR_ENDPOINT_DISPLAY_NAME";
    createEndpointSample(project, endpointDisplayName);
  }

  static void createEndpointSample(String project, String endpointDisplayName)
      throws IOException, InterruptedException, ExecutionException, TimeoutException {
    EndpointServiceSettings endpointServiceSettings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient endpointServiceClient =
        EndpointServiceClient.create(endpointServiceSettings)) {
      String location = "us-central1";
      LocationName locationName = LocationName.of(project, location);
      Endpoint endpoint = Endpoint.newBuilder().setDisplayName(endpointDisplayName).build();

      OperationFuture<Endpoint, CreateEndpointOperationMetadata> endpointFuture =
          endpointServiceClient.createEndpointAsync(locationName, endpoint);
      System.out.format("Operation name: %s\n", endpointFuture.getInitialFuture().get().getName());
      System.out.println("Waiting for operation to finish...");
      Endpoint endpointResponse = endpointFuture.get(300, TimeUnit.SECONDS);

      System.out.println("Create Endpoint Response");
      System.out.format("Name: %s\n", endpointResponse.getName());
      System.out.format("Display Name: %s\n", endpointResponse.getDisplayName());
      System.out.format("Description: %s\n", endpointResponse.getDescription());
      System.out.format("Labels: %s\n", endpointResponse.getLabelsMap());
      System.out.format("Create Time: %s\n", endpointResponse.getCreateTime());
      System.out.format("Update Time: %s\n", endpointResponse.getUpdateTime());
    }
  }
}

Node.js

このサンプルを試す前に、Vertex AI クイックスタート: クライアントライブラリの使用にある Node.js の設定手順を完了してください。詳細については、Vertex AI Node.js API のリファレンスドキュメントをご覧ください。

/**
 * TODO(developer): Uncomment these variables before running the sample.\
 * (Not necessary if passing values as arguments)
 */

// const endpointDisplayName = 'YOUR_ENDPOINT_DISPLAY_NAME';
// const project = 'YOUR_PROJECT_ID';
// const location = 'YOUR_PROJECT_LOCATION';

// Imports the Google Cloud Endpoint Service Client library
const {EndpointServiceClient} = require('@google-cloud/aiplatform');

// Specifies the location of the api endpoint
const clientOptions = {
  apiEndpoint: 'us-central1-aiplatform.googleapis.com',
};

// Instantiates a client
const endpointServiceClient = new EndpointServiceClient(clientOptions);

async function createEndpoint() {
  // Configure the parent resource
  const parent = `projects/${project}/locations/${location}`;
  const endpoint = {
    displayName: endpointDisplayName,
  };
  const request = {
    parent,
    endpoint,
  };

  // Get and print out a list of all the endpoints for this resource
  const [response] = await endpointServiceClient.createEndpoint(request);
  console.log(`Long running operation : ${response.name}`);

  // Wait for operation to complete
  await response.promise();
  const result = response.result;

  console.log('Create endpoint response');
  console.log(`\tName : ${result.name}`);
  console.log(`\tDisplay name : ${result.displayName}`);
  console.log(`\tDescription : ${result.description}`);
  console.log(`\tLabels : ${JSON.stringify(result.labels)}`);
  console.log(`\tCreate time : ${JSON.stringify(result.createTime)}`);
  console.log(`\tUpdate time : ${JSON.stringify(result.updateTime)}`);
}
createEndpoint();

Python

Vertex AI SDK for Python のインストールまたは更新の方法については、Vertex AI SDK for Python をインストールするをご覧ください。詳細については、Python API リファレンスドキュメントをご覧ください。

def create_endpoint_sample(
    project: str,
    display_name: str,
    location: str,
):
    aiplatform.init(project=project, location=location)

    endpoint = aiplatform.Endpoint.create(
        display_name=display_name,
        project=project,
        location=location,
    )

    print(endpoint.display_name)
    print(endpoint.resource_name)
    return endpoint

次のステップ

gcloud CLI または Vertex AI API を使用してモデルをデプロイする。
オンライン推論の取得方法を確認する。

パブリック エンドポイントを作成する

専用パブリック エンドポイントを作成する（推奨）

Google Cloud コンソール

REST

curl（Linux、macOS、Cloud Shell）

PowerShell（Windows）

Python

推論のタイムアウト構成

リクエスト / レスポンス ロギング

共有パブリック エンドポイントを作成する

Google Cloud コンソール

gcloud

REST

curl（Linux、macOS、Cloud Shell）

PowerShell（Windows）

Terraform

Java

Node.js

Python

次のステップ

パブリックエンドポイントを作成する

専用パブリックエンドポイントを作成する（推奨）

リクエスト / レスポンスロギング

共有パブリックエンドポイントを作成する