gcloud CLI 또는 Vertex AI API를 사용하여 모델 배포

gcloud CLI 또는 Vertex AI API를 사용하여 공개 엔드포인트에 모델을 배포하려면 기존 엔드포인트의 엔드포인트 ID를 가져온 다음 이 엔드포인트에 모델을 배포해야 합니다.

엔드포인트 ID 가져오기

모델을 배포하려면 엔드포인트 ID가 필요합니다.

gcloud

다음 예시에서는 gcloud ai endpoints list 명령어를 사용합니다.

  gcloud ai endpoints list \
      --region=LOCATION_ID \
      --filter=display_name=ENDPOINT_NAME

다음을 바꿉니다.

LOCATION_ID: Vertex AI를 사용하는 리전
ENDPOINT_NAME: 엔드포인트의 표시 이름

ENDPOINT_ID 열에 표시되는 번호를 확인합니다. 다음 단계에서 이 ID를 사용합니다.

REST

요청 데이터를 사용하기 전에 다음을 바꿉니다.

LOCATION_ID: Vertex AI를 사용하는 리전
PROJECT_ID: 프로젝트 ID
ENDPOINT_NAME: 엔드포인트의 표시 이름

HTTP 메서드 및 URL:

GET https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME

요청을 보내려면 다음 옵션 중 하나를 펼칩니다.

cURL(Linux, macOS, Cloud Shell)

참고: 다음 명령어는 gcloud init 또는 gcloud auth login을 실행하거나 gcloud CLI에 자동으로 로그인하는 Cloud Shell을 사용하여 사용자 계정으로 gcloud CLI에 로그인했다고 가정합니다. gcloud auth list를 실행하면 현재 활성 계정을 확인할 수 있습니다.

다음 명령어를 실행합니다.

curl -X GET \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME"

PowerShell(Windows)

참고: 다음 명령어는 gcloud init 또는 gcloud auth login을 실행하여 사용자 계정으로 gcloud CLI에 로그인했다고 가정합니다. gcloud auth list를 실행하면 현재 활성 계정을 확인할 수 있습니다.

다음 명령어를 실행합니다.

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method GET `
    -Headers $headers `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints?filter=display_name=ENDPOINT_NAME" | Select-Object -Expand Content

다음과 비슷한 JSON 응답이 표시됩니다.

{
  "endpoints": [
    {
      "name": "projects/PROJECT_NUMBER/locations/LOCATION_ID/endpoints/ENDPOINT_ID",
      "displayName": "ENDPOINT_NAME",
      "etag": "AMEw9yPz5pf4PwBHbRWOGh0PcAxUdjbdX2Jm3QO_amguy3DbZGP5Oi_YUKRywIE-BtLx",
      "createTime": "2020-04-17T18:31:11.585169Z",
      "updateTime": "2020-04-17T18:35:08.568959Z"
    }
  ]
}

ENDPOINT_ID를 확인합니다.

Python용 Vertex AI SDK

Python용 Vertex AI SDK를 설치하거나 업데이트하는 방법은 Python용 Vertex AI SDK 설치를 참조하세요. 자세한 내용은 Python용 Vertex AI SDK API 참조 문서를 확인하세요.

다음을 바꿉니다.

PROJECT_ID: 프로젝트 ID입니다.
LOCATION_ID: Vertex AI를 사용하는 리전
ENDPOINT_NAME: 엔드포인트의 표시 이름

from google.cloud import aiplatform

PROJECT_ID = "PROJECT_ID"
LOCATION = "LOCATION_ID"
ENDPOINT_NAME = "ENDPOINT_NAME"

aiplatform.init(
    project=PROJECT_ID,
    location=LOCATION,
)

endpoint = aiplatform.Endpoint.list( filter='display_name=ENDPOINT_NAME', )
endpoint_id = endpoint.name.split("/")[-1]

모델 배포

아래에서 언어 또는 환경에 대한 탭을 선택하세요.

gcloud

다음 예시에서는 gcloud ai endpoints deploy-model 명령어를 사용합니다.

다음 예시는 GPU를 사용하지 않고 Model을 Endpoint에 배포하여 여러 DeployedModel 리소스 간에 트래픽을 분할하지 않고 예측 서빙 속도를 높입니다.

아래의 명령어 데이터를 사용하기 전에 다음을 바꿉니다.

ENDPOINT_ID: 엔드포인트의 ID
LOCATION_ID: Vertex AI를 사용하는 리전
MODEL_ID: 배포할 모델의 ID
DEPLOYED_MODEL_NAME: DeployedModel의 이름. DeployedModel의 Model 표시 이름도 사용할 수 있습니다.
MIN_REPLICA_COUNT: 이 배포의 최소 노드 수. 예측 로드 시 필요에 따라 최대 노드 수까지 노드 수를 늘리거나 줄일 수 있으며 이 노드 수 미만으로는 줄일 수 없습니다.
MAX_REPLICA_COUNT: 이 배포의 최대 노드 수. 예측 로드 시 필요에 따라 이 노드 수까지 노드 수를 늘리거나 줄일 수 있으며 최소 노드 수 미만으로는 줄일 수 없습니다. --max-replica-count 플래그를 생략하면 최대 노드 수가 --min-replica-count 값으로 설정됩니다.

gcloud ai endpoints deploy-model 명령어를 실행합니다.

Linux, macOS 또는 Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID\
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=100

Windows(PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID`
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME `
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=100

Windows(cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME ^
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=100

트래픽 분할

앞의 예시에서 --traffic-split=0=100 플래그는 Endpoint가 수신하는 예측 트래픽의 100%를 새 DeployedModel로 전송하며 임시 ID는 0으로 표현됩니다. Endpoint에 이미 다른 DeployedModel 리소스가 있으면 새 DeployedModel 및 이전 모델 간에 트래픽을 분할할 수 있습니다. 예를 들어 트래픽의 20%를 새 DeployedModel로, 80%를 이전 모델로 전송하려면 다음 명령어를 실행합니다.

아래의 명령어 데이터를 사용하기 전에 다음을 바꿉니다.

OLD_DEPLOYED_MODEL_ID: 기존 DeployedModel의 ID

gcloud ai endpoints deploy-model 명령어를 실행합니다.

Linux, macOS 또는 Cloud Shell

gcloud ai endpoints deploy-model ENDPOINT_ID\
  --region=LOCATION_ID \
  --model=MODEL_ID \
  --display-name=DEPLOYED_MODEL_NAME \ 
  --min-replica-count=MIN_REPLICA_COUNT \
  --max-replica-count=MAX_REPLICA_COUNT \
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows(PowerShell)

gcloud ai endpoints deploy-model ENDPOINT_ID`
  --region=LOCATION_ID `
  --model=MODEL_ID `
  --display-name=DEPLOYED_MODEL_NAME \ 
  --min-replica-count=MIN_REPLICA_COUNT `
  --max-replica-count=MAX_REPLICA_COUNT `
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

Windows(cmd.exe)

gcloud ai endpoints deploy-model ENDPOINT_ID^
  --region=LOCATION_ID ^
  --model=MODEL_ID ^
  --display-name=DEPLOYED_MODEL_NAME \ 
  --min-replica-count=MIN_REPLICA_COUNT ^
  --max-replica-count=MAX_REPLICA_COUNT ^
  --traffic-split=0=20,OLD_DEPLOYED_MODEL_ID=80

REST

모델 배포

요청 데이터를 사용하기 전에 다음을 바꿉니다.

LOCATION_ID: Vertex AI를 사용하는 리전
PROJECT_ID: 프로젝트 ID
ENDPOINT_ID: 엔드포인트의 ID
MODEL_ID: 배포할 모델의 ID
DEPLOYED_MODEL_NAME: DeployedModel의 이름. DeployedModel의 Model 표시 이름도 사용할 수 있습니다.
MACHINE_TYPE: (선택사항) 이 배포의 각 노드에 사용되는 머신 리소스. 기본 설정은 n1-standard-2입니다. 머신 유형에 대해 자세히 알아보세요.
ACCELERATOR_TYPE: 머신에 연결할 가속기 유형. ACCELERATOR_COUNT가 지정되지 않았거나 0인 경우 선택사항입니다. GPU가 아닌 이미지를 사용하는 AutoML 모델 또는 커스텀 학습 모델에 사용하지 않는 것이 좋습니다. 자세히 알아보기
ACCELERATOR_COUNT: 사용할 각 복제본의 가속기 수. 선택사항. GPU가 아닌 이미지를 사용하는 AutoML 모델 또는 커스텀 학습 모델의 경우 0이거나 지정되지 않은 상태여야 합니다.
MIN_REPLICA_COUNT: 이 배포의 최소 노드 수. 예측 로드 시 필요에 따라 최대 노드 수까지 노드 수를 늘리거나 줄일 수 있으며 이 노드 수 미만으로는 줄일 수 없습니다. 값은 1 이상이어야 합니다.
MAX_REPLICA_COUNT: 이 배포의 최대 노드 수. 예측 로드 시 필요에 따라 이 노드 수까지 노드 수를 늘리거나 줄일 수 있으며 최소 노드 수 미만으로는 줄일 수 없습니다.
REQUIRED_REPLICA_COUNT: 선택사항. 이 배포가 성공으로 표시되기 위해 필요한 노드 수입니다. 1 이상이고 최소 노드 수 이하여야 합니다. 지정하지 않으면 기본값은 최소 노드 수입니다.
TRAFFIC_SPLIT_THIS_MODEL: 이 작업과 함께 배포되는 모델로 라우팅될 이 엔드포인트에 대한 예측 트래픽 비율입니다. 기본값은 100입니다. 모든 트래픽 비율의 합은 100이 되어야 합니다. 트래픽 분할에 대해 자세히 알아보기
DEPLOYED_MODEL_ID_N: (선택사항) 다른 모델이 이 엔드포인트에 배포된 경우 모든 비율의 합이 100이 되도록 트래픽 분할 비율을 업데이트해야 합니다.
TRAFFIC_SPLIT_MODEL_N: 배포된 모델 ID 키의 트래픽 분할 비율 값
PROJECT_NUMBER: 프로젝트의 자동으로 생성된 프로젝트 번호

HTTP 메서드 및 URL:

POST https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel

JSON 요청 본문:

{
  "deployedModel": {
    "model": "projects/PROJECT/locations/us-central1/models/MODEL_ID",
    "displayName": "DEPLOYED_MODEL_NAME",
    "dedicatedResources": {
       "machineSpec": {
         "machineType": "MACHINE_TYPE",
         "acceleratorType": "ACCELERATOR_TYPE",
         "acceleratorCount": "ACCELERATOR_COUNT"
       },
       "minReplicaCount": MIN_REPLICA_COUNT,
       "maxReplicaCount": MAX_REPLICA_COUNT,
       "requiredReplicaCount": REQUIRED_REPLICA_COUNT
     },
  },
  "trafficSplit": {
    "0": TRAFFIC_SPLIT_THIS_MODEL,
    "DEPLOYED_MODEL_ID_1": TRAFFIC_SPLIT_MODEL_1,
    "DEPLOYED_MODEL_ID_2": TRAFFIC_SPLIT_MODEL_2
  },
}

요청을 보내려면 다음 옵션 중 하나를 펼칩니다.

cURL(Linux, macOS, Cloud Shell)

요청 본문을 request.json 파일에 저장하고 다음 명령어를 실행합니다.

curl -X POST \
     -H "Authorization: Bearer $(gcloud auth print-access-token)" \
     -H "Content-Type: application/json; charset=utf-8" \
     -d @request.json \
     "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel"

PowerShell(Windows)

요청 본문을 request.json 파일에 저장하고 다음 명령어를 실행합니다.

$cred = gcloud auth print-access-token
$headers = @{ "Authorization" = "Bearer $cred" }

Invoke-WebRequest `
    -Method POST `
    -Headers $headers `
    -ContentType: "application/json; charset=utf-8" `
    -InFile request.json `
    -Uri "https://LOCATION_ID-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION_ID/endpoints/ENDPOINT_ID:deployModel" | Select-Object -Expand Content

다음과 비슷한 JSON 응답이 표시됩니다.

{
  "name": "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID/operations/OPERATION_ID",
  "metadata": {
    "@type": "type.googleapis.com/google.cloud.aiplatform.v1.DeployModelOperationMetadata",
    "genericMetadata": {
      "createTime": "2020-10-19T17:53:16.502088Z",
      "updateTime": "2020-10-19T17:53:16.502088Z"
    }
  }
}

Java

이 샘플을 사용해 보기 전에 Vertex AI 빠른 시작: 클라이언트 라이브러리 사용의 Java 설정 안내를 따르세요. 자세한 내용은 Vertex AI Java API 참고 문서를 참조하세요.

Vertex AI에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.

import com.google.api.gax.longrunning.OperationFuture;
import com.google.cloud.aiplatform.v1.DedicatedResources;
import com.google.cloud.aiplatform.v1.DeployModelOperationMetadata;
import com.google.cloud.aiplatform.v1.DeployModelResponse;
import com.google.cloud.aiplatform.v1.DeployedModel;
import com.google.cloud.aiplatform.v1.EndpointName;
import com.google.cloud.aiplatform.v1.EndpointServiceClient;
import com.google.cloud.aiplatform.v1.EndpointServiceSettings;
import com.google.cloud.aiplatform.v1.MachineSpec;
import com.google.cloud.aiplatform.v1.ModelName;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ExecutionException;

public class DeployModelCustomTrainedModelSample {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException {
    // TODO(developer): Replace these variables before running the sample.
    String project = "PROJECT";
    String endpointId = "ENDPOINT_ID";
    String modelName = "MODEL_NAME";
    String deployedModelDisplayName = "DEPLOYED_MODEL_DISPLAY_NAME";
    deployModelCustomTrainedModelSample(project, endpointId, modelName, deployedModelDisplayName);
  }

  static void deployModelCustomTrainedModelSample(
      String project, String endpointId, String model, String deployedModelDisplayName)
      throws IOException, ExecutionException, InterruptedException {
    EndpointServiceSettings settings =
        EndpointServiceSettings.newBuilder()
            .setEndpoint("us-central1-aiplatform.googleapis.com:443")
            .build();
    String location = "us-central1";

    // Initialize client that will be used to send requests. This client only needs to be created
    // once, and can be reused for multiple requests. After completing all of your requests, call
    // the "close" method on the client to safely clean up any remaining background resources.
    try (EndpointServiceClient client = EndpointServiceClient.create(settings)) {
      MachineSpec machineSpec = MachineSpec.newBuilder().setMachineType("n1-standard-2").build();
      DedicatedResources dedicatedResources =
          DedicatedResources.newBuilder().setMinReplicaCount(1).setMachineSpec(machineSpec).build();

      String modelName = ModelName.of(project, location, model).toString();
      DeployedModel deployedModel =
          DeployedModel.newBuilder()
              .setModel(modelName)
              .setDisplayName(deployedModelDisplayName)
              // `dedicated_resources` must be used for non-AutoML models
              .setDedicatedResources(dedicatedResources)
              .build();
      // key '0' assigns traffic for the newly deployed model
      // Traffic percentage values must add up to 100
      // Leave dictionary empty if endpoint should not accept any traffic
      Map<String, Integer> trafficSplit = new HashMap<>();
      trafficSplit.put("0", 100);
      EndpointName endpoint = EndpointName.of(project, location, endpointId);
      OperationFuture<DeployModelResponse, DeployModelOperationMetadata> response =
          client.deployModelAsync(endpoint, deployedModel, trafficSplit);

      // You can use OperationFuture.getInitialFuture to get a future representing the initial
      // response to the request, which contains information while the operation is in progress.
      System.out.format("Operation name: %s\n", response.getInitialFuture().get().getName());

      // OperationFuture.get() will block until the operation is finished.
      DeployModelResponse deployModelResponse = response.get();
      System.out.format("deployModelResponse: %s\n", deployModelResponse);
    }
  }
}

Python용 Vertex AI SDK

def deploy_model_with_dedicated_resources_sample(
    project,
    location,
    model_name: str,
    machine_type: str,
    endpoint: Optional[aiplatform.Endpoint] = None,
    deployed_model_display_name: Optional[str] = None,
    traffic_percentage: Optional[int] = 0,
    traffic_split: Optional[Dict[str, int]] = None,
    min_replica_count: int = 1,
    max_replica_count: int = 1,
    accelerator_type: Optional[str] = None,
    accelerator_count: Optional[int] = None,
    explanation_metadata: Optional[explain.ExplanationMetadata] = None,
    explanation_parameters: Optional[explain.ExplanationParameters] = None,
    metadata: Optional[Sequence[Tuple[str, str]]] = (),
    sync: bool = True,
):
    """
    model_name: A fully-qualified model resource name or model ID.
          Example: "projects/123/locations/us-central1/models/456" or
          "456" when project and location are initialized or passed.
    """

    aiplatform.init(project=project, location=location)

    model = aiplatform.Model(model_name=model_name)

    # The explanation_metadata and explanation_parameters should only be
    # provided for a custom trained model and not an AutoML model.
    model.deploy(
        endpoint=endpoint,
        deployed_model_display_name=deployed_model_display_name,
        traffic_percentage=traffic_percentage,
        traffic_split=traffic_split,
        machine_type=machine_type,
        min_replica_count=min_replica_count,
        max_replica_count=max_replica_count,
        accelerator_type=accelerator_type,
        accelerator_count=accelerator_count,
        explanation_metadata=explanation_metadata,
        explanation_parameters=explanation_parameters,
        metadata=metadata,
        sync=sync,
    )

    model.wait()

    print(model.display_name)
    print(model.resource_name)
    return model

Node.js

이 샘플을 사용해 보기 전에 Vertex AI 빠른 시작: 클라이언트 라이브러리 사용의 Node.js 설정 안내를 따르세요. 자세한 내용은 Vertex AI Node.js API 참고 문서를 참조하세요.

Vertex AI에 인증하려면 애플리케이션 기본 사용자 인증 정보를 설정합니다. 자세한 내용은 로컬 개발 환경의 인증 설정을 참조하세요.

const automl = require('@google-cloud/automl');
const client = new automl.v1beta1.AutoMlClient();

/**
 * Demonstrates using the AutoML client to create a model.
 * TODO(developer): Uncomment the following lines before running the sample.
 */
// const projectId = '[PROJECT_ID]' e.g., "my-gcloud-project";
// const computeRegion = '[REGION_NAME]' e.g., "us-central1";
// const datasetId = '[DATASET_ID]' e.g., "TBL2246891593778855936";
// const tableId = '[TABLE_ID]' e.g., "1991013247762825216";
// const columnId = '[COLUMN_ID]' e.g., "773141392279994368";
// const modelName = '[MODEL_NAME]' e.g., "testModel";
// const trainBudget = '[TRAIN_BUDGET]' e.g., "1000",
// `Train budget in milli node hours`;

// A resource that represents Google Cloud Platform location.
const projectLocation = client.locationPath(projectId, computeRegion);

// Get the full path of the column.
const columnSpecId = client.columnSpecPath(
  projectId,
  computeRegion,
  datasetId,
  tableId,
  columnId
);

// Set target column to train the model.
const targetColumnSpec = {name: columnSpecId};

// Set tables model metadata.
const tablesModelMetadata = {
  targetColumnSpec: targetColumnSpec,
  trainBudgetMilliNodeHours: trainBudget,
};

// Set datasetId, model name and model metadata for the dataset.
const myModel = {
  datasetId: datasetId,
  displayName: modelName,
  tablesModelMetadata: tablesModelMetadata,
};

// Create a model with the model metadata in the region.
client
  .createModel({parent: projectLocation, model: myModel})
  .then(responses => {
    const initialApiResponse = responses[1];
    console.log(`Training operation name: ${initialApiResponse.name}`);
    console.log('Training started...');
  })
  .catch(err => {
    console.error(err);
  });

예측 로깅의 기본 설정을 변경하는 방법을 알아보세요.

작업 상태 가져오기

일부 요청은 완료하는 데 시간이 걸리는 장기 실행 작업을 시작합니다. 이러한 요청은 작업 상태를 보거나 작업을 취소하는 데 사용할 수 있는 작업 이름을 반환합니다. Vertex AI는 장기 실행 작업을 호출하는 도우미 메서드를 제공합니다. 자세한 내용은 장기 실행 작업 다루기를 참조하세요.

다음 단계

온라인 예측 가져오기 방법 알아보기
비공개 엔드포인트 알아보기

gcloud CLI 또는 Vertex AI API를 사용하여 모델 배포 컬렉션을 사용해 정리하기 내 환경설정을 기준으로 콘텐츠를 저장하고 분류하세요.

엔드포인트 ID 가져오기

gcloud

REST

cURL(Linux, macOS, Cloud Shell)

PowerShell(Windows)

Python용 Vertex AI SDK

모델 배포

gcloud

Linux, macOS 또는 Cloud Shell

Windows(PowerShell)

Windows(cmd.exe)

트래픽 분할

Linux, macOS 또는 Cloud Shell

Windows(PowerShell)

Windows(cmd.exe)

REST

cURL(Linux, macOS, Cloud Shell)

PowerShell(Windows)

Java

Python용 Vertex AI SDK

Node.js

작업 상태 가져오기

다음 단계

gcloud CLI 또는 Vertex AI API를 사용하여 모델 배포