Create and use preemptible VMs


This page explains how to create and use preemptible virtual machine (VM) instances. Preemptible VMs are available at a 60% to 91% discount compared to the price of standard VMs. However, Compute Engine might stop (preempt) these VMs if it needs to reclaim those resources for other tasks. Preemptible VMs always stop after 24 hours. Preemptible VMs are recommended only for fault-tolerant applications that can withstand VM preemption. Before you decide to create a preemptible VM, make sure that your application can handle preemptions. To understand the risks and value of preemptible VMs, read the preemptible VM instances documentation.

Before you begin

Create a preemptible VM

Use the gcloud CLI or the Compute Engine API to create a preemptible VM. To use the Google Cloud console, create a Spot VM instead.

gcloud

In gcloud compute, use the same instances create command that you would use to create a regular VM, but add the --preemptible flag.

gcloud compute instances create [VM_NAME] --preemptible

where [VM_NAME] is the name of the VM.

Go

import (
	"context"
	"fmt"
	"io"

	compute "cloud.google.com/go/compute/apiv1"
	computepb "cloud.google.com/go/compute/apiv1/computepb"
	"google.golang.org/protobuf/proto"
)

// createPreemptibleInstance creates a new preemptible VM instance
// with the Debian 11 operating system.
func createPreemptibleInstance(
	w io.Writer, projectID, zone, instanceName string,
) error {
	// projectID := "your_project_id"
	// zone := "europe-central2-b"
	// instanceName := "your_instance_name"
	// preemptible := true

	ctx := context.Background()
	instancesClient, err := compute.NewInstancesRESTClient(ctx)
	if err != nil {
		return fmt.Errorf("NewInstancesRESTClient: %w", err)
	}
	defer instancesClient.Close()

	imagesClient, err := compute.NewImagesRESTClient(ctx)
	if err != nil {
		return fmt.Errorf("NewImagesRESTClient: %w", err)
	}
	defer imagesClient.Close()

	// List of public operating system (OS) images:
	// https://cloud.google.com/compute/docs/images/os-details.
	newestDebianReq := &computepb.GetFromFamilyImageRequest{
		Project: "debian-cloud",
		Family:  "debian-11",
	}
	newestDebian, err := imagesClient.GetFromFamily(ctx, newestDebianReq)
	if err != nil {
		return fmt.Errorf("unable to get image from family: %w", err)
	}

	inst := &computepb.Instance{
		Name: proto.String(instanceName),
		Disks: []*computepb.AttachedDisk{
			{
				InitializeParams: &computepb.AttachedDiskInitializeParams{
					DiskSizeGb:  proto.Int64(10),
					SourceImage: newestDebian.SelfLink,
					DiskType:    proto.String(fmt.Sprintf("zones/%s/diskTypes/pd-standard", zone)),
				},
				AutoDelete: proto.Bool(true),
				Boot:       proto.Bool(true),
			},
		},
		Scheduling: &computepb.Scheduling{
			// Set the preemptible setting
			Preemptible: proto.Bool(true),
		},
		MachineType: proto.String(fmt.Sprintf("zones/%s/machineTypes/n1-standard-1", zone)),
		NetworkInterfaces: []*computepb.NetworkInterface{
			{
				Name: proto.String("global/networks/default"),
			},
		},
	}

	req := &computepb.InsertInstanceRequest{
		Project:          projectID,
		Zone:             zone,
		InstanceResource: inst,
	}

	op, err := instancesClient.Insert(ctx, req)
	if err != nil {
		return fmt.Errorf("unable to create instance: %w", err)
	}

	if err = op.Wait(ctx); err != nil {
		return fmt.Errorf("unable to wait for the operation: %w", err)
	}

	fmt.Fprintf(w, "Instance created\n")

	return nil
}

Java


import com.google.cloud.compute.v1.AttachedDisk;
import com.google.cloud.compute.v1.AttachedDiskInitializeParams;
import com.google.cloud.compute.v1.InsertInstanceRequest;
import com.google.cloud.compute.v1.Instance;
import com.google.cloud.compute.v1.InstancesClient;
import com.google.cloud.compute.v1.NetworkInterface;
import com.google.cloud.compute.v1.Operation;
import com.google.cloud.compute.v1.Scheduling;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class CreatePreemptibleInstance {

  public static void main(String[] args)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {
    // TODO(developer): Replace these variables before running the sample.
    // projectId: project ID or project number of the Cloud project you want to use.
    // zone: name of the zone you want to use. For example: “us-west3-b”
    // instanceName: name of the new virtual machine.
    String projectId = "your-project-id-or-number";
    String zone = "zone-name";
    String instanceName = "instance-name";

    createPreemptibleInstance(projectId, zone, instanceName);
  }

  // Send an instance creation request with preemptible settings to the Compute Engine API
  // and wait for it to complete.
  public static void createPreemptibleInstance(String projectId, String zone, String instanceName)
      throws IOException, ExecutionException, InterruptedException, TimeoutException {

    String machineType = String.format("zones/%s/machineTypes/e2-small", zone);
    String sourceImage = "projects/debian-cloud/global/images/family/debian-11";
    long diskSizeGb = 10L;
    String networkName = "default";

    try (InstancesClient instancesClient = InstancesClient.create()) {

      AttachedDisk disk =
          AttachedDisk.newBuilder()
              .setBoot(true)
              .setAutoDelete(true)
              .setType(AttachedDisk.Type.PERSISTENT.toString())
              .setInitializeParams(
                  // Describe the size and source image of the boot disk to attach to the instance.
                  AttachedDiskInitializeParams.newBuilder()
                      .setSourceImage(sourceImage)
                      .setDiskSizeGb(diskSizeGb)
                      .build())
              .build();

      // Use the default VPC network.
      NetworkInterface networkInterface = NetworkInterface.newBuilder()
          .setName(networkName)
          .build();

      // Collect information into the Instance object.
      Instance instanceResource =
          Instance.newBuilder()
              .setName(instanceName)
              .setMachineType(machineType)
              .addDisks(disk)
              .addNetworkInterfaces(networkInterface)
              // Set the preemptible setting.
              .setScheduling(Scheduling.newBuilder()
                  .setPreemptible(true)
                  .build())
              .build();

      System.out.printf("Creating instance: %s at %s %n", instanceName, zone);

      // Prepare the request to insert an instance.
      InsertInstanceRequest insertInstanceRequest = InsertInstanceRequest.newBuilder()
          .setProject(projectId)
          .setZone(zone)
          .setInstanceResource(instanceResource)
          .build();

      // Wait for the create operation to complete.
      Operation response = instancesClient.insertAsync(insertInstanceRequest)
          .get(3, TimeUnit.MINUTES);

      if (response.hasError()) {
        System.out.println("Instance creation failed ! ! " + response);
        return;
      }

      System.out.printf("Instance created : %s\n", instanceName);
      System.out.println("Operation Status: " + response.getStatus());
    }
  }
}

Node.js

/**
 * TODO(developer): Uncomment and replace these variables before running the sample.
 */
// const projectId = 'YOUR_PROJECT_ID';
// const zone = 'europe-central2-b';
// const instanceName = 'YOUR_INSTANCE_NAME';

const compute = require('@google-cloud/compute');

async function createPreemptible() {
  const instancesClient = new compute.InstancesClient();

  const [response] = await instancesClient.insert({
    instanceResource: {
      name: instanceName,
      disks: [
        {
          initializeParams: {
            diskSizeGb: '64',
            sourceImage:
              'projects/debian-cloud/global/images/family/debian-11',
          },
          autoDelete: true,
          boot: true,
        },
      ],
      scheduling: {
        // Set the preemptible setting
        preemptible: true,
      },
      machineType: `zones/${zone}/machineTypes/e2-small`,
      networkInterfaces: [
        {
          name: 'global/networks/default',
        },
      ],
    },
    project: projectId,
    zone,
  });
  let operation = response.latestResponse;
  const operationsClient = new compute.ZoneOperationsClient();

  // Wait for the create operation to complete.
  while (operation.status !== 'DONE') {
    [operation] = await operationsClient.wait({
      operation: operation.name,
      project: projectId,
      zone: operation.zone.split('/').pop(),
    });
  }

  console.log('Instance created.');
}

createPreemptible();

Python

from __future__ import annotations

import re
import sys
from typing import Any
import warnings

from google.api_core.extended_operation import ExtendedOperation
from google.cloud import compute_v1


def get_image_from_family(project: str, family: str) -> compute_v1.Image:
    """
    Retrieve the newest image that is part of a given family in a project.

    Args:
        project: project ID or project number of the Cloud project you want to get image from.
        family: name of the image family you want to get image from.

    Returns:
        An Image object.
    """
    image_client = compute_v1.ImagesClient()
    # List of public operating system (OS) images: https://cloud.google.com/compute/docs/images/os-details
    newest_image = image_client.get_from_family(project=project, family=family)
    return newest_image


def disk_from_image(
    disk_type: str,
    disk_size_gb: int,
    boot: bool,
    source_image: str,
    auto_delete: bool = True,
) -> compute_v1.AttachedDisk:
    """
    Create an AttachedDisk object to be used in VM instance creation. Uses an image as the
    source for the new disk.

    Args:
         disk_type: the type of disk you want to create. This value uses the following format:
            "zones/{zone}/diskTypes/(pd-standard|pd-ssd|pd-balanced|pd-extreme)".
            For example: "zones/us-west3-b/diskTypes/pd-ssd"
        disk_size_gb: size of the new disk in gigabytes
        boot: boolean flag indicating whether this disk should be used as a boot disk of an instance
        source_image: source image to use when creating this disk. You must have read access to this disk. This can be one
            of the publicly available images or an image from one of your projects.
            This value uses the following format: "projects/{project_name}/global/images/{image_name}"
        auto_delete: boolean flag indicating whether this disk should be deleted with the VM that uses it

    Returns:
        AttachedDisk object configured to be created using the specified image.
    """
    boot_disk = compute_v1.AttachedDisk()
    initialize_params = compute_v1.AttachedDiskInitializeParams()
    initialize_params.source_image = source_image
    initialize_params.disk_size_gb = disk_size_gb
    initialize_params.disk_type = disk_type
    boot_disk.initialize_params = initialize_params
    # Remember to set auto_delete to True if you want the disk to be deleted when you delete
    # your VM instance.
    boot_disk.auto_delete = auto_delete
    boot_disk.boot = boot
    return boot_disk


def wait_for_extended_operation(
    operation: ExtendedOperation, verbose_name: str = "operation", timeout: int = 300
) -> Any:
    """
    Waits for the extended (long-running) operation to complete.

    If the operation is successful, it will return its result.
    If the operation ends with an error, an exception will be raised.
    If there were any warnings during the execution of the operation
    they will be printed to sys.stderr.

    Args:
        operation: a long-running operation you want to wait on.
        verbose_name: (optional) a more verbose name of the operation,
            used only during error and warning reporting.
        timeout: how long (in seconds) to wait for operation to finish.
            If None, wait indefinitely.

    Returns:
        Whatever the operation.result() returns.

    Raises:
        This method will raise the exception received from `operation.exception()`
        or RuntimeError if there is no exception set, but there is an `error_code`
        set for the `operation`.

        In case of an operation taking longer than `timeout` seconds to complete,
        a `concurrent.futures.TimeoutError` will be raised.
    """
    result = operation.result(timeout=timeout)

    if operation.error_code:
        print(
            f"Error during {verbose_name}: [Code: {operation.error_code}]: {operation.error_message}",
            file=sys.stderr,
            flush=True,
        )
        print(f"Operation ID: {operation.name}", file=sys.stderr, flush=True)
        raise operation.exception() or RuntimeError(operation.error_message)

    if operation.warnings:
        print(f"Warnings during {verbose_name}:\n", file=sys.stderr, flush=True)
        for warning in operation.warnings:
            print(f" - {warning.code}: {warning.message}", file=sys.stderr, flush=True)

    return result


def create_instance(
    project_id: str,
    zone: str,
    instance_name: str,
    disks: list[compute_v1.AttachedDisk],
    machine_type: str = "n1-standard-1",
    network_link: str = "global/networks/default",
    subnetwork_link: str = None,
    internal_ip: str = None,
    external_access: bool = False,
    external_ipv4: str = None,
    accelerators: list[compute_v1.AcceleratorConfig] = None,
    preemptible: bool = False,
    spot: bool = False,
    instance_termination_action: str = "STOP",
    custom_hostname: str = None,
    delete_protection: bool = False,
) -> compute_v1.Instance:
    """
    Send an instance creation request to the Compute Engine API and wait for it to complete.

    Args:
        project_id: project ID or project number of the Cloud project you want to use.
        zone: name of the zone to create the instance in. For example: "us-west3-b"
        instance_name: name of the new virtual machine (VM) instance.
        disks: a list of compute_v1.AttachedDisk objects describing the disks
            you want to attach to your new instance.
        machine_type: machine type of the VM being created. This value uses the
            following format: "zones/{zone}/machineTypes/{type_name}".
            For example: "zones/europe-west3-c/machineTypes/f1-micro"
        network_link: name of the network you want the new instance to use.
            For example: "global/networks/default" represents the network
            named "default", which is created automatically for each project.
        subnetwork_link: name of the subnetwork you want the new instance to use.
            This value uses the following format:
            "regions/{region}/subnetworks/{subnetwork_name}"
        internal_ip: internal IP address you want to assign to the new instance.
            By default, a free address from the pool of available internal IP addresses of
            used subnet will be used.
        external_access: boolean flag indicating if the instance should have an external IPv4
            address assigned.
        external_ipv4: external IPv4 address to be assigned to this instance. If you specify
            an external IP address, it must live in the same region as the zone of the instance.
            This setting requires `external_access` to be set to True to work.
        accelerators: a list of AcceleratorConfig objects describing the accelerators that will
            be attached to the new instance.
        preemptible: boolean value indicating if the new instance should be preemptible
            or not. Preemptible VMs have been deprecated and you should now use Spot VMs.
        spot: boolean value indicating if the new instance should be a Spot VM or not.
        instance_termination_action: What action should be taken once a Spot VM is terminated.
            Possible values: "STOP", "DELETE"
        custom_hostname: Custom hostname of the new VM instance.
            Custom hostnames must conform to RFC 1035 requirements for valid hostnames.
        delete_protection: boolean value indicating if the new virtual machine should be
            protected against deletion or not.
    Returns:
        Instance object.
    """
    instance_client = compute_v1.InstancesClient()

    # Use the network interface provided in the network_link argument.
    network_interface = compute_v1.NetworkInterface()
    network_interface.network = network_link
    if subnetwork_link:
        network_interface.subnetwork = subnetwork_link

    if internal_ip:
        network_interface.network_i_p = internal_ip

    if external_access:
        access = compute_v1.AccessConfig()
        access.type_ = compute_v1.AccessConfig.Type.ONE_TO_ONE_NAT.name
        access.name = "External NAT"
        access.network_tier = access.NetworkTier.PREMIUM.name
        if external_ipv4:
            access.nat_i_p = external_ipv4
        network_interface.access_configs = [access]

    # Collect information into the Instance object.
    instance = compute_v1.Instance()
    instance.network_interfaces = [network_interface]
    instance.name = instance_name
    instance.disks = disks
    if re.match(r"^zones/[a-z\d\-]+/machineTypes/[a-z\d\-]+$", machine_type):
        instance.machine_type = machine_type
    else:
        instance.machine_type = f"zones/{zone}/machineTypes/{machine_type}"

    instance.scheduling = compute_v1.Scheduling()
    if accelerators:
        instance.guest_accelerators = accelerators
        instance.scheduling.on_host_maintenance = (
            compute_v1.Scheduling.OnHostMaintenance.TERMINATE.name
        )

    if preemptible:
        # Set the preemptible setting. Reuse the Scheduling object created above
        # so that the maintenance setting configured for accelerators is kept.
        warnings.warn(
            "Preemptible VMs are being replaced by Spot VMs.", DeprecationWarning
        )
        instance.scheduling.preemptible = True

    if spot:
        # Set the Spot VM setting
        instance.scheduling.provisioning_model = (
            compute_v1.Scheduling.ProvisioningModel.SPOT.name
        )
        instance.scheduling.instance_termination_action = instance_termination_action

    if custom_hostname is not None:
        # Set the custom hostname for the instance
        instance.hostname = custom_hostname

    if delete_protection:
        # Set the delete protection bit
        instance.deletion_protection = True

    # Prepare the request to insert an instance.
    request = compute_v1.InsertInstanceRequest()
    request.zone = zone
    request.project = project_id
    request.instance_resource = instance

    # Wait for the create operation to complete.
    print(f"Creating the {instance_name} instance in {zone}...")

    operation = instance_client.insert(request=request)

    wait_for_extended_operation(operation, "instance creation")

    print(f"Instance {instance_name} created.")
    return instance_client.get(project=project_id, zone=zone, instance=instance_name)


def create_preemptible_instance(
    project_id: str, zone: str, instance_name: str
) -> compute_v1.Instance:
    """
    Create a new preemptible VM instance with the Debian 11 operating system.

    Args:
        project_id: project ID or project number of the Cloud project you want to use.
        zone: name of the zone to create the instance in. For example: "us-west3-b"
        instance_name: name of the new virtual machine (VM) instance.

    Returns:
        Instance object.
    """
    newest_debian = get_image_from_family(project="debian-cloud", family="debian-11")
    disk_type = f"zones/{zone}/diskTypes/pd-standard"
    disks = [disk_from_image(disk_type, 10, True, newest_debian.self_link)]
    instance = create_instance(project_id, zone, instance_name, disks, preemptible=True)
    return instance

REST

In the API, construct a normal request to create a VM, but include the preemptible property under scheduling and set it to true. For example:

POST https://compute.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/instances

{
  "machineType": "zones/[ZONE]/machineTypes/[MACHINE_TYPE]",
  "name": "[INSTANCE_NAME]",
  "scheduling":
  {
    "preemptible": true
  },
  ...
}

Preemptible CPU quotas

Like standard VMs, preemptible VMs require available CPU quota. To avoid having preemptible VMs consume the CPU quota for your standard VMs, you can request a special "preemptible CPU" quota. After Compute Engine grants you preemptible CPU quota in a region, all preemptible VMs count against that quota, and all standard VMs continue to count against the standard CPU quota.

In regions where preemptible CPU quota is not available, you can use the standard CPU quota to launch preemptible VMs. As usual, you also need sufficient IP and disk quota. The preemptible CPU quota does not appear in the gcloud CLI or on the Google Cloud console quotas page unless Compute Engine has granted the quota.

For more information about quotas, visit the resource quotas page.
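
For example, one way to check whether Compute Engine has granted preemptible CPU quota in a region is to inspect that region's quota list with the gcloud CLI. The following is a minimal sketch; the region name is a placeholder, and the PREEMPTIBLE_CPUS metric appears in the output only if the quota has been granted:

# Show the quotas for a region and look for the preemptible CPU metric.
# "us-central1" is a placeholder region.
gcloud compute regions describe us-central1 \
    --format="yaml(quotas)" | grep -B1 -A1 "PREEMPTIBLE_CPUS"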

Start preempted VMs

Like any other VM, if a preemptible VM is stopped or preempted, you can start the VM again and return it to the RUNNING state. Starting a preemptible VM resets the 24-hour counter, but because it is still a preemptible VM, Compute Engine can preempt it before the 24 hours are up. It is not possible to convert a preemptible VM into a standard VM while it is running.
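
For example, a minimal sketch of restarting a stopped or preempted VM with the gcloud CLI, where [VM_NAME] and [ZONE] are placeholders:

gcloud compute instances start [VM_NAME] --zone=[ZONE]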

If Compute Engine stops a preemptible VM in an autoscaling managed instance group (MIG) or a Google Kubernetes Engine (GKE) cluster, the group restarts the VM when the resources become available again.

Handle preemption with a shutdown script

When Compute Engine preempts a VM, you can use a shutdown script to try to perform cleanup actions before the VM is preempted. For example, you can gracefully stop a running process and copy a checkpoint file to Cloud Storage. Notably, the maximum length of the shutdown period is shorter for a preemption notice than for a user-initiated shutdown. For more information about the shutdown period for a preemption notice, see Preemption process in the conceptual documentation.

The following is a shutdown script that you can add to a running preemptible VM or add to a new preemptible VM when you create it. This script runs when the VM starts to shut down, before the operating system's regular kill command terminates all remaining processes. After gracefully stopping the desired program, the script performs a parallel upload of a checkpoint file to a Cloud Storage bucket.

#!/bin/bash

MY_PROGRAM="[PROGRAM_NAME]" # For example, "apache2" or "nginx"
MY_USER="[LOCAL_USERNAME]"
CHECKPOINT="/home/$MY_USER/checkpoint.out"
BUCKET_NAME="[BUCKET_NAME]" # For example, "my-checkpoint-files" (without gs://)

echo "Shutting down!  Seeing if ${MY_PROGRAM} is running."

# Find the newest copy of $MY_PROGRAM
PID="$(pgrep -n "$MY_PROGRAM")"

if [[ "$?" -ne 0 ]]; then
  echo "${MY_PROGRAM} not running, shutting down immediately."
  exit 0
fi

echo "Sending SIGINT to $PID"
kill -2 "$PID"

# Portable waitpid equivalent
while kill -0 "$PID"; do
   sleep 1
done

echo "$PID is done, copying ${CHECKPOINT} to gs://${BUCKET_NAME} as ${MY_USER}"

su "${MY_USER}" -c "gcloud storage cp $CHECKPOINT gs://${BUCKET_NAME}/"

echo "Done uploading, shutting down."

To add this script to a VM, configure the script to work with an application on your VM and add it to the VM's metadata.

  1. Copy or download the shutdown script to your local workstation.
  2. Open the file for editing and change the following variables:
    • [PROGRAM_NAME] is the name of the process or program you want to shut down. For example, apache2 or nginx.
    • [LOCAL_USERNAME] is the username that you are logged into the virtual machine as.
    • [BUCKET_NAME] is the name of the Cloud Storage bucket where you want to save the program's checkpoint file. Note that the bucket name does not start with gs:// in this case.
  3. Save your changes.
  4. Add the shutdown script to a new VM or an existing VM (see the example after the assumptions below).

This script assumes the following:

  • The VM was created with at least read/write access to Cloud Storage. See the authentication documentation for instructions on how to create a VM with the appropriate scopes.

  • You have an existing Cloud Storage bucket and permission to write to it.
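
For example, step 4 can be done with the gcloud CLI. The following is a minimal sketch; [VM_NAME] and the script's file path are placeholders:

# Attach the edited script to an existing VM as shutdown-script metadata.
gcloud compute instances add-metadata [VM_NAME] \
    --metadata-from-file shutdown-script=path/to/shutdown-script.sh

# Or supply the script when creating a new preemptible VM.
gcloud compute instances create [VM_NAME] --preemptible \
    --metadata-from-file shutdown-script=path/to/shutdown-script.sh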

Identify preemptible VMs

To check whether a VM is a preemptible VM, follow the steps to identify a VM's provisioning model and termination action.

Determine if a VM was preempted

Use the Google Cloud console, the gcloud CLI, or the API to determine if a VM was preempted.

Console

You can check whether a VM was preempted by viewing the system activity logs.

  1. In the Google Cloud console, go to the Logs page.

    Go to Logs

  2. Select your project and click Continue.

  3. Add compute.instances.preempted to the Filter by label or text search field.

  4. Optionally, enter a VM name if you want to see preemption operations for a specific VM.

  5. Press Enter to apply the specified filters. The Google Cloud console updates the list of logs to show only the operations where a VM was preempted.

  6. Select an operation in the list to view details about the VM that was preempted.

gcloud


Use the gcloud compute operations list command with a filter parameter to get a list of preemption events in your project.

gcloud compute operations list \
    --filter="operationType=compute.instances.preempted"

You can use the filter parameter to further scope the results. For example, to see only preemption events for VMs within a managed instance group, set the parameter as follows:

gcloud compute operations list \
    --filter="operationType=compute.instances.preempted AND targetLink:instances/[BASE_VM_NAME]"

gcloud returns a response similar to the following:

NAME                  TYPE                         TARGET                                   HTTP_STATUS STATUS TIMESTAMP
systemevent-xxxxxxxx  compute.instances.preempted  us-central1-f/instances/example-vm-xxx  200         DONE   2015-04-02T12:12:10.881-07:00

The compute.instances.preempted operation type indicates that the VM was preempted. You can use the operations describe command to get more information about a specific preemption operation.

gcloud compute operations describe \
    systemevent-xxxxxxxx

gcloud returns a response similar to the following:

...
operationType: compute.instances.preempted
progress: 100
selfLink: https://compute.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/us-central1-f/operations/systemevent-xxxxxxxx
startTime: '2015-04-02T12:12:10.881-07:00'
status: DONE
statusMessage: Instance was preempted.
...

REST


To get a list of recent system operations, send a GET request to the zone operations URI.

GET https://compute.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/operations

The response contains a list of recent operations.

{
  "kind": "compute#operation",
  "id": "15041793718812375371",
  "name": "systemevent-xxxxxxxx",
  "zone": "https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/us-central1-f",
  "operationType": "compute.instances.preempted",
  "targetLink": "https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/us-central1-f/instances/example-vm",
  "targetId": "12820389800990687210",
  "status": "DONE",
  "statusMessage": "Instance was preempted.",
  ...
}

To scope the response to show only preemption operations, add a filter to your API request: operationType="compute.instances.preempted". To see preemption operations for a specific VM, add the targetLink parameter to the filter: operationType="compute.instances.preempted" AND targetLink="https://www.googleapis.com/compute/v1/projects/[PROJECT_ID]/zones/[ZONE]/instances/[VM_NAME]".

Alternatively, you can determine whether a VM was preempted from inside the VM itself. This is useful if you want your shutdown script to handle a shutdown caused by a Compute Engine preemption differently from a normal shutdown. To do this, check the metadata server for the preempted value in the VM's default instance metadata.

For example, use curl from within the VM to obtain the value of preempted:

curl "http://metadata.google.internal/computeMetadata/v1/instance/preempted" -H "Metadata-Flavor: Google"
TRUE

If this value is TRUE, the VM was preempted by Compute Engine; otherwise, the value is FALSE.

If you want to use this outside of a shutdown script, append ?wait_for_change=true to the URL. This performs a hanging HTTP GET request that returns only when the metadata has changed and the VM has been preempted.

curl "http://metadata.google.internal/computeMetadata/v1/instance/preempted?wait_for_change=true" -H "Metadata-Flavor: Google"
TRUE

Test preemption settings

You can run simulated maintenance events on your VMs to force them to be preempted. Use this feature to test how your applications handle preemptible VMs. Read Test your availability policies to learn how to test maintenance events on your VMs.

You can also simulate a VM preemption by stopping the VM instead of simulating a maintenance event, which avoids exceeding quota limits.
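
For example, both approaches can be run from the gcloud CLI, where [VM_NAME] and [ZONE] are placeholders:

# Force a simulated maintenance event, which preempts the preemptible VM.
gcloud compute instances simulate-maintenance-event [VM_NAME] --zone=[ZONE]

# Alternatively, stop the VM to approximate a preemption.
gcloud compute instances stop [VM_NAME] --zone=[ZONE]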

Best practices

Here are some best practices to help you get the most out of preemptible VM instances.

Use the bulk instance API

Instead of creating single VMs, you can use the bulk instance API.
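
For example, a hedged sketch of requesting several preemptible VMs in a single call with the gcloud CLI. The name pattern, count, and [ZONE] are placeholders, and this assumes that your gcloud version supports the --preemptible flag on bulk create:

gcloud compute instances bulk create \
    --name-pattern="[VM_NAME]-#" \
    --count=10 \
    --zone=[ZONE] \
    --preemptible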

Pick smaller machine shapes

Resources for preemptible VMs come out of excess and backup Google Cloud capacity. Capacity is often easier to get for smaller machine types, meaning machine types with fewer resources such as vCPUs and memory. You might find more capacity for preemptible VMs by selecting a smaller custom machine type, but capacity is even more likely for smaller predefined machine types. For example, compared to capacity for the n2-standard-32 predefined machine type, capacity for the n2-custom-24-96 custom machine type is more likely, but capacity for the n2-standard-16 predefined machine type is more likely still.
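
For example, a minimal sketch of creating a preemptible VM with a smaller predefined machine type, where [VM_NAME] and [ZONE] are placeholders:

gcloud compute instances create [VM_NAME] \
    --zone=[ZONE] \
    --machine-type=e2-small \
    --preemptible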

Run large clusters of preemptible VMs during off-peak times

Load on Google Cloud data centers varies with location and time of day, but is generally lowest at night and on weekends. As a result, nights and weekends are the best times to run large clusters of preemptible VMs.

Design your applications to be fault and preemption tolerant

It's important to be prepared for the possibility that preemption patterns change at different points in time. For example, if a zone suffers a partial outage, large numbers of preemptible VMs could be preempted to make room for standard VMs that need to be moved as part of the recovery. In that small window of time, the preemption rate would look very different from any other day. If your application assumes that preemptions always happen in small groups, you might not be prepared for such an event. You can test your application's behavior under a preemption event by stopping the VM instance.

Retry creating VMs that have been preempted

If your VM instance has been preempted, try creating new preemptible VMs once or twice before falling back to standard VMs. Depending on your requirements, it might be a good idea to combine standard and preemptible VMs in your clusters to make sure that work proceeds at an adequate pace.
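
For example, a minimal bash sketch of this retry pattern, where [VM_NAME] and [ZONE] are placeholders and the retry count and wait time are arbitrary choices:

#!/bin/bash
# Try to create a preemptible VM a couple of times before falling back to a standard VM.
for attempt in 1 2; do
  if gcloud compute instances create [VM_NAME] --zone=[ZONE] --preemptible; then
    exit 0
  fi
  sleep 30
done
gcloud compute instances create [VM_NAME] --zone=[ZONE]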

Use shutdown scripts

Manage shutdown and preemption notices with a shutdown script that saves a job's progress so that it can pick up where it left off, rather than starting over from scratch.

What's next