Compute Engine instances provisioning models


This document describes the provisioning models for Compute Engine instances. To learn more about deployment options, see Choose a Compute Engine deployment strategy for your workload.

Provisioning models determine the availability, lifespan, and pricing of your instances. By understanding these models, you can choose the best option for your workload.

Available provisioning models

When you create a compute instance, you can specify one of the following provisioning models. If you don't specify a provisioning model, then Compute Engine uses the standard provisioning model by default.

  • Standard

  • Spot

  • Flex-start (Preview)

  • Reservation-bound
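
The defaulting rule above can be sketched in a few lines of Python. This is only an illustration: `ProvisioningModel` and `resolve_provisioning_model` are hypothetical names invented for this sketch and are not part of any Compute Engine API or client library.

```python
from enum import Enum
from typing import Optional


class ProvisioningModel(Enum):
    """Hypothetical labels for the four provisioning models listed above."""
    STANDARD = "standard"
    SPOT = "spot"
    FLEX_START = "flex-start"  # Preview
    RESERVATION_BOUND = "reservation-bound"


def resolve_provisioning_model(
    requested: Optional[ProvisioningModel] = None,
) -> ProvisioningModel:
    """Return the requested model, or standard when none is specified."""
    return ProvisioningModel.STANDARD if requested is None else requested


print(resolve_provisioning_model().value)                        # prints "standard"
print(resolve_provisioning_model(ProvisioningModel.SPOT).value)  # prints "spot"
```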

The following comparison covers the summary, use cases, pricing, and quota for each provisioning model:

Summary

Standard:
  • Based on resource availability, you can immediately create instances.
  • You can control when to stop or delete instances.

Spot:
  • Based on resource availability, you can immediately create instances.
  • You can control when to stop or delete instances. However, you also allow Compute Engine to stop or delete instances at any time to reclaim capacity.

Flex-start (Preview):
  • After you create a zonal managed instance group (MIG), you request Compute Engine to add instances with GPUs attached to the MIG. Compute Engine schedules the provisioning of the instances based on resource availability.
  • You can control when to delete instances. However, you can't stop, suspend, or recreate them. The instances run for up to seven days. Then, Compute Engine automatically deletes them.

Reservation-bound:
  • You can request to reserve capacity at a future date for creating instances with GPUs attached. If Google Cloud approves your request, then it creates a reservation that you can start consuming at your specified date.
  • During the approved reservation period, you can stop, restart, delete, and recreate instances to consume the reservation as needed. When the reservation period ends, Compute Engine automatically deletes the reservation and any instances that are consuming it.

Use cases

Standard: Ideal for workloads that require stability and continuous operation, such as the following workloads:
  • Web servers
  • Databases
  • Enterprise applications
  • Development and testing

Spot: Ideal for workloads that can tolerate interruptions, such as the following workloads:
  • Batch processing
  • High performance computing (HPC)
  • Continuous integration and continuous deployment (CI/CD)
  • Data analytics
  • Media encoding
  • Online inference

Flex-start (Preview): Workloads that require stability and need to run for no more than seven days, such as the following workloads:
  • Small model pre-training
  • Model fine-tuning
  • HPC simulation
  • Batch inference

Reservation-bound: Ideal for workloads that require stability and a specific run time, such as the following:
  • Pre-training foundation models
  • Multi-host foundation model inference

Pricing

Standard: You incur standard pricing for instances. For more information, see VM instance pricing.

Spot: Most vCPUs, GPUs, and Local SSD are available at a 60-91% discount. For more information, see Spot VMs pricing.

Flex-start (Preview): See VM instance pricing.

Reservation-bound: Based on the machine family that your instances use, you get up to a 53% discount for vCPUs and GPUs. For more information, see pricing for accelerator-optimized VMs.

Quota

Standard: When you create an instance, standard quota is consumed.

Spot: When you create an instance, preemptible quota is consumed. If your project lacks preemptible quota, then standard quota is consumed. Google Cloud Free Tier credits don't apply to Spot VMs.

Flex-start (Preview): When the MIG adds instances to the group, preemptible quota is consumed. If your project lacks preemptible quota, then standard quota is consumed.

Reservation-bound: Quota doesn't apply to the reservation-bound provisioning model.
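
To make the quota and pricing entries above concrete, the following Python sketch encodes the quota fallback rule and the discount arithmetic. It is an illustration under stated assumptions, not an official calculator: the function names, the string labels for the models, and the example price of 2.00 per hour are all hypothetical.

```python
from typing import Tuple


def quota_consumed(model: str, has_preemptible_quota: bool) -> str:
    """Return which quota type is consumed, following the quota entries above."""
    if model == "standard":
        return "standard"
    if model in ("spot", "flex-start"):
        # Both models use preemptible quota and fall back to standard
        # quota when the project has no preemptible quota.
        return "preemptible" if has_preemptible_quota else "standard"
    if model == "reservation-bound":
        return "none"  # Quota doesn't apply to this model.
    raise ValueError(f"unknown provisioning model: {model!r}")


def discounted_price_range(
    standard_price: float, min_discount: float, max_discount: float
) -> Tuple[float, float]:
    """Return the (lowest, highest) price after applying a discount range."""
    lowest = standard_price * (1 - max_discount)
    highest = standard_price * (1 - min_discount)
    return lowest, highest


# Hypothetical standard price of 2.00 per hour with the 60-91% Spot discount.
low, high = discounted_price_range(2.00, 0.60, 0.91)
print(f"Spot price range: {low:.2f} to {high:.2f}")
print(quota_consumed("spot", has_preemptible_quota=False))
```

With the assumed price, the Spot range works out to between 9% and 40% of the standard price, and a Spot instance in a project without preemptible quota consumes standard quota.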

Instance availability and lifespan

The following comparison shows the availability and lifespan of compute instances for each provisioning model:

Creation prerequisites

Standard: No creation prerequisites.

Spot: No creation prerequisites.

Flex-start (Preview): No creation prerequisites.

Reservation-bound: To create instances, you must first reserve capacity by creating future reservation requests for multiple blocks. At your specified date and time, Compute Engine provisions your requested capacity. You can then start consuming it by creating instances.

Supported machine series

Standard: You can use any machine series, except A4 and A3 Ultra.

Spot: You can use any machine series, except the following:
  • M2 and M3
  • C3 and X4 bare metal instances

Flex-start (Preview): You can only use the following machine series:

Reservation-bound: You can only use A4 and A3 Ultra machine series.

Instance availability

Standard: You can create instances at any time, as long as your requested resources are available.

Spot: You can create instances at any time, as long as your requested resources are available.

Flex-start (Preview): You can only create instances by creating resize requests in a MIG. Compute Engine uses Dynamic Workload Scheduler (DWS) to schedule the provisioning of your requested capacity based on resource availability. DWS helps increase your chances of obtaining high-demand resources like GPUs.

Reservation-bound: You can only create instances after reserving capacity for a future date. On your requested date, Compute Engine delivers your requested capacity, which you can then use to create instances.

Instance lifespan

Standard: You can control when to stop or delete an instance, except in the following cases:
  • If the machine type that the instance uses doesn't support live migration, then Compute Engine stops your instances during host maintenance events.
  • In rare cases, the instance might stop due to a host error.

Spot: You can control when to stop or delete an instance, except in the following cases:
  • Compute Engine might stop or delete the instance at any time to reclaim capacity. This process is called preemption.
  • If the machine type that the instance uses doesn't support live migration, then Compute Engine stops your instances during host maintenance events.
  • In rare cases, the instance might stop due to a host error.

Flex-start (Preview): The provisioned instances run for your chosen run duration, which can be up to seven days. You can't stop, suspend, or recreate the instances. Compute Engine deletes instances when one of the following happens:
  • You request to delete instances.
  • The instances reach the end of their run duration.

Reservation-bound: You can control when to stop or delete an instance, except in the following cases:
  • Compute Engine stops your instance during host maintenance events.
  • The automatically created reservation to provision your requested capacity reaches the end of its committed reservation period. At that time, Compute Engine automatically deletes the reservation and any instances that are consuming it.
  • In rare cases, the instance might stop due to a host error.
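
The machine series and lifespan rules above can be expressed as a small Python sketch. It encodes only what this comparison states and rests on assumptions: the function names and model labels are hypothetical, the flex-start series list is not given in this document (so the check raises an error for that model), and the deletion time assumes that the run duration is counted from the moment the instance starts running.

```python
from datetime import datetime, timedelta, timezone

MAX_FLEX_START_RUN = timedelta(days=7)  # "up to seven days"
RESERVATION_ONLY_SERIES = {"A4", "A3 ULTRA"}


def is_series_supported(model: str, series: str, bare_metal: bool = False) -> bool:
    """Check a machine series against the rules stated in the comparison."""
    name = series.upper()
    if model == "standard":
        return name not in RESERVATION_ONLY_SERIES
    if model == "spot":
        if name in {"M2", "M3"}:
            return False
        # Only the bare metal instances of C3 and X4 are excluded.
        return not (bare_metal and name in {"C3", "X4"})
    if model == "reservation-bound":
        return name in RESERVATION_ONLY_SERIES
    if model == "flex-start":
        raise ValueError("flex-start series list is not given in this document")
    raise ValueError(f"unknown provisioning model: {model!r}")


def flex_start_deletion_time(started_at: datetime, run_duration: timedelta) -> datetime:
    """Return when a flex-start instance reaches the end of its run duration."""
    if run_duration <= timedelta(0) or run_duration > MAX_FLEX_START_RUN:
        raise ValueError("run duration must be positive and at most seven days")
    # Assumption: the run duration is measured from started_at.
    return started_at + run_duration


start = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(flex_start_deletion_time(start, timedelta(days=3)).isoformat())
print(is_series_supported("reservation-bound", "A3 Ultra"))
```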

What's next