Consumption options

The following sections describe the consumption options that AI Hypercomputer supports. Consumption options are the methods that you use to request capacity, while provisioning models are how you specify which type of capacity to use when creating your VMs or clusters.

When deploying your VMs or clusters, you must specify a provisioning model that matches your required consumption option. For more information about provisioning models, see Provisioning models.

Each of the following consumption options is described in terms of how it works, the workloads it is best used for, and the GPU machine types it supports.

Reservations

How it works: You request compute resources in advance for a specific amount of time. These resources are dedicated to you for that period of time. Reservations provide the highest level of assurance of capacity and are cost-effective because they are available at a much lower price than on-demand requests.

Best used for: Reservations are ideal for long-running training jobs and inference workloads.

GPU machine types supported: All GPU machine types.
  • For A4 and A3 Ultra machines, you can reserve densely allocated resources for your long-running workloads. For instructions about reserving resources using this method, see Request capacity in this documentation.
  • For A3 High and A3 Mega machines, you can request densely allocated capacity by contacting Technical Account Management (TAM) services.
  • For the A2, G2, and N1+GPU machine series, you can also reserve capacity by using the Reservations API in the Compute Engine documentation, as shown in the sketch after this list. These reservations aren't densely allocated.
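As a concrete illustration of the Reservations API path for the A2, G2, and N1+GPU series, the following sketch creates a reservation with the google-cloud-compute Python client. The project, zone, reservation name, VM count, and the N1 + T4 shape are placeholders, and the short accelerator type name is an assumption; treat this as a sketch, not the definitive procedure.

```python
# A minimal sketch, assuming the google-cloud-compute client library
# (pip install google-cloud-compute). All names and sizes are placeholders.
from google.cloud import compute_v1


def create_gpu_reservation(project_id: str, zone: str) -> None:
    """Reserve N1 VMs with attached T4 GPUs (not densely allocated)."""
    reservation = compute_v1.Reservation(
        name="my-gpu-reservation",  # hypothetical reservation name
        specific_reservation=compute_v1.AllocationSpecificSKUReservation(
            count=4,  # number of VMs to reserve
            instance_properties=compute_v1.AllocationSpecificSKUAllocationReservedInstanceProperties(
                machine_type="n1-standard-8",
                guest_accelerators=[
                    compute_v1.AcceleratorConfig(
                        accelerator_type="nvidia-tesla-t4",  # short name assumed here
                        accelerator_count=1,
                    )
                ],
            ),
        ),
        # Only VMs that explicitly target this reservation can consume it.
        specific_reservation_required=True,
    )

    client = compute_v1.ReservationsClient()
    operation = client.insert(
        project=project_id, zone=zone, reservation_resource=reservation
    )
    operation.result()  # wait for the reservation to be created
```

VMs then consume the reserved capacity by setting their reservation affinity to this reservation name at creation time.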
Dynamic Workload Scheduler (DWS)

How it works: You request compute resources for a specific amount of time. Because these resources are delivered from a secured pool, their availability is much higher than that of an on-demand request.

Best used for:
  • DWS calendar mode is ideal for workloads that need to run at a specific time. Training workloads that run anywhere from a few days to a few weeks are ideal candidates. These include small model pre-training jobs, model fine-tuning jobs, HPC simulation workloads, and short-term expected increases in inference workloads.
  • DWS Flex is ideal for small model pre-training jobs, model fine-tuning jobs, and HPC simulation workloads that require just-in-time capacity.

GPU machine types supported:
  • DWS calendar mode is available only for A3 High and A3 Mega machine types that have 8 GPUs attached. This feature is in private preview. To get access, contact Technical Account Management (TAM) services.
  • DWS Flex is available for all GPU machine types except A3 Ultra and A4. To use DWS Flex, select one of the supported methods; one possible path is sketched after this list.
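The following sketch shows what a DWS Flex request might look like when creating a Compute Engine instance with the google-cloud-compute Python client. It builds only the scheduling block; the FLEX_START provisioning model value, the run duration, and flex-start access in your project are assumptions, so treat this as an illustration rather than the definitive method.

```python
# A minimal sketch, assuming the google-cloud-compute client library and that
# flex-start (FLEX_START) capacity is available to your project. The run
# duration is an illustrative value.
from google.cloud import compute_v1


def flex_start_scheduling(run_duration_hours: int = 72) -> compute_v1.Scheduling:
    """Build a scheduling block for a DWS Flex (flex-start) capacity request."""
    return compute_v1.Scheduling(
        provisioning_model="FLEX_START",  # request capacity through DWS Flex
        max_run_duration=compute_v1.Duration(
            seconds=run_duration_hours * 3600  # how long the capacity is needed
        ),
        instance_termination_action="DELETE",  # clean up when the duration ends
        on_host_maintenance="TERMINATE",  # required for VMs with GPUs attached
        automatic_restart=False,
    )
```

The returned Scheduling object is attached to the scheduling field of a compute_v1.Instance when you create the VM, following the same pattern as the Spot sketch later in this section.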

Spot

How it works: You request compute resources, which are delivered based on availability. These Spot resources might be easier to obtain than on-demand resources, but the system can delete them at any time. These resources are cost-effective because they are available at a much lower price than the standard provisioning model.

Best used for: Spot is a good fit for scheduling lower-priority workloads, such as model pre-training, model fine-tuning, and simulation jobs that are tolerant of availability disruptions.

GPU machine types supported: All GPU machine types.

To use Spot capacity, you specify Spot as the provisioning model when creating Compute Engine instances or Google Kubernetes Engine clusters, as shown in the sketch that follows.
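The following sketch shows one way to specify the Spot provisioning model when creating a Compute Engine GPU instance with the google-cloud-compute Python client. The a2-highgpu-1g machine type, image, and disk size are illustrative placeholders; the Spot-specific part is the scheduling block.

```python
# A minimal sketch, assuming the google-cloud-compute client library. The
# machine type, image, and disk size are placeholders.
from google.cloud import compute_v1


def create_spot_gpu_vm(project_id: str, zone: str, instance_name: str) -> None:
    """Create an A2 GPU VM that uses the SPOT provisioning model."""
    instance = compute_v1.Instance(
        name=instance_name,
        machine_type=f"zones/{zone}/machineTypes/a2-highgpu-1g",
        disks=[
            compute_v1.AttachedDisk(
                boot=True,
                auto_delete=True,
                initialize_params=compute_v1.AttachedDiskInitializeParams(
                    source_image="projects/debian-cloud/global/images/family/debian-12",
                    disk_size_gb=100,
                ),
            )
        ],
        network_interfaces=[
            compute_v1.NetworkInterface(network="global/networks/default")
        ],
        scheduling=compute_v1.Scheduling(
            provisioning_model="SPOT",  # request Spot capacity
            instance_termination_action="DELETE",  # delete the VM on preemption
            on_host_maintenance="TERMINATE",  # required for VMs with GPUs attached
            automatic_restart=False,
        ),
    )

    client = compute_v1.InstancesClient()
    operation = client.insert(
        project=project_id, zone=zone, instance_resource=instance
    )
    operation.result()  # wait for the create operation to finish
```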

Pricing and discount

The accelerator-optimized machine types are billed for their attached GPUs, predefined vCPUs, memory, and bundled Local SSD disks (if applicable). For more pricing information for accelerator-optimized VMs, see the Accelerator-optimized machine type family section of the VM instance pricing page.