You can use GPUs on Compute Engine to accelerate specific workloads on your VMs such as machine learning (ML) and data processing. To use GPUs, you can either deploy an accelerator-optimized VM that has attached GPUs, or attach GPUs to an N1 general-purpose VM.
Compute Engine provides GPUs for your VMs in passthrough mode so that your VMs have direct control over the GPUs and their associated memory.
For more information about GPUs on Compute Engine, see About GPUs.
If you have graphics-intensive workloads, such as 3D visualization, 3D rendering, or virtual applications, you can use NVIDIA RTX virtual workstations (formerly known as NVIDIA GRID).
This document provides an overview of the different GPU VMs that are available on Compute Engine.
To view available regions and zones for GPUs on Compute Engine, see GPU regions and zones availability.
GPUs for compute workloads
For compute workloads, GPUs are supported for the following machine types:
- A3 VMs: these VMs have NVIDIA H100 80GB or NVIDIA H200 141GB GPUs automatically attached.
- A2 VMs: these VMs have either NVIDIA A100 80GB or NVIDIA A100 40GB GPUs automatically attached.
- G2 VMs: these VMs have NVIDIA L4 GPUs automatically attached.
- N1 VMs: for these VMs, you can attach the following GPU models: NVIDIA T4, NVIDIA V100, NVIDIA P100, or NVIDIA P4.
A3 machine series
To use NVIDIA H100 80GB or NVIDIA H200 141GB GPUs, you must use an A3 accelerator-optimized machine. Each A3 machine type has a fixed GPU count, vCPU count, and memory size.
A3 Ultra machine type
To use NVIDIA H200 141GB GPUs, you must use the A3 Ultra machine type.
This machine type has H200 141GB GPUs (nvidia-h200-141gb) and provides the highest network performance. It is ideal for foundation model training and serving.
Machine type | GPU count | GPU memory* (GB HBM3e) | vCPU count† | VM memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps)‡ | Network protocol |
---|---|---|---|---|---|---|---|---|
a3-ultragpu-8g | 8 | 1,128 | 224 | 2,952 | 12,000 | 10 | 3,200 | RDMA over Converged Ethernet (RoCE) |
*GPU memory is the memory on a GPU device that can be used for
temporary storage of data. It is separate from the VM's memory and is
specifically designed to handle the higher bandwidth demands of your
graphics-intensive workloads.
†A vCPU is implemented as a single hardware hyper-thread on one of
the available CPU platforms.
‡Maximum egress bandwidth cannot exceed the number given. Actual
egress bandwidth depends on the destination IP address and other factors.
See Network bandwidth.
A3 Mega, High, and Edge machine types
To use NVIDIA H100 80GB GPUs, you have the following options:
- A3 Mega: these machine types have H100 80GB GPUs (nvidia-h100-mega-80gb) and are ideal for large-scale training and serving workloads.
- A3 High: these machine types have H100 80GB GPUs (nvidia-h100-80gb) and are well-suited for both training and serving tasks.
- A3 Edge: these machine types have H100 80GB GPUs (nvidia-h100-80gb), are designed specifically for serving, and are available in a limited set of regions.
A3 Mega
Machine type | GPU count | GPU memory* (GB HBM3) | vCPU count† | VM memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps)‡ | Network protocol |
---|---|---|---|---|---|---|---|---|
a3-megagpu-8g | 8 | 640 | 208 | 1,872 | 6,000 | 9 | 1,800 | GPUDirect-TCPXO |
A3 High
Machine type | GPU count | GPU memory* (GB HBM3) | vCPU count† | VM memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps)‡ | Network protocol |
---|---|---|---|---|---|---|---|---|
a3-highgpu-1g | 1 | 80 | 26 | 234 | 750 | 1 | 25 | GPUDirect-TCPX |
a3-highgpu-2g | 2 | 160 | 52 | 468 | 1,500 | 1 | 50 | GPUDirect-TCPX |
a3-highgpu-4g | 4 | 320 | 104 | 936 | 3,000 | 1 | 100 | GPUDirect-TCPX |
a3-highgpu-8g | 8 | 640 | 208 | 1,872 | 6,000 | 5 | 1,000 | GPUDirect-TCPX |
A3 Edge
Machine type | GPU count | GPU memory* (GB HBM3) | vCPU count† | VM memory (GB) | Attached Local SSD (GiB) | Physical NIC count | Maximum network bandwidth (Gbps)‡ | Network protocol |
---|---|---|---|---|---|---|---|---|
a3-edgegpu-8g | 8 | 640 | 208 | 1,872 | 6,000 | 5 | | GPUDirect-TCPX |
*GPU memory is the memory on a GPU device that can be used for
temporary storage of data. It is separate from the VM's memory and is
specifically designed to handle the higher bandwidth demands of your
graphics-intensive workloads.
†A vCPU is implemented as a single hardware hyper-thread on one of
the available CPU platforms.
‡Maximum egress bandwidth cannot exceed the number given. Actual
egress bandwidth depends on the destination IP address and other factors.
See Network bandwidth.
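Because each A3 machine type has a fixed GPU count, choosing a shape comes down to a table lookup. The following is a minimal sketch, not an official tool: the dictionary and function names are hypothetical, and the figures are copied from the A3 High table above. It picks the smallest A3 High machine type that provides a requested number of H100 GPUs and derives the total GPU memory as GPU count × 80 GB.

```python
# Values copied from the A3 High table; all names here are illustrative.
H100_MEMORY_GB = 80

A3_HIGH_GPU_COUNTS = {
    "a3-highgpu-1g": 1,
    "a3-highgpu-2g": 2,
    "a3-highgpu-4g": 4,
    "a3-highgpu-8g": 8,
}


def smallest_a3_high(min_gpus: int) -> str:
    """Return the smallest A3 High machine type with at least min_gpus GPUs."""
    candidates = [
        (count, name)
        for name, count in A3_HIGH_GPU_COUNTS.items()
        if count >= min_gpus
    ]
    if not candidates:
        raise ValueError(f"No single A3 High machine type has {min_gpus} GPUs")
    # Tuples sort by GPU count first, so min() yields the smallest fit.
    return min(candidates)[1]


def total_gpu_memory_gb(machine_type: str) -> int:
    """Total GPU memory is the fixed GPU count times 80 GB per H100."""
    return A3_HIGH_GPU_COUNTS[machine_type] * H100_MEMORY_GB
```

For example, a request for three GPUs resolves to `a3-highgpu-4g`, because no A3 High machine type has exactly three GPUs.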
A2 machine series
To use NVIDIA A100 GPUs on Google Cloud, you must deploy an A2 accelerator-optimized machine. Each A2 machine type has a fixed GPU count, vCPU count, and memory size.
The A2 machine series is available in two types:
- A2 Ultra: these machine types have A100 80GB GPUs (nvidia-a100-80gb) and Local SSD disks attached.
- A2 Standard: these machine types have A100 40GB GPUs (nvidia-tesla-a100) attached.
A2 Ultra
Machine type | GPU count | GPU memory* (GB HBM2e) | vCPU count† | VM memory (GB) | Attached Local SSD (GiB) | Maximum network bandwidth (Gbps)‡ |
---|---|---|---|---|---|---|
a2-ultragpu-1g | 1 | 80 | 12 | 170 | 375 | 24 |
a2-ultragpu-2g | 2 | 160 | 24 | 340 | 750 | 32 |
a2-ultragpu-4g | 4 | 320 | 48 | 680 | 1,500 | 50 |
a2-ultragpu-8g | 8 | 640 | 96 | 1,360 | 3,000 | 100 |
A2 Standard
Machine type | GPU count | GPU memory* (GB HBM2) | vCPU count† | VM memory (GB) | Local SSD supported | Maximum network bandwidth (Gbps)‡ |
---|---|---|---|---|---|---|
a2-highgpu-1g | 1 | 40 | 12 | 85 | Yes | 24 |
a2-highgpu-2g | 2 | 80 | 24 | 170 | Yes | 32 |
a2-highgpu-4g | 4 | 160 | 48 | 340 | Yes | 50 |
a2-highgpu-8g | 8 | 320 | 96 | 680 | Yes | 100 |
a2-megagpu-16g | 16 | 640 | 96 | 1,360 | Yes | 100 |
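*GPU memory is the memory available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
†A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
‡Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. See Network bandwidth.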
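The A3 and A2 sections pair each machine type family with one accelerator type name. As a hedged sketch of how that pairing could be encoded, the following lookup maps a machine type to its accelerator type by prefix. The prefixes and accelerator names come from the sections above; the dictionary and function names are hypothetical.

```python
# Prefix-to-accelerator pairs taken from the A3 and A2 sections of this
# document; the helper itself is illustrative, not part of any SDK.
ACCELERATOR_BY_PREFIX = {
    "a3-ultragpu-": "nvidia-h200-141gb",
    "a3-megagpu-": "nvidia-h100-mega-80gb",
    "a3-highgpu-": "nvidia-h100-80gb",
    "a3-edgegpu-": "nvidia-h100-80gb",
    "a2-ultragpu-": "nvidia-a100-80gb",
    "a2-highgpu-": "nvidia-tesla-a100",
    "a2-megagpu-": "nvidia-tesla-a100",
}


def accelerator_type_for(machine_type: str) -> str:
    """Return the accelerator type attached to an A3 or A2 machine type."""
    for prefix, accelerator in ACCELERATOR_BY_PREFIX.items():
        if machine_type.startswith(prefix):
            return accelerator
    raise KeyError(f"Not an A3 or A2 machine type: {machine_type}")
```

Note that `a2-megagpu-16g` belongs to A2 Standard, so it resolves to the 40GB accelerator type `nvidia-tesla-a100`.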
G2 machine series
To use NVIDIA L4 GPUs (nvidia-l4 or nvidia-l4-vws), you must deploy a G2 accelerator-optimized machine.
Each G2 machine type has a fixed number of NVIDIA L4 GPUs and vCPUs attached. Each G2 machine type also has a default memory and a custom memory range. The custom memory range defines the amount of memory that you can allocate to your VM for each machine type. You can specify your custom memory during VM creation.
Machine type | GPU count | GPU memory* (GB GDDR6) | vCPU count† | Default VM memory (GB) | Custom VM memory range (GB) | Max Local SSD supported (GiB) | Maximum network bandwidth (Gbps)‡ |
---|---|---|---|---|---|---|---|
g2-standard-4 | 1 | 24 | 4 | 16 | 16 to 32 | 375 | 10 |
g2-standard-8 | 1 | 24 | 8 | 32 | 32 to 54 | 375 | 16 |
g2-standard-12 | 1 | 24 | 12 | 48 | 48 to 54 | 375 | 16 |
g2-standard-16 | 1 | 24 | 16 | 64 | 54 to 64 | 375 | 32 |
g2-standard-24 | 2 | 48 | 24 | 96 | 96 to 108 | 750 | 32 |
g2-standard-32 | 1 | 24 | 32 | 128 | 96 to 128 | 375 | 32 |
g2-standard-48 | 4 | 96 | 48 | 192 | 192 to 216 | 1,500 | 50 |
g2-standard-96 | 8 | 192 | 96 | 384 | 384 to 432 | 3,000 | 100 |
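*GPU memory is the memory available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
†A vCPU is implemented as a single hardware hyper-thread on one of the available CPU platforms.
‡Maximum egress bandwidth cannot exceed the number given. Actual egress bandwidth depends on the destination IP address and other factors. See Network bandwidth.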
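The default and custom memory columns of the G2 table lend themselves to a simple pre-flight check. The following sketch, with hypothetical names, returns the default memory when no custom value is given and rejects custom values outside the listed range. It assumes the range bounds are inclusive, which the table does not state explicitly.

```python
# (default GB, custom min GB, custom max GB), copied from the G2 table.
G2_MEMORY_GB = {
    "g2-standard-4": (16, 16, 32),
    "g2-standard-8": (32, 32, 54),
    "g2-standard-12": (48, 48, 54),
    "g2-standard-16": (64, 54, 64),
    "g2-standard-24": (96, 96, 108),
    "g2-standard-32": (128, 96, 128),
    "g2-standard-48": (192, 192, 216),
    "g2-standard-96": (384, 384, 432),
}


def resolve_g2_memory(machine_type, custom_memory_gb=None):
    """Return the VM memory in GB, validating any custom value."""
    default, low, high = G2_MEMORY_GB[machine_type]
    if custom_memory_gb is None:
        return default
    # Assumption: both ends of the custom range are allowed values.
    if not low <= custom_memory_gb <= high:
        raise ValueError(
            f"{machine_type} supports {low} to {high} GB, "
            f"got {custom_memory_gb} GB"
        )
    return custom_memory_gb
```

For instance, `resolve_g2_memory("g2-standard-8", 48)` is accepted, while 64 GB on the same machine type raises a `ValueError`.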
N1 machine series
You can attach the following GPU models to any N1 machine type except the N1 shared-core machine types.
N1 VMs with lower numbers of GPUs are limited to a maximum number of vCPUs. In general, a higher number of GPUs lets you create VM instances with a higher number of vCPUs and memory.
N1+T4 GPUs
You can attach NVIDIA T4 GPUs to N1 general-purpose VMs with the following VM configurations.
Accelerator type | GPU count | GPU memory* (GB GDDR6) | vCPU count | VM memory (GB) | Local SSD supported |
---|---|---|---|---|---|
nvidia-tesla-t4 or nvidia-tesla-t4-vws | 1 | 16 | 1 to 48 | 1 to 312 | Yes |
nvidia-tesla-t4 or nvidia-tesla-t4-vws | 2 | 32 | 1 to 48 | 1 to 312 | Yes |
nvidia-tesla-t4 or nvidia-tesla-t4-vws | 4 | 64 | 1 to 96 | 1 to 624 | Yes |
*GPU memory is the memory available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
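Unlike accelerator-optimized VMs, an N1 VM needs the GPU attachment spelled out in the instance definition. The sketch below builds a partial request body in the shape that the Compute Engine `instances.insert` REST method is understood to accept (`machineType`, `guestAccelerators`, `scheduling`); verify the field names against the API reference before relying on them. The function name and all argument values are hypothetical, and a real request also needs disks and network interfaces. Setting `onHostMaintenance` to `TERMINATE` is an assumption from outside this document, based on GPU VMs not supporting live migration.

```python
def n1_t4_instance_body(project, zone, name, machine_type, gpu_count,
                        virtual_workstation=False):
    """Build a partial instance body that attaches T4 GPUs to an N1 VM."""
    # Shared-core types (f1-micro, g1-small) lack the n1- prefix, so this
    # check also excludes them.
    if not machine_type.startswith("n1-"):
        raise ValueError("T4 GPUs attach to N1 machine types only")
    if gpu_count not in (1, 2, 4):
        raise ValueError("The T4 table lists GPU counts of 1, 2, or 4")
    accelerator = (
        "nvidia-tesla-t4-vws" if virtual_workstation else "nvidia-tesla-t4"
    )
    return {
        "name": name,
        "machineType": f"zones/{zone}/machineTypes/{machine_type}",
        "guestAccelerators": [
            {
                "acceleratorType": (
                    f"projects/{project}/zones/{zone}"
                    f"/acceleratorTypes/{accelerator}"
                ),
                "acceleratorCount": gpu_count,
            }
        ],
        # Assumption: GPU VMs must stop rather than migrate on maintenance.
        "scheduling": {"onHostMaintenance": "TERMINATE"},
    }
```

A call such as `n1_t4_instance_body("my-project", "us-central1-a", "gpu-vm", "n1-standard-8", 2)` yields a body with two `nvidia-tesla-t4` accelerators.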
N1+P4 GPUs
You can attach NVIDIA P4 GPUs to N1 general-purpose VMs with the following VM configurations.
Accelerator type | GPU count | GPU memory* (GB GDDR5) | vCPU count | VM memory (GB) | Local SSD supported† |
---|---|---|---|---|---|
nvidia-tesla-p4 or nvidia-tesla-p4-vws | 1 | 8 | 1 to 24 | 1 to 156 | Yes |
nvidia-tesla-p4 or nvidia-tesla-p4-vws | 2 | 16 | 1 to 48 | 1 to 312 | Yes |
nvidia-tesla-p4 or nvidia-tesla-p4-vws | 4 | 32 | 1 to 96 | 1 to 624 | Yes |
*GPU memory is the memory that is available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
†For VMs with attached NVIDIA P4 GPUs, Local SSD disks are only supported in zones us-central1-c and northamerica-northeast1-b.
N1+V100 GPUs
You can attach NVIDIA V100 GPUs to N1 general-purpose VMs with the following VM configurations.
Accelerator type | GPU count | GPU memory* (GB HBM2) | vCPU count | VM memory (GB) | Local SSD supported† |
---|---|---|---|---|---|
nvidia-tesla-v100 | 1 | 16 | 1 to 12 | 1 to 78 | Yes |
nvidia-tesla-v100 | 2 | 32 | 1 to 24 | 1 to 156 | Yes |
nvidia-tesla-v100 | 4 | 64 | 1 to 48 | 1 to 312 | Yes |
nvidia-tesla-v100 | 8 | 128 | 1 to 96 | 1 to 624 | Yes |
*GPU memory is the memory available on a GPU device that can be used
for temporary storage of data. It is separate from the VM's memory and is
specifically designed to handle the higher bandwidth demands of your
graphics-intensive workloads.
†For VMs with attached NVIDIA V100 GPUs, Local SSD disks aren't supported in us-east1-c.
N1+P100 GPUs
You can attach NVIDIA P100 GPUs to N1 general-purpose VMs with the following VM configurations.
For some NVIDIA P100 configurations, the maximum vCPU count and memory that are available depend on the zone in which the GPU resource is running.
Accelerator type | GPU count | GPU memory* (GB HBM2) | vCPU count | VM memory (GB) | Local SSD supported |
---|---|---|---|---|---|
nvidia-tesla-p100 or nvidia-tesla-p100-vws | 1 | 16 | 1 to 16 | 1 to 104 | Yes |
nvidia-tesla-p100 or nvidia-tesla-p100-vws | 2 | 32 | 1 to 32 | 1 to 208 | Yes |
nvidia-tesla-p100 or nvidia-tesla-p100-vws | 4 | 64 | 1 to 64 or 1 to 96, depending on the zone | 1 to 208 or 1 to 624, depending on the zone | Yes |
*GPU memory is the memory available on a GPU device that can be used for temporary storage of data. It is separate from the VM's memory and is specifically designed to handle the higher bandwidth demands of your graphics-intensive workloads.
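The N1 tables show that vCPU and memory ceilings rise with the number of attached GPUs. As an illustration, the following sketch finds the smallest GPU count whose limits cover a requested VM size. The limits are copied from the four N1 tables above; the names are hypothetical, and for four P100 GPUs the sketch uses the lower, zone-independent limits (64 vCPUs, 208 GB) as a conservative assumption.

```python
# accelerator type -> {GPU count: (max vCPUs, max VM memory in GB)}
N1_GPU_LIMITS = {
    "nvidia-tesla-t4": {1: (48, 312), 2: (48, 312), 4: (96, 624)},
    "nvidia-tesla-p4": {1: (24, 156), 2: (48, 312), 4: (96, 624)},
    "nvidia-tesla-v100": {
        1: (12, 78), 2: (24, 156), 4: (48, 312), 8: (96, 624),
    },
    # Four P100 GPUs allow more in some zones; the lower bound is used here.
    "nvidia-tesla-p100": {1: (16, 104), 2: (32, 208), 4: (64, 208)},
}


def min_gpus_for(accelerator_type, vcpus, memory_gb):
    """Return the smallest GPU count that permits the VM size, or None."""
    limits = N1_GPU_LIMITS[accelerator_type]
    for count in sorted(limits):
        max_vcpus, max_memory_gb = limits[count]
        if vcpus <= max_vcpus and memory_gb <= max_memory_gb:
            return count
    return None
```

For example, a 32-vCPU, 120 GB VM needs at least four V100 GPUs, because one or two V100 GPUs cap the VM at 12 or 24 vCPUs.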
NVIDIA RTX Virtual Workstations (vWS) for graphics workloads
If you have graphics-intensive workloads, such as 3D visualization, you can create virtual workstations that use NVIDIA RTX Virtual Workstations (vWS) (formerly known as NVIDIA GRID). When you create a virtual workstation, an NVIDIA RTX Virtual Workstation (vWS) license is automatically added to your VM.
For information about pricing for virtual workstations, see the GPU pricing page.
For graphics workloads, the following NVIDIA RTX Virtual Workstation (vWS) models are available:
- G2 machine series: for G2 machine types, you can enable NVIDIA L4 Virtual Workstations (vWS): nvidia-l4-vws
- N1 machine series: for N1 machine types, you can enable the following virtual workstations:
  - NVIDIA T4 Virtual Workstations: nvidia-tesla-t4-vws
  - NVIDIA P100 Virtual Workstations: nvidia-tesla-p100-vws
  - NVIDIA P4 Virtual Workstations: nvidia-tesla-p4-vws
General comparison chart
The following table describes the GPU memory size, feature availability, and ideal workload types of different GPU models that are available on Compute Engine.
GPU model | GPU memory | Interconnect | NVIDIA RTX Virtual Workstation (vWS) support | Best used for |
---|---|---|---|---|
H200 141GB | 141 GB HBM3e @ 4.8 TBps | NVLink Full Mesh @ 900 GBps | No | Large models with massive data tables for ML Training, Inference, HPC, BERT, DLRM |
H100 80GB | 80 GB HBM3 @ 3.35 TBps | NVLink Full Mesh @ 900 GBps | No | Large models with massive data tables for ML Training, Inference, HPC, BERT, DLRM |
A100 80GB | 80 GB HBM2e @ 1.9 TBps | NVLink Full Mesh @ 600 GBps | No | Large models with massive data tables for ML Training, Inference, HPC, BERT, DLRM |
A100 40GB | 40 GB HBM2 @ 1.6 TBps | NVLink Full Mesh @ 600 GBps | No | ML Training, Inference, HPC |
L4 | 24 GB GDDR6 @ 300 GBps | N/A | Yes | ML Inference, Training, Remote Visualization Workstations, Video Transcoding, HPC |
T4 | 16 GB GDDR6 @ 320 GBps | N/A | Yes | ML Inference, Training, Remote Visualization Workstations, Video Transcoding |
V100 | 16 GB HBM2 @ 900 GBps | NVLink Ring @ 300 GBps | No | ML Training, Inference, HPC |
P4 | 8 GB GDDR5 @ 192 GBps | N/A | Yes | Remote Visualization Workstations, ML Inference, and Video Transcoding |
P100 | 16 GB HBM2 @ 732 GBps | N/A | Yes | ML Training, Inference, HPC, Remote Visualization Workstations |
To compare GPU pricing for the different GPU models and regions that are available on Compute Engine, see GPU pricing.
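When shortlisting GPU models from the comparison chart, two common filters are per-GPU memory and vWS availability. The following sketch encodes both. The memory sizes come from the chart; the vWS flags are derived from the accelerator types with a `-vws` suffix listed in the virtual workstation section of this document. The names `GPU_MODELS` and `models_with` are hypothetical.

```python
# model -> (GPU memory in GB, vWS accelerator type available)
GPU_MODELS = {
    "H200 141GB": (141, False),
    "H100 80GB": (80, False),
    "A100 80GB": (80, False),
    "A100 40GB": (40, False),
    "L4": (24, True),
    "T4": (16, True),
    "V100": (16, False),
    "P4": (8, True),
    "P100": (16, True),
}


def models_with(min_memory_gb, need_vws=False):
    """Return model names, sorted alphabetically, that meet both filters."""
    return sorted(
        name
        for name, (memory_gb, has_vws) in GPU_MODELS.items()
        if memory_gb >= min_memory_gb and (has_vws or not need_vws)
    )
```

For example, `models_with(16, need_vws=True)` narrows the list to L4, P100, and T4.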
Performance comparison chart
The following table describes the performance specifications of different GPU models that are available on Compute Engine.
Compute performance
GPU model | FP64 | FP32 | FP16 | INT8 |
---|---|---|---|---|
H200 141GB | 34 TFLOPS | 67 TFLOPS | | |
H100 80GB | 34 TFLOPS | 67 TFLOPS | | |
A100 80GB | 9.7 TFLOPS | 19.5 TFLOPS | | |
A100 40GB | 9.7 TFLOPS | 19.5 TFLOPS | | |
L4 | 0.5 TFLOPS* | 30.3 TFLOPS | | |
T4 | 0.25 TFLOPS* | 8.1 TFLOPS | | |
V100 | 7.8 TFLOPS | 15.7 TFLOPS | | |
P4 | 0.2 TFLOPS* | 5.5 TFLOPS | | 22 TOPS† |
P100 | 4.7 TFLOPS | 9.3 TFLOPS | 18.7 TFLOPS | |
*To allow FP64 code to work correctly, a small number of FP64
hardware units are included in the T4, L4, and P4 GPU architecture.
†TeraOperations per Second.
Tensor core performance
GPU model | FP64 | TF32 | Mixed-precision FP16/FP32 | INT8 | INT4 | FP8 |
---|---|---|---|---|---|---|
H200 141GB | 67 TFLOPS | 989 TFLOPS† | 1,979 TFLOPS*, † | 3,958 TOPS† | | 3,958 TFLOPS† |
H100 80GB | 67 TFLOPS | 989 TFLOPS† | 1,979 TFLOPS*, † | 3,958 TOPS† | | 3,958 TFLOPS† |
A100 80GB | 19.5 TFLOPS | 156 TFLOPS | 312 TFLOPS* | 624 TOPS | 1,248 TOPS | |
A100 40GB | 19.5 TFLOPS | 156 TFLOPS | 312 TFLOPS* | 624 TOPS | 1,248 TOPS | |
L4 | | 120 TFLOPS† | 242 TFLOPS*, † | 485 TOPS† | | 485 TFLOPS† |
T4 | | | 65 TFLOPS | 130 TOPS | 260 TOPS | |
V100 | | | 125 TFLOPS | | | |
P4 | | | | | | |
P100 | | | | | | |
*For mixed-precision training, NVIDIA H200, H100, A100, and L4 GPUs also support the bfloat16 data type.
†H200, H100, and L4 GPUs support structural sparsity, which you can use to double the performance value. The values shown are with sparsity; without sparsity, the values are one-half of those shown.
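Because the H100 and L4 Tensor core figures are quoted with sparsity, dense (non-sparse) throughput is half of each listed value. A minimal sketch of that arithmetic follows, using the H100 and L4 figures from the Tensor core table; the dictionary and function names are hypothetical.

```python
# Values as listed with sparsity, in TFLOPS (TOPS for INT8).
SPARSE_TENSOR_PERF = {
    "H100 80GB": {"TF32": 989, "FP16/FP32": 1979, "INT8": 3958, "FP8": 3958},
    "L4": {"TF32": 120, "FP16/FP32": 242, "INT8": 485, "FP8": 485},
}


def dense_perf(model):
    """Halve each sparse figure to get the value without sparsity."""
    return {
        data_type: value / 2
        for data_type, value in SPARSE_TENSOR_PERF[model].items()
    }
```

For example, the H100 TF32 figure of 989 TFLOPS with sparsity corresponds to 494.5 TFLOPS without it.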
What's next?
- For more information about GPUs on Compute Engine, see About GPUs.
- Review the GPU regions and zones availability.
- Review Network bandwidths and GPUs.
- Learn about GPU pricing.