Resource requests in Autopilot

This page describes how Google Kubernetes Engine (GKE) Autopilot manages the values of workload resource requests, such as CPU, memory, or ephemeral storage. This page includes the following information, which you can use to plan efficient, stable, and cost-effective workloads:

Default values that Autopilot applies to Pods that don't specify values.
Minimum and maximum values that Autopilot enforces for resource requests.
How the default, minimum, and maximum values vary based on the hardware that your Pods request.

This page is for Operators and Developers who provision and configure cloud resources, and deploy workloads. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.

Before reading this page, ensure that you're familiar with Kubernetes resource management concepts.

Overview of resource requests in Autopilot

Autopilot uses the resource requests that you specify in your workload configuration to configure the nodes that run your workloads. Autopilot enforces minimum and maximum resource requests based on the compute class or the hardware configuration that your workloads use. If you don't specify requests for some containers, Autopilot assigns default values to let those containers run correctly.

When you deploy a workload in an Autopilot cluster, GKE validates the workload configuration against the allowed minimum and maximum values for the selected compute class or hardware configuration (such as GPUs). If your requests are less than the minimum, Autopilot automatically modifies your workload configuration to bring your requests within the allowed range. If your requests are greater than the maximum, Autopilot rejects your workload and displays an error message.

The following list summarizes the categories of resource requests:

Default resource requests: Autopilot adds these if you don't specify your own requests for workloads.
Minimum and maximum resource requests: Autopilot validates your specified requests to ensure that they're within these limits. If your requests are less than the minimum, Autopilot modifies your workload requests. If your requests are greater than the maximum, Autopilot rejects your workload.
Workload separation and extended duration requests: Autopilot has different default values and different minimum values for workloads that you separate from each other, or for Pods that get extended protection from GKE-initiated eviction.
Resource requests for DaemonSets: Autopilot has different default, minimum, and maximum values for containers in DaemonSets.

How to request resources

In Autopilot, you request resources in your Pod specification. The supported minimum and maximum resources that you can request change based on the hardware configuration of the node on which the Pods run. To learn how to request specific hardware configurations, refer to the following pages:

Choose compute classes for Autopilot Pods
Deploy GPU workloads in Autopilot
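
As a minimal sketch of requesting resources in a Pod specification, the following manifest sets CPU, memory, and ephemeral storage requests on a single container. The Pod name, container name, and image are hypothetical placeholders chosen for illustration, and the request values are arbitrary examples rather than recommendations.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-app              # hypothetical Pod name
spec:
  containers:
  - name: app                    # hypothetical container name
    image: nginx                 # any container image; used here only for illustration
    resources:
      requests:
        cpu: "500m"              # 0.5 vCPU
        memory: "2Gi"            # 2 GiB
        ephemeral-storage: "1Gi" # 1 GiB
```

Because no compute class or hardware selector is set, this Pod would be validated against the general-purpose limits described later on this page.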

Default resource requests

If you don't specify resource requests for some containers in a Pod, Autopilot applies default values. These defaults are suitable for many smaller workloads.

Additionally, Autopilot applies the following default resource requests regardless of the selected compute class or hardware configuration:

Containers in DaemonSets
- CPU: 50 mCPU
- Memory: 100 MiB
- Ephemeral storage: 100 MiB
All other containers
- Ephemeral storage: 1 GiB

For more information about Autopilot cluster limits, see Quotas and limits.

Default requests for compute classes

Autopilot applies the following default values to resources that are not defined in the Pod specification for Pods that run on compute classes. If you only set one of the requests and leave the other blank, GKE uses the CPU:memory ratio defined in the Minimum and maximum requests section to set the missing request to a value that complies with the ratio.

Compute class	Resource	Default request
General-purpose (default)	CPU	0.5 vCPU
General-purpose (default)	Memory	2 GiB
Accelerator	See the Default requests for accelerators section.
Balanced	CPU	0.5 vCPU
Balanced	Memory	2 GiB
Performance	CPU	C3 machine series: 2 vCPU; C3 machine series with Local SSD: 2 vCPU; C3D machine series: 2 vCPU; C3D machine series with Local SSD: 4 vCPU; C4D machine series (1.33.0-gke.1439000 or later): no default requests enforced; C4D machine series with Local SSD (1.33.1-gke.1171000 or later): no default requests enforced; H3 machine series: 80 vCPU; C2 machine series: 2 vCPU; C2D machine series: 2 vCPU; T2A machine series: 2 vCPU; T2D machine series: 2 vCPU
Performance	Memory	C3 machine series: 8 GiB; C3 machine series with Local SSD: 8 GiB; C3D machine series: 8 GiB; C3D machine series with Local SSD: 16 GiB; C4D machine series (1.33.0-gke.1439000 or later): no default requests enforced; C4D machine series with Local SSD (1.33.1-gke.1171000 or later): no default requests enforced; H3 machine series: 320 GiB; C2 machine series: 8 GiB; C2D machine series: 8 GiB; T2A machine series: 8 GiB; T2D machine series: 8 GiB
Performance	Ephemeral storage	C3 machine series: 1 GiB; C3 machine series with Local SSD: 1 GiB; C3D machine series: 1 GiB; C3D machine series with Local SSD: 1 GiB; C4D machine series (1.33.0-gke.1439000 or later): no default requests enforced; C4D machine series with Local SSD (1.33.1-gke.1171000 or later): no default requests enforced; H3 machine series: 1 GiB; C2 machine series: 1 GiB; C2D machine series: 1 GiB; T2A machine series: 1 GiB; T2D machine series: 1 GiB
Scale-Out	CPU	0.5 vCPU
Scale-Out	Memory	2 GiB
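
To illustrate how the defaults apply, here is a hedged sketch of a Pod that omits resource requests entirely. The names and image are hypothetical. The comments show the values that the tables on this page list for the general-purpose class. They are not output captured from a real cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: no-requests-example   # hypothetical Pod name
spec:
  containers:
  - name: app                 # hypothetical container name
    image: nginx              # any container image; used here only for illustration
    # No resources block is specified. Based on the tables on this page,
    # the general-purpose defaults would be:
    #   cpu: 0.5 vCPU
    #   memory: 2 GiB
    #   ephemeral-storage: 1 GiB (default for containers outside DaemonSets)
```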

Default requests for accelerators

In version 1.29.4-gke.1427000 and later, Autopilot doesn't enforce default requests for accelerators. To learn more, see Pricing.

The following table describes the default values that GKE assigns to Pods that don't specify values in the requests field of the Pod specification. This table applies to Pods that run on versions earlier than 1.29.4-gke.1427000 that use the Accelerator compute class, which is the recommended way to run accelerators in Autopilot clusters.

Accelerator	Resource	Total default request
NVIDIA B200 GPUs `nvidia-b200`	No default requests enforced.
NVIDIA H200 (141GB) GPUs `nvidia-h200-141gb`	No default requests enforced.
NVIDIA H100 Mega (80GB) GPUs `nvidia-h100-mega-80gb`	CPU	8 GPUs: 200 vCPU
NVIDIA H100 Mega (80GB) GPUs `nvidia-h100-mega-80gb`	Memory	8 GPUs: 1400 GiB
NVIDIA H100 Mega (80GB) GPUs `nvidia-h100-mega-80gb`	Ephemeral storage	8 GPUs: 1 GiB
NVIDIA H100 (80GB) GPUs `nvidia-h100-80gb`	CPU	8 GPUs: 200 vCPU
NVIDIA H100 (80GB) GPUs `nvidia-h100-80gb`	Memory	8 GPUs: 1400 GiB
NVIDIA H100 (80GB) GPUs `nvidia-h100-80gb`	Ephemeral storage	8 GPUs: 1 GiB
NVIDIA A100 (40GB) GPUs `nvidia-tesla-a100`	CPU	1 GPU: 9 vCPU; 2 GPUs: 20 vCPU; 4 GPUs: 44 vCPU; 8 GPUs: 92 vCPU; 16 GPUs: 92 vCPU
NVIDIA A100 (40GB) GPUs `nvidia-tesla-a100`	Memory	1 GPU: 60 GiB; 2 GPUs: 134 GiB; 4 GPUs: 296 GiB; 8 GPUs: 618 GiB; 16 GPUs: 1250 GiB
NVIDIA A100 (80GB) GPUs `nvidia-a100-80gb`	CPU	1 GPU: 9 vCPU; 2 GPUs: 20 vCPU; 4 GPUs: 44 vCPU; 8 GPUs: 92 vCPU
NVIDIA A100 (80GB) GPUs `nvidia-a100-80gb`	Memory	1 GPU: 134 GiB; 2 GPUs: 296 GiB; 4 GPUs: 618 GiB; 8 GPUs: 1250 GiB
NVIDIA A100 (80GB) GPUs `nvidia-a100-80gb`	Ephemeral storage	1 GPU: 1 GiB; 2 GPUs: 1 GiB; 4 GPUs: 1 GiB; 8 GPUs: 1 GiB
NVIDIA L4 GPUs `nvidia-l4`	CPU	1 GPU: 2 vCPU; 2 GPUs: 21 vCPU; 4 GPUs: 45 vCPU; 8 GPUs: 93 vCPU
NVIDIA L4 GPUs `nvidia-l4`	Memory	1 GPU: 7 GiB; 2 GPUs: 78 GiB; 4 GPUs: 170 GiB; 8 GPUs: 355 GiB
NVIDIA T4 GPUs `nvidia-tesla-t4`	CPU	1 GPU: 0.5 vCPU; 4 GPUs: 0.5 vCPU
NVIDIA T4 GPUs `nvidia-tesla-t4`	Memory	1 GPU: 2 GiB; 4 GPUs: 2 GiB
TPU Trillium (v6e) `tpu-v6e-slice` (single-host)	CPU	All topologies: 1 mCPU
TPU Trillium (v6e) `tpu-v6e-slice` (single-host)	Memory	All topologies: 1 MiB
TPU Trillium (v6e) `tpu-v6e-slice` (multi-host)	CPU	All topologies: 1 mCPU
TPU Trillium (v6e) `tpu-v6e-slice` (multi-host)	Memory	All topologies: 1 MiB
TPU v5e `tpu-v5-lite-podslice` (multi-host)	CPU	All topologies: 1 mCPU
TPU v5e `tpu-v5-lite-podslice` (multi-host)	Memory	All topologies: 1 MiB
TPU v5p `tpu-v5p-slice`	CPU	All topologies: 1 mCPU
TPU v5p `tpu-v5p-slice`	Memory	All topologies: 1 MiB
TPU v4 `tpu-v4-podslice`	CPU	All topologies: 1 mCPU
TPU v4 `tpu-v4-podslice`	Memory	All topologies: 1 MiB

Supported GPUs without the Accelerator compute class

If you don't use the Accelerator compute class, only the following GPUs are supported. The default resource requests for these GPUs are the same as in the Accelerator compute class:

NVIDIA A100 (40GB)
NVIDIA A100 (80GB)
NVIDIA L4
NVIDIA Tesla T4

Minimum and maximum resource requests

The total resources requested by your deployment configuration should be within the supported minimum and maximum values that Autopilot allows. The following conditions apply:

Ephemeral storage requests:
- Ephemeral storage uses the VM boot disk unless your nodes have local SSDs attached.
  
  Compute hardware that includes local SSDs, like A100 (80GB) GPUs, H100 (80GB) GPUs, or the Z3 machine series, supports a maximum request that's equal to the size of the local SSD minus any system overhead. For information about this system overhead, see Ephemeral storage backed by local SSDs.
- In GKE version 1.29.3-gke.1038000 and later, Performance class Pods and hardware accelerator Pods support a maximum ephemeral storage request of 56 TiB unless the hardware includes local SSDs.
  
  In all other Autopilot Pods regardless of the GKE version, the total ephemeral storage request across all of the containers in the Pod must be between 10 MiB and 10 GiB unless otherwise specified.
- For larger volumes, use generic ephemeral volumes, which provide equivalent functionality and performance to ephemeral storage but with significantly more flexibility as they can be used with any GKE storage option. For example, the maximum size for a generic ephemeral volume using pd-balanced is 64 TiB.
For DaemonSet Pods, the minimum resource requests are as follows:
- Clusters that support bursting: 1 mCPU per Pod, 2 MiB of memory per Pod, and 10 MiB of ephemeral storage per container in the Pod.
- Clusters that don't support bursting: 10 mCPU per Pod, 10 MiB of memory per Pod, and 10 MiB of ephemeral storage per container in the Pod.
To check whether your cluster supports bursting, see Bursting availability in GKE.
If your cluster supports bursting, Autopilot doesn't enforce 0.25 vCPU increments for your Pod CPU requests. If your cluster doesn't support bursting, Autopilot rounds up your CPU requests to the nearest 0.25 vCPU. To check whether your cluster supports bursting, see Bursting availability in GKE.
The CPU:memory ratio must be within the allowed range for the selected compute class or hardware configuration. If your CPU:memory ratio is outside the allowed range, Autopilot automatically increases the smaller resource. For example, if you request 1 vCPU and 16 GiB of memory (1:16 ratio) for Pods running on the Scale-Out class, Autopilot increases the CPU request to 4 vCPUs, which changes the ratio to 1:4.
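
The Scale-Out example above can be sketched as the following manifest. The Pod name, container name, and image are hypothetical, and the `cloud.google.com/compute-class` node selector is assumed to be how the compute class is requested, as described in Choose compute classes for Autopilot Pods. The comment restates the adjustment from the example in the text. It is not output from a real cluster.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: scale-out-ratio-example   # hypothetical Pod name
spec:
  nodeSelector:
    cloud.google.com/compute-class: "Scale-Out"
  containers:
  - name: app                     # hypothetical container name
    image: nginx                  # any container image; used here only for illustration
    resources:
      requests:
        cpu: "1"                  # 1 vCPU : 16 GiB is a 1:16 ratio, outside the allowed 1:4
        memory: "16Gi"
        # Per the example in the text, Autopilot would raise the CPU request
        # to 4 vCPU so that the ratio becomes 4:16, which is 1:4.
```

Requesting 4 vCPU and 16 GiB directly would avoid the automatic adjustment and make the resulting cost visible in the manifest itself.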

Minimums and maximums for compute classes

The following table describes the minimum, maximum, and allowed CPU-to-memory ratio for each compute class that Autopilot supports:

Compute class	CPU:memory ratio (vCPU:GiB)	Resource	Minimum	Maximum
General-purpose (default)	Between 1:1 and 1:6.5	CPU	The value depends on whether your cluster supports bursting, as follows: clusters that support bursting: 50 mCPU; clusters that don't support bursting: 250 mCPU. To check whether your cluster supports bursting, see Bursting availability in GKE.	30 vCPU
General-purpose (default)	Between 1:1 and 1:6.5	Memory	The value depends on whether your cluster supports bursting, as follows: clusters that support bursting: 52 MiB; clusters that don't support bursting: 512 MiB. To check whether your cluster supports bursting, see Bursting availability in GKE.	110 GiB
Accelerator	See Minimums and maximums for accelerators
Balanced	Between 1:1 and 1:8	CPU	0.25 vCPU	222 vCPU. If minimum CPU platform selected: Intel platforms: 126 vCPU; AMD platforms: 222 vCPU
Balanced	Between 1:1 and 1:8	Memory	0.5 GiB	851 GiB. If minimum CPU platform selected: Intel platforms: 823 GiB; AMD platforms: 851 GiB
Performance	N/A	CPU	0.001 vCPU	C3 machine series: 174 vCPU; C3 machine series with Local SSD: 174 vCPU; C3D machine series: 358 vCPU; C3D machine series with Local SSD: 358 vCPU; C4D machine series (1.33.0-gke.1439000 or later): 382 vCPU; C4D machine series with Local SSD (1.33.1-gke.1171000 or later): 382 vCPU; H3 machine series: 86 vCPU; C2 machine series: 58 vCPU; C2D machine series: 110 vCPU; T2A machine series: 46 vCPU; T2D machine series: 58 vCPU
Performance	N/A	Memory	1 MiB	C3 machine series: 1,345 GiB; C3 machine series with Local SSD: 670 GiB; C3D machine series: 2750 GiB; C3D machine series with Local SSD: 1,375 GiB; C4D machine series (1.33.0-gke.1439000 or later): 2905 GiB; C4D machine series with Local SSD (1.33.1-gke.1171000 or later): 2905 GiB; H3 machine series: 330 GiB; C2 machine series: 218 GiB; C2D machine series: 835 GiB; T2A machine series: 172 GiB; T2D machine series: 218 GiB
Performance	N/A	Ephemeral storage	10 MiB	In GKE version 1.29.3-gke.1038000 and later, you can specify a maximum ephemeral storage request of 56 TiB. The C4D machine series is available with version 1.33.0-gke.1439000 or later and supports requests of up to 56 TiB with or without Local SSD. For versions earlier than 1.29.3-gke.1038000, the following limits apply: C3 machine series: 250 GiB; C3 machine series with Local SSD: 10,000 GiB; C3D machine series: 250 GiB; C3D machine series with Local SSD: 10,000 GiB; H3 machine series: 250 GiB; C2 machine series: 250 GiB; C2D machine series: 250 GiB; T2A machine series: 250 GiB; T2D machine series: 250 GiB
Scale-Out	Exactly 1:4	CPU	0.25 vCPU	`arm64`: 43 vCPU; `amd64`: 54 vCPU
Scale-Out	Exactly 1:4	Memory	1 GiB	`arm64`: 172 GiB; `amd64`: 216 GiB

To learn how to request compute classes in your Autopilot Pods, refer to Choose compute classes for Autopilot Pods.

Minimums and maximums for accelerators

The following sections describe the minimum, maximum, and allowed CPU-to-memory ratio for Pods that use hardware accelerators like GPUs and TPUs.

Unless specified, the maximum ephemeral storage supported is 122 GiB in versions 1.28.6-gke.1369000 or later, and 1.29.1-gke.1575000 or later. For earlier versions, the maximum ephemeral storage supported is 10 GiB.

Minimums and maximums for the Accelerator compute class

The following table shows the minimum and maximum resource requests for Pods that use the Accelerator compute class, which is the recommended way to run accelerators with GKE Autopilot clusters. In the Accelerator compute class, GKE doesn't enforce CPU-to-memory request ratios.

Accelerator type	Resource	Minimum	Maximum
NVIDIA B200 `nvidia-b200`	CPU	No minimum requests enforced	8 GPUs: 224 vCPU
NVIDIA B200 `nvidia-b200`	Memory	No minimum requests enforced	8 GPUs: 3968 GiB
NVIDIA B200 `nvidia-b200`	Ephemeral storage	No minimum requests enforced	8 GPUs: 10 TiB
NVIDIA H200 (141GB) `nvidia-h200-141gb`	CPU	No minimum requests enforced	8 GPUs: 224 vCPU
NVIDIA H200 (141GB) `nvidia-h200-141gb`	Memory	No minimum requests enforced	8 GPUs: 2952 GiB
NVIDIA H200 (141GB) `nvidia-h200-141gb`	Ephemeral storage	No minimum requests enforced	8 GPUs: 10 TiB (1.32.2-gke.1182000 or later); 8 GPUs: 2540 GiB (earlier than 1.32.2-gke.1182000)
NVIDIA H100 Mega (80GB) `nvidia-h100-mega-80gb`	CPU	8 GPUs: 0.001 vCPU	8 GPUs: 206 vCPU
NVIDIA H100 Mega (80GB) `nvidia-h100-mega-80gb`	Memory	8 GPUs: 1 MiB	8 GPUs: 1795 GiB
NVIDIA H100 Mega (80GB) `nvidia-h100-mega-80gb`	Ephemeral storage	8 GPUs: 10 MiB	8 GPUs: 5250 GiB
NVIDIA H100 (80GB) `nvidia-h100-80gb`	CPU	8 GPUs: 0.001 vCPU	8 GPUs: 206 vCPU
NVIDIA H100 (80GB) `nvidia-h100-80gb`	Memory	8 GPUs: 1 MiB	8 GPUs: 1795 GiB
NVIDIA H100 (80GB) `nvidia-h100-80gb`	Ephemeral storage	8 GPUs: 10 MiB	8 GPUs: 5250 GiB
NVIDIA A100 (40GB) `nvidia-tesla-a100`	CPU	0.001 vCPU	1 GPU: 11 vCPU; 2 GPUs: 22 vCPU; 4 GPUs: 46 vCPU; 8 GPUs: 94 vCPU; 16 GPUs: 94 vCPU. The sum of CPU requests of all DaemonSets that run on an A100 GPU node must not exceed 2 vCPU.
NVIDIA A100 (40GB) `nvidia-tesla-a100`	Memory	1 MiB	1 GPU: 74 GiB; 2 GPUs: 148 GiB; 4 GPUs: 310 GiB; 8 GPUs: 632 GiB; 16 GPUs: 1264 GiB. The sum of memory requests of all DaemonSets that run on an A100 GPU node must not exceed 14 GiB.
NVIDIA A100 (80GB) `nvidia-a100-80gb`	CPU	0.001 vCPU	1 GPU: 11 vCPU; 2 GPUs: 22 vCPU; 4 GPUs: 46 vCPU; 8 GPUs: 94 vCPU. The sum of CPU requests of all DaemonSets that run on an A100 (80GB) GPU node must not exceed 2 vCPU.
NVIDIA A100 (80GB) `nvidia-a100-80gb`	Memory	1 MiB	1 GPU: 148 GiB; 2 GPUs: 310 GiB; 4 GPUs: 632 GiB; 8 GPUs: 1264 GiB. The sum of memory requests of all DaemonSets that run on an A100 (80GB) GPU node must not exceed 14 GiB.
NVIDIA A100 (80GB) `nvidia-a100-80gb`	Ephemeral storage	512 MiB	1 GPU: 280 GiB; 2 GPUs: 585 GiB; 4 GPUs: 1220 GiB; 8 GPUs: 2540 GiB
NVIDIA L4 `nvidia-l4`	CPU	0.001 vCPU	1 GPU: 31 vCPU; 2 GPUs: 23 vCPU; 4 GPUs: 47 vCPU; 8 GPUs: 95 vCPU. The sum of CPU requests of all DaemonSets that run on an L4 GPU node must not exceed 2 vCPU.
NVIDIA L4 `nvidia-l4`	Memory	1 MiB	1 GPU: 115 GiB; 2 GPUs: 83 GiB; 4 GPUs: 177 GiB; 8 GPUs: 363 GiB. The sum of memory requests of all DaemonSets that run on an L4 GPU node must not exceed 14 GiB.
NVIDIA Tesla T4 `nvidia-tesla-t4`	CPU	0.001 vCPU	1 GPU: 46 vCPU; 2 GPUs: 46 vCPU; 4 GPUs: 94 vCPU
NVIDIA Tesla T4 `nvidia-tesla-t4`	Memory	1 MiB	1 GPU: 287.5 GiB; 2 GPUs: 287.5 GiB; 4 GPUs: 587.5 GiB
TPU v5e `tpu-v5-lite-podslice`	CPU	0.001 vCPU	1x1 topology: 24 vCPU; 2x2 topology: 112 vCPU; 2x4 topology (4-chip request): 112 vCPU; 2x4 topology (8-chip request): 224 vCPU; 4x4 topology: 112 vCPU; 4x8 topology: 112 vCPU; 8x8 topology: 112 vCPU; 8x16 topology: 112 vCPU; 16x16 topology: 112 vCPU
TPU v5e `tpu-v5-lite-podslice`	Memory	1 MiB	1x1 topology: 48 GiB; 2x2 topology: 192 GiB; 2x4 topology (4-chip request): 192 GiB; 2x4 topology (8-chip request): 384 GiB; 4x4 topology: 192 GiB; 4x8 topology: 192 GiB; 8x8 topology: 192 GiB; 8x16 topology: 192 GiB; 16x16 topology: 192 GiB
TPU v5e `tpu-v5-lite-podslice`	Ephemeral storage	10 MiB	56 TiB
TPU v5p `tpu-v5p-slice`	CPU	0.001 vCPU	280 vCPU
TPU v5p `tpu-v5p-slice`	Memory	1 MiB	448 GiB
TPU v5p `tpu-v5p-slice`	Ephemeral storage	10 MiB	56 TiB
TPU v4 `tpu-v4-podslice`	CPU	0.001 vCPU	240 vCPU
TPU v4 `tpu-v4-podslice`	Memory	1 MiB	407 GiB
TPU v4 `tpu-v4-podslice`	Ephemeral storage	10 MiB	56 TiB

To learn how to request GPUs in your Autopilot Pods, refer to Deploy GPU workloads in Autopilot.
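
As a rough sketch of a request that stays within the Accelerator compute class limits, the following Pod asks for one NVIDIA L4 GPU. The Pod name, container name, and `GPU_IMAGE` placeholder are hypothetical. The node selector keys and the `nvidia.com/gpu` resource name are assumptions based on how GKE and Kubernetes commonly expose GPUs, so confirm them in Deploy GPU workloads in Autopilot before relying on them.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: l4-gpu-example                            # hypothetical Pod name
spec:
  nodeSelector:
    cloud.google.com/compute-class: "Accelerator" # assumed selector for the Accelerator class
    cloud.google.com/gke-accelerator: "nvidia-l4" # accelerator name from the table above
  containers:
  - name: gpu-app                                 # hypothetical container name
    image: GPU_IMAGE                              # placeholder; replace with your GPU workload image
    resources:
      requests:
        cpu: "4"                                  # within the 1-GPU L4 maximum of 31 vCPU
        memory: "16Gi"                            # within the 1-GPU L4 maximum of 115 GiB
      limits:
        nvidia.com/gpu: 1                         # number of GPUs requested
```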

Minimums and maximums for GPUs without a compute class

The following table shows the minimum and maximum resource requests for Pods that don't use the Accelerator compute class:

GPU type	CPU:memory ratio (vCPU:GiB)	Resource	Minimum	Maximum
NVIDIA A100 (40GB) `nvidia-tesla-a100`	Not enforced	CPU	1 GPU: 9 vCPU; 2 GPUs: 20 vCPU; 4 GPUs: 44 vCPU; 8 GPUs: 92 vCPU; 16 GPUs: 92 vCPU	1 GPU: 11 vCPU; 2 GPUs: 22 vCPU; 4 GPUs: 46 vCPU; 8 GPUs: 94 vCPU; 16 GPUs: 94 vCPU. The sum of CPU requests of all DaemonSets that run on an A100 GPU node must not exceed 2 vCPU.
NVIDIA A100 (40GB) `nvidia-tesla-a100`	Not enforced	Memory	1 GPU: 60 GiB; 2 GPUs: 134 GiB; 4 GPUs: 296 GiB; 8 GPUs: 618 GiB; 16 GPUs: 1250 GiB	1 GPU: 74 GiB; 2 GPUs: 148 GiB; 4 GPUs: 310 GiB; 8 GPUs: 632 GiB; 16 GPUs: 1264 GiB. The sum of memory requests of all DaemonSets that run on an A100 GPU node must not exceed 14 GiB.
NVIDIA A100 (80GB) `nvidia-a100-80gb`	Not enforced	CPU	1 GPU: 9 vCPU; 2 GPUs: 20 vCPU; 4 GPUs: 44 vCPU; 8 GPUs: 92 vCPU	1 GPU: 11 vCPU; 2 GPUs: 22 vCPU; 4 GPUs: 46 vCPU; 8 GPUs: 94 vCPU. The sum of CPU requests of all DaemonSets that run on an A100 (80GB) GPU node must not exceed 2 vCPU.
NVIDIA A100 (80GB) `nvidia-a100-80gb`	Not enforced	Memory	1 GPU: 134 GiB; 2 GPUs: 296 GiB; 4 GPUs: 618 GiB; 8 GPUs: 1250 GiB	1 GPU: 148 GiB; 2 GPUs: 310 GiB; 4 GPUs: 632 GiB; 8 GPUs: 1264 GiB. The sum of memory requests of all DaemonSets that run on an A100 (80GB) GPU node must not exceed 14 GiB.
NVIDIA A100 (80GB) `nvidia-a100-80gb`	Not enforced	Ephemeral storage	1 GPU: 512 MiB; 2 GPUs: 512 MiB; 4 GPUs: 512 MiB; 8 GPUs: 512 MiB	1 GPU: 280 GiB; 2 GPUs: 585 GiB; 4 GPUs: 1220 GiB; 8 GPUs: 2540 GiB
NVIDIA L4 `nvidia-l4`	1 GPU: between 1:3.5 and 1:4; 2, 4, and 8 GPUs: not enforced	CPU	1 GPU: 2 vCPU; 2 GPUs: 21 vCPU; 4 GPUs: 45 vCPU; 8 GPUs: 93 vCPU	1 GPU: 31 vCPU; 2 GPUs: 23 vCPU; 4 GPUs: 47 vCPU; 8 GPUs: 95 vCPU. The sum of CPU requests of all DaemonSets that run on an L4 GPU node must not exceed 2 vCPU.
NVIDIA L4 `nvidia-l4`	1 GPU: between 1:3.5 and 1:4; 2, 4, and 8 GPUs: not enforced	Memory	1 GPU: 7 GiB; 2 GPUs: 78 GiB; 4 GPUs: 170 GiB; 8 GPUs: 355 GiB	1 GPU: 115 GiB; 2 GPUs: 83 GiB; 4 GPUs: 177 GiB; 8 GPUs: 363 GiB. The sum of memory requests of all DaemonSets that run on an L4 GPU node must not exceed 14 GiB.
NVIDIA Tesla T4 `nvidia-tesla-t4`	Between 1:1 and 1:6.25	CPU	0.5 vCPU	1 GPU: 46 vCPU; 2 GPUs: 46 vCPU; 4 GPUs: 94 vCPU
NVIDIA Tesla T4 `nvidia-tesla-t4`	Between 1:1 and 1:6.25	Memory	0.5 GiB	1 GPU: 287.5 GiB; 2 GPUs: 287.5 GiB; 4 GPUs: 587.5 GiB

To learn how to request GPUs in your Autopilot Pods, refer to Deploy GPU workloads in Autopilot.

Resource requests for workload separation and extended duration

Autopilot lets you manipulate Kubernetes scheduling and eviction behavior using methods such as the following:

Use taints and tolerations and node selectors to ensure that certain Pods only get placed on specific nodes. For details, see Configure workload separation in GKE.
Use Pod anti-affinity to prevent Pods from co-locating on the same node. The default and minimum resource requests for workloads that use these methods to control scheduling behavior are higher than for workloads that don't.
Use an annotation to protect Pods from eviction caused by node auto-upgrades and scale-down events for up to seven days. For details, see Extend the run time of Autopilot Pods.

If your specified requests are less than the minimums, the behavior of Autopilot changes based on the method that you used, as follows:

Taints, tolerations, selectors, and extended duration Pods: Autopilot modifies your Pods to increase the requests when scheduling the Pods.
Pod anti-affinity: Autopilot rejects the Pod and displays an error message.

The following table describes the default requests and the minimum resource requests that you can specify. If a configuration or compute class isn't in this table, Autopilot doesn't enforce special minimum or default values.

Compute class	Resource	Default	Minimum
General-purpose	CPU	0.5 vCPU	0.5 vCPU
General-purpose	Memory	2 GiB	0.5 GiB
Balanced	CPU	2 vCPU	1 vCPU
Balanced	Memory	8 GiB	4 GiB
Scale-Out	CPU	0.5 vCPU	0.5 vCPU
Scale-Out	Memory	2 GiB	2 GiB
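
As an illustration of workload separation, the sketch below uses a toleration and a matching node selector, and requests the general-purpose values listed in the Default column of the preceding table. The `group: "batch"` key and value, the Pod name, and the container name are hypothetical. See Configure workload separation in GKE for the supported configuration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: separated-workload   # hypothetical Pod name
spec:
  tolerations:
  - key: "group"             # hypothetical taint key
    operator: "Equal"
    value: "batch"           # hypothetical taint value
    effect: "NoSchedule"
  nodeSelector:
    group: "batch"           # must match the toleration key and value
  containers:
  - name: worker             # hypothetical container name
    image: nginx             # any container image; used here only for illustration
    resources:
      requests:
        cpu: "500m"          # meets the 0.5 vCPU minimum for general-purpose
        memory: "2Gi"        # matches the 2 GiB default, above the 0.5 GiB minimum
```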

Init containers

Init containers run in serial and must complete before the application containers start. If you don't specify resource requests for your Autopilot init containers, GKE allocates the total resources available to the Pod to each init container. This behavior is different than in GKE Standard, where each init container can use any unallocated resources available on the node on which the Pod is scheduled.

For Autopilot init containers, unlike for application containers, GKE recommends that you don't specify resource requests, so that each init container gets the full resources available to the Pod. If you request fewer resources than the defaults, you constrain your init container. If you request more resources than the Autopilot defaults, you might increase your bill for the lifetime of the Pod.
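
The recommendation above might look like the following sketch, in which the init container has no `resources` block and only the application container sets requests. All names, images, and the init command are hypothetical and chosen only for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: init-example          # hypothetical Pod name
spec:
  initContainers:
  - name: init-setup          # hypothetical init container name
    image: busybox
    command: ["sh", "-c", "echo preparing; sleep 5"]
    # No resources block: per the text above, GKE allocates the total
    # resources available to the Pod to this init container.
  containers:
  - name: app                 # hypothetical container name
    image: nginx              # any container image; used here only for illustration
    resources:
      requests:
        cpu: "1"
        memory: "4Gi"
```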

Setting resource limits in Autopilot

Kubernetes lets you set both requests and limits for resources in your Pod specification. The behavior of your Pods changes depending on whether your limits are different than your requests, as described in the following table:

Values set	Autopilot behavior
`requests` equal to `limits`	Pods use the `Guaranteed` QoS class. Note: Ephemeral storage limits must always be explicitly set equal to requests. GKE modifies your Pods to enforce this rule.
`requests` set, `limits` not set	The behavior depends on whether your cluster supports bursting, as follows: Clusters that support bursting: Pods can burst into available burstable capacity. Clusters that don't support bursting: GKE sets the `limits` equal to the `requests`. To check whether your cluster supports bursting, see Bursting availability in GKE.
`requests` not set, `limits` set	Autopilot sets `requests` to the value of `limits`, which is the default Kubernetes behavior. Before: `resources: {limits: {cpu: "400m"}}`. After: `resources: {requests: {cpu: "400m"}, limits: {cpu: "400m"}}`.
`requests` less than `limits`	The behavior depends on whether your cluster supports bursting, as follows: Clusters that support bursting: Pods can burst up to the value specified in `limits`. Clusters that don't support bursting: GKE sets the `limits` equal to the `requests`. To check whether your cluster supports bursting, see Bursting availability in GKE.
`requests` greater than `limits`	Autopilot sets `requests` to the value of `limits`. Before: `resources: {requests: {cpu: "450m"}, limits: {cpu: "400m"}}`. After: `resources: {requests: {cpu: "400m"}, limits: {cpu: "400m"}}`.
`requests` not set, `limits` not set	Autopilot sets `requests` to the default values for the compute class or hardware configuration. The behavior for `limits` depends on whether your cluster supports bursting, as follows: Clusters that support bursting: Autopilot doesn't set `limits`. Clusters that don't support bursting: GKE sets the `limits` equal to the `requests`. To check whether your cluster supports bursting, see Bursting availability in GKE.

In most situations, set adequate resource requests and equal limits for your workloads.

For workloads that temporarily need more resources than their steady-state, like during boot up or during higher traffic periods, set your limits higher than your requests to let the Pods burst. For details, see Configure Pod bursting in GKE.
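
A minimal sketch of a burstable configuration, assuming a cluster that supports bursting, is shown below. CPU and memory limits are set higher than the requests, while the ephemeral storage limit is kept equal to its request, as the table above requires. The names, image, and values are hypothetical examples.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: burstable-example        # hypothetical Pod name
spec:
  containers:
  - name: app                    # hypothetical container name
    image: nginx                 # any container image; used here only for illustration
    resources:
      requests:
        cpu: "250m"              # steady-state CPU
        memory: "512Mi"          # steady-state memory
        ephemeral-storage: "1Gi"
      limits:
        cpu: "1"                 # the container can burst up to 1 vCPU
        memory: "1Gi"            # the container can burst up to 1 GiB
        ephemeral-storage: "1Gi" # kept equal to the request
```

On a cluster that doesn't support bursting, the table above indicates that GKE would set the limits equal to the requests instead.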

Automatic resource management in Autopilot

If your specified resource requests for your workloads are outside of the allowed ranges, or if you don't request resources for some containers, Autopilot modifies your workload configuration to comply with the allowed limits. Autopilot calculates resource ratios and the resource scale up requirements after applying default values to containers with no request specified.

Missing requests: If you don't request resources in some containers, Autopilot applies the default requests for the compute class or hardware configuration.
CPU:memory ratio: Autopilot scales up the smaller resource to bring the ratio within the allowed range.
Ephemeral storage: Autopilot modifies your ephemeral storage requests to meet the minimum amount required by each container. The cumulative value of storage requests across all containers cannot be more than the maximum allowed value. Prior to 1.28.6-gke.1317000, Autopilot scales down the requested ephemeral storage if the value exceeds the maximum. In version 1.28.6-gke.1317000 and later, Autopilot rejects your workload.
Requests below minimums: If you request fewer resources than the allowed minimum for the selected hardware configuration, Autopilot automatically modifies the Pod to request at least the minimum resource value.

By default, when Autopilot automatically scales a resource up to meet a minimum or default resource value, GKE allocates the extra capacity to the first container in the Pod manifest. In GKE version 1.27.2-gke.2200 and later, you can tell GKE to allocate the extra resources to a specific container by adding the following to the annotations field in your Pod manifest:

autopilot.gke.io/primary-container: "CONTAINER_NAME"

Replace CONTAINER_NAME with the name of the container.
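
For context, the annotation might sit in a Pod manifest as in the following sketch. The Pod name, container names, images, and request values are hypothetical. Only the annotation key comes from the text above.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: primary-container-example             # hypothetical Pod name
  annotations:
    autopilot.gke.io/primary-container: "app" # extra resources go to the container named "app"
spec:
  containers:
  - name: sidecar                             # hypothetical; listed first in the manifest
    image: busybox
    command: ["sleep", "3600"]
    resources:
      requests:
        cpu: "10m"
        memory: "64Mi"
  - name: app                                 # hypothetical; named in the annotation
    image: nginx                              # any container image; used here only for illustration
    resources:
      requests:
        cpu: "20m"
        memory: "128Mi"
```

Without the annotation, the text above indicates that any extra capacity would go to `sidecar`, because it is the first container in the manifest.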

Resource modification examples

The following example scenarios show how Autopilot modifies your workload configuration to meet the requirements of your running Pods and containers.

Single container with < 0.05 vCPU

Container number	Original request	Modified request
1	CPU: 30 mCPU; Memory: 0.5 GiB; Ephemeral storage: 10 MiB	CPU: 50 mCPU; Memory: 0.5 GiB; Ephemeral storage: 10 MiB

Multiple containers with total CPU < 0.05 vCPU

Container number	Original requests	Modified requests
1	CPU: 10 mCPU; Memory: 0.5 GiB; Ephemeral storage: 10 MiB	CPU: 30 mCPU; Memory: 0.5 GiB; Ephemeral storage: 10 MiB
2	CPU: 10 mCPU; Memory: 0.5 GiB; Ephemeral storage: 10 MiB	CPU: 10 mCPU; Memory: 0.5 GiB; Ephemeral storage: 10 MiB
3	CPU: 10 mCPU; Memory: 0.5 GiB; Ephemeral storage: 10 MiB	CPU: 10 mCPU; Memory: 0.5 GiB; Ephemeral storage: 10 MiB
Total Pod resources		CPU: 50 mCPU; Memory: 1.5 GiB; Ephemeral storage: 30 MiB
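
The multi-container scenario in the preceding table could correspond to a manifest like this sketch. The Pod name, container names, image, and command are hypothetical. The comments restate the modified values from the table and assume a cluster where the Pod CPU minimum is 50 mCPU.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: three-small-containers    # hypothetical Pod name
spec:
  containers:
  - name: first                   # hypothetical; first container in the manifest
    image: busybox
    command: ["sleep", "3600"]
    resources:
      requests:
        cpu: "10m"                # per the table, raised to 30 mCPU
        memory: "512Mi"           # 0.5 GiB
        ephemeral-storage: "10Mi"
  - name: second                  # hypothetical
    image: busybox
    command: ["sleep", "3600"]
    resources:
      requests:
        cpu: "10m"                # unchanged
        memory: "512Mi"
        ephemeral-storage: "10Mi"
  - name: third                   # hypothetical
    image: busybox
    command: ["sleep", "3600"]
    resources:
      requests:
        cpu: "10m"                # unchanged
        memory: "512Mi"
        ephemeral-storage: "10Mi"
# Pod total after modification: 30m + 10m + 10m = 50 mCPU
```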

Single container with memory too low for requested CPU

In this example, the memory is too low for the amount of CPU (1 vCPU:1 GiB minimum). The minimum allowed ratio for CPU to memory is 1:1. If the ratio is lower than that, Autopilot increases the memory request.

Container number	Original request	Modified request
1	CPU: 4 vCPU; Memory: 1 GiB; Ephemeral storage: 10 MiB	CPU: 4 vCPU; Memory: 4 GiB; Ephemeral storage: 10 MiB
Total Pod resources		CPU: 4 vCPU; Memory: 4 GiB; Ephemeral storage: 10 MiB