ComputeClass is a Kubernetes Custom Resource Definition (CRD) that lets you define configurations and fallback priorities for GKE node scaling decisions. To learn more, see About custom compute classes.

    apiVersion: cloud.google.com/v1
    kind: ComputeClass
    metadata:
      name: my-class
    spec:
      activeMigration:
        optimizeRulePriority: false
      autoscalingPolicy:
        consolidationDelayMinutes: 1
        consolidationThreshold: 0
        gpuConsolidationThreshold: 0
      nodePoolAutoCreation:
        enabled: false
      priorityDefaults:
        nodeSystemConfig:
          linuxNodeConfig:
            sysctls:
              net.core.somaxconn: 256
          kubeletConfig:
            cpuCfsQuota: true
      priorities:
      - machineFamily: n2
        maxRunDurationSeconds: 360
        minCores: 16
        minMemoryGb: 64
        reservations:
          affinity: Specific
          specific:
          - name: n2-shared-reservation
            project: reservation-project
        spot: true
        storage:
          bootDiskKMSKey: projects/example/locations/us-central1/keyRings/example/cryptoKeys/key-1
          secondaryBootDisks:
          - diskImageName: pytorch-mnist
            project: k8s-staging-jobset
            mode: CONTAINER_IMAGE_CACHE
        nodeSystemConfig:
          linuxNodeConfig:
            sysctls:
              net.core.somaxconn: 512
      - machineType: n2-standard-32
        spot: true
        reservations:
          affinity: AnyBestEffort
        storage:
          bootDiskSize: 100
          bootDiskType: pd-balanced
          localSSDCount: 1
      - nodepools: ['example-first-nodepool-name', 'example-second-nodepool-name']
      - gpu:
          count: 1
          driverVersion: default
          type: nvidia-l4
      - tpu:
          count: 8
          topology: "2x4"
          type: tpu-v5-lite-device
      whenUnsatisfiable: ScaleUpAnyway
    status:
      conditions:
      - lastTransitionTime: 2024-10-10T00:00:00Z
        message: example-message
        observedGeneration: 1
        reason: example-reason
        status: "True"
        type: example-type

ComputeClass specification

    metadata:
      name: string
    spec:
      activeMigration: object(activeMigration)
      autoscalingPolicy: object(autoscalingPolicy)
      nodePoolAutoCreation: object(nodePoolAutoCreation)
      priorities: [ object(priorities) ]
      priorityDefaults: object(priorityDefaults)
      whenUnsatisfiable: string

Fields | |
---|---|
metadata (required) | A field that identifies the compute class. |
name (optional) | The name of the compute class. |
spec (required) | The compute class specification, which defines how the compute class works. |
activeMigration (optional) | A specification that lets you choose whether GKE automatically replaces existing nodes that are lower in a compute class priority list with new nodes that are higher in that priority list. |
autoscalingPolicy (optional) | A specification that lets you fine-tune the timing and thresholds that cause GKE to remove underused nodes and consolidate workloads on other nodes. |
nodePoolAutoCreation (optional) | A specification that lets you choose whether GKE can create and delete node pools in Standard mode clusters based on the compute class priority rules. Requires node auto-provisioning to be enabled on the cluster. |
priorities (required) | A list of priority rules that defines how GKE configures nodes during scaling operations. When a cluster needs to scale up, GKE tries to create nodes that match the first priority rule in this field. If GKE can't create those nodes, it attempts the next rule. The process repeats until GKE successfully creates nodes or exhausts all of the rules in the list. |
priorityDefaults (optional) | Requires GKE version 1.32.1-gke.1729000 or later. A specification that sets default values for specific fields that are omitted in entries in the The values in the |
whenUnsatisfiable (optional) | A specification that lets you define what GKE does if none of the rules in the |
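
The fallback order described for the priorities field can be illustrated with a small manifest. This is a minimal sketch rather than a recommended configuration: the class name `fallback-example` is hypothetical, and the machine series and the `ScaleUpAnyway` value are borrowed from the example at the top of this page.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: fallback-example   # hypothetical name
spec:
  priorities:
  # Rule 1: GKE first tries to create Spot VMs from the n2 machine series.
  - machineFamily: n2
    spot: true
  # Rule 2: if rule 1 can't be satisfied, GKE tries on-demand n2 nodes.
  - machineFamily: n2
    spot: false
  # Applies only when none of the rules above can be satisfied.
  whenUnsatisfiable: ScaleUpAnyway
  # Standard mode clusters only; requires node auto-provisioning on the cluster.
  nodePoolAutoCreation:
    enabled: true
```

Rules are evaluated from top to bottom, so the most preferred configuration goes first in the list.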
activeMigration
Choose whether GKE migrates workloads to higher priority nodes for the compute class as resources become available. For details, see Configure active migration to higher priority nodes.

    activeMigration:
      optimizeRulePriority: boolean

Fields | |
---|---|
optimizeRulePriority (required) | Choose whether GKE migrates workloads to higher priority nodes when resources are available. If you omit this field, the default value is |
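
As an illustration of active migration, the following sketch turns the setting on for a class with two rules. The class name `migration-example` is hypothetical, and the two rules are placeholders chosen only to give GKE a higher and a lower priority to migrate between.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: migration-example   # hypothetical name
spec:
  activeMigration:
    # Let GKE replace nodes created from rule 2 with nodes that match
    # rule 1 when rule 1 capacity becomes available.
    optimizeRulePriority: true
  priorities:
  - machineFamily: n2        # rule 1 (higher priority)
    spot: true
  - machineFamily: n2        # rule 2 (lower priority)
    spot: false
```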
autoscalingPolicy

    autoscalingPolicy:
      consolidationDelayMinutes: integer
      consolidationThreshold: integer
      gpuConsolidationThreshold: integer

Fields | |
---|---|
consolidationDelayMinutes (optional) | The number of minutes after which GKE removes underutilized nodes. The value must be between |
consolidationThreshold (optional) | The CPU and memory utilization threshold as a percentage of the total resources on the node. A node becomes eligible for removal only when the resource utilization is less than this threshold. The value must be between |
gpuConsolidationThreshold (optional) | The GPU utilization threshold as a percentage of the total GPU resources on the node. A node becomes eligible for removal only when the resource utilization is less than this threshold. The value must be between Consider setting this value to either |
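
The three consolidation settings can be combined as in the following sketch. The class name and the numeric values are illustrative assumptions, not recommendations; confirm that any values you choose fall within the ranges allowed for each field.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: consolidation-example   # hypothetical name
spec:
  autoscalingPolicy:
    # Wait 10 minutes before removing an underutilized node (assumed value).
    consolidationDelayMinutes: 10
    # Treat a node as underutilized below 50% CPU and memory utilization (assumed value).
    consolidationThreshold: 50
    # Same value as the example at the top of this page.
    gpuConsolidationThreshold: 0
  priorities:
  - machineFamily: n2
```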
gpu

    gpu:
      count: integer
      driverVersion: string
      type: string

Fields | |
---|---|
count (required) | The number of GPUs to attach to each node. The value must be at least |
driverVersion (optional) | Requires GKE version 1.31.1-gke.1858000 or later. The NVIDIA driver version to install. The supported values are as follows: |
type (required) | The GPU type to attach to each node, such as |
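
A GPU rule sits inside an entry of the priorities list. The sketch below reuses the GPU type, count, and driver version from the example at the top of this page; the class name is hypothetical, and the Spot-then-on-demand ordering is only one possible arrangement.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: gpu-example   # hypothetical name
spec:
  priorities:
  # Prefer Spot VMs with one L4 GPU per node.
  - gpu:
      type: nvidia-l4
      count: 1
      driverVersion: default
    spot: true
  # Fall back to on-demand nodes with the same GPU configuration.
  - gpu:
      type: nvidia-l4
      count: 1
      driverVersion: default
```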
hugepageConfig

    hugepageConfig:
      hugepage_size1g: integer
      hugepage_size2m: integer

Fields | |
---|---|
hugepage_size1g (optional) | The number of 1 GB huge pages to allocate. Huge pages are a memory management feature that can improve performance for memory-intensive applications. By using huge pages, the system can reduce the overhead associated with Translation Lookaside Buffer (TLB) misses. Allocating 1 GB huge pages is beneficial for workloads that require large, contiguous memory allocations, such as large databases or in-memory computing. If you specify this field, the value must be at least is |
hugepage_size2m (optional) | The number of 2 MB huge pages to allocate. Similar to 1 GB huge pages, 2 MB huge pages can also improve performance by reducing TLB misses. However, 2 MB huge pages are suitable for applications with smaller memory requirements. They provide a balance between performance improvement and memory flexibility. If you specify this field, the value must be at least |
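
Huge pages are configured under the linuxNodeConfig of a node system configuration. The following sketch assumes a priority rule on the n2 machine series; the class name, the memory floor, and the page count are illustrative assumptions, and the number of pages you can allocate depends on the memory of the node.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: hugepages-example   # hypothetical name
spec:
  priorities:
  - machineFamily: n2
    minMemoryGb: 16          # assumed floor so that the node has room for the pages
    nodeSystemConfig:
      linuxNodeConfig:
        hugepageConfig:
          # 1024 pages x 2 MB = 2048 MB set aside as huge pages (assumed value).
          hugepage_size2m: 1024
```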
kubeletConfig

    kubeletConfig:
      cpuCfsQuota: boolean
      cpuCfsQuotaPeriod: string
      cpuManagerPolicy: string
      podPidsLimit: integer

Fields | |
---|---|
cpuCfsQuota (optional) | Enables CPU Completely Fair Scheduler (CFS) quota enforcement for containers that specify CPU limits. The following values are supported: The default value is |
cpuCfsQuotaPeriod (optional) | Sets the value, in microseconds, of the CFS quota period. This period defines how often the kernel reallocates CPU resources to each control group (cgroup). You can use this value to tune CPU throttling behavior. The value must be between |
cpuManagerPolicy (optional) | Controls the kubelet CPU management policy. Some Pods are more sensitive to CPU limits. In these Pods, the latency of CPU reassignment during CFS quota enforcement might impact performance. The kubelet CPU Manager feature provides exclusive CPU access to specific Pods. The kubelet doesn't enforce CPU limits for those Pods even if CPU CFS quota enforcement is enabled. The following values are supported: The default value is For more information, see the Kubernetes documentation about policies for assigning CPUs to Pods. |
podPidsLimit (optional) | Sets the maximum number of process IDs (PIDs) that each Pod can use. This setting controls the maximum number of processes and threads that can run simultaneously in a Pod. The value must be between |
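
The kubelet settings can be sketched as follows. The class name is hypothetical. The `static` policy name comes from the upstream Kubernetes CPU Manager documentation, and the PID limit is an assumed value; check both against the values that GKE accepts for these fields.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: kubelet-example   # hypothetical name
spec:
  priorities:
  - machineFamily: n2
    nodeSystemConfig:
      kubeletConfig:
        # Enforce CFS quota for containers that specify CPU limits.
        cpuCfsQuota: true
        # Assumed value: Kubernetes defines the "none" and "static" policies.
        cpuManagerPolicy: static
        # Assumed value: cap each Pod at 4096 process IDs.
        podPidsLimit: 4096
```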
linuxNodeConfig

    linuxNodeConfig:
      hugepageConfig: object(hugepageConfig)
      sysctls: object(sysctls)

Fields | |
---|---|
hugepageConfig (optional) | The huge pages configuration for the node. |
sysctls (optional) | The sysctls configuration for the node. |
nodePoolAutoCreation

    nodePoolAutoCreation:
      enabled: boolean

Fields | |
---|---|
enabled (optional) | Choose whether GKE can create and delete node pools in Standard mode clusters based on the compute class priority rules. Requires node auto-provisioning to be enabled on the cluster. If you omit this field, the default value is |
nodeSystemConfig

    nodeSystemConfig:
      kubeletConfig: object(kubeletConfig)
      linuxNodeConfig: object(linuxNodeConfig)

Fields | |
---|---|
kubeletConfig (optional) | The kubelet configuration for the node. |
linuxNodeConfig (optional) | The Linux kernel configuration for the node. |
priorities

    - gpu: object(gpu)
      spot: boolean
    - machineFamily: string
      maxRunDurationSeconds: integer
      minCores: integer
      minMemoryGb: integer
      reservations: object(reservations)
      spot: boolean
      storage: object(storage)
      nodeSystemConfig: object(nodeSystemConfig)
    - machineType: string
      maxRunDurationSeconds: integer
      reservations: object(reservations)
      spot: boolean
      storage: object(storage)
      nodeSystemConfig: object(nodeSystemConfig)
    - nodepools: []string
    - tpu: object(tpu)
      reservations: object(reservations)
      spot: boolean
      storage: object(storage)

Fields | |
---|---|
optional |
The GPU configuration. |
optional |
The Compute Engine machine series
to use, such as |
optional |
The predefined Compute Engine machine type to use, such as
|
optional |
The maximum number of Pods that GKE can place on each
node. The value must be between |
optional |
The maximum duration, in seconds, that the nodes can exist before being shut down. If you omit this field, the nodes can exist indefinitely. |
optional |
The minimum number of vCPU cores that each node can have. If you omit
this field, the default value is |
optional |
The minimum memory capacity, in GiB, that each node can have. If you omit
this field, the default value is |
optional |
A list of existing manually created node pools in Standard mode clusters. You must associate these node pools with the compute class by using node labels and node taints. GKE doesn't process the node pools in this list in any order. Example: |
optional |
The node system configuration. |
optional |
The Compute Engine capacity reservations to consume during node provisioning. |
optional |
The Spot VMs configuration. If you set this field to |
optional |
The boot disk configuration of each node. |
optional |
The TPU configuration. |
optional |
Requires GKE version 1.32.1-gke.1729000 or later. The node system configuration. |
priorityDefaults

    priorityDefaults:
      nodeSystemConfig: object(nodeSystemConfig)

Fields | |
---|---|
nodeSystemConfig (optional) | Default values for the node system configuration. These values apply to a priority rule in the |
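
The relationship between defaults and per-rule settings can be sketched as follows, reusing the sysctl values from the example at the top of this page. The class name is hypothetical, and the comments reflect the assumption that a value set in a rule is used instead of the default for that rule.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: defaults-example   # hypothetical name
spec:
  priorityDefaults:
    nodeSystemConfig:
      linuxNodeConfig:
        sysctls:
          # Default for rules that omit this setting.
          net.core.somaxconn: 256
  priorities:
  # This rule sets its own value, so the default is not needed here.
  - machineFamily: n2
    nodeSystemConfig:
      linuxNodeConfig:
        sysctls:
          net.core.somaxconn: 512
  # This rule omits nodeSystemConfig, so the default above applies.
  - machineType: n2-standard-32
```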
reservations

    reservations:
      affinity: string
      specific: [ object(specific) ]

Fields | |
---|---|
affinity (required) | The type of reservation to consume when creating nodes. The following values are supported: |
specific (optional*) | The parameters for consuming specific reservations. If you set the |
secondaryBootDisks

    secondaryBootDisks:
    - diskImageName: string
      mode: string
      project: string

Fields | |
---|---|
diskImageName (required) | The name of the disk image. |
mode (optional) | The mode in which the secondary boot disk should be used. The following values are supported: |
project (optional) | The project ID of the Google Cloud project that the disk image belongs to. If you omit this field, the default value is the project ID of the cluster project. |
specific

    specific:
    - name: string
      project: string

Fields | |
---|---|
name (required) | The name of the specific reservation to consume. |
project (optional) | The project ID of the Google Cloud project that contains the specific reservation. This field is required if you use a shared reservation from a different project. |
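
The reservations and specific objects fit together as in the following sketch. The reservation name, project ID, and affinity values are the placeholders used in the example at the top of this page; the class name is hypothetical.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: reservation-example   # hypothetical name
spec:
  priorities:
  # Consume a named reservation that is shared from another project.
  - machineFamily: n2
    reservations:
      affinity: Specific
      specific:
      - name: n2-shared-reservation
        project: reservation-project   # needed because the reservation is in a different project
  # Fall back to a rule that uses the AnyBestEffort affinity.
  - machineType: n2-standard-32
    reservations:
      affinity: AnyBestEffort
```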
storage

    storage:
      bootDiskKMSKey: string
      bootDiskSize: integer
      bootDiskType: string
      localSSDCount: integer
      secondaryBootDisks: [ object(secondaryBootDisks) ]

Fields | |
---|---|
bootDiskKMSKey (optional) | The path to the Cloud KMS key to use to encrypt the boot disk. |
bootDiskSize (optional) | The size, in GiB, of the boot disk for each node. The minimum value is |
bootDiskType (optional) | The type of disk to attach to the node. The value that you specify must be supported by the machine series or the machine type in your priority rule. The following values are supported: For details about the disk types that specific machine series support, see the Machine series comparison table. Filter the table properties for "Hyperdisk" and "PD". |
localSSDCount (optional) | The number of Local SSDs to attach to each node. If you specify this field, the minimum value is |
secondaryBootDisks (optional) | Requires GKE version 1.31.2-gke.1105000 or later. The configuration of secondary boot disks that are used to preload nodes with data, such as ML models or container images. |
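
A storage block inside a priority rule might look like the following sketch. The disk size, disk type, and secondary boot disk values are the placeholders from the example at the top of this page; the class name is hypothetical.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: storage-example   # hypothetical name
spec:
  priorities:
  - machineType: n2-standard-32
    storage:
      bootDiskSize: 100          # GiB
      bootDiskType: pd-balanced  # must be supported by the machine type in this rule
      secondaryBootDisks:
      # Preload nodes with a container image cache from a disk image.
      - diskImageName: pytorch-mnist
        project: k8s-staging-jobset
        mode: CONTAINER_IMAGE_CACHE
```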
sysctls

    sysctls:
      net.core.netdev_max_backlog: integer
      net.core.rmem_max: integer
      net.core.wmem_default: integer
      net.core.wmem_max: integer
      net.core.optmem_max: integer
      net.core.somaxconn: integer
      net.ipv4.tcp_rmem: string
      net.ipv4.tcp_wmem: string
      net.ipv4.tcp_tw_reuse: integer
      net.core.busy_poll: integer
      net.core.busy_read: integer
      net.ipv6.conf.all.disable_ipv6: boolean
      net.ipv6.conf.default.disable_ipv6: boolean
      vm.max_map_count: integer

Fields | |
---|---|
net.core.netdev_max_backlog (optional) | The maximum number of packets, queued on the INPUT side, when the interface receives packets faster than the kernel can process them. This setting is crucial for high-traffic network interfaces. Increasing this value can help to prevent packet loss under heavy load, but it also increases memory consumption. The value must be between |
net.core.rmem_max (optional) | The maximum receive socket buffer size in bytes. This setting limits the amount of data that a socket can buffer when receiving data. Increasing this value can improve performance for high-bandwidth connections by allowing the socket to handle larger bursts of data. The value must be between |
net.core.wmem_default (optional) | The default setting, in bytes, of the socket send buffer. This value defines the initial size of the buffer allocated for sending data on a socket. The value must be between |
net.core.wmem_max (optional) | The maximum send socket buffer size in bytes. This setting limits the amount of data that a socket can buffer when it sends data. Increasing this value can improve performance for applications that send large amounts of data. The value must be between |
net.core.optmem_max (optional) | The maximum ancillary buffer size allowed per socket. Ancillary data is a sequence of cmsghdr structures with appended data, which are used to send and receive control information along with socket data. The value must be between |
net.core.somaxconn (optional) | The maximum size of the socket |
net.ipv4.tcp_rmem (optional) | The minimum size, in bytes, of the receive buffer that's used by TCP sockets in moderation. Each TCP socket can use the size for receiving data, even if the total pages of UDP sockets exceed |
net.ipv4.tcp_wmem (optional) | The minimum size, in bytes, of the send buffer that's used by TCP sockets in moderation. Each TCP socket can use the size for sending data, even if the total pages of TCP sockets exceed |
net.ipv4.tcp_tw_reuse (optional) | Reuse |
net.core.busy_poll (optional) | The approximate time, in microseconds, to wait for packets on the device queue to do socket polls or selects. This setting is related to network performance tuning for low-latency applications. The value must be between |
net.core.busy_read (optional) | The approximate time, in microseconds, to wait for packets on the device queue to perform read operations. This setting is used for low-latency network tuning, specifically for read operations. The value must be between |
net.ipv6.conf.all.disable_ipv6 (optional) | Globally disables IPv6 on all future and existing interfaces. Changing this value has the same effect as changing the |
net.ipv6.conf.default.disable_ipv6 (optional) | Disables IPv6 operations in all future network interfaces. The following values are supported: |
vm.max_map_count (optional) | Limits the number of distinct memory regions that a process can map into its address space. You might need to increase this value for applications that require a large number of shared libraries or that perform extensive memory mapping. The value must be between |
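
Sysctls are set as key-value pairs under linuxNodeConfig. In the following sketch, the `net.core.somaxconn` value comes from the example at the top of this page, while the class name and the `vm.max_map_count` value are illustrative assumptions; confirm that any value you set is within the range allowed for that sysctl.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: sysctls-example   # hypothetical name
spec:
  priorities:
  - machineFamily: n2
    nodeSystemConfig:
      linuxNodeConfig:
        sysctls:
          # Larger listen backlog for busy servers.
          net.core.somaxconn: 512
          # Assumed value: more memory map areas for mmap-heavy applications.
          vm.max_map_count: 262144
```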
tpu

    - tpu:
        count: integer
        topology: string
        type: string

Fields | |
---|---|
count (required) | The number of TPUs to attach to the node. |
topology (required) | The TPU topology to use, such as |
type (required) | The TPU type to use, such as |
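
A TPU rule is an entry in the priorities list, as in the following sketch. The count, topology, and type are copied from the example at the top of this page and are not checked here against the combinations that a given TPU version supports; the class name is hypothetical.

```yaml
apiVersion: cloud.google.com/v1
kind: ComputeClass
metadata:
  name: tpu-example   # hypothetical name
spec:
  priorities:
  - tpu:
      type: tpu-v5-lite-device
      topology: "2x4"   # quoted so that YAML treats the value as a string
      count: 8
```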
ComputeClass status
The status field is a list of status messages. This field is informational and is updated by the Kubernetes API server and the kubelet on each node.

    status:
      conditions: [ object(conditions) ]

Fields | |
---|---|
conditions | List of status conditions for the |
conditions

    conditions:
    - type: string
      status: boolean
      reason: string
      message: string
      lastTransitionTime: string
      observedGeneration: integer

Fields | |
---|---|
type | The type of condition, which helps to organize status messages. |
status | The status of the condition. The value is one of the following: |
reason | A machine-readable reason why a specific condition type made its most recent transition. |
message | A human-readable message that provides details about the most recent transition. This field might be empty. |
lastTransitionTime | The timestamp of the most recent change to the condition. |
observedGeneration | A count of how many times the ComputeClass controller observed a change to the |