Provision and use Local SSD-backed ephemeral storage


This page explains how to provision Local SSD storage on Google Kubernetes Engine (GKE) clusters, and how to configure workloads to consume data from Local SSD-backed ephemeral storage attached to nodes in your cluster.

To learn more about Local SSD support on GKE, see About Local SSD storage.

Before you begin

Before you start, make sure you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

Create a cluster or node pool with Local SSD-backed ephemeral storage

Use the Google Cloud CLI to create a cluster or node pool with Local SSD-backed ephemeral storage.

Use the --ephemeral-storage-local-ssd option to attach fully managed local ephemeral storage backed by Local SSD volumes. This storage is tied to the lifecycle of your Pods. When your Pods request ephemeral storage, GKE schedules them to run on nodes that have Local SSD volumes configured as ephemeral storage. If you want more specialized or granular control over your Local SSDs, we recommend using Local SSD-backed raw block storage instead.

If you have cluster autoscaling enabled, GKE autoscales your nodes when the cluster needs more ephemeral storage space. Your Pods can access data on Local SSD volumes through the emptyDir volume.

The gcloud CLI command that you run to create the cluster or node pool depends on the machine series generation of your selected machine type. For example, N1 and N2 machine types belong to first and second generation machine series respectively, while C3 machine types belong to a third generation machine series.

Create a cluster with Local SSD

1st or 2nd Generation

If you are using a machine type from a first or second generation machine series, create your cluster by specifying the --ephemeral-storage-local-ssd count=NUMBER_OF_DISKS option. This option provisions the specified number of Local SSD volumes on each node to use for kubelet ephemeral storage.

These settings apply to the default node pool only. If subsequent node pools need Local SSD, specify that during node pool creation.

To create a cluster running on GKE version 1.25.3-gke.1800 or later in which the default pool uses Local SSD volumes, run the following command:

gcloud container clusters create CLUSTER_NAME \
    --ephemeral-storage-local-ssd count=NUMBER_OF_DISKS \
    --machine-type=MACHINE_TYPE \
    --release-channel CHANNEL_NAME

Replace the following:

  • CLUSTER_NAME: the name of the cluster.
  • NUMBER_OF_DISKS: the number of Local SSD volumes to provision on each node. These volumes are combined into a single logical volume during node setup. The maximum number of volumes varies by machine type and region. Note that some Local SSD capacity is reserved for system use.
  • MACHINE_TYPE: the machine type to use. This field is required, as Local SSD cannot be used with the default e2-medium type.
  • CHANNEL_NAME: a release channel that includes GKE version 1.25.3-gke.1800 or later. If you prefer not to use a release channel, you can use the --cluster-version flag instead of --release-channel and specify a valid version that is 1.25.3-gke.1800 or later. To determine the valid versions, use the gcloud container get-server-config command.
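
One way to look up the versions that a channel currently offers is sketched below. The function name list_channel_versions is hypothetical, and the sketch assumes that your gcloud configuration already has a default compute zone or region set, because get-server-config needs a location.

```shell
# Hypothetical helper: print the versions that a release channel currently offers.
# Usage: list_channel_versions REGULAR
list_channel_versions() {
  channel="$1"   # for example RAPID, REGULAR, or STABLE
  gcloud container get-server-config \
    --flatten="channels" \
    --filter="channels.channel=${channel}" \
    --format="yaml(channels.channel,channels.validVersions)"
}
```

Pick a version from the output that meets the minimum above and pass it to --cluster-version if you are not using --release-channel.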

3rd Generation

If you use a machine type from a third generation machine series, you do not need to specify any Local SSD options when creating a cluster. The number of disks attached to each node depends on the machine type.

To create a cluster, run the following command:

gcloud container clusters create CLUSTER_NAME \
  --machine-type=MACHINE_TYPE \
  --cluster-version CLUSTER_VERSION

Replace the following:

  • CLUSTER_NAME: the name of the cluster.
  • MACHINE_TYPE: the machine type to use from a third generation machine series.
  • CLUSTER_VERSION: a GKE cluster version that supports Local SSD on machine types from a third generation machine series.

Create a node pool with Local SSD

1st or 2nd Generation

To create a node pool running on GKE version 1.25.3-gke.1800 or later that uses Local SSD volumes, run the following command:

gcloud container node-pools create POOL_NAME \
    --cluster=CLUSTER_NAME \
    --ephemeral-storage-local-ssd count=NUMBER_OF_DISKS \
    --machine-type=MACHINE_TYPE

Replace the following:

  • POOL_NAME: the name of your new node pool.
  • CLUSTER_NAME: the name of the cluster.
  • NUMBER_OF_DISKS: the number of Local SSD volumes to provision on each node. These volumes are combined into a single logical volume during node setup. The maximum number of volumes varies by machine type and region. Note that some Local SSD capacity is reserved for system use.
  • MACHINE_TYPE: the machine type to use. This field is required, as Local SSD cannot be used with the default e2-medium type.
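
Because GKE can add nodes when Pods need more ephemeral storage, you might pair Local SSD with node pool autoscaling. The following is a minimal sketch rather than a recommended configuration: the function name and its argument order are hypothetical, the node limits are left for you to choose, and it assumes a machine type from a first or second generation machine series.

```shell
# Hypothetical helper: create an autoscaled node pool whose nodes use
# Local SSD volumes for ephemeral storage.
# Usage: create_local_ssd_pool POOL_NAME CLUSTER_NAME MACHINE_TYPE NUMBER_OF_DISKS MIN_NODES MAX_NODES
create_local_ssd_pool() {
  gcloud container node-pools create "$1" \
    --cluster="$2" \
    --machine-type="$3" \
    --ephemeral-storage-local-ssd count="$4" \
    --enable-autoscaling \
    --min-nodes="$5" \
    --max-nodes="$6"
}
```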

3rd Generation

If you use a machine type from a third generation machine series, you do not need to specify any Local SSD options when creating a node pool. The number of volumes attached to each node depends on the machine type.

To create a node pool, run the following command:

gcloud container node-pools create POOL_NAME \
  --cluster=CLUSTER_NAME \
  --machine-type=MACHINE_TYPE \
  --node-version NODE_VERSION

Replace the following:

  • POOL_NAME: the name of the new node pool.
  • CLUSTER_NAME: the name of the cluster.
  • MACHINE_TYPE: the machine type to use from a third generation machine series.
  • NODE_VERSION: a GKE node pool version that supports Local SSD on machine types from a third generation machine series.

Nodes in the node pool are created with a cloud.google.com/gke-ephemeral-storage-local-ssd=true label. You can verify the labels by running the following command:

kubectl describe node NODE_NAME
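
If you only want to know which nodes carry the label, filtering by label selector can be quicker than reading the full description of each node. This is a small sketch with a hypothetical function name; it assumes that kubectl is already configured for the cluster.

```shell
# Hypothetical helper: list only the nodes that carry the Local SSD
# ephemeral storage label.
list_local_ssd_nodes() {
  kubectl get nodes \
    --selector='cloud.google.com/gke-ephemeral-storage-local-ssd=true' \
    --output=name
}
```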

Use Local SSD-backed ephemeral storage with Autopilot clusters

You can use Local SSD in the following Autopilot compute classes:

  • Performance
  • Accelerator

For the Performance class, follow the instructions to use Local SSDs in Performance class Pods.

For the Accelerator compute class, you can use Local SSD for ephemeral storage if you use NVIDIA L4 GPUs and run GKE patch version 1.28.6-gke.1369000 or later, or 1.29.1-gke.1575000 or later. NVIDIA H100 (80GB) GPUs and NVIDIA A100 (80GB) GPUs always use Local SSDs for ephemeral storage, and you can't specify the following node selector for those GPUs.

To use Local SSD for ephemeral storage, add the cloud.google.com/gke-ephemeral-storage-local-ssd: "true" nodeSelector to your workload manifest. Your Pod specification should look similar to the following example:

apiVersion: v1
kind: Pod
metadata:
  name: l4-localssd-pod
spec:
  containers:
  - name: my-gpu-container
    image: nvidia/cuda:11.0.3-runtime-ubuntu20.04
    command: ["/bin/bash", "-c", "--"]
    args: ["while true; do sleep 600; done;"]
    resources:
      requests:
        cpu: 16
        memory: 64Gi
        ephemeral-storage: 800Gi
      limits:
        cpu: 16
        memory: 64Gi
        ephemeral-storage: 800Gi
        nvidia.com/gpu: 8
  nodeSelector:
    cloud.google.com/gke-accelerator: nvidia-l4
    cloud.google.com/gke-ephemeral-storage-local-ssd: "true"

Using the legacy API parameter

The --local-ssd-count option is a legacy API parameter that supports SCSI Local SSD. The Compute Engine third generation machine series does not support SCSI and only supports NVMe. You should only use this option with Windows Server clusters. If you are currently using the legacy API parameter on Linux clusters, we recommend that you use the --ephemeral-storage-local-ssd option instead.

Local SSD on Windows Server clusters

When you use Local SSD with your clusters running Windows Server node pools, you need to log in to the node and format the volume before using it. In the following example, the Local SSD volume is formatted with the NTFS file system. You can also create directories under the volume. In this example, the directories are under disk D.

PS C:\> Get-Disk | Where partitionstyle -eq 'raw' | Initialize-Disk -PartitionStyle MBR -PassThru | New-Partition -AssignDriveLetter -UseMaximumSize | Format-Volume -FileSystem ntfs -Confirm:$false
PS C:\> mkdir D:\test-ssd

Access Local SSD volumes

The following example shows how you can access Local SSD-backed ephemeral storage.

Ephemeral storage as an emptyDir volume

A GKE node pool can be configured to use Local SSD for ephemeral storage, including emptyDir volumes.

The following Pod manifest uses an emptyDir and a node selector of cloud.google.com/gke-ephemeral-storage-local-ssd. You can apply a similar technique for Deployment manifests or StatefulSet manifests.

When choosing the ephemeral storage resource request, take into account the Local SSD capacity reserved for system use.

apiVersion: v1
kind: Pod
metadata:
  name: POD_NAME
spec:
  containers:
    - name: CONTAINER_NAME
      image: "registry.k8s.io/pause"
      resources:
        requests:
          ephemeral-storage: "200Gi"
      volumeMounts:
        - mountPath: /cache
          name: scratch-volume
  nodeSelector:
    cloud.google.com/gke-ephemeral-storage-local-ssd: "true"
  volumes:
    - name: scratch-volume
      emptyDir: {}
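
Before you settle on a request size, it can help to compare it with what a node reports as allocatable. The sketch below reads that value with kubectl; the function name is hypothetical, and its argument stands for the name of one of your Local SSD nodes.

```shell
# Hypothetical helper: print the ephemeral storage quantity that a node
# reports as allocatable to Pods.
# Usage: node_allocatable_ephemeral_storage NODE_NAME
node_allocatable_ephemeral_storage() {
  kubectl get node "$1" \
    --output=jsonpath='{.status.allocatable.ephemeral-storage}'
}
```

A Pod whose ephemeral-storage request exceeds this value can't be scheduled on that node.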

Troubleshooting

For troubleshooting instructions, refer to Troubleshooting storage in GKE.

What's next