Google Kubernetes Engine (GKE) Pod snapshots help improve workload startup latency by restoring snapshots of running Pods. A Pod snapshot saves the entire Pod state, including memory and changes to the root file system. When new replicas are created, instead of initializing the Pod from a fresh state, the snapshot is restored. The Pod then resumes execution from the point the snapshot was taken.
This document explains how to enable and configure GKE Pod snapshots for your workloads.
For more information about how Pod snapshots work, see About Pod snapshots.
Before you begin
Before you start, make sure that you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.
Enable Pod snapshots
To enable Pod snapshots, first create or update a cluster with the Pod snapshot feature enabled. Then, create or update a node pool to run in GKE Sandbox.
To enable the feature on a cluster, complete one of the following steps:
To enable Pod snapshots on a new cluster, run the following command:
gcloud beta container clusters create CLUSTER_NAME \
    --enable-pod-snapshots \
    --cluster-version=CLUSTER_VERSION \
    --workload-pool=PROJECT_ID.svc.id.goog \
    --workload-metadata=GKE_METADATA

Replace the following:

- CLUSTER_NAME: the name of your cluster.
- CLUSTER_VERSION: the version of your new cluster, which must be 1.34.1-gke.3084001 or later.
- PROJECT_ID: your project ID.
To enable Pod snapshots on an existing cluster, complete the following steps:
Update the cluster to version 1.34.1-gke.3084001 or later:
gcloud container clusters upgrade CLUSTER_NAME \
    --node-pool=NODEPOOL_NAME \
    --cluster-version=CLUSTER_VERSION

Replace the following:

- CLUSTER_NAME: the name of your cluster.
- NODEPOOL_NAME: the name of your node pool.
- CLUSTER_VERSION: the version to upgrade your cluster to, which must be 1.34.1-gke.3084001 or later.
Enable Pod snapshots on your cluster:
gcloud container clusters update CLUSTER_NAME \
    --workload-pool=PROJECT_ID.svc.id.goog \
    --enable-pod-snapshots

Replace PROJECT_ID with your project ID.
Enable GKE Sandbox on your Standard cluster:
gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --node-version=NODE_VERSION \
    --machine-type=MACHINE_TYPE \
    --image-type=cos_containerd \
    --sandbox type=gvisor

Replace the following:

- NODE_POOL_NAME: the name of your new node pool.
- NODE_VERSION: the version to use for the node pool.
- MACHINE_TYPE: the type of machine to use for the nodes.
For more information about using gVisor, see Isolate your workloads using GKE Sandbox.
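You can optionally verify that the node pool is ready to run sandboxed Pods by checking for the gvisor RuntimeClass, which GKE creates automatically when GKE Sandbox is enabled. The Pod manifests later in this document reference it through runtimeClassName: gvisor.

kubectl get runtimeclass gvisor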
Store snapshots
Pod snapshots are stored in a Cloud Storage bucket, which contains the memory and (optionally) GPU state. Pod snapshots require Workload Identity Federation for GKE to be enabled so that the Pod's service account can authenticate to Cloud Storage.
Pod snapshots require the following configuration for the bucket:
- Hierarchical namespace: must be enabled to allow for higher read and write queries per second. Hierarchical namespace also requires that uniform bucket-level access is enabled.
- Soft delete: because Pod snapshots use parallel composite uploads, you should disable data protection features like soft delete. If left enabled, the deletions of the temporary objects can increase your storage bill significantly.
- Location: the bucket must be in the same location as the GKE cluster, because performance might be degraded if snapshots are transferred across regions.
Create a Cloud Storage bucket
To create the bucket and the permissions required, complete the following steps:
Create a Cloud Storage bucket. The following command creates a bucket with the required configuration:
gcloud storage buckets create "gs://BUCKET_NAME" \
    --uniform-bucket-level-access \
    --enable-hierarchical-namespace \
    --soft-delete-duration=0d \
    --location="LOCATION"

Replace the following:

- BUCKET_NAME: the name of your bucket.
- LOCATION: the location of your bucket.
For a complete list of options for bucket creation, see the gcloud storage buckets create options.
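To confirm that the bucket was created with the expected configuration, you can describe it:

gcloud storage buckets describe "gs://BUCKET_NAME"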
Grant workloads permission to access the Cloud Storage bucket
By default, GKE doesn't have permissions to access Cloud Storage. To read and write snapshot files, you must grant IAM permissions to the Kubernetes service account (KSA) used by your workload Pods.
Get credentials so that you can communicate with your cluster by using kubectl commands:

gcloud container clusters get-credentials "CLUSTER_NAME"

For each Pod, complete the following steps:
Create a KSA for each Pod:
kubectl create serviceaccount "KSA_NAME" \
    --namespace "NAMESPACE"

Replace the following:

- KSA_NAME: the name of your KSA.
- NAMESPACE: the namespace for your Pods.
Grant the KSA permission to access the bucket:
gcloud storage buckets add-iam-policy-binding "gs://BUCKET_NAME" \
    --member="principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/NAMESPACE/sa/KSA_NAME" \
    --role="roles/storage.bucketViewer"

gcloud storage buckets add-iam-policy-binding "gs://BUCKET_NAME" \
    --member="principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/NAMESPACE/sa/KSA_NAME" \
    --role="roles/storage.objectUser"

Replace the following:

- PROJECT_NUMBER: your project number.
- PROJECT_ID: your project ID.
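You can optionally confirm that both role bindings were applied by viewing the bucket's IAM policy:

gcloud storage buckets get-iam-policy "gs://BUCKET_NAME"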
(Optional) Create managed folders for the Cloud Storage bucket
Creating folders lets you isolate permissions for snapshots from mutually untrusted Pods, which is useful in multi-tenant use cases. To set up managed folders, complete the following steps:
Create a custom IAM role that contains only the necessary permissions for Pod snapshots:
gcloud iam roles create podSnapshotGcsReadWriter \
    --project="PROJECT_ID" \
    --permissions="storage.objects.get,storage.objects.create,storage.objects.delete,storage.folders.create"

Grant the roles/storage.bucketViewer role to all KSAs in the target namespace. This role lets KSAs read bucket metadata, but does not grant read or write permissions to objects in the bucket.

gcloud storage buckets add-iam-policy-binding "gs://BUCKET_NAME" \
    --member="principalSet://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/namespace/NAMESPACE" \
    --role="roles/storage.bucketViewer"

Replace the following:

- PROJECT_NUMBER: your project number.
- PROJECT_ID: your project ID.
For each KSA that needs to store Pod snapshots, complete the following steps:
Create a managed folder for the KSA:
gcloud storage managed-folders create "gs://BUCKET_NAME/FOLDER_PATH/"

Replace FOLDER_PATH with the path for the managed folder, for example my-app-snapshots.

Grant the KSA the custom podSnapshotGcsReadWriter role on the managed folder:

gcloud storage managed-folders add-iam-policy-binding "gs://BUCKET_NAME/FOLDER_PATH/" \
    --member="principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/NAMESPACE/sa/KSA_NAME" \
    --role="projects/PROJECT_ID/roles/podSnapshotGcsReadWriter"

Replace KSA_NAME with the name of the KSA.
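To confirm the folder-level binding, you can view the managed folder's IAM policy:

gcloud storage managed-folders get-iam-policy "gs://BUCKET_NAME/FOLDER_PATH/"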
Configure storage for snapshots
To specify where to store snapshot files, create a PodSnapshotStorageConfig
resource.
The following example configures GKE to store Pod snapshots in the FOLDER_PATH/ path inside the Cloud Storage bucket BUCKET_NAME. Save the following manifest as example-pod-snapshot-storage-config.yaml:

apiVersion: podsnapshot.gke.io/v1alpha1
kind: PodSnapshotStorageConfig
metadata:
  name: example-pod-snapshot-storage-config
  namespace: NAMESPACE
spec:
  snapshotStorageConfig:
    gcs:
      bucket: "BUCKET_NAME"
      path: "FOLDER_PATH"

Replace the following:

- NAMESPACE: the namespace for your Pods. By default, this is default.
- BUCKET_NAME: the name of your Cloud Storage bucket.
- FOLDER_PATH: the path for the Cloud Storage managed folder.
Apply the manifest:
kubectl apply -f example-pod-snapshot-storage-config.yaml
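To confirm that the resource was created, you can retrieve it from the same manifest file:

kubectl get -f example-pod-snapshot-storage-config.yaml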
Create a snapshot policy
To enable snapshots for a Pod, create a PodSnapshotPolicy resource with a
selector that matches the Pod's labels.
The following example creates a policy that applies to Pods with the app: my-app label and uses the example-pod-snapshot-storage-config storage configuration. Save the following manifest as example-pod-snapshot-policy.yaml:

apiVersion: podsnapshot.gke.io/v1alpha1
kind: PodSnapshotPolicy
metadata:
  name: example-pod-snapshot-policy
  namespace: NAMESPACE
spec:
  storageConfigName: example-pod-snapshot-storage-config
  selector:
    matchLabels:
      app: my-app
  triggerConfig:
    type: workload
    postCheckpoint: resume

Apply the manifest:
kubectl apply -f example-pod-snapshot-policy.yaml --namespace NAMESPACE
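You can optionally inspect the policy, including its selector and trigger configuration:

kubectl describe -f example-pod-snapshot-policy.yaml --namespace NAMESPACE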
Optimize snapshot size
When a Pod snapshot is triggered, gVisor captures the entire state of all containers, including:
- Application state, such as memory and registers
- Changes to the root file system and tmpfs (including emptyDir volumes)
- Kernel state, such as open file descriptors, threads, and sockets
The size of the snapshot is determined by these factors. Larger snapshots take longer to save and restore. To optimize performance, before triggering a snapshot, you should clean up any application state or files that aren't required after the Pod is restored from the snapshot.
Optimizing snapshot size is particularly important for workloads like large
language models (LLMs). LLM servers often download model weights into local
storage (rootfs or tmpfs) before loading them into the GPU. When a snapshot
is taken, both the GPU state and the model weight files are saved. In this
scenario, if the model is 100 GB, the resulting snapshot is roughly 200 GB (100 GB
of model files, plus 100 GB representing the GPU state). After the model
weights are loaded into the GPU, the files on the file system are often not
needed for the application to run. By deleting these model files before you
trigger the snapshot, you can reduce the snapshot size by half and restore the
application with significantly lower latency.
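The cleanup step depends on your application, but a container startup script along the following lines illustrates the pattern. This is a minimal sketch: the /models directory and the helper commands are hypothetical placeholders for your own download, serving, and readiness logic.

#!/bin/sh
# Hypothetical flow: stage model weights on disk, load them into the GPU,
# then delete the on-disk copy before triggering the snapshot.
download_weights /models              # placeholder: fetch weights to rootfs/tmpfs
start_server --model-dir /models &    # placeholder: server loads weights into the GPU
wait_until_model_loaded               # placeholder: block until loading completes
rm -rf /models/*                      # remove files the restored Pod won't need
echo 1 > /proc/gvisor/checkpoint      # trigger the snapshot (see the next section)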
Trigger a snapshot from a workload
To trigger a snapshot from within your application code, configure your
application to send a signal when it's ready for a snapshot. To signal
readiness, write 1 to the /proc/gvisor/checkpoint file, for example
echo 1 > /proc/gvisor/checkpoint. The write operation starts the snapshot process
asynchronously and returns immediately. Reading from the same file descriptor blocks the reading process until both the snapshot and the restore are complete and the workload is ready to resume.
The exact usage varies depending on your application, but the following example shows a snapshot trigger for a Python application. To trigger a snapshot from this example workload, complete the following steps:
Save the following manifest as my-app.yaml:

apiVersion: v1
kind: Pod
metadata:
  name: my-app
  namespace: NAMESPACE
  labels:
    app: my-app
spec:
  serviceAccountName: KSA_NAME
  runtimeClassName: gvisor
  containers:
  - name: my-container
    image: python:3.10-slim
    command: ["python3", "-c"]
    args:
    - |
      import time

      def trigger_snapshot():
          try:
              # Write "1" to start the snapshot, then read from the same file
              # descriptor; the read blocks until the snapshot and restore
              # are complete.
              with open("/proc/gvisor/checkpoint", "r+") as f:
                  f.write("1")
                  res = f.read().rstrip()
                  print(f"GKE Pod Snapshot: {res}")
          except FileNotFoundError:
              print("GKE Pod Snapshot file does not exist -- Pod snapshots are disabled")
              return
          except OSError as e:
              return e

      i = 0
      while True:
          print(f"Count: {i}", flush=True)
          if i == 20:
              # Simulate the application being ready to snapshot at the 20th count
              trigger_snapshot()
          i += 1
          time.sleep(1)
    resources:
      limits:
        cpu: "500m"
        memory: "512Mi"
      requests:
        cpu: "250m"
        memory: "256Mi"

Deploy the application:
kubectl apply -f my-app.yaml

To verify that a snapshot was taken, check the event history for the GKEPodSnapshotting event:

kubectl get events -o \
    custom-columns=NAME:involvedObject.name,CREATIONTIME:.metadata.creationTimestamp,REASON:.reason,MESSAGE:.message \
    --namespace NAMESPACE \
    --field-selector involvedObject.name=my-app,reason=GKEPodSnapshotting

The output resembles the following:

NAME                       CREATIONTIME           REASON               MESSAGE
default/5b449f9c7c-bd7pc   2025-11-05T16:25:11Z   GKEPodSnapshotting   Successfully checkpointed the pod to PodSnapshot
Manage snapshots
When you create a Pod snapshot, a PodSnapshot custom resource is created to store the Pod's state at that time.
To view all PodSnapshot resources in a namespace, run the following command:
kubectl get podsnapshots.podsnapshot.gke.io --namespace NAMESPACE
The output resembles the following:
NAME STATUS POLICY AGE
de334898-1e7a-4cdb-9f2e-7cc2181c29e4 AllSnapshotsAvailable example-policy 47h
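To inspect an individual snapshot, describe it by name, where POD_SNAPSHOT_NAME is a name from the previous output:

kubectl describe podsnapshots.podsnapshot.gke.io POD_SNAPSHOT_NAME --namespace NAMESPACE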
Restore a workload from a snapshot
To restore your workload from the latest snapshot, you can delete the existing Pod after a snapshot is taken, and then re-deploy the Pod. Alternatively, you can deploy a new Pod with an identical specification. GKE automatically restores the Pod from the matching snapshot.
The following steps show how a Pod is restored from a matching snapshot by deleting and re-deploying the Pod:
Delete the Pod:
kubectl delete -f POD_NAME.yaml

Replace POD_NAME with the name of your Pod, for example my-app.

Re-apply the Pod:

kubectl apply -f POD_NAME.yaml

View the logs to confirm the snapshot restore:

kubectl logs my-app --namespace NAMESPACE

The output depends on how you've configured your application. In the example application, the logs show GKE Pod Snapshot: restore when a restore operation occurs.
Disable snapshots
Removing a PodSnapshotPolicy resource prevents Pods from being snapshotted and restored. Running Pods are unaffected by the deletion. However, if you delete the policy while a Pod is being saved or restored, the Pod might enter a failed state.
To disable snapshotting and restoration for new Pods governed by a policy,
delete the PodSnapshotPolicy by running the following command:
kubectl delete podsnapshotpolicies.podsnapshot.gke.io SNAPSHOT_POLICY --namespace=NAMESPACE
Replace SNAPSHOT_POLICY with the name of the
PodSnapshotPolicy that you want to delete, for example
example-pod-snapshot-policy.
You can also delete a specific PodSnapshot resource so that Pods are no longer
restored from that specific snapshot. Deleting the
PodSnapshot resource also removes the files stored in Cloud Storage.
To prevent a specific snapshot from being used for future restorations,
delete the PodSnapshot object by running the following command:
kubectl delete podsnapshots.podsnapshot.gke.io POD_SNAPSHOT_NAME --namespace=NAMESPACE
Replace POD_SNAPSHOT_NAME with the name of the
snapshot that you want to delete, for example example-podsnapshot.
What's next
- Learn more about Pod snapshot concepts.
- Refer to the Pod snapshot custom resource definitions (CRDs).