Use Image streaming to pull container images


This page shows you how to use Image streaming in Google Kubernetes Engine (GKE) to pull container images by streaming the image data as your applications need it.

New Autopilot clusters that run GKE version 1.25.5-gke.1000 and later automatically use Image streaming to pull eligible images. The instructions on this page only apply to Standard clusters.

Overview

Image streaming is a method of pulling container images in which GKE streams data from eligible images as requested by your applications. You can use Image streaming to allow your workloads to initialize without waiting for the entire image to download, which leads to significant improvements in initialization times. The shortened pull time provides you with benefits including the following:

  • Faster autoscaling
  • Reduced latency when pulling large images
  • Faster Pod startup

With Image streaming, GKE uses a remote filesystem as the root filesystem for any containers that use eligible container images. GKE streams image data from the remote filesystem as needed by your workloads. Without Image streaming, GKE downloads the entire container image onto each node and uses it as the root filesystem for your workloads.

While streaming the image data, GKE downloads the entire container image onto the local disk in the background and caches it. GKE then serves future data read requests from the cached image.

When you deploy workloads that need to read specific files in the container image, the Image streaming backend serves only those requested files.

Requirements

You must meet the following requirements to use Image streaming in GKE Autopilot and Standard clusters:

  • You must enable the Container File System API.

  • New Autopilot clusters must run GKE version 1.25.5-gke.1000 or later to have Image streaming automatically enabled. For instructions, refer to Set the version and release channel of a new Autopilot cluster.

  • New and existing GKE Standard clusters must run version 1.18.6-gke.4801 or later.

  • You must use the Container-Optimized OS with containerd node image. Autopilot nodes always use this node image.

  • Your container images must be stored in Artifact Registry.

  • The Artifact Registry repository must be in the same region as your GKE nodes, or in a multi-region that corresponds with the region where your nodes are running. For example:

    • If your nodes are in us-east1, Image streaming is available for repositories in the us-east1 region or the us multi-region since both GKE and Artifact Registry are running in data center locations within the United States.
    • If your nodes are in the northamerica-northeast1 region, the nodes are running in Canada. In this situation, Image streaming is only available for repositories in the same region.
  • If you enable private nodes on your cluster, you must enable Private Google Access on the subnet so that your nodes can access the Image streaming service.

  • If VPC Service Controls protects your container images and you use Image streaming, you must also include the Container File System API (containerfilesystem.googleapis.com) in the service perimeter.

  • If the GKE nodes in the cluster don't use the default service account, you must ensure that your custom service account has the Service Usage Consumer (roles/serviceusage.serviceUsageConsumer) IAM role in the project that hosts the container image.
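The repository location requirement above can be sketched as a small check. This is only an illustration: the list of multi-regions and the rule that a multi-region corresponds to a region whose name starts with the multi-region name are assumptions made for this sketch, not an official mapping, so confirm locations against the Artifact Registry documentation.

```python
# Assumption for this sketch: a multi-region "corresponds" to a node region
# when the region name starts with the multi-region name, for example
# us-east1 and "us". The tuple below is a hypothetical, non-authoritative list.
MULTI_REGIONS = ("us", "europe", "asia")


def repository_location_is_eligible(node_region: str, repo_location: str) -> bool:
    """Return True if a repository in repo_location can stream to nodes in node_region."""
    if repo_location == node_region:
        # Same region as the nodes.
        return True
    if repo_location in MULTI_REGIONS:
        # Multi-region that covers the node region under the prefix assumption.
        return node_region.startswith(repo_location + "-")
    return False
```

Under these assumptions, nodes in us-east1 match repositories in us-east1 or us, while nodes in northamerica-northeast1 match only a repository in the same region.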

Limitations

  • You can't use a Secret to pull container images on GKE versions prior to 1.23.5-gke.1900.
  • Container images that use the V2 Image Manifest, schema version 1 are not eligible.
  • Container images encrypted with customer-managed encryption keys (CMEK) are eligible for Image streaming on GKE version 1.25.3-gke.1000 or later. In previous versions, GKE downloads these images without streaming the data. You can still use CMEK to protect attached persistent disks and custom boot disks in clusters that use Image streaming.
  • Container images with duplicate layers are not supported. GKE downloads these images without streaming the data. Check your container image for empty layers or duplicate layers.
  • If your workloads read many files in an image during initialization, you might notice increased initialization times because of the latency added by the remote file reads.
  • You might not notice the benefits of Image streaming during the first pull of an eligible image. However, after Image streaming caches the image, future image pulls on any cluster benefit from Image streaming.
  • GKE Standard clusters use the cluster-level configuration to determine whether to enable Image streaming on new node pools created using node auto-provisioning. However, you cannot use workload separation to create node pools with Image streaming enabled when Image streaming is disabled at the cluster level.
  • Linux file capabilities such as CAP_NET_RAW are supported with Image streaming in GKE version 1.22.6-gke.300 and later. In earlier GKE versions, these capabilities are not available when the image file is streamed or when the image is saved to the local disk, so containers that rely on them might fail to start with permission denied errors when Image streaming is enabled. To avoid potential disruptions, do not use Image streaming for containers with these capabilities in GKE versions prior to 1.22.6-gke.300.
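Several of the limitations above depend on the GKE version of your nodes. The following is a minimal sketch of how you might compare a version string against those thresholds. The function names and the feature labels are hypothetical; the version numbers are the ones listed in the limitations.

```python
import re


def parse_gke_version(version: str) -> tuple:
    """Turn '1.25.5-gke.1000' into (1, 25, 5, 1000) so versions compare numerically."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-gke\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized GKE version: {version!r}")
    return tuple(int(part) for part in match.groups())


# Minimum versions copied from the limitations list; the labels are illustrative.
MINIMUM_VERSIONS = {
    "pull with a Secret": "1.23.5-gke.1900",
    "stream CMEK-encrypted images": "1.25.3-gke.1000",
    "Linux file capabilities": "1.22.6-gke.300",
}


def unsupported_features(node_version: str) -> list:
    """List the features whose minimum version is newer than node_version."""
    current = parse_gke_version(node_version)
    return sorted(
        feature
        for feature, minimum in MINIMUM_VERSIONS.items()
        if current < parse_gke_version(minimum)
    )
```

Comparing parsed tuples instead of raw strings avoids ordering mistakes such as treating 1.25.14 as older than 1.25.5.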

Before you begin

Before you start, make sure you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

Enable Image streaming on clusters

You can enable Image streaming on new or existing Standard clusters by using the gcloud CLI --enable-image-streaming flag or the Google Cloud console. By default, node pools in the cluster inherit the Image streaming setting at the cluster level. You can change this behavior by enabling or disabling Image streaming on node pools in the cluster.

All new Autopilot clusters that run GKE version 1.25.5-gke.1000 and later use Image streaming to pull eligible images. For instructions, refer to Set the version and release channel of a new Autopilot cluster. The following instructions only apply to GKE Standard clusters.

On a new cluster

You can enable Image streaming on new clusters using the gcloud CLI or the Google Cloud console.

gcloud

To create a new cluster with Image streaming enabled, run the following command:

gcloud container clusters create CLUSTER_NAME \
    --zone=COMPUTE_ZONE \
    --image-type="COS_CONTAINERD" \
    --enable-image-streaming

Replace the following:

  • CLUSTER_NAME: the name of your new cluster.
  • COMPUTE_ZONE: the Compute Engine zone for your new cluster. For regional clusters, use the --region=COMPUTE_REGION flag instead. Ensure that the zone or region is in the same region as, or within the multi-region of, the Artifact Registry repository that contains the image.
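If you script cluster creation, one option is to assemble the same command as an argument list. This sketch uses only the flags shown in the command above; the function name is hypothetical, and running the resulting command still requires an installed and initialized gcloud CLI.

```python
def create_cluster_command(cluster_name, zone=None, region=None):
    """Build the gcloud argument list for a new Standard cluster with Image streaming."""
    # Zonal clusters take --zone, regional clusters take --region, never both.
    if (zone is None) == (region is None):
        raise ValueError("specify exactly one of zone or region")
    location = f"--zone={zone}" if zone else f"--region={region}"
    return [
        "gcloud", "container", "clusters", "create", cluster_name,
        location,
        "--image-type=COS_CONTAINERD",
        "--enable-image-streaming",
    ]
```

You could pass the returned list to subprocess.run, or join it with shlex.join to print the command for review before you run it.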

Console

  1. Go to the Google Kubernetes Engine page in the Google Cloud console.

  2. Click Create.

  3. In the GKE Standard section, click Configure.

  4. From the navigation pane, under Cluster, click Features.

  5. In the Other section, select the Enable Image streaming checkbox.

  6. Configure the cluster as needed, and then click Create.

On an existing cluster

You can enable Image streaming on existing clusters that meet the requirements using either the gcloud CLI or the Google Cloud console.

gcloud

To update an existing cluster to use Image streaming, run the following command using the gcloud CLI:

gcloud container clusters update CLUSTER_NAME \
    --enable-image-streaming

Console

  1. Go to the Google Kubernetes Engine page in the Google Cloud console.

  2. Click the name of the cluster you want to modify.

  3. On the Clusters page, in the Features section, click the edit icon next to Image streaming.

  4. In the Edit Image streaming dialog box, select the Enable Image streaming checkbox.

  5. Click Save changes.

After you modify the cluster, GKE enables Image streaming on your existing node pools automatically by default. If you explicitly enabled or disabled Image streaming on individual node pools, those node pools don't inherit the changes to the cluster-level setting.

Changes to the Image streaming setting respect maintenance availability when you make them at the cluster level, but not when you make them at the node pool level.

This change requires recreating the nodes, which can cause disruption to your running workloads. For details about this specific change, find the corresponding row in the manual changes that recreate the nodes using a node upgrade strategy and respecting maintenance policies table. To learn more about node updates, see Planning for node update disruptions.

Verify Image streaming is enabled on a cluster

You can check whether Image streaming is enabled at the cluster level using either the gcloud CLI or the Google Cloud console.

gcloud

Run the following command:

gcloud container clusters describe CLUSTER_NAME \
    --flatten "nodePoolDefaults.nodeConfigDefaults"

The setting is enabled if the output is similar to the following:

gcfsConfig:
  enabled: true
...

The setting is disabled if the output is similar to the following:

gcfsConfig: {}
...

Console

  1. Go to the Google Kubernetes Engine page in the Google Cloud console.

  2. Click the name of the cluster you want to check.

  3. On the Clusters page, in the Features section, check the value next to Image streaming to see whether the setting is enabled.

Enable Image streaming on node pools

By default, node pools inherit the Image streaming setting at the cluster level. You can enable or disable Image streaming on specific node pools using the gcloud CLI.

On a new node pool

To create a new node pool with Image streaming enabled, run the following command:

gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --zone=COMPUTE_ZONE \
    --image-type="COS_CONTAINERD" \
    --enable-image-streaming

Replace the following:

  • NODE_POOL_NAME: the name of your new node pool.
  • CLUSTER_NAME: the name of the cluster for the node pool.
  • COMPUTE_ZONE: the Compute Engine zone of your cluster. For regional clusters, use the --region=COMPUTE_REGION flag instead.

On an existing node pool

You can enable Image streaming on existing node pools that meet the requirements.

To update an existing node pool to use Image streaming, run the following command:

gcloud container node-pools update POOL_NAME \
    --cluster=CLUSTER_NAME \
    --enable-image-streaming

Changes to the Image streaming setting respect maintenance availability when you make them at the cluster level, but not when you make them at the node pool level.

This change requires recreating the nodes, which can cause disruption to your running workloads. For details about this specific change, find the corresponding row in the manual changes that recreate the nodes using a node upgrade strategy without respecting maintenance policies table. To learn more about node updates, see Planning for node update disruptions.

Verify Image streaming is enabled on a node pool

Check whether Image streaming is enabled for a node pool:

gcloud container node-pools describe POOL_NAME \
    --cluster=CLUSTER_NAME

The setting is enabled if the output is similar to the following:

gcfsConfig:
  enabled: true
...

The setting is disabled if the output is similar to the following:

gcfsConfig: {}
...
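To run this check from a script, you could ask gcloud for JSON output with the standard --format=json flag and look for the gcfsConfig block. The following sketch searches the whole document for that key instead of assuming where it is nested; the function names are hypothetical.

```python
import json


def find_gcfs_config(resource):
    """Depth-first search for the first 'gcfsConfig' mapping in parsed describe output."""
    if isinstance(resource, dict):
        if "gcfsConfig" in resource:
            return resource["gcfsConfig"]
        children = resource.values()
    elif isinstance(resource, list):
        children = resource
    else:
        return None
    for child in children:
        found = find_gcfs_config(child)
        if found is not None:
            return found
    return None


def image_streaming_enabled(describe_json: str) -> bool:
    """Return True only if a gcfsConfig block exists and has enabled set to true."""
    config = find_gcfs_config(json.loads(describe_json))
    return bool(config and config.get("enabled"))
```

An empty gcfsConfig block, or no block at all, is reported as disabled, which matches the two example outputs above.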

Schedule a workload using Image streaming

After you enable Image streaming on your cluster, GKE automatically uses Image streaming when pulling eligible container images from Artifact Registry without requiring further configuration.

GKE adds the cloud.google.com/gke-image-streaming: "true" label to nodes in node pools with Image streaming enabled. On GKE Standard, if you enable or disable Image streaming on specific node pools so that your cluster has a mix of nodes that use Image streaming and nodes that don't, you can use node selectors in your deployments to control whether GKE schedules your workloads on nodes that use Image streaming.
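As an illustration of the node selector approach, the following sketch adds the label described above to a Deployment manifest that is held as a Python dictionary. The function name is hypothetical, and the sketch assumes the manifest already contains spec.template.spec.

```python
import copy

STREAMING_LABEL = "cloud.google.com/gke-image-streaming"


def require_image_streaming(deployment: dict) -> dict:
    """Return a copy of a Deployment whose Pods schedule only on Image streaming nodes."""
    patched = copy.deepcopy(deployment)  # leave the caller's manifest untouched
    pod_spec = patched["spec"]["template"]["spec"]
    # Label values are strings, so the selector uses "true" rather than a boolean.
    pod_spec.setdefault("nodeSelector", {})[STREAMING_LABEL] = "true"
    return patched
```

Because kubectl accepts JSON manifests as well as YAML, you could write the result with json.dump and apply the file with kubectl apply -f.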

In the following example, you schedule a Deployment that uses a large container image on a cluster with Image streaming enabled. You can then optionally compare the performance to an image pull without Image streaming enabled.

  1. Create a new cluster with Image streaming enabled:

    gcloud container clusters create CLUSTER_NAME \
        --zone=COMPUTE_ZONE \
        --enable-image-streaming \
        --image-type="COS_CONTAINERD"
    
  2. Get credentials for the cluster:

    gcloud container clusters get-credentials CLUSTER_NAME \
        --zone=COMPUTE_ZONE
    
  3. Save the following manifest as frontend-deployment.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: frontend
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: guestbook
          tier: frontend
      template:
        metadata:
          labels:
            app: guestbook
            tier: frontend
        spec:
          containers:
          - name: php-redis
            image: us-docker.pkg.dev/google-samples/containers/gke/gb-frontend:v5
            env:
            - name: GET_HOSTS_FROM
              value: "dns"
            resources:
              requests:
                cpu: 100m
                memory: 100Mi
            ports:
            - containerPort: 80
    

    The gb-frontend container image is 327 MB in size.

  4. Apply the manifest to your cluster:

    kubectl apply -f frontend-deployment.yaml
    
  5. Verify that GKE created the Deployment:

    kubectl get pods -l app=guestbook
    

    The output is similar to the following:

    NAME                          READY    STATUS     RESTARTS    AGE
    frontend-64bcc69c4b-pgzgm     1/1      Running    0           3s
    
  6. Get the Kubernetes event log to see image pull events:

    kubectl get events --all-namespaces
    

    The output is similar to the following:

    NAMESPACE  LAST SEEN  TYPE    REASON          OBJECT                                                 MESSAGE
    default    11m        Normal  Pulling         pod/frontend-64bcc69c4b-pgzgm                          Pulling image "us-docker.pkg.dev/google-samples/containers/gke/gb-frontend:v5"
    default    11m        Normal  Pulled          pod/frontend-64bcc69c4b-pgzgm                          Successfully pulled image "us-docker.pkg.dev/google-samples/containers/gke/gb-frontend:v5" in 1.536908032s
    default    11m        Normal  ImageStreaming  node/gke-riptide-cluster-default-pool-f1552ec4-0pjv    Image us-docker.pkg.dev/google-samples/containers/gke/gb-frontend:v5 is backed by image streaming.
    ...
    

    In this output:

    • The Pulled event shows the time taken for Image streaming to pull the image.
    • The ImageStreaming event shows that the node uses Image streaming to serve the container image.

Compare performance with standard image pulls

In this optional example, you create a new cluster with Image streaming disabled and deploy the frontend Deployment to compare performance with Image streaming.

  1. Create a new cluster with Image streaming disabled:

    gcloud container clusters create CLUSTER2_NAME \
        --zone=COMPUTE_ZONE \
        --image-type="COS_CONTAINERD"
    
  2. Get credentials for the cluster:

    gcloud container clusters get-credentials CLUSTER2_NAME \
        --zone=COMPUTE_ZONE
    
  3. Deploy the frontend Deployment from the previous example:

    kubectl apply -f frontend-deployment.yaml
    
  4. Get the Kubernetes event log:

    kubectl get events --all-namespaces
    

    The output is similar to the following:

     NAMESPACE  LAST SEEN  TYPE    REASON     OBJECT                             MESSAGE
     default    87s        Normal  Pulled     pod/frontend-64bcc69c4b-qwmfp      Successfully pulled image "us-docker.pkg.dev/google-samples/containers/gke/gb-frontend:v5" in 23.929723476s
    

    Notice the time GKE took to pull the entire image. In this example output, GKE needed almost 24 seconds. With Image streaming enabled, GKE only needed 1.5 seconds to pull the image data that the workload required to start.
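To compare pull times without reading the event log by eye, you could extract the duration from each Pulled message. This is a rough sketch: the function name is hypothetical, and the pattern only handles durations printed as plain seconds, like the ones in the example output.

```python
import re

# Matches messages such as:
#   Successfully pulled image "IMAGE" in 1.536908032s
# Durations in other formats (for example "500ms" or "1m2s") are not handled.
PULLED = re.compile(
    r'Successfully pulled image "(?P<image>[^"]+)" in (?P<seconds>\d+(?:\.\d+)?)s'
)


def pull_seconds(event_message: str):
    """Return the pull duration in seconds, or None if the message doesn't match."""
    match = PULLED.search(event_message)
    return float(match.group("seconds")) if match else None


image = "us-docker.pkg.dev/google-samples/containers/gke/gb-frontend:v5"
streamed = pull_seconds(f'Successfully pulled image "{image}" in 1.536908032s')
standard = pull_seconds(f'Successfully pulled image "{image}" in 23.929723476s')
speedup = standard / streamed  # how many times faster the streamed pull was
```

With the two example messages, the ratio works out to a bit more than 15, which matches the difference between roughly 24 seconds and 1.5 seconds.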

Clean up

To avoid charges, delete the clusters you created in the previous examples:

gcloud container clusters delete CLUSTER_NAME CLUSTER2_NAME

Disable Image streaming

If you use GKE Autopilot, you can't disable Image streaming on individual clusters. You can disable the Container File System API, which disables Image streaming for the entire project.

If you use GKE Standard clusters, you can disable Image streaming on individual clusters or specific node pools, as described in the following sections.

Disable Image streaming on a GKE Standard cluster

You can disable Image streaming on existing GKE Standard clusters using the gcloud CLI or the Google Cloud console.

gcloud

To disable Image streaming on an existing cluster, run the following command:

gcloud container clusters update CLUSTER_NAME \
    --no-enable-image-streaming

Console

  1. Go to the Google Kubernetes Engine page in the Google Cloud console.

  2. Click the name of the cluster you want to modify.

  3. On the Clusters page, under Features, click the edit icon next to Image streaming.

  4. In the Edit Image streaming dialog box, clear the Enable Image streaming checkbox.

  5. Click Save changes.

Changes to the Image streaming setting respect maintenance availability when you make them at the cluster level, but not when you make them at the node pool level.

This change requires recreating the nodes, which can cause disruption to your running workloads. For details about this specific change, find the corresponding row in the manual changes that recreate the nodes using a node upgrade strategy and respecting maintenance policies table. To learn more about node updates, see Planning for node update disruptions.

On a new node pool

To disable Image streaming when creating a new node pool, specify the --no-enable-image-streaming flag, such as in the following command:

gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --zone=COMPUTE_ZONE \
    --no-enable-image-streaming

On an existing node pool

To disable Image streaming on an existing node pool, run the following command:

gcloud container node-pools update NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --no-enable-image-streaming

Changes to the Image streaming setting respect maintenance availability when you make them at the cluster level, but not when you make them at the node pool level.

This change requires recreating the nodes, which can cause disruption to your running workloads. For details about this specific change, find the corresponding row in the manual changes that recreate the nodes using a node upgrade strategy without respecting maintenance policies table. To learn more about node updates, see Planning for node update disruptions.

Memory reservation for Image streaming

GKE reserves memory resources for Image streaming in addition to the memory that is reserved for node system components to run. GKE does not reserve additional CPU resources for Image streaming. In GKE Standard clusters, this reservation changes the memory resources that are available for you to request in your Pods. In GKE Autopilot, GKE manages system allocations, so there's no impact to scheduling your workloads.

For details about the memory reservations GKE makes for node components, see Standard cluster architecture.

In nodes that use Image streaming, GKE makes the following additional memory reservations:

  • No additional memory for machines with less than 1 GiB of memory
  • 1% of the first 4 GiB of memory
  • 0.8% of the next 4 GiB of memory (up to 8 GiB)
  • 0.4% of the next 8 GiB of memory (up to 16 GiB)
  • 0.24% of the next 112 GiB of memory (up to 128 GiB)
  • 0.08% of any memory above 128 GiB
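The tiers above can be worked through in a short calculation. This sketch assumes that the tiers are cumulative, so each percentage applies only to the memory that falls inside its own range; the function name is hypothetical.

```python
# (upper bound of the tier in GiB, fraction of memory reserved within the tier)
TIERS = [
    (4, 0.01),
    (8, 0.008),
    (16, 0.004),
    (128, 0.0024),
    (float("inf"), 0.0008),
]


def image_streaming_reservation_gib(machine_memory_gib: float) -> float:
    """Estimate the additional memory, in GiB, reserved for Image streaming."""
    if machine_memory_gib < 1:
        # Machines with less than 1 GiB of memory get no additional reservation.
        return 0.0
    reserved = 0.0
    lower = 0.0
    for upper, fraction in TIERS:
        if machine_memory_gib <= lower:
            break
        # Only the slice of memory between lower and upper counts in this tier.
        reserved += (min(machine_memory_gib, upper) - lower) * fraction
        lower = upper
    return reserved
```

For example, under this reading a node with 8 GiB of memory reserves 1% of the first 4 GiB plus 0.8% of the next 4 GiB, or about 0.072 GiB in total.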

Troubleshooting

GKE doesn't use the Image streaming filesystem

If your GKE event log doesn't show the Image streaming events, your image is not backed by the remote filesystem. If GKE previously pulled the image on the node, this is expected behavior because GKE uses the local cache of the image for subsequent pulls instead of using Image streaming. You can verify this by looking for Container image IMAGE_NAME already present on machine in the Message field for the Pod Pulled event.

If you don't see the Image streaming event during the first image pull on the node, ensure that you meet the requirements for Image streaming. If you meet the requirements, you can diagnose the issue by checking the logs of the Image streaming service (named gcfsd):

  1. Go to the Logs Explorer page in the Google Cloud console.

  2. In the Query field, specify the following query:

    logName="projects/PROJECT_ID/logs/gcfsd"
    resource.labels.cluster_name="CLUSTER_NAME"
    

    Replace the following:

    • PROJECT_ID: your Google Cloud project ID.
    • CLUSTER_NAME: the name of your cluster.
  3. Click Run query.
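If you check these logs often, you might generate the query text instead of editing the placeholders by hand. This small sketch builds the same two-line query shown above; the function name is hypothetical.

```python
def gcfsd_log_query(project_id: str, cluster_name=None) -> str:
    """Build a Logs Explorer query for gcfsd logs, optionally limited to one cluster."""
    lines = [f'logName="projects/{project_id}/logs/gcfsd"']
    if cluster_name is not None:
        # Without this filter, the query returns gcfsd logs for the whole project.
        lines.append(f'resource.labels.cluster_name="{cluster_name}"')
    return "\n".join(lines)
```

You can paste the returned text into the Query field in Logs Explorer.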

To search the gcfsd logs for every cluster in the project, omit the cluster filter:

  1. Go to the Logs Explorer page in the Google Cloud console.

  2. In the Query field, specify the following query:

    logName="projects/PROJECT_ID/logs/gcfsd"
    

    Replace PROJECT_ID with your Google Cloud project ID.

PermissionDenied

If the gcfsd logs display an error message similar to the following, the node doesn't have the correct API scope. GKE pulls container images for workloads without using Image streaming.

level=fatal msg="Failed to create a Container File System client: rpc error:
code = PermissionDenied desc = failed to probe endpoint: rpc error: code = PermissionDenied
desc = Request had insufficient authentication scopes."

You can fix this by granting the correct scope to the node to allow it to use Image streaming. Add the devstorage.read_only scope to the cluster or node pool, similar to the following command:

gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --zone=COMPUTE_ZONE \
    --image-type="COS_CONTAINERD" \
    --enable-image-streaming \
    --scopes="https://www.googleapis.com/auth/devstorage.read_only"

FailedPrecondition

If you notice an error message with code = FailedPrecondition, the image wasn't imported to the Image streaming remote filesystem.

You might notice this error if you tried to use Image streaming with an existing node pool. If a node in the node pool already has the container image on-disk, GKE uses the local image instead of using Image streaming to get the image.

To fix this, try the following:

  • Wait a few minutes and try to deploy your workload again.
  • Add new nodes or a new node pool and schedule the workload on those nodes.

InvalidArgument

If you notice an error message with code = InvalidArgument, the container image your workload uses is not eligible for Image streaming. Ensure that the image meets the requirements. If your image is not on Artifact Registry, try migrating to Artifact Registry.

backend.FileContent failed

The following error might appear when reading container files with Image streaming enabled:

level=error msg="backend.FileContent failed" error="rpc error: code = ResourceExhausted desc = Quota exceeded for quota metric 'Content requests per project per region' and limit 'Content requests per project per region per minute per region' of service 'containerfilesystem.googleapis.com' for consumer 'project_number:PROJECT_NUMBER'." layer_id="sha256:1234567890" module=gcfs_backend offset=0 path=etc/passwd size=4096

This error indicates that your project has exceeded the quota required to read files from the remote container file system service. To resolve this issue, increase the following quotas:

  • Content requests per project per region per minute per region
  • Content requests per project per region

GKE downloads the image without streaming the data

Container images using customer-managed encryption keys (CMEK) are only eligible for Image streaming on GKE version 1.25.3-gke.1000 or later. Container images with duplicate layers are not eligible for Image streaming. See the Limitations for more information.

Checking for empty layers or duplicate layers

To check the container image for empty layers or duplicate layers, run the following command:

docker inspect IMAGE_NAME

Replace IMAGE_NAME with the name of the container image.

In the output of the command, inspect the entries under "Layers".

If one of the entries exactly matches the following "sha256" output, the container image has an empty layer and is not eligible for Image streaming.

"Layers": [
  ...
  "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4",
  ...
]

If there are duplicate entries like in the following example, the container image has duplicate layers and is not eligible for Image streaming.

"Layers": [
  "sha256:28699c71935fe3ffa56533db44ad93e5a30322639f7be70d5d614e06a1ae6d9b",
  ...
  "sha256:28699c71935fe3ffa56533db44ad93e5a30322639f7be70d5d614e06a1ae6d9b",
  ...
]
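For images with many layers, you could automate both checks. The following sketch reads the JSON that docker inspect prints and assumes that the layer digests appear under RootFS.Layers in the first array element; the function name is hypothetical.

```python
import json
from collections import Counter

# Digest that identifies an empty layer, as shown in the example above.
EMPTY_LAYER = "sha256:a3ed95caeb02ffe68cdd9fd84406680ae93d633cb16422d00e8a7c22955b46d4"


def layer_problems(inspect_json: str) -> dict:
    """Report empty and duplicate layers found in docker inspect JSON output."""
    image = json.loads(inspect_json)[0]  # docker inspect prints a JSON array
    layers = image["RootFS"]["Layers"]   # assumed location of the layer digests
    counts = Counter(layers)
    return {
        "has_empty_layer": EMPTY_LAYER in counts,
        "duplicate_layers": sorted(d for d, n in counts.items() if n > 1),
    }
```

If has_empty_layer is true or duplicate_layers is not empty, the image matches one of the ineligible cases described above.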

mv command and renameat2 system calls fail on symlink files

For GKE nodes running version 1.25 and later, when Image streaming is enabled, the mv command and renameat2 system call might fail on symlink files in container images with the error message "No such device or address". The issue is caused by a regression on recent Linux kernels.

These system calls are not common, so the majority of images are not affected by this problem. The issue typically happens during container initialization, when an application prepares to run and moves files around. You can't reproduce the issue by testing the image locally, so use Image streaming in test environments to find the issue before you use the image in production.

The fix is available in the following GKE patch versions:

  • 1.25: 1.25.14-gke.1351000 and later
  • 1.26: 1.26.9-gke.1345000 and later
  • 1.27: 1.27.6-gke.100 and later
  • 1.28: 1.28.1-gke.1157000 and later

Alternatively, to mitigate this issue for any affected workloads, you can try replacing the code leading to the renameat2 system call. If you cannot modify the code, you must disable Image streaming on the node pool to mitigate the issue.
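To decide whether a node version needs the workaround, you could compare it against the patch versions listed above. This is a minimal sketch with hypothetical function names. It reports minor versions that are not in the list as unknown, because the list doesn't say how they behave.

```python
import re

# First patch version that contains the fix, per minor version listed above.
FIXED_IN = {
    (1, 25): "1.25.14-gke.1351000",
    (1, 26): "1.26.9-gke.1345000",
    (1, 27): "1.27.6-gke.100",
    (1, 28): "1.28.1-gke.1157000",
}


def parse_gke_version(version: str) -> tuple:
    """Turn '1.27.6-gke.100' into (1, 27, 6, 100) so versions compare numerically."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-gke\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized GKE version: {version!r}")
    return tuple(int(part) for part in match.groups())


def renameat2_fix_status(node_version: str) -> str:
    """Return 'not affected', 'affected', 'fixed', or 'unknown' for a node version."""
    version = parse_gke_version(node_version)
    minor = version[:2]
    if minor < (1, 25):
        return "not affected"  # the issue is described for 1.25 and later
    fixed = FIXED_IN.get(minor)
    if fixed is None:
        return "unknown"  # minor version not covered by the list above
    return "fixed" if version >= parse_gke_version(fixed) else "affected"
```

A result of affected means that the node pool needs an upgrade to a fixed patch version, a code change that avoids the renameat2 call, or Image streaming disabled.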