Access Cloud Storage buckets with the Cloud Storage FUSE CSI driver


Filesystem in Userspace (FUSE) is an interface used to export a filesystem to the Linux kernel. Cloud Storage FUSE allows you to mount Cloud Storage buckets as a file system so that applications can access the objects in a bucket using common file I/O operations (for example, open, read, write, and close) rather than using cloud-specific APIs.

The Cloud Storage FUSE CSI driver lets you use the Kubernetes API to consume pre-existing Cloud Storage buckets as volumes. Your applications can upload and download objects using Cloud Storage FUSE file system semantics. The Cloud Storage FUSE CSI driver provides a fully-managed experience powered by the open source Google Cloud Storage FUSE CSI driver.

The driver natively supports the following ways for you to configure your Cloud Storage-backed volumes:

  • CSI ephemeral volumes
  • Static provisioning

You can use the Cloud Storage FUSE CSI driver with file caching to improve the read performance of applications handling small files from Cloud Storage buckets. The Cloud Storage FUSE file cache feature is a client-based read cache that allows repeated file reads to be served more quickly from cache storage of your choice. You can choose from a range of storage options for the read cache, including Local SSDs and Persistent Disk-based storage, based on your price-performance needs. You must opt in to enable file caching with the Cloud Storage FUSE CSI driver. To learn more about best practices for caching, refer to Cloud Storage FUSE performance.

Benefits

  • Enabling the Cloud Storage FUSE CSI driver on your cluster turns on automatic deployment and management of the driver. The driver works on both Standard and Autopilot clusters.
  • The Cloud Storage FUSE CSI driver does not need the privileged access that FUSE clients typically require. This enables a better security posture.
  • The support of CSI ephemeral volumes simplifies volume configuration and management by eliminating the need for PersistentVolumeClaim and PersistentVolume objects.
  • The Cloud Storage FUSE CSI driver supports the ReadWriteMany, ReadOnlyMany, and ReadWriteOnce access modes.
  • You can use Workload Identity Federation for GKE to manage authentication while having granular control over how your Pods access Cloud Storage objects. Uniform bucket-level access is required for read-write workloads when using Workload Identity Federation.
  • If you are running ML training and serving workloads with frameworks like Ray, PyTorch, Spark, and TensorFlow, the portability and simplicity provided by the Cloud Storage FUSE CSI driver allow you to run your workloads directly on your GKE clusters without additional code changes.
  • You can read Cloud Storage objects with file caching enabled to boost the read performance. File caching accelerates repeat reads by serving objects from local storage. To learn more about the benefits of file caching, refer to the Cloud Storage FUSE documentation.
  • With Cloud Storage FUSE v2.4.0 and file cache enabled, you can use the parallel download feature, which uses multi-threaded downloads to accelerate reading large files from Cloud Storage. You can use this feature to improve model load times, especially for reads over 1 GB in size (for example, up to twice as fast when loading Llama2 70B).
  • You can consume Cloud Storage FUSE volumes in init containers.
  • You can view metrics insights for Cloud Storage FUSE, including file system, Cloud Storage, and file cache usage.

Before you begin

Before you start, make sure you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

Limitations

Requirements

To use the Cloud Storage FUSE CSI driver, your clusters must meet the following requirements:

Enable the Cloud Storage FUSE CSI driver

To create a Standard cluster with the Cloud Storage FUSE CSI driver enabled, you can use the gcloud CLI:

gcloud container clusters create CLUSTER_NAME \
    --addons GcsFuseCsiDriver \
    --cluster-version=VERSION \
    --location=LOCATION \
    --workload-pool=PROJECT_ID.svc.id.goog

Replace the following:

  • CLUSTER_NAME: the name of your cluster.
  • VERSION: the GKE version number. You must select 1.24 or later.
  • LOCATION: the Compute Engine location for the cluster.
  • PROJECT_ID: your project ID.
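As a rough aid for the version requirement above, the following sketch compares a GKE version string against the 1.24 minimum before you pass it to --cluster-version. The helper name version_ge and the sample version string are hypothetical, and the comparison assumes that sort -V (version sort, as provided by GNU coreutils) is available on your machine.

```shell
# Succeeds (exit 0) when version $1 is greater than or equal to version $2.
# Assumption: `sort -V` is available for version-aware ordering.
version_ge() {
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

MIN_VERSION="1.24"               # minimum stated in this guide
CANDIDATE="1.29.4-gke.1043002"   # hypothetical value; substitute your own

if version_ge "$CANDIDATE" "$MIN_VERSION"; then
    echo "$CANDIDATE meets the $MIN_VERSION minimum"
else
    echo "$CANDIDATE is older than $MIN_VERSION"
fi
```

This only compares version strings. It does not check whether the version is actually offered in your location.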

To enable the driver on an existing Standard cluster, use the gcloud container clusters update command:

gcloud container clusters update CLUSTER_NAME \
    --update-addons GcsFuseCsiDriver=ENABLED \
    --location=LOCATION

Replace the following:

  • CLUSTER_NAME: the name of your cluster.
  • LOCATION: the Compute Engine location for the cluster.

After you enable the Cloud Storage FUSE CSI driver, you can use the driver in Kubernetes volumes by specifying the driver and provisioner name: gcsfuse.csi.storage.gke.io.

Configure access to Cloud Storage buckets using GKE Workload Identity Federation for GKE

To make your Cloud Storage buckets accessible by your GKE cluster using Workload Identity Federation for GKE, follow these steps. See Configure applications to use Workload Identity Federation for GKE for more information.

  1. Get credentials for your cluster:

    gcloud container clusters get-credentials CLUSTER_NAME \
        --location=LOCATION
    

    Replace the following:

    • CLUSTER_NAME: the name of your cluster that has Workload Identity Federation for GKE enabled.
    • LOCATION: the Compute Engine location for the cluster.
  2. Create a namespace to use for the Kubernetes ServiceAccount. You can also use the default namespace or any existing namespace.

    kubectl create namespace NAMESPACE
    

    Replace the following:

    • NAMESPACE: the name of the Kubernetes namespace for the Kubernetes ServiceAccount.
  3. Create a Kubernetes ServiceAccount for your application to use. You can also use any existing Kubernetes ServiceAccount in any namespace, including the default Kubernetes ServiceAccount.

    kubectl create serviceaccount KSA_NAME \
        --namespace NAMESPACE
    

    Replace the following:

    • KSA_NAME: the name of your new Kubernetes ServiceAccount.
    • NAMESPACE: the name of the Kubernetes namespace for the Kubernetes ServiceAccount.
  4. Grant one of the IAM roles for Cloud Storage to the Kubernetes ServiceAccount.

    You can grant the role to your Kubernetes ServiceAccount to only access a specific Cloud Storage bucket using the following command:

    gcloud storage buckets add-iam-policy-binding gs://BUCKET_NAME \
        --member "principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/NAMESPACE/sa/KSA_NAME" \
        --role "ROLE_NAME"
    

    Replace the following:

    • BUCKET_NAME: your Cloud Storage bucket name.
    • PROJECT_NUMBER: the numerical project number of your GKE cluster. To find your project number, see Identifying projects.
    • PROJECT_ID: the project ID of your GKE cluster.
    • NAMESPACE: the name of the Kubernetes namespace for the Kubernetes ServiceAccount.
    • KSA_NAME: the name of your new Kubernetes ServiceAccount.
    • ROLE_NAME: the IAM role to assign to your Kubernetes ServiceAccount.
      • For read-only workloads, use the Storage Object Viewer role (roles/storage.objectViewer).
      • For read-write workloads, use the Storage Object User role (roles/storage.objectUser).

    Optionally, you can grant the role to your Kubernetes ServiceAccount to access all your Cloud Storage buckets in the project using the following command:

    gcloud projects add-iam-policy-binding GCS_PROJECT \
        --member "principal://iam.googleapis.com/projects/PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/NAMESPACE/sa/KSA_NAME" \
        --role "ROLE_NAME"
    

    Replace the following:

    • GCS_PROJECT: the project ID of your Cloud Storage buckets.
    • PROJECT_NUMBER: the numerical project number of your GKE cluster. To find your project number, see Identifying projects.
    • PROJECT_ID: the project ID of your GKE cluster.
    • NAMESPACE: the name of the Kubernetes namespace for the Kubernetes ServiceAccount.
    • KSA_NAME: the name of your new Kubernetes ServiceAccount.
    • ROLE_NAME: the IAM role to assign to your Kubernetes ServiceAccount.
      • For read-only workloads, use the Storage Object Viewer role (roles/storage.objectViewer).
      • For read-write workloads, use the Storage Object User role (roles/storage.objectUser).
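The --member value in both commands follows the same pattern, so it can help to build it once and reuse it. The following is a minimal sketch. The function name ksa_principal and the sample values are hypothetical, and the format string is copied from the commands above.

```shell
# Print the IAM principal identifier for a Kubernetes ServiceAccount.
# Arguments: PROJECT_NUMBER PROJECT_ID NAMESPACE KSA_NAME
ksa_principal() {
    printf 'principal://iam.googleapis.com/projects/%s/locations/global/workloadIdentityPools/%s.svc.id.goog/subject/ns/%s/sa/%s\n' \
        "$1" "$2" "$3" "$4"
}

# Hypothetical sample values; substitute your own.
MEMBER=$(ksa_principal "123456789012" "my-project" "my-namespace" "my-ksa")
echo "$MEMBER"
```

You could then pass the result as --member "$MEMBER" in either of the commands above.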

Prepare to mount Cloud Storage FUSE buckets

This section covers how to prepare to mount Cloud Storage FUSE buckets on your clusters.

Specify Pod annotations

The CSI driver relies on Pod annotations to identify if your Pod uses Cloud Storage-backed volumes. If the driver detects the necessary annotations, it injects a sidecar container called gke-gcsfuse-sidecar into your workload Pod. The Cloud Storage FUSE instances run inside the sidecar container and mount the Cloud Storage buckets for your workload.

To enable the CSI driver to mount the Cloud Storage buckets, make sure you specify the annotation gke-gcsfuse/volumes: "true" in your Pod specification, under the metadata field. If you want your Cloud Storage-backed volumes to be consumed by other Kubernetes workload types (for instance, Job, Deployment, or StatefulSet), make sure you configure the annotations under the spec.template.metadata.annotations field.
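To illustrate where the annotation goes for a workload type other than a bare Pod, the following sketch prints a Deployment manifest with the annotation under spec.template.metadata.annotations. The function name render_deployment, the resource names, and the busybox container are illustrative assumptions. The volume fields reuse the CSI ephemeral volume fields described later in this guide.

```shell
# Print a Deployment manifest whose Pod template carries the annotation.
# Arguments: NAME NAMESPACE KSA_NAME BUCKET_NAME (hypothetical inputs)
render_deployment() {
    cat <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $1
  namespace: $2
spec:
  replicas: 1
  selector:
    matchLabels:
      app: $1
  template:
    metadata:
      labels:
        app: $1
      annotations:
        gke-gcsfuse/volumes: "true"   # on the Pod template, not the Deployment
    spec:
      serviceAccountName: $3
      containers:
      - name: main
        image: busybox
        command: ["sleep", "infinity"]
        volumeMounts:
        - name: gcs-fuse-csi-ephemeral
          mountPath: /data
      volumes:
      - name: gcs-fuse-csi-ephemeral
        csi:
          driver: gcsfuse.csi.storage.gke.io
          volumeAttributes:
            bucketName: $4
EOF
}

render_deployment "gcs-fuse-example" "my-namespace" "my-ksa" "my-bucket"
```

To apply the manifest, you could pipe the output to kubectl apply -f - instead of printing it.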

If you are using Istio or Cloud Service Mesh, add the following Pod-level annotations:

proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'
traffic.sidecar.istio.io/excludeOutboundIPRanges: 169.254.169.254/32

Configure resources for the sidecar container

By default, the sidecar container is configured with the following resource requests, with resource limits unset (for Standard clusters):

  • 250m CPU
  • 256 MiB memory
  • 5 GiB ephemeral storage

To override these values, you can optionally specify the annotation gke-gcsfuse/[cpu-limit|memory-limit|ephemeral-storage-limit|cpu-request|memory-request|ephemeral-storage-request] as shown in the following example:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    gke-gcsfuse/volumes: "true"
    gke-gcsfuse/cpu-limit: "10"
    gke-gcsfuse/memory-limit: 10Gi
    gke-gcsfuse/ephemeral-storage-limit: 1Ti
    gke-gcsfuse/cpu-request: 500m
    gke-gcsfuse/memory-request: 1Gi
    gke-gcsfuse/ephemeral-storage-request: 50Gi
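Because a misspelled annotation key is easy to miss in a manifest, a small check of the key names can be useful before you apply it. The following sketch recognizes only the six resource annotation keys listed above. The function name is hypothetical, and the check says nothing about whether the values are valid quantities.

```shell
# Succeeds when $1 is one of the six sidecar resource annotation keys.
is_sidecar_resource_annotation() {
    case "$1" in
        gke-gcsfuse/cpu-limit|gke-gcsfuse/memory-limit|gke-gcsfuse/ephemeral-storage-limit) return 0 ;;
        gke-gcsfuse/cpu-request|gke-gcsfuse/memory-request|gke-gcsfuse/ephemeral-storage-request) return 0 ;;
        *) return 1 ;;
    esac
}

# The second key is deliberately misspelled to show the failure path.
for key in gke-gcsfuse/memory-limit gke-gcsfuse/memory-limits; do
    if is_sidecar_resource_annotation "$key"; then
        echo "ok: $key"
    else
        echo "unrecognized: $key"
    fi
done
```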

Use the following considerations when deciding the amount of resources to allocate:

  • If you set only one of the resource request or limit annotations, GKE Autopilot enforces the same values for the resource request and resource limit.
  • If your workload Pod consumes multiple Cloud Storage volumes, the sidecar container resources are shared by multiple Cloud Storage FUSE instances. If this applies to you, consider increasing the resource allocation for multiple Cloud Storage volumes.
  • Allocate more CPU to the sidecar container if your workloads need higher throughput. Insufficient CPU will cause Cloud Storage FUSE throttling.
  • If your workloads need to process a large number of files, and the Cloud Storage FUSE metadata caching is enabled, increase the sidecar container's memory allocation. Cloud Storage FUSE memory consumption for metadata caching is proportional to the number of files but not the file size. Insufficient memory will cause Cloud Storage FUSE out-of-memory errors and crash the workload application.
  • For file caching, Cloud Storage FUSE by default caches the files in a local temporary directory. Estimate how much free space your workload needs for file caching, and increase your ephemeral storage limit accordingly. To learn more, see volume attributes.
  • For write operations, Cloud Storage FUSE by default stages the files in a local temporary directory before the files are uploaded to the Cloud Storage bucket. Estimate how much free space your workload needs for staging when writing large files, and increase your ephemeral storage limit accordingly. To learn more, see Read/Writes semantics in the Cloud Storage FUSE GitHub documentation.
  • You can use the value "0" to unset any resource limits or requests on Standard clusters. For example, the annotation gke-gcsfuse/memory-limit: "0" leaves the sidecar container memory limit empty with the default memory request. This is useful when you cannot decide on the amount of resources Cloud Storage FUSE needs for your workloads, and want to let Cloud Storage FUSE consume all the available resources on a node. After calculating the resource requirements for Cloud Storage FUSE based on your workload metrics, you can set appropriate limits.

Configure a private image for the sidecar container

This section describes how to use the sidecar container image if you are hosting it in a private container registry. This scenario might apply if you need to use private nodes for security purposes. To configure and consume the private sidecar container image, follow these steps:

  1. Refer to this page to look for a compatible public sidecar container image.

  2. Pull it to your local environment and push it to your private container registry.

  3. In the manifest, specify a container named gke-gcsfuse-sidecar with only the image field. GKE will use the specified sidecar container image to prepare for the sidecar container injection. Here is an example:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    gke-gcsfuse/volumes: "true"
spec:
  containers:
  - name: gke-gcsfuse-sidecar
    image: PRIVATE_REGISTRY/gcs-fuse-csi-driver-sidecar-mounter:PRIVATE_IMAGE_TAG
  - name: main # your main workload container.

Replace the following:

  • PRIVATE_REGISTRY: your private container registry.
  • PRIVATE_IMAGE_TAG: your private sidecar container image tag.

Configure a custom write buffer volume for the sidecar container

This section describes how to configure a custom buffer volume for Cloud Storage FUSE write buffering. This scenario might apply if you need to replace the default emptyDir volume for Cloud Storage FUSE to stage the files in write operations. You can specify any type of storage supported by GKE, such as a PersistentVolumeClaim, and GKE will use the specified volume for file write buffering. This is useful if you need to write files larger than 10 GiB on Autopilot clusters. To use the custom buffer volume, you must specify a non-zero fsGroup. The following example shows how you can use a predefined PVC as the buffer volume:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    gke-gcsfuse/volumes: "true"
spec:
  securityContext:
    fsGroup: FS_GROUP
  containers:
  ...
  volumes:
  - name: gke-gcsfuse-buffer
    persistentVolumeClaim:
      claimName: BUFFER_VOLUME_PVC

Replace the following:

  • FS_GROUP: the fsGroup ID.
  • BUFFER_VOLUME_PVC: the predefined PVC name.

Configure a custom read cache volume for the sidecar container

This section describes how to configure a custom cache volume for Cloud Storage FUSE read caching. This scenario might apply if you need to replace the default emptyDir volume for Cloud Storage FUSE to cache the files in read operations. You can specify any type of storage supported by GKE, such as a PersistentVolumeClaim, and GKE will use the specified volume for file caching. This is useful if you need to cache files larger than 10 GiB on Autopilot clusters. To use the custom cache volume, you must specify a non-zero fsGroup. The following example shows how you can use a predefined PVC as the cache volume:

apiVersion: v1
kind: Pod
metadata:
  annotations:
    gke-gcsfuse/volumes: "true"
spec:
  securityContext:
    fsGroup: FS_GROUP
  containers:
  ...
  volumes:
  - name: gke-gcsfuse-cache
    persistentVolumeClaim:
      claimName: CACHE_VOLUME_PVC

Replace the following:

  • FS_GROUP: the fsGroup ID.
  • CACHE_VOLUME_PVC: the predefined PVC name.

Provision your volume as a CSI ephemeral volume

CSI ephemeral volumes backed by Cloud Storage buckets are tied to the Pod lifecycle. With this provisioning approach, you don't need to maintain the PersistentVolume and PersistentVolumeClaim objects associated with the Cloud Storage buckets after Pod termination.

Consume the CSI ephemeral storage volume in a Pod

  1. Save the following YAML manifest:

    apiVersion: v1
    kind: Pod
    metadata:
      name: gcs-fuse-csi-example-ephemeral
      namespace: NAMESPACE
      annotations:
        gke-gcsfuse/volumes: "true"
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - image: busybox
        name: busybox
        command: ["sleep"]
        args: ["infinity"]
        volumeMounts:
        - name: gcs-fuse-csi-ephemeral
          mountPath: /data
          readOnly: true
      serviceAccountName: KSA_NAME
      volumes:
      - name: gcs-fuse-csi-ephemeral
        csi:
          driver: gcsfuse.csi.storage.gke.io
          readOnly: true
          volumeAttributes:
            bucketName: BUCKET_NAME
            mountOptions: "implicit-dirs"
            gcsfuseLoggingSeverity: warning
    

    The previous example shows how you can specify the Cloud Storage bucket inline in the Pod manifest. The example includes the following fields:

    • metadata.annotations: the annotation gke-gcsfuse/volumes: "true" is required. See Configure resources for the sidecar container for optional annotations.
    • spec.terminationGracePeriodSeconds: optional. By default, this is set to 30. If you need to write large files to the Cloud Storage bucket, increase this value to make sure that Cloud Storage FUSE has enough time to flush the data after your application exits. To learn more, see Kubernetes best practices: Terminating with grace.
    • spec.serviceAccountName: use the same Kubernetes ServiceAccount as in the Configure access to Cloud Storage buckets using GKE Workload Identity Federation for GKE step.
    • spec.volumes[n].csi.driver: use gcsfuse.csi.storage.gke.io as the CSI driver name.
    • spec.volumes[n].csi.volumeAttributes.bucketName: specify your Cloud Storage bucket name. You can specify an underscore (_) to mount all buckets that the Kubernetes ServiceAccount can access. To learn more, see Dynamic Mounting in the Cloud Storage FUSE documentation.
    • spec.volumes[n].csi.volumeAttributes.mountOptions: optional. Pass mount options to Cloud Storage FUSE. Specify the flags in one string separated by commas, without spaces.
    • spec.volumes[n].csi.volumeAttributes: optional. Pass other volume attributes to Cloud Storage FUSE.
    • spec.volumes[n].csi.readOnly: optional. Specify true if all the volume mounts are read-only.
    • spec.containers[n].volumeMounts[m].readOnly: optional. Specify true if only a specific volume mount is read-only.
  2. Apply the manifest to the cluster:

    kubectl apply -f FILE_PATH
    

    Replace FILE_PATH with the path to the YAML file.

Consume the CSI ephemeral storage volume in a Job workload

  1. Save the following YAML manifest:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: gcs-fuse-csi-job-example
      namespace: NAMESPACE
    spec:
      template:
        metadata:
          annotations:
            gke-gcsfuse/volumes: "true"
        spec:
          serviceAccountName: KSA_NAME
          containers:
          - name: writer
            image: busybox
            command:
              - "/bin/sh"
              - "-c"
              - touch /data/test && echo $(date) >> /data/test && sleep 10
            volumeMounts:
            - name: gcs-fuse-csi-ephemeral
              mountPath: /data
          - name: reader
            image: busybox
            command:
              - "/bin/sh"
              - "-c"
              - sleep 10 && cat /data/test
            volumeMounts:
            - name: gcs-fuse-csi-ephemeral
              mountPath: /data
              readOnly: true
          volumes:
          - name: gcs-fuse-csi-ephemeral
            csi:
              driver: gcsfuse.csi.storage.gke.io
              volumeAttributes:
                bucketName: BUCKET_NAME
          restartPolicy: Never
      backoffLimit: 1
    

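    Replace the following:

    • NAMESPACE: the namespace of your workload.
    • KSA_NAME: the name of the Kubernetes ServiceAccount that you configured with access to the bucket.
    • BUCKET_NAME: your Cloud Storage bucket name.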

    The manifest deploys a Job that consumes a Cloud Storage FUSE bucket through a CSI ephemeral volume.

  2. Apply the manifest to the cluster:

    kubectl apply -f FILE_PATH
    

    Replace FILE_PATH with the path to the YAML file.

If you are using the CSI driver in a Job workload, or if the Pod RestartPolicy is Never, the sidecar container will exit automatically after all the other workload containers exit.

For additional examples, see Example Applications in the GitHub project documentation.

Provision your volume using static provisioning

With static provisioning, you create one or more PersistentVolume (PV) objects containing the details of the underlying storage system. Pods in your clusters can then consume the storage through PersistentVolumeClaims (PVCs).

Create a PersistentVolume

  1. Save the following YAML manifest:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: gcs-fuse-csi-pv
    spec:
      accessModes:
      - ReadWriteMany
      capacity:
        storage: 5Gi
      storageClassName: example-storage-class
      mountOptions:
        - implicit-dirs
      csi:
        driver: gcsfuse.csi.storage.gke.io
        volumeHandle: BUCKET_NAME
        volumeAttributes:
          gcsfuseLoggingSeverity: warning
      claimRef:
        name: gcs-fuse-csi-static-pvc
        namespace: NAMESPACE
    

    The example manifest shows how you can define a PersistentVolume for Cloud Storage buckets. The example includes the following fields:

    • spec.csi.driver: use gcsfuse.csi.storage.gke.io as the CSI driver name.
    • spec.csi.volumeHandle: specify your Cloud Storage bucket name. You can pass an underscore (_) to mount all the buckets that the Kubernetes ServiceAccount is configured to have access to. To learn more, see Dynamic Mounting in the Cloud Storage FUSE documentation.
    • spec.mountOptions: optional. Pass mount options to Cloud Storage FUSE.
    • spec.csi.volumeAttributes: optional. Pass volume attributes to Cloud Storage FUSE.
  2. Apply the manifest to the cluster:

    kubectl apply -f FILE_PATH
    

    Replace FILE_PATH with the path to the YAML file.

Create a PersistentVolumeClaim

  1. Save the following YAML manifest:

    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: gcs-fuse-csi-static-pvc
      namespace: NAMESPACE
    spec:
      accessModes:
      - ReadWriteMany
      resources:
        requests:
          storage: 5Gi
      storageClassName: example-storage-class
    

    The example manifest shows how you can define a PersistentVolumeClaim to bind the PersistentVolume. The example includes the following fields:

    • metadata.namespace: specify the PersistentVolumeClaim namespace that must be consistent with the namespace of your workload.

    To bind a PersistentVolume to a PersistentVolumeClaim, make sure to follow these guidelines:

    • spec.storageClassName fields on PV and PVC manifests should match. The storageClassName does not need to refer to an existing StorageClass object. To bind the claim to a volume, you can use any name you want but it cannot be empty.
    • spec.accessModes fields on PV and PVC manifests should match.
    • spec.capacity.storage on the PersistentVolume manifest should match spec.resources.requests.storage on the PersistentVolumeClaim manifest. Since Cloud Storage buckets don't have size limits, you can put any number for capacity but it cannot be empty.
  2. Apply the manifest to the cluster:

    kubectl apply -f FILE_PATH
    

    Replace FILE_PATH with the path to the YAML file.
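The binding guidelines above can be checked mechanically before you apply the manifests. The following sketch extracts the storage class, the first access mode, and the storage size from two manifest files and compares them. It is a naive text scan with awk rather than a YAML parser, the function name binding_fields is hypothetical, and the two files are trimmed copies of the examples in this section.

```shell
# Naive text scan (not a YAML parser): print the fields that must agree.
# Only the first entry under accessModes is considered.
binding_fields() {
    awk '
        /storageClassName:/ { print "class=" $2 }
        /^ *storage:/       { print "size=" $2 }
        /accessModes:/      { getline; print "mode=" $2 }
    ' "$1" | sort
}

WORKDIR=$(mktemp -d)
cat > "$WORKDIR/pv.yaml" <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-fuse-csi-pv
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 5Gi
  storageClassName: example-storage-class
EOF
cat > "$WORKDIR/pvc.yaml" <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-fuse-csi-static-pvc
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: example-storage-class
EOF

if [ "$(binding_fields "$WORKDIR/pv.yaml")" = "$(binding_fields "$WORKDIR/pvc.yaml")" ]; then
    echo "PV and PVC fields match"
else
    echo "PV and PVC fields differ"
fi
rm -rf "$WORKDIR"
```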

Consume the volume from a PersistentVolumeClaim

  1. Save the following YAML manifest:

    apiVersion: v1
    kind: Pod
    metadata:
      name: gcs-fuse-csi-example-static-pvc
      namespace: NAMESPACE
      annotations:
        gke-gcsfuse/volumes: "true"
    spec:
      containers:
      - image: busybox
        name: busybox
        command: ["sleep"]
        args: ["infinity"]
        volumeMounts:
        - name: gcs-fuse-csi-static
          mountPath: /data
          readOnly: true
      serviceAccountName: KSA_NAME
      volumes:
      - name: gcs-fuse-csi-static
        persistentVolumeClaim:
          claimName: gcs-fuse-csi-static-pvc
          readOnly: true
    

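    The example shows how you can define a Pod that consumes a Cloud Storage FUSE bucket through a PersistentVolumeClaim. The example includes the following fields:

    • metadata.annotations: the annotation gke-gcsfuse/volumes: "true" is required.
    • spec.serviceAccountName: use the same Kubernetes ServiceAccount as in the Configure access to Cloud Storage buckets using GKE Workload Identity Federation for GKE step.
    • spec.volumes[n].persistentVolumeClaim.claimName: use the name of the PersistentVolumeClaim that you created, gcs-fuse-csi-static-pvc in this example.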

  2. Apply the manifest to the cluster:

    kubectl apply -f FILE_PATH
    

    Replace FILE_PATH with the path to the YAML file.

For additional examples, see Example Applications in the GitHub project documentation.

Consume your volumes with file caching enabled

By default, the file caching feature is disabled on GKE. To enable and control file caching, use the volume attribute fileCacheCapacity.

GKE uses an emptyDir volume for Cloud Storage FUSE file caching backed by the node VM boot disk. If you enable Local SSD on the node, GKE uses the Local SSD to back the emptyDir volume.

You can configure a custom read cache volume for the sidecar container to replace the default emptyDir volume for file caching in read operations. For CPU and GPU VM families with Local SSD support, we recommend using Local SSD storage. For TPU families or Autopilot, we recommend using Balanced Persistent Disk or SSD Persistent Disk.
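The fileCacheCapacity attribute accepts quantity values such as 500Mi or 10Gi, and it is translated to the configuration file field file-cache:max-size-mb. The following sketch shows only the unit arithmetic, assuming the translation is a plain conversion to a whole number of MiB. The driver's internal conversion is not shown in this guide, the function name capacity_to_mib is hypothetical, and only whole-number Mi and Gi values are handled.

```shell
# Convert a fileCacheCapacity value to a whole number of MiB.
# Handles only "-1", "0", and whole-number Mi/Gi quantities (an assumption
# made for this sketch; other suffixes are rejected).
capacity_to_mib() {
    case "$1" in
        -1|0) printf '%s\n' "$1" ;;
        *Gi)  printf '%s\n' $(( ${1%Gi} * 1024 )) ;;
        *Mi)  printf '%s\n' "${1%Mi}" ;;
        *)    echo "unsupported value: $1" >&2; return 1 ;;
    esac
}

for value in 500Mi 10Gi -1 0; do
    printf '%s -> %s\n' "$value" "$(capacity_to_mib "$value")"
done
```

Under these assumptions, 10Gi corresponds to 10240 MiB, while "-1" and "0" pass through unchanged with the meanings described under Volume attributes.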

Consume a CSI ephemeral storage volume with file caching enabled

To deploy a Pod that consumes a Cloud Storage FUSE bucket through a CSI ephemeral volume with file caching, follow these steps:

  1. Create a cluster or node pool with Local SSD-backed ephemeral storage.

    Follow the GKE documentation to create a cluster or node pool with Local SSD-backed ephemeral storage.

  2. Save the following YAML manifest:

    apiVersion: v1
    kind: Pod
    metadata:
      name: gcs-fuse-csi-file-cache-example
      namespace: NAMESPACE
      annotations:
        gke-gcsfuse/volumes: "true"
        gke-gcsfuse/ephemeral-storage-limit: "50Gi"
    spec:
      nodeSelector:
        cloud.google.com/gke-ephemeral-storage-local-ssd: "true"
      restartPolicy: Never
      initContainers:
      - name: data-loader
        image: gcr.io/google.com/cloudsdktool/google-cloud-cli:slim
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 500m
            memory: 1Gi
        command:
          - "/bin/sh"
          - "-c"
          - |
            mkdir -p /test_files
            for i in $(seq 1 1000); do dd if=/dev/zero of=/test_files/file_$i.txt bs=1024 count=64; done
            gcloud storage cp /test_files gs://BUCKET_NAME --recursive
      containers:
      - name: data-validator
        image: busybox
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 500m
            memory: 512Mi
        command:
          - "/bin/sh"
          - "-c"
          - |
            echo "first read with cache miss"
            time cat /data/test_files/file_* > /dev/null
    
            echo "second read from local cache"
            time cat /data/test_files/file_* > /dev/null
        volumeMounts:
        - name: gcs-fuse-csi-ephemeral
          mountPath: /data
      serviceAccountName: KSA_NAME
      volumes:
      - name: gcs-fuse-csi-ephemeral
        csi:
          driver: gcsfuse.csi.storage.gke.io
          volumeAttributes:
            bucketName: BUCKET_NAME
            mountOptions: "implicit-dirs"
            fileCacheCapacity: "10Gi"
    

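    Replace the following:

    • NAMESPACE: the namespace of your workload.
    • BUCKET_NAME: your Cloud Storage bucket name.
    • KSA_NAME: the name of the Kubernetes ServiceAccount that you configured with access to the bucket.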

    The init container data-loader generates 1,000 files of 64 KiB each and uploads them to a Cloud Storage bucket. The main container data-validator reads all the files from the bucket twice and logs the duration of each pass.

  3. Apply the manifest to the cluster:

    kubectl apply -f FILE_PATH
    

    Replace FILE_PATH with the path to the YAML file.

  4. To view the log output, run the following command:

    kubectl logs -n NAMESPACE gcs-fuse-csi-file-cache-example -c data-validator
    

    Replace NAMESPACE with the namespace of your workload.

    The output is similar to the following:

    first read with cache miss
    real    0m 54.68s
    ...
    second read from local cache
    real    0m 0.38s
    ...
    

    In this sample output, the second read, served from the local cache, completes in 0.38 seconds, compared with 54.68 seconds for the first read with a cache miss.
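To put the sample timings in perspective, the following sketch works out the amount of data read per pass, the effective throughput of each pass, and the resulting speedup. The numbers are taken from the manifest and the sample output above. Actual timings vary by cluster, so treat the results as illustrative. The function name read_summary is hypothetical.

```shell
FILES=1000          # files created by the data-loader init container
FILE_KIB=64         # size of each file in KiB
FIRST_READ=54.68    # seconds, from the sample output (cache miss)
SECOND_READ=0.38    # seconds, from the sample output (local cache)

# Arguments: FILE_COUNT FILE_SIZE_KIB FIRST_SECONDS SECOND_SECONDS
read_summary() {
    awk -v n="$1" -v kib="$2" -v t1="$3" -v t2="$4" 'BEGIN {
        mib = n * kib / 1024
        printf "data read per pass: %.1f MiB\n", mib
        printf "first read:  %.2f MiB/s\n", mib / t1
        printf "second read: %.2f MiB/s\n", mib / t2
        printf "speedup:     %.1fx\n", t1 / t2
    }'
}

read_summary "$FILES" "$FILE_KIB" "$FIRST_READ" "$SECOND_READ"
```

With the sample values, each pass reads 62.5 MiB and the cached pass is roughly 144 times faster than the first.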

Improve large file read performance using Cloud Storage FUSE parallel download

You can use Cloud Storage FUSE parallel download, which uses multi-threaded downloads, to accelerate reading large files from Cloud Storage. Cloud Storage FUSE parallel download can be particularly beneficial for model serving use cases with reads over 1 GB in size.

Common examples include:

  • Model serving, where you need a large prefetch buffer to accelerate model download during instance boot.
  • Checkpoint restores, where you need a read-only data cache to improve one-time access of multiple large files.

Best practice:

Use parallel download for applications that perform single-threaded large file reads. Applications with high read-parallelism (using more than eight threads) may encounter lower performance with this feature.

To use parallel download with the Cloud Storage FUSE CSI driver, follow these steps:

  1. Enable file cache. Create a cluster with file caching enabled, as described in Consume a CSI ephemeral storage volume with file caching enabled.

  2. Enable parallel download. In your manifest, configure these additional settings using mount options:

    1. Set file-cache:enable-parallel-downloads:true.
    2. Adjust file-cache:parallel-downloads-per-file, file-cache:max-parallel-downloads, and file-cache:download-chunk-size-mb as needed.
  3. (Optional) Tune volume attributes. If needed, consider tuning these volume attributes:

    • fileCacheForRangeRead for random or partial reads.
    • metadataTypeCacheCapacity and metadataStatCacheCapacity for training workloads.

The following examples show how you can enable parallel download, depending on whether you are using ephemeral storage volumes or static provisioning:

Ephemeral storage

apiVersion: v1
kind: Pod
metadata:
  name: gcs-fuse-csi-example-ephemeral
  namespace: NAMESPACE
  annotations:
    gke-gcsfuse/volumes: "true"
spec:
  containers:
  ...
  volumes:
  - name: gcs-fuse-csi-ephemeral
    csi:
      driver: gcsfuse.csi.storage.gke.io
      volumeAttributes:
        bucketName: BUCKET_NAME
        mountOptions: "implicit-dirs,file-cache:enable-parallel-downloads:true,file-cache:parallel-downloads-per-file:4,file-cache:max-parallel-downloads:-1,file-cache:download-chunk-size-mb:3"
        fileCacheCapacity: "-1"

Static provisioning

apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-fuse-csi-pv
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 5Gi
  storageClassName: example-storage-class
  mountOptions:
    - implicit-dirs
    - file-cache:enable-parallel-downloads:true
    - file-cache:parallel-downloads-per-file:4
    - file-cache:max-parallel-downloads:-1
    - file-cache:download-chunk-size-mb:3
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: BUCKET_NAME
    volumeAttributes:
      fileCacheCapacity: "-1"
  claimRef:
    name: gcs-fuse-csi-static-pvc
    namespace: NAMESPACE

Configure how Cloud Storage FUSE buckets are mounted

This section describes how you can configure the Cloud Storage FUSE volumes.

Mount options

The Cloud Storage FUSE CSI driver supports mount options to configure how Cloud Storage buckets are mounted on your local file system. For the full list of supported mount options, see the gcsfuse CLI documentation.

You can specify the mount flags in the following ways:

  • In the spec.mountOptions field on a PersistentVolume manifest, if you use static provisioning.
  • In the spec.volumes[n].csi.volumeAttributes.mountOptions field, if you use CSI ephemeral volumes.

Volume attributes

The Cloud Storage FUSE CSI driver does not allow you to directly specify the Cloud Storage FUSE configuration file. Instead, you can configure some of the fields in the configuration file by using the following volume attributes. The values are translated to the corresponding configuration file fields.

  • gcsfuseLoggingSeverity

    • Description: The severity of logs you want Cloud Storage FUSE to generate, expressed as an enum. This volume attribute is translated to the configuration file field logging:severity.

    • Valid values (ordered from lowest severity to highest severity):

      • trace
      • debug
      • info
      • warning
      • error
    • Default value: info.

  • fileCacheCapacity

    • Description: The maximum size that the file cache can use. If a non-zero value is set, this volume attribute enables file caching in Cloud Storage FUSE. This volume attribute is translated to the configuration file field file-cache:max-size-mb.

    • Valid values:

      • Quantity values, for example: 500Mi, 10Gi.
      • "-1": to use the cache volume's entire available capacity.
      • "0": the file cache is disabled.
    • Default value: "0".

  • fileCacheForRangeRead

    • Description: Whether the full object should be downloaded asynchronously and stored in the Cloud Storage FUSE cache directory when the first read is done from a non-zero offset. This should be set to "true" if you plan on performing several random reads or partial reads. This volume attribute is translated to the configuration file field file-cache:cache-file-for-range-read.

    • Valid values:

      • Boolean values in string format: "true", "false".
    • Default value: "false".

  • metadataStatCacheCapacity

    • Description: The maximum size that the stat cache can use. The stat cache is always kept entirely in memory. If you are already using the stat-cache-capacity mount option, its value is still honored and is translated to this new configuration. This volume attribute is translated to the configuration file field metadata-cache:stat-cache-max-size-mb.

    • Valid values:

      • Quantity values, for example: 500Mi, 1Gi.
      • "-1": to let the stat cache use as much memory as needed.
      • "0": the stat cache is disabled.
      • Use the default value of 32Mi if your workload involves up to 20,000 files. If your workload involves more than 20,000 files, increase the size by 10 MiB for every additional 6,000 files (an average of approximately 1,500 bytes per file).
    • Default value: 32Mi.

  • metadataTypeCacheCapacity

    • Description: The maximum size per directory that the type cache can use. The type cache is always entirely kept in memory. This volume attribute is translated to the configuration file field metadata-cache:type-cache-max-size-mb.

    • Valid values:

      • Quantity values, for example: 500Mi, 1Gi.
      • "-1": to let the type cache use as much memory as needed.
      • "0": the type cache is disabled.
      • Use the default value of 4Mi if no single directory in the bucket you're mounting contains more than 20,000 files. If a single directory contains more than 20,000 files, increase the size by 1 MiB for every 5,000 files (an average of approximately 200 bytes per file).
    • Default value: 4Mi.

  • metadataCacheTTLSeconds

    • Description: The time to live (TTL), in seconds, of cached metadata entries. If you are already using the stat-cache-ttl or type-cache-ttl mount options, the values will still be honored and will be appropriately translated to this new configuration. This volume attribute is translated to the configuration file field metadata-cache:ttl-secs.

    • Valid values:

      • Integer values in string format, for example: "600".
      • "-1": bypass TTL expiration and serve the file from the cache whenever it's available.
      • "0": ensure that the most up-to-date file is read. Using a value of 0 issues a Get metadata call to make sure that the object generation for the file in the cache matches what's stored in Cloud Storage.
    • Default value: "60".

You can specify the volume attributes in the following ways:

  • In the spec.csi.volumeAttributes field on a PersistentVolume manifest, if you use static provisioning.
  • In the spec.volumes[n].csi.volumeAttributes field, if you use CSI ephemeral volumes.
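
As an illustration of how these attributes fit together, the following sketch sets all of them on a statically provisioned volume. The values are assumptions chosen for a hypothetical read-heavy workload of about 32,000 files, with at most 30,000 files in any single directory. They are not recommendations, and the sizes simply apply the sizing guidance listed above.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-fuse-csi-pv
spec:
  accessModes:
  - ReadWriteMany
  capacity:
    storage: 5Gi
  storageClassName: example-storage-class
  mountOptions:
  - implicit-dirs
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: BUCKET_NAME
    volumeAttributes:
      # Log only warnings and errors (the default is info).
      gcsfuseLoggingSeverity: "warning"
      # Any non-zero value enables the file cache; here it is capped at 10Gi.
      fileCacheCapacity: "10Gi"
      # Cache the whole object even when the first read starts at a non-zero offset.
      fileCacheForRangeRead: "true"
      # 32,000 files: 32Mi default + 2 x 10 MiB for the 12,000 files beyond 20,000.
      metadataStatCacheCapacity: "52Mi"
      # Largest directory holds 30,000 files: 1 MiB per 5,000 files.
      metadataTypeCacheCapacity: "6Mi"
      # Keep cached metadata for 10 minutes instead of the default 60 seconds.
      metadataCacheTTLSeconds: "600"
```

For CSI ephemeral volumes, the same keys go under spec.volumes[n].csi.volumeAttributes in the Pod manifest, next to bucketName.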

Considerations

Use the following considerations when configuring mounts:

  • The following flags are disallowed: app-name, temp-dir, foreground, log-file, log-format, key-file, token-url, and reuse-token-from-url.
  • Cloud Storage FUSE does not make implicit directories visible by default. To make these directories visible, you can turn on the implicit-dirs mount flag. To learn more, see Files and Directories in the Cloud Storage FUSE GitHub documentation.
  • If you use a Security Context for your Pod or container, or if your container image uses a non-root user or group, you must set the uid and gid mount flags. You also need to use the file-mode and dir-mode mount flags to set the file system permissions. Note that you cannot run chmod, chown, or chgrp commands against a Cloud Storage FUSE file system, so uid, gid, file-mode and dir-mode mount flags are necessary to provide access to a non-root user or group.
  • If you only want to mount a directory in the bucket instead of the entire bucket, pass the directory's path relative to the bucket root by using the only-dir=relative/path/to/the/bucket/root flag.
  • To tune Cloud Storage FUSE caching behavior, configure volume attributes. Refer to Cloud Storage FUSE Caching documentation for details.
  • If you need to limit the number of TCP connections allowed per server, use the max-conns-per-host flag. The limit you define takes effect when --client-protocol is set to http1. The default value is 0, which indicates no limit on TCP connections (limited by the machine specifications).
  • If you need to configure the Linux kernel mount options, you can pass the options using the o flag. For example, if you don't want to permit direct execution of any binaries on the mounted file system, set the o=noexec flag. Each option requires a separate flag, for example, o=noexec,o=noatime. Only the following options are allowed: exec, noexec, atime, noatime, sync, async, and dirsync.
  • If you need to troubleshoot Cloud Storage FUSE issues, set the log-severity flag to TRACE. Then the gcsfuseLoggingSeverity volume attribute is automatically set to trace.
  • The Cloud Storage FUSE CSI driver does not allow you to modify the cache-dir field in the Cloud Storage FUSE configuration file. Use the fileCacheCapacity volume attribute to enable or disable file caching. To replace the default emptyDir volume for file caching, you can configure a custom cache volume for the sidecar container.
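
Several of the considerations above can be combined in one mount. The following is a minimal sketch, assuming a container that runs as a non-root user. The Pod name, container name, image, mount path, user and group IDs, permission modes, and the datasets/train directory are all hypothetical values chosen for illustration.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gcs-fuse-csi-example-nonroot   # hypothetical name
  namespace: NAMESPACE
  annotations:
    gke-gcsfuse/volumes: "true"
spec:
  securityContext:
    runAsUser: 1001                    # hypothetical non-root user
    runAsGroup: 2002                   # hypothetical non-root group
  containers:
  - name: app                          # hypothetical container
    image: busybox
    command: ["sleep", "3600"]
    volumeMounts:
    - name: gcs-fuse-csi-ephemeral
      mountPath: /data
  volumes:
  - name: gcs-fuse-csi-ephemeral
    csi:
      driver: gcsfuse.csi.storage.gke.io
      volumeAttributes:
        bucketName: BUCKET_NAME
        # uid and gid match the security context above; file-mode and dir-mode
        # set permissions because chmod, chown, and chgrp are unavailable.
        # only-dir mounts a single directory, and each kernel option gets its
        # own o flag.
        mountOptions: "implicit-dirs,uid=1001,gid=2002,file-mode=664,dir-mode=775,only-dir=datasets/train,o=noexec,o=noatime"
```

With static provisioning, the same flags would instead be listed one per entry under spec.mountOptions on the PersistentVolume.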

Cloud Storage FUSE metrics

The following Cloud Storage FUSE metrics are available through the GKE Monitoring API. For details about these metrics, such as labels, type, and unit, see GKE System Metrics. The metrics are available for each Pod that uses Cloud Storage FUSE and let you configure insights per volume and bucket.

File system metrics

File system metrics track the performance and health of your file system, including the number of operations, errors, and operation speed. These metrics can help identify bottlenecks and optimize performance.

  • gcsfusecsi/fs_ops_count
  • gcsfusecsi/fs_ops_error_count
  • gcsfusecsi/fs_ops_latency

Cloud Storage metrics

You can monitor Cloud Storage metrics, including data volume, speed, and request activity, to understand how your applications interact with Cloud Storage buckets. This data can help you identify areas for optimization, such as improving read patterns or reducing the number of requests.

  • gcsfusecsi/gcs_download_bytes_count
  • gcsfusecsi/gcs_read_count
  • gcsfusecsi/gcs_read_bytes_count
  • gcsfusecsi/gcs_reader_count
  • gcsfusecsi/gcs_request_count
  • gcsfusecsi/gcs_request_latencies

File cache metrics

You can monitor file cache metrics, including data read volume, speed, and cache hit rate, to optimize Cloud Storage FUSE and application performance. Analyze these metrics to improve your caching strategy and maximize cache hits.

  • gcsfusecsi/file_cache_read_bytes_count
  • gcsfusecsi/file_cache_read_latencies
  • gcsfusecsi/file_cache_read_count

Disable the Cloud Storage FUSE CSI driver

You cannot disable the Cloud Storage FUSE CSI driver on Autopilot clusters.

You can disable the Cloud Storage FUSE CSI driver on an existing Standard cluster by using the Google Cloud CLI.

gcloud container clusters update CLUSTER_NAME \
    --update-addons GcsFuseCsiDriver=DISABLED

Replace CLUSTER_NAME with the name of your cluster.

Troubleshooting

To troubleshoot issues when using the Cloud Storage FUSE CSI driver, see the Troubleshooting Guide in the GitHub project documentation.

What's next