Mount Cloud Storage buckets as CSI ephemeral volumes

This guide shows you how to use CSI ephemeral volumes backed by your Cloud Storage buckets to automatically manage storage resources for your Kubernetes Pods or Jobs on Google Kubernetes Engine (GKE). CSI ephemeral volumes are tied to the Pod or Job lifecycle, and you don't need to manually handle PersistentVolume and PersistentVolumeClaim objects.

This guide is for Platform admins and operators who want to simplify storage management for their GKE applications.

Before reading this page, ensure you're familiar with CSI ephemeral volumes, Kubernetes Pods and Jobs, and Cloud Storage buckets.

If you are already familiar with PersistentVolumes and want consistency with your existing deployments that rely on this resource type, see Mount Cloud Storage buckets as persistent volumes.

Before you begin

Make sure you have completed these prerequisites:

How CSI ephemeral storage for Cloud Storage buckets works

CSI ephemeral volumes simplify storage management for your applications on GKE. You define CSI ephemeral volumes directly within your Pod or Job specification. Using CSI ephemeral volumes eliminates the need for separate PersistentVolume and PersistentVolumeClaim objects.

Using a CSI ephemeral volume involves these operations:

  1. Storage definition: You specify the storage in your Pod or Job's YAML file, including the CSI driver to use and any required parameters. For the Cloud Storage FUSE CSI driver, you specify the bucket name and other relevant details, as shown in the sketch after this list.

    Optionally, you can fine-tune the performance of your CSI driver by using the file caching feature. File caching can boost GKE app performance by caching frequently accessed Cloud Storage files on a faster disk.

    Additionally, you can use the parallel download feature, which uses multiple threads to download large files from Cloud Storage in parallel and accelerate reads. This feature can improve model load times, especially for reads larger than 1 GB.

  2. Driver invocation: When you create the Pod or Job, GKE detects the ephemeral volume request and calls the Cloud Storage FUSE CSI driver.

  3. Volume mount and attachment: The CSI driver mounts the CSI ephemeral volume (which points to the underlying Cloud Storage bucket) and makes it accessible to your application in the Pod or Job. To fine-tune how buckets are mounted in the file system, you can use mount options. You can also use volume attributes to configure specific behavior of the Cloud Storage FUSE CSI driver.

  4. Lifecycle management: The ephemeral volume exists for the lifetime of the Pod or Job. When the Pod is deleted or the Job completes, the CSI driver automatically unmounts the volume and cleans up the associated resources.
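
The flow above maps to a small stanza in your workload manifest. The following minimal sketch shows only the volume definition; the volume name and bucket name are placeholders, and complete manifests follow in the next sections:

    volumes:
    - name: my-gcsfuse-volume            # placeholder volume name
      csi:
        driver: gcsfuse.csi.storage.gke.io
        volumeAttributes:
          bucketName: my-bucket          # placeholder bucket name
          mountOptions: "implicit-dirs"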

Attach the CSI ephemeral volume

Follow these instructions, depending on whether you want to attach the CSI ephemeral volume to a Pod or Job.

Pod

To attach the CSI ephemeral volume in a Pod, follow these steps:

  1. Create a Pod YAML manifest with the following specification:

    apiVersion: v1
    kind: Pod
    metadata:
      name: gcs-fuse-csi-example-ephemeral 
      namespace: NAMESPACE
      annotations:
        gke-gcsfuse/volumes: "true" 
    spec:
      terminationGracePeriodSeconds: 60
      containers:
      - image: busybox
        name: busybox
        command: ["sleep"]
        args: ["infinity"] 
        volumeMounts:
        - name: gcs-fuse-csi-ephemeral
          mountPath: /data
          readOnly: true
      serviceAccountName: KSA_NAME
      volumes:
      - name: gcs-fuse-csi-ephemeral
        csi:
          driver: gcsfuse.csi.storage.gke.io
          readOnly: true
          volumeAttributes:
            bucketName: BUCKET_NAME
            mountOptions: "implicit-dirs" 
    

    Replace the following values:

    • NAMESPACE: the Kubernetes namespace where you want to deploy your Pod.
    • KSA_NAME: the name of the Kubernetes ServiceAccount you specified when configuring access to the Cloud Storage buckets.
    • BUCKET_NAME: the Cloud Storage bucket name you specified when configuring access to the Cloud Storage buckets. You can specify an underscore (_) to mount all buckets that the Kubernetes ServiceAccount can access. To learn more, see Dynamic mounting in the Cloud Storage FUSE documentation.

    The example manifest shows these required settings:

    • metadata.annotations: the annotation gke-gcsfuse/volumes: "true" is required. See Configure the sidecar container for optional annotations.
    • spec.volumes[n].csi.driver: use gcsfuse.csi.storage.gke.io as the CSI driver name.

    Optionally, you can adjust these variables:

    • spec.terminationGracePeriodSeconds: By default, this is set to 30. If you need to write large files to the Cloud Storage bucket, increase this value to make sure that Cloud Storage FUSE has enough time to flush the data after your application exits. To learn more, see Kubernetes best practices: terminating with grace.
    • spec.volumes[n].csi.volumeAttributes.mountOptions: Pass mount options to Cloud Storage FUSE. Specify the flags in one string separated by commas, without spaces, as shown in the example after this list.
    • spec.volumes[n].csi.volumeAttributes: Pass additional volume attributes to Cloud Storage FUSE.
    • spec.volumes[n].csi.readOnly: Specify true if all the volume mounts are read-only.
    • spec.containers[n].volumeMounts[m].readOnly: Specify true if only a specific volume mount is read-only.
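
    For example, the following volume definition passes several flags in a single mountOptions string. This is an illustrative sketch: the uid and gid values are placeholders, and you should check the Cloud Storage FUSE documentation for the full list of supported flags.

    volumes:
    - name: gcs-fuse-csi-ephemeral
      csi:
        driver: gcsfuse.csi.storage.gke.io
        volumeAttributes:
          bucketName: BUCKET_NAME
          mountOptions: "implicit-dirs,uid=1001,gid=3003"
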
  2. Run the following command to apply the manifest to your cluster:

    kubectl apply -f FILE_PATH
    

    Replace FILE_PATH with the path to your YAML file.
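
  3. Optionally, verify that the bucket is mounted by listing the contents of the mount path. The following command assumes the Pod name, container name, and mount path from the example manifest:

    kubectl exec -n NAMESPACE gcs-fuse-csi-example-ephemeral -c busybox -- ls /data


    Replace NAMESPACE with the namespace of your workload.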

Pod (file caching)

To attach the CSI ephemeral volume with file caching in a Pod, follow these steps:

  1. Create a cluster or node pool with Local SSD-backed ephemeral storage by following the steps in Create a cluster or node pool with Local SSD-backed ephemeral storage.

  2. Create a Pod YAML manifest with the following specification:

    apiVersion: v1
    kind: Pod
    metadata:
      name: gcs-fuse-csi-file-cache-example 
      namespace: NAMESPACE
      annotations:
        gke-gcsfuse/volumes: "true"
        gke-gcsfuse/ephemeral-storage-limit: "50Gi" 
    spec:
      nodeSelector:
        cloud.google.com/gke-ephemeral-storage-local-ssd: "true"
      restartPolicy: Never
      initContainers:
      - name: data-loader
        image: gcr.io/google.com/cloudsdktool/google-cloud-cli:slim
        resources:
          limits:
            cpu: 500m
            memory: 1Gi
          requests:
            cpu: 500m
            memory: 1Gi
        command:
          - "/bin/sh"
          - "-c"
          - |
            mkdir -p /test_files
            for i in $(seq 1 1000); do dd if=/dev/zero of=/test_files/file_$i.txt bs=1024 count=64; done
            gcloud storage cp /test_files gs://BUCKET_NAME --recursive
      containers:
      - name: data-validator
        image: busybox
        resources:
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 500m
            memory: 512Mi
        command:
          - "/bin/sh"
          - "-c"
          - |
            echo "first read with cache miss"
            time cat /data/test_files/file_* > /dev/null
    
            echo "second read from local cache"
            time cat /data/test_files/file_* > /dev/null 
        volumeMounts:
        - name: gcs-fuse-csi-ephemeral
          mountPath: /data
      serviceAccountName: KSA_NAME
      volumes:
      - name: gcs-fuse-csi-ephemeral
        csi:
          driver: gcsfuse.csi.storage.gke.io
          volumeAttributes:
            bucketName: BUCKET_NAME
            mountOptions: "implicit-dirs,file-cache:max-size-mb:-1"
    

    Replace the following values:

    • NAMESPACE: the Kubernetes namespace where you want to deploy your Pod.
    • KSA_NAME: the name of the Kubernetes ServiceAccount you specified when configuring access to the Cloud Storage buckets.
    • BUCKET_NAME: the Cloud Storage bucket name you specified when configuring access to the Cloud Storage buckets. You can specify an underscore (_) to mount all buckets that the Kubernetes ServiceAccount can access. To learn more, see Dynamic mounting in the Cloud Storage FUSE documentation.

    In the example manifest, the init container data-loader generates 1,000 files of 64 KiB each and uploads them to the Cloud Storage bucket. The main container data-validator reads all of the files from the bucket twice and logs the duration of each read.

  3. Run the following command to apply the manifest to your cluster:

    kubectl apply -f FILE_PATH
    

    Replace FILE_PATH with the path to your YAML file.

  4. To view the log output, run the following command:

    kubectl logs -n NAMESPACE gcs-fuse-csi-file-cache-example -c data-validator
    

    Replace NAMESPACE with the namespace of your workload.

    The output should look similar to the following:

    first read with cache miss
    real    0m 54.68s
    ...
    second read from local cache
    real    0m 0.38s
    ...
    

    The output shows that the second read with local cache is much faster than the first read with a cache miss.

Pod (parallel download)

To attach the CSI ephemeral volume with parallel download in a Pod, follow these steps:

  1. Create a Pod YAML manifest with the following specification:

    apiVersion: v1
    kind: Pod
    metadata:
      name: gcs-fuse-csi-example-ephemeral 
      namespace: NAMESPACE
      annotations:
        gke-gcsfuse/volumes: "true"
        gke-gcsfuse/ephemeral-storage-limit: "50Gi" 
    spec:
      containers:
      ...
      volumes:
      - name: gcs-fuse-csi-ephemeral 
        csi:
          driver: gcsfuse.csi.storage.gke.io
          volumeAttributes:
            bucketName: BUCKET_NAME
            mountOptions: "implicit-dirs,file-cache:enable-parallel-downloads:true,file-cache:max-size-mb:-1"
            fileCacheCapacity: "-1"
    

    Replace the following values:

    • NAMESPACE: the Kubernetes namespace where you want to deploy your Pod.
    • BUCKET_NAME: the Cloud Storage bucket name you specified when configuring access to the Cloud Storage buckets. You can specify an underscore (_) to mount all buckets that the Kubernetes ServiceAccount can access. To learn more, see Dynamic mounting in the Cloud Storage FUSE documentation.
  2. Run the following command to apply the manifest to your cluster:

    kubectl apply -f FILE_PATH
    

    Replace FILE_PATH with the path to your YAML file.
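
You can optionally tune parallel download behavior with additional file-cache mount options, such as the number of parallel downloads per file and the download chunk size. The option names below are an assumption based on the Cloud Storage FUSE configuration format and can vary by driver version, so verify them against the Cloud Storage FUSE documentation before relying on them:

    mountOptions: "implicit-dirs,file-cache:enable-parallel-downloads:true,file-cache:max-size-mb:-1,file-cache:parallel-downloads-per-file:16,file-cache:download-chunk-size-mb:50"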

Job

To attach the CSI ephemeral volume in a Job, follow these steps:

  1. Create a Job YAML manifest with the following specification:

    apiVersion: batch/v1
    kind: Job
    metadata:
      name: gcs-fuse-csi-job-example 
      namespace: NAMESPACE 
    spec:
      template:
        metadata: 
          annotations:
            gke-gcsfuse/volumes: "true"
        spec:
          serviceAccountName: KSA_NAME 
          containers:
          - name: writer
            image: busybox
            command:
              - "/bin/sh"
              - "-c"
              - touch /data/test && echo $(date) >> /data/test && sleep 10
            volumeMounts:
            - name: gcs-fuse-csi-ephemeral
              mountPath: /data
          - name: reader
            image: busybox
            command:
              - "/bin/sh"
              - "-c"
              - sleep 10 && cat /data/test 
            volumeMounts:
            - name: gcs-fuse-csi-ephemeral
              mountPath: /data
              readOnly: true
          volumes:
          - name: gcs-fuse-csi-ephemeral
            csi:
              driver: gcsfuse.csi.storage.gke.io
              volumeAttributes:
                bucketName: BUCKET_NAME
          restartPolicy: Never 
      backoffLimit: 1
    

    Replace the following values:

    • NAMESPACE: the Kubernetes namespace where you want to deploy your Job.
    • KSA_NAME: the name of the Kubernetes ServiceAccount you specified when configuring access to the Cloud Storage buckets.
    • BUCKET_NAME: the Cloud Storage bucket name you specified when configuring access to the Cloud Storage buckets. You can specify an underscore (_) to mount all buckets that the Kubernetes ServiceAccount can access. To learn more, see Dynamic mounting in the Cloud Storage FUSE documentation.

    The example manifest shows these required settings:

    • metadata.annotations: the annotation gke-gcsfuse/volumes: "true" is required. See Configure the sidecar container for optional annotations.
    • spec.volumes[n].csi.driver: use gcsfuse.csi.storage.gke.io as the CSI driver name.

    Optionally, you can adjust these variables:

    • spec.volumes[n].csi.volumeAttributes.mountOptions: Pass mount options to Cloud Storage FUSE. Specify the flags in one string separated by commas, without spaces.
    • spec.volumes[n].csi.volumeAttributes: Pass additional volume attributes to Cloud Storage FUSE.
    • spec.volumes[n].csi.readOnly: Specify true if all the volume mounts are read-only.
    • spec.containers[n].volumeMounts[m].readOnly: Specify true if only a specific volume mount is read-only.
  2. Run the following command to apply the manifest to your cluster:

    kubectl apply -f FILE_PATH
    

    Replace FILE_PATH with the path to your YAML file.
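
  3. Optionally, after the Job completes, check the output of the reader container to confirm that the file written by the writer container was read back from the bucket. This command assumes the Job and container names from the example manifest:

    kubectl logs -n NAMESPACE job/gcs-fuse-csi-job-example -c reader


    Replace NAMESPACE with the namespace of your workload.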

Troubleshoot issues

If you need to troubleshoot Cloud Storage FUSE issues, you can set the log-severity flag to TRACE. You set the flag in the args section of the driver's container spec within the deployment YAML. This causes the gcsfuseLoggingSeverity volume attribute to be automatically set to trace.
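
If your driver version supports the gcsfuseLoggingSeverity volume attribute, you can also enable trace-level logging for a single workload by setting the attribute directly in the volume definition. The following is a minimal sketch under that assumption:

    volumes:
    - name: gcs-fuse-csi-ephemeral
      csi:
        driver: gcsfuse.csi.storage.gke.io
        volumeAttributes:
          bucketName: BUCKET_NAME
          gcsfuseLoggingSeverity: trace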

For additional troubleshooting tips, see Troubleshooting Guide in the GitHub project documentation.

What's next