This guide shows you how to use CSI ephemeral volumes backed by your Cloud Storage buckets to automatically manage storage resources for your Kubernetes Pods or Jobs on Google Kubernetes Engine (GKE). CSI ephemeral volumes are tied to the Pod or Job lifecycle, and you don't need to manually handle PersistentVolume and PersistentVolumeClaim objects.
This guide is for Platform admins and operators who want to simplify storage management for their GKE applications.
Before reading this page, ensure you're familiar with CSI ephemeral volumes, Kubernetes Pods and Jobs, and Cloud Storage buckets.
If you are already familiar with PersistentVolumes and want consistency with your existing deployments that rely on this resource type, see Mount Cloud Storage buckets as persistent volumes.
Before you begin
Make sure you have completed these prerequisites:
- Understand the requirements and limitations of the Cloud Storage FUSE CSI driver.
- Create the Cloud Storage bucket.
- Enable the Cloud Storage FUSE CSI driver.
- Configure access to Cloud Storage buckets.
How CSI ephemeral storage for Cloud Storage buckets works
CSI ephemeral volumes simplify storage management for your applications on GKE. You define CSI ephemeral volumes directly within your Pod or Job specification. Using CSI ephemeral volumes eliminates the need for separate PersistentVolume and PersistentVolumeClaim objects.
Using a CSI ephemeral volume involves these operations:
Storage definition: You specify the storage in your Pod or Job's YAML file, including the CSI driver to use and any required parameters. For Cloud Storage FUSE CSI driver, you specify the bucket name and other relevant details.
Optionally, you can fine-tune the performance of your CSI driver by using the file caching feature. File caching can boost GKE app performance by caching frequently accessed Cloud Storage files on a faster disk.
Additionally, you can use the parallel download feature to accelerate reading large files from Cloud Storage by downloading them with multiple workers in parallel. This feature can improve model load times, especially for reads larger than 1 GB.
Driver invocation: When you create the Pod or Job, GKE detects the ephemeral volume request and calls the Cloud Storage FUSE CSI driver.
Volume mount and attachment: The CSI driver mounts the CSI ephemeral volume (which points to the underlying Cloud Storage bucket) and makes it available to your application in the Pod or Job. To fine-tune how buckets are mounted in the file system, you can use mount options. You can also use volume attributes to configure specific behavior of the Cloud Storage FUSE CSI driver.
Lifecycle management: The ephemeral volume exists for the lifetime of the Pod or Job. When the Pod is deleted or the Job completes, the CSI driver automatically unmounts the volume and handles cleanup.
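The storage definition step above boils down to a single `csi` volume stanza in the Pod spec. The following is a minimal sketch (BUCKET_NAME is a placeholder; the complete examples later in this guide add annotations and mounts):

```yaml
# Minimal sketch of a CSI ephemeral volume backed by a Cloud Storage bucket.
volumes:
- name: gcs-fuse-csi-ephemeral
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeAttributes:
      bucketName: BUCKET_NAME   # placeholder: your bucket name
```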
Attach the CSI ephemeral volume
Follow these instructions, depending on whether you want to attach the CSI ephemeral volume to a Pod or Job.
Pod
To attach the CSI ephemeral volume in a Pod, follow these steps:
Create a Pod YAML manifest with the following specification:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gcs-fuse-csi-example-ephemeral
  namespace: NAMESPACE
  annotations:
    gke-gcsfuse/volumes: "true"
spec:
  terminationGracePeriodSeconds: 60
  containers:
  - image: busybox
    name: busybox
    command: ["sleep"]
    args: ["infinity"]
    volumeMounts:
    - name: gcs-fuse-csi-ephemeral
      mountPath: /data
      readOnly: true
  serviceAccountName: KSA_NAME
  volumes:
  - name: gcs-fuse-csi-ephemeral
    csi:
      driver: gcsfuse.csi.storage.gke.io
      readOnly: true
      volumeAttributes:
        bucketName: BUCKET_NAME
        mountOptions: "implicit-dirs"
```
Replace the following values:
- NAMESPACE: the Kubernetes namespace where you want to deploy your Pod.
- KSA_NAME: the name of the Kubernetes ServiceAccount you specified when configuring access to the Cloud Storage buckets.
- BUCKET_NAME: the Cloud Storage bucket name you specified when configuring access to the Cloud Storage buckets. You can specify an underscore (_) to mount all buckets that the Kubernetes ServiceAccount can access. To learn more, see Dynamic mounting in the Cloud Storage FUSE documentation.
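As a sketch of dynamic mounting, the volume below sets the bucket name to an underscore; each bucket the ServiceAccount can access then appears as a subdirectory under the mount path (assuming access is already configured):

```yaml
# Sketch: dynamic mounting of all accessible buckets.
volumes:
- name: gcs-fuse-csi-ephemeral
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeAttributes:
      bucketName: "_"   # mounts every bucket the ServiceAccount can access
# Each bucket is visible at <mountPath>/<bucket-name> inside the container.
```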
The example manifest shows these required settings:
- metadata.annotations: the annotation gke-gcsfuse/volumes: "true" is required. See Configure the sidecar container for optional annotations.
- spec.volumes[n].csi.driver: use gcsfuse.csi.storage.gke.io as the CSI driver name.

Optionally, you can adjust these variables:
- spec.terminationGracePeriodSeconds: By default, this is set to 30. If you need to write large files to the Cloud Storage bucket, increase this value to make sure that Cloud Storage FUSE has enough time to flush the data after your application exits. To learn more, see Kubernetes best practices: terminating with grace.
- spec.volumes[n].csi.volumeAttributes.mountOptions: Pass mount options to Cloud Storage FUSE. Specify the flags in one string separated by commas, without spaces.
- spec.volumes[n].csi.volumeAttributes: Pass additional volume attributes to Cloud Storage FUSE.
- spec.volumes[n].csi.readOnly: Specify true if all the volume mounts are read-only.
- spec.containers[n].volumeMounts[m].readOnly: Specify true if only a specific volume mount is read-only.
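For example, a mountOptions string that combines several flags might look like the following sketch (the uid and gid values are hypothetical and depend on the user your container runs as):

```yaml
csi:
  driver: gcsfuse.csi.storage.gke.io
  volumeAttributes:
    bucketName: BUCKET_NAME
    # One comma-separated string, with no spaces between flags.
    mountOptions: "implicit-dirs,uid=1001,gid=3003"
```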
Run the following command to apply the manifest to your cluster:
kubectl apply -f FILE_PATH
Replace FILE_PATH with the path to your YAML file.
Pod (file caching)
To attach the CSI ephemeral volume with file caching in a Pod, follow these steps:
Create a cluster or node pool with Local SSD-backed ephemeral storage, by following the steps in Create a cluster or node pool with Local SSD-backed ephemeral storage.
Create a Pod YAML manifest with the following specification:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gcs-fuse-csi-file-cache-example
  namespace: NAMESPACE
  annotations:
    gke-gcsfuse/volumes: "true"
    gke-gcsfuse/ephemeral-storage-limit: "50Gi"
spec:
  nodeSelector:
    cloud.google.com/gke-ephemeral-storage-local-ssd: "true"
  restartPolicy: Never
  initContainers:
  - name: data-loader
    image: gcr.io/google.com/cloudsdktool/google-cloud-cli:slim
    resources:
      limits:
        cpu: 500m
        memory: 1Gi
      requests:
        cpu: 500m
        memory: 1Gi
    command:
      - "/bin/sh"
      - "-c"
      - |
        mkdir -p /test_files
        for i in $(seq 1 1000); do dd if=/dev/zero of=/test_files/file_$i.txt bs=1024 count=64; done
        gcloud storage cp /test_files gs://BUCKET_NAME --recursive
  containers:
  - name: data-validator
    image: busybox
    resources:
      limits:
        cpu: 500m
        memory: 512Mi
      requests:
        cpu: 500m
        memory: 512Mi
    command:
      - "/bin/sh"
      - "-c"
      - |
        echo "first read with cache miss"
        time cat /data/test_files/file_* > /dev/null
        echo "second read from local cache"
        time cat /data/test_files/file_* > /dev/null
    volumeMounts:
    - name: gcs-fuse-csi-ephemeral
      mountPath: /data
  serviceAccountName: KSA_NAME
  volumes:
  - name: gcs-fuse-csi-ephemeral
    csi:
      driver: gcsfuse.csi.storage.gke.io
      volumeAttributes:
        bucketName: BUCKET_NAME
        mountOptions: "implicit-dirs,file-cache:max-size-mb:-1"
```
Replace the following values:
- NAMESPACE: the Kubernetes namespace where you want to deploy your Pod.
- KSA_NAME: the name of the Kubernetes ServiceAccount you specified when configuring access to the Cloud Storage buckets.
- BUCKET_NAME: the Cloud Storage bucket name you specified when configuring access to the Cloud Storage buckets. You can specify an underscore (_) to mount all buckets that the Kubernetes ServiceAccount can access. To learn more, see Dynamic mounting in the Cloud Storage FUSE documentation.

In the example manifest, the init container data-loader generates 1,000 files of 64 KiB each and uploads them to a Cloud Storage bucket. The main container data-validator reads all the files from the bucket twice and logs the duration.
Run the following command to apply the manifest to your cluster:
kubectl apply -f FILE_PATH
Replace FILE_PATH with the path to your YAML file.
To view the log output, run the following command:
kubectl logs -n NAMESPACE gcs-fuse-csi-file-cache-example -c data-validator
Replace NAMESPACE with the namespace of your workload.
The output should look similar to the following:
```
first read with cache miss
real    0m 54.68s
...
second read from local cache
real    0m 0.38s
...
```
The output shows that the second read with local cache is much faster than the first read with a cache miss.
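The example mounts the cache with file-cache:max-size-mb:-1, which lets it grow up to the sidecar's ephemeral storage limit. If you prefer a bounded cache, you can cap it explicitly; the following fragment is a sketch with a hypothetical 8 GiB cap:

```yaml
# Sketch: cap the file cache instead of letting it grow unbounded.
volumes:
- name: gcs-fuse-csi-ephemeral
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeAttributes:
      bucketName: BUCKET_NAME
      # 8192 MiB cap; -1 means "as large as the sidecar's ephemeral storage allows".
      mountOptions: "implicit-dirs,file-cache:max-size-mb:8192"
```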
Pod (parallel download)
To attach the CSI ephemeral volume with parallel download in a Pod, follow these steps:
Create a Pod YAML manifest with the following specification:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gcs-fuse-csi-example-ephemeral
  namespace: NAMESPACE
  annotations:
    gke-gcsfuse/volumes: "true"
    gke-gcsfuse/ephemeral-storage-limit: "50Gi"
spec:
  containers:
  ...
  volumes:
  - name: gcs-fuse-csi-ephemeral
    csi:
      driver: gcsfuse.csi.storage.gke.io
      volumeAttributes:
        bucketName: BUCKET_NAME
        mountOptions: "implicit-dirs,file-cache:enable-parallel-downloads:true,file-cache:max-size-mb:-1"
        fileCacheCapacity: "-1"
```
Replace the following values:
- NAMESPACE: the Kubernetes namespace where you want to deploy your Pod.
- BUCKET_NAME: the Cloud Storage bucket name you specified when configuring access to the Cloud Storage buckets. You can specify an underscore (_) to mount all buckets that the Kubernetes ServiceAccount can access. To learn more, see Dynamic mounting in the Cloud Storage FUSE documentation.
Run the following command to apply the manifest to your cluster:
kubectl apply -f FILE_PATH
Replace FILE_PATH with the path to your YAML file.
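Cloud Storage FUSE also exposes knobs for tuning parallel downloads through mount options. The fragment below is a sketch with illustrative values; the option names and defaults depend on your Cloud Storage FUSE version, so verify them in the Cloud Storage FUSE documentation before use:

```yaml
volumeAttributes:
  bucketName: BUCKET_NAME
  fileCacheCapacity: "-1"
  # Illustrative tuning values; the defaults usually work well.
  mountOptions: "implicit-dirs,file-cache:enable-parallel-downloads:true,file-cache:max-size-mb:-1,file-cache:parallel-downloads-per-file:16,file-cache:download-chunk-size-mb:50"
```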
Job
To attach the CSI ephemeral volume in a Job, follow these steps:
Create a Job YAML manifest with the following specification:
```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: gcs-fuse-csi-job-example
  namespace: NAMESPACE
spec:
  template:
    metadata:
      annotations:
        gke-gcsfuse/volumes: "true"
    spec:
      serviceAccountName: KSA_NAME
      containers:
      - name: writer
        image: busybox
        command:
          - "/bin/sh"
          - "-c"
          - touch /data/test && echo $(date) >> /data/test && sleep 10
        volumeMounts:
        - name: gcs-fuse-csi-ephemeral
          mountPath: /data
      - name: reader
        image: busybox
        command:
          - "/bin/sh"
          - "-c"
          - sleep 10 && cat /data/test
        volumeMounts:
        - name: gcs-fuse-csi-ephemeral
          mountPath: /data
          readOnly: true
      volumes:
      - name: gcs-fuse-csi-ephemeral
        csi:
          driver: gcsfuse.csi.storage.gke.io
          volumeAttributes:
            bucketName: BUCKET_NAME
      restartPolicy: Never
  backoffLimit: 1
```
Replace the following values:
- NAMESPACE: the Kubernetes namespace where you deploy your Pod.
- KSA_NAME: the name of the Kubernetes ServiceAccount you specified when configuring access to the Cloud Storage buckets.
- BUCKET_NAME: the Cloud Storage bucket name you specified when configuring access to the Cloud Storage buckets. You can specify an underscore (_) to mount all buckets that the Kubernetes ServiceAccount can access. To learn more, see Dynamic mounting in the Cloud Storage FUSE documentation.
The example manifest shows these required settings:
- metadata.annotations: the annotation gke-gcsfuse/volumes: "true" is required. See Configure the sidecar container for optional annotations.
- spec.volumes[n].csi.driver: use gcsfuse.csi.storage.gke.io as the CSI driver name.

Optionally, you can adjust these variables:
- spec.volumes[n].csi.volumeAttributes.mountOptions: Pass mount options to Cloud Storage FUSE. Specify the flags in one string separated by commas, without spaces.
- spec.volumes[n].csi.volumeAttributes: Pass additional volume attributes to Cloud Storage FUSE.
- spec.volumes[n].csi.readOnly: Specify true if all the volume mounts are read-only.
- spec.containers[n].volumeMounts[m].readOnly: Specify true if only a specific volume mount is read-only.
Run the following command to apply the manifest to your cluster:
kubectl apply -f FILE_PATH
Replace FILE_PATH with the path to your YAML file.
Troubleshoot issues
If you need to troubleshoot Cloud Storage FUSE issues, you can set the log-severity flag to TRACE. You set the flag in the args section of the driver's container spec within the deployment YAML. This causes the gcsfuseLoggingSeverity volume attribute to be automatically set to trace.
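For the CSI ephemeral volumes shown in this guide, you can also set the logging severity per volume with the gcsfuseLoggingSeverity volume attribute. The following is a sketch:

```yaml
volumes:
- name: gcs-fuse-csi-ephemeral
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeAttributes:
      bucketName: BUCKET_NAME
      gcsfuseLoggingSeverity: trace   # emit verbose Cloud Storage FUSE logs
```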
For additional troubleshooting tips, see Troubleshooting Guide in the GitHub project documentation.
What's next
- Learn how to optimize performance for the Cloud Storage FUSE CSI driver.
- Explore additional samples for using the CSI driver on GitHub.
- Learn more about Cloud Storage FUSE.