This page provides an overview of Cloud Storage FUSE, a FUSE adapter that lets you mount and access Cloud Storage buckets as local file systems, so applications can read and write objects in your bucket using standard file system semantics.
This documentation always reflects the latest version of Cloud Storage FUSE. For details on the latest version, see Cloud Storage FUSE releases on GitHub.
Overview
Cloud Storage FUSE is an open source product supported by Google. Cloud Storage FUSE uses FUSE and Cloud Storage APIs to transparently expose buckets as locally mounted folders on your file system.
Cloud Storage FUSE is integrated with other Google Cloud services. For example, the Cloud Storage FUSE CSI driver lets you use the Google Kubernetes Engine (GKE) API to consume buckets as volumes, so you can read from and write to Cloud Storage from within your Kubernetes pods. For more information on other integrations, see Integrations.
How Cloud Storage FUSE works
Cloud Storage FUSE works by translating object storage names into a directory-like structure, interpreting the slash character (/) in object names as a directory separator. Objects with the same common prefix are treated as files in the same directory, allowing applications to interact with the mounted bucket like a file system. Objects can also be organized into a logical file system structure using hierarchical namespace, which lets you organize objects into folders.
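The prefix-to-directory translation described above can be sketched in a few lines of Python. This is a simplified illustration, not the Cloud Storage FUSE implementation; the function name `list_directory` and the sample object names are hypothetical.

```python
def list_directory(object_names, directory=""):
    """Infer the immediate children of `directory` from flat object names.

    Returns (subdirectories, files), treating "/" as the directory separator.
    """
    prefix = directory
    if prefix and not prefix.endswith("/"):
        prefix += "/"

    subdirs, files = set(), set()
    for name in object_names:
        if not name.startswith(prefix):
            continue
        rest = name[len(prefix):]
        if not rest:
            continue  # the name equals the prefix itself (a directory placeholder)
        head, sep, _ = rest.partition("/")
        if sep:
            subdirs.add(head + "/")  # more path segments follow: a subdirectory
        else:
            files.add(head)          # no further separator: a file in this directory
    return sorted(subdirs), sorted(files)


# Hypothetical object names in a bucket.
objects = ["a.txt", "logs/2024/app.log", "logs/readme.md", "models/ckpt-1"]

print(list_directory(objects))          # (['logs/', 'models/'], ['a.txt'])
print(list_directory(objects, "logs"))  # (['2024/'], ['readme.md'])
```

Objects that share the prefix `logs/` appear as entries of the same directory, which is the behavior the paragraph above describes.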
Cloud Storage FUSE can be run from anywhere with connectivity to Cloud Storage, including Google Kubernetes Engine, Compute Engine VMs, or on-premises systems.
Use cases for Cloud Storage FUSE
Cloud Storage FUSE is ideal for use cases where Cloud Storage has the right performance and scalability characteristics for an application that requires file system semantics. For example, Cloud Storage FUSE is useful for machine learning (ML) projects because it provides a way to store data, models, checkpoints, and logs directly in Cloud Storage. For more information, see Cloud Storage FUSE for ML workloads.
Cloud Storage FUSE for machine learning
Cloud Storage FUSE is a common choice for developers looking to store and access ML training and model data as objects in Cloud Storage. Cloud Storage FUSE provides several benefits for developing ML projects:
- Cloud Storage FUSE lets you mount Cloud Storage buckets as a local file system so your applications can access training and model data using standard file system semantics. This means that you can avoid the cost of rewriting or refactoring your application's code when using Cloud Storage to store ML data.
- From training to inference, Cloud Storage FUSE lets you use the built-in high scalability, performance, and cost effectiveness of Cloud Storage, so you can run your ML workloads at scale.
- Cloud Storage FUSE lets you start training jobs quickly by providing compute resources with direct access to data in Cloud Storage, so you don't need to download training data to the compute resource.
For more information, see Frameworks, operating systems, and architectures supported by Cloud Storage FUSE.
Frameworks, operating systems, and architectures
Cloud Storage FUSE has been validated with the following frameworks:
- TensorFlow V2.x
- TensorFlow V1.x
- PyTorch V2.x
- PyTorch V1.x
- JAX 0.4.x
Cloud Storage FUSE supports the following operating systems and architectures:
Operating systems:
- Rocky Linux 8.9 or later
- Ubuntu 18.04 or later
- Debian 10 or later
- CentOS 7.9 or later
- RHEL 7.9 or later
- SLES 15 or later
Architectures:
- x86_64
- ARM64
Cloud Storage FUSE integrations with Google Cloud products
Cloud Storage FUSE integrates with the following Google Cloud products:
Product | How Cloud Storage FUSE is integrated |
---|---|
Google Kubernetes Engine (GKE) | The Cloud Storage FUSE CSI driver manages the integration of Cloud Storage FUSE with the Kubernetes API to consume Cloud Storage buckets as volumes. You can use the Cloud Storage FUSE CSI driver to mount buckets as file systems on Google Kubernetes Engine nodes. |
Vertex AI training | You can access data from a Cloud Storage bucket as a mounted file system when you perform custom training on Vertex AI. For more information, see Prepare training code. |
Vertex AI Workbench | Vertex AI Workbench instances include a Cloud Storage integration that lets you browse buckets and work with compatible files located in Cloud Storage from within the JupyterLab interface. The Cloud Storage integration lets you access all of the Cloud Storage buckets and files that your instance has access to within the same project as your Vertex AI Workbench instance. To set up the integration, see Vertex AI Workbench instructions for how to access Cloud Storage buckets and files in JupyterLab. |
Deep Learning VM Images | Cloud Storage FUSE comes pre-installed with Deep Learning VM Images. |
Deep Learning Containers | To mount Cloud Storage buckets for Deep Learning Containers, you can either use the Cloud Storage FUSE CSI driver (recommended) or install Cloud Storage FUSE. |
Batch | Cloud Storage FUSE lets you mount Cloud Storage buckets as storage volumes when you create and run Batch jobs. You can specify a bucket in a job's definition, and the bucket gets automatically mounted to the VMs for the job when the job runs. |
Cloud Run | Cloud Run lets you mount a Cloud Storage bucket as a volume and presents the bucket content as files in the container file system. To set up volume mounting, see Mount a Cloud Storage volume. |
Cloud Composer | When you create an environment, Cloud Composer stores the source code for your workflows and their dependencies in specific folders in a Cloud Storage bucket. Cloud Composer uses Cloud Storage FUSE to map the folders in the bucket to the Airflow components in the Cloud Composer environment. |
For a list of Google Cloud products that are integrated with Cloud Storage generally, see Integration with Google Cloud services and tools.
Caching
Cloud Storage FUSE offers four types of caching to help increase performance and reduce cost: file caching, stat caching, type caching, and list caching. For more information about these caches, see Overview of caching.
Directory semantics
Cloud Storage offers buckets with a flat namespace and buckets with hierarchical namespace enabled. By default, Cloud Storage FUSE can infer explicitly-defined directories, also known as folders, in buckets with hierarchical namespace enabled, but it can't infer implicitly-defined directories in buckets with a flat namespace. Implicitly-defined directories include simulated folders and managed folders.
For example, say you mount a bucket named my-bucket, which contains an object named my-directory/my-object.txt, where my-directory/ is a simulated folder. When you run ls on the bucket mount point, by default, Cloud Storage FUSE can access neither the simulated directory my-bucket/my-directory/ nor the object my-object.txt within it. To enable Cloud Storage FUSE to infer the simulated folder and the object within it, include the --implicit-dirs option as part of your gcsfuse mount command when mounting a flat namespace bucket. For more information about the --implicit-dirs option, see the Cloud Storage FUSE command-line documentation.
If you need to store and access your data using a file system, use buckets with hierarchical namespace enabled. To learn how to create such buckets, see Create buckets with hierarchical namespace enabled.
For more information about directory semantics, including how to mount buckets with implicitly-defined directories, see Files and Directories in the GitHub documentation.
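The difference between explicitly-defined and implicitly-defined directories in a flat namespace bucket can be modeled roughly as follows. This sketch assumes that an explicit directory is represented by a placeholder object whose name ends in a slash (as the `Objects.insert("subdir/")` call for `mkdir` suggests) and that an implicit directory exists only as a prefix of other object names. The function name and the `explicit-dir/` objects are hypothetical, and the model ignores caching and hierarchical namespace buckets.

```python
def visible_directories(object_names, implicit_dirs=False):
    """Return the directories a mount would show under this simplified model."""
    # Explicit directories: placeholder objects whose names end with "/".
    explicit = {name for name in object_names if name.endswith("/")}
    if not implicit_dirs:
        return sorted(explicit)

    # With implicit directories enabled, also infer every parent prefix.
    inferred = set()
    for name in object_names:
        parents = name.split("/")[:-1]
        for depth in range(1, len(parents) + 1):
            inferred.add("/".join(parents[:depth]) + "/")
    return sorted(explicit | inferred)


objects = [
    "my-directory/my-object.txt",  # my-directory/ is only a simulated folder
    "explicit-dir/",               # placeholder object for an explicit directory
    "explicit-dir/file.txt",
]

print(visible_directories(objects))                      # ['explicit-dir/']
print(visible_directories(objects, implicit_dirs=True))  # ['explicit-dir/', 'my-directory/']
```

In this model, `my-directory/` becomes visible only when implicit directories are enabled, which mirrors the effect of the --implicit-dirs option described above.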
Retry strategy
By default, failed requests from Cloud Storage FUSE to Cloud Storage are retried with exponential backoff up to a specified maximum backoff duration, which has a value of 30s (30 seconds) by default. Once the backoff duration exceeds the specified maximum duration, the retry continues with the specified maximum duration. You can use the --max-retry-sleep option as part of a gcsfuse call to specify the maximum backoff duration.
For more information on the --max-retry-sleep option, see the gcsfuse command-line documentation.
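The capped exponential backoff described above can be illustrated with a short Python sketch. Only the 30-second cap comes from this page; the initial delay of 1 second and the multiplier of 2 are assumptions chosen for illustration, and real implementations commonly add random jitter, which is omitted here.

```python
def backoff_schedule(attempts, initial_s=1.0, multiplier=2.0, max_sleep_s=30.0):
    """Return the sleep durations (in seconds) before each retry attempt.

    initial_s and multiplier are illustrative assumptions; max_sleep_s
    models the maximum backoff duration (30 seconds by default).
    """
    delays = []
    delay = initial_s
    for _ in range(attempts):
        # Once the computed delay exceeds the cap, keep retrying at the cap.
        delays.append(min(delay, max_sleep_s))
        delay *= multiplier
    return delays


print(backoff_schedule(8))
# [1.0, 2.0, 4.0, 8.0, 16.0, 30.0, 30.0, 30.0]
```

Lowering `max_sleep_s` in this model shortens the longest wait between retries, which is the role the --max-retry-sleep option plays.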
Retry strategy for stalled uploads
Large file writes are uploaded in chunks. To help reduce tail end write latencies, if a chunk-level write operation stalls or fails, Cloud Storage FUSE attempts a retry after 10 seconds. A maximum of four retry operations are performed for each stalled chunk.
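One way to read the stalled-upload behavior above is as a bounded retry loop around each chunk. The sketch below assumes that "after 10 seconds" means a chunk write is abandoned and retried once it has stalled for 10 seconds; `attempt_upload` is a hypothetical callable standing in for the chunk write, not part of any Cloud Storage FUSE API.

```python
def upload_chunk_with_retries(attempt_upload, stall_timeout_s=10.0, max_retries=4):
    """Try to upload one chunk, retrying stalled or failed attempts.

    attempt_upload(timeout_s) is a hypothetical callable that returns True
    on success and False if the write stalled past timeout_s or failed.
    Returns (succeeded, attempts_made).
    """
    attempts = 0
    # One initial attempt plus at most max_retries retries.
    for _ in range(1 + max_retries):
        attempts += 1
        if attempt_upload(stall_timeout_s):
            return True, attempts
    return False, attempts


# Simulated chunk write that stalls twice and then succeeds.
outcomes = iter([False, False, True])
print(upload_chunk_with_retries(lambda timeout_s: next(outcomes)))  # (True, 3)
```

With four retries allowed, a chunk that never succeeds is attempted five times in total under this interpretation.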
Cloud Storage FUSE operations associated with Cloud Storage operations
When you perform an operation using Cloud Storage FUSE, you also perform the Cloud Storage operations associated with the Cloud Storage FUSE operation. The following table describes common Cloud Storage FUSE commands and their associated Cloud Storage JSON API operations. You can display information about the Cloud Storage FUSE operations by setting the --log-severity flag to TRACE in your gcsfuse command.
Command | JSON API Operations |
---|---|
gcsfuse --log-severity=TRACE example-bucket mp | Objects.list (to check credentials) |
cd mp | n/a |
ls mp | Objects.list("") |
mkdir subdir | Objects.get("subdir"); Objects.get("subdir/"); Objects.insert("subdir/") |
cp ~/local.txt subdir/ | Objects.get("subdir/local.txt"); Objects.get("subdir/local.txt/"); Objects.insert("subdir/local.txt"), to create an empty object; Objects.insert("subdir/local.txt"), when closing after done writing |
rm -rf subdir | Objects.list("subdir"); Objects.list("subdir/"); Objects.delete("subdir/local.txt"); Objects.list("subdir/"); Objects.delete("subdir/") |
Pricing for Cloud Storage FUSE
Cloud Storage FUSE is available free of charge, but the storage, metadata, and network I/O it generates to and from Cloud Storage are charged like any other Cloud Storage interface. In other words, all data transfer and operations performed by Cloud Storage FUSE map to Cloud Storage transfers and operations, and are charged accordingly. For more information on common Cloud Storage FUSE operations and how they map to Cloud Storage operations, see the operations mapping.
To avoid surprises, you should estimate how your use of Cloud Storage FUSE translates to Cloud Storage charges. For example, if you are using Cloud Storage FUSE to store log files, you can incur charges quickly if logs are aggressively flushed on hundreds or thousands of machines at the same time.
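A back-of-the-envelope estimate for the log-flushing example above might look like the following. It assumes that each flush results in one object write operation, and it takes the price per 1,000 operations as an input rather than hard-coding one. The function names and the example numbers are hypothetical; look up current rates on the Cloud Storage pricing page.

```python
def estimate_write_operations(machines, flushes_per_minute, minutes):
    """Estimate write operations, assuming one operation per flush."""
    return machines * flushes_per_minute * minutes


def estimate_operation_cost(operations, price_per_1000_ops):
    """Convert an operation count to a charge; the price is caller-supplied."""
    return operations / 1000 * price_per_1000_ops


# Hypothetical fleet: 1,000 machines flushing logs 6 times a minute for one day.
ops = estimate_write_operations(machines=1000, flushes_per_minute=6, minutes=24 * 60)
print(ops)  # 8640000

# placeholder_price is NOT a real price; substitute the published rate.
placeholder_price = 0.01
print(estimate_operation_cost(ops, placeholder_price))
```

Because Cloud Storage FUSE writes whole objects, each of those operations can also re-upload the full log file, so data transfer grows with file size as well as with flush frequency.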
See Cloud Storage Pricing for information on charges such as storage, network usage, and operations.
Limitations
While Cloud Storage FUSE has a file system interface, it is not like an NFS or CIFS file system on the backend. Additionally, Cloud Storage FUSE is not POSIX compliant. For a POSIX file system product in Google Cloud, see Filestore.
When using Cloud Storage FUSE, be aware of its limitations and semantics, which differ from those of POSIX file systems. Cloud Storage FUSE should only be used within its capabilities.
Limitations and differences from POSIX file systems
The following list describes the limitations of Cloud Storage FUSE:
- Metadata: Cloud Storage FUSE does not transfer object metadata when uploading files to Cloud Storage, with the exception of mtime and symlink targets. This means that you cannot set object metadata when you upload files using Cloud Storage FUSE. If you need to preserve object metadata, consider uploading files using the Google Cloud CLI, the JSON API, or the Google Cloud console.
- Concurrency: Cloud Storage FUSE does not provide concurrency control for multiple writes to the same file. When multiple writes try to replace a file, the last write wins and all previous writes are lost. There is no merging, version control, or user notification of the subsequent overwrite.
- Linking: Cloud Storage FUSE does not support hard links.
- File locking and file patching: Cloud Storage FUSE does not support file locking or file patching. As such, you shouldn't store version control system repositories in Cloud Storage FUSE mount points, as version control systems rely on file locking and patching. Additionally, you shouldn't use Cloud Storage FUSE as a file replacement.
- Semantics: Semantics in Cloud Storage FUSE are different from semantics in a conventional file system. For example, metadata like last access time are not supported, and some metadata operations like directory renaming are not atomic unless you use buckets with hierarchical namespace enabled. For a list of differences between Cloud Storage FUSE semantics and conventional file system semantics, see Semantics in the Cloud Storage FUSE GitHub documentation. To learn about how Cloud Storage FUSE infers directories in Cloud Storage, see directory semantics.
- Workloads that do file patching (or overwrites in place): Cloud Storage FUSE can only write whole objects at a time to Cloud Storage and does not provide a mechanism for patching. If you try to patch a file, Cloud Storage FUSE reuploads the entire file. The only exception to this behavior is that you can append content to the end of a file that's 2 MB or larger, in which case Cloud Storage FUSE reuploads only the appended content.
- Access: Authorization for files is governed by Cloud Storage permissions. POSIX-style access control does not work.
- Performance: Cloud Storage FUSE has much higher latency than a local file system, and as such, shouldn't be used as the backend for storing a database. Throughput may be reduced when reading or writing one small file at a time. Using larger files or transferring multiple files at a time will help increase throughput.
- Availability: Transient errors can sometimes occur when you use Cloud Storage FUSE to access Cloud Storage. It's recommended that you retry failed operations using retry strategies.
- Object versioning: Cloud Storage FUSE does not formally support usage with buckets that have object versioning enabled. Attempting to use Cloud Storage FUSE with buckets that have object versioning enabled can produce unpredictable behavior.
- File transcoding:
  - Objects with content-encoding: gzip in metadata: Any such object in a Cloud Storage FUSE-mounted directory does not undergo decompressive transcoding. Instead, the object remains compressed in the same manner that it's stored in the bucket.
    For example, a file of 1000 bytes, uploaded to a bucket using the gcloud storage cp command with the --gzip-local flag, might become 60 bytes (the actual compressed size depends on the content and the gzip implementation used by the gcloud CLI) as a Cloud Storage object. If the bucket is mounted using gcsfuse, and the corresponding file is listed or read from the mount directory, its size is returned as 60 bytes, and its contents are a compressed version of the original 1000 bytes content.
    This is in contrast to a download using gcloud storage cp gs://bucket/path /local/path, which undergoes decompressive transcoding: in the gcloud command, the content is auto-decompressed during the download, and the original, uncompressed content is served.
- Retention policies: Cloud Storage FUSE does not support writing to buckets with a retention policy. If you attempt to write to a bucket with a retention policy, your writes will fail.
  Cloud Storage FUSE supports reading objects from buckets with a retention policy, but the bucket must be mounted as Read-Only by passing the -o RO flag during bucket mounting.
- Local storage: Objects that are new or modified are stored in their entirety in a local temporary file until they are closed or synced. When working with large files, make sure you have enough local storage capacity for temporary copies of the files, particularly if you are working with Compute Engine instances. For more information, see the README in the Cloud Storage FUSE GitHub documentation.
- File handle limits: The Linux kernel has a default limit of 1,024 open file handles. When using Cloud Storage FUSE as a server to handle multiple concurrent connections, you might exceed this limit. To avoid issues, ensure the number of concurrent connections to a single host remains under the limit, and consider increasing the limit. Scenarios where this is important include using a Cloud Storage FUSE mount to serve web content, host network-attached storage (NAS), or host a file transfer protocol (FTP) server. When serving web content on Cloud Run from a Cloud Storage FUSE mount, the maximum concurrent requests per instance is restricted to less than 1,000.
- rsync limitations: Cloud Storage FUSE's file system latency affects rsync, which reads and writes only one file at a time. To transfer multiple files to or from your bucket in parallel, use the Google Cloud CLI by running gcloud storage rsync. For more information, see the rsync documentation.
- List operations limitations: When you list all the objects in a mounted bucket, for example, by running ls, Cloud Storage FUSE calls the Objects: list API on Cloud Storage. The API paginates results, which means that Cloud Storage FUSE might need to issue multiple calls, depending on how many objects are in your bucket, which can make a list operation expensive and slow.
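To make the list operations limitation above concrete, the number of paginated list calls grows linearly with the number of objects. The sketch below assumes a page size of 1,000 results per call, which is an assumption for illustration; the function name is hypothetical and the estimate ignores caching.

```python
import math


def list_calls_needed(object_count, page_size=1000):
    """Estimate how many paginated list calls one full listing requires.

    page_size is an assumed number of results returned per call.
    """
    # Even an empty listing needs one call to learn that it is empty.
    return max(1, math.ceil(object_count / page_size))


print(list_calls_needed(500))        # 1
print(list_calls_needed(1001))       # 2
print(list_calls_needed(2_500_000))  # 2500
```

Under this assumption, listing a directory that holds 2.5 million objects takes thousands of sequential API calls, each of which is charged and adds latency.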
Known issues
For a list of known issues in Cloud Storage FUSE, refer to GitHub.
Get support
You can get support, submit general questions, and request new features by using one of Google Cloud's official support channels. You can also get support by filing issues in GitHub.
For solutions to commonly-encountered issues, see Troubleshooting in the Cloud Storage FUSE GitHub documentation.
What's next
- Learn how to install the gcsfuse CLI.
- Discover Cloud Storage FUSE by completing a quickstart.
- Learn how to mount buckets.
- Learn how to configure the behavior of Cloud Storage FUSE, using the gcsfuse command-line tool or a configuration file.