Use Cloud Storage FUSE file caching

Cloud Storage FUSE file caching is a client-side read cache that improves read performance by serving repeat file reads from faster cache storage of your choice. When file caching is enabled, Cloud Storage FUSE stores copies of frequently accessed files locally, allowing subsequent reads to be served directly from the cache, which reduces latency and improves throughput.

Benefits of file caching

File caching provides the following benefits:

  • Improves performance for small and random I/Os: file caching improves latency and throughput by serving reads directly from the cache media. Small and random I/O operations can be significantly faster when served from the cache.

  • Leverages parallel downloads: parallel downloads are enabled automatically on Cloud Storage FUSE versions 2.12 and later when the file cache is enabled. Parallel downloads utilize multiple workers to download a file in parallel using the file cache directory as a prefetch buffer, which can result in up to nine times faster model load time. We recommend that you use parallel downloads for single-threaded read scenarios that load large files such as model serving and checkpoint restores.

  • Uses existing capacity: file caching can use existing provisioned machine capacity for your cache directory without incurring charges for additional storage. This includes Local SSDs bundled with Cloud GPU machine types such as a2-ultragpu and a3-highgpu, Persistent Disk (the boot disk used by each VM), or in-memory tmpfs.

  • Reduces charges: cache hits are served locally and don't incur Cloud Storage operation or network charges.

  • Improves total cost of ownership for AI and ML training: file caching increases Cloud GPUs and Cloud TPU utilization by loading data faster, which reduces training time and improves price-performance for artificial intelligence and machine learning (AI/ML) training workloads.

Parallel downloads

Parallel downloads can improve read performance by using multiple workers to download multiple parts of a file in parallel using the file cache directory as a prefetch buffer. We recommend using parallel downloads for read scenarios that load large files such as model serving, checkpoint restores, and training on large objects.

Use cases for enabling file caching with parallel downloads include the following:

  • Training: enable file caching if the data you want to access is read multiple times, whether the same file is read repeatedly or different offsets of the same file are read. If the dataset is larger than the file cache, leave the file cache disabled and instead use one of the following methods:

  • Serving model weights and checkpoint reads: enable file caching with parallel downloads, which loads large files much faster than when file caching and parallel downloads aren't used.
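As an illustration, a Cloud Storage FUSE configuration file for a serving workload might look like the following sketch. The cache directory path is an example; the field names correspond to the file-cache options described later on this page.

```yaml
# Example Cloud Storage FUSE config file for a model-serving workload.
# The cache-dir path is illustrative; point it at fast local storage.
cache-dir: /mnt/local-ssd/gcsfuse-cache
file-cache:
  max-size-mb: -1                  # use all available capacity in cache-dir
  enable-parallel-downloads: true  # speed up loading of large files
```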

Considerations

The following sections provide important considerations for using file caching.

File size and available capacity

The file being read must fit within the available capacity of the file cache directory, which you can control using either the --file-cache-max-size-mb CLI option or the file-cache:max-size-mb configuration file field.
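For example, to cap the cache at roughly 50 GiB you could set the configuration file field as follows. The value is in MiB and the path is illustrative.

```yaml
cache-dir: /mnt/local-ssd/gcsfuse-cache
file-cache:
  max-size-mb: 51200   # 50 GiB expressed in MiB; a file that doesn't fit
                       # within the remaining capacity isn't cached
```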

Random and partial read management

If a file's first read operation starts from the beginning of the file, at offset 0, the Cloud Storage FUSE file cache ingests the entire file into the cache, even if you're only reading a small subset of it. This lets subsequent random or partial reads of the same object be served directly from the cache.

If a file's first read operation starts from anywhere other than offset 0, Cloud Storage FUSE, by default, doesn't trigger an asynchronous full file fetch. To change this behavior so that Cloud Storage FUSE ingests a file to the cache upon an initial random read, use one of the following methods to set the behavior to true:

We recommend that you enable this property if many different random or partial read operations are performed on the same object.
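A minimal sketch of this setting in a configuration file, assuming the cache-file-for-range-read field (which corresponds to the --file-cache-cache-file-for-range-read CLI option):

```yaml
file-cache:
  cache-file-for-range-read: true  # ingest the entire file into the cache
                                   # even when the first read isn't at offset 0
```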

Cache eviction

The eviction of cached metadata and data is based on a least recently used (LRU) algorithm that begins once the space threshold configured by the --file-cache-max-size-mb option is reached. If a cached entry expires based on its TTL, a GET metadata call is first made to Cloud Storage, which is subject to network latency. Because data and metadata are managed separately, one might be evicted or invalidated while the other is not.

Cache persistence

Cloud Storage FUSE caches aren't persisted across unmounts and restarts. For file caching, while the metadata entries needed to serve files from the cache are evicted on unmounts and restarts, data in the file cache might still be present in the cache directory. We recommend that you delete data in the file cache directory after unmounts or restarts.
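For example, after unmounting you might clear leftover cached data like this. The mount point and cache directory paths are illustrative; adjust them to your environment.

```shell
#!/bin/sh
# Illustrative cleanup after unmounting a Cloud Storage FUSE bucket.
MOUNT_POINT=/tmp/my-bucket           # example mount point
CACHE_DIR=/tmp/gcsfuse-file-cache    # example cache directory

fusermount -u "$MOUNT_POINT" 2>/dev/null || true  # unmount, if mounted
rm -rf "${CACHE_DIR:?}"              # delete leftover cached file data
mkdir -p "$CACHE_DIR"                # recreate an empty cache directory
```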

Data security

When you enable caching, Cloud Storage FUSE uses the cache directory you specify using one of the following methods as the underlying directory where files from your Cloud Storage bucket are persisted in an unencrypted format:

Any user or process that has access to this cache directory can access these files. We recommend restricting access to this directory.
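For example, on Linux you can restrict an example cache directory to its owning user. The path is illustrative.

```shell
#!/bin/sh
# Restrict an example cache directory so that only its owner can read
# the cached files.
CACHE_DIR=/tmp/gcsfuse-file-cache
mkdir -p "$CACHE_DIR"
chmod 700 "$CACHE_DIR"   # read, write, and execute for the owner only
```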

Direct or multiple access to the file cache

Using a process other than Cloud Storage FUSE to access or modify a file in the cache directory can lead to data corruption. Cloud Storage FUSE caches are specific to each Cloud Storage FUSE running process with no awareness across different Cloud Storage FUSE processes running on the same or different machines. Therefore, we don't recommend using the same cache directory for different Cloud Storage FUSE processes.

If multiple Cloud Storage FUSE processes need to run on the same machine, each Cloud Storage FUSE process should get its own cache directory, or use one of the following methods to ensure that your data doesn't get corrupted:

  • Mount all buckets with a shared cache: use dynamic mounting to mount all buckets you have access to in a single process with a shared cache. To learn more, see Cloud Storage FUSE dynamic mounting.

  • Enable caching on a specific bucket: enable caching on only a specified bucket using static mounting. To learn more, see Cloud Storage FUSE static mounting.

  • Cache only a specific folder or directory: mount and cache only a specific bucket-level folder instead of mounting an entire bucket. To learn more, see Mount a directory within a bucket.
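As an illustrative sketch, two Cloud Storage FUSE processes on the same machine would each point at a distinct cache directory. The filenames and paths are examples.

```yaml
# process-a.yaml: config for the first Cloud Storage FUSE process
cache-dir: /mnt/local-ssd/cache-process-a
---
# process-b.yaml: a separate config, and cache directory, for the second
cache-dir: /mnt/local-ssd/cache-process-b
```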

Before you begin

The file cache requires a directory path to be used to cache files. You can create a new directory on an existing file system or create a new file system on provisioned storage. If you are provisioning new storage to be used, use the following instructions to create a new file system:

  • For Google Cloud Hyperdisk, see Create a new Google Cloud Hyperdisk volume.

  • For Persistent Disk, see Create a new Persistent Disk volume.

  • For Local SSDs, see Add a Local SSD to your VM.

  • For in-memory RAM disks, see Creating in-memory RAM disks.

Enable and configure file caching behavior

Enable and configure file caching using one of the following options:

  1. Specify the cache directory you want to use with one of the following methods. This lets you enable the file cache for non-Google Kubernetes Engine deployments:

    If you're using a Google Kubernetes Engine deployment using the Cloud Storage FUSE CSI driver for Google Kubernetes Engine, specify one of the following options:

  2. Optional: enable parallel downloads by setting one of the following options to true if parallel downloads weren't enabled automatically:

  3. Limit the total capacity the Cloud Storage FUSE cache can use within its mounted directory by adjusting one of the following options, which is automatically set to a value of -1 when you specify a cache directory:

    You can also specify a value in MiB or GiB to limit the cache size.

  4. Optional: bypass the TTL expiration of cached entries and serve file metadata from the cache if it's available using one of the following methods and setting a value of -1:

    The default is 60 seconds, and a value of -1 sets it to unlimited. You can also specify a high value based on your requirements. We recommend that you set the ttl-secs value to as high as your workload lets you. For more information about setting a TTL for cached entries, see Time to live.

  5. Optional: enable the file cache's ability to asynchronously load the entire file into the cache if the file's first read operation starts from anywhere other than offset 0 so that subsequent reads of different offsets from the same file can also be served from the cache. Use one of the following methods and set the option to true:

  6. Optional: configure stat caching and type caching. To learn more about stat and type caches, see Overview of type caching or Overview of stat caching.

  7. Manually run the ls -R command on your mounted bucket before you run your workload to pre-populate metadata, which lets the type cache populate ahead of the first read in a faster, batched way. For more information about how to improve first-time read performance, see Improve first-time reads.
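Put together, a configuration file covering the optional steps above might look like the following sketch. Paths and values are examples, and the field names are assumed to correspond to the CLI options named elsewhere on this page.

```yaml
cache-dir: /mnt/local-ssd/gcsfuse-cache   # step 1: the cache directory
file-cache:
  max-size-mb: -1                  # step 3: -1 uses all available capacity
  enable-parallel-downloads: true  # step 2: explicit on versions before 2.12
  cache-file-for-range-read: true  # step 5: ingest the whole file on an
                                   # initial random read
metadata-cache:
  ttl-secs: -1                     # step 4: -1 bypasses TTL expiration
```

You would then mount the bucket with this file, for example with gcsfuse --config-file=config.yaml BUCKET MOUNT_POINT, and run ls -R on the mount point before starting your workload (step 7).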

Once you enable file caching, parallel downloads are enabled automatically on Cloud Storage FUSE versions 2.12 and later. If you're using an older version of Cloud Storage FUSE, set the enable-parallel-downloads option to true to enable parallel downloads.

Configure supporting properties for parallel downloads

You can optionally configure the following supporting properties for parallel downloads using the Cloud Storage FUSE CLI or a Cloud Storage FUSE configuration file:

  • Parallel downloads per file: the maximum number of workers that can be spawned per file to download the object from Cloud Storage into the file cache. CLI option: --file-cache-parallel-downloads-per-file. Configuration file field: file-cache:parallel-downloads-per-file.

  • Maximum parallel downloads: the maximum number of workers that can be spawned at any given time across all file download jobs. The default is twice the number of CPU cores on your machine. To specify no limit, enter a value of -1. CLI option: --file-cache-max-parallel-downloads. Configuration file field: file-cache:max-parallel-downloads.

  • Download chunk size: the size, in MiB, of each read request that each worker makes to Cloud Storage when downloading the object into the file cache. Note that a parallel download is triggered only if the file being read is larger than the specified chunk size. CLI option: --file-cache-download-chunk-size-mb. Configuration file field: file-cache:download-chunk-size-mb.
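For example, a configuration file tuning these supporting properties might look like the following sketch. The numeric values are examples, not recommendations.

```yaml
file-cache:
  enable-parallel-downloads: true
  parallel-downloads-per-file: 16  # example: up to 16 workers per file
  max-parallel-downloads: -1       # example: no global limit on workers
  download-chunk-size-mb: 50       # example: 50 MiB read requests
```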

Disable parallel downloads

To disable parallel downloads, set one of the following options to false:

  • --file-cache-enable-parallel-downloads CLI option
  • file-cache:enable-parallel-downloads field
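For example, in a configuration file:

```yaml
file-cache:
  enable-parallel-downloads: false  # turn off parallel downloads
```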

What's next