List caching in Cloud Storage FUSE

This document describes list caching for Cloud Storage FUSE. List caching accelerates directory listing operations for workloads that frequently list the entire contents of a directory, such as a processing job that iterates over a large set of files when it starts.

Benefits of list caching

  • Faster directory listing operations: list caching provides improved performance for operations that list the contents of directories. When list caching is enabled, the results of Cloud Storage object listings for a directory are cached in memory. Subsequent listings of the same directory can be served directly from this cache.

  • Reduced latency: by serving list results from the local cache, Cloud Storage FUSE avoids network round trips to Cloud Storage to fetch the object list. This significantly reduces the latency of directory listing operations, especially for directories that contain many objects or when network latency is high.

  • Improved performance for repetitive workloads: workloads that repeatedly scan the same directories, such as artificial intelligence and machine learning (AI/ML) training jobs, build processes, or file synchronization tools, can see performance gains.

  • In-memory storage: the list cache is kept in the kernel's page cache, which the kernel manages based on memory availability. By contrast, the stat and type caches are kept in your machine's memory and managed by Cloud Storage FUSE.
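One way to observe the effect described above is to list the same directory twice and compare the timings. The following is a minimal sketch, not an official benchmark: the `time_listing` helper and the `MOUNT_DIR` variable are hypothetical names introduced here, and the script assumes GNU `date` for nanosecond timestamps. With list caching enabled, the second pass over a mounted directory would be expected to finish faster than the first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list a directory twice and report how long each pass takes.
# Assumes GNU date, which supports the %N (nanoseconds) format.
time_listing() {
  local dir="$1"
  local pass start end count
  for pass in 1 2; do
    start=$(date +%s%N)
    # Count every entry in the directory, including hidden ones.
    count=$(ls -A "$dir" | wc -l | tr -d ' ')
    end=$(date +%s%N)
    echo "pass ${pass}: ${count} entries in $(( (end - start) / 1000000 )) ms"
  done
}

# MOUNT_DIR is an assumption: point it at a directory inside your mount.
# It falls back to the current directory so that the script runs anywhere.
time_listing "${MOUNT_DIR:-.}"
```

To try it against a mounted bucket, set `MOUNT_DIR` to a directory under your mount point before running the script.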

Configure list caching

You enable list caching by setting a nonzero list cache time to live (TTL) with the gcsfuse option or configuration file field described in the following section.

List cache invalidation

To set how long a cached list response remains valid, specify a value greater than 0, in seconds, using one of the following methods:

  • gcsfuse option: --kernel-list-cache-ttl-secs
  • Configuration file field: file-system:kernel-list-cache-ttl-secs

The directory list response is kept in the kernel's page cache and remains valid for the number of seconds you specify. A value of -1 disables list cache expiration, so Cloud Storage FUSE returns the list response from the cache whenever it's available. A value of 0 disables the list cache.
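The two methods and the value rules above can be sketched as follows. This is an illustrative example under stated assumptions: `write_list_cache_config` is a hypothetical helper, not part of gcsfuse, and rejecting values below -1 is a choice made for this sketch. The value 60, `config.yaml`, `BUCKET_NAME`, and `/path/to/mount` are placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of gcsfuse): write a configuration file that
# sets the list cache TTL. Rejecting values below -1 is an assumption of this sketch.
write_list_cache_config() {
  local ttl="$1" file="$2"
  if ! [[ "$ttl" =~ ^(-1|[0-9]+)$ ]]; then
    echo "TTL must be -1, 0, or a positive number of seconds" >&2
    return 1
  fi
  cat > "$file" <<EOF
file-system:
  kernel-list-cache-ttl-secs: ${ttl}
EOF
}

# 60 is an arbitrary example value: cached listings stay valid for 60 seconds.
write_list_cache_config 60 config.yaml
cat config.yaml

# Mount with the configuration file, or pass the option directly instead.
# BUCKET_NAME and /path/to/mount are placeholders.
#   gcsfuse --config-file=config.yaml BUCKET_NAME /path/to/mount
#   gcsfuse --kernel-list-cache-ttl-secs=60 BUCKET_NAME /path/to/mount
```

Passing -1 instead of 60 keeps cached listings until the kernel evicts them, and passing 0 turns the list cache off.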
