Select and implement a storage strategy

Last reviewed 2023-07-17 UTC

This document in the Google Cloud Architecture Framework provides best practices for deploying your system based on its storage requirements. You learn how to select a storage strategy and how to manage storage, access patterns, and workloads.

To facilitate data exchange and securely back up and store data, organizations need to choose a storage plan based on workload, input/output operations per second (IOPS), latency, retrieval frequency, location, capacity, and format (block, file, and object).

Cloud Storage provides reliable, secure object storage services.

In Google Cloud, IOPS scales according to your provisioned storage space. Storage types like Persistent Disk require manual replication and backup because they are zonal or regional. By contrast, object storage is highly available and it automatically replicates data across a single region or across multiple regions.

Storage type

This section provides best practices for choosing a storage type to support your system.

Evaluate options for high-performance storage needs

Evaluate persistent disks or local solid-state drives (SSD) for compute applications that require high-performance storage. Cloud Storage is an immutable object store with versioning. Using Cloud Storage with Cloud CDN helps optimize for cost, especially for frequently accessed static objects.

Filestore supports multi-write applications that need high-performance shared space. Filestore also supports legacy and modern applications that require POSIX-like file operations through Network File System (NFS) mounts.

Cloud Storage supports use cases such as creating data lakes and addressing archival requirements. Weigh access and retrieval costs when you choose a Cloud Storage class, especially when you configure retention policies. For more information, see Design an optimal storage strategy for your cloud workload.

By default, all storage options are encrypted at rest and in transit using Google-owned and Google-managed keys. For storage types such as Persistent Disk and Cloud Storage, you can either supply your own keys or manage them through Cloud Key Management Service (Cloud KMS). Establish a strategy for handling such keys before you use them on production data.

Choose Google Cloud services to support storage design

To learn about the Google Cloud services that support storage design, use the following table:

Google Cloud service | Description
Cloud Storage | Provides global storage and retrieval of any amount of data at any time. You can use Cloud Storage for multiple scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users through direct download.
Persistent Disk | High-performance block storage for Google Cloud. Persistent Disk provides SSD and hard disk drive (HDD) storage that you can attach to instances running in Compute Engine or Google Kubernetes Engine (GKE).
  • Regional disks provide durable storage and replication of data between two zones in the same region. If you need higher IOPS and low latency, Google Cloud offers Filestore.
  • Local SSDs are physically attached to the server that hosts your virtual machine instance. You can use local SSDs as temporary disk space.
Filestore | A managed file storage service for applications that require a file system interface and a shared file system for data. Filestore gives users a seamless experience for standing up managed Network Attached Storage (NAS) with their Compute Engine and GKE instances.
Cloud Storage for Firebase | Built for app developers who need to store and serve user-generated content, such as photos or videos. All your files are stored in Cloud Storage buckets, so they are accessible from both Firebase and Google Cloud.

Choose a storage strategy

To select a storage strategy that meets your application requirements, use the following table:

Use case | Recommendations
You want to store data at scale at the lowest cost, and access performance is not an issue. | Cloud Storage
You are running compute applications that need immediate storage. For more information, see Optimizing Persistent Disk and Local SSD performance. | Persistent Disk or Local SSD
You are running high-performance workloads that need read and write access to shared space. | Filestore
You have high-performance computing (HPC) or high-throughput computing (HTC) use cases. | Using clusters for large-scale technical computing in the cloud

Choose active or archival storage based on storage access needs

A storage class is a piece of metadata that is used by every object. For data that is served at a high rate with high availability, use the Standard Storage class. For data that is infrequently accessed and can tolerate slightly lower availability, use the Nearline Storage, Coldline Storage, or Archive Storage class. For more information about cost considerations for choosing a storage class, see Cloud Storage pricing.
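The choice between active and archival classes can be sketched as a small helper. This is only an illustration: the function name `suggest_storage_class` is hypothetical, and the 30, 90, and 365 day thresholds are an assumption borrowed from the minimum storage durations published for Nearline, Coldline, and Archive, not a rule stated in this document. Check Cloud Storage pricing before you rely on any cutoff.

```python
def suggest_storage_class(days_between_accesses: float) -> str:
    """Suggest a storage class from the expected gap between accesses.

    The thresholds are assumptions based on the minimum storage
    durations of each class (30, 90, and 365 days).
    """
    if days_between_accesses < 0:
        raise ValueError("days_between_accesses must be non-negative")
    if days_between_accesses < 30:
        return "STANDARD"   # frequently served, highest availability
    if days_between_accesses < 90:
        return "NEARLINE"   # roughly monthly access
    if days_between_accesses < 365:
        return "COLDLINE"   # roughly quarterly access
    return "ARCHIVE"        # accessed less than once a year


if __name__ == "__main__":
    for days in (1, 45, 180, 400):
        print(days, suggest_storage_class(days))
```

A helper like this captures only access frequency. Retrieval fees, early-deletion charges, and availability targets also affect the decision.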

Evaluate storage location and data protection needs for Cloud Storage

For a Cloud Storage bucket located in a region, data contained within it is automatically replicated across zones within the region. Data replication across zones protects the data if there is a zonal failure within a region.

Cloud Storage also offers locations that are redundant across regions, which means data is replicated across multiple, geographically separate data centers. For more information, see Bucket locations.

Use Cloud CDN to improve static object delivery

To optimize the cost to retrieve objects and minimize access latency, use Cloud CDN. Cloud CDN uses the Cloud Load Balancing external Application Load Balancer to provide routing, health checking, and anycast IP address support. For more information, see Setting up Cloud CDN with cloud buckets.

Storage access pattern and workload type

This section provides best practices for choosing storage access patterns and workload types to support your system.

Use Persistent Disk to support high-performance storage access

Your data access patterns depend on how you design your system for performance. Cloud Storage provides scalable storage, but it isn't an ideal choice when you run heavy compute workloads that need high-throughput access to large amounts of data. For high-performance storage access, use Persistent Disk.

Use exponential backoff when implementing retry logic

Use exponential backoff when implementing retry logic to handle 5XX, 408, and 429 errors. Each Cloud Storage bucket is provisioned with initial I/O capacity, so plan a gradual ramp-up for retry requests. For more information, see Request rate and access distribution guidelines.
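The retry guidance above can be sketched in a few lines of Python. This is a minimal illustration, not a Cloud Storage client API. The `request` callable, which is assumed to return a `(status, body)` tuple, and the `base`, `cap`, and `max_retries` defaults are all hypothetical. The delay uses "full jitter" (a random wait between zero and the exponential ceiling), which is one common variant of exponential backoff.

```python
import random
import time

RETRYABLE_STATUS = {408, 429}


def is_retryable(status: int) -> bool:
    """Return True for 408, 429, and any 5XX status code."""
    return status in RETRYABLE_STATUS or 500 <= status <= 599


def backoff_delays(max_retries=5, base=1.0, cap=32.0, rng=random.random):
    """Yield one randomized delay per retry, doubling the ceiling each time."""
    for attempt in range(max_retries):
        ceiling = min(cap, base * (2 ** attempt))
        yield ceiling * rng()  # full jitter: wait in [0, ceiling)


def call_with_backoff(request, max_retries=5, base=1.0, cap=32.0,
                      sleep=time.sleep):
    """Call `request` (hypothetical: returns (status, body)) with retries.

    Returns the last (status, body) seen, whether or not it succeeded.
    """
    status, body = request()
    for delay in backoff_delays(max_retries, base, cap):
        if not is_retryable(status):
            break
        sleep(delay)
        status, body = request()
    return status, body


if __name__ == "__main__":
    # Simulated responses: two transient failures, then success.
    responses = iter([(503, None), (429, None), (200, "ok")])
    print(call_with_backoff(lambda: next(responses), sleep=lambda s: None))
```

Injecting `sleep` makes the function easy to test without real waiting. In production code, many client libraries already ship a configurable retry policy, which is usually preferable to a hand-rolled loop.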

Storage management

This section provides best practices for storage management to support your system.

Assign unique names to every bucket

Make every bucket name unique across the Cloud Storage namespace. Don't include sensitive information in a bucket name. Choose bucket and object names that are difficult to guess. For more information, see Bucket naming guidelines and Object naming guidelines.
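One way to get names that are hard to guess is to append a random suffix to a non-sensitive prefix. The sketch below is an illustration under stated assumptions: `make_bucket_name` is a hypothetical helper, and the validation covers only a simplified subset of the bucket naming guidelines (3 to 63 characters, lowercase letters, digits, dashes, and underscores, no "goog" prefix, no "google" substring). It does not check global uniqueness, which can only be confirmed when you create the bucket.

```python
import re
import secrets

# Simplified subset of the naming rules; dotted names are not handled.
_NAME_RE = re.compile(r"^[a-z0-9][a-z0-9_-]{1,61}[a-z0-9]$")


def make_bucket_name(prefix: str, random_bytes: int = 8) -> str:
    """Build a hard-to-guess bucket name from a non-sensitive prefix."""
    suffix = secrets.token_hex(random_bytes)  # 2 hex characters per byte
    name = f"{prefix}-{suffix}".lower()
    if not _NAME_RE.match(name):
        raise ValueError(f"invalid bucket name: {name!r}")
    if name.startswith("goog") or "google" in name:
        raise ValueError("bucket names can't start with 'goog' or contain 'google'")
    return name


if __name__ == "__main__":
    print(make_bucket_name("app-logs"))
```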

Keep Cloud Storage buckets private

Unless there is a business-related reason, ensure that your Cloud Storage bucket isn't anonymously or publicly accessible. For more information, see Overview of access control.
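As a rough illustration of auditing for public exposure, the sketch below scans an IAM policy for the two principals that make a resource public, `allUsers` and `allAuthenticatedUsers`. The helper name `public_bindings` and the plain-dictionary policy shape (a `bindings` list of `role` and `members` entries) are assumptions for this example. In practice you would fetch the policy through a client library or the API.

```python
PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}


def public_bindings(policy: dict) -> list:
    """Return (role, member) pairs that grant public access."""
    found = []
    for binding in policy.get("bindings", []):
        for member in binding.get("members", []):
            if member in PUBLIC_MEMBERS:
                found.append((binding.get("role", ""), member))
    return found


if __name__ == "__main__":
    # Hypothetical policy for illustration only.
    sample_policy = {
        "bindings": [
            {"role": "roles/storage.objectViewer", "members": ["allUsers"]},
            {"role": "roles/storage.admin",
             "members": ["user:admin@example.com"]},
        ]
    }
    for role, member in public_bindings(sample_policy):
        print(f"public grant: {member} has {role}")
```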

Assign random object names to distribute load evenly

Assign random object names to facilitate performance and avoid hotspotting. Use a randomized prefix for objects where possible. For more information, see Use a naming convention that distributes load evenly across key ranges.
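A randomized prefix can be derived from a hash of the original name, which keeps the mapping deterministic while spreading sequential names across the key range. The following is a minimal sketch. The helper name `prefixed_object_name`, the six-character prefix length, and the use of SHA-256 are illustrative assumptions, not requirements from this document.

```python
import hashlib


def prefixed_object_name(name: str, prefix_len: int = 6) -> str:
    """Prepend a short hash-derived prefix to an object name.

    Sequential names such as timestamps or counters get unrelated
    prefixes, so consecutive writes don't cluster in one key range.
    """
    digest = hashlib.sha256(name.encode("utf-8")).hexdigest()
    return f"{digest[:prefix_len]}_{name}"


if __name__ == "__main__":
    for original in ("2023-07-17/log-0001.json", "2023-07-17/log-0002.json"):
        print(prefixed_object_name(original))
```

Because the prefix is computed from the name itself, a reader that knows the original name can recompute the full object name. The tradeoff is that listing objects no longer returns them in their original sequential order.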

Use public access prevention

To prevent public access at the organization, folder, project, or bucket level, use public access prevention. For more information, see Using public access prevention.

What's next

Learn about Google Cloud database services and best practices.

Explore other categories in the Architecture Framework such as reliability, operational excellence, and security, privacy, and compliance.