Tiered storage overview
This page describes how tiered storage works in Bigtable.
About tiered storage
Bigtable tiered storage optimizes data management and reduces storage costs. Enable and configure this option to store infrequently accessed data in a separate, lower-cost storage tier. This lets you choose the storage tier that best suits your data access needs.
Tiered storage supports age-based policies to meet workload requirements, such as the following:
- Extended data retention for regulatory compliance.
- Long-term retention of historical data for model training.
- Infrequent access to customer records.
Tiered storage is available for Bigtable SSD instances and supports the following storage tiers:
- SSD storage
- Infrequent access storage
Tiered storage is not available for Bigtable HDD instances. For more information about Bigtable SSD and HDD storage, see Choose between SSD and HDD storage.
You configure Bigtable tiered storage at the table level. For more information, see Create and manage tables.
Infrequent access storage
Bigtable moves data to the infrequent access tier based on an age-based tiering policy with an age threshold that you configure. When a cell's timestamp becomes older than the configured age threshold, Bigtable moves the cell from the SSD tier to the infrequent access tier. This data movement is based only on the cell's timestamp and is not affected by how often the data is read.
Infrequent access storage provides a cost-effective solution for long-term data storage. Operations on data in this tier, particularly reads, are subject to reduced throughput performance. When reading from a table where data is distributed across different tiers, your application experiences significantly better performance when reading data that is stored on SSD compared to data that has aged into the infrequent access tier. For more information, see Understand performance.
How tiered storage works
When you enable tiered storage for a table, you configure an age-based tiering policy that applies to the entire table. This policy determines when data moves from the SSD tier to the infrequent access tier.
Bigtable examines each cell's timestamp to determine its eligibility for data movement. You can set the timestamp of a cell to any integer value. If you don't set a timestamp when you write the data, Bigtable uses the current server time as the cell timestamp by default. When a cell's timestamp becomes older than the age threshold that you configured, Bigtable moves the cell to the infrequent access tier.
This data movement happens automatically in the background as part of Bigtable's compaction process and can be bidirectional. If you change the tiering policy to move older data back to the SSD tier, data automatically moves from infrequent access to SSD. For information about the time needed to finalize data movement, see Compactions.
For example, if you set an age threshold of 60 days, you define a rolling 60-day window. This window is not fixed to a specific date but is always relative to the current server time, continuously moving forward. Data with a cell-level timestamp falling inside this window remains on SSD storage, while data outside of it is moved to infrequent access storage. Therefore, the age threshold defines how old a cell's timestamp must be, relative to the current server time, before the cell is moved from SSD to the infrequent access tier.
Assume that the current server time is 2024-07-24 10:00:00 PT and you set an age threshold of 60 days. At this moment, Bigtable considers any data with a timestamp older than 2024-05-25 10:00:00 PT (60 days before the current server time) to be outside the window and stores it in the infrequent access tier. Because the window is "rolling," it advances as time progresses. The next day, on 2024-07-25 10:00:00 PT, the start of the window will have also moved forward by one day. As this dynamic window moves, data that was previously on SSD continuously ages out and is moved to the infrequent access tier when it falls outside the 60-day window.
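The rolling window amounts to simple date arithmetic. The following Python sketch is illustrative only: `window_start` and `storage_tier` are hypothetical helper names, not part of any Bigtable API, and the sketch assumes that cell timestamps have already been converted to timezone-aware `datetime` values. Because actual movement happens during compaction, the result indicates which tier a cell is eligible for, not where the cell is physically stored at that instant.

```python
from datetime import datetime, timedelta, timezone


def window_start(now: datetime, threshold_days: int) -> datetime:
    """Return the oldest timestamp that is still inside the rolling window."""
    return now - timedelta(days=threshold_days)


def storage_tier(cell_time: datetime, now: datetime, threshold_days: int) -> str:
    """Return the tier a cell is eligible for (hypothetical helper)."""
    if cell_time < window_start(now, threshold_days):
        return "infrequent access"
    return "ssd"


# The same cell changes eligibility as the server time advances.
cell_time = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
day_one = datetime(2024, 7, 24, 17, 0, tzinfo=timezone.utc)

print(storage_tier(cell_time, day_one, 60))                       # ssd
print(storage_tier(cell_time, day_one + timedelta(days=30), 60))  # infrequent access
```

The cell's timestamp never changes in this example. Only the window moves, which is why data ages out without any write activity.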
Filter by time range
If you know that the data that you want to access is recent, stored on SSD, and within your age threshold, use a timestamp range filter when you read the data. This ensures that your query reads only from SSD, which improves efficiency and avoids unnecessary reads from the infrequent access tier.
If you don't use a timestamp range filter, Bigtable can't identify whether your data is in the SSD or infrequent access tier. Bigtable has to check both SSD and infrequent access to ensure that all of the data is returned, even if the data is only in SSD. This extra work can reduce performance.
For example, if your age threshold is 60 days, we recommend that you query with a time range filter that falls within this 60-day window, so that Bigtable skips the infrequent access tier and performs the query as an SSD operation.
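As a minimal sketch of this recommendation, the following Python code computes a range start that stays inside the window and passes it to a timestamp range filter. The helper names `ssd_range_start` and `read_recent_rows` are hypothetical. The one-day margin is an assumption of this sketch, intended to keep the range clear of the moving window boundary and of clock differences between the client and the server. The read function assumes the `google-cloud-bigtable` Python client library is installed and that `table` is a table object from that library.

```python
from datetime import datetime, timedelta, timezone


def ssd_range_start(now, threshold_days, margin=timedelta(days=1)):
    """Return a range start that stays inside the rolling window.

    The margin is an assumption of this sketch, not a Bigtable requirement.
    """
    if margin >= timedelta(days=threshold_days):
        raise ValueError("margin must be smaller than the age threshold")
    start = now - timedelta(days=threshold_days) + margin
    # Truncate to whole milliseconds, assuming the table uses
    # millisecond timestamp granularity.
    return start.replace(microsecond=start.microsecond // 1000 * 1000)


def read_recent_rows(table, threshold_days):
    """Read only cells whose timestamps fall inside the window.

    Assumes `table` comes from the google-cloud-bigtable client library.
    """
    from google.cloud.bigtable import row_filters  # third-party package

    start = ssd_range_start(datetime.now(timezone.utc), threshold_days)
    # Leaving the end unset keeps the range open toward the newest cells.
    time_filter = row_filters.TimestampRangeFilter(
        row_filters.TimestampRange(start=start)
    )
    return table.read_rows(filter_=time_filter)
```

Calling `read_recent_rows(table, 60)` would then request only cells that are younger than roughly 59 days, which keeps the whole range inside a 60-day window.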
Move data back to SSD
Data moves from infrequent access to SSD using the same compaction process as when data moves from SSD to infrequent access. You can move data back from infrequent access to SSD in one of the following ways:
- Increase the tiering policy age threshold to include older data in SSD.
- Disable tiered storage. For more information about cost implications, see the Costs section of this document.
- Rewrite the data with a new timestamp. However, this results in a duplicate copy of the data because Bigtable stores data immutably. The copy with the old timestamp remains in infrequent access storage until Bigtable removes it. Remove expired and obsolete data by deleting it or using a garbage collection age policy to automatically delete it.
Backups
Whether a table is restored with tiered storage enabled depends on the tiering status of the original table at the time of backup:
- Backup of a non-tiered table: If you create a backup and then enable tiering on the original table, restoring that backup creates a standard, non-tiered table. This includes hot backups taken before tiering was enabled.
- Backup of a tiered table: If the table already has tiered storage enabled when you back it up, Bigtable restores the table as a tiered table. Hot backups are not supported for tiered tables.
Costs
Google Cloud charges you the infrequent access storage rate, which is separate from the SSD storage rate, for the amount of infrequent access storage used.
When you write data directly to the infrequent access tier or when you transfer data from SSD to infrequent access, Google Cloud charges you the table compaction rate for data movement. For optimal performance and cost-effectiveness, use tiered storage for data that you expect to remain inactive for long periods.
Google Cloud does not charge you for configuring tiered storage or for moving data from infrequent access storage back to SSD.
Disabling tiered storage results in all data being charged as SSD storage. The following approaches to disabling tiered storage impact pricing:
- Set the age threshold to a period of time that is greater than your oldest data to ensure that all data is moved to SSD. Once the data has moved to SSD as part of the compaction cycle, disable tiered storage. This lets you avoid being charged for SSD storage while your data is still in the infrequent access tier.
- Disable tiered storage immediately and let data automatically move back to SSD. However, you might experience reduced performance while still being charged for SSD storage.
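For the first approach, the age threshold has to stay greater than the age of your oldest data for as long as the move back to SSD takes. The following Python sketch shows one way to pick such a value. `threshold_covering` is a hypothetical helper, the seven-day buffer is an arbitrary assumption to allow for data aging while compaction completes, and you need to determine the oldest cell timestamp yourself.

```python
import math
from datetime import datetime, timedelta, timezone

# Minimum age threshold, as listed in the Limitations section of this document.
MIN_THRESHOLD_DAYS = 30


def threshold_covering(oldest_cell_time, now, buffer_days=7):
    """Return an age threshold, in days, greater than the age of the oldest cell."""
    age_days = (now - oldest_cell_time) / timedelta(days=1)
    return max(MIN_THRESHOLD_DAYS, math.ceil(age_days) + buffer_days)


oldest = datetime(2023, 1, 15, 10, 0, tzinfo=timezone.utc)
now = datetime(2024, 7, 24, 10, 0, tzinfo=timezone.utc)
print(threshold_covering(oldest, now))
```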
For more information about pricing, see Bigtable pricing.
Metrics
Once you enable tiered storage, Bigtable reports the following metrics to Cloud Logging that you can use to monitor your tiered storage:
Metric | Description |
---|---|
disk/bytes_used | The amount of data stored in each storage tier (SSD or infrequent access), in bytes. |
server/read_latencies_by_storage_tier | The time it takes to read data from each storage tier (SSD or infrequent access). |
server/read_request_count_by_storage_tier | The number of read requests served from each storage tier (SSD or infrequent access). |
table/bytes_transferred_to_infrequent_access | The amount of data moved from SSD to infrequent access. |
For more information, see Metrics.
To get started with Cloud Logging, see Query and view logs overview.
Limitations
Tiered storage has the following limitations:
- Data Boost isn't supported.
- Hot backups aren't supported.
- Bigtable HDD instances don't support tiered storage.
- The minimum age threshold is 30 days. For more information, see Create a table.