Bigtable backups overview
This page provides an overview of Bigtable backups. The content presented here is intended for Bigtable administrators and developers.
Backups let you save a copy of a table's schema and data and then restore from the backup to a new table later. Bigtable offers two types of backups. The type of backup you create depends on your disaster recovery (DR) requirements and the type of storage (HDD or SSD) that your Bigtable cluster uses.
- Standard backups are optimized for long-term retention. When you restore from a standard backup to an SSD cluster, the restore operation requires additional optimization by Bigtable to bring the table to production-level performance. For more information, see Performance when restoring.
- Hot backups provide the most efficient restoration to production-level performance and low-latency serving. For more information, see Hot backups.
You can create backups in the following ways:
- Enable automated backup to let Bigtable create daily backups for you.
- Create a backup on demand by using the Google Cloud console, the gcloud CLI, or a Bigtable client library.
- Create a copy of a backup.
Before you read this page, you should be familiar with the Bigtable overview and Manage tables.
Features
- Fully integrated: Backups are handled entirely by the Bigtable service, with no need to import or export.
- Incremental: A backup shares physical storage with the source table and other backups of the table.
- Cost effective: Using Bigtable backups lets you avoid the costs associated with exporting, storing, and importing data using other services.
- Automatic expiration: Each backup has a user-defined expiration date that can be up to 90 days after the backup is created. You can store a copy of a backup for up to 30 days.
- Flexible restore options: You can restore from a backup to a table in a different instance from where the backup was created.
- Automated backup: Enable automated backup to let Bigtable create daily backups.
- Hot backups: Plan for disaster recovery with production-ready hot backups.
Use cases
Backups are useful for the following use cases:
- Business continuity
- Regulatory compliance
- Testing and development
- Disaster recovery
Consider the following disaster recovery scenarios:
Goal | Backup strategy | Restoration strategy |
---|---|---|
Protect against human error: You want to always have a recent backup of your data ready in case of accidental deletion or corruption. | Determine the backups creation schedule that's right for your business needs, such as daily. Optionally, create periodic copies of the backups and store them in a different project or region for increased isolation and protection. For even more protection, store the backup copies in a project or instance with restricted access permissions. | Restore to a new table from the backup or copy, and then re-route requests to the new table. |
Zone unavailability: You need to make sure that in the unlikely event that a Google Cloud zone becomes unavailable, your data is still available. | Enable automated backup to let Bigtable create a daily backup on every cluster in the instance. Alternatively, create backups on a regular basis and then periodically create a copy of the most recent backup and store it on one or more clusters in different zones (optionally in a different instance or project). | If the zone where your serving cluster becomes unavailable, restore from the remote backup copy to a new table, and then re-route requests to the new table. |
Data corruption: Use a backup to recover some of a table's data, such as when part of the source table has become corrupted. | Enable replication and automated backup to create daily backups in multiple regions, so that if a table becomes corrupted on one cluster, you have one or more backups that don't share storage on the corrupted cluster. | Restore from the backup to a new table on the new cluster or instance. Then write an application using a Bigtable client library or Dataflow that reads from the new table and then writes the data back to the source table. When the data has been copied to the original table, delete the new table. |
Fast recovery: Restore to full production performance levels quickly, minimizing downtime. | Always maintain a recent hot backup of your table. | Restore to a new table from the hot backup, and then re-route requests to the new table. |
Hot backups
A hot backup is a production-ready backup that is optimized for speedy recovery, with lower latency when reading from the new table shortly after restoration. Restoring to production performance from a hot backup is faster than restoring from a standard backup.
You can convert a hot backup to a standard backup, but you can't convert a standard backup to a hot backup.
You can't create hot backups using automated backup, and you can't create a hot backup on an HDD cluster.
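These constraints can be captured in a small pre-flight check. The sketch below is illustrative only: the function name and arguments are not part of any Bigtable API, and it encodes just the rules stated in this section.

```python
def validate_backup_request(backup_type: str, storage_type: str,
                            automated: bool) -> None:
    """Raise ValueError if the requested backup violates the rules above.

    Illustrative helper only; not part of the Bigtable API.
    """
    if backup_type not in ("standard", "hot"):
        raise ValueError(f"unknown backup type: {backup_type!r}")
    if backup_type == "hot" and automated:
        raise ValueError("automated backup can't create hot backups")
    if backup_type == "hot" and storage_type == "HDD":
        raise ValueError("hot backups can't be created on an HDD cluster")

# A hot backup created on demand on an SSD cluster is allowed:
validate_backup_request("hot", "SSD", automated=False)
```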
Working with Bigtable backups
The following actions are available for Bigtable backups. In all cases, the destination project, instance, and cluster must already exist; you can't create these resources as part of a backup operation.
Action | Destination options |
---|---|
Create a standard backup | A cluster in the instance that contains the source table |
Create a hot backup | An SSD cluster in the instance that contains the source table |
Restore from a standard or hot backup to a new table | The same instance as the backup, or a different instance, optionally in a different project |
Copy a backup | A cluster in the same or a different instance, project, or region |
See Manage backups for step-by-step instructions on these actions as well as operations such as updating and deleting backups.
Use the following to work with Bigtable backups:
- The Google Cloud console
- The Google Cloud CLI
- The Bigtable client libraries
Backup storage
A table backup that you create manually or programmatically is stored on a single cluster that you specify. When automated backup is enabled, a backup is stored on each cluster in the instance.
A backup of a table includes all the data that was in the table at the time the backup was created, on the cluster where the backup is created. A backup is never larger than the size of the source table at the time that the backup is created.
Bigtable backups are incremental. The amount of storage that a backup consumes depends on the size of the table and the extent to which it can share storage of unchanged data with the original table or other backups of the same table. For that reason, a backup's size depends on the amount of data divergence since the prior backup.
You can create up to 150 backups per table per cluster.
You can delete a table that has a backup. To protect your backups, you cannot delete a cluster that contains a backup, and you cannot delete an instance that has one or more backups in any cluster.
A backup still exists after you restore from it to a new table. You can delete it or let it expire when you no longer need it. Backup storage does not count toward the node storage limit for a project.
Data in backups is encrypted.
Retention
You can specify a retention period of up to 90 days for a backup. If you create a copy of a backup, the maximum retention period for the copy is 30 days from the time the copy is created.
You can change the retention period for a backup to keep it for up to 90 days after the backup creation time. For more information, see Modify a backup or backup copy.
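The retention rules above come down to plain date arithmetic. The following helper is illustrative only, not part of any Bigtable API; it assumes timezone-aware timestamps.

```python
from datetime import datetime, timedelta, timezone

MAX_BACKUP_RETENTION = timedelta(days=90)  # standard and hot backups
MAX_COPY_RETENTION = timedelta(days=30)    # copies of backups

def latest_allowed_expiry(created_at: datetime, is_copy: bool) -> datetime:
    """Latest allowed expiration time, measured from the backup's or
    copy's own creation time. Illustrative helper only."""
    limit = MAX_COPY_RETENTION if is_copy else MAX_BACKUP_RETENTION
    return created_at + limit

created = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(latest_allowed_expiry(created, is_copy=False))  # 2024-03-31 00:00:00+00:00
print(latest_allowed_expiry(created, is_copy=True))   # 2024-01-31 00:00:00+00:00
```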
For tables with automated backup enabled, the retention period is seven days if you set the policy using the --enable-automated-backup flag. You can set a custom retention period by passing the --automated-backup-retention-period flag, which accepts a value from 3 days to 90 days. For more information, see Update an automated backup policy.
Post-restoration storage
The storage cost for a new table restored from a backup is the same as for any table.
A table restored from a backup might not consume the same amount of storage as the original table, and it might decrease in size after restoration. The size difference depends on how recently compaction has occurred on the source cluster and the destination cluster.
Because compaction occurs on a rolling basis, it might happen as soon as the table is created, but it can also take up to a week.
A new table restored from a backup doesn't inherit the garbage collection policies of the source table. Configure garbage collection policies in the new table before you begin writing new data to the table. For more information, see Configure garbage collection.
Costs
Standard network costs apply when working with backups. You are not charged for backup operations, including creating, copying, or restoring from a backup.
Storage costs
Storage costs are different for standard backups and hot backups.
Standard backup storage costs
To store a standard backup or a copy of a backup, you're charged the standard backup storage rate for the region that the cluster containing the backup or backup copy is in.
A standard backup is a complete logical copy of a table. Behind the scenes, Bigtable optimizes standard backup storage utilization. This optimization means that a standard backup is incremental: it shares physical storage with the original table or with other backups of the table whenever possible. Because of these built-in storage optimizations, the cost to store a standard backup or a copy of a backup can be less than the cost of storing a full physical copy of the table.
In replicated instances where automated backup is enabled, the storage costs might be higher because a backup is created on each cluster daily.
Hot backup storage costs
To store a hot backup, you're charged the hot backup storage rate for the region that the cluster containing the hot backup is in.
Because a hot backup is stored in a ready state, optimized for quick restoration, you are charged for storage of the entire logical copy of the table, rather than for incremental portions, as you are with a standard backup.
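As a simplified mental model of the difference, not Bigtable's actual billing logic: a hot backup is billed for the entire logical copy of the table, while a standard backup is billed only for bytes that can't share physical storage with the table or other backups.

```python
def billed_backup_bytes(table_bytes: int, shareable_bytes: int,
                        hot: bool) -> int:
    """Simplified model of backup storage billing. Illustrative only.

    hot      -> charged for the entire logical copy of the table
    standard -> charged only for data that can't share physical storage
                with the source table or other backups of the table
    """
    if not 0 <= shareable_bytes <= table_bytes:
        raise ValueError("shareable_bytes must be between 0 and table_bytes")
    return table_bytes if hot else table_bytes - shareable_bytes

# A 100 GiB table where 95 GiB is unchanged since the last backup:
gib = 1024 ** 3
print(billed_backup_bytes(100 * gib, 95 * gib, hot=False) // gib)  # 5
print(billed_backup_bytes(100 * gib, 95 * gib, hot=True) // gib)   # 100
```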
Costs when copying a backup
When you create a copy of a backup in a different region than the source backup, you are charged standard network rates for the cost of copying the data to the destination cluster. You are not charged for network traffic when you create a copy in the same region as the source backup.
Costs when restoring
When you restore a new table from a backup, you are billed for the network cost of replication. If the new table is in an instance that uses replication, you are charged a one-time replication cost for the data to be copied to all clusters in the instance.
If you restore to a different instance than where the backup was created, and the backup's instance and the destination instance don't have at least one cluster in the same region, you are charged a one-time cost for the initial data copy to the destination cluster at the standard network rates.
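The conditions above reduce to two simple predicates. The sketch below is illustrative only and treats regions as plain strings; these helpers are not part of any Google Cloud API.

```python
def copy_incurs_network_charge(source_region: str, dest_region: str) -> bool:
    """Copying a backup is charged at network rates only when the copy
    crosses regions. Illustrative helper."""
    return source_region != dest_region

def restore_incurs_copy_charge(backup_regions: frozenset,
                               destination_regions: frozenset) -> bool:
    """Restoring to a different instance incurs a one-time data-copy
    charge only when the backup's instance and the destination instance
    share no region. Illustrative helper."""
    return not (backup_regions & destination_regions)

print(copy_incurs_network_charge("us-east1", "us-east1"))      # False
print(copy_incurs_network_charge("us-east1", "europe-west1"))  # True
print(restore_incurs_copy_charge(frozenset({"us-east1"}),
                                 frozenset({"us-east1", "asia-east1"})))  # False
```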
CMEK
When you create a backup in a cluster that is protected by a customer-managed encryption key (CMEK), the backup is pinned to the primary version of the cluster's CMEK key at the time it is taken. Once the backup is created, its key and key version cannot be modified, even if the KMS key is rotated.
When you restore from a backup, the key version that the backup is pinned to must be enabled for the backup decryption process to succeed. The new table is protected with the latest primary version of the CMEK key for each cluster in the destination instance. If you want to restore from a CMEK-protected backup to a different instance, the destination instance must be CMEK-protected as well but does not need to have the same CMEK configuration as the source instance.
Replication considerations
This section describes additional concepts to understand when backing up and restoring a table in an instance that uses replication.
Replication and backing up
When you take a backup of a table manually in a replicated instance, you choose the cluster where you want to create and store the backup. For tables with automated backup enabled, a daily backup is created on each cluster in the instance.
You don't have to stop writing to the cluster that contains the backup, but you should understand how Bigtable handles replicated writes to the cluster.
A backup is a copy of the table in its state on the cluster where the backup is stored, at the time the backup is created. Table data that has not yet been replicated from another cluster in the instance is not included in the backup.
Each backup has a start and end time. Writes that are sent to the cluster shortly before or during the backup operation might not be included in the backup. Two factors contribute to this uncertainty:
- A write might be sent to a section of the table that the backup has already copied.
- A write to another cluster might not have been replicated to the cluster that contains the backup.
In other words, there's a chance that some writes with timestamps before the time of the backup might not be included in the backup.
If this inconsistency is unacceptable for your business requirements, you can use a consistency token with your write requests to ensure that all replicated writes are included in a backup.
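That pattern looks like the loop below: generate a token after the last write you need included, then poll until the cluster reports consistency before you create the backup. The Bigtable admin API provides GenerateConsistencyToken and CheckConsistency RPCs for this; the `admin` object here is a stand-in for whichever client you use, so this is a sketch of the flow rather than a complete implementation.

```python
import time

def wait_for_replication(admin, table_name: str,
                         timeout_s: float = 600.0, poll_s: float = 1.0) -> None:
    """Block until writes sent before this call have replicated everywhere.

    `admin` is any object exposing generate_consistency_token(table) and
    check_consistency(table, token), mirroring the Bigtable admin RPCs
    GenerateConsistencyToken and CheckConsistency. Sketch only.
    """
    token = admin.generate_consistency_token(table_name)
    deadline = time.monotonic() + timeout_s
    while not admin.check_consistency(table_name, token):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{table_name} not consistent after {timeout_s}s")
        time.sleep(poll_s)
```

After `wait_for_replication` returns, a backup created on any cluster includes every write issued before the token was generated.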
Backups of replicated tables that are created as part of automated backup are not exact copies of each other, because backup times can vary from cluster to cluster.
Replication and restoring
When you restore a backup to a new table, replication to and from the other clusters in the instance starts immediately after the restore operation has completed on the destination cluster.
Performance
The following sections describe what to expect from backup and restore performance, along with best practices for keeping serving performance optimal.
Performance when backing up
Creating a backup usually takes less than a minute, although it can take up to one hour. Under normal circumstances, backup creation does not affect serving performance.
For optimal performance, don't create a backup of a single table more than once every five minutes. Creating backups more frequently can potentially lead to an observable increase in serving latency.
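A minimal guard for that guidance can be written as follows; this helper is illustrative only and not part of any Bigtable API.

```python
from datetime import datetime, timedelta
from typing import Optional

MIN_BACKUP_INTERVAL = timedelta(minutes=5)

def may_create_backup(last_backup_at: Optional[datetime],
                      now: datetime) -> bool:
    """Return True if backing up this table again respects the
    once-per-five-minutes guidance. Illustrative helper."""
    return last_backup_at is None or now - last_backup_at >= MIN_BACKUP_INTERVAL

t0 = datetime(2024, 1, 1, 12, 0)
print(may_create_backup(None, t0))                       # True
print(may_create_backup(t0, t0 + timedelta(minutes=2)))  # False
print(may_create_backup(t0, t0 + timedelta(minutes=5)))  # True
```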
Performance when restoring
Restoring from a backup to a table in a single-cluster instance takes a few minutes. In replicated instances, restoration takes longer because the data has to be copied to all the clusters. Bigtable always chooses the most efficient route to copy data.
If you restore to a different instance from where the backup was created, the restore operation takes longer than if you restore to the same instance. This is especially true if the destination instance does not have a cluster in the same zone as the cluster where the backup was created.
A bigger table takes longer to restore than a smaller table.
If you have an SSD instance, you might initially experience higher read latency, even after a restore is complete, while the table is optimized. You can check the status at any time during the restore operation to see whether optimization is still in progress.
If you restore to a different instance from where the backup was created, the destination instance can use HDD or SSD storage. It does not need to use the same storage type as the source instance.
Access control
IAM permissions control access to backup and restore operations. Backup permissions are at the instance level and apply to all backups in the instance.
The account that you use to create a backup of a table must have permission to read the table and create backups in the instance that the table is in (the source instance).
The account that you use to copy a backup must have permission to read the source backup and to create a backup in the destination instance and project.
The account that you use to restore a new table from a backup must have permission to create a table in the instance that you are restoring to.
Action | Required IAM permission |
---|---|
Create a backup | bigtable.tables.readRows, bigtable.backups.create |
Get a backup | bigtable.backups.get |
List backups | bigtable.backups.list |
Delete a backup | bigtable.backups.delete |
Update a backup | bigtable.backups.update |
Copy a backup | bigtable.backups.read, bigtable.backups.create |
Restore from a backup to a new table | bigtable.tables.create, bigtable.backups.restore |
Get an operation | bigtable.instances.get |
List operations | bigtable.instances.get |
Best practices
Consider the following best practices when you create a backup strategy.
Creating backups
- Don't back up a table more frequently than once every five minutes.
- When you back up a table that uses replication, choose the cluster to store the backup after considering the following factors:
- Cost. One cluster in your instance may be in a lower-cost region than the others.
- Proximity to your application server. You might want to store the backup as close to your serving application as possible.
- Storage utilization. You need enough storage space to keep your backups as they accumulate. Depending on your workload, you could have clusters of different sizes or with different disk usage. This may factor into which cluster you choose.
- If you need to ensure that all replicated writes are included in a backup when you back up a table in an instance that uses replication, use a consistency token with your write requests.
Restoring from backups
- Plan ahead what you will name the new table if you need to restore from a backup, so that you don't have to make that decision while you're dealing with a problem.
- If you are restoring a table for a reason other than accidental deletion, make sure all reads and writes are going to the new table before you delete the original table.
- If you plan to restore to a different instance, create the destination instance before you initiate the backup restore operation.
Quotas and limits
Backup and restore requests and backup storage are subject to Bigtable quotas and limits.
Limitations
The following limitations apply to Bigtable backups:
General
- You can't read directly from a backup.
- A backup is a version of a table on a single cluster at a specific time. A backup doesn't represent a globally consistent state of the table, and backups of the same table on different clusters are not consistent with each other.
- You cannot back up more than one table in a single operation.
- You cannot export, copy, or move a Bigtable backup to another service, such as Cloud Storage.
- Bigtable backups contain only Bigtable data and are not integrated with or related to backups for other Google services.
Restoring
- You can't restore from a backup to an existing table.
- You can only restore to an instance that already exists. Bigtable does not create a new instance when restoring from a backup. If the destination instance specified in a restore request does not exist, the restore operation fails.
- If you restore from a backup to a table in an SSD cluster and then delete the newly restored table, the table deletion might take a while to complete because Bigtable waits for table optimization to finish.
Copying
- You can't create a copy of a backup that is within 24 hours of expiring.
- You can't create a copy of a backup copy.
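Both rules can be checked up front. The helper below is illustrative only, not part of any Bigtable API; it assumes timezone-aware timestamps.

```python
from datetime import datetime, timedelta, timezone

def may_copy_backup(expire_time: datetime, now: datetime,
                    source_is_copy: bool) -> bool:
    """Return True if a backup may be copied: the source must not itself
    be a copy, and it must not be within 24 hours of expiring. Sketch
    of the copying limitations only."""
    if source_is_copy:
        return False
    return expire_time - now >= timedelta(hours=24)

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(may_copy_backup(now + timedelta(days=7), now, source_is_copy=False))    # True
print(may_copy_backup(now + timedelta(hours=12), now, source_is_copy=False))  # False
print(may_copy_backup(now + timedelta(days=7), now, source_is_copy=True))     # False
```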
CMEK
- A backup that is protected by CMEK must be restored to a new table in an instance that is CMEK-protected.
- When you create a copy of a backup that is CMEK-protected, the destination cluster must also be CMEK-protected.