Last updated (UTC): 2024-12-30.

# Perform testing for recovery from data loss

This principle in the reliability pillar of the
[Google Cloud Well-Architected Framework](/architecture/framework)
provides recommendations to help you design and run tests for recovery from data
loss.

This principle is relevant to the *learning*
[focus area](/architecture/framework/reliability#focus-areas)
of reliability.

Principle
overview
------------------

To ensure that your system can recover from situations where data is lost or
corrupted, you need to run tests for those scenarios. Data loss might be caused
by a software bug or a natural disaster. After such events, you need to restore
data from backups and bring all of the services back up by using the freshly
restored data.

We recommend that you use three criteria to judge the success or failure of this
type of recovery test: data integrity, recovery time objective (RTO), and
recovery point objective (RPO). For details about the RTO and RPO metrics, see
[Basics of DR planning](/architecture/dr-scenarios-planning-guide#basics_of_dr_planning).

The goal of data restoration testing is to periodically verify that your
organization can continue to meet business continuity requirements. Besides
measuring RTO and RPO, a data restoration test must include testing of the
entire application stack and all of the critical infrastructure services with
the restored data. This testing is necessary to confirm that the entire deployed
application works correctly in the test environment.

Recommendations
---------------

When you design and run tests for recovery from data loss, consider the
recommendations in the following subsections.

### Verify backup consistency and test restoration processes

You need to verify that your backups contain consistent and usable snapshots of
data that you can restore to immediately bring applications back into service.
To validate data integrity, set up automated consistency checks to run after
each backup.

To test backups, restore them in a non-production environment. To ensure that
your backups can be restored efficiently and that the restored data meets
application requirements, regularly simulate data recovery scenarios.
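A post-backup consistency check like the one described above can be sketched in a few lines. This is a minimal, hypothetical illustration, not a Google Cloud API: `table_checksum` and `verify_restore` are illustrative helpers, and a real check would read the rows from the source system and from a restore performed in a non-production environment.

```python
import hashlib


def table_checksum(rows):
    """Order-independent checksum over a table's rows (illustrative).

    XOR of per-row digests, so the result does not depend on row order.
    """
    digest = 0
    for row in rows:
        row_hash = hashlib.sha256(repr(row).encode("utf-8")).digest()
        digest ^= int.from_bytes(row_hash[:8], "big")
    return digest


def verify_restore(source_rows, restored_rows):
    """Return True when the restored data matches the source snapshot."""
    if len(source_rows) != len(restored_rows):
        return False
    return table_checksum(source_rows) == table_checksum(restored_rows)
```

In practice, a check like this would run automatically after each backup, and a failure would block the backup from being marked usable.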
Document the steps for
data restoration, and train your teams to execute the steps effectively during a
failure.

### Schedule regular and frequent backups

To minimize data loss during restoration and to meet RPO targets, it's essential
to have regularly scheduled backups. Establish a backup frequency that aligns
with your RPO. For example, if your RPO is 15 minutes, schedule backups to run
at least every 15 minutes. Optimize the backup intervals to reduce the risk of
data loss.

Use Google Cloud tools like Cloud Storage, Cloud SQL automated backups, or
Spanner backups to schedule and manage backups. For critical applications, use
near-continuous backup solutions like
[point-in-time recovery (PITR) for Cloud SQL](/sql/docs/postgres/backup-recovery/pitr)
or incremental backups for large datasets.

### Define and monitor RPO

Set a clear RPO based on your business needs, and monitor adherence to the RPO.
If backup intervals exceed the defined RPO, use Cloud Monitoring to set up
alerts.

### Monitor backup health

Use the
[Google Cloud Backup and DR service](/backup-disaster-recovery/docs/concepts/backup-dr)
or similar tools to track the health of your backups and confirm that they are
stored in secure and reliable locations. Ensure that the backups are replicated
across multiple
[regions](/docs/geography-and-regions#regions_and_zones)
for added resilience.

### Plan for scenarios beyond backup

Combine backups with disaster recovery strategies like active-active failover
setups or cross-region replication to improve recovery time in extreme cases.
For more information, see the
[Disaster recovery planning guide](/architecture/dr-scenarios-planning-guide).
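The RPO-adherence check from the "Define and monitor RPO" recommendation can be sketched as follows. This is an illustrative sketch under stated assumptions, not part of Cloud Monitoring: `rpo_violated` is a hypothetical helper, and in practice you would feed a backup-age metric into a Cloud Monitoring alerting policy rather than run this logic yourself.

```python
from datetime import datetime, timedelta, timezone


def rpo_violated(last_backup_at, rpo, now=None):
    """Return True when the newest backup is older than the RPO allows.

    last_backup_at: timezone-aware timestamp of the most recent backup.
    rpo: the recovery point objective as a timedelta (e.g., 15 minutes).
    """
    now = now or datetime.now(timezone.utc)
    return now - last_backup_at > rpo
```

For example, with a 15-minute RPO, a backup taken 20 minutes ago should trigger an alert, while one taken 10 minutes ago should not.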