[[["Easy to understand","easyToUnderstand","thumb-up"],["Solved my problem","solvedMyProblem","thumb-up"],["Other","otherUp","thumb-up"]],[["Hard to understand","hardToUnderstand","thumb-down"],["Incorrect information or sample code","incorrectInformationOrSampleCode","thumb-down"],["Missing the information/samples I need","missingTheInformationSamplesINeed","thumb-down"],["Translation issue","translationIssue","thumb-down"],["Other","otherDown","thumb-down"]],["Last updated (UTC): 2024-12-30."],[[["\u003cp\u003eThis content emphasizes the importance of resource redundancy in the Google Cloud Well-Architected Framework's reliability pillar, helping to prevent system failures.\u003c/p\u003e\n"],["\u003cp\u003eSystems should be designed to avoid single points of failure by replicating critical components across multiple machines, zones, and regions.\u003c/p\u003e\n"],["\u003cp\u003eTo achieve high availability, services and applications should be distributed across multiple zones and regions, with automatic failover mechanisms implemented for outages.\u003c/p\u003e\n"],["\u003cp\u003eIt is crucial to monitor failure domains and address any detected issues promptly, using tools like the Google Cloud Service Health dashboard.\u003c/p\u003e\n"],["\u003cp\u003eRegularly simulating failures, similar to a fire drill, is recommended to validate the effectiveness of replication and failover strategies.\u003c/p\u003e\n"]]],[],null,["# Build highly available systems through resource redundancy\n\nThis principle in the reliability pillar of the\n[Google Cloud Well-Architected Framework](/architecture/framework)\nprovides recommendations to plan, build, and manage resource redundancy, which\ncan help you to avoid failures.\n\nThis principle is relevant to the *scoping*\n[focus area](/architecture/framework/reliability#focus-areas)\nof reliability.\n\nPrinciple overview\n------------------\n\nAfter you\n[decide the level of reliability](/architecture/framework/reliability/set-targets)\nthat you need, you must design your systems to avoid any\n[single points of 
failure](/architecture/infra-reliability-guide/design#avoid_single_points_of_failure).\nEvery critical component in the system must be replicated across multiple\nmachines, zones, and\n[regions](/docs/geography-and-regions#regions_and_zones).\nFor example, a critical database can't be located in only one region, and a\nmetadata server can't be deployed in only one zone or region. In those\nexamples, if the sole zone or region has an outage, the system has a global\noutage.\n\nRecommendations\n---------------\n\nTo build redundant systems, consider the recommendations in the following\nsubsections.\n\n### Identify failure domains and replicate services\n\nMap out your system's\n[failure domains](/architecture/infra-reliability-guide/building-blocks),\nfrom individual VMs to regions, and design for redundancy across the failure\ndomains.\n\nTo ensure high availability, distribute and replicate your services and\napplications across multiple zones and regions. Configure the system for\nautomatic failover to make sure that the services and applications continue to\nbe available in the event of zone or region outages.\n\nFor examples of multi-zone and multi-region architectures, see\n[Design reliable infrastructure for your workloads in Google Cloud](/architecture/infra-reliability-guide/design#deployment_architectures).\n\n### Detect and address issues promptly\n\nContinuously track the status of your failure domains to detect and address\nissues promptly.\n\nYou can monitor the current status of Google Cloud services in all regions\nby using the\n[Google Cloud Service Health dashboard](https://status.cloud.google.com/).\nYou can also view incidents relevant to your project by using\n[Personalized Service Health](https://cloud.google.com/service-health).\nYou can use load balancers to detect resource health and automatically route\ntraffic to healthy backends. 
For more information, see\n[Health checks overview](/load-balancing/docs/health-check-concepts).\n\n### Test failover scenarios\n\nLike a fire drill, regularly simulate failures to validate the effectiveness of\nyour replication and failover strategies.\n\nFor more information, see\n[Simulate a zone outage for a regional MIG](/compute/docs/instance-groups/regional-mig-simulate-zonal-outage)\nand\n[Simulate a zone failure in GKE regional clusters](/kubernetes-engine/docs/tutorials/simulate-zone-failure)."]]