# Design for graceful degradation

Last updated (UTC): 2024-12-30.

This principle in the reliability pillar of the
[Google Cloud Well-Architected Framework](/architecture/framework)
provides recommendations to help you design your Google Cloud workloads
to fail gracefully.

This principle is relevant to the *response*
[focus area](/architecture/framework/reliability#focus-areas)
of reliability.

Principle overview
------------------

Graceful degradation is a design approach in which a system under high load
keeps functioning, possibly with reduced performance or accuracy. Graceful
degradation keeps the system available and prevents complete failure, even if
the system doesn't perform optimally. When the load returns to a manageable
level, the system resumes full functionality.

For example, during periods of high load, Google Search prioritizes results
from higher-ranked web pages, potentially sacrificing some accuracy. When the
load decreases, Google Search recomputes the search results.

Recommendations
---------------

To design your systems for graceful degradation, consider the recommendations
in the following subsections.

### Implement throttling

Ensure that your replicas can independently handle overloads and can throttle
incoming requests during high-traffic scenarios. This approach helps you
prevent cascading failures, which can occur when excess traffic shifts from an
overloaded zone to other zones.

Use tools like
[Apigee](/apigee/docs/api-platform/get-started/what-apigee)
to control the rate of API requests during high-traffic periods. You can
configure policy rules that define how requests are scaled back.

### Drop excess requests early

Configure your systems to drop excess requests at the frontend layer to protect
backend components. Dropping some requests prevents global failures and enables
the system to recover more gracefully. With this approach, some users might
experience errors. However, this approach minimizes the overall impact of an
outage, in contrast to an approach like *circuit-breaking*, where *all* traffic
is dropped during an overload.

### Handle partial errors and retries

Build your applications to handle partial errors and retries seamlessly. This
design helps to ensure that as much traffic as possible is served during
high-load scenarios.

### Test overload scenarios

To validate that the throttling and request-dropping mechanisms work
effectively, regularly simulate overload conditions in your system. Testing
helps ensure that your system is prepared for real-world traffic surges.

### Monitor traffic spikes

Use analytics and monitoring tools to predict and respond to traffic surges
before they escalate into overloads. Early detection and response can help
maintain service availability during high-demand periods.
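The monitoring recommendation doesn't prescribe a specific detection method. As one illustration, the following Python sketch flags a request-rate sample as a spike when it exceeds a multiple of an exponentially weighted moving average (EWMA) of earlier samples. The `SpikeDetector` class and its `alpha`, `threshold`, and `warmup` parameters are hypothetical names chosen for this example, not part of any Google Cloud API; in practice, you would more likely configure this kind of rule in your monitoring tool's alerting features.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SpikeDetector:
    """Hypothetical helper that flags request-rate samples far above a baseline.

    alpha:     EWMA smoothing factor (higher adapts to new samples faster).
    threshold: a sample above threshold * baseline counts as a spike.
    warmup:    number of initial samples that are never flagged.
    """

    alpha: float = 0.2
    threshold: float = 2.0
    warmup: int = 5
    baseline: Optional[float] = None
    seen: int = 0

    def observe(self, requests_per_second: float) -> bool:
        """Record one sample; return True if it looks like a traffic spike."""
        self.seen += 1
        if self.baseline is None:
            # The first sample seeds the baseline; there is nothing to compare yet.
            self.baseline = requests_per_second
            return False
        is_spike = (
            self.seen > self.warmup
            and requests_per_second > self.threshold * self.baseline
        )
        if not is_spike:
            # Fold only normal samples into the EWMA so that a burst does not
            # immediately raise the baseline it is measured against.
            self.baseline = (
                self.alpha * requests_per_second
                + (1 - self.alpha) * self.baseline
            )
        return is_spike


# Example: steady traffic around 100 req/s with one burst to 350 req/s.
detector = SpikeDetector()
samples = [100, 105, 98, 102, 101, 99, 350, 104]
flags = [detector.observe(s) for s in samples]
print(flags)  # [False, False, False, False, False, False, True, False]
```

Because flagged samples aren't folded into the baseline, a short burst doesn't raise the threshold for the samples that follow it. The trade-off is that a sustained shift to a higher traffic level keeps being reported as a spike until the baseline is reset or the parameters are retuned.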