# Google Cloud infrastructure reliability guide

Last updated (UTC): 2024-11-20.

Reliable infrastructure is a critical requirement for workloads in the cloud.
As a cloud architect, to design reliable infrastructure for your workloads, you
need a good understanding of the reliability capabilities of your cloud provider
of choice. This document describes the building blocks of reliability in
Google Cloud (zones, regions, and location-scoped resources) and the
availability levels that they provide. This document also provides guidelines
for assessing the reliability requirements of your workloads, and presents
architectural recommendations for building and managing reliable infrastructure
in Google Cloud.

This document is divided into the following parts:

- Overview of reliability (this part)
- [Building blocks of reliability in Google Cloud](/architecture/infra-reliability-guide/building-blocks)
- [Assess the reliability requirements for your cloud workloads](/architecture/infra-reliability-guide/requirements)
- [Design reliable infrastructure for your workloads in Google Cloud](/architecture/infra-reliability-guide/design)
- [Manage traffic and load for your workloads in Google Cloud](/architecture/infra-reliability-guide/traffic-load)
- [Manage and monitor your Google Cloud infrastructure](/architecture/infra-reliability-guide/manage-and-monitor)

If you've read this guide previously and want to see what's changed, see the
[Release notes](/architecture/release-notes).

Overview of reliability
-----------------------

An application or workload is reliable when it meets your current objectives
for availability and resilience to failures.

Availability (or uptime) is the percentage of time that an application is
usable. For example, for an application that has an availability target of
99.99%, the total downtime must not exceed 8.64 seconds during a 24-hour period.
Sometimes, availability is measured as the proportion of requests that the
application serves successfully during a given period. For example, for an
application that has an availability target of 99.99%, for every 100,000
requests received, not more than ten requests can fail. Availability is often
expressed as the number of nines in the percentage. For example, 99.99%
availability is expressed as "4 nines".

Depending on the purpose of the application, you might have different sets of
indicators for how reliable the application is. The following are examples of
such reliability indicators:

- For applications that serve content, availability, latency, and throughput are important reliability indicators. They indicate whether the application can respond to requests, how long the application takes to respond to requests, and how many requests the application can process successfully in a given period.
- For databases and storage systems, latency, throughput, availability, and durability (how well data is protected against loss or corruption) are indicators of reliability. They indicate how long the system takes to read or write data, and whether data can be accessed on demand.
- For big data and analytics workloads such as data processing pipelines, consistent pipeline performance (throughput and latency) is essential to ensure freshness of the data products, and is an important reliability indicator. It indicates how much data can be processed, and how long it takes for the pipeline to progress from data ingestion to data processing.
- Most applications have data correctness as an essential reliability indicator.

For further guidelines to define the reliability objectives for your
applications, see
[Assess the reliability requirements for your cloud workloads](/architecture/infra-reliability-guide/requirements).

> **Note:** Planning for disaster recovery (DR) is related to reliability, and DR is essential for business continuity. For detailed guidance about DR planning, see the [Disaster recovery planning guide](/architecture/dr-scenarios-planning-guide).

Factors that affect application reliability
-------------------------------------------

The reliability of an application that's deployed in Google Cloud depends
on the following factors:

- The internal design of the application.
- The secondary applications or components that the application depends on.
- Google Cloud infrastructure resources such as compute, networking, storage, databases, and security that the application runs on, and how the application uses the infrastructure.
- Infrastructure capacity that you provision, and how the capacity scales.
- The DevOps processes and tools that you use to build, deploy, and maintain the application, its dependencies, and the Google Cloud infrastructure.

These factors are summarized in the following diagram:

*[Diagram: factors that affect the reliability of an application deployed in Google Cloud]*

As shown in the preceding diagram, the reliability of an application that's
deployed in Google Cloud depends on multiple factors. The focus of this
guide is the reliability of the Google Cloud infrastructure.

What's next
-----------

- [Building blocks of reliability in Google Cloud](/architecture/infra-reliability-guide/building-blocks)
- [Assess the reliability requirements for your cloud workloads](/architecture/infra-reliability-guide/requirements)
- [Design reliable infrastructure for your workloads in Google Cloud](/architecture/infra-reliability-guide/design)
- [Manage traffic and load for your workloads in Google Cloud](/architecture/infra-reliability-guide/traffic-load)
- [Manage and monitor your Google Cloud infrastructure](/architecture/infra-reliability-guide/manage-and-monitor)

Contributors
------------

Authors:

- [Nir Tarcic](https://www.linkedin.com/in/nirtarcic) \| Cloud Lifecycle SRE UTL
- [Kumar Dhanagopal](https://www.linkedin.com/in/kumardhanagopal) \| Cross-Product Solution Developer

Other contributors:

- [Alok Kumar](https://www.linkedin.com/in/alok-kumar-0a51159) \| Distinguished Engineer
- [Andrew Fikes](https://www.linkedin.com/in/andrew-fikes) \| Engineering Fellow, Reliability
- [Chris Heiser](https://www.linkedin.com/in/christopher-heiser) \| SRE TL
- [David Ferguson](https://www.linkedin.com/in/davidsferguson) \| Director, Site Reliability Engineering
- [Joe Tan](https://www.linkedin.com/in/joe-tan-378a55a8) \| Senior Product Counsel
- [Krzysztof Duleba](https://www.linkedin.com/in/kduleba) \| Principal Engineer
- [Narayan Desai](https://www.linkedin.com/in/nldesai) \| Principal SRE
- [Sailesh Krishnamurthy](https://www.linkedin.com/in/saileshkrishnamurthy) \| VP, Engineering
- [Steve McGhee](https://www.linkedin.com/in/stevemcghee) \| Reliability Advocate
- [Sudhanshu Jain](https://www.linkedin.com/in/sudhanshujain) \| Product Manager
- [Yaniv Aknin](https://www.linkedin.com/in/yanivaknin) \| Software Engineer
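
The availability arithmetic used in the overview of reliability (a 99.99% target allows at most 8.64 seconds of downtime in a 24-hour period, or at most ten failed requests per 100,000) can be reproduced with a short calculation. The following Python sketch is only an illustration of that arithmetic. The function names `error_budget_fraction`, `max_downtime_seconds`, and `max_failed_requests` are hypothetical and are not part of any Google Cloud API. Targets are passed as strings so that `Decimal` can represent them exactly.

```python
from decimal import Decimal


def error_budget_fraction(target_pct: str) -> Decimal:
    """Return the fraction of time (or requests) that is allowed to fail.

    target_pct is the availability target as a percentage string, e.g. "99.99".
    """
    target = Decimal(target_pct)
    if not (Decimal(0) <= target <= Decimal(100)):
        raise ValueError("availability target must be between 0 and 100")
    return (Decimal(100) - target) / Decimal(100)


def max_downtime_seconds(target_pct: str, period_seconds: int) -> Decimal:
    """Time-based availability: maximum downtime allowed within the period."""
    return error_budget_fraction(target_pct) * period_seconds


def max_failed_requests(target_pct: str, total_requests: int) -> int:
    """Request-based availability: maximum failed requests, rounded down."""
    return int(error_budget_fraction(target_pct) * total_requests)


if __name__ == "__main__":
    seconds_per_day = 24 * 60 * 60
    # Time-based view of a 99.99% ("4 nines") target over 24 hours.
    print(f"{max_downtime_seconds('99.99', seconds_per_day):.2f} s")  # 8.64 s
    # Request-based view of the same target over 100,000 requests.
    print(max_failed_requests("99.99", 100_000))  # 10
```

Rounding the failed-request count down is a deliberate, conservative choice in this sketch: allowing a fractional extra failure would exceed the target.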