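After you deploy an application to production in Google Cloud, you might need to modify the infrastructure that it uses. For example, you might need to change the machine types of your VMs or change the storage class of your Cloud Storage buckets. This part of the [Google Cloud infrastructure reliability guide](/architecture/infra-reliability-guide) summarizes change-management guidelines that you can follow to reduce the reliability risk to your infrastructure resources. It also describes how you can monitor the availability of Google Cloud infrastructure.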
Deploy infrastructure changes progressively
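When you need to change your Google Cloud infrastructure, deploy the changes to production progressively wherever possible. For example, if you need to change the machine types of your VMs, deploy the change to a few VMs in one zone first, and monitor its effects. If you observe any issues, quickly revert the infrastructure to the previous stable state. Diagnose and resolve the issues, and then restart the progressive deployment process. After you verify that your workload runs as expected, gradually deploy the change across the rest of your infrastructure.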
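The progressive deployment loop can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a Google Cloud API: `plan_batches`, `progressive_rollout`, and the `apply_change`, `revert_change`, and `is_healthy` callbacks are hypothetical names that you would wire to your own tooling and monitoring.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class RolloutResult:
    succeeded: bool
    updated: List[str] = field(default_factory=list)      # VMs left on the new configuration
    rolled_back: List[str] = field(default_factory=list)  # VMs reverted after a failed batch


def plan_batches(vms_by_zone: Dict[str, List[str]], canary_size: int = 2) -> List[List[str]]:
    """Plan batches: a small canary in one zone, the rest of that zone, then one batch per other zone."""
    zones = sorted(vms_by_zone)
    if not zones:
        return []
    first_zone_vms = vms_by_zone[zones[0]]
    batches = [first_zone_vms[:canary_size], first_zone_vms[canary_size:]]
    batches += [list(vms_by_zone[zone]) for zone in zones[1:]]
    return [batch for batch in batches if batch]  # drop empty batches


def progressive_rollout(
    batches: List[List[str]],
    apply_change: Callable[[str], None],   # hypothetical: applies the change to one VM
    revert_change: Callable[[str], None],  # hypothetical: restores the previous stable state
    is_healthy: Callable[[str], bool],     # hypothetical: reads your monitoring signal
) -> RolloutResult:
    """Apply the change batch by batch; on any unhealthy VM, revert everything changed so far."""
    updated: List[str] = []
    for batch in batches:
        for vm in batch:
            apply_change(vm)
            updated.append(vm)
        if not all(is_healthy(vm) for vm in batch):
            reverted = list(reversed(updated))  # revert the most recent changes first
            for vm in reverted:
                revert_change(vm)
            return RolloutResult(succeeded=False, rolled_back=reverted)
    return RolloutResult(succeeded=True, updated=updated)
```

The sketch stops at the first unhealthy batch and reverts every VM it has touched, so later zones are never changed. A real rollout would typically also wait for a soak period between batches before checking health.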
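Monitor availability of Google Cloud infrastructure
You can monitor the current status of Google Cloud services across all regions by using the [Google Cloud Service Health Dashboard](https://status.cloud.google.com/). You can also view a [history](https://status.cloud.google.com/summary) of the infrastructure failures (called incidents) for each service. The history page provides the details of each incident, such as the incident duration, affected zones and regions, affected services, and any recommended workarounds.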
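If you copy incident details from the history page into your own records, a small helper can show which incidents touched the services and regions that your workload depends on. The following is a minimal sketch: the `Incident` record and its fields are hypothetical and chosen to mirror the details listed above (duration, affected locations, affected service, workaround). They are not the dashboard's actual data schema.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass
class Incident:
    # Hypothetical record shape, not the dashboard's real schema.
    service: str
    begin: datetime
    end: Optional[datetime]  # None while the incident is still open
    locations: List[str]     # affected zones or regions
    workaround: str = ""

    def duration(self, now: datetime) -> timedelta:
        """Duration so far: open incidents are measured up to `now`."""
        return (self.end or now) - self.begin


def incidents_affecting(
    incidents: Iterable[Incident], services: Iterable[str], locations: Iterable[str]
) -> List[Incident]:
    """Keep incidents that hit one of our services in one of our locations."""
    wanted_services = set(services)
    wanted_locations = set(locations)
    return [
        incident
        for incident in incidents
        if incident.service in wanted_services
        and wanted_locations.intersection(incident.locations)
    ]


def total_incident_time(incidents: Iterable[Incident], now: datetime) -> timedelta:
    """Sum of incident durations; overlapping incidents are counted separately."""
    return sum((incident.duration(now) for incident in incidents), timedelta())
```

Filtering by both service and location keeps the summary focused on outages that could actually have affected your deployment, rather than on every incident in the history.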
You can also view incidents relevant to your project by using [Personalized Service Health](https://console.cloud.google.com/servicehealth/incidents). Service Health also lets you request incident information through an API on a per-project or per-organization basis, and lets you configure alerts.
Google provides regular updates about the status of each incident, including an estimated time for the next update. You can programmatically get status updates for incidents by using an RSS feed. For more information, see [Incidents and the Google Cloud Service Health Dashboard](/support/docs/dashboard).
Note: Even when there's no infrastructure outage, your application might be unavailable due to errors in the application or configuration issues. For example, a software update might have caused the app servers to crash, or an administrator might have inadvertently deleted the load balancer forwarding rules. For help with troubleshooting issues with specific Google Cloud resources, see the documentation for the appropriate service.
Control changes to global resources
When you modify global resources such as VPC networks and global load balancers, take extra care to verify the changes before deploying them to production.
Because global resources are resilient to zone and [region](/docs/geography-and-regions#regions_and_zones) outages, you might decide to use single instances of certain global resources in your architecture. In such deployments, the global resources can become single points of failure. For example, if you inadvertently misconfigure a forwarding rule of your global load balancer, the frontend can stop receiving or processing user requests. In that case, the application is effectively unavailable to users even though the backend is intact. To avoid such situations, exercise rigorous control over changes to global resources. For example, in your change-review process, you can classify any modifications to global resources as high-risk changes that additional reviewers must verify and approve.
Last updated (UTC): 2024-11-20.