Migrate across Google Cloud regions: Design resilient single-region environments on Google Cloud

Last reviewed 2023-12-08 UTC

This document helps you design resilient, single-region environments on Google Cloud. This document is useful if you're planning to migrate a single-region environment or if you're evaluating the opportunity to do so in the future and want to explore what it might look like.

This document is part of a series:

This document aims to provide guidance about how to design resilient, single-region environments on Google Cloud, and it focuses on the following architectural components:

The guidance in this document assumes that you're designing and implementing single-region environments. If you use a single-region environment now, in the future you can migrate to a multi-region environment. If you're considering a future migration and evolution of your zonal and single-region environments to multi-region environments, see Migrate across Google Cloud regions: Get started.

Properties of different deployment archetypes

Google Cloud provides services from different regions around the world. Each region is a physically independent geographic area that consists of deployment areas called zones. For more information about Google Cloud regions and zones, see Geography and locations.

When you design your Google Cloud environment, you can choose between the following deployment archetypes, presented in order of increasing reliability and operational overhead:

  • Zonal archetype: You provision Google Cloud resources in a single zone within a region, and you use zonal services where they're available. If zonal services aren't available, you use regional services.
  • Single-region archetype: You provision Google Cloud resources in multiple zones within a region, and you use regional services when possible.
  • Multi-region archetype: You provision Google Cloud resources in multiple zones across different regions. Zonal resources are provisioned in one or more zones in each region.

The preceding deployment archetypes have different reliability properties, and you can use them to provide the reliability guarantees that your environment needs. For example, a multi-region environment is more likely to survive a regional outage compared to a single-region or zonal environment. For more information about the reliability properties of each architectural archetype, see How to leverage zones and regions to achieve reliability and the Google Cloud infrastructure reliability guide.

Designing, implementing, and operating an environment based on these deployment archetypes requires different levels of effort due to the cost and complexity properties of each archetype. For example, a zonal environment might be cheaper and easier to design, implement, and operate compared to a single-region or a multi-region environment. The potentially lower effort and cost of the zonal environment comes from avoiding the additional overhead that you have to manage to coordinate workloads, data, and processes that reside in different zones or regions.

The following table summarizes the resource distribution, the reliability properties, and the complexity of each architectural archetype. It also describes the effort that's required to design and implement an environment based on each.

Architectural archetype name | Resource distribution | Helps to resist | Design complexity
Zonal environment | In a single zone | Resource failures | Requires coordination inside a single zone
Single-region environment | Across multiple zones, in a single region | Resource failures, zonal outages | Requires coordination across multiple zones, in a single region
Multi-region environment | Across multiple zones, across multiple regions | Resource failures, zonal outages, regional outages, multi-region outages | Requires coordination across multiple zones, across multiple regions
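
The table can also be read as a lookup: given the failures that you need to resist, pick the least complex archetype that covers them. The following Python sketch encodes the rows of the table as data. The names `ARCHETYPES` and `simplest_archetype` are hypothetical and aren't part of any Google Cloud API; treat this as an illustration of the reasoning, not as a selection tool.

```python
# Illustrative only: encodes the preceding table as data. The names used
# here are hypothetical and are not part of any Google Cloud API.

# Archetypes in order of increasing reliability and operational overhead.
ARCHETYPES = [
    ("zonal", {"resource failures"}),
    ("single-region", {"resource failures", "zonal outages"}),
    (
        "multi-region",
        {
            "resource failures",
            "zonal outages",
            "regional outages",
            "multi-region outages",
        },
    ),
]


def simplest_archetype(failures_to_resist):
    """Return the first (least complex) archetype that covers every failure."""
    required = set(failures_to_resist)
    for name, resisted in ARCHETYPES:
        if required <= resisted:
            return name
    raise ValueError(f"No archetype covers: {sorted(required)}")


if __name__ == "__main__":
    print(simplest_archetype(["resource failures", "zonal outages"]))
```

Because the list is ordered by increasing overhead, the first match is also the cheapest archetype to operate among those that satisfy the requirement.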

Choose deployment archetypes for your environments

To choose the architectural archetype that best fits your needs, do the following:

  1. Define the failure models that you want to guard against.
  2. Evaluate the deployment archetypes to determine what will best fit your needs.

Define failure models

To define failure models, consider the following questions:

  • Which components of your environment need failure models? Failure models can apply to anything that you provision or deploy on Google Cloud. A failure model can apply to an individual resource, or you can apply a failure model to all resources in an entire zone or region. We recommend that you apply a failure model to anything that provides you value, such as workloads, data, processes, and any Google Cloud resource.
  • What are your high availability, business continuity, and disaster recovery requirements for these components? Each component of your environment might have its own service level objectives (SLOs) that define the acceptable service levels for that component, and its own disaster recovery requirements. For example, the Compute Engine SLA indicates that if you need to achieve more than 99.5% of monthly uptime, you need to provision instances in multiple zones across a single region. For more information, see the Disaster recovery planning guide.
  • How many failure models do you need to define? In a typical environment, not all components have to provide the same reliability guarantees. If you offer guarantees for higher uptime and stronger resilience, you usually have to expend more effort and resources. When you define your failure models, we recommend that you consider an approach where you define multiple failure models, each matched to the reliability that a component or group of components needs, and not just one failure model for all your components. For example, business-critical workloads usually need to offer higher reliability, although it might be acceptable to offer lesser reliability guarantees for other, less critical workloads.
  • How many resources do the failure models need in order to guard against failures? To guard against the failure models that you defined, you expend resources such as the time and cost required for people to design, provision, and configure protection mechanisms and automated processes. We recommend that you assess how many resources you need to guard against each failure model that you define.
  • How will you detect that a failure is happening? Being able to detect that a failure is happening or is about to happen is critical so that you can start mitigation, recovery, and reconciliation processes. For example, you can configure Google Cloud Observability to alert you about degraded performance.
  • How can you test the failure models that you're defining? When you define failure models, we recommend that you think about how to continuously test each model to verify that it effectively guards against the failures that it targets. For example, you can inject faults in your environments, or you can adopt chaos engineering to assess the ability of your environments to tolerate failures.
  • How much impact do you expect if a failure covered by a particular failure model occurs? To gain an understanding of the impact that a failure might have on your business, we recommend that, for each failure model, you estimate the consequences of each failure that the model is designed to guard against. This understanding is useful in establishing priorities and recovery orders so that you and your processes deal with the most critical components first.
  • How long do you expect the failures to last in the failure models that you're defining? The duration of a failure can greatly affect mitigation and recovery plans. Therefore, when you define failure models, we recommend that you account for how long a failure can last. Also consider how much time it takes to identify a failure, reconcile the failure, and restore the resources that failed.

For more considerations about failure models and how to design a reliable disaster recovery plan, see Architecting disaster recovery for cloud infrastructure outages.
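
To make the uptime and failure-duration questions concrete, the following Python sketch converts a monthly uptime target into a downtime budget and compares it with the total duration of a single failure (identification, reconciliation, and restoration). The 30-day month and the example durations are assumptions chosen for illustration, and the function names are hypothetical.

```python
# Illustrative arithmetic only. The 30-day month and the durations in the
# example are assumptions, not values taken from any SLA.


def allowed_downtime_minutes(slo_percent, days_in_month=30):
    """Minutes of downtime per month that an uptime target permits."""
    total_minutes = days_in_month * 24 * 60
    return total_minutes * (100 - slo_percent) / 100


def failure_duration_minutes(identify, reconcile, restore):
    """Total time a failure lasts: identification + reconciliation + restore."""
    return identify + reconcile + restore


if __name__ == "__main__":
    budget = allowed_downtime_minutes(99.5)
    outage = failure_duration_minutes(identify=10, reconcile=45, restore=60)
    print(f"budget={budget:.0f} min, outage={outage} min, fits={outage <= budget}")
```

A comparison like this can help you decide whether a failure model needs faster detection or automated recovery to stay within the target.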

Evaluate deployment archetypes

After you define the failure models that you want to guard against, you evaluate the deployment archetypes to determine what will best fit your needs. When you evaluate the deployment archetypes, consider the following questions:

  • How many deployment archetypes do you need? You don't have to choose just one architectural archetype to fit all your environments. Instead, you can implement a hybrid approach where you pick multiple deployment archetypes according to the reliability guarantees that you need in order to guard against the failure models you defined. For example, if you defined two failure models—one that requires a zonal environment, and one that requires a regional environment—you might want to choose separate deployment archetypes to guard against each failure model. If you choose multiple deployment archetypes, we recommend that you evaluate the potentially increasing complexity of designing, implementing, and operating multiple environments.
  • How many resources do you need to design and implement environments based on the deployment archetypes? Designing and implementing any kind of environment requires resources and effort. We recommend that you assess how many resources you think that you'll need in order to design and implement each environment based on the archetype you choose. When you have a complete understanding of how many resources you need, you can balance the trade-offs between the reliability guarantees that each architectural archetype offers, and the cost and the complexity of designing, implementing, and operating environments based on those archetypes.
  • Do you expect to migrate an environment based on one architectural archetype to an environment based on a different archetype? In the future, you might migrate workloads, data, and processes from one Google Cloud environment to a different Google Cloud environment. For example, you might migrate from a zonal environment to a regional environment.
  • How business-critical are the environments that you're designing and implementing? Business-critical environments likely need more reliability guarantees. For example, you might choose to design and implement a multi-region environment for business-critical workloads, data, and processes, and design a zonal or regional environment for less critical workloads, data, and processes.
  • Do you need the features that are offered by particular architectural archetypes for certain environments? Aside from the reliability guarantees that each architectural archetype offers, the archetypes also offer different scalability, geographical proximity, latency, and data locality guarantees. We recommend that you consider those guarantees when you choose the deployment archetypes for your environments.

Along with the technical aspects of the failure models that you defined by following the preceding guidance, we recommend that you consider any non-functional requirements such as regulatory, locality, and sovereignty requirements. Those requirements can restrict the options that are available to you. For example, if you need to meet regulatory requirements that mandate the usage of a specific region, then you have to design and implement either a single-region environment, or a zonal environment in that region.

Choose a Google Cloud region for your environment

When you start designing your single-region environments, you have to determine the region that best fits the requirements of each environment. The following sections describe these two categories of selection criteria:

  • Functional criteria. These criteria are about which Google Cloud products a particular region offers, and whether a particular region meets your requirements for latency and geographical proximity to users and other environments outside Google Cloud. For example, if your workloads and data have latency requirements for your users or other environments outside Google Cloud, you might need to choose the region that's closest to your users or other environments to minimize that latency.
  • Non-functional criteria. These criteria are about the product prices that are associated with specific regions, carbon footprint requirements, and mandatory requirements and regulations that are in place for your business. For example, highly regulated markets such as banking and public sector have very stringent and specific requirements about data and workload locality, and how they share the cloud provider infrastructure with other customers.

If you choose a particular Google Cloud region now, in the future you can migrate to different regions or to a multi-region environment. If you're considering a future migration to other regions, see Migrate across Google Cloud regions: Get started.

Evaluate functional criteria

To evaluate functional criteria, consider the following questions:

  • What are your geographical proximity requirements? When you choose a Google Cloud region, you might need to place your workloads, data, and processes near your users or your environments outside Google Cloud, such as your on-premises environments. For example, if you're targeting a user base that's concentrated in a particular geographic area, we recommend that you choose a Google Cloud region that's closest to that geographic area. Choosing a Google Cloud region that best fits your geographical proximity requirements lets your environments guarantee lower latency and lower reaction times to requests from your users and from your environments outside Google Cloud. Tools like the Google Cloud latency dashboard, and unofficial tools such as GCPing and the Google Cloud Region Picker can give you a high-level idea of the latency characteristics of Google Cloud regions. However, we recommend that you perform a comprehensive assessment to evaluate if the latency properties fit your requirements, workloads, data, and processes.
  • Which of the regions that you want to use offer the products that you need? We recommend that you assess the products that are available in each Google Cloud region, and which regions provide the services that you need to design and implement your environments. For more information about which products are available in each region and their availability timelines, see Cloud locations. Additionally, some products might not offer all their features in every region where they're available. For example, Compute Engine offers specific machine types only in specific regions and zones. For more information about what features each product offers in each region, see the product documentation.
  • Are the resources that you need in each Google Cloud region within the per-region quota limits? Google Cloud uses quotas to restrict how much of a shared Google Cloud resource you can use. Some quotas are global and apply to your usage of the resource anywhere in Google Cloud, while others are regional or zonal and apply to your usage of the resource in a specific Google Cloud region. For example, most Compute Engine resource usage quotas, such as the number of virtual machines that you can create, are regional. For more information about quotas and how to increase them, see Working with quotas.
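
As a rough way to reason about the quota question, the following Python sketch compares a planned resource footprint against regional limits and current usage. The quota names and numbers are hypothetical placeholders, and `quota_shortfalls` is an illustrative helper, not a Google Cloud API. Real limits and usage come from your own project.

```python
# Illustrative only: quota names and numbers are hypothetical placeholders.
# Real limits and usage come from your own project.


def quota_shortfalls(planned, limits, current_usage=None):
    """Return {quota_name: amount_over_limit} for each quota the plan exceeds."""
    current_usage = current_usage or {}
    shortfalls = {}
    for name, needed in planned.items():
        # Headroom is the regional limit minus what is already consumed.
        available = limits.get(name, 0) - current_usage.get(name, 0)
        if needed > available:
            shortfalls[name] = needed - available
    return shortfalls


if __name__ == "__main__":
    planned = {"cpus": 96, "disk_gb": 500}
    limits = {"cpus": 72, "disk_gb": 4096}
    usage = {"cpus": 8}
    print(quota_shortfalls(planned, limits, usage))
```

An empty result means that the plan fits within the limits that you supplied. Any entry in the result is the amount by which you would need to raise that quota.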

Evaluate non-functional criteria

To evaluate non-functional criteria, consider the following questions:

  • Do you prefer a low carbon footprint? Google Cloud continuously invests in sustainability and in carbon-free energy for Google Cloud regions, and it's committed to carbon-free energy for all cloud regions. Google Cloud regions have different carbon footprints. For information about the carbon footprint of each Google Cloud region, and how to incorporate carbon-free energy in your location strategy, see Carbon free energy for Google Cloud regions.
  • Do your environments need to meet particular regulations? Governments and national and supranational entities often strictly regulate certain markets and business areas, such as banking and public sector. These regulations might mandate that workloads, data, and processes reside only in certain geographic regions. For example, your environments might need to comply with data, operational, and software sovereignty requirements to guarantee certain levels of control and transparency for sensitive data and workloads running in the cloud. We recommend that you assess your current and upcoming regulatory requirements when choosing the Google Cloud regions for your environments, and select the Google Cloud regions that best fit your regulatory requirements.
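
One way to combine the functional and non-functional criteria is to filter candidate regions by the hard constraints (regulatory allow-list, required products, latency ceiling) and then rank what remains by a preference such as carbon-free energy. The following Python sketch shows that approach. The region names, the `RegionProfile` fields, and all values are hypothetical; you would supply your own measurements and requirements.

```python
# Illustrative only: region names, fields, and values are hypothetical.
from dataclasses import dataclass, field


@dataclass
class RegionProfile:
    name: str
    latency_ms: float  # latency that you measured to your users
    products: set = field(default_factory=set)  # products you confirmed there
    cfe_percent: float = 0.0  # carbon-free energy share, as you recorded it


def shortlist(regions, allowed_names, required_products, max_latency_ms):
    """Keep regions that meet every hard constraint, ranked by cfe_percent."""
    eligible = [
        r
        for r in regions
        if r.name in allowed_names
        and set(required_products) <= r.products
        and r.latency_ms <= max_latency_ms
    ]
    return sorted(eligible, key=lambda r: r.cfe_percent, reverse=True)


if __name__ == "__main__":
    candidates = [
        RegionProfile("region-a", 20.0, {"compute", "sql"}, 60.0),
        RegionProfile("region-b", 35.0, {"compute", "sql"}, 90.0),
        RegionProfile("region-c", 15.0, {"compute"}, 95.0),
    ]
    picks = shortlist(candidates, {"region-a", "region-b"}, {"compute", "sql"}, 50)
    print([r.name for r in picks])
```

Treating regulations and product availability as filters, and carbon footprint as a ranking, is a design choice of this sketch. Your own requirements might make carbon footprint a hard constraint instead.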

Design and build your single-region environments

To design a single-region environment, do the following:

  1. Build your foundation on Google Cloud.
  2. Provision and configure computing resources.
  3. Provision and configure data storage resources.
  4. Provision and configure data analytics resources.

When you design your environment, consider the following general design principles:

  • Provision regional resources. Many Google Cloud products support provisioning resources in multiple zones across a region. We recommend that you provision regional resources instead of zonal resources when possible. Theoretically, you might be able to provision zonal resources in multiple zones across a region and manage them yourself to achieve a higher reliability. However, that configuration wouldn't fully benefit from all the reliability features of the Google infrastructure that underpins Google Cloud services.
  • Verify that the environments work as expected with the failure model assumptions. When you design and implement your single-region environments, we recommend that you verify whether those environments meet the requirements to guard against the failure models that you're considering, before you promote those environments as part of your production environment. For example, you can simulate zonal outages to verify that your single-region environments can survive with minimal disruption.
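
Before you simulate a zonal outage, a quick capacity check can tell you whether the environment is even sized to survive one. The following Python sketch assumes that capacity is measured as a count of interchangeable virtual machines per zone. The zone names and numbers are placeholders, and `survives_zone_loss` is a hypothetical helper. It doesn't replace an actual outage simulation.

```python
# Illustrative only: assumes capacity is a count of interchangeable VMs
# per zone. Zone names and numbers are placeholders.


def survives_zone_loss(vms_per_zone, required_vms):
    """True if the required capacity remains after losing any single zone."""
    if not vms_per_zone:
        return required_vms <= 0
    total = sum(vms_per_zone.values())
    # The worst case is losing the zone that holds the most capacity.
    worst_case_remaining = total - max(vms_per_zone.values())
    return worst_case_remaining >= required_vms


if __name__ == "__main__":
    layout = {"zone-a": 4, "zone-b": 4, "zone-c": 4}
    print(survives_zone_loss(layout, required_vms=8))
```

If the check fails, you can either provision more capacity per zone or plan to obtain capacity in the remaining zones during an outage.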

For more general design principles for designing reliable single- and multi-region environments and for information about how Google achieves better reliability with regional and multi-region services, see Architecting disaster recovery for cloud infrastructure outages: Common themes.

Build your foundation on Google Cloud

To build the foundation of your single-region environments, see Migration to Google Cloud: Build your foundation. The guidance in that document is aimed at building a foundation for migrating workloads, data, and processes to Google Cloud, but it's also applicable to build the foundation for your single-region environments. After you read that document, continue to read this document.

After you build your foundation on Google Cloud, you design and implement security controls and boundaries. Those security measures help to ensure that your workloads, data, and processes stay inside their respective regions. The security measures also help to ensure that your resources don't leak anything to other regions due to bugs, misconfigurations, or malicious attacks.

Provision and configure computing resources

After you build the foundation of your single-region environments, you provision and configure computing resources. The following sections describe the Google Cloud computing products that support regional deployments.

Compute Engine

Compute Engine is Google Cloud's infrastructure as a service (IaaS). It uses Google's worldwide infrastructure to offer virtual machines and related services to customers.

Compute Engine resources are either zonal, such as virtual machines or zonal Persistent Disk; regional, such as static external IP addresses; or global, such as Persistent Disk snapshots. For more information about the zonal, regional, and global resources that Compute Engine supports, see Global, regional, and zonal resources.

To allow for better flexibility and resource management of physical resources, Compute Engine decouples zones from their physical resources. For more information about this abstraction and what it might imply for you, see Zones and clusters.

To increase the reliability of your environments that use Compute Engine, consider the following:

  • Regional managed instance groups (MIGs). Compute Engine virtual machines are zonal resources, so they will be unavailable in the event of a zonal outage. To mitigate this issue, Compute Engine lets you create regional MIGs that provision virtual machines across multiple zones in a region automatically, according to demand and regional availability. If your workloads are stateful, you can also create regional stateful MIGs to preserve stateful data and configurations. Regional MIGs support simulating zonal failures. For information about simulating a zonal failure when using a regional MIG, see Simulate a zone outage for a regional MIG. For information about how regional MIGs compare to other deployment options, see Choose a Compute Engine deployment strategy for your workload.
  • Target distribution shape. Regional MIGs distribute virtual machines according to the target distribution shape. To ensure that virtual machine distribution doesn't differ by more than one unit between any two zones in a region, we recommend that you choose the EVEN distribution shape when you create regional MIGs. For information about the differences between target distribution shapes, see Comparison of shapes.
  • Instance templates. To define the virtual machines to provision, MIGs use a global resource type called instance templates. Although instance templates are global resources, they might reference zonal or regional resources. When you create instance templates, we recommend that you reference regional resources over zonal resources when possible. If you use zonal resources, we recommend that you assess the impact of using them. For example, if you create an instance template that references a Persistent Disk volume that's available only in a given zone, you can't use that template in any other zones because the Persistent Disk volume isn't available in those other zones.
  • Configure load balancing and scaling. Compute Engine supports load balancing traffic between Compute Engine instances, and it supports autoscaling to automatically add or remove virtual machines from MIGs, according to demand. To increase the reliability and the flexibility of your environments, and to avoid the management burden of self-managed solutions, we recommend that you configure load balancing and autoscaling. For more information about configuring load balancing and scaling for Compute Engine, see Load balancing and scaling.
  • Configure resource reservations. To ensure that your environments have the necessary resources when you need them, we recommend that you configure resource reservations to provide assurance in obtaining capacity for zonal Compute Engine resources. For example, if there is a zonal outage, you might need to provision virtual machines in another zone to supply the necessary capacity to make up for the ones that are unavailable because of the outage. Resource reservations ensure that you have the resources available to provision the additional virtual machines.
  • Use zonal DNS names. To mitigate the risk of cross-regional