Multi-regional deployment on Compute Engine

Last reviewed 2024-02-20 UTC

This document provides a reference architecture for a multi-tier application that runs on Compute Engine VMs in multiple regions in Google Cloud. The document also provides guidance to help you build an architecture that uses other Google Cloud infrastructure services. It describes the design factors that you should consider when you build a multi-regional architecture for your cloud applications. The intended audience for this document is cloud architects.


Figure 1 shows an architecture for an application that runs in active-active mode in isolated stacks that are deployed across two Google Cloud regions. In each region, the application runs independently in three zones. The architecture is aligned with the multi-regional deployment archetype, which ensures that your Google Cloud topology is robust against zone and region outages and that it provides low latency for application users.


Figure 1. A global load balancer routes user requests to regionally isolated application stacks.

The architecture is based on the infrastructure as a service (IaaS) cloud model. You provision the required infrastructure resources (compute, networking, and storage) in Google Cloud, and you retain full control over and responsibility for the operating system, middleware, and higher layers of the application stack. To learn more about IaaS and other cloud models, see PaaS vs. IaaS vs. SaaS vs. CaaS: How are they different?

The preceding diagram includes the following components:

Component Purpose
Global external load balancer

The global external load balancer receives and distributes user requests to the application. The global external load balancer advertises a single anycast IP address, but it's implemented as a large number of proxies on Google Front Ends (GFEs). Client requests are directed to the GFE that's closest to the client.

Depending on your requirements, you can use a global external Application Load Balancer or a global external proxy Network Load Balancer. For more information, see Choose a load balancer.

Regional managed instance groups (MIGs) for the web tier

The web tier of the application is deployed on Compute Engine VMs that are part of regional MIGs. These MIGs are the backends for the global load balancer.

Each MIG contains Compute Engine VMs in three different zones. Each of these VMs hosts an independent instance of the web tier of the application.

Regional internal load balancers

The internal load balancer in each region distributes traffic from the web tier VMs to the application tier VMs in that region.

Depending on your requirements, you can use a regional internal Application Load Balancer or Network Load Balancer. For more information, see Choose a load balancer.

Regional MIGs for the application tier

The application tier is deployed on Compute Engine VMs that are part of regional MIGs. The MIG in each region is the backend for the internal load balancer in that region.

Each MIG contains Compute Engine VMs in three different zones. Each VM hosts an independent instance of the application tier.

Third-party database deployed on Compute Engine VMs

A third-party database (such as PostgreSQL) is deployed on Compute Engine VMs in the two regions. You can set up cross-region replication for the databases and configure the database in each region to fail over to the database in the other region. The replication and failover capabilities depend on the database that you use.

Installing and managing a third-party database involves additional effort and operational cost for replication, applying updates, monitoring, and ensuring availability. You can avoid the overhead of installing and managing a third-party database and take advantage of built-in high availability (HA) features by using a fully managed database like a multi-region Spanner instance.

Virtual Private Cloud network and subnets

All the Google Cloud resources in the architecture use a single VPC network that has subnets in two different regions.

Cloud Storage dual-region buckets

Backups of application data are stored in dual-region Cloud Storage buckets. Alternatively, you can use Backup and DR Service to create, store, and manage the database backups.

Use cases

This section describes use cases for which a multi-regional deployment on Compute Engine is an appropriate choice.

Efficient migration of on-premises applications

You can use this reference architecture to build a Google Cloud topology to rehost (lift and shift) on-premises applications to the cloud with minimal changes to the applications. All the tiers of the application in this reference architecture are hosted on Compute Engine VMs. This approach lets you migrate on-premises applications efficiently to the cloud and take advantage of the cost benefits, reliability, performance, and operational simplicity that Google Cloud provides.

High availability for geo-dispersed users

We recommend a multi-regional deployment for applications that are business-critical and where high availability and robustness against region outages are essential. If a region becomes unavailable for any reason (even a large-scale disruption caused by a natural disaster), users of the application don't experience any downtime. Traffic is routed to the application in the other available regions. If data is replicated synchronously, the recovery time objective (RTO) is near zero.

Low latency for application users

If your users are within a specific geographical area, such as a continent, you can use a multi-regional deployment to achieve an optimal balance between availability and performance. When one of the regions has an outage, the global load balancer sends requests that originate in that region to another region. Users don't perceive significant performance impact because the regions are within a geographical area.
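The failover behavior described above can be illustrated with a minimal sketch. It does not show how the load balancer is implemented. The region names, the latency figures, and the pick_region helper are hypothetical and only illustrate the idea of sending a request to the lowest-latency region that is still healthy.

```python
from typing import Dict, Optional, Set


def pick_region(latency_ms: Dict[str, float], healthy: Set[str]) -> Optional[str]:
    """Return the healthy region with the lowest latency, or None if none is healthy."""
    candidates = {region: ms for region, ms in latency_ms.items() if region in healthy}
    if not candidates:
        return None
    return min(candidates, key=candidates.get)


# Hypothetical latencies from one client to two regions on the same continent.
client_latency = {"europe-west1": 12.0, "europe-west4": 19.0}

# Both regions are healthy: the closer region is chosen.
normal = pick_region(client_latency, healthy={"europe-west1", "europe-west4"})

# The closer region has an outage: the request goes to the other region.
during_outage = pick_region(client_latency, healthy={"europe-west4"})
```

Because both regions in this example are in the same geographical area, the latency difference between the normal case and the outage case stays small.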

Design alternative

An architecture that uses a global load balancer (figure 1) supports certain features that help you to enhance the reliability of your deployments, such as edge caching using Cloud CDN. This section presents an alternative architecture that uses regional load balancers and Cloud DNS, as shown in figure 2. This alternative architecture supports the following additional features:

  • Transport Layer Security (TLS) termination in specified regions.
  • Ability to serve content from the region that you specify. However, that region might not be the best performing region at a given time.
  • A wider range of connection protocols if you use a Passthrough Network Load Balancer.

For more information about the differences between regional and global load balancers, see the following documentation:


Figure 2. Cloud DNS routes user requests to regional load balancers.

Like the architecture in figure 1, the architecture in figure 2 is robust against zone and region outages. A Cloud DNS public zone routes user requests to the appropriate region. Regional external load balancers receive user requests and distribute them across the web tier instances of the application within each region. The other components in this architecture are identical to the components in the global load balancer-based architecture.

For more information about building a multi-regional architecture that uses multiple regional load balancers and Cloud DNS, see Global load-balancing architectures using DNS routing policies.

Design considerations

This section provides guidance to help you use this reference architecture to develop an architecture that meets your specific requirements for system design, security and compliance, reliability, operational efficiency, cost, and performance.

System design

This section provides guidance to help you to choose Google Cloud regions for your multi-regional deployment and to select appropriate Google Cloud services.

Region selection

When you choose the Google Cloud regions where your applications must be deployed, consider the following factors and requirements:

  • Availability of Google Cloud services in each region. For more information, see Products available by location.
  • Availability of Compute Engine machine types in each region. For more information, see Regions and zones.
  • End-user latency requirements
  • Cost of Google Cloud resources
  • Cross-regional data transfer costs
  • Regulatory requirements

Some of these factors and requirements might involve trade-offs. For example, the most cost-efficient region might not have the lowest carbon footprint. For more information, see Select geographic zones and regions in the Google Cloud Architecture Framework.

Compute services

The reference architecture in this document uses Compute Engine VMs for all the tiers of the application. Depending on the requirements of your application, you can choose from other Google Cloud compute services:

  • You can run containerized applications in Google Kubernetes Engine (GKE) clusters. GKE is a container-orchestration engine that automates deploying, scaling, and managing containerized applications.
  • If you prefer to focus your IT efforts on your data and applications instead of setting up and operating infrastructure resources, then you can use serverless services like Cloud Run and Cloud Functions.

The decision of whether to use VMs, containers, or serverless services involves a trade-off between configuration flexibility and management effort. VMs and containers provide more configuration flexibility, but you're responsible for managing the resources. In a serverless architecture, you deploy workloads to a preconfigured platform that requires minimal management effort. For more information about choosing appropriate compute services for your workloads in Google Cloud, see Choose and manage compute in the Google Cloud Architecture Framework.

Storage services

The architectures shown in this document use regional Persistent Disk volumes for all the tiers. Regional Persistent Disk volumes provide synchronous replication of data across two zones within a region.

Other storage options for multi-regional deployments include Cloud Storage dual-region or multi-region buckets. Objects stored in a dual-region or multi-region bucket are stored redundantly in at least two separate geographic locations. Metadata is written synchronously across regions, and data is replicated asynchronously. For dual-region buckets, you can use turbo replication, which ensures that objects are replicated across region pairs, with a recovery point objective (RPO) of 15 minutes. For more information, see Data availability and durability.

To store data that's shared across multiple VMs in a region, such as across all the VMs in the web tier or application tier, you can use a Filestore Enterprise instance. The data that you store in a Filestore Enterprise instance is replicated synchronously across three zones within the region. This replication ensures high availability and robustness against zone outages. You can store shared configuration files, common tools and utilities, and centralized logs in the Filestore instance, and mount the instance on multiple VMs.

If your database is Microsoft SQL Server, you can deploy a failover cluster instance (FCI) and use the fully managed Google Cloud NetApp Volumes to provide continuous availability (CA) SMB storage for the database.

When you design storage for your multi-regional workloads, consider the functional characteristics of the workloads, resilience requirements, performance expectations, and cost goals. For more information, see Design an optimal storage strategy for your cloud workload.

Database services

The reference architecture in this document uses a third-party database, like PostgreSQL, that's deployed on Compute Engine VMs. Installing and managing a third-party database involves effort and cost for operations like applying updates, monitoring and ensuring availability, performing backups, and recovering from failures.

You can avoid the effort and cost of installing and managing a third-party database by using a fully managed database service like Cloud SQL, AlloyDB for PostgreSQL, Bigtable, Spanner, or Firestore. These Google Cloud database services provide uptime service-level agreements (SLAs), and they include default capabilities for scalability and observability. If your workloads require an Oracle database, you can use Bare Metal Solution provided by Google Cloud. For an overview of the use cases that each Google Cloud database service is suitable for, see Google Cloud databases.

When you choose and set up the database for a multi-regional deployment, consider your application's requirements for cross-region data consistency, and be aware of the performance and cost trade-offs.

  • If the application requires strong consistency (all users must read the same data at all times), then the data must be replicated synchronously across all regions in the architecture. However, synchronous replication can lead to higher cost and decreased performance, because any data that's written must be replicated in real time across the regions before the data is available for read operations.
  • If your application can tolerate eventual consistency, then you can replicate data asynchronously. This can help improve performance because the data doesn't need to be replicated synchronously across regions. However, users in different regions might read different data because the data might not have been fully replicated at the time of the request.
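The trade-off between the two replication modes can be made concrete with a simplified latency model. This is a sketch under stated assumptions, not a description of any particular database: the function names and all the millisecond values are hypothetical, and real databases add factors such as quorum writes, batching, and network jitter.

```python
from typing import List


def sync_write_latency_ms(local_commit_ms: float, replica_rtts_ms: List[float]) -> float:
    """Synchronous replication: the write is acknowledged only after the
    slowest remote replica has confirmed it."""
    return local_commit_ms + max(replica_rtts_ms, default=0.0)


def async_write_latency_ms(local_commit_ms: float) -> float:
    """Asynchronous replication: the write is acknowledged after the local
    commit, and the replicas catch up later."""
    return local_commit_ms


def async_stale_window_ms(replica_lag_ms: List[float]) -> float:
    """Longest time during which a reader in another region could still see old data."""
    return max(replica_lag_ms, default=0.0)


# Hypothetical values: 2 ms local commit, one remote region 45 ms away.
sync_latency = sync_write_latency_ms(2.0, [45.0])
async_latency = async_write_latency_ms(2.0)
stale_window = async_stale_window_ms([45.0])
```

In this model, synchronous replication pays the cross-region round trip on every write, while asynchronous replication keeps writes fast but leaves a window in which regions can return different data.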

Security and compliance

This section describes factors that you should consider when you use this reference architecture to design and build a multi-regional topology in Google Cloud that meets the security and compliance requirements of your workloads.

Protection against threats

To protect your application against threats like distributed denial of service (DDoS) attacks and cross-site scripting (XSS), you can use Google Cloud Armor security policies. Each policy is a set of rules that specifies certain conditions that should be evaluated and actions to take when the conditions are met. For example, a rule could specify that if the incoming traffic's source IP address matches a specific IP address or CIDR range, then the traffic must be denied. In addition, you can apply preconfigured web application firewall (WAF) rules. For more information, see Security policy overview.
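The rule-matching idea in the example above (deny traffic whose source IP address falls in a given CIDR range) can be sketched as follows. This is a simplified model for illustration only, not the Google Cloud Armor API or its rule language. The Rule class, the evaluate function, and the IP ranges (taken from documentation-reserved address blocks) are assumptions.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    priority: int          # lower number is evaluated first
    src_ranges: List[str]  # CIDR ranges; "*" matches any source
    action: str            # "allow" or "deny"


def evaluate(rules: List[Rule], source_ip: str) -> str:
    """Return the action of the first rule, in priority order, that matches source_ip."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted(rules, key=lambda r: r.priority):
        for cidr in rule.src_ranges:
            if cidr == "*" or ip in ipaddress.ip_network(cidr):
                return rule.action
    # Assumption for this sketch: traffic is allowed if no rule matches.
    return "allow"


# Hypothetical policy: block one range, allow everything else.
policy = [
    Rule(priority=1000, src_ranges=["198.51.100.0/24"], action="deny"),
    Rule(priority=2147483647, src_ranges=["*"], action="allow"),
]

blocked = evaluate(policy, "198.51.100.7")
permitted = evaluate(policy, "203.0.113.9")
```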

External access for VMs

In the reference architecture that this document describes, the VMs that host the application tier, web tier, and databases don't need inbound access from the internet. Don't assign external IP addresses to those VMs. Google Cloud resources that have only a private, internal IP address can still access certain Google APIs and services by using Private Service Connect or Private Google Access. For more information, see Private access options for services.

To enable secure outbound connections from Google Cloud resources that have only private IP addresses, like the Compute Engine VMs in this reference architecture, you can use Cloud NAT.

VM image security

To ensure that your VMs use only approved images (that is, images with software that meets your policy or security requirements), you can define an organization policy that restricts the use of images in specific public image projects. For more information, see Setting up trusted image policies.

Service account privileges

In Google Cloud projects where the Compute Engine API is enabled, a default service account is created automatically. The default service account is granted the Editor IAM role (roles/editor) unless this behavior is disabled. By default, the default service account is attached to all VMs that you create by using the Google Cloud CLI or the Google Cloud console. The Editor role includes a broad range of permissions, so attaching the default service account to VMs creates a security risk. To avoid this risk, you can create and use dedicated service accounts for each application. To specify the resources that the service account can access, use fine-grained policies. For more information, see Limit service account privileges in "Best practices for using service accounts."

Data residency considerations

You can use regional load balancers to build a multi-regional architecture that helps you to meet data residency requirements. For example, a country in Europe might require that all user data be stored and accessed in data centers that are located physically within Europe. To meet this requirement, you can use the regional load balancer-based architecture in figure 2. In that architecture, the application runs in Google Cloud regions in Europe and you use Cloud DNS with a geofenced routing policy to route traffic through regional load balancers. To meet data residency requirements for the database tier, use a sharded architecture instead of replication across regions. With this approach, the data in each region is isolated, but you can't implement cross-region high availability and failover for the database.
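The effect of geofenced routing can be sketched with a small model. This is an illustration under assumptions, not the Cloud DNS API: the resolve function, the region names, and the load balancer IP addresses (from a documentation-reserved range) are hypothetical, and the sketch assumes that a geofenced policy always answers with the endpoint for the client's own geolocation instead of failing over elsewhere.

```python
from typing import Dict, List, Optional, Set


def resolve(nearest_regions: List[str], policy: Dict[str, str],
            healthy: Set[str], geofenced: bool) -> Optional[str]:
    """Pick the regional load balancer IP address for a client.

    nearest_regions lists regions from closest to farthest from the client.
    policy maps a region to the IP address of its regional load balancer.
    """
    if not nearest_regions:
        return None
    home = nearest_regions[0]
    if geofenced:
        # Geofenced: traffic stays in the client's own geolocation,
        # even if the endpoint there is unhealthy.
        return policy.get(home)
    # Not geofenced: fall back to the next-closest healthy region.
    for region in nearest_regions:
        if region in healthy and region in policy:
            return policy[region]
    return policy.get(home)


# Hypothetical policy with two European regions.
policy = {"europe-west3": "203.0.113.10", "europe-west9": "203.0.113.20"}
order = ["europe-west3", "europe-west9"]

fenced = resolve(order, policy, healthy={"europe-west9"}, geofenced=True)
unfenced = resolve(order, policy, healthy={"europe-west9"}, geofenced=False)
```

The sketch shows the trade-off that the paragraph above describes for the database tier as well: keeping traffic or data inside a boundary limits the failover options outside that boundary.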

More security considerations

When you build the architecture for your workload, consider the platform-level security best practices and recommendations provided in the Security foundations blueprint.


Reliability

This section describes design factors that you should consider when you use this reference architecture to build and operate reliable infrastructure for your multi-regional deployments in Google Cloud.

MIG autoscaling

When you run your application on multiple regional MIGs, the application remains available during isolated zone outages or region outages. The autoscaling capability of stateless MIGs lets you maintain application availability and performance at predictable levels. To control the autoscaling behavior of your stateless MIGs, you can specify target utilization metrics, such as average CPU utilization. You can also configure schedule-based autoscaling for stateless MIGs. Stateful MIGs can't be autoscaled. For more information, see Autoscaling groups of instances.
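The idea behind a target utilization metric can be sketched as a proportional calculation: the group is sized so that the measured load, spread over the new number of VMs, lands at the target. This is a minimal sketch, not the Compute Engine autoscaler algorithm, which also accounts for factors such as initialization and stabilization periods. The recommended_size function and its parameters are hypothetical.

```python
def recommended_size(current_size: int, current_cpu_pct: int, target_cpu_pct: int,
                     min_size: int, max_size: int) -> int:
    """Return a group size that would bring average CPU utilization to the target.

    Utilization values are integer percentages (for example, 60 for 60%).
    """
    if target_cpu_pct <= 0:
        raise ValueError("target_cpu_pct must be positive")
    total_load = current_size * current_cpu_pct
    # Integer ceiling division avoids floating-point rounding surprises.
    needed = -(-total_load // target_cpu_pct)
    # Keep the result within the configured bounds of the group.
    return max(min_size, min(max_size, needed))


# Hypothetical group: 6 VMs, target of 60% average CPU, between 3 and 12 VMs.
scale_out = recommended_size(6, 90, 60, min_size=3, max_size=12)
scale_in = recommended_size(6, 20, 60, min_size=3, max_size=12)
```

In this example, a load of 90% on 6 VMs calls for 9 VMs, and a load of 20% would call for 2 VMs but is held at the minimum of 3.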

VM autohealing

Sometimes the VMs that host your application might be running and available, but there might be issues with the application itself. It might freeze, crash, or not have sufficient memory. To verify whether an application is responding as expected, you can configure application-based health checks as part of the autohealing policy of your MIGs. If the application on a particular VM isn't responding, the MIG autoheals (repairs) the VM. For more information about configuring autohealing, see Set up an application health check and autohealing.
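A health check typically changes its verdict only after several consecutive probes disagree with the current state, so that a single slow response doesn't trigger a repair. The following minimal sketch models that threshold logic. The HealthTracker class and its default values are hypothetical and don't represent the Compute Engine implementation.

```python
from dataclasses import dataclass


@dataclass
class HealthTracker:
    healthy_threshold: int = 2    # consecutive successes needed to become healthy
    unhealthy_threshold: int = 3  # consecutive failures needed to become unhealthy
    healthy: bool = True
    _streak: int = 0              # consecutive probes that contradict the current state

    def record(self, probe_ok: bool) -> bool:
        """Record one probe result and return the resulting health state."""
        if probe_ok == self.healthy:
            # The probe agrees with the current state, so reset the streak.
            self._streak = 0
            return self.healthy
        self._streak += 1
        needed = self.unhealthy_threshold if self.healthy else self.healthy_threshold
        if self._streak >= needed:
            self.healthy = not self.healthy
            self._streak = 0
        return self.healthy


tracker = HealthTracker()
# Two failures, one success, then three failures in a row.
states = [tracker.record(ok) for ok in [False, False, True, False, False, False]]
```

In this run, the VM is reported as unhealthy only after the third consecutive failed probe, which is the point at which an autohealing policy would repair it.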

VM placement

In the architecture that this document describes, the application tier and web tier run on Compute Engine VMs that are distributed across multiple zones. This distribution ensures that your application is robust against zone outages. To improve this robustness further, you can create a spread placement policy and apply it to the MIG template. When the MIG creates VMs, it places the VMs within each zone on different physical servers (called hosts), so your VMs are robust against failures of individual hosts. For more information, see Apply spread placement policies to VMs.

VM capacity planning

To make sure that capacity for Compute Engine VMs is available when required, you can create reservations. A reservation provides assured capacity in a specific zone for a specified number of VMs of a machine type that you choose. A reservation can be specific to a project, or shared across multiple projects. For more information about reservations, including billing considerations, see Reservations of Compute Engine zonal resources.

Persistent disk state

A best practice in application design is to avoid the need for stateful local disks. But if the requirement exists, you can configure your persistent disks to be stateful to ensure that the data is preserved when the VMs are repaired or recreated. However, we recommend that you keep the boot disks stateless, so that you can update them easily to the latest images with new versions and security patches. For more information, see Configuring stateful persistent disks in MIGs.

Data durability

You can use Backup and DR to create, store, and manage backups of the Compute Engine VMs. Backup and DR stores backup data in its original, application-readable format. When required, you can restore your workloads to production by directly using data from long-term backup storage without time-consuming data movement or preparation activities.

To store database backups and transaction logs, you can use regional Cloud Storage buckets, which provide low-cost backup storage that's redundant across zones.

Compute Engine provides the following options to help you to ensure the durability of data that's stored in Persistent Disk volumes:

  • You can use standard snapshots to capture the point-in-time state of Persistent Disk volumes. The snapshots are stored redundantly in multiple regions, with automatic checksums to ensure the integrity of your data. Snapshots are incremental by default, so they use less storage space and you save money. Snapshots are stored in a Cloud Storage location that you can configure. For more recommendations about using and managing snapshots, see Best practices for Compute Engine disk snapshots.
  • Regional Persistent Disk volumes let you run highly available applications that aren't affected by failures in persistent disks. When you create a regional Persistent Disk volume, Compute Engine maintains a replica of the disk in a different zone in the same region. Data is replicated synchronously to the disks in both zones. If any one of the two zones has an outage, the data remains available.

Database availability

To implement cross-zone failover for the database in each region, you need a mechanism to identify failures of the primary database and a process to fail over to the standby database. The specifics of the failover mechanism depend on the database that you use. You can set up an observer instance to detect failures of the primary database and orchestrate the failover. You must configure the failover rules appropriately to avoid a split-brain situation and prevent unnecessary failover. For example architectures that you can use to implement failover for PostgreSQL databases, see Architectures for high availability of PostgreSQL clusters on Compute Engine.
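One common way to reduce the risk of split-brain and unnecessary failover is to require agreement among several independent vantage points before the standby is promoted. The following sketch illustrates that idea. It is an assumption for illustration, not a rule from any specific database product: the should_fail_over function and the observer names are hypothetical.

```python
from typing import Dict


def should_fail_over(primary_unreachable: Dict[str, bool]) -> bool:
    """Decide whether to promote the standby database.

    primary_unreachable maps each observer name to True if that observer
    cannot reach the primary. Failover happens only when a strict majority
    of observers agree that the primary is down.
    """
    if not primary_unreachable:
        return False
    down_votes = sum(1 for unreachable in primary_unreachable.values() if unreachable)
    return down_votes * 2 > len(primary_unreachable)


# Hypothetical observers in three zones. Only one has lost contact with the
# primary, which suggests a network partition and not a database failure.
partition = should_fail_over({"observer-a": True, "observer-b": False, "observer-c": False})

# Two of three observers report the primary as down.
outage = should_fail_over({"observer-a": True, "observer-b": True, "observer-c": False})
```

Using an odd number of observers avoids tied votes in this model.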

More reliability considerations

When you build the cloud architecture for your workload, review the reliability-related best practices and recommendations that are provided in the following documentation:

Cost optimization

This section provides guidance to optimize the cost of setting up and operating a multi-regional Google Cloud topology that you build by using this reference architecture.

VM machine types

To help you optimize the resource utilization of your VM instances, Compute Engine provides machine type recommendations. Use the recommendations to choose machine types that match your workload's compute requirements. For workloads with predictable resource requirements, you can customize the machine type to your needs and save money by using custom machine types.

VM provisioning model

If your application is fault tolerant, then Spot VMs can help to reduce your Compute Engine costs for the VMs in the application and web tiers. The cost of Spot VMs is significantly lower than that of regular VMs. However, Compute Engine might preemptively stop or delete Spot VMs to reclaim capacity. Spot VMs are suitable for batch jobs that can tolerate preemption and don't have high availability requirements. Spot VMs offer the same machine types, options, and performance as regular VMs. However, when the resource capacity in a zone is limited, MIGs might not be able to scale out (that is, create VMs) automatically to the specified target size until the required capacity becomes available again.

Resource utilization

The autoscaling capability of stateless MIGs enables your application to handle increases in traffic gracefully, and it helps you to reduce cost when the need for resources is low. Stateful MIGs can't be autoscaled.

Third-party licensing

When you migrate third-party workloads to Google Cloud, you might be able to reduce cost by bringing your own licenses (BYOL). For example, to deploy Microsoft Windows Server VMs, instead of using a premium image that incurs additional cost for the third-party license, you can create and use a custom Windows BYOL image. You then pay only for the VM infrastructure that you use on Google Cloud. This strategy helps you continue to realize value from your existing investments in third-party licenses. If you decide to use the BYOL approach, we recommend that you do the following:

  • Provision the required number of compute CPU cores independently of memory by using custom machine types. By doing this, you limit the third-party licensing cost to the number of CPU cores that you need.
  • Reduce the number of vCPUs per core from 2 to 1 by disabling simultaneous multithreading (SMT), and reduce your licensing costs by 50%.
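The arithmetic behind the 50% figure can be sketched as follows, assuming a license that is priced per vCPU. The license_cost function and the price are hypothetical. Actual licensing terms vary by vendor, so treat this as an illustration of the calculation only.

```python
def license_cost(physical_cores: int, threads_per_core: int, price_per_vcpu: float) -> float:
    """Return the license cost when each visible vCPU must be licensed."""
    vcpus = physical_cores * threads_per_core
    return vcpus * price_per_vcpu


# Hypothetical workload that needs 8 physical cores at 100.0 per licensed vCPU.
with_smt = license_cost(8, threads_per_core=2, price_per_vcpu=100.0)     # 16 vCPUs
without_smt = license_cost(8, threads_per_core=1, price_per_vcpu=100.0)  # 8 vCPUs
savings_ratio = 1 - without_smt / with_smt
```

The number of physical cores stays the same in both cases. Only the number of vCPUs that the license counts is halved.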

More cost considerations

When you build the architecture for your workload, also consider the general best practices and recommendations that are provided in Google Cloud Architecture Framework: Cost optimization.

Operational efficiency

This section describes the factors that you should consider when you use this reference architecture to design and build a multi-regional Google Cloud topology that you can operate efficiently.

VM configuration updates

To update the configuration of the VMs in a MIG (such as the machine type or boot-disk image), you create a new instance template with the required configuration and then apply the new template to the MIG. The MIG updates the VMs by using the update method that you choose: automatic or selective. Choose an appropriate method based on your requirements for availability and operational efficiency. For more information about these MIG update methods, see Apply new VM configurations in a MIG.

VM images

For your MIG instance templates, instead of using Google-provided public images, we recommend that you create and use custom images that contain the configurations and software that your applications require. You can group your custom images into a custom image family. An image family always points to the most recent image in that family, so your instance templates and scripts can use that image without you having to update references to a specific image version.

Deterministic instance templates

If the instance templates that you use for your MIGs include startup scripts to install third-party software, make sure that the scripts explicitly specify software-installation parameters such as the software version. Otherwise, when the MIG creates the VMs, the software that's installed on the VMs might not be consistent. For example, if your instance template includes a startup script to install Apache HTTP Server (the apache2 package), then make sure that the script specifies the exact apache2 version that should be installed, such as version 2.4.53. For more information, see Deterministic instance templates.
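A simple check can help to catch startup scripts that install packages without a pinned version. The following sketch is a naive, hypothetical linter for apt-based scripts, not a Google Cloud tool. It assumes that each install command is on a single line, and the version string in the example is a placeholder because the exact string depends on the distribution.

```python
import shlex
from typing import List

SHELL_OPERATORS = {"&&", "||", ";", "|"}


def unpinned_packages(startup_script: str) -> List[str]:
    """Return apt packages that are installed without an explicit '=version'."""
    unpinned = []
    for line in startup_script.splitlines():
        # comments=True drops shell comments, including the shebang line.
        tokens = shlex.split(line, comments=True)
        uses_apt = any(token in ("apt-get", "apt") for token in tokens)
        if not uses_apt or "install" not in tokens:
            continue
        for token in tokens[tokens.index("install") + 1:]:
            if token in SHELL_OPERATORS:
                break     # the install command ends here
            if token.startswith("-"):
                continue  # an option such as -y
            if "=" not in token:
                unpinned.append(token)
    return unpinned


# Placeholder version string; use the exact version that your distribution provides.
pinned_script = "#!/bin/bash\napt-get update\napt-get install -y apache2=2.4.53-1\n"
loose_script = "#!/bin/bash\napt-get update\napt-get install -y apache2\n"

pinned_result = unpinned_packages(pinned_script)
loose_result = unpinned_packages(loose_script)
```

You could run a check like this in a build pipeline before you create a new instance template, so that VMs created at different times install the same software version.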

More operational considerations

When you build the architecture for your workload, consider the general best practices and recommendations for operational efficiency that are described in Google Cloud Architecture Framework: Operational excellence.

Performance optimization

This section describes the factors that you should consider when you use this reference architecture to design and build a multi-regional topology in Google Cloud that meets the performance requirements of your workloads.

VM placement

For workloads that require low inter-VM network latency, you can create a compact placement policy and apply it to the MIG template. When the MIG creates VMs, it places the VMs on physical servers that are close to each other. For more information, see Reduce latency by using compact placement policies.

VM machine types

Compute Engine offers a wide range of predefined and customizable machine types that you can choose from depending on your cost and performance requirements. The machine types are grouped into machine series and families. The following table provides a summary of the recommended machine families and series for different workload types:

Requirement | Recommended machine family | Example machine series
Best price-performance ratio for a variety of workloads | General-purpose machine family | C3, C3D, E2, N2, N2D, Tau T2D, Tau T2A
Highest performance per core and optimized for compute-intensive workloads | Compute-optimized machine family | C2, C2D, H3
High memory-to-vCPU ratio for memory-intensive workloads | Memory-optimized machine family | M3, M2, M1
GPUs for massively parallelized workloads | Accelerator-optimized machine family | A2, G2

For more information, see Machine families resource and comparison guide.

VM multithreading

Each virtual CPU (vCPU) that you allocate to a Compute Engine VM is implemented as a single hardware multithread. By default, two vCPUs share a physical CPU core. For workloads that are highly parallel or that perform floating point calculations (such as genetic sequence analysis and financial risk modeling), you can improve performance by reducing the number of threads that run on each physical CPU core. For more information, see Set the number of threads per core.

Network Service Tiers

Network Service Tiers lets you optimize the network cost and performance of your workloads. You can choose from the following tiers:

  • Premium Tier uses Google's highly reliable global backbone to help you achieve minimal packet loss and latency. Traffic enters and leaves the Google network at a global edge point of presence (PoP) that's closest to your end user's ISP. We recommend using Premium Tier as the default tier for optimal performance. Premium Tier supports both regional external IP addresses and global external IP addresses for VMs and load balancers.
  • Standard Tier is available only for resources that use regional external IP addresses. Traffic enters and leaves the Google network at an edge PoP that's closest to the region where your Google Cloud workload runs. The pricing for Standard Tier is lower than Premium Tier. Standard Tier is suitable for traffic that isn't sensitive to packet loss and that doesn't have low latency requirements.


Caching

If your application serves static website assets and if your architecture includes a global external Application Load Balancer (as shown in figure 1), then you can use Cloud CDN to cache regularly accessed static content closer to your users. Cloud CDN can help to improve performance for your users, reduce your infrastructure resource usage in the backend, and reduce your network delivery costs. For more information, see Faster web performance and improved web protection for load balancing.

More performance considerations

When you build the architecture for your workload, consider the general best practices and recommendations that are provided in Google Cloud Architecture Framework: Performance optimization.


Author: Kumar Dhanagopal | Cross-Product Solution Developer
