Regional deployment on Compute Engine

Last reviewed 2025-08-12 UTC

This document provides a reference architecture for a multi-tier application that runs on Compute Engine VMs in multiple zones within a Google Cloud region. You can use this reference architecture to efficiently rehost (lift and shift) on-premises applications to the cloud with minimal changes to the applications. The document also describes the design factors that you should consider when you build a regional architecture for your cloud applications. The intended audience for this document is cloud architects.

Architecture

The following diagram shows an architecture for an application that runs in active-active mode in isolated stacks that are deployed across three Google Cloud zones within a region. The architecture is aligned with the regional deployment archetype.

[Diagram: an application runs in active-active mode in isolated stacks that are deployed across three Google Cloud zones within a region.]

The architecture is based on the infrastructure as a service (IaaS) cloud model. You provision the required infrastructure resources (compute, networking, and storage) in Google Cloud. You retain full control over the infrastructure and responsibility for the operating system, middleware, and higher layers of the application stack. To learn more about IaaS and other cloud models, see PaaS vs. IaaS vs. SaaS vs. CaaS: How are they different?.

The preceding diagram includes the following components:

Component	Purpose
Regional external load balancer	The regional external load balancer receives and distributes user requests to the web tier VMs. Use an appropriate load balancer type depending on the traffic type and other requirements. For example, if the backend consists of web servers (as shown in the preceding architecture), then use an Application Load Balancer to forward HTTP(S) traffic. To load-balance TCP traffic, use a Network Load Balancer. For more information, see Choose a load balancer.
Regional managed instance group (MIG) for the web tier	The web tier of the application is deployed on Compute Engine VMs that are part of a regional MIG. The MIG is the backend for the regional external load balancer. The MIG contains Compute Engine VMs in three different zones. Each of these VMs hosts an independent instance of the web tier of the application.
Regional internal load balancer	The regional internal load balancer distributes traffic from the web tier VMs to the application tier VMs. Depending on your requirements, you can use a regional internal Application Load Balancer or Network Load Balancer. For more information, see Choose a load balancer.
Regional MIG for the application tier	The application tier is deployed on Compute Engine VMs that are part of a regional MIG, which is the backend for the internal load balancer. The MIG contains Compute Engine VMs in three different zones. Each VM hosts an independent instance of the application tier.
Third-party database deployed on a Compute Engine VM	The architecture in this document shows a third-party database (like PostgreSQL) that's deployed on a Compute Engine VM. You can deploy a standby database in another zone. The database replication and failover capabilities depend on the database that you use. Installing and managing a third-party database involves additional effort and operational cost for applying updates, monitoring, and ensuring availability. You can avoid the overhead of installing and managing a third-party database and take advantage of built-in high availability (HA) features by using a fully managed database service like Cloud SQL or AlloyDB for PostgreSQL. For more information about managed database options, see Database services later in this guide.
Virtual Private Cloud network and subnet	All the Google Cloud resources in the architecture use a single VPC network and subnet. Depending on your requirements, you can choose to build an architecture that uses multiple VPC networks or multiple subnets. For more information, see Deciding whether to create multiple VPC networks in "Best practices and reference architectures for VPC design."
Cloud Storage dual-region bucket	Application and database backups are stored in a dual-region Cloud Storage bucket. If a zone or region outage occurs, the backups remain available, so your application and data aren't lost. Alternatively, you can use Backup and DR Service to create, store, and manage the database backups.

Products used

This reference architecture uses the following Google Cloud products:

Compute Engine: A secure and customizable compute service that lets you create and run VMs on Google's infrastructure.
Cloud Load Balancing: A portfolio of high-performance, scalable, global and regional load balancers.
Cloud Storage: A low-cost, no-limit object store for diverse data types. Data can be accessed from within and outside Google Cloud, and it's replicated across locations for redundancy.
Virtual Private Cloud (VPC): A virtual system that provides global, scalable networking functionality for your Google Cloud workloads. VPC includes VPC Network Peering, Private Service Connect, private services access, and Shared VPC.

Use cases

This section describes use cases for which a regional deployment on Compute Engine is an appropriate choice.

Efficient migration of on-premises applications

You can use this reference architecture to build a Google Cloud topology to rehost (lift and shift) on-premises applications to the cloud with minimal changes to the applications. All the tiers of the application in this reference architecture are hosted on Compute Engine VMs. This approach lets you migrate on-premises applications efficiently to the cloud and take advantage of the cost benefits, reliability, performance, and operational simplicity that Google Cloud provides.

Highly available application with users within a geographic area

We recommend a regional deployment architecture for applications that need robustness against zone outages but can tolerate some downtime caused by region outages. If any part of the application stack fails, the application continues to run if at least one functioning component with adequate capacity exists in every tier. If a zone outage occurs, the application stack continues to run in the other zones.

Low latency for application users

If all the users of an application are within a single geographic area, such as a single country, a regional deployment architecture can help improve the user-perceived performance of the application. You can optimize network latency for user requests by deploying the application in the Google Cloud region that's closest to your users.

Low-latency networking between application components

A single-region architecture might be well suited for applications such as batch computing that need low-latency and high-bandwidth network connections among the compute nodes. All the resources are in a single Google Cloud region, so inter-resource network traffic remains within the region. The inter-resource network latency is low, and you don't incur cross-region data transfer costs. Intra-region network costs still apply.

Compliance with data residency requirements

You can use a single-region architecture to build a topology that helps you to meet data residency requirements. For example, a country in Europe might require that all user data be stored and accessed in data centers that are located physically within Europe. To meet this requirement, you can run the application in a Google Cloud region in Europe.

Design considerations

This section provides guidance to help you use this reference architecture to develop an architecture that meets your specific requirements for system design, security and compliance, reliability, operational efficiency, cost, and performance.

System design

This section provides guidance to help you to choose Google Cloud regions for your regional deployment and to select appropriate Google Cloud services.

Region selection

When you choose the Google Cloud regions where your applications must be deployed, consider the following factors and requirements:

Availability of Google Cloud services in each region. For more information, see Products available by location.
Availability of Compute Engine machine types in each region. For more information, see Regions and zones.
End-user latency requirements.
Cost of Google Cloud resources.
Cross-regional data transfer costs.
Regulatory requirements.
Carbon footprint of the region.

Some of these factors and requirements might involve trade-offs. For example, the most cost-efficient region might not have the lowest carbon footprint. For more information, see Best practices for Compute Engine regions selection.
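The trade-offs described above can be made explicit in two steps. First, filter out regions that fail hard requirements: service and machine-type availability, regulations, and a latency ceiling. Then rank the remaining regions by a weighted score. The following is a minimal sketch under hypothetical assumptions. The region names, latency and cost figures, and the `cost_weight` knob are illustrative inputs that you would gather yourself; they aren't Google Cloud data or settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RegionCandidate:
    """Hypothetical inputs that you gather for each candidate region."""
    name: str
    has_required_services: bool       # from "Products available by location"
    has_required_machine_types: bool  # from "Regions and zones"
    meets_regulatory_requirements: bool
    latency_ms: float                 # measured from your users to the region
    relative_cost: float              # cost of your resources; 1.0 = baseline


def shortlist_regions(candidates, max_latency_ms, cost_weight=0.5):
    """Drop regions that fail hard requirements, then rank the rest.

    Lower scores rank first. cost_weight (0..1) is an illustrative knob that
    trades latency against cost; it isn't a Google Cloud setting.
    """
    eligible = [
        c for c in candidates
        if c.has_required_services
        and c.has_required_machine_types
        and c.meets_regulatory_requirements
        and c.latency_ms <= max_latency_ms
    ]

    def score(c):
        latency_term = c.latency_ms / max_latency_ms  # normalized to 0..1
        return (1 - cost_weight) * latency_term + cost_weight * c.relative_cost

    return [c.name for c in sorted(eligible, key=score)]


# Hypothetical candidates; region-c lacks a machine type, region-d fails regulations.
candidates = [
    RegionCandidate("region-a", True, True, True, latency_ms=20, relative_cost=1.2),
    RegionCandidate("region-b", True, True, True, latency_ms=35, relative_cost=0.8),
    RegionCandidate("region-c", True, False, True, latency_ms=10, relative_cost=0.8),
    RegionCandidate("region-d", True, True, False, latency_ms=15, relative_cost=1.0),
]
print(shortlist_regions(candidates, max_latency_ms=50))                   # ['region-b', 'region-a']
print(shortlist_regions(candidates, max_latency_ms=50, cost_weight=0.1))  # ['region-a', 'region-b']
```

With equal weighting, the cheaper but slower region ranks first. When latency is weighted more heavily, the order flips.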

Compute infrastructure

The reference architecture in this document uses Compute Engine VMs for all the tiers of the application. Depending on the requirements of your application, you can choose from other Google Cloud compute services:

Containers: You can run containerized applications in Google Kubernetes Engine (GKE) clusters. GKE is a container-orchestration engine that automates deploying, scaling, and managing containerized applications.
Serverless: If you prefer to focus your IT efforts on your data and applications instead of setting up and operating infrastructure resources, then you can use serverless services like Cloud Run.

The decision of whether to use VMs, containers, or serverless services involves a trade-off between configuration flexibility and management effort. VMs and containers provide more configuration flexibility, but you're responsible for managing the resources. In a serverless architecture, you deploy workloads to a preconfigured platform that requires minimal management effort. For more information about choosing appropriate compute services for your workloads in Google Cloud, see Hosting Applications on Google Cloud.

Storage services

The architecture shown in this document uses regional Persistent Disk volumes for all the tiers. Regional Persistent Disk volumes provide synchronous replication of data across two zones within a region.

Google Cloud Hyperdisk provides better performance, flexibility, and efficiency than Persistent Disk. With Hyperdisk Balanced, you can provision IOPS and throughput separately and dynamically, which lets you tune the volume to a wide variety of workloads.

For low-cost storage that's replicated across multiple locations, you can use Cloud Storage regional, dual-region, or multi-region buckets.

Data in regional buckets is replicated synchronously across the zones in the region.
Data in dual-region or multi-region buckets is stored redundantly in at least two separate geographic locations. Metadata is written synchronously across regions, and data is replicated asynchronously. For dual-region buckets, you can use turbo replication, which replicates newly written objects across the two regions with a recovery point objective (RPO) of 15 minutes. For more information, see Data availability and durability.

To store data that's shared across multiple VMs in a region, such as across all the VMs in the web tier or application tier, you can use a Filestore regional instance. The data that you store in a Filestore regional instance is replicated synchronously across three zones within the region. This replication ensures high availability and robustness against zone outages. You can store shared configuration files, common tools and utilities, and centralized logs in the Filestore instance, and mount the instance on multiple VMs. For robustness against region outages, you can replicate a Filestore instance to a different region. For more information, see Instance replication.

If your database is Microsoft SQL Server, we recommend using Cloud SQL for SQL Server. In scenarios where Cloud SQL doesn't support your configuration requirements, or if you need access to the operating system, you can deploy a Microsoft SQL Server failover cluster instance (FCI). In this scenario, you can use the fully managed Google Cloud NetApp Volumes to provide continuous availability (CA) SMB storage for the database.

When you design storage for your workloads, consider the functional characteristics, resilience requirements, performance expectations, and cost goals. For more information, see Design an optimal storage strategy for your cloud workload.

Database services

The reference architecture in this document uses a third-party database that's deployed on Compute Engine VMs. Installing and managing a third-party database involves effort and cost for operations like applying updates, monitoring and ensuring availability, performing backups, and recovering from failures.

You can avoid the effort and cost of installing and managing a third-party database by using a fully managed database service like Cloud SQL, AlloyDB for PostgreSQL, Bigtable, Spanner, or Firestore. These Google Cloud database services provide uptime service-level agreements (SLAs), and they include default capabilities for scalability and observability.

If your workload needs an Oracle database, you can deploy the database on a Compute Engine VM or use Oracle Database@Google Cloud. For more information, see Oracle workloads in Google Cloud.

Network design

Choose a network design that meets your business and technical requirements. You can use a single VPC network or multiple VPC networks. For more information, see Best practices and reference architectures for VPC design.

Security, privacy, and compliance

This section describes factors that you should consider when you use this reference architecture to design and build a regional topology in Google Cloud that meets the security, privacy, and compliance requirements of your workloads.

Protection against external threats

To protect your application against threats like distributed denial-of-service (DDoS) attacks and cross-site scripting (XSS), you can use Google Cloud Armor security policies. Each policy is a set of rules. Each rule specifies conditions to evaluate and an action to take when the conditions are met. For example, a rule could specify that if the source IP address of the incoming traffic matches a specific IP address or CIDR range, then the traffic must be denied. You can also apply preconfigured web application firewall (WAF) rules. For more information, see Security policy overview.
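Cloud Armor evaluates a policy's rules in priority order, where a lower number means a higher priority. The first matching rule decides the action, and every policy ends with a default rule that matches all traffic. The following minimal sketch models only the IP and CIDR matching described above. It isn't the Cloud Armor API: real rules also support match expressions and preconfigured WAF rules, and the `Rule` class and sample addresses here are hypothetical.

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical, simplified model of a security policy rule."""
    priority: int        # lower value = evaluated first
    source_ranges: tuple # CIDR ranges; "*" matches every source
    action: str          # for example "allow" or "deny(403)"


# The default rule has the lowest possible priority and matches all traffic.
# Its action is configurable; this sketch assumes "allow".
DEFAULT_RULE = Rule(priority=2147483647, source_ranges=("*",), action="allow")


def _matches(rule, ip):
    if "*" in rule.source_ranges:
        return True
    return any(ip in ipaddress.ip_network(cidr) for cidr in rule.source_ranges)


def evaluate(rules, source_ip):
    """Return the action of the first matching rule, in priority order."""
    ip = ipaddress.ip_address(source_ip)
    for rule in sorted([*rules, DEFAULT_RULE], key=lambda r: r.priority):
        if _matches(rule, ip):
            return rule.action


rules = [
    Rule(priority=1000, source_ranges=("203.0.113.0/24",), action="deny(403)"),
    Rule(priority=900, source_ranges=("203.0.113.7/32",), action="allow"),
]
print(evaluate(rules, "203.0.113.7"))   # allow: priority 900 is evaluated first
print(evaluate(rules, "203.0.113.8"))   # deny(403)
print(evaluate(rules, "198.51.100.1"))  # allow: falls through to the default rule
```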

External access for VMs

In the reference architecture that this document describes, the Compute Engine VMs don't need inbound access from the internet. Don't assign external IP addresses to the VMs. Google Cloud resources that have only a private, internal IP address can still access certain Google APIs and services by using Private Service Connect or Private Google Access. For more information, see Private access options for services.

To enable secure outbound connections from Google Cloud resources that have only private IP addresses, like the Compute Engine VMs in this reference architecture, you can use Secure Web Proxy or Cloud NAT.

Service account privileges

For the Compute Engine VMs in the architecture, instead of using the default service accounts, we recommend that you create dedicated service accounts and specify the resources that the service account can access. The default service account has a broad range of permissions, including some that might not be necessary. You can tailor dedicated service accounts to have only the essential permissions. For more information, see Limit service account privileges.

SSH security

To enhance the security of SSH connections to the Compute Engine VMs in your architecture, implement Identity-Aware Proxy (IAP) and Cloud OS Login API. IAP lets you control network access based on user identity and Identity and Access Management (IAM) policies. Cloud OS Login API lets you control Linux SSH access based on user identity and IAM policies. For more information about managing network access, see Best practices for controlling SSH login access.

Network security

To control network traffic between the resources in the architecture, you must configure appropriate Cloud Next Generation Firewall (NGFW) policies.
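One way to reason about these policies is as a deny-by-default allowlist of tier-to-tier flows. This mirrors the implied deny ingress rule of a VPC network. The following minimal sketch expresses that intent so that you can check proposed flows before you translate them into actual firewall policy rules. The tier labels and ports are hypothetical; 5432 assumes a PostgreSQL database like the one mentioned earlier.

```python
# Hypothetical tier labels and ports. "load-balancer" stands in for traffic
# that the load balancer forwards to the web tier.
ALLOWED_FLOWS = {
    ("load-balancer", "web", 80),
    ("web", "app", 8080),
    ("app", "db", 5432),
}


def is_allowed(source_tier, destination_tier, port):
    """Deny by default; allow only explicitly listed tier-to-tier flows."""
    return (source_tier, destination_tier, port) in ALLOWED_FLOWS


print(is_allowed("web", "app", 8080))  # True
print(is_allowed("web", "db", 5432))   # False: the web tier can't reach the database directly
```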

More security considerations

When you build the architecture for your workload, consider the platform-level security best practices and recommendations that are provided in the Enterprise foundations blueprint and Google Cloud Well-Architected Framework: Security, privacy, and compliance.

Reliability

This section describes design factors that you should consider when you use this reference architecture to build and operate reliable infrastructure for your regional deployments in Google Cloud.

Infrastructure outages

In a regional architecture, if any individual component in the infrastructure stack fails, the application can process requests if at least one functioning component with adequate capacity exists in each tier. For example, if a web server instance fails, the load balancer forwards user requests to the other available web server instances. If a VM that hosts a web server or app server instance crashes, the MIG recreates the VM automatically.

If a zone outage occurs, the load balancer isn't affected, because it's a regional resource. A zone outage might affect individual Compute Engine VMs. But the application remains available and responsive because the VMs are in a regional MIG. A regional MIG ensures that new VMs are created automatically to maintain the configured minimum number of VMs. After Google resolves the zone outage, you must verify that the application runs as expected in all the zones where it's deployed.
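For the application to stay responsive during a zone outage, the VMs in the surviving zones must be able to carry peak load, even before the MIG finishes recreating capacity. The following arithmetic sketch sizes each zone under that assumption. The load figures are hypothetical, and `usable_rps_per_vm` is assumed to be the throughput that one VM sustains at your chosen utilization headroom.

```python
def _ceil_div(a, b):
    """Integer ceiling division, which avoids floating-point rounding."""
    return -(-a // b)


def vms_per_zone(peak_rps, usable_rps_per_vm, zones=3, zones_lost=1):
    """VMs to run in each zone so that the surviving zones can serve peak load."""
    if not 0 <= zones_lost < zones:
        raise ValueError("zones_lost must be between 0 and zones - 1")
    total_needed = _ceil_div(peak_rps, usable_rps_per_vm)
    return _ceil_div(total_needed, zones - zones_lost)


# Hypothetical load: 4,200 requests/s at peak, 150 requests/s per VM.
print(vms_per_zone(4200, 150))                # 14 per zone (42 VMs in total)
print(vms_per_zone(4200, 150, zones_lost=0))  # 10 per zone if no outage is planned for
```

Tolerating the loss of one of three zones costs extra VMs. That cost is the price of staying available without waiting for replacements.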

If all the zones in this architecture have an outage or if a region outage occurs, then the application is unavailable. You must wait for Google to resolve the outage, and then verify that the application works as expected.

You can reduce the downtime caused by region outages by maintaining a passive (failover) replica of the infrastructure stack in another Google Cloud region. If an outage occurs in the primary region, you can activate the stack in the failover region and use DNS routing policies to route traffic to the load balancer in the failover region.
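The failover decision behind the DNS routing described above can be sketched as follows. DNS returns the load balancer address of the first stack, in priority order, that is both active and healthy. This is a hypothetical model, not the Cloud DNS API. The region labels are placeholders, and the IP addresses come from documentation ranges.

```python
from dataclasses import dataclass


@dataclass
class RegionalStack:
    """Hypothetical model of one regional copy of the application stack."""
    region: str
    load_balancer_ip: str
    active: bool   # the passive stack is activated only during a failover
    healthy: bool  # for example, the result of a health check


def dns_answer(stacks):
    """Return the IP of the first active, healthy stack, or None if there isn't one."""
    for stack in stacks:
        if stack.active and stack.healthy:
            return stack.load_balancer_ip
    return None


primary = RegionalStack("primary-region", "203.0.113.10", active=True, healthy=True)
failover = RegionalStack("failover-region", "198.51.100.10", active=False, healthy=True)
print(dns_answer([primary, failover]))  # 203.0.113.10

primary.healthy = False                 # region outage
failover.active = True                  # you activate the passive stack
print(dns_answer([primary, failover]))  # 198.51.100.10
```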

For applications that require robustness against region outages, consider using a multi-regional architecture. For more information, see Multi-regional deployment on Compute Engine.

MIG autoscaling

To control the autoscaling behavior of your stateless MIGs, you can specify target utilization metrics, such as average CPU utilization. You can also configure schedule-based autoscaling for stateless MIGs. Stateful MIGs can't be autoscaled. For more information, see Autoscaling groups of instances.
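With a target utilization metric, the idea is proportional: if the group runs hotter than the target, add VMs until the average load per VM returns to the target. The following minimal sketch shows that arithmetic, clamped to the group's minimum and maximum size. It's a simplified approximation and not the autoscaler's actual algorithm, which also accounts for factors such as the initialization period and scale-in controls. The function and parameter names are hypothetical.

```python
def _ceil_div(a, b):
    """Integer ceiling division, which avoids floating-point rounding."""
    return -(-a // b)


def recommended_size(current_size, avg_cpu_pct, target_cpu_pct, min_size, max_size):
    """Proportional sizing: keep average CPU near the target, within limits."""
    desired = _ceil_div(current_size * avg_cpu_pct, target_cpu_pct)
    return max(min_size, min(max_size, desired))


print(recommended_size(6, 90, 60, min_size=2, max_size=20))   # 9: scale out
print(recommended_size(6, 10, 60, min_size=2, max_size=20))   # 2: scale in, floored at min_size
print(recommended_size(10, 100, 50, min_size=2, max_size=15)) # 15: capped at max_size
```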

MIG size limit

When you decide the size of your MIGs, consider the default and maximum limits on the number of VMs that can be created in a MIG. For more information, see Add and remove VMs from a MIG.

VM autohealing

Sometimes the VMs that host your application might be running and available, but there might be issues with the application itself. The application might freeze, crash, or not have sufficient memory. To verify whether an application is responding as expected, you can configure application-based health checks as part of the autohealing policy of your MIGs. If the application on a particular VM isn't responding, the MIG autoheals (repairs) the VM. For more information about configuring autohealing, see About repairing VMs for high availability.
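Health checks use consecutive-probe thresholds so that a single slow response doesn't trigger a repair. A VM is marked unhealthy only after several failed probes in a row, and healthy again only after several successful probes. The following minimal sketch models that state machine and selects the VMs that autohealing would repair. The threshold values and VM names are hypothetical. The sketch ignores other settings, such as the autohealing initial delay.

```python
class HealthState:
    """Tracks one VM's health from consecutive probe results."""

    def __init__(self, healthy_threshold=2, unhealthy_threshold=3):
        self.healthy_threshold = healthy_threshold
        self.unhealthy_threshold = unhealthy_threshold
        self.healthy = True
        self._streak = 0  # consecutive probes that contradict the current state

    def record(self, probe_ok):
        if probe_ok == self.healthy:
            self._streak = 0
        else:
            self._streak += 1
            limit = self.unhealthy_threshold if self.healthy else self.healthy_threshold
            if self._streak >= limit:
                self.healthy = probe_ok
                self._streak = 0
        return self.healthy


def vms_to_repair(states):
    """Names of VMs whose application is currently considered unhealthy."""
    return sorted(name for name, state in states.items() if not state.healthy)


states = {"web-1": HealthState(), "web-2": HealthState()}
for ok in (False, False, False):  # web-2's application stops responding
    states["web-2"].record(ok)
print(vms_to_repair(states))  # ['web-2']
```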

VM placement

In the architecture that this document describes, the application tier and web tier run on Compute Engine VMs that are distributed across multiple zones. This distribution ensures that your application is robust against zone outages.

To improve the robustness of the architecture, you can create a spread placement policy and apply it to the instance template that the MIG uses. When the MIG creates VMs, it places the VMs within each zone on different physical servers (called hosts), so your VMs are robust against failures of individual hosts. For more information, see Create and apply spread placement policies to VMs.
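The two layers of distribution described above can be pictured with a minimal sketch. First, the regional MIG spreads VMs evenly across zones. Then, within each zone, a spread placement policy puts VMs in different failure domains. This is a hypothetical round-robin model, loosely based on the availability domain count of spread placement policies. It isn't how Compute Engine actually assigns hosts, and the zone names are placeholders.

```python
from collections import Counter


def place_vms(vm_count, zones, availability_domains):
    """Round-robin VMs across zones, then across failure domains within a zone."""
    placement = {}
    for i in range(vm_count):
        zone = zones[i % len(zones)]
        slot_in_zone = i // len(zones)
        placement[f"vm-{i}"] = (zone, slot_in_zone % availability_domains)
    return placement


placement = place_vms(6, ["zone-a", "zone-b", "zone-c"], availability_domains=2)
print(Counter(zone for zone, _ in placement.values()))
# Each zone gets 2 VMs, and the 2 VMs in a zone land in different domains.
```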

VM capacity planning

To make sure that capacity for Compute Engine VMs is available when VMs need to be provisioned, you can create reservations. A reservation provides assured capacity in a specific zone for a specified number of VMs of a machine type that you choose. A reservation can be specific to a project, or shared across multiple projects. For more information about reservations, see Choose a reservation type.

Stateful storage

A best practice in application design is to avoid the need for stateful local disks. But if the requirement exists, you can configure your persistent disks to be stateful to ensure that the data is preserved when the VMs are repaired or recreated. However, we recommend that you keep the boot disks stateless, so that you can update them to the latest images with new versions and security patches. For more information, see Configuring stateful persistent disks in MIGs.

Data durability

You can use Backup and DR Service to create, store, and manage backups of the Compute Engine VMs. Backup and DR Service stores backup data in its original, application-readable format. When required, you can restore your workloads to production by directly using data from long-term backup storage and avoid the need to prepare or move data.

Compute Engine provides the following options to help you to ensure the durability of data that's stored in Persistent Disk volumes:

You can use snapshots to capture the point-in-time state of Persistent Disk volumes. By default, snapshots are stored redundantly in multiple regions, with automatic checksums to ensure the integrity of your data. You can configure the Cloud Storage location where snapshots are stored. Snapshots are incremental by default, so they use less storage space and cost less. For more recommendations about using and managing snapshots, see Best practices for Compute Engine disk snapshots.
To ensure that data in Persistent Disk remains available if a zone outage occurs, you can use Regional Persistent Disk or Hyperdisk Balanced High Availability. Data in these disk types is replicated synchronously between two zones in the same region. For more information, see About synchronous disk replication.
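The storage saving from incremental snapshots comes from storing only the blocks that changed since the previous snapshot. Restoring a snapshot means applying the chain of deltas in order. The following minimal sketch illustrates that idea on a toy disk represented as a dictionary of blocks. The representation and function names are hypothetical and don't reflect Compute Engine's internal snapshot format.

```python
def restore(chain, upto=None):
    """Rebuild disk contents by applying snapshot deltas, oldest first."""
    disk = {}
    for delta in chain[:upto]:
        disk.update(delta)
    return disk


def take_snapshot(disk, chain):
    """Append a delta that holds only the blocks changed since the last snapshot."""
    previous = restore(chain)
    delta = {block: data for block, data in disk.items() if previous.get(block) != data}
    chain.append(delta)
    return delta


disk = {0: b"boot", 1: b"app-v1", 2: b"logs"}
chain = []
take_snapshot(disk, chain)         # first snapshot stores all 3 blocks
disk[1] = b"app-v2"
print(take_snapshot(disk, chain))  # {1: b'app-v2'}: only the changed block
print(restore(chain, 1))           # point-in-time state from the first snapshot
```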

If you use a managed database service like Cloud SQL, backups are taken automatically and retained according to the retention policy that you define. You can supplement the backup strategy with additional logical backups to meet regulatory, workflow, or business requirements.

If you use a third-party database and you need to store database backups and transaction logs, you can use regional Cloud Storage buckets. Regional Cloud Storage buckets provide low-cost backup storage that's redundant across zones.

Database availability

If you use a managed database service like Cloud SQL in an HA configuration and the primary database fails, Cloud SQL fails over automatically to the standby database. You don't need to change the IP address for the database endpoint. If you use a self-managed third-party database that's deployed on a Compute Engine VM, then you must use an internal load balancer or another mechanism to ensure that the application can connect to another database if the primary database is unavailable.

To implement cross-zone failover for a database that's deployed on a Compute Engine VM, you need a mechanism to identify failures of the primary database and a process to fail over to the standby database. The specifics of the failover mechanism depend on the database that you use. You can set up an observer instance to detect failures of the primary database and orchestrate the failover. You must configure the failover rules appropriately to avoid a split-brain situation and prevent unnecessary failover. For example architectures that you can use to implement failover for PostgreSQL databases, see Architectures for high availability of PostgreSQL clusters on Compute Engine.
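Failover rules that avoid both unnecessary failovers and split-brain usually combine two conditions. The primary must have failed several consecutive checks, which filters out transient blips. In addition, a strict majority of independent witnesses must agree that the primary is down, so that a network partition that isolates only the observer doesn't trigger a promotion. The following minimal sketch shows that decision. The function, parameters, and threshold are hypothetical. A real setup also fences the old primary before it promotes the standby, and the specifics depend on your database.

```python
def should_promote_standby(consecutive_failed_checks, witnesses_reporting_down,
                           total_witnesses, failure_threshold=3):
    """Decide whether the observer should fail over to the standby database."""
    if consecutive_failed_checks < failure_threshold:
        return False  # possibly a transient blip; don't fail over yet
    # Require a strict majority so that two partitioned sides can't both win.
    return witnesses_reporting_down > total_witnesses // 2


print(should_promote_standby(3, witnesses_reporting_down=2, total_witnesses=3))  # True
print(should_promote_standby(3, witnesses_reporting_down=1, total_witnesses=3))  # False
print(should_promote_standby(2, witnesses_reporting_down=3, total_witnesses=3))  # False
```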

More reliability considerations

When you build the cloud architecture for your workload, review the reliability-related best practices and recommendations that are provided in Google Cloud Well-Architected Framework: Reliability.

Cost optimization

This section provides guidance to optimize the cost of setting up and operating a regional Google Cloud topology that you build by using this reference architecture.

VM machine types

To help you optimize the resource utilization of your VM instances, Compute Engine provides machine type recommendations. Use the recommendations to choose machine types that match your workload's compute requirements. For workloads with predictable resource requirements, you can use custom machine types to provision the number of vCPUs and the amount of memory that you need, and avoid paying for unused capacity.

VM provisioning model

If your application is fault tolerant, then Spot VMs can help to reduce your Compute Engine costs for the VMs in the application and web tiers. The cost of Spot VMs is significantly lower than the cost of regular VMs. However, Compute Engine might preemptively stop or delete Spot VMs to reclaim capacity.

Spot VMs are suitable for batch jobs that can tolerate preemption and don't have high availability requirements. Spot VMs offer the same machine types, options, and performance as regular VMs. However, when the resource capacity in a zone is limited, MIGs might not be able to scale out (that is, create VMs) automatically to the specified target size until the required capacity becomes available again.

VM resource utilization

The autoscaling capability of stateless MIGs enables your application to handle increases in traffic gracefully, and it helps you to reduce cost when the need for resources is low. Stateful MIGs can't be autoscaled.

Third-party licensing

When you migrate third-party workloads to Google Cloud, you might be able to reduce cost by bringing your own licenses (BYOL). For example, to deploy Microsoft Windows Server VMs, instead of using a premium image that incurs additional cost for the third-party license, you can create and use a custom Windows BYOL image. You then pay only for the VM infrastructure that you use on Google Cloud. This strategy helps you continue to realize value from your existing investments in third-party licenses. If you decide to use the BYOL approach, then the following recommendations might help to reduce cost:

Provision the number of vCPUs that you need independently of memory by using custom machine types. By doing this, you limit the third-party licensing cost to the number of vCPUs that you need.
Reduce the number of vCPUs per core from 2 to 1 by disabling simultaneous multithreading (SMT).

If you deploy a third-party database like Microsoft SQL Server on Compute Engine VMs, then you must consider the license costs for the third-party software. When you use a managed database service like Cloud SQL, the database license costs are included in the charges for the service.

More cost considerations

When you build the architecture for your workload, also consider the general best practices and recommendations that are provided in Google Cloud Well-Architected Framework: Cost optimization.

Operational efficiency

This section describes the factors that you should consider when you use this reference architecture to design and build a regional Google Cloud topology that you can operate efficiently.

VM configuration updates

To update the configuration of the VMs in a MIG (such as the machine type or boot-disk image), you create a new instance template with the required configuration and then apply the new template to the MIG. The MIG updates the VMs by using the update method that you choose: automatic or selective. Choose an appropriate method based on your requirements for availability and operational efficiency. For more information about these MIG update methods, see Apply new VM configurations in a MIG.
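When you apply a new template automatically, the trade-off between availability and update speed is typically controlled by how many extra VMs can be created (max surge) and how many VMs can be unavailable (max unavailable) at a time. The following simplified model plans such an update in batches and reports the peak VM count and the minimum number of serving VMs in each step. It illustrates the trade-off only; it isn't the MIG updater's actual algorithm, and the function name is hypothetical.

```python
def plan_rolling_update(target_size, max_surge, max_unavailable):
    """Simplified model of replacing every VM in a MIG in batches."""
    if max_surge < 0 or max_unavailable < 0 or max_surge + max_unavailable < 1:
        raise ValueError("need max_surge + max_unavailable >= 1")
    batch_size = max_surge + max_unavailable
    remaining_old = target_size
    steps = []
    while remaining_old > 0:
        replaced = min(batch_size, remaining_old)
        surged = min(max_surge, replaced)  # new VMs created alongside old ones
        stopped_early = replaced - surged  # old VMs stopped before replacements are ready
        steps.append({
            "replaced": replaced,
            "peak_vm_count": target_size + surged,
            "min_serving": target_size - stopped_early,
        })
        remaining_old -= replaced
    return steps


for step in plan_rolling_update(6, max_surge=1, max_unavailable=1):
    print(step)  # 3 steps, each with up to 7 VMs and at least 5 serving
```

A larger surge keeps more VMs serving but temporarily costs more. A larger unavailable budget speeds up the update but reduces serving capacity.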

VM images

For your VMs, instead of using Google-provided public images, we recommend that you create and use custom OS images that contain the configurations and software that your applications require. You can group your custom images into a custom image family. An image family always points to the most recent image in that family, so your instance templates and scripts can use that image without you having to update references to a specific image version. You must regularly update your custom images to include the security updates and patches that are provided by the OS vendor.
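An image family lets your templates refer to a stable name instead of a specific image version. In Compute Engine, the family resolves to the newest image in the family that isn't deprecated. The following minimal sketch models that lookup. The image names and dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Image:
    name: str
    family: str
    created: date
    deprecated: bool = False


def resolve_image_family(images, family):
    """Return the newest non-deprecated image in the family."""
    candidates = [img for img in images if img.family == family and not img.deprecated]
    if not candidates:
        raise LookupError(f"no usable image in family {family!r}")
    return max(candidates, key=lambda img: img.created).name


images = [
    Image("web-server-v1", "web-server", date(2025, 1, 10)),
    Image("web-server-v2", "web-server", date(2025, 3, 5)),
    Image("web-server-v3", "web-server", date(2025, 5, 20), deprecated=True),
]
print(resolve_image_family(images, "web-server"))  # web-server-v2
```

Deprecating a faulty newest image makes the family resolve to the previous image. This gives you a simple rollback path without editing your instance templates.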

Deterministic instance templates

If the instance templates that you use for your MIGs include startup scripts to install third-party software, make sure that the scripts explicitly specify software-installation parameters such as the software version. Otherwise, when the MIG creates the VMs, the software that's installed on the VMs might not be consistent. For example, if your instance template includes a startup script to install Apache HTTP Server 2 (the apache2 package), then make sure that the script specifies the exact apache2 version that should be installed, such as version 2.4.53. For more information, see Deterministic instance templates.
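To catch non-deterministic startup scripts before they reach a template, you can lint them for packages that are installed without a pinned version. apt pins a version by using the `package=version` syntax. The following is a minimal, hypothetical checker that handles only simple one-line `apt-get install` or `apt install` commands, optionally prefixed with `sudo`. The version string in the sample script is a placeholder.

```python
import shlex


def unpinned_apt_packages(startup_script):
    """Return apt packages that are installed without an explicit version.

    Handles only one-line apt-get/apt install commands; line continuations
    and other package managers aren't handled.
    """
    unpinned = []
    for line in startup_script.splitlines():
        tokens = shlex.split(line, comments=True)  # drops shell comments
        if tokens and tokens[0] == "sudo":
            tokens = tokens[1:]
        if len(tokens) < 2 or tokens[0] not in ("apt-get", "apt") or "install" not in tokens:
            continue
        for arg in tokens[tokens.index("install") + 1:]:
            if arg.startswith("-"):
                continue  # an option such as -y
            if "=" not in arg:
                unpinned.append(arg)  # no package=version pin
    return unpinned


script = """#!/bin/bash
apt-get update
apt-get install -y apache2
sudo apt-get install -y curl=1.2.3-1 jq
"""
print(unpinned_apt_packages(script))  # ['apache2', 'jq']
```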

More operational considerations

When you build the architecture for your workload, consider the general best practices and recommendations for operational efficiency that are described in Google Cloud Well-Architected Framework: Operational excellence.

Performance optimization

This section describes the factors that you should consider when you use this reference architecture to design and build a regional topology in Google Cloud that meets the performance requirements of your workloads.

Compute performance

Compute Engine offers a wide range of predefined and customizable machine types for the workloads that you run on VMs. Choose an appropriate machine type based on your performance requirements. For more information, see Machine families resource and comparison guide.

VM multithreading

Each virtual CPU (vCPU) that you allocate to a Compute Engine VM is implemented as a single hardware multithread. By default, two vCPUs share a physical CPU core. For applications that involve highly parallel operations or that perform floating point calculations (such as genetic sequence analysis and financial risk modeling), you can improve performance by reducing the number of threads that run on each physical CPU core. For more information, see Set the number of threads per core.

VM multithreading might have licensing implications for some third-party software, like databases. For more information, read the licensing documentation for the third-party software.

Network Service Tiers

Network Service Tiers lets you optimize the network cost and performance of your workloads. You can choose Premium Tier or Standard Tier. Premium Tier delivers traffic on Google's global backbone to achieve minimal packet loss and low latency. Standard Tier delivers traffic using peering, internet service providers (ISPs), or transit networks at an edge point of presence (PoP) that's closest to the region where your Google Cloud workload runs. To optimize performance, we recommend using Premium Tier. To optimize cost, we recommend using Standard Tier.

Network performance

For workloads that need low inter-VM network latency within the application and web tiers, you can create a compact placement policy and apply it to the instance template that the MIG for those tiers uses. When the MIG creates VMs, it places the VMs on physical servers that are close to each other. While a compact placement policy helps improve inter-VM network performance, a spread placement policy can help improve VM availability as described earlier. To achieve an optimal balance between network performance and availability, when you create a compact placement policy, you can specify how far apart the VMs must be placed. For more information, see Placement policies overview.

Compute Engine has a per-VM limit for egress network bandwidth. This limit depends on the VM's machine type and whether traffic is routed through the same VPC network as the source VM. For VMs with certain machine types, to improve network performance, you can get a higher maximum egress bandwidth by enabling Tier_1 networking.

More performance considerations

When you build the architecture for your workload, consider the general best practices and recommendations that are provided in Google Cloud Well-Architected Framework: Performance optimization.

What's next

Learn more about the Google Cloud products used in this reference architecture:
- Cloud Load Balancing overview
- Instance groups
Get started with migrating your workloads to Google Cloud.
Explore and evaluate deployment archetypes that you can choose to build architectures for your cloud workloads.
Review architecture options for designing reliable infrastructure for your workloads in Google Cloud.
For more reference architectures, diagrams, and best practices, explore the Cloud Architecture Center.

Contributors

Authors:

Kumar Dhanagopal | Cross-Product Solution Developer
Samantha He | Technical Writer

Other contributors:

Ben Good | Solutions Architect
Carl Franklin | Director, PSO Enterprise Architecture
Daniel Lees | Cloud Security Architect
Gleb Otochkin | Cloud Advocate, Databases
Mark Schlagenhauf | Technical Writer, Networking
Pawel Wenda | Group Product Manager
Sean Derrington | Group Product Manager, Storage
Sekou Page | Outbound Product Manager
Simon Bennett | Group Product Manager
Steve McGhee | Reliability Advocate
Victor Moreno | Product Manager, Cloud Networking