Patterns for scalable and resilient apps

Last reviewed 2024-03-19 UTC

This document introduces some patterns and practices for creating apps that are resilient and scalable, two essential goals of many modern architecture exercises. A well-designed app scales up and down as demand increases and decreases, and is resilient enough to withstand service disruptions. Building and operating apps that meet these requirements requires careful planning and design.

Scalability: Adjusting capacity to meet demand

Scalability is the measure of a system's ability to handle varying amounts of work by adding or removing resources from the system. For example, a scalable web app is one that works well with one user or many users, and that gracefully handles peaks and dips in traffic.

The flexibility to adjust the resources consumed by an app is a key business driver for moving to the cloud. With proper design, you can reduce costs by removing under-utilized resources without compromising performance or user experience. You can similarly maintain a good user experience during periods of high traffic by adding more resources. In this way, your app can consume only the resources necessary to meet demand.

Google Cloud provides products and features to help you build scalable, efficient apps:

  • Compute Engine virtual machines and Google Kubernetes Engine (GKE) clusters integrate with autoscalers that let you grow or shrink resource consumption based on metrics that you define.
  • Google Cloud's serverless platform provides managed compute, database, and other services that scale quickly from zero to high request volumes, and you pay only for what you use.
  • Database products like BigQuery, Spanner, and Bigtable can deliver consistent performance across massive data sizes.
  • Cloud Monitoring provides metrics across your apps and infrastructure, helping you make data-driven scaling decisions.

Resilience: Designing to withstand failures

A resilient app is one that continues to function despite failures of system components. Resilience requires planning at all levels of your architecture. It influences how you lay out your infrastructure and network and how you design your app and data storage. Resilience also extends to people and culture.

Building and operating resilient apps is hard. This is especially true for distributed apps, which might contain multiple layers of infrastructure, networks, and services. Mistakes and outages happen, and improving the resilience of your app is an ongoing journey. With careful planning, you can improve the ability of your app to withstand failures. With proper processes and organizational culture, you can also learn from failures to further increase your app's resilience.

Google Cloud provides tools and services to help you build highly available and resilient apps:

  • Google Cloud services are available in regions and zones across the globe, enabling you to deploy your app to best meet your availability goals.
  • Compute Engine instance groups and GKE clusters can be distributed and managed across the available zones in a region.
  • Compute Engine regional persistent disks are synchronously replicated across zones in a region.
  • Google Cloud provides a range of load-balancing options to manage your app traffic, including global load balancing that can direct traffic to a healthy region closest to your users.
  • Google Cloud's serverless platform includes managed compute and database products that offer built-in redundancy and load balancing.
  • Google Cloud supports CI/CD through native tools and integrations with popular open source technologies, to help automate building and deploying your apps.
  • Cloud Monitoring provides metrics across your apps and infrastructure, helping you make data-driven decisions about the performance and health of your apps.

Drivers and constraints

There are varying requirements and motivations for improving the scalability and resilience of your app. There might also be constraints that limit your ability to meet your scalability and resilience goals. The relative importance of these requirements and constraints varies depending on the type of app, the profile of your users, and the scale and maturity of your organization.


To help prioritize your requirements, consider the drivers from the different parts of your organization.

Business drivers

Common drivers from the business side include the following:

  • Optimize costs and resource consumption.
  • Minimize app downtime.
  • Ensure that user demand can be met during periods of high usage.
  • Improve quality and availability of service.
  • Ensure that user experience and trust are maintained during any outages.
  • Increase flexibility and agility to handle changing market demands.

Development drivers

Common drivers from the development side include the following:

  • Minimize time spent investigating failures.
  • Increase time spent on developing new features.
  • Minimize repetitive toil through automation.
  • Build apps using the latest industry patterns and practices.

Operations drivers

Requirements to consider from the operations side include the following:

  • Reduce the frequency of failures requiring human intervention.
  • Increase the ability to automatically recover from failures.
  • Minimize repetitive toil through automation.
  • Minimize the impact from the failure of any particular component.


Constraints might limit your ability to increase the scalability and resilience of your app. Ensure that your design decisions don't introduce or contribute to these constraints:

  • Dependencies on hardware or software that are difficult to scale.
  • Dependencies on hardware or software that are difficult to operate in a high-availability configuration.
  • Dependencies between apps.
  • Licensing restrictions.
  • Lack of skills or experience in your development and operations teams.
  • Organizational resistance to automation.

Patterns and practices

The remainder of this document defines patterns and practices to help you build resilient and scalable apps. These patterns touch all parts of your app lifecycle, including your infrastructure design, app architecture, storage choices, deployment processes, and organizational culture.

Three themes are evident in the patterns:

  • Automation. Building scalable and resilient apps requires automation. Automating your infrastructure provisioning, testing, and app deployments increases consistency and speed, and minimizes human error.
  • Loose coupling. Treating your system as a collection of loosely coupled, independent components allows flexibility and resilience. Independence covers how you physically distribute your resources and how you architect your app and design your storage.
  • Data-driven design. Collecting metrics to understand the behavior of your app is critical. Decisions about when to scale your app, or whether a particular service is unhealthy, need to be based on data. Metrics and logs should be core features.

Automate your infrastructure provisioning

Create immutable infrastructure through automation to improve the consistency of your environments and increase the success of your deployments.

Treat your infrastructure as code

Infrastructure as code (IaC) is a technique that encourages you to treat your infrastructure provisioning and configuration in the same way you handle application code. Your provisioning and configuration logic is stored in source control so that it's discoverable and can be versioned and audited. Because it's in a code repository, you can take advantage of continuous integration and continuous deployment (CI/CD) pipelines, so that any changes to your configuration can be automatically tested and deployed.

By removing manual steps from your infrastructure provisioning, IaC minimizes human error and improves the consistency and reproducibility of your apps and environments. In this way, adopting IaC increases the resilience of your apps.

Cloud Deployment Manager lets you automate the creation and management of Google Cloud resources with flexible templates. Alternatively, Config Connector lets you manage your resources using Kubernetes techniques and workflows. Google Cloud also has built-in support for popular third-party IaC tools, including Terraform, Chef, and Puppet.

Create immutable infrastructure

Immutable infrastructure is a philosophy that builds on the benefits of infrastructure as code. Immutable infrastructure mandates that resources never be modified after they're deployed. If a virtual machine, Kubernetes cluster, or firewall rule needs to be updated, you can update the configuration for the resource in the source repository. After you've tested and validated the changes, you fully redeploy the resource using the new configuration. In other words, rather than tweaking resources, you re-create them.

Creating immutable infrastructure leads to more predictable deployments and rollbacks. It also mitigates issues that are common in mutable infrastructures, like configuration drift and snowflake servers. In this way, adopting immutable infrastructure further improves the consistency and reliability of your environments.

Design for high availability

Availability is a measure of the fraction of time that a service is usable. Availability is often used as a key indicator of overall service health. Highly available architectures aim to maximize service availability, typically through redundantly deploying components. In simplest terms, achieving high availability typically involves distributing compute resources, load balancing, and replicating data.

Physically distribute resources

Google Cloud services are available in locations across the globe. These locations are divided into regions and zones. How you deploy your app across these regions and zones affects the availability, latency, and other properties of your app. For more information, see best practices for Compute Engine region selection.

Redundancy is the duplication of components of a system in order to increase the overall availability of that system. In Google Cloud, redundancy is typically achieved by deploying your app or service to multiple zones, or even in multiple regions. If a service exists in multiple zones or regions, it can better withstand service disruptions in a particular zone or region. Although Google Cloud makes every effort to prevent such disruptions, certain events are unpredictable and it's best to be prepared.

With Compute Engine managed instance groups, you can distribute virtual machine instances across multiple zones in a region, and you can manage the instances as a logical unit. Google Cloud also offers regional persistent disks to automatically replicate your data to two zones in a region.

You can similarly improve the availability and resilience of your apps deployed on GKE by creating regional clusters. A regional cluster distributes GKE control plane components, nodes, and pods across multiple zones within a region. Because your control plane components are distributed, you can continue to access the cluster's control plane even during an outage involving one or more (but not all) zones.

Favor managed services

Rather than independently installing, supporting, and operating all parts of your application stack, you can use managed services to consume parts of your application stack as services. For example, rather than installing and managing a MySQL database on virtual machines (VMs), you can instead use a MySQL database provided by Cloud SQL. You then get an availability Service Level Agreement (SLA) and can rely on Google Cloud to manage data replication, backups, and the underlying infrastructure. By using managed services, you can spend less time managing infrastructure, and more time on improving the reliability of your app.

Many of Google Cloud's managed compute, database, and storage services offer built-in redundancy, which can help you meet your availability goals. Many of these services offer a regional model, which means the infrastructure that runs your app is located in a specific region and is managed by Google to be redundantly available across all the zones within that region. If a zone becomes unavailable, your app or data automatically serves from another zone in the region.

Certain database and storage services also offer multi-regional availability, which means that the infrastructure that runs your app is located in several regions. Multi-regional services can withstand the loss of an entire region, but typically at the cost of higher latency.

Load-balance at each tier

Load balancing lets you distribute traffic among groups of resources. When you distribute traffic, you help ensure that individual resources don't become overloaded while others sit idle. Most load balancers also provide health-checking features to help ensure that traffic isn't routed to unhealthy or unavailable resources.

Google Cloud offers several load-balancing choices. If your app runs on Compute Engine or GKE, you can choose the most appropriate type of load balancer depending on the type, source, and other aspects of the traffic. For more information, see the load-balancing overview and GKE networking overview.

Alternatively, some Google Cloud-managed services, such as App Engine and Cloud Run, automatically load-balance traffic.

It's common practice to load-balance requests received from external sources, such as from web or mobile clients. However, using load balancers between different services or tiers within your app can also increase resilience and flexibility. Google Cloud provides internal layer 4 and layer 7 load balancing for this purpose.

The following diagram shows an external load balancer distributing global traffic across two regions, us-central1 and asia-east1. It also shows internal load balancing distributing traffic from the web tier to the internal tier within each region.

Distributing global traffic across regions.

Monitor your infrastructure and apps

Before you can decide how to improve the resilience and scalability of your app, you need to understand its behavior. Having access to a comprehensive set of relevant metrics and time series about the performance and health of your app can help you discover potential issues before they cause an outage. They can also help you diagnose and resolve an outage if it does occur. The monitoring distributed systems chapter in the Google SRE book provides a good overview of some approaches to monitoring.

In addition to providing insight into the health of your app, metrics can also be used to control autoscaling behavior for your services.

Cloud Monitoring is Google Cloud's integrated monitoring tool. Cloud Monitoring ingests events, metrics, and metadata, and provides insights through dashboards and alerts. Most Google Cloud services automatically send metrics to Cloud Monitoring, and Google Cloud also supports many third-party sources. Cloud Monitoring can also be used as a backend for popular open source monitoring tools, providing a "single pane of glass" with which to observe your app.

Monitor at all levels

Gathering metrics at various levels or tiers within your architecture provides a holistic picture of your app's health and behavior.

Infrastructure monitoring

Infrastructure-level monitoring provides the baseline health and performance for your app. This approach to monitoring captures information like CPU load, memory usage, and the number of bytes written to disk. These metrics can indicate that a machine is overloaded or is not functioning as expected.

In addition to the metrics collected automatically, Cloud Monitoring provides an agent that can be installed to collect more detailed information from Compute Engine VMs, including from third-party apps running on those machines.

App monitoring

We recommend that you capture app-level metrics. For example, you might want to measure how long it takes to execute a particular query, or how long it takes to perform a related sequence of service calls. You define these app-level metrics yourself. They capture information that the built-in Cloud Monitoring metrics cannot. App-level metrics can capture aggregated conditions that more closely reflect key workflows, and they can reveal problems that low-level infrastructure metrics do not.

We also recommend using OpenTelemetry to capture your app-level metrics. OpenTelemetry provides a single open standard for telemetry data. Use OpenTelemetry to collect and export data from your cloud first applications and infrastructure. You can then monitor and analyze the exported telemetry data.

Service monitoring

For distributed and microservices-driven apps, it's important to monitor the interactions between the different services and components in your apps. These metrics can help you diagnose problems like increased numbers of errors or latency between services.

Istio is an open source tool that provides insights and operational control over your network of microservices. Istio generates detailed telemetry for all service communications, and it can be configured to send the metrics to Cloud Monitoring.

End-to-end monitoring

End-to-end monitoring, also called black-box monitoring, tests externally visible behavior the way a user sees it. This type of monitoring checks whether a user is able to complete critical actions within your defined thresholds. This coarse-grained monitoring can uncover errors or latency that finer-grained monitoring might not, and it reveals availability as perceived by the user.

Expose the health of your apps

A highly available system must have some way of determining which parts of the system are healthy and functioning correctly. If certain resources appear unhealthy, the system can send requests elsewhere. Typically health checks involve pulling data from an endpoint to determine the status or health of a service.

Health checking is a key responsibility of load balancers. When you create a load balancer that is associated with a group of virtual machine instances, you also define a health check. The health check defines how the load balancer communicates with the virtual machines to evaluate whether particular instances should continue to receive traffic. Load-balancer health checks can also be used to autoheal groups of instances such that unhealthy machines are re-created. If you are running on GKE and load-balancing external traffic through an ingress resource, GKE automatically creates appropriate health checks for the load balancer.

Kubernetes has built-in support for liveness and readiness probes. These probes help the Kubernetes orchestrator decide how to manage pods and requests within your cluster. If your app is deployed on Kubernetes, it's a good idea to expose the health of your app to these probes through appropriate endpoints.

Establish key metrics

Monitoring and health checking provide you with metrics on the behavior and status of your app. The next step is to analyze those metrics to determine which are the most descriptive or impactful. The key metrics vary, depending on the platform that the app is deployed on, and on the work that the app is doing.

You're not likely to find just one metric that indicates whether to scale your app, or that a particular service is unhealthy. Often it's a combination of factors that together indicate a certain set of conditions. With Cloud Monitoring, you can create custom metrics to help capture these conditions. The Google SRE book advocates four golden signals for monitoring a user-facing system: latency, traffic, errors, and saturation.

Also consider your tolerance for outliers. Using an average or median value to measure health or performance might not be the best choice, because these measures can hide wide imbalances. It's therefore important to consider the metric distribution; the 99th percentile might be a more informative measure than the average.

Define service level objectives (SLOs)

You can use the metrics that are collected by your monitoring system to define service level objectives (SLOs). SLOs specify a target level of performance or reliability for your service. SLOs are a key pillar of SRE practices and are described in detail in the service level objectives chapter in the SRE book, and also in the implementing SLOs chapter in the SRE workbook.

You can use service monitoring to define SLOs based on the metrics in Cloud Monitoring. You can create alerting policies on SLOs to let you know whether you are in danger of violating an SLO.

Store the metrics

Metrics from your monitoring system are useful in the short term to help with real-time health checks or to investigate recent problems. Cloud Monitoring retains your metrics for several weeks to best meet those use cases.

However, there is also value in storing your monitoring metrics for longer-term analysis. Having access to a historical record can help you adopt a data-driven approach to refining your app architecture. You can use data collected during and after an outage to identify bottlenecks and interdependencies in your apps. You can also use the data to help create and validate meaningful tests.

Historical data can also help validate that your app is supporting business goals during key periods. For example, the data can help you analyze how your app scaled during high-traffic promotional events over the course of the last few quarters or even years.

For details on how to export and store your metrics, see the Cloud Monitoring metric export solution.

Determine scaling profile

You want your app to meet its user experience and performance goals without