Multi-zone deployment

Google Distributed Cloud (GDC) air-gapped provides deployment capabilities to ensure high availability and disaster recovery. This functionality is referred to as Multi-zone on this page.

Multi-zone lets you run disconnected, mission-critical workloads on GDC by delivering hyperscale, cloud-like high availability (HA) and disaster recovery (DR) capabilities as a service. GDC provides managed and infrastructure services that are resilient to local failures. Multi-zone is enabled for only a subset of services, which you use through the APIs described in the documentation for each of those services.

The following resources can be managed on the new global API that spans across all zones in GDC air-gapped 1.13:

GDC Multi-zone provides global resource management capabilities to simplify managing resources across GDC zones. It gives you a comprehensive, cross-zone view of the GDC resources and services that support Multi-zone.

Multi-zone provides continuity of control plane services, which helps you meet your disaster recovery goals. In the event of a zonal disaster, the critical functionality required to recover an organization and its associated services is already present in the secondary zone.

Minimum requirements

Multi-zone requires the following:

  • GDC air-gapped hardware 3.0 or later
  • GDC air-gapped software 1.13.0 or later
  • Multi-path network connectivity
  • Hardware and software symmetry for each zone, including one org admin cluster per zone
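
As an illustration only, the version requirements above can be checked numerically rather than as strings, because a plain string comparison would rank 1.9 above 1.13. The following is a minimal sketch; the function names and the boolean parameters are hypothetical and are not part of any GDC tooling.

```python
# Minimum versions taken from the list above.
MIN_HARDWARE = (3, 0, 0)
MIN_SOFTWARE = (1, 13, 0)


def parse_version(text: str, width: int = 3) -> tuple[int, ...]:
    """Turn '1.13.0' into (1, 13, 0), padding short versions with zeros."""
    parts = [int(part) for part in text.split(".")]
    parts += [0] * (width - len(parts))
    return tuple(parts)


def meets_minimums(hardware: str, software: str,
                   multipath: bool, symmetric_zones: bool) -> bool:
    """Hypothetical check of the four Multi-zone prerequisites."""
    return (
        parse_version(hardware) >= MIN_HARDWARE
        and parse_version(software) >= MIN_SOFTWARE
        and multipath
        and symmetric_zones
    )
```

For example, `meets_minimums("3.0", "1.9.2", True, True)` is false because 1.9.2 is earlier than 1.13.0 when compared component by component.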

What is a zone?

Each zone is an independent disaster domain. It is a full implementation of GDC air-gapped – a hardware and software solution that does not require connectivity to Google Cloud at any time. A zone manages infrastructure, services, APIs, and tooling that use a local control plane.

A GDC air-gapped zone is composed of four layers:

  • Hardware: The underlying hardware and rack design defined by Google.

  • Infrastructure: Manages the hardware and provides abstractions that allow the software layers to run without reference to hardware-specific configurations.

  • Service Platform: A framework for building services on Distributed Cloud that provides consistency among managed services and marketplace services.

  • Managed and Marketplace Services: Customer-facing cloud services running on Distributed Cloud.

A group of connected air-gapped zones is considered a deployment. To deploy fault-tolerant applications with high availability that help protect against unexpected failures, you must deploy your applications across multiple zones in a deployment.

What is a region?

A region is a grouping of zones in a deployment within our defined latency requirements. A zone with no peers close enough is considered its own region. Zones in a region should be separated by at least 10 km to ensure they are separate disaster domains.

Regions can be hundreds of kilometers apart and support asynchronous replication between one another. Examples of asynchronous replication include:

  • Asynchronous PostgreSQL database transactions
  • Asynchronous block storage
  • Asynchronous object storage

What is a deployment?

Zones with direct network connectivity, regardless of distance or latency, belong to a deployment. You're limited to a maximum of six zones per deployment. Each zone is considered a single disaster domain within a deployment.

Each deployment can consist of multiple zones organized into interconnected regions. For example, a deployment might have two regions, one in the US state of Virginia and one in Amsterdam, Netherlands, each with three zones:

  • GDC Region 1 (Virginia)

    • Zone 1 (us-virginia1-a)
    • Zone 2 (us-virginia1-b)
    • Zone 3 (us-virginia1-c)

  • GDC Region 2 (Netherlands)

    • Zone 1 (eu-ams1-a)
    • Zone 2 (eu-ams1-b)
    • Zone 3 (eu-ams1-c)

The following diagram shows an example GDC deployment.

A deployment consists of zones that are grouped across regions.

A deployment can have one to six zones and one or two operation centers.

Deployments offer the following recovery strategies, regardless of region configuration:

  • For deployments with two zones, recovery must be triggered manually.
  • For deployments with three or more zones, recovery can be triggered automatically.
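
The zone limit and the recovery rules above can be summarized in a small model. This is a sketch for illustration, not GDC code: the `Deployment` class and its method names are hypothetical, and the text does not say how recovery works for a single-zone deployment, so that case is reported as unspecified.

```python
from dataclasses import dataclass, field

MAX_ZONES_PER_DEPLOYMENT = 6  # limit stated in the text


@dataclass
class Deployment:
    """Hypothetical model: maps a region name to its zone names."""
    regions: dict[str, list[str]] = field(default_factory=dict)

    def zones(self) -> list[str]:
        """Flatten all regions into a single list of zone names."""
        return [zone for zones in self.regions.values() for zone in zones]

    def validate(self) -> None:
        """Raise ValueError if the zone count is outside 1 to 6."""
        count = len(self.zones())
        if not 1 <= count <= MAX_ZONES_PER_DEPLOYMENT:
            raise ValueError(
                f"a deployment needs 1 to {MAX_ZONES_PER_DEPLOYMENT} zones, got {count}"
            )

    def recovery_trigger(self) -> str:
        """Return how recovery is triggered, based only on zone count."""
        count = len(self.zones())
        if count >= 3:
            return "automatic"
        if count == 2:
            return "manual"
        return "unspecified"  # assumption: the text does not cover one zone


example = Deployment({
    "virginia": ["us-virginia1-a", "us-virginia1-b", "us-virginia1-c"],
    "netherlands": ["eu-ams1-a", "eu-ams1-b", "eu-ams1-c"],
})
example.validate()
print(example.recovery_trigger())
```

The example deployment mirrors the Virginia and Netherlands layout shown earlier and, with six zones, falls under the automatic recovery rule.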

Reach out to your operator for more information.

Zonal resources

Zonal resources operate within a single zone. Zonal outages can affect some or all of the resources in that zone. An example of a zonal resource is a virtual machine (VM) instance that resides within a specific zone.

Global resources

Global resources are resources that are redundantly deployed across zones within and across the regions of a deployment, such as organizations. This gives them higher availability relative to zonal resources.

For every organization, there is a global API and a zonal API.

Disaster domains

A disaster domain represents a collection of buildings that might be impacted at the same time because of their physical proximity. It is a durability-related construct used to simplify the requirements for zone separation. Usually, a single disaster domain corresponds to a single campus and is often referred to as a failure domain. Typically, to be considered separate disaster domains, locations must be at least 10 km apart.
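
As a rough planning aid, the 10 km guideline can be checked from site coordinates. The following sketch uses the haversine great-circle formula, which is a choice made here for illustration and not something the text prescribes; the function names and the coordinate inputs are hypothetical.

```python
import math

MIN_SEPARATION_KM = 10.0  # typical minimum separation from the text
EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def great_circle_km(lat1: float, lon1: float,
                    lat2: float, lon2: float) -> float:
    """Haversine distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def are_separate_disaster_domains(site_a: tuple[float, float],
                                  site_b: tuple[float, float]) -> bool:
    """True if two (lat, lon) sites are at least MIN_SEPARATION_KM apart."""
    return great_circle_km(*site_a, *site_b) >= MIN_SEPARATION_KM
```

Straight-line distance is only a first approximation. Whether two sites really are separate disaster domains also depends on factors this sketch ignores, such as shared power or flood plains.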

In some GDC air-gapped deployments, Google does not own the facilities, but instead works with colocation vendors whose data centers provide access to robust infrastructure, redundant power, and high-speed connectivity. This approach supports optimal performance and uptime for applications and services, based on Google's strategy and best practices for HA and DR.

Global and zonal APIs

GDC air-gapped offers two levels of management plane APIs to create and manage both global and zonal resources: global APIs and zonal APIs.

Both global and zonal APIs are Kubernetes declarative APIs served at different endpoints, and GDC resources are represented as Kubernetes custom resources in the API servers. The global API servers share a single etcd cluster distributed across zones to provide strong consistency with fault tolerance, at the cost of higher latency and fewer write queries per second (QPS) compared to the zonal API servers. In every organization, a zonal management API server provides the zonal API for administrators and developers to manage zonal resources, and a global management API server provides the global API to manage multi-zone resources.

For more information on APIs in GDC, see the APIs overview.

Organizations

An organization defines a security boundary that encloses infrastructure resources to be administered together. All organizations in GDC air-gapped 1.13 and later running on Hardware 3.0 provide both a global API and zonal API to allow for the creation of both global and zonal resources within the organization. At this time, project and service resources that reside in your organization's zones must be updated in each zone independently.

When creating a global organization, the operator can configure your zones and zonal settings, such as storage and workload servers.

Contact your operator for more information.

Networking

Multi-zone provides Anycast networking services to serve your managed zones from multiple locations around the world for high availability. Likewise, Data Center Interconnection (DCI) options are implemented as a full mesh to interconnect multiple GDC air-gapped zones over diverse geographic locations. This enables GDC to deliver multi-zone disaster protection with site diversity while accommodating the requirement for complete disconnection from any Google infrastructure.

Anycast services are represented by unique /32 IPv4 prefixes, which are advertised to customer facilities using Border Gateway Protocol (BGP), ensuring reachability from any connected location. While each Anycast service is accessible from all zones within the GDC air-gapped network, the actual endpoint to which traffic is directed depends on factors such as proximity and zone preference based on your custom routing policy.

Traffic delivery is optimized by routing it to the nearest available service instance, always within the same zone as the customer connection. This not only reduces latency but also enhances the overall performance and responsiveness of the service. For example, if an Anycast service is deployed across zone 1, zone 2, and zone 3, a customer request originating from zone 2 would typically be routed to the service instance within zone 2, as it is the closest and, therefore, most efficient option.

While the Anycast range is globally accessible, it is only advertised to customers from the specific zones where the service is actively deployed. This means a service deployed only in zone 1 is available to customers connected to zone 1 and not to those connected to other zones.
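
One way to read the two rules above (traffic stays in the customer's zone, and a service is reachable only from zones where it is deployed) is sketched below. This is an interpretation for illustration, not the actual routing logic, which is driven by BGP and your routing policy; the function name and zone labels are hypothetical.

```python
from typing import Optional


def serving_zone(deployed_zones: set[str], customer_zone: str) -> Optional[str]:
    """Return the zone that serves a customer, or None if unreachable.

    Assumes traffic always stays within the customer's own zone and that
    the Anycast prefix is only offered from zones where the service runs.
    """
    return customer_zone if customer_zone in deployed_zones else None


# A service in zones 1 to 3 serves a zone 2 customer from zone 2.
print(serving_zone({"zone1", "zone2", "zone3"}, "zone2"))
# A service only in zone 1 is not reachable for a zone 2 customer.
print(serving_zone({"zone1"}, "zone2"))
```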

Contact your operator for more information.

Latency requirements

To ensure that you can plan GDC multi-zone deployments that will support synchronous services when they're released, we provide maximum supported latency for zones within a region. This approach lets you confidently choose GDC air-gapped locations knowing whether those zones will be in the same region and, therefore, support synchronous services when available.

Multi-zone latency requirements.

The maximum supported latency is less than 1 ms round-trip time (RTT) at the physical layer between any two zones in a region. Because measuring latency at the physical layer requires specialized equipment that is not available in most instances, you can approximate it by measuring the fiber length between two zones. As long as neither the primary nor the secondary path fiber length exceeds 100 km when using modern network hardware, the maximum physical RTT latency should not exceed 1 ms, even in a failure scenario. For example, in a three-zone ring topology where each zone has one connection to each adjacent zone only, the primary path fiber distance cannot exceed 50 km, which keeps the secondary path, routed through the third zone, within 100 km.
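
The relationship between fiber length and RTT can be estimated from the speed of light in glass. The sketch below assumes a refractive index of about 1.468, a typical figure for single-mode fiber that is not taken from this page, and it ignores any delay added by network hardware. The function names are hypothetical.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458  # speed of light in vacuum
FIBER_REFRACTIVE_INDEX = 1.468  # assumption: typical single-mode fiber


def fiber_rtt_ms(path_km: float,
                 refractive_index: float = FIBER_REFRACTIVE_INDEX) -> float:
    """Estimate round-trip propagation delay in ms over a fiber path."""
    one_way_s = path_km * refractive_index / SPEED_OF_LIGHT_KM_PER_S
    return 2 * one_way_s * 1000


def ring_secondary_path_km(hop_a_km: float, hop_b_km: float) -> float:
    """In a three-zone ring, the secondary path crosses the other two links."""
    return hop_a_km + hop_b_km


# Two 50 km links form a 100 km secondary path, just under 1 ms RTT.
secondary_km = ring_secondary_path_km(50, 50)
print(f"{secondary_km} km -> {fiber_rtt_ms(secondary_km):.2f} ms RTT")
```

Under these assumptions, a 100 km path gives roughly 0.98 ms of propagation RTT, which is consistent with the 100 km limit stated above.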