Advanced Load Balancing on GKE Clusters

This page shows you how to configure advanced load balancing on GKE clusters for managed Cloud Service Mesh (TD) users using the Kubernetes API. For the corresponding user guide to configure advanced load balancing using the Google Cloud API, see Set up advanced load balancing.

You can use advanced load balancing to:

  • Keep traffic to a service zonal until local capacity is exhausted.
  • Send traffic to the service in a primary location, with failover to a secondary location when enough endpoints in the primary location become unhealthy.
  • Control when failover happens, based on the percentage of healthy hosts.

Limitations

  • General limitations around using advanced load balancing on Google Cloud apply.
  • This feature is available only to managed Cloud Service Mesh users who use Traffic Director as the control plane, and it requires data plane version 1.19.10-asm.22 or later.
  • Not all the fields in GCPTrafficDistributionPolicy and GCPBackendPolicy are supported for managed Cloud Service Mesh (TD). The supported fields are as follows:
    • GCPTrafficDistributionPolicy
      • ServiceLbAlgorithm
      • AutoCapacityDrain
      • FailoverConfig
    • GCPBackendPolicy
      • MaxRatePerEndpoint
      • BackendPreference
  • Advanced load balancing can only be applied to Kubernetes services backed by workloads running on Google Cloud. External services or workloads (like ServiceEntry) are not supported.
  • The load balancing policies can only be applied to individual Kubernetes services. Namespace-wide/mesh-wide load balancing policies are not supported.
  • Only QPS capacity is supported.
  • Only GKE versions 1.31.1 and later are supported (a quick version check follows this list).
  • Advanced load balancing policies must be applied only to services that serve mesh traffic exclusively. They must not be applied to services that serve as GKE Gateway backends. The traffic behavior is undefined when an advanced load balancing policy targets a Kubernetes service that serves both mesh traffic and traffic from a GKE Gateway.
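
To check the GKE version limitation, you can list your clusters' versions with gcloud. This is a convenience sketch, assuming gcloud is configured for your project:

gcloud container clusters list \
    --format="table(name,location,currentMasterVersion,currentNodeVersion)"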

Configure advanced load balancing

You can use the following custom resources to configure advanced load balancing on GKE. You can find the detailed resource definitions in the gke-gateway-api repo.

GCPTrafficDistributionPolicy

GCPTrafficDistributionPolicy configures the service-level load balancing policy for Kubernetes services. It lets you:

  • Choose a custom load balancing algorithm (serviceLbAlgorithm).
  • Enable automatic capacity draining (autoCapacityDrain).
  • Tune failover behavior (failoverConfig).

If multiple GCPTrafficDistributionPolicies target the same service, the oldest policy will be enforced.
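
For illustration only, a minimal GCPTrafficDistributionPolicy combining supported fields might look like the following sketch; the policy and service names are placeholders, and the threshold value is arbitrary:

apiVersion: networking.gke.io/v1
kind: GCPTrafficDistributionPolicy
metadata:
  name: example-tdp        # placeholder name
  namespace: example
spec:
  targetRefs:
  - kind: Service
    group: ""
    name: example-svc      # the Kubernetes Service this policy applies to
  default:
    serviceLbAlgorithm: WATERFALL_BY_ZONE  # keep traffic zonal until capacity is reached
    failoverConfig:
      failoverHealthThreshold: 70          # fail over when fewer than 70% of primary hosts are healthy
    # autoCapacityDrain is also supported; see the CRD in the gke-gateway-api repo for its schema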

GCPBackendPolicy

GCPBackendPolicy configures properties of service backends that affect load balancing behavior, including:

  • Backend capacity (maxRatePerEndpoint).
  • Backend preference (backendPreference).

If multiple GCPBackendPolicies target the same service in a cluster, the oldest policy will be enforced.
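
For illustration only, a minimal GCPBackendPolicy using both supported fields might look like the following sketch; the policy and service names are placeholders:

apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: example-policy     # placeholder name
  namespace: example
spec:
  targetRef:
    kind: Service
    group: ""
    name: example-svc      # the Kubernetes Service whose backends this policy configures
  default:
    maxRatePerEndpoint: 10       # QPS capacity per backend pod (only QPS capacity is supported)
    backendPreference: PREFERRED # treat these backends as preferred until capacity is reached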

Policy status

Both GCPTrafficDistributionPolicy and GCPBackendPolicy have a status field indicating the attachment status of the policy.

For example, running kubectl describe gcpbackendpolicies example-policy -n example produces output similar to the following:

...
Status:
  Ancestors:
    Ancestor Ref:
      Group:
      Kind:       Service
      Name:       example-svc
      Namespace:  example
    Conditions:
      Last Transition Time:  2024-10-13T01:15:03Z
      Message:
      Observed Generation:   1
      Reason:                Attached
      Status:                True
      Type:                  Attached
    Controller Name:         gsmconfig.gke.io/controller
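
You can inspect a GCPTrafficDistributionPolicy the same way; for example, for a hypothetical policy named example-tdp:

kubectl describe gcptrafficdistributionpolicies example-tdp -n example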

Preliminary setup

Before you can complete this guide, you must provision Cloud Service Mesh on a GKE cluster.

  1. Check that the CRDs are installed:

    kubectl get crd
    

    The output is similar to:

    ...
    gcptrafficdistributionpolicies.networking.gke.io   2024-07-18T21:50:12Z
    gcpbackendpolicies.networking.gke.io               2024-07-18T21:50:12Z
    ...
    
  2. Install the GCPBackendPolicy CRD if it's not already installed:

    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/gke-gateway-api/refs/heads/main/config/crd/networking.gke.io_gcpbackendpolicies.yaml
    
  3. Install the GCPTrafficDistributionPolicy CRD if it's not already installed:

    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/gke-gateway-api/refs/heads/main/config/crd/networking.gke.io_gcptrafficdistributionpolicies.yaml
    
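  4. Optionally, confirm that both CRDs are now installed by requesting them by name:

    kubectl get crd gcpbackendpolicies.networking.gke.io gcptrafficdistributionpolicies.networking.gke.io
    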

The example policies in this user guide target Service foo in namespace foo for demonstration purposes. You can run the following command to create the test service and namespace, or if you prefer, you can use your own service and namespace:

kubectl apply -f - <<EOF
kind: Namespace
apiVersion: v1
metadata:
  name: foo
  labels:
    istio-injection: enabled
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: foo
  namespace: foo
spec:
  replicas: 2
  selector:
    matchLabels:
      app: test-backend
  template:
    metadata:
      labels:
        app: test-backend
    spec:
      containers:
      - name: whereami
        image: gcr.io/google-samples/whereami:v1.2.23
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: foo
  namespace: foo
spec:
  selector:
    app: test-backend
  ports:
  - port: 8080
    targetPort: 8080
EOF
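
Optionally, verify that the example Deployment and Service are ready before applying any policies:

kubectl get deployment,service -n foo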

Configuring the load balancing algorithm

By default, traffic to a service is evenly distributed to every single healthy service backend in a Cloud Service Mesh service mesh. You can create the following GCPTrafficDistributionPolicy so that traffic is distributed to the closest zone up to the backend capacity:

kubectl apply -f - <<EOF
apiVersion: networking.gke.io/v1
kind: GCPTrafficDistributionPolicy
metadata:
  name: lb-policy
  namespace: foo
spec:
  targetRefs:
  - kind: Service
    group: ""
    name: foo
  default:
    serviceLbAlgorithm: WATERFALL_BY_ZONE
EOF
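
You can confirm that the policy attached successfully by inspecting its status, as described in the Policy status section:

kubectl describe gcptrafficdistributionpolicies lb-policy -n foo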

By default, service backends are treated as if they have infinite capacity. When there are enough healthy hosts in the local or closest zone, traffic is never distributed outside that zone for a given client locality. You can optionally configure the capacity of your service backends (in queries per second) using GCPBackendPolicy so that a single zone doesn't get overloaded.

kubectl apply -f - <<EOF
apiVersion: networking.gke.io/v1
kind: GCPBackendPolicy
metadata:
  name: backend-policy
  namespace: foo
spec:
  targetRef:
    kind: Service
    group: ""
    name: foo
  default:
    maxRatePerEndpoint: 5
EOF

Tuning failover behavior

By default, failover isn't triggered as long as a sufficient percentage of hosts are healthy in the primary backends. For more information on primary backends and other terminology, see the Advanced load balancing overview. GCPTrafficDistributionPolicy lets you configure the healthy-host percentage threshold below which traffic is shifted away from primary backends to failover backends; a larger threshold triggers failover sooner. For example, if you want failover to be triggered as soon as the percentage of healthy hosts drops below 90% in the primary backends, you can configure the following GCPTrafficDistributionPolicy:

kubectl apply -f - <<EOF
apiVersion: networking.gke.io/v1
kind: GCPTrafficDistributionPolicy
metadata:
  name: lb-policy
  namespace: foo
spec:
  targetRefs:
  - kind: Service
    group: ""
    name: foo
  default:
    failoverConfig:
      failoverHealthThreshold: 90
EOF

Configuring advanced load balancing in a multi-cluster service mesh

GCPTrafficDistributionPolicy and GCPBackendPolicy are applied at different scopes in a multi-cluster service mesh.

When a GCPTrafficDistributionPolicy targets a multi-cluster service, it defines the service-level load balancing behavior across all clusters. Only one GCPTrafficDistributionPolicy needs to be created for a particular multi-cluster service. If you are using the Istio API to configure your service mesh, you can create the GCPTrafficDistributionPolicy in any cluster in the fleet. You can check whether a policy conflicts with another policy by inspecting its policy status.
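
For example, you can describe a policy and inspect the conditions in its status; a successfully enforced policy reports an Attached condition like the one shown in the Policy status section:

kubectl describe gcptrafficdistributionpolicies foo-traffic-distribution-policy -n foo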

When a GCPBackendPolicy targets a multi-cluster service, it defines backend-level settings (for example, per-pod capacity) for the backend pods selected by its targeted service in its local cluster. For the same multi-cluster service, it's possible to define different backend-level settings in different clusters.

In the following example, a GCPTrafficDistributionPolicy is created in cluster A to define the load balancing algorithm used across the fleet, while a GCPBackendPolicy is created in each cluster. Both GCPBackendPolicies configure a capacity of 10 qps per pod for the backend pods in their local cluster, and the GCPBackendPolicy in cluster A additionally configures the backend pods in cluster A as the preferred backend.

Together, these policies configure the load balancing behavior for in-mesh traffic sent to Service foo:

  • Traffic from anywhere prefers backends in cluster A until the backend pods in cluster A reach their capacity of 10 qps per pod.
    • This behavior is mainly defined by the GCPBackendPolicy in cluster A, which sets backendPreference to PREFERRED.
  • Traffic exceeding the configured capacity of the backends in cluster A is routed to cluster B using the WATERFALL_BY_ZONE algorithm. For a more detailed explanation of preferred backends, see the Advanced load balancing overview.
    • This behavior is mainly defined by the GCPTrafficDistributionPolicy in cluster A, which defines the algorithm, and the GCPBackendPolicies in both clusters A and B, which define the backend capacity.

Figure: Advanced load balancing in a multi-cluster service mesh

In Istio, regular Kubernetes services implicitly become multi-cluster services when there are multiple clusters in the service mesh and the service is created across cluster boundaries. While the following GCPTrafficDistributionPolicy targets the regular Kubernetes service foo, it applies to the multi-cluster service foo, consisting of the corresponding workloads in both clusters.

  1. Create the GCPTrafficDistributionPolicy for cluster A:

    kubectl apply --context cluster-a-context -f - <<EOF
    kind: GCPTrafficDistributionPolicy
    apiVersion: networking.gke.io/v1
    metadata:
      name: foo-traffic-distribution-policy
      namespace: foo
    spec:
      targetRefs:
      - kind: Service
        group: ""
        name: foo
      default:
        serviceLbAlgorithm: WATERFALL_BY_ZONE
    EOF
    
  2. Create the GCPBackendPolicy for cluster A:

    kubectl apply --context cluster-a-context -f - <<EOF
    kind: GCPBackendPolicy
    apiVersion: networking.gke.io/v1
    metadata:
      name: foo-backend-policy
      namespace: foo
    spec:
      default:
        maxRatePerEndpoint: 10
        backendPreference: PREFERRED
      targetRef:
        group: ""
        kind: Service
        name: foo
    EOF
    
  3. Create the GCPBackendPolicy for cluster B:

    kubectl apply --context cluster-b-context -f - <<EOF
    kind: GCPBackendPolicy
    apiVersion: networking.gke.io/v1
    metadata:
      name: foo-backend-policy
      namespace: foo
    spec:
      default:
        maxRatePerEndpoint: 10
      targetRef:
        group: ""
        kind: Service
        name: foo
    EOF
    
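  4. Optionally, verify the attachment status of each policy in its own cluster:

    kubectl describe gcptrafficdistributionpolicies foo-traffic-distribution-policy -n foo --context cluster-a-context
    kubectl describe gcpbackendpolicies foo-backend-policy -n foo --context cluster-a-context
    kubectl describe gcpbackendpolicies foo-backend-policy -n foo --context cluster-b-context
    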

What's next