Configure Google Cloud Armor rate limiting with Envoy

This page shows you how to configure global server-side rate limiting for your service mesh by using Google Cloud Armor. You can use this feature to apply fairshare rate limiting to all traffic arriving at your service, helping you fairly share the available capacity of your services and reducing the risk that malicious or misbehaving clients overload them. For more information about rate limiting, read the rate limiting overview.

Configure Google Kubernetes Engine (GKE) for Envoy

Before you begin

Before you begin, you must enable the following APIs:

  • container.googleapis.com
  • compute.googleapis.com
  • trafficdirector.googleapis.com
  • networkservices.googleapis.com
  • meshconfig.googleapis.com
  • monitoring.googleapis.com

You can enable all of the APIs by using the following Google Cloud CLI command:

gcloud services enable \
    container.googleapis.com \
    compute.googleapis.com \
    trafficdirector.googleapis.com \
    networkservices.googleapis.com \
    meshconfig.googleapis.com \
    monitoring.googleapis.com
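
If you want to confirm that the APIs are active before you continue, you can optionally list the enabled services for your project. This is a quick check rather than part of the procedure; the grep pattern only narrows the output to the services listed previously:

# Optional: confirm that the required services appear in the enabled list.
gcloud services list --enabled | grep -E \
    'container|compute|trafficdirector|networkservices|meshconfig|monitoring'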

Then, create the environment variables that are used in this document:

export PROJECT_ID=PROJECT_ID
export PROJECT_NUMBER="$(gcloud projects describe "${PROJECT_ID}" --format="value(projectNumber)")"
export CLUSTER=CLUSTER
export ZONE=ZONE
export MESH_NAME=MESH_NAME
export MESH_URI=projects/${PROJECT_NUMBER}/locations/global/meshes/${MESH_NAME}

Replace the following variables with information from your project:

  • Replace PROJECT_ID with your project ID
  • Replace ZONE with the zone in which you intend to create your GKE cluster
  • Replace CLUSTER with the name of the cluster
  • Replace MESH_NAME with the name of the mesh
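
The commands in this document also assume that your active gcloud configuration points at the same project. If you aren't sure that it does, you can set it explicitly; this step isn't part of the original procedure:

gcloud config set project "${PROJECT_ID}"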

Create a GKE cluster

  1. Use the following command to create a GKE cluster in the zone that you specified in the previous section:

     gcloud container clusters create "${CLUSTER}" \
         --zone="${ZONE}" \
         --scopes="cloud-platform" \
         --tags="allow-envoy-health-checks" \
         --enable-ip-alias
    
  2. Get the credentials of your new cluster:

     gcloud container clusters get-credentials "${CLUSTER}" \
         --zone="${ZONE}"
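
Before you continue, you can optionally confirm that kubectl now points at the new cluster and that its nodes are ready. These commands are only a sanity check and aren't required:

kubectl config current-context
kubectl get nodes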
    

Enable automatic injection

  1. Use the following command to apply the MutatingWebhookConfiguration resource to your cluster. When a Pod is created, the in-cluster admission controller is invoked, and it instructs the managed sidecar injector to add the Envoy container to the Pod.

    cat <<EOF | kubectl apply -f -
    apiVersion: admissionregistration.k8s.io/v1
    kind: MutatingWebhookConfiguration
    metadata:
     labels:
       app: sidecar-injector
     name: td-mutating-webhook
    webhooks:
    - admissionReviewVersions:
      - v1beta1
      - v1
      clientConfig:
        url: https://meshconfig.googleapis.com/v1internal/projects/${PROJECT_ID}/locations/${ZONE}/clusters/${CLUSTER}/channels/rapid/targets/${MESH_URI}:tdInject
      failurePolicy: Fail
      matchPolicy: Exact
      name: namespace.sidecar-injector.csm.io
      namespaceSelector:
        matchExpressions:
        - key: td-injection
          operator: Exists
      reinvocationPolicy: Never
      rules:
      - apiGroups:
        - ""
        apiVersions:
        - v1
        operations:
        - CREATE
        resources:
        - pods
        scope: '*'
      sideEffects: None
      timeoutSeconds: 30
    EOF
    
  2. Enable sidecar injection for the default namespace. The sidecar injector injects sidecar containers into Pods created in the default namespace.

    kubectl label namespace default td-injection=enabled
    
  3. Save the following GKE config for your service as service_sample.yaml.

    apiVersion: v1
    kind: Service
    metadata:
     name: service-test
     annotations:
       cloud.google.com/neg: '{"exposed_ports":{"80":{"name": "rate-limit-demo-neg"}}}'
    spec:
     ports:
     - port: 80
       name: service-test
       targetPort: 8000
     selector:
       run: app1
     type: ClusterIP
    
    ---
    
    apiVersion: apps/v1
    kind: Deployment
    metadata:
     name: app1
     labels:
       run: app1
    spec:
     replicas: 1
     selector:
       matchLabels:
         run: app1
     template:
       metadata:
         labels:
           run: app1
         annotations:
           cloud.google.com/proxyMetadata: '{"app": "rate-limit-demo"}'
           cloud.google.com/includeInboundPorts: "8000"
           cloud.google.com/sidecarProxyVersion: "1.34.1-gke.1"
       spec:
         containers:
         - image: mendhak/http-https-echo:37
           name: app1
           ports:
           - containerPort: 8000
           env:
           - name: VALIDATION_NONCE
             value: "http"
           - name: HTTP_PORT
             value: "8000"
         securityContext:
           fsGroup: 1337
    
  4. Apply the service sample that you created in the previous step:

    kubectl apply -f service_sample.yaml
    
  5. Save the following GKE config for your client as client_sample.yaml:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
     labels:
       run: client
     name: load-generator
    spec:
     replicas: 1
     selector:
       matchLabels:
         run: client
     template:
       metadata:
         labels:
           run: client
       spec:
         containers:
         - name: load-generator
           image: envoyproxy/nighthawk-dev
           command: ["/bin/sh", "-c"]
           args: ["echo 'Nighthawk client pod is running' && sleep infinity"]
           resources:
             requests:
               cpu: 200m
               memory: 256Mi
             limits:
               cpu: 1
               memory: 512Mi
         securityContext:
           fsGroup: 1337
    
  6. Apply the client sample that you created in the previous step (the optional checks after these steps show one way to confirm that both workloads are running and that the sidecar was injected):

    kubectl apply -f client_sample.yaml
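
Before you move on, you can optionally confirm that the namespace label is in place, that the managed sidecar injector added an Envoy container alongside app1, and that the Nighthawk client is running. These checks aren't part of the procedure, and the exact name of the injected sidecar container can vary; the point is that each app1 Pod should list more than one container:

# Confirm the label that the sidecar injector looks for.
kubectl get namespace default --show-labels

# List the containers in each app1 Pod; expect the app container plus an
# injected Envoy sidecar container.
kubectl get pods -l run=app1 \
    -o jsonpath='{range .items[*]}{.metadata.name}{": "}{.spec.containers[*].name}{"\n"}{end}'

# Confirm that the Nighthawk client Pod is running.
kubectl get pods -l run=client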
    

Set up Cloud Service Mesh for rate limiting

Use the steps in this section to prepare Cloud Service Mesh for rate limiting.

  1. Create the Mesh resource specification and save it in a file called mesh.yaml:

    name: MESH_NAME
    interceptionPort: 15001
    
  2. Create the Mesh resource using the mesh.yaml specification.

      gcloud network-services meshes import "${MESH_NAME}" \
          --source=mesh.yaml \
          --location=global
    
  3. Create a health check:

      gcloud compute health-checks create http rate-limit-demo-hc \
          --use-serving-port
    
  4. Create a firewall rule to allow incoming health check connections to instances in your network.

      gcloud compute firewall-rules create rate-limit-demo-fw-allow-hc \
          --action ALLOW \
          --direction INGRESS \
          --source-ranges 35.191.0.0/16,130.211.0.0/22 \
          --target-tags allow-envoy-health-checks \
          --rules tcp
    
  5. Create a global backend service with a load balancing scheme of INTERNAL_SELF_MANAGED and add the health check.

      gcloud compute backend-services create rate-limit-demo-service \
          --global \
          --health-checks rate-limit-demo-hc \
          --load-balancing-scheme INTERNAL_SELF_MANAGED
    
  6. Add the NEG rate-limit-demo-neg, which GKE created from the cloud.google.com/neg annotation on the Service, to the backend service.

      gcloud compute backend-services add-backend rate-limit-demo-service \
          --global \
          --network-endpoint-group rate-limit-demo-neg \
          --network-endpoint-group-zone "${ZONE}" \
          --balancing-mode RATE \
          --max-rate-per-endpoint 5
    
  7. Create the HTTPRoute specification and save it to a file called http_route.yaml:

    name: rate-limit-demo-http-route
    hostnames:
    - service-test
    - service-test:80
    meshes:
    - projects/PROJECT_ID/locations/global/meshes/MESH_NAME
    rules:
    - action:
       destinations:
       - serviceName: "projects/PROJECT_ID/locations/global/backendServices/rate-limit-demo-service"
    
  8. Create the HTTPRoute resource using the specification in the http_route.yaml file.

      gcloud network-services http-routes import rate-limit-demo-http-route \
          --source=http_route.yaml \
          --location=global
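
At this point, the Mesh, backend service, and HTTPRoute resources all exist. If you want to verify them before you configure rate limiting, the following optional commands describe each resource, including whether the NEG backend was attached:

gcloud network-services meshes describe "${MESH_NAME}" \
    --location=global

gcloud compute backend-services describe rate-limit-demo-service \
    --global

gcloud network-services http-routes describe rate-limit-demo-http-route \
    --location=global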
    

Configure rate limiting with Envoy

The following sections explain how to configure server-side rate limiting for your service mesh. The first section shows you how to set up one server-side global rate limit for all clients, and the second section shows you how to enforce different rate limits for different groups of clients.

Configure server-side global rate limiting

In this example, you create one server-side rate limiting rule that enforces rate limiting on all clients.

  1. In a YAML file called rate-limit-policy.yaml, create a Google Cloud Armor security policy with the type CLOUD_ARMOR_INTERNAL_SERVICE.

    name: "rate-limit-policy"
    type: CLOUD_ARMOR_INTERNAL_SERVICE
    rules:
    - priority: 2147483647
      match:
        config:
          srcIpRanges: ["*"]
        versionedExpr: SRC_IPS_V1
      action: "fairshare"
      rateLimitOptions:
        rateLimitThreshold:
          count: 10000
          intervalSec: 60
        exceedAction: "deny(429)"
        conformAction: "allow"
        enforceOnKey: "ALL"
    
  2. Create the security policy called rate-limit-policy:

      gcloud beta compute security-policies create rate-limit-policy \
          --global \
          --file-name=rate-limit-policy.yaml
    
  3. In a YAML file, create an endpoint policy that references the security policy that you created in the previous step. In this example, the file is called endpoints-policies.yaml.

    name: "rate-limit-ep"
    endpointMatcher:
     metadataLabelMatcher:
       metadataLabelMatchCriteria: MATCH_ALL
       metadataLabels:
       - labelName: app
         labelValue: rate-limit-demo
    type: SIDECAR_PROXY
    securityPolicy: projects/PROJECT_ID/locations/global/securityPolicies/rate-limit-policy
    
  4. Create an endpoint policy named rate-limit-ep:

      gcloud beta network-services endpoint-policies import rate-limit-ep \
          --source=endpoints-policies.yaml \
          --location=global
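
To confirm that both resources exist and that the endpoint policy references the security policy, you can optionally describe them. This is only a verification step:

gcloud beta compute security-policies describe rate-limit-policy

gcloud beta network-services endpoint-policies describe rate-limit-ep \
    --location=global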
    

Configure different server-side rate limits for different groups of clients

In this example, you create different server-side rate limiting rules that enforce different rate limiting thresholds for groups of clients.

  1. Create a Google Cloud Armor security policy of type CLOUD_ARMOR_INTERNAL_SERVICE that contains multiple rate limiting rules, like the one defined in the following file. In this example, the file is called per-client-security-policy.yaml.

    name: "per-client-security-policy"
    type: CLOUD_ARMOR_INTERNAL_SERVICE
    rules:
    - priority: 0
      match:
        expr:
          expression: "request.headers['user'] == 'demo'"
      action: "fairshare"
      rateLimitOptions:
        rateLimitThreshold:
          count: 1000
          intervalSec: 60
        exceedAction: "deny(429)"
        conformAction: "allow"
        enforceOnKey: "ALL"
    - priority: 2147483647
      match:
        config:
          srcIpRanges: ["*"]
        versionedExpr: SRC_IPS_V1
      action: "fairshare"
      rateLimitOptions:
        rateLimitThreshold:
          count: 10000
          intervalSec: 60
        exceedAction: "deny(429)"
        conformAction: "allow"
        enforceOnKey: "ALL"
    

     This policy rate limits requests that contain an HTTP header named user with the value demo when Cloud Service Mesh receives more than 1,000 such requests within a 60-second window. Requests without this header fall through to the lower-priority default rule and are rate limited when Cloud Service Mesh receives more than 10,000 such requests within a 60-second window. For one way to exercise each rule separately, see the example after these steps.

  2. Use the following command to create the policy, which is called per-client-security-policy:

      gcloud beta compute security-policies create per-client-security-policy \
          --global \
          --file-name=per-client-security-policy.yaml
    

  3. Create an endpoint policy that references the security policy that you created in the previous step, like the one defined in the following file. In this example, the file is called per-client-endpoints-policies.yaml.

    name: "rate-limit-ep"
    endpointMatcher:
     metadataLabelMatcher:
       metadataLabelMatchCriteria: MATCH_ALL
       metadataLabels:
       - labelName: app
         labelValue: rate-limit-demo
    type: SIDECAR_PROXY
    securityPolicy: projects/PROJECT_ID/locations/global/securityPolicies/per-client-security-policy
    

  4. Use the following command to create an endpoint policy named rate-limit-ep:

      gcloud beta network-services endpoint-policies import rate-limit-ep \
          --source=per-client-endpoints-policies.yaml \
          --location=global
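
To see the two rules behave differently, you can run the same Nighthawk test from the load generator twice: once with the user: demo header, which matches the priority 0 rule, and once without it, which falls through to the default rule. At 60 requests per second (about 3,600 requests per minute), the first run should start receiving 429 responses once the 1,000-requests-per-minute threshold is exceeded, while the second run stays well under the default 10,000-requests-per-minute threshold and should see mostly 2xx responses. This sketch reuses the command from the Validate your setup section later on this page; the --rps, --duration, and --connections values are only suggestions:

# Traffic that matches the priority 0 rule (user: demo header).
kubectl exec -it deploy/load-generator -c load-generator -- \
    nighthawk_client http://service-test \
    --open-loop --no-default-failure-predicates \
    --rps 60 --duration 360 --connections 10 --protocol http1 \
    --request-header user:demo

# Traffic that falls through to the default rule (no user header).
kubectl exec -it deploy/load-generator -c load-generator -- \
    nighthawk_client http://service-test \
    --open-loop --no-default-failure-predicates \
    --rps 60 --duration 360 --connections 10 --protocol http1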
    

Validate your setup

You can use the Nighthawk load testing tool to generate traffic to test whether your rate limiting rules are performing as you expect. Use the following command to generate traffic with Nighthawk:

kubectl exec -it deploy/load-generator -c load-generator -- \
    nighthawk_client http://service-test \
    --open-loop --no-default-failure-predicates \
    --rps 60 \
    --duration 360 \
    --connections 10 \
    --protocol http1 \
    --request-header user:demo

Next, use the following command to enable Envoy debug logs:

kubectl exec -it deploy/app1 -c app1 -- wget -q -O - \
    --post-data="" 'http://localhost:15000/logging?level=debug'
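
You can also optionally inspect the Envoy sidecar's configuration through its admin interface to confirm that the rate limiting configuration was delivered. The config_dump endpoint is a standard Envoy admin endpoint; whether and where your policy name appears in the output depends on how the configuration is delivered, so treat the grep pattern here as a starting point rather than a definitive check:

kubectl exec -it deploy/app1 -c app1 -- wget -q -O - \
    'http://localhost:15000/config_dump' | grep -i -m 5 'rate-limit'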

To view the usage reports Envoy sends to the management server, see Accessing your logs.

You can expect to see the following from your test results:

  • It takes roughly five minutes before rate limiting takes effect.
  • After the initial warm-up period, you see about 15-21 QPS reported by the benchmark.http_2xx counter in the Nighthawk client output. This means that Google Cloud Armor allows about 1,000 requests per minute (1,000 / 60 ≈ 16.7 QPS), consistent with the 1,000-requests-per-minute rule that matches the user: demo header.

To view the effectiveness of your Google Cloud Armor security policy rules, see Viewing the monitoring dashboard.

Disable rate limiting

You can disable rate limiting by using either of the following methods:

  • You can delete the endpoint policies and security policies that you configured with your rate limiting rules.
  • You can detach the security policy from your endpoint policy by updating your endpoint policy to remove the securityPolicy field.

The following sections tell you how to disable rate limiting using each method.

Delete an endpoint policy and a security policy

First, use the following gcloud command to delete your endpoint policy. In both examples on this page, the endpoint policy itself is named rate-limit-ep; only the source files (endpoints-policies.yaml and per-client-endpoints-policies.yaml) differ.

gcloud beta network-services endpoint-policies delete --location=global rate-limit-ep

Then, use the following gcloud command to delete a security policy, replacing per-client-security-policy with the name of your security policy. If you followed the first example on this page, the security policy is named rate-limit-policy; if you followed the second example, it is named per-client-security-policy.

gcloud beta compute security-policies delete --global per-client-security-policy

Detach a security policy from your endpoint policy

First, update your endpoints-policies.yaml file to remove the securityPolicy field:

name: "rate-limit-ep"
endpointMatcher:
  metadataLabelMatcher:
    metadataLabelMatchCriteria: MATCH_ALL
    metadataLabels:
    - labelName: app
      labelValue: rate-limit-demo
type: SIDECAR_PROXY

Then, use the following command to update the endpoint policy named rate-limit-ep with the changes to the endpoints-policies.yaml file:

gcloud beta network-services endpoint-policies import rate-limit-ep \
    --source=endpoints-policies.yaml \
    --location=global
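
If you want to confirm that the security policy is no longer attached, you can optionally describe the endpoint policy again and check that the output no longer contains a securityPolicy field:

gcloud beta network-services endpoint-policies describe rate-limit-ep \
    --location=global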