Configuring horizontal Pod autoscaling


This page explains how to use horizontal Pod autoscaling to autoscale a Deployment using different types of metrics. A Deployment is a Kubernetes API object that lets you run multiple replicas of Pods that are distributed among the nodes in a cluster. You can use the same guidelines to configure a HorizontalPodAutoscaler for any scalable Deployment object.

Before you begin

Before you start, make sure you have performed the following tasks:

  • Enable the Google Kubernetes Engine API.
  • If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.

API versions for HorizontalPodAutoscaler objects

When you use the Google Cloud console, HorizontalPodAutoscaler objects are created using the autoscaling/v2 API.

When you use kubectl to create or view information about a Horizontal Pod Autoscaler, you can specify either the autoscaling/v1 API or the autoscaling/v2 API.

  • apiVersion: autoscaling/v1 is the default, and lets you autoscale based only on CPU utilization. To autoscale based on other metrics, use apiVersion: autoscaling/v2. The kubectl apply example for CPU-based autoscaling later on this page uses apiVersion: autoscaling/v1.

  • apiVersion: autoscaling/v2 is recommended for creating new HorizontalPodAutoscaler objects. It lets you autoscale based on multiple metrics, including custom or external metrics. All other examples in this page use apiVersion: autoscaling/v2.
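
To illustrate how the two API versions relate, the following Python sketch converts a parsed autoscaling/v1 spec that uses targetCPUUtilizationPercentage into the matching autoscaling/v2 metrics list. The helper name v1_spec_to_v2 is hypothetical and the dictionaries stand in for parsed manifests. The v2 field layout mirrors the autoscaling/v2 manifests used elsewhere on this page.

```python
import copy


def v1_spec_to_v2(v1_spec: dict) -> dict:
    """Return an autoscaling/v2-style spec equivalent to a v1 spec.

    Hypothetical helper for illustration only. It handles the single
    CPU utilization target that autoscaling/v1 supports.
    """
    v2_spec = copy.deepcopy(v1_spec)  # leave the caller's dict untouched
    cpu_target = v2_spec.pop("targetCPUUtilizationPercentage", None)
    if cpu_target is not None:
        v2_spec["metrics"] = [
            {
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {
                        "type": "Utilization",
                        "averageUtilization": cpu_target,
                    },
                },
            }
        ]
    return v2_spec


v1_spec = {
    "scaleTargetRef": {"apiVersion": "apps/v1", "kind": "Deployment", "name": "nginx"},
    "minReplicas": 1,
    "maxReplicas": 10,
    "targetCPUUtilizationPercentage": 50,
}
v2_spec = v1_spec_to_v2(v1_spec)
print(v2_spec["metrics"][0]["resource"]["target"])
```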

To check which API versions are supported, use the kubectl api-versions command.

You can specify which API to use when viewing details about a Horizontal Pod Autoscaler that uses apiVersion: autoscaling/v2.

Create the example Deployment

Before you can create a Horizontal Pod Autoscaler, you must create the workload it monitors. The examples in this page apply different Horizontal Pod Autoscaler configurations to the following nginx Deployment. Separate examples show a Horizontal Pod Autoscaler based on resource utilization, based on a custom or external metric, and based on multiple metrics.

Save the following to a file named nginx.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
        resources:
          # You must specify requests for CPU to autoscale
          # based on CPU utilization
          requests:
            cpu: "250m"

This manifest specifies a value for CPU requests. If you want to autoscale based on a resource's utilization as a percentage, you must specify requests for that resource. If you don't specify requests, you can autoscale based only on the absolute value of the resource's utilization, such as milliCPUs for CPU utilization.
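
As a rough sketch of why requests are needed for percentage targets, the following Python snippet computes utilization as usage divided by the request. The function names are hypothetical, the usage figure is made up, and the parser handles only plain cores and the "m" suffix. The real autoscaler averages this value across Pods.

```python
def parse_cpu_millicores(quantity: str) -> int:
    """Parse a CPU quantity such as "250m" or "0.5" into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def cpu_utilization_percent(usage: str, request: str) -> float:
    """Return CPU usage as a percentage of the requested CPU."""
    request_m = parse_cpu_millicores(request)
    if request_m == 0:
        # Without a request there is no denominator for a percentage.
        raise ValueError("a non-zero CPU request is needed for a percentage")
    return 100 * parse_cpu_millicores(usage) / request_m


# Hypothetical usage of 125m against the 250m request from nginx.yaml.
print(cpu_utilization_percent("125m", "250m"))  # 50.0
```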

To create the Deployment, apply the nginx.yaml manifest:

kubectl apply -f nginx.yaml

The Deployment has spec.replicas set to 3, so three Pods are deployed. You can verify this using the kubectl get deployment nginx command.

Each of the examples in this page applies a different Horizontal Pod Autoscaler to an example nginx Deployment.

Autoscaling based on resource utilization

This example creates a HorizontalPodAutoscaler object to autoscale the nginx Deployment when CPU utilization surpasses 50%, and ensures that there is always a minimum of 1 replica and a maximum of 10 replicas.
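
The following Python sketch shows the core replica calculation described in the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the configured minimum and maximum. It is a simplification: it ignores the tolerance band, stabilization windows, and Pod readiness handling that the real controller applies, and the utilization readings are made up.

```python
import math


def desired_replicas(current_replicas: int, current_value: float,
                     target_value: float, min_replicas: int,
                     max_replicas: int) -> int:
    """Simplified Horizontal Pod Autoscaler replica calculation."""
    raw = math.ceil(current_replicas * current_value / target_value)
    # Keep the result inside the minReplicas..maxReplicas range.
    return max(min_replicas, min(max_replicas, raw))


# 3 replicas at 80% average CPU utilization with a 50% target.
print(desired_replicas(3, 80, 50, 1, 10))   # 5
# A very high reading is capped at maxReplicas.
print(desired_replicas(3, 400, 50, 1, 10))  # 10
# An idle workload is held at minReplicas.
print(desired_replicas(3, 0, 50, 1, 10))    # 1
```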

You can create a Horizontal Pod Autoscaler that targets CPU using the Google Cloud console, the kubectl apply command, or, for average CPU only, the kubectl autoscale command.

Console

  1. Go to the Workloads page in the Google Cloud console.

  2. Click the name of the nginx Deployment.

  3. Click Actions > Autoscale.

  4. Specify the following values:

    • Minimum number of replicas: 1
    • Maximum number of replicas: 10
    • Autoscaling metric: CPU
    • Target: 50
    • Unit: %
  5. Click Done.

  6. Click Autoscale.

kubectl apply

Save the following YAML manifest as a file named nginx-hpa.yaml:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50

To create the HPA, apply the manifest using the following command:

kubectl apply -f nginx-hpa.yaml

kubectl autoscale

To create a HorizontalPodAutoscaler object that only targets average CPU utilization, you can use the kubectl autoscale command:

kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=10

To get a list of Horizontal Pod Autoscalers in the cluster, use the following command:

kubectl get hpa

The output is similar to the following:

NAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   0%/50%    1         10        3          61s

To get details about the Horizontal Pod Autoscaler, you can use the Google Cloud console or the kubectl command.

Console

  1. Go to the Workloads page in the Google Cloud console.

  2. Click the name of the nginx Deployment.

  3. View the Horizontal Pod Autoscaler configuration in the Autoscaler section.

  4. View more details about autoscaling events in the Events tab.

kubectl get

To get details about the Horizontal Pod Autoscaler, you can use kubectl get hpa with the -o yaml flag. The status field contains information about the current number of replicas and any recent autoscaling events.

kubectl get hpa nginx -o yaml

The output is similar to the following:

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"True","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"ScaleDownStabilized","message":"recent
      recommendations were higher than current one, applying the highest recent recommendation"},{"type":"ScalingActive","status":"True","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"ValidMetricFound","message":"the
      HPA was able to successfully calculate a replica count from cpu resource utilization
      (percentage of request)"},{"type":"ScalingLimited","status":"False","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"DesiredWithinRange","message":"the
      desired count is within the acceptable range"}]'
    autoscaling.alpha.kubernetes.io/current-metrics: '[{"type":"Resource","resource":{"name":"cpu","currentAverageUtilization":0,"currentAverageValue":"0"}}]'
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"autoscaling/v1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"nginx","namespace":"default"},"spec":{"maxReplicas":10,"minReplicas":1,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"nginx"},"targetCPUUtilizationPercentage":50}}
  creationTimestamp: "2019-10-30T19:42:43Z"
  name: nginx
  namespace: default
  resourceVersion: "220050"
  selfLink: /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/nginx
  uid: 70d1067d-fb4d-11e9-8b2a-42010a8e013f
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  targetCPUUtilizationPercentage: 50
status:
  currentCPUUtilizationPercentage: 0
  currentReplicas: 3
  desiredReplicas: 3

Before following the remaining examples in this page, delete the HPA:

kubectl delete hpa nginx

When you delete a Horizontal Pod Autoscaler, the number of replicas of the Deployment remains the same. A Deployment does not automatically revert to its state before the Horizontal Pod Autoscaler was applied.

For more information, see Deleting a Horizontal Pod Autoscaler.

Autoscaling based on load balancer traffic

Traffic-based autoscaling is a capability of GKE that integrates traffic utilization signals from load balancers to autoscale Pods.

Using traffic as an autoscaling signal might be helpful since traffic is a leading indicator of load that is complementary to CPU and memory. Built-in integration with GKE ensures that the setup is easy and that autoscaling reacts to traffic spikes quickly to meet demand.

Traffic-based autoscaling is enabled by the Gateway controller and its global traffic management capabilities. To learn more, see Traffic-based autoscaling.

Autoscaling based on load balancer traffic is only available for Gateway workloads.

Requirements

Traffic-based autoscaling has the following requirements:

  • Supported on GKE versions 1.31.1-gke.2008000 and later.
  • Gateway API enabled in your GKE cluster.
  • Supported for traffic that goes through load balancers deployed using the Gateway API and either the gke-l7-global-external-managed, gke-l7-regional-external-managed, gke-l7-rilb, or the gke-l7-gxlb GatewayClass.
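
As a minimal sketch, the following Python snippet checks a cluster version string against the minimum version listed above. It assumes that GKE versions follow the MAJOR.MINOR.PATCH-gke.N pattern and can be compared numerically field by field. The function names and the sample versions passed in are hypothetical.

```python
import re


def parse_gke_version(version: str) -> tuple:
    """Split a version such as "1.31.1-gke.2008000" into integers."""
    match = re.fullmatch(r"(\d+)\.(\d+)\.(\d+)-gke\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized GKE version: {version!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(version: str, minimum: str = "1.31.1-gke.2008000") -> bool:
    """Return True if version is at least the minimum (assumed ordering)."""
    return parse_gke_version(version) >= parse_gke_version(minimum)


print(meets_minimum("1.31.1-gke.2008000"))  # True
print(meets_minimum("1.30.5-gke.1014001"))  # False
```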

Limitations

Traffic-based autoscaling has the following limitations:

  • Not supported by the multi-cluster GatewayClasses (gke-l7-global-external-managed-mc, gke-l7-regional-external-managed-mc, gke-l7-rilb-mc, and gke-l7-gxlb-mc).
  • Not supported for traffic using Services of type LoadBalancer.
  • There must be a clear and isolated relationship between the components involved in traffic-based autoscaling. One Horizontal Pod Autoscaler must be dedicated to scaling a single Deployment (or any scalable resource) exposed by a single Service.
  • After configuring the capacity of your Service using the maxRatePerEndpoint field, allow sufficient time (usually one minute, but potentially up to 15 minutes in large clusters) for the load balancer to be updated with this change, before configuring the Horizontal Pod Autoscaler with traffic-based metrics. This ensures your service won't temporarily experience a situation where your cluster tries to autoscale based on metrics emitted by a load balancer still undergoing configuration.
  • If traffic-based autoscaling is used on a Service served by multiple load balancers (for example, by both an Ingress and a Gateway, or by two Gateways), the Horizontal Pod Autoscaler might consider the highest traffic value from individual load balancers to make scaling decisions, rather than the sum of traffic values from all load balancers.
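
To see why the last limitation matters, the following Python sketch compares the replica count derived from the highest per-load-balancer traffic value with the count derived from the sum. The load balancer names, traffic values, and per-Pod target are all hypothetical, and the arithmetic is a simplification of the real autoscaler.

```python
import math


def replicas_needed(rps: float, target_rps_per_pod: float) -> int:
    """Replicas needed to keep each Pod at or below the target rate."""
    return math.ceil(rps / target_rps_per_pod)


# Hypothetical traffic reported by two load balancers for one Service.
per_load_balancer_rps = {"gateway-a": 20, "gateway-b": 15}
target_rps_per_pod = 7  # hypothetical per-Pod target

from_highest = replicas_needed(max(per_load_balancer_rps.values()), target_rps_per_pod)
from_sum = replicas_needed(sum(per_load_balancer_rps.values()), target_rps_per_pod)
print(from_highest, from_sum)  # 3 5
```

In this made-up case, scaling on the highest value alone would leave the Deployment two replicas short of what the combined traffic calls for.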

Deploy traffic-based autoscaling

The following exercise uses the HorizontalPodAutoscaler to autoscale the store-autoscale Deployment based on the traffic it receives. A Gateway accepts ingress traffic from the internet for the Pods. The autoscaler compares traffic signals from the Gateway with the per-Pod traffic capacity that is configured on the store-autoscale Service resource. By generating traffic to the Gateway, you influence the number of Pods deployed.

The following diagram demonstrates how traffic-based autoscaling works:

Figure: HorizontalPodAutoscaler scaling a Deployment based on traffic.

To deploy traffic-based autoscaling, perform the following steps:

  1. For Standard clusters, confirm that the GatewayClasses are installed in your cluster. For Autopilot clusters, the GatewayClasses are installed by default.

    kubectl get gatewayclass
    

    The output confirms that the GKE GatewayClass resources are ready to use in your cluster:

    NAME                               CONTROLLER                  ACCEPTED   AGE
    gke-l7-global-external-managed     networking.gke.io/gateway   True       16h
    gke-l7-regional-external-managed   networking.gke.io/gateway   True       16h
    gke-l7-gxlb                        networking.gke.io/gateway   True       16h
    gke-l7-rilb                        networking.gke.io/gateway   True       16h
    

    If you don't see this output, enable the Gateway API in your GKE cluster.

  2. Deploy the sample application and Gateway load balancer to your cluster:

    kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/gke-networking-recipes/master/gateway/docs/store-autoscale.yaml
    

    The sample application creates:

    • A Deployment with 2 replicas.
    • A Service with an associated GCPBackendPolicy that sets maxRatePerEndpoint to 10. To learn more about Gateway capabilities, see GatewayClass capabilities.
    • An external Gateway for accessing the application on the internet. To learn more about how to use Gateway load balancers, see Deploying Gateways.
    • An HTTPRoute that matches all traffic and sends it to the store-autoscale Service.

    The Service capacity is a critical element when using traffic-based autoscaling because it determines the amount of per-Pod traffic that triggers an autoscaling event. It is configured using a maxRatePerEndpoint field on a GCPBackendPolicy associated with the Service, which defines the maximum traffic a Service should receive in requests per second, per Pod. Service capacity is specific to your application.

    For more information, see Determining your Service's capacity.

  3. Save the following manifest as hpa.yaml:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: store-autoscale
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: store-autoscale
      minReplicas: 1
      maxReplicas: 10
      metrics:
      - type: Object
        object:
          describedObject:
            kind: Service
            name: store-autoscale
          metric:
            name: "autoscaling.googleapis.com|gclb-capacity-fullness"
          target:
            averageValue: 70
            type: AverageValue
    

    This manifest describes a HorizontalPodAutoscaler with the following properties:

    • minReplicas and maxReplicas: sets the minimum and maximum number of replicas for this Deployment. In this configuration, the number of Pods can scale from 1 to 10 replicas.
    • describedObject.name: store-autoscale: the reference to the store-autoscale Service that defines the traffic capacity.
    • scaleTargetRef.name: store-autoscale: the reference to the store-autoscale Deployment that defines the resource that is scaled by the Horizontal Pod Autoscaler.
    • averageValue: 70: target average value of capacity utilization. This gives the Horizontal Pod Autoscaler a growth margin so that the running Pods can process excess traffic while new Pods are being created.

The Horizontal Pod Autoscaler results in the following traffic behavior:

  • The number of Pods is adjusted between 1 and 10 replicas to achieve 70% of the max rate per endpoint. This results in 7 RPS per Pod when maxRatePerEndpoint=10.
  • At more than 7 RPS per Pod, Pods are scaled up until they've reached their maximum of 10 replicas or until the average traffic is 7 RPS per Pod.
  • If traffic is reduced, Pods scale down to a reasonable rate using the Horizontal Pod Autoscaler algorithm.

You can also deploy a traffic generator to validate traffic-based autoscaling behavior.

At 30 RPS, the Deployment is scaled to 5 replicas so that each replica ideally receives 6 RPS of traffic, which would be 60% utilization per Pod. This is under the 70% target utilization and so the Pods are scaled appropriately. Depending on traffic fluctuations, the number of autoscaled replicas might also fluctuate. For a more detailed description of how the number of replicas is computed, see Autoscaling behavior.
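
The arithmetic in the preceding paragraph can be reproduced with a short Python sketch. It assumes the simplified rule that the autoscaler picks the smallest replica count that keeps per-Pod traffic at or below the target share of maxRatePerEndpoint. The function name is hypothetical, and the real controller also applies tolerances and stabilization.

```python
import math


def replicas_for_traffic(total_rps: float, max_rate_per_endpoint: float,
                         target_fullness_percent: float,
                         min_replicas: int, max_replicas: int) -> int:
    """Simplified replica count for traffic-based autoscaling."""
    # With maxRatePerEndpoint=10 and a 70% target, each Pod should get 7 RPS.
    target_rps_per_pod = max_rate_per_endpoint * target_fullness_percent / 100
    raw = math.ceil(total_rps / target_rps_per_pod)
    return max(min_replicas, min(max_replicas, raw))


replicas = replicas_for_traffic(30, 10, 70, 1, 10)
rps_per_pod = 30 / replicas
fullness_percent = 100 * rps_per_pod / 10
print(replicas, rps_per_pod, fullness_percent)  # 5 6.0 60.0
```

Four replicas would put each Pod at 7.5 RPS, above the 7 RPS target, which is why the count rounds up to five.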

Autoscaling based on a custom or external metric

To create horizontal Pod autoscalers for custom metrics and external metrics, see Optimize Pod autoscaling based on metrics.

Autoscaling based on multiple metrics

This example creates a Horizontal Pod Autoscaler that autoscales based on CPU utilization, memory usage, and, optionally, a custom metric named packets_per_second.

If you followed the previous example and still have a Horizontal Pod Autoscaler named nginx, delete it before following this example.

This example requires apiVersion: autoscaling/v2. For more information about the available APIs, see API versions for HorizontalPodAutoscaler objects.

Save this YAML manifest as a file named nginx-multiple.yaml:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 100Mi
  # Uncomment these lines if you create the custom packets_per_second metric and
  # configure your app to export the metric.
  # - type: Pods
  #   pods:
  #     metric:
  #       name: packets_per_second
  #     target:
  #       type: AverageValue
  #       averageValue: 100

Apply the YAML manifest:

kubectl apply -f nginx-multiple.yaml

When created, the Horizontal Pod Autoscaler monitors the nginx Deployment for average CPU utilization, average memory usage, and (if you uncommented it) the custom packets_per_second metric. The Horizontal Pod Autoscaler autoscales the Deployment based on the metric whose value would produce the largest number of replicas.
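
The selection across metrics can be sketched in Python as follows: each metric yields its own proposed replica count, and the largest proposal wins. The metric readings are hypothetical, the function names are illustrative, and the calculation omits the tolerance and stabilization behavior of the real controller.

```python
import math


def proposal(current_replicas: int, current_value: float, target_value: float) -> int:
    """Replica count that one metric alone would ask for."""
    return math.ceil(current_replicas * current_value / target_value)


def desired_replicas_multi(current_replicas: int, metrics: dict,
                           min_replicas: int, max_replicas: int):
    """Pick the largest per-metric proposal, clamped to the allowed range."""
    proposals = {
        name: proposal(current_replicas, current, target)
        for name, (current, target) in metrics.items()
    }
    chosen = max(proposals.values())
    return max(min_replicas, min(max_replicas, chosen)), proposals


# Hypothetical readings as (current average, target) pairs.
metrics = {
    "cpu_utilization_percent": (60, 50),
    "memory_mib": (250, 100),
}
replicas, proposals = desired_replicas_multi(2, metrics, 1, 10)
print(proposals)  # {'cpu_utilization_percent': 3, 'memory_mib': 5}
print(replicas)   # 5
```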

Configuring Performance HPA profile

The Performance HPA profile speeds up the HPA reaction time and enables quick recalculation of up to 1000 HPA objects. You can enable it for both new and existing clusters. However, it can't be enabled for projects that are in a VPC Service Controls (VPC-SC) perimeter.

Requirements

Performance HPA profile requires a cluster that runs GKE version 1.31.2-gke.1138000 or later.

Enable Performance HPA profile in a new cluster

Autopilot

To create a new Autopilot cluster with Performance HPA profile enabled, use the following command:

gcloud beta container clusters create-auto CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --cluster-version=CLUSTER_VERSION \
    --hpa-profile=performance

Replace:

  • CLUSTER_NAME: The name for the cluster you're creating.
  • LOCATION: Compute zone or region (e.g. us-central1-a or us-central1) for the cluster.
  • PROJECT_ID: Your Google Cloud project ID.
  • CLUSTER_VERSION: GKE version 1.31.2-gke.1138000 or later.

Standard

To create a new Standard cluster with Performance HPA profile enabled, use the following command:

gcloud beta container clusters create CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --cluster-version=CLUSTER_VERSION \
    --hpa-profile=performance

Replace:

  • CLUSTER_NAME: The name for the cluster you're creating.
  • LOCATION: Compute zone or region (e.g. us-central1-a or us-central1) for the cluster.
  • PROJECT_ID: Your Google Cloud project ID.
  • CLUSTER_VERSION: GKE version 1.31.2-gke.1138000 or later.

Enable Performance HPA profile in an existing cluster

To enable the Performance HPA profile in an existing cluster, use the following command:

gcloud beta container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --hpa-profile=performance

Replace:

  • CLUSTER_NAME: The name of the cluster.
  • LOCATION: Compute zone or region (e.g. us-central1-a or us-central1) for the cluster.
  • PROJECT_ID: Your Google Cloud project ID.

Disable Performance HPA profile

To disable Performance HPA profile in a cluster, use the following command:

gcloud beta container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --hpa-profile=none

Replace:

  • CLUSTER_NAME: The name of the cluster.
  • LOCATION: Compute zone or region (e.g. us-central1-a or us-central1) for the cluster.
  • PROJECT_ID: Your Google Cloud project ID.

Viewing details about a Horizontal Pod Autoscaler

To view a Horizontal Pod Autoscaler's configuration and statistics, use the following command:

kubectl describe hpa HPA_NAME

Replace HPA_NAME with the name of your HorizontalPodAutoscaler object.

If the Horizontal Pod Autoscaler uses apiVersion: autoscaling/v2 and is based on multiple metrics, the kubectl describe hpa command only shows the CPU metric. To see all metrics, use the following command instead:

kubectl describe hpa.v2.autoscaling HPA_NAME

Replace HPA_NAME with the name of your HorizontalPodAutoscaler object.

Each Horizontal Pod Autoscaler's current status is shown in the Conditions field, and autoscaling events are listed in the Events field.

The output is similar to the following:

Name:                                                  nginx
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           kubectl.kubernetes.io/last-applied-configuration:
                                                         {"apiVersion":"autoscaling/v2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"nginx","namespace":"default"},"s...
CreationTimestamp:                                     Tue, 05 May 2020 20:07:11 +0000
Reference:                                             Deployment/nginx
Metrics:                                               ( current / target )
  resource memory on pods:                             2220032 / 100Mi
  resource cpu on pods  (as a percentage of request):  0% (0) / 50%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from memory resource
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:                                                <none>

Deleting a Horizontal Pod Autoscaler

You can delete a Horizontal Pod Autoscaler using the Google Cloud console or the kubectl delete command.

Console

To delete the nginx Horizontal Pod Autoscaler:

  1. Go to the Workloads page in the Google Cloud console.

  2. Click the name of the nginx Deployment.

  3. Click Actions > Autoscale.

  4. Click Delete.

kubectl delete

To delete the nginx Horizontal Pod Autoscaler, use the following command:

kubectl delete hpa nginx

When you delete a Horizontal Pod Autoscaler, the Deployment (or other scalable object) remains at its existing scale, and does not revert to the number of replicas in the Deployment's original manifest. To manually scale the Deployment back to three Pods, you can use the kubectl scale command:

kubectl scale deployment nginx --replicas=3

Cleaning up

  1. Delete the Horizontal Pod Autoscaler, if you have not done so:

    kubectl delete hpa nginx
    
  2. Delete the nginx Deployment:

    kubectl delete deployment nginx
    
  3. Optionally, delete the cluster.

Troubleshooting

This section shows troubleshooting steps for problems with horizontal Pod autoscaling.

Horizontal Pod Autoscaler displays an "unable to fetch pod metrics for pod" error

When you set up a Horizontal Pod Autoscaler, you might see warning messages like the following:

unable to fetch pod metrics for pod

It's normal to see this message when the metrics server starts up. However, if you continue to see the warnings and you notice that Pods are not scaling for your workload, ensure you have specified resource requests for each container in your workload. To use resource utilization percentage targets with horizontal Pod autoscaling, you must configure requests for that resource for each container running in each Pod in the workload. Otherwise, the Horizontal Pod Autoscaler cannot perform the calculations it needs to, and takes no action related to that metric.
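
As a minimal sketch of the check described above, the following Python function scans a parsed Pod spec and lists the containers that have no request for a given resource. The function name and the sidecar container are hypothetical. In practice you would load the spec from your own workload manifest.

```python
def containers_missing_request(pod_spec: dict, resource: str = "cpu") -> list:
    """Return the names of containers that lack a request for resource."""
    missing = []
    for container in pod_spec.get("containers", []):
        # Treat an absent or empty resources/requests block as no requests.
        requests = (container.get("resources") or {}).get("requests") or {}
        if resource not in requests:
            missing.append(container["name"])
    return missing


pod_spec = {
    "containers": [
        {"name": "nginx", "resources": {"requests": {"cpu": "250m"}}},
        {"name": "sidecar"},  # hypothetical container without requests
    ]
}
print(containers_missing_request(pod_spec))            # ['sidecar']
print(containers_missing_request(pod_spec, "memory"))  # ['nginx', 'sidecar']
```

An empty result for the resource you target suggests that missing requests are not the cause of the warning.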

Horizontal Pod Autoscaler displays a "multiple services selecting the same target of..." event

A Horizontal Pod Autoscaler displays a "multiple services selecting the same target of <hpa>: <services>" error if it detects that you are using traffic-based autoscaling with multiple Services associated with the target of the Horizontal Pod Autoscaler (typically a Deployment).

Traffic-based autoscaling only supports configurations where exactly one Service is associated with the autoscaled resource. For details, see Autoscaling based on load balancer traffic. The error message lists the Services that were found.

To resolve the issue, ensure only one service is associated with the Horizontal Pod Autoscaler.
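
To find which Services overlap, you can compare each Service's selector with the labels on the Pods of the autoscaled workload. The following Python sketch shows that matching logic on plain dictionaries. The Pod labels, the second Service, and the function name are hypothetical, and real selectors would come from your own manifests.

```python
def services_selecting(pod_labels: dict, services: dict) -> list:
    """Return the names of Services whose selector matches the Pod labels."""
    matches = []
    for name, selector in services.items():
        # A Service without a selector does not select any Pods.
        if selector and all(pod_labels.get(k) == v for k, v in selector.items()):
            matches.append(name)
    return sorted(matches)


pod_labels = {"app": "store-autoscale"}  # hypothetical Pod labels
services = {
    "store-autoscale": {"app": "store-autoscale"},
    "store-internal": {"app": "store-autoscale"},  # hypothetical second Service
    "other": {"app": "other"},
}
print(services_selecting(pod_labels, services))
# ['store-autoscale', 'store-internal']
```

More than one name in the result corresponds to the situation that the error message describes.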

What's next