This page explains how to use horizontal Pod autoscaling to autoscale a Deployment using different types of metrics. A Deployment is a Kubernetes API object that lets you run multiple replicas of Pods that are distributed among the nodes in a cluster.
You can use the same guidelines to configure a HorizontalPodAutoscaler for any scalable Deployment object.
Before you begin
Before you start, make sure you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running gcloud components update.
API versions for HorizontalPodAutoscaler objects
When you use the Google Cloud console, HorizontalPodAutoscaler objects are created using the autoscaling/v2 API.
When you use kubectl to create or view information about a Horizontal Pod Autoscaler, you can specify either the autoscaling/v1 API or the autoscaling/v2 API.
- apiVersion: autoscaling/v1 is the default, and lets you autoscale based only on CPU utilization. To autoscale based on other metrics, using apiVersion: autoscaling/v2 is recommended. The example in Create the example Deployment uses apiVersion: autoscaling/v1.
- apiVersion: autoscaling/v2 is recommended for creating new HorizontalPodAutoscaler objects. It lets you autoscale based on multiple metrics, including custom or external metrics. All other examples in this page use apiVersion: autoscaling/v2.
To check which API versions are supported, use the kubectl api-versions
command.
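As a minimal sketch, the list printed by kubectl api-versions can be narrowed to the autoscaling group with a small filter. The function name autoscaling_versions is hypothetical and used only for illustration. It reads API versions from standard input, one per line.

```shell
# Hypothetical helper: keep only the autoscaling API versions from a list
# of API versions (one per line), as printed by `kubectl api-versions`.
autoscaling_versions() {
  grep '^autoscaling/'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl api-versions | autoscaling_versions
```

If autoscaling/v2 appears in the filtered list, the cluster can serve the v2 manifests used later in this page.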
You can specify which API to use when viewing details about a Horizontal Pod Autoscaler that uses apiVersion: autoscaling/v2.
Create the example Deployment
Before you can create a Horizontal Pod Autoscaler, you must create the workload it monitors. The examples in this page apply different Horizontal Pod Autoscaler configurations to the following nginx Deployment. Separate examples show a Horizontal Pod Autoscaler based on resource utilization, based on a custom or external metric, and based on multiple metrics.
Save the following to a file named nginx.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: default
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.7.9
        ports:
        - containerPort: 80
        resources:
          # You must specify requests for CPU to autoscale
          # based on CPU utilization
          requests:
            cpu: "250m"
This manifest specifies a value for CPU requests. If you want to autoscale based on a resource's utilization as a percentage, you must specify requests for that resource. If you don't specify requests, you can autoscale based only on the absolute value of the resource's utilization, such as milliCPUs for CPU utilization.
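To illustrate why a request is needed for percentage targets, the following sketch computes utilization as usage divided by the request. It is a simplification for a single container, not the autoscaler's actual implementation. The function name cpu_utilization_percent is hypothetical, and both arguments are assumed to be in milliCPU.

```shell
# Sketch: CPU utilization as a percentage of the container's CPU request.
# Both arguments are in milliCPU (for example, 250 for "250m").
# Integer division truncates any fractional percentage.
cpu_utilization_percent() {
  usage_m=$1
  request_m=$2
  if [ "$request_m" -le 0 ]; then
    # Without a request there is no denominator, so no percentage exists.
    echo "no CPU request set; percentage utilization is undefined" >&2
    return 1
  fi
  echo $(( usage_m * 100 / request_m ))
}

cpu_utilization_percent 125 250   # prints 50
```

With the 250m request from nginx.yaml, a container using 125 milliCPU sits exactly at a 50% utilization target.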
To create the Deployment, apply the nginx.yaml manifest:
kubectl apply -f nginx.yaml
The Deployment has spec.replicas set to 3, so three Pods are deployed. You can verify this using the kubectl get deployment nginx command.
Each of the examples in this page applies a different Horizontal Pod Autoscaler to an example nginx Deployment.
Autoscaling based on resource utilization
This example creates a HorizontalPodAutoscaler object to autoscale the nginx Deployment when CPU utilization surpasses 50%, and ensures that there is always a minimum of 1 replica and a maximum of 10 replicas.
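The effect of a 50% target with these bounds can be sketched with the core formula from the Kubernetes documentation: desired replicas = ceil(current replicas × current utilization / target utilization). The function below is a hypothetical illustration. It clamps the result to the minimum and maximum, and deliberately leaves out the tolerance and stabilization behavior of the real autoscaler.

```shell
# Sketch of the core Horizontal Pod Autoscaler calculation:
#   desired = ceil(currentReplicas * currentUtilization / targetUtilization)
# clamped to [minReplicas, maxReplicas]. Tolerance and stabilization
# windows are deliberately left out.
desired_replicas() {
  current_replicas=$1
  current_util=$2
  target_util=$3
  min_replicas=$4
  max_replicas=$5
  # Integer ceiling of (current_replicas * current_util) / target_util.
  desired=$(( (current_replicas * current_util + target_util - 1) / target_util ))
  if [ "$desired" -lt "$min_replicas" ]; then
    desired=$min_replicas
  fi
  if [ "$desired" -gt "$max_replicas" ]; then
    desired=$max_replicas
  fi
  echo "$desired"
}

desired_replicas 3 90 50 1 10   # prints 6
```

In this example, three replicas running at 90% average utilization against a 50% target give ceil(5.4) = 6 replicas.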
You can create a Horizontal Pod Autoscaler that targets CPU using the Google Cloud console, the kubectl apply command, or for average CPU only, the kubectl autoscale command.
Console
Go to the Workloads page in the Google Cloud console.
Click the name of the nginx Deployment.
Click Actions > Autoscale.
Specify the following values:
- Minimum number of replicas: 1
- Maximum number of replicas: 10
- Autoscaling metric: CPU
- Target: 50
- Unit: %
Click Done.
Click Autoscale.
kubectl apply
Save the following YAML manifest as a file named nginx-hpa.yaml:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
To create the HPA, apply the manifest using the following command:
kubectl apply -f nginx-hpa.yaml
kubectl autoscale
To create a HorizontalPodAutoscaler object that only targets average CPU utilization, you can use the kubectl autoscale command:
kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=10
To get a list of Horizontal Pod Autoscalers in the cluster, use the following command:
kubectl get hpa
The output is similar to the following:
NAME    REFERENCE          TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
nginx   Deployment/nginx   0%/50%    1         10        3          61s
To get details about the Horizontal Pod Autoscaler, you can use the Google Cloud console or the kubectl command.
Console
Go to the Workloads page in the Google Cloud console.
Click the name of the nginx Deployment.
View the Horizontal Pod Autoscaler configuration in the Autoscaler section.
View more details about autoscaling events in the Events tab.
kubectl get
To get details about the Horizontal Pod Autoscaler, you can use kubectl get hpa with the -o yaml flag. The status field contains information about the current number of replicas and any recent autoscaling events.
kubectl get hpa nginx -o yaml
The output is similar to the following:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  annotations:
    autoscaling.alpha.kubernetes.io/conditions: '[{"type":"AbleToScale","status":"True","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"ScaleDownStabilized","message":"recent
      recommendations were higher than current one, applying the highest recent recommendation"},{"type":"ScalingActive","status":"True","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"ValidMetricFound","message":"the
      HPA was able to successfully calculate a replica count from cpu resource utilization
      (percentage of request)"},{"type":"ScalingLimited","status":"False","lastTransitionTime":"2019-10-30T19:42:59Z","reason":"DesiredWithinRange","message":"the
      desired count is within the acceptable range"}]'
    autoscaling.alpha.kubernetes.io/current-metrics: '[{"type":"Resource","resource":{"name":"cpu","currentAverageUtilization":0,"currentAverageValue":"0"}}]'
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"autoscaling/v1","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"nginx","namespace":"default"},"spec":{"maxReplicas":10,"minReplicas":1,"scaleTargetRef":{"apiVersion":"apps/v1","kind":"Deployment","name":"nginx"},"targetCPUUtilizationPercentage":50}}
  creationTimestamp: "2019-10-30T19:42:43Z"
  name: nginx
  namespace: default
  resourceVersion: "220050"
  selfLink: /apis/autoscaling/v1/namespaces/default/horizontalpodautoscalers/nginx
  uid: 70d1067d-fb4d-11e9-8b2a-42010a8e013f
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  targetCPUUtilizationPercentage: 50
status:
  currentCPUUtilizationPercentage: 0
  currentReplicas: 3
  desiredReplicas: 3
Before following the remaining examples in this page, delete the HPA:
kubectl delete hpa nginx
When you delete a Horizontal Pod Autoscaler, the number of replicas of the Deployment remains the same. A Deployment does not automatically revert to its state before the Horizontal Pod Autoscaler was applied.
You can learn more about deleting a Horizontal Pod Autoscaler.
Autoscaling based on load balancer traffic
Traffic-based autoscaling is a capability of GKE that integrates traffic utilization signals from load balancers to autoscale Pods.
Using traffic as an autoscaling signal might be helpful since traffic is a leading indicator of load that is complementary to CPU and memory. Built-in integration with GKE ensures that the setup is easy and that autoscaling reacts to traffic spikes quickly to meet demand.
Traffic-based autoscaling is enabled by the Gateway controller and its global traffic management capabilities. To learn more, see Traffic-based autoscaling.
Autoscaling based on load balancer traffic is only available for Gateway workloads.
Requirements
Traffic-based autoscaling has the following requirements:
- Supported on GKE versions 1.31.1-gke.2008000 and later.
- Gateway API enabled in your GKE cluster.
- Supported for traffic that goes through load balancers deployed using the Gateway API and the gke-l7-global-external-managed, gke-l7-regional-external-managed, gke-l7-rilb, or gke-l7-gxlb GatewayClass.
Limitations
Traffic-based autoscaling has the following limitations:
- Not supported by the multi-cluster GatewayClasses (gke-l7-global-external-managed-mc, gke-l7-regional-external-managed-mc, gke-l7-rilb-mc, and gke-l7-gxlb-mc).
- Not supported for traffic using Services of type LoadBalancer.
- There must be a clear and isolated relationship between the components involved in traffic-based autoscaling. One Horizontal Pod Autoscaler must be dedicated to scaling a single Deployment (or any scalable resource) exposed by a single Service.
- After configuring the capacity of your Service using the maxRatePerEndpoint field, allow sufficient time (usually one minute, but potentially up to 15 minutes in large clusters) for the load balancer to be updated with this change before configuring the Horizontal Pod Autoscaler with traffic-based metrics. This ensures your Service won't temporarily experience a situation where your cluster tries to autoscale based on metrics emitted by a load balancer still undergoing configuration.
- If traffic-based autoscaling is used on a Service served by multiple load balancers (for example, by both an Ingress and a Gateway, or by two Gateways), the Horizontal Pod Autoscaler might consider the highest traffic value from individual load balancers to make scaling decisions, rather than the sum of traffic values from all load balancers.
Deploy traffic-based autoscaling
The following exercise uses the HorizontalPodAutoscaler to autoscale the store-autoscale Deployment based on the traffic it receives. A Gateway accepts ingress traffic from the internet for the Pods. The autoscaler compares traffic signals from the Gateway with the per-Pod traffic capacity that is configured on the store-autoscale Service resource. By generating traffic to the Gateway, you influence the number of Pods deployed.
The following diagram demonstrates how traffic-based autoscaling works:
To deploy traffic-based autoscaling, perform the following steps:
For Standard clusters, confirm that the GatewayClasses are installed in your cluster. For Autopilot clusters, the GatewayClasses are installed by default.
kubectl get gatewayclass
The output confirms that the GKE GatewayClass resources are ready to use in your cluster:
NAME                               CONTROLLER                  ACCEPTED   AGE
gke-l7-global-external-managed     networking.gke.io/gateway   True       16h
gke-l7-regional-external-managed   networking.gke.io/gateway   True       16h
gke-l7-gxlb                        networking.gke.io/gateway   True       16h
gke-l7-rilb                        networking.gke.io/gateway   True       16h
If you don't see this output, enable the Gateway API in your GKE cluster.
Deploy the sample application and Gateway load balancer to your cluster:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/gke-networking-recipes/master/gateway/docs/store-autoscale.yaml
The sample application creates:
- A Deployment with 2 replicas.
- A Service with an associated GCPBackendPolicy that sets maxRatePerEndpoint to 10. To learn more about Gateway capabilities, see GatewayClass capabilities.
- An external Gateway for accessing the application on the internet. To learn more about how to use Gateway load balancers, see Deploying Gateways.
- An HTTPRoute that matches all traffic and sends it to the store-autoscale Service.
The Service capacity is a critical element when using traffic-based autoscaling because it determines the amount of per-Pod traffic that triggers an autoscaling event. It is configured using a maxRatePerEndpoint field on a GCPBackendPolicy associated with the Service, which defines the maximum traffic a Service should receive in requests per second, per Pod. Service capacity is specific to your application. For more information, see Determining your Service's capacity.
Save the following manifest as hpa.yaml:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: store-autoscale
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: store-autoscale
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      describedObject:
        kind: Service
        name: store-autoscale
      metric:
        name: "autoscaling.googleapis.com|gclb-capacity-fullness"
      target:
        averageValue: 70
        type: AverageValue
This manifest describes a HorizontalPodAutoscaler with the following properties:
- minReplicas and maxReplicas: sets the minimum and maximum number of replicas for this Deployment. In this configuration, the number of Pods can scale from 1 to 10 replicas.
- describedObject.name: store-autoscale: the reference to the store-autoscale Service that defines the traffic capacity.
- scaleTargetRef.name: store-autoscale: the reference to the store-autoscale Deployment that defines the resource that is scaled by the Horizontal Pod Autoscaler.
- averageValue: 70: target average value of capacity utilization. This gives the Horizontal Pod Autoscaler a growth margin so that the running Pods can process excess traffic while new Pods are being created.
The Horizontal Pod Autoscaler results in the following traffic behavior:
- The number of Pods is adjusted between 1 and 10 replicas to achieve 70% of the max rate per endpoint. This results in 7 RPS per Pod when maxRatePerEndpoint=10.
- At more than 7 RPS per Pod, Pods are scaled up until they've reached their maximum of 10 replicas or until the average traffic is 7 RPS per Pod.
- If traffic is reduced, Pods scale down to a reasonable rate using the Horizontal Pod Autoscaler algorithm.
You can also deploy a traffic generator to validate traffic-based autoscaling behavior.
At 30 RPS, the Deployment is scaled to 5 replicas so that each replica ideally receives 6 RPS of traffic, which would be 60% utilization per Pod. This is under the 70% target utilization and so the Pods are scaled appropriately. Depending on traffic fluctuations, the number of autoscaled replicas might also fluctuate. For a more detailed description of how the number of replicas is computed, see Autoscaling behavior.
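The arithmetic behind the 30 RPS example can be sketched as follows. This is an illustration under simplifying assumptions (steady traffic, even distribution across Pods, no minimum or maximum clamping), and the function name traffic_replicas is hypothetical.

```shell
# Sketch: replicas needed so that average per-Pod traffic stays at or below
# target_percent of max_rate_per_endpoint.
#   replicas = ceil(total_rps / (max_rate_per_endpoint * target_percent / 100))
# The result is not clamped to minReplicas or maxReplicas.
traffic_replicas() {
  total_rps=$1
  max_rate_per_endpoint=$2
  target_percent=$3
  # Per-Pod target capacity scaled by 100 to stay in integer arithmetic.
  per_pod_capacity_x100=$(( max_rate_per_endpoint * target_percent ))
  echo $(( (total_rps * 100 + per_pod_capacity_x100 - 1) / per_pod_capacity_x100 ))
}

traffic_replicas 30 10 70   # prints 5
```

With maxRatePerEndpoint set to 10 and a 70% target, each Pod is sized for 7 RPS, so 30 RPS needs ceil(30 / 7) = 5 replicas.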
Autoscaling based on a custom or external metric
To create horizontal Pod autoscalers for custom metrics and external metrics, see Optimize Pod autoscaling based on metrics.
Autoscaling based on multiple metrics
This example creates a Horizontal Pod Autoscaler that autoscales based on CPU utilization and a custom metric named packets_per_second.
If you followed the previous example and still have a Horizontal Pod Autoscaler named nginx, delete it before following this example.
This example requires apiVersion: autoscaling/v2. For more information about the available APIs, see API versions for HorizontalPodAutoscaler objects.
Save this YAML manifest as a file named nginx-multiple.yaml:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 100Mi
  # Uncomment these lines if you create the custom packets_per_second metric and
  # configure your app to export the metric.
  # - type: Pods
  #   pods:
  #     metric:
  #       name: packets_per_second
  #     target:
  #       type: AverageValue
  #       averageValue: 100
Apply the YAML manifest:
kubectl apply -f nginx-multiple.yaml
When created, the Horizontal Pod Autoscaler monitors the nginx Deployment for average CPU utilization, average memory utilization, and (if you uncommented it) the custom packets_per_second metric. The Horizontal Pod Autoscaler autoscales the Deployment based on the metric whose value would create the largest autoscale event.
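As a rough sketch of that selection rule, assume each metric has already produced its own replica recommendation. The autoscaler then acts on the largest one. The function name largest_recommendation is hypothetical.

```shell
# Sketch: given one replica recommendation per metric, pick the largest.
largest_recommendation() {
  largest=0
  for recommendation in "$@"; do
    if [ "$recommendation" -gt "$largest" ]; then
      largest=$recommendation
    fi
  done
  echo "$largest"
}

# For example: CPU recommends 2, memory recommends 5, packets_per_second recommends 3.
largest_recommendation 2 5 3   # prints 5
```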
Configuring Performance HPA profile
The Performance HPA profile speeds up the HPA reaction time and enables quick recalculation of up to 1000 HPA objects. You can enable it for both new and existing clusters. However, it can't be enabled for the projects enrolled in a VPC-SC perimeter.
Requirements
Performance HPA profile has the following requirements:
- Your control plane must be running GKE version 1.31.2-gke.1138000 or later.
- Ensure that the Autoscaling API is enabled in your cluster.
- Ensure that all node Service Accounts have the roles/autoscaling.metricsWriter role assigned.
- Ensure that Cloud Monitoring is enabled in your cluster.
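One way to check the version requirement in a script is to compare version strings with a version-aware sort. The sketch below assumes GNU sort with the -V option is available, and the function name version_at_least is hypothetical. How you obtain the control plane version is left out.

```shell
# Sketch: succeed when the version in $1 is at least the minimum in $2.
# Assumes GNU sort, whose -V option orders version strings numerically.
version_at_least() {
  actual=$1
  minimum=$2
  # If the minimum sorts first (or both are equal), the requirement is met.
  [ "$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n 1)" = "$minimum" ]
}

if version_at_least "1.32.0-gke.1000" "1.31.2-gke.1138000"; then
  echo "version requirement met"
fi
```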
Enable Performance HPA profile in a new cluster
Autopilot
To create a new Autopilot cluster with Performance HPA profile enabled, use the following command:
gcloud beta container clusters create-auto CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --cluster-version=CLUSTER_VERSION \
    --hpa-profile=performance
Replace the following:
- CLUSTER_NAME: The name for the cluster you're creating.
- LOCATION: Compute zone or region (for example, us-central1-a or us-central1) for the cluster.
- PROJECT_ID: Your Google Cloud project ID.
- CLUSTER_VERSION: GKE version 1.31.2-gke.1138000 or later.
Standard
To create a new Standard cluster with Performance HPA profile enabled, use the following command:
gcloud beta container clusters create CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --cluster-version=CLUSTER_VERSION \
    --hpa-profile=performance
Replace the following:
- CLUSTER_NAME: The name for the cluster you're creating.
- LOCATION: Compute zone or region (for example, us-central1-a or us-central1) for the cluster.
- PROJECT_ID: Your Google Cloud project ID.
- CLUSTER_VERSION: GKE version 1.31.2-gke.1138000 or later.
Enable Performance HPA profile in an existing cluster
To enable the Performance HPA profile in an existing cluster, use the following command:
gcloud beta container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --hpa-profile=performance
Replace the following:
- CLUSTER_NAME: The name of the cluster.
- LOCATION: Compute zone or region (for example, us-central1-a or us-central1) for the cluster.
- PROJECT_ID: Your Google Cloud project ID.
Disable Performance HPA profile
To disable Performance HPA profile in a cluster, use the following command:
gcloud beta container clusters update CLUSTER_NAME \
    --location=LOCATION \
    --project=PROJECT_ID \
    --hpa-profile=none
Replace the following:
- CLUSTER_NAME: The name of the cluster.
- LOCATION: Compute zone or region (for example, us-central1-a or us-central1) for the cluster.
- PROJECT_ID: Your Google Cloud project ID.
Viewing details about a Horizontal Pod Autoscaler
To view a Horizontal Pod Autoscaler's configuration and statistics, use the following command:
kubectl describe hpa HPA_NAME
Replace HPA_NAME with the name of your HorizontalPodAutoscaler object.
If the Horizontal Pod Autoscaler uses apiVersion: autoscaling/v2 and is based on multiple metrics, the kubectl describe hpa command only shows the CPU metric. To see all metrics, use the following command instead:
kubectl describe hpa.v2.autoscaling HPA_NAME
Replace HPA_NAME with the name of your HorizontalPodAutoscaler object.
Each Horizontal Pod Autoscaler's current status is shown in the Conditions field, and autoscaling events are listed in the Events field.
The output is similar to the following:
Name:                                                  nginx
Namespace:                                             default
Labels:                                                <none>
Annotations:                                           kubectl.kubernetes.io/last-applied-configuration:
                                                         {"apiVersion":"autoscaling/v2","kind":"HorizontalPodAutoscaler","metadata":{"annotations":{},"name":"nginx","namespace":"default"},"s...
CreationTimestamp:                                     Tue, 05 May 2020 20:07:11 +0000
Reference:                                             Deployment/nginx
Metrics:                                               ( current / target )
  resource memory on pods:                             2220032 / 100Mi
  resource cpu on pods (as a percentage of request):   0% (0) / 50%
Min replicas:                                          1
Max replicas:                                          10
Deployment pods:                                       1 current / 1 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from memory resource
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:                                                <none>
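In the Metrics field of this output, the current memory value is shown in bytes while the target is shown as 100Mi. As a small illustration, the following sketch converts mebibytes to bytes so that the two can be compared. The function name mi_to_bytes is hypothetical.

```shell
# Sketch: convert a quantity in mebibytes (Mi) to bytes.
mi_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

mi_to_bytes 100   # prints 104857600

# Current usage from the sample output as a whole-number percentage of the
# target (integer division truncates the fraction).
echo $(( 2220032 * 100 / $(mi_to_bytes 100) ))   # prints 2
```

At roughly 2% of the memory target and 0% CPU, neither metric calls for more than the single replica shown in the output.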
Deleting a Horizontal Pod Autoscaler
You can delete a Horizontal Pod Autoscaler using the Google Cloud console or the kubectl delete command.
Console
To delete the nginx Horizontal Pod Autoscaler:
Go to the Workloads page in the Google Cloud console.
Click the name of the nginx Deployment.
Click Actions > Autoscale.
Click Delete.
kubectl delete
To delete the nginx Horizontal Pod Autoscaler, use the following command:
kubectl delete hpa nginx
When you delete a Horizontal Pod Autoscaler, the Deployment (or other deployment object) remains at its existing scale, and does not revert to the number of replicas in the Deployment's original manifest. To manually scale the Deployment back to three Pods, you can use the kubectl scale command:
kubectl scale deployment nginx --replicas=3
Cleaning up
Delete the Horizontal Pod Autoscaler, if you have not done so:
kubectl delete hpa nginx
Delete the nginx Deployment:
kubectl delete deployment nginx
Optionally, delete the cluster.
Troubleshooting
This section shows troubleshooting steps for problems using Horizontal Pod Autoscaling.
Horizontal Pod Autoscaler displays an unable to fetch pod metrics for pod error
When you set up a Horizontal Pod Autoscaler, you might see warning messages like the following:
unable to fetch pod metrics for pod
It's normal to see this message when the metrics server starts up. However, if you continue to see the warnings and you notice that Pods are not scaling for your workload, ensure you have specified resource requests for each container in your workload. To use resource utilization percentage targets with horizontal Pod autoscaling, you must configure requests for that resource for each container running in each Pod in the workload. Otherwise, the Horizontal Pod Autoscaler cannot perform the calculations it needs to, and takes no action related to that metric.
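To find containers without a CPU request, one possible approach is to list each container's name and request and then filter for empty values. The helper below is a hypothetical sketch. It expects tab-separated lines of container name and CPU request on standard input, and the kubectl jsonpath expression in the usage comment assumes the nginx Deployment from this page.

```shell
# Hypothetical helper: read "container-name<TAB>cpu-request" lines and print
# the names of containers whose CPU request is empty.
containers_missing_cpu_request() {
  awk -F '\t' '$2 == "" { print $1 }'
}

# Usage against a live cluster (requires kubectl access):
#   kubectl get deployment nginx \
#     -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.resources.requests.cpu}{"\n"}{end}' \
#     | containers_missing_cpu_request
```

Any container name printed by the filter needs a CPU request before percentage-based targets can work for that workload.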
Horizontal Pod Autoscaler displays a multiple services selecting the same target of... event
A Horizontal Pod Autoscaler displays a multiple services selecting the same target of <hpa>: <services> error if it detects that you are using traffic-based autoscaling with multiple Services associated with the target of the Horizontal Pod Autoscaler (typically a Deployment).
Traffic-based autoscaling only supports configurations where exactly one Service is associated with the autoscaled resource. For details, see Autoscaling based on load balancer traffic. The error message lists the Services that were found.
To resolve the issue, ensure that only one Service is associated with the target of the Horizontal Pod Autoscaler.
What's next
- Learn more about Horizontal Pod Autoscaling.
- Learn more about Vertical Pod Autoscaling.
- Learn more about Multidimensional Pod Autoscaling.
- Learn more about autoscaling Deployments with Custom Metrics.
- Learn how to Assign CPU Resources to Containers and Pods.
- Learn how to Assign Memory Resources to Containers and Pods.