Scale stateless workloads

This document describes how to scale existing stateless workloads running in a Google Distributed Cloud (GDC) air-gapped Kubernetes cluster. Scale the pods running in your stateless workloads as your container workload requirements evolve.

This document is for developers within the application operator group who are responsible for managing application workloads for their organization. For more information, see Audiences for GDC air-gapped documentation.

Before you begin

To run commands against a Kubernetes cluster, complete the following steps:

Locate the Kubernetes cluster name, or ask your Platform Administrator what the cluster name is.
Sign in and generate the kubeconfig file for the Kubernetes cluster if you don't have one.
Use the kubeconfig path of the Kubernetes cluster to replace KUBERNETES_CLUSTER_KUBECONFIG in these instructions.

To get the required permissions to scale stateless workloads, ask your Organization IAM Admin to grant you the Namespace Admin role (namespace-admin) in your project namespace.

Scale a deployment

Use the scaling functionality of Kubernetes to scale the number of pods running in your deployment.

Autoscale the pods of a deployment

Kubernetes offers autoscaling to remove the need to manually update your deployment when demand changes. Complete the following steps to autoscale the pods of your deployment:

To ensure the horizontal pod autoscaler can appropriately measure the CPU percentage, set the CPU resource request on your deployment.
Create a horizontal pod autoscaler for your deployment:
```
kubectl --kubeconfig KUBERNETES_CLUSTER_KUBECONFIG \
    -n NAMESPACE \
    autoscale deployment DEPLOYMENT_NAME \
    --cpu-percent=CPU_PERCENT \
    --min=MIN_NUMBER_REPLICAS \
    --max=MAX_NUMBER_REPLICAS
```
Replace the following:
- KUBERNETES_CLUSTER_KUBECONFIG: the kubeconfig file for the cluster.
- NAMESPACE: the project namespace.
- DEPLOYMENT_NAME: the name of the deployment to autoscale.
- CPU_PERCENT: the target average CPU utilization across all the pods, expressed as a percentage of the requested CPU.
- MIN_NUMBER_REPLICAS: the lower limit for the number of pods the autoscaler can provision.
- MAX_NUMBER_REPLICAS: the upper limit for the number of pods the autoscaler can provision.
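
To make the relationship between CPU_PERCENT, MIN_NUMBER_REPLICAS, and MAX_NUMBER_REPLICAS concrete, the following is a minimal sketch of the replica calculation described in the upstream Kubernetes horizontal pod autoscaler documentation: the desired count is the current count scaled by the ratio of observed to target utilization, rounded up, then clamped to the configured limits. The function name `desired_replicas` and its argument order are hypothetical, and the sketch ignores the autoscaler's tolerance and stabilization behavior.

```shell
# Hypothetical helper: estimate the replica count the autoscaler aims for.
#   desired = ceil(current_replicas * current_cpu_percent / target_cpu_percent)
# The result is clamped to the range [min_replicas, max_replicas].
# The body runs in a subshell, so its variables stay local to the function.
desired_replicas() (
  current="$1"   # current number of replicas
  usage="$2"     # observed average CPU utilization, in percent
  target="$3"    # CPU_PERCENT passed to kubectl autoscale
  min="$4"       # MIN_NUMBER_REPLICAS
  max="$5"       # MAX_NUMBER_REPLICAS

  # Integer ceiling division: (a + b - 1) / b
  desired=$(( (current * usage + target - 1) / target ))

  if [ "$desired" -lt "$min" ]; then desired="$min"; fi
  if [ "$desired" -gt "$max" ]; then desired="$max"; fi
  echo "$desired"
)
```

For example, `desired_replicas 2 90 50 1 10` prints `4`: two pods averaging 90% utilization against a 50% target call for ceil(3.6) = 4 pods.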

Check the current status of the horizontal pod autoscaler:

```
kubectl --kubeconfig KUBERNETES_CLUSTER_KUBECONFIG \
    -n NAMESPACE \
    get hpa
```

The output is similar to the following:

```
NAME              REFERENCE                          TARGET    MINPODS   MAXPODS   REPLICAS   AGE
DEPLOYMENT_NAME   Deployment/DEPLOYMENT_NAME/scale   0% / 50%  1         10        1          18s
```
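
If the TARGET column shows `<unknown>` rather than a percentage, one common cause is a container in the deployment that has no CPU request, which is why the first step asks you to set one. The following is a minimal sketch of setting a CPU request with the standard `kubectl set resources` subcommand. The helper names, the `200m` default, and the simplified quantity check are assumptions for illustration and are not part of the GDC documentation.

```shell
# Hypothetical check: accept millicores (for example 200m) or whole or
# decimal cores (for example 1 or 0.5). Kubernetes accepts more quantity
# formats than this simplified pattern covers.
valid_cpu_quantity() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]+m|[0-9]+(\.[0-9]+)?)$'
}

# Hypothetical helper: set a CPU request on every container in a deployment.
# The body runs in a subshell, so `exit` only ends the function call.
set_cpu_request() (
  kubeconfig="$1"; namespace="$2"; deployment="$3"
  cpu="${4:-200m}"   # illustrative default, not a recommended value

  if ! valid_cpu_quantity "$cpu"; then
    echo "invalid CPU quantity: $cpu" >&2
    exit 1
  fi

  kubectl --kubeconfig "$kubeconfig" -n "$namespace" \
      set resources deployment "$deployment" --requests="cpu=$cpu"
)
```

A call might look like `set_cpu_request KUBERNETES_CLUSTER_KUBECONFIG NAMESPACE DEPLOYMENT_NAME 250m`. Because this changes the pod template, expect the deployment to roll out new pods.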

Manually scale the pods of a deployment

If you prefer to manually scale a deployment, run:

```
kubectl --kubeconfig KUBERNETES_CLUSTER_KUBECONFIG \
    -n NAMESPACE \
    scale deployment DEPLOYMENT_NAME \
    --replicas NUMBER_OF_REPLICAS
```

Replace the following:

- KUBERNETES_CLUSTER_KUBECONFIG: the kubeconfig file for the cluster.
- NAMESPACE: the project namespace.
- DEPLOYMENT_NAME: the name of the deployment to scale.
- NUMBER_OF_REPLICAS: the desired number of replicated Pod objects in the deployment.
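
After a manual scale, it can help to confirm that the deployment actually reached the requested size. The following is a minimal sketch that wraps the scale command above and then waits on `kubectl rollout status`. The helper names and the 120-second timeout are hypothetical choices for illustration.

```shell
# Hypothetical check: a replica count must be a non-negative integer.
is_replica_count() {
  case "$1" in
    ''|*[!0-9]*) return 1 ;;
    *) return 0 ;;
  esac
}

# Hypothetical helper: scale a deployment, then wait for the rollout.
# The body runs in a subshell, so `exit` only ends the function call.
scale_and_wait() (
  kubeconfig="$1"; namespace="$2"; deployment="$3"; replicas="$4"

  if ! is_replica_count "$replicas"; then
    echo "invalid replica count: $replicas" >&2
    exit 1
  fi

  kubectl --kubeconfig "$kubeconfig" -n "$namespace" \
      scale deployment "$deployment" --replicas="$replicas" || exit 1

  # Blocks until the deployment reports the new replicas as rolled out,
  # or fails when the illustrative timeout expires.
  kubectl --kubeconfig "$kubeconfig" -n "$namespace" \
      rollout status "deployment/$deployment" --timeout=120s
)
```

A call might look like `scale_and_wait KUBERNETES_CLUSTER_KUBECONFIG NAMESPACE DEPLOYMENT_NAME 3`. If a horizontal pod autoscaler also targets the deployment, it can change the replica count again after the manual scale.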