Horizontal Pod autoscaling (HPA)

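This document describes how to enable horizontal Pod autoscaling (HPA) for Google Cloud Managed Service for Prometheus. You can enable HPA by doing one of the following:

- Use KEDA (Kubernetes Event-driven Autoscaling), an open source solution that has graduated from the Cloud Native Computing Foundation.
- Use the Custom Metrics Stackdriver Adapter library, which is developed and supported by Google Cloud.
- Use the third-party Prometheus Adapter library.

You can't use the Stackdriver Adapter and the Prometheus Adapter together in the same cluster because their resource definitions overlap, as described in Troubleshooting. We recommend choosing only one solution for HPA.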

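Use KEDA

KEDA (Kubernetes Event-driven Autoscaling) is the most recently released of these autoscalers that use Prometheus metrics, and it is becoming a preferred solution in the Prometheus community.

To get started, see the KEDA documentation for integrating with Google Cloud Managed Service for Prometheus.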

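Use the Custom Metrics Stackdriver Adapter

The Custom Metrics Stackdriver Adapter supports querying metrics from Managed Service for Prometheus starting with version v0.13.1 of the adapter.

To set up an example HPA configuration by using the Custom Metrics Stackdriver Adapter, do the following:

Set up managed collection in your cluster.

Install the Custom Metrics Stackdriver Adapter in your cluster: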

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml

Deploy an example Prometheus metrics exporter and an HPA resource:
```
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/examples/prometheus-to-sd/custom-metrics-prometheus-sd.yaml
```
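This command deploys an exporter application that emits the metric foo, along with an HPA resource. The HPA scales this application out to at most 5 replicas to reach the target value for the metric foo.

If you use Workload Identity Federation for GKE, you must also grant the Monitoring Viewer role to the service account that the adapter runs under. Skip this step if Workload Identity Federation for GKE is not enabled on your Kubernetes cluster.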

export PROJECT_NUMBER=$(gcloud projects describe PROJECT_ID --format 'get(projectNumber)')
gcloud projects add-iam-policy-binding projects/PROJECT_ID \
  --role roles/monitoring.viewer \
  --member=principal://iam.googleapis.com/projects/$PROJECT_NUMBER/locations/global/workloadIdentityPools/PROJECT_ID.svc.id.goog/subject/ns/custom-metrics/sa/custom-metrics-stackdriver-adapter

Define a PodMonitoring resource by placing the following configuration in a file named podmonitoring.yaml:

apiVersion: monitoring.googleapis.com/v1
kind: PodMonitoring
metadata:
  name: prom-example
spec:
  selector:
    matchLabels:
      run: custom-metric-prometheus-sd
  endpoints:
  - port: 8080
    interval: 30s

Deploy the new PodMonitoring resource:
```
kubectl -n default apply -f podmonitoring.yaml
```
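Within a few minutes, Managed Service for Prometheus processes the metrics scraped from the exporter and stores them in Cloud Monitoring under a long-form name. Prometheus metrics are stored with the following conventions:
- The prefix prometheus.googleapis.com.
- A type suffix, usually one of gauge, counter, summary, or histogram, although untyped metrics might have the unknown or unknown:counter suffix. To verify the suffix, look up the metric in Cloud Monitoring by using Metrics Explorer.
Update the deployed HPA to query the metric from Cloud Monitoring. The metric foo is ingested as prometheus.googleapis.com/foo/gauge. To make the metric queryable by the deployed HorizontalPodAutoscaler resource, use the long-form name in the deployed HPA, but replace every forward slash (/) with a pipe character (|): prometheus.googleapis.com|foo|gauge. For more information, see the Metrics available from Stackdriver section of the Custom Metrics Stackdriver Adapter repository.
1. Update the deployed HPA by running the following command: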
```
kubectl edit hpa custom-metric-prometheus-sd
```
2. Change the value of the pods.metric.name field from foo to prometheus.googleapis.com|foo|gauge. The spec section should look like the following example:
```
spec:
   maxReplicas: 5
   metrics:
   - pods:
       metric:
         name: prometheus.googleapis.com|foo|gauge
       target:
         averageValue: "20"
         type: AverageValue
     type: Pods
   minReplicas: 1
```
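In this example, the HPA configuration targets an average value of 20 for the metric prometheus.googleapis.com/foo/gauge. Because the Deployment sets the value of the metric to 40, the HPA controller increases the number of Pods up to the value of the maxReplicas field (5) to try to reduce the average value of the metric across all Pods to 20.

The HPA query is scoped to the namespace and cluster in which the HPA resource is installed, so identical metrics in other clusters and namespaces don't affect your autoscaling.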
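
The metric-name rewrite used in the HPA spec above is mechanical, so it can be scripted. The following is a minimal sketch, not part of the adapter: the function names monitoring_metric_name and hpa_metric_name are hypothetical, and the sketch assumes the prefix/name/suffix layout that the example metric prometheus.googleapis.com/foo/gauge follows.

```shell
# Build the Cloud Monitoring long-form name for a Prometheus metric.
# $1: Prometheus metric name (for example, foo)
# $2: type suffix (for example, gauge)
monitoring_metric_name() {
  printf 'prometheus.googleapis.com/%s/%s\n' "$1" "$2"
}

# Rewrite a long-form name into the form used in the HPA spec:
# every forward slash becomes a pipe character.
hpa_metric_name() {
  printf '%s\n' "$1" | tr '/' '|'
}

hpa_metric_name "$(monitoring_metric_name foo gauge)"
# prints: prometheus.googleapis.com|foo|gauge
```

For metrics with a different suffix, confirm the suffix in Metrics Explorer before you pass it as the second argument.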

To watch the workload scale up, run the following command:

kubectl get hpa custom-metric-prometheus-sd --watch

The value of the REPLICAS field changes from 1 to 5.

NAME                          REFERENCE                                TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
custom-metric-prometheus-sd   Deployment/custom-metric-prometheus-sd   40/20          1         5         5          *
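
The replica count in this output is consistent with the formula that the Kubernetes documentation gives for the HPA: desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), clamped to the minReplicas and maxReplicas bounds. The following sketch applies that formula with integer arithmetic. The helper name desired_replicas is hypothetical, and the sketch assumes that every Pod keeps exporting 40 so the average never drops. It ignores the controller's tolerance, stabilization windows, and scaling policies.

```shell
# desired_replicas CURRENT METRIC_AVG TARGET_AVG MIN MAX
# Hypothetical helper: applies ceil(CURRENT * METRIC_AVG / TARGET_AVG)
# with integer arithmetic, then clamps the result to [MIN, MAX].
desired_replicas() {
  dr_desired=$(( ($1 * $2 + $3 - 1) / $3 ))
  if [ "$dr_desired" -lt "$4" ]; then dr_desired=$4; fi
  if [ "$dr_desired" -gt "$5" ]; then dr_desired=$5; fi
  printf '%s\n' "$dr_desired"
}

# Assumption: the average stays at 40 while the target average is 20.
replicas=1
while :; do
  next=$(desired_replicas "$replicas" 40 20 1 5)
  [ "$next" -eq "$replicas" ] && break
  printf '%s -> %s\n' "$replicas" "$next"
  replicas=$next
done
# prints:
# 1 -> 2
# 2 -> 4
# 4 -> 5
```

Because the real controller also applies a tolerance and rate limits, the intermediate steps that you observe with --watch can differ from this idealized sequence.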

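To scale down the deployment, update the target metric value to be higher than the exported metric value. In this example, the Deployment sets the value of the prometheus.googleapis.com/foo/gauge metric to 40. If you set the target value to a number greater than 40, the deployment scales down.

For example, use kubectl edit to change the value of the pods.target.averageValue field in the HPA configuration from 20 to 100: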
```
kubectl edit hpa custom-metric-prometheus-sd
```
Modify the spec section to match the following:
```
spec:
  maxReplicas: 5
  metrics:
  - pods:
      metric:
        name: prometheus.googleapis.com|foo|gauge
      target:
        averageValue: "100"
        type: AverageValue
    type: Pods
  minReplicas: 1
```

To watch the workload scale down, run the following command:

kubectl get hpa custom-metric-prometheus-sd --watch

The value of the REPLICAS field changes from 5 to 1. By design, this happens more slowly than scaling up the number of Pods:

NAME                          REFERENCE                                TARGETS        MINPODS   MAXPODS   REPLICAS   AGE
custom-metric-prometheus-sd   Deployment/custom-metric-prometheus-sd   40/100         1         5         1          *

To clean up the deployed example, run the following commands:

kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter_new_resource_model.yaml
kubectl delete -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/examples/prometheus-to-sd/custom-metrics-prometheus-sd.yaml
kubectl delete podmonitoring/prom-example

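For more information, see the Prometheus example in the Custom Metrics Stackdriver Adapter repository, or see Scaling an application.

Use the Prometheus Adapter

Existing prometheus-adapter configurations can be used for autoscaling with only a few changes. Compared to scaling with upstream Prometheus, configuring prometheus-adapter to scale with Managed Service for Prometheus has two additional restrictions:

Queries must be routed through the Prometheus frontend UI proxy, just as when you query Managed Service for Prometheus by using the Prometheus API or UI. For prometheus-adapter, edit the prometheus-adapter Deployment to change the prometheus-url value as follows: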
```
--prometheus-url=http://frontend.NAMESPACE_NAME.svc:9090/
```
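where NAMESPACE_NAME is the namespace in which the frontend is deployed.
You can't use a regular-expression matcher on a metric name in the .seriesQuery field of the rules configuration. You must fully specify metric names.

Because data can take slightly longer to become available in Managed Service for Prometheus than in upstream Prometheus, overly eager autoscaling logic can cause unwanted behavior. Although data freshness is not guaranteed, data is typically available to query 3 to 7 seconds after it is sent to Managed Service for Prometheus, excluding any network latency.

All queries issued by prometheus-adapter are global in scope. This means that if applications in two namespaces emit identically named metrics, an HPA configuration that uses that metric scales on data from both applications. We recommend always using namespace or cluster filters in your PromQL to avoid scaling on incorrect data.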
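
To make the restrictions and the filter recommendation concrete, the following sketch prints a prometheus-adapter rules fragment that names its metric exactly and pins the query to one cluster. It is an illustration under stated assumptions, not the configuration shipped with the example: http_requests_total is a hypothetical counter, CLUSTER_NAME is a placeholder, and the adapter_rules function exists only to keep the fragment self-contained. The namespace matcher is supplied by the adapter through the <<.LabelMatchers>> template once the namespace resource is mapped.

```shell
# Print an illustrative prometheus-adapter rules fragment.
# Assumptions: http_requests_total is a hypothetical counter emitted by the
# workload, and CLUSTER_NAME is a placeholder for the real cluster name.
adapter_rules() {
  cat <<'EOF'
rules:
- seriesQuery: 'http_requests_total'   # exact metric name, no regex matcher
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^(.*)_total$"
    as: "${1}_per_second"
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>,cluster="CLUSTER_NAME"}[2m])) by (<<.GroupBy>>)'
EOF
}

adapter_rules
```

The quoted heredoc delimiter keeps the shell from expanding ${1}, which the adapter itself interprets as the first capture group of the matches expression.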

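To set up an example HPA configuration by using prometheus-adapter and managed collection, do the following:

Set up managed collection in your cluster.
Deploy the Prometheus frontend UI proxy in your cluster. If you use Workload Identity Federation for GKE, you must also configure and authorize a service account.
Deploy the manifests in the examples/hpa/ directory of the prometheus-engine repository:
- example-app.yaml: An example deployment and service that emits metrics.
- pod-monitoring.yaml: A resource that configures scraping of the example metrics.
- hpa.yaml: The HPA resource that configures scaling for your workload.
Ensure that prometheus-adapter is installed in your cluster. You can do this by deploying the example install manifest to your cluster. This manifest is configured to do the following:
- Query a frontend proxy deployed in the default namespace.
- Issue PromQL to calculate and return the http_requests_per_second metric from the example deployment.
Note: The http_requests_per_second metric is not available until load is generated against the example application.
Note: We recommend creating an internal firewall rule on port 6443 (from the control plane to the nodes).

Run the following commands, each in a separate terminal session:

Generate HTTP load against the prometheus-example-app service: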

kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://prometheus-example-app; done"

Watch the horizontal Pod autoscaler:

kubectl get hpa prometheus-example-app --watch

Watch the workload scale up:

kubectl get po -lapp.kubernetes.io/name=prometheus-example-app --watch

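Stop the HTTP load generation by pressing Ctrl+C, and watch the workload scale back down.

Troubleshooting

The Custom Metrics Stackdriver Adapter uses resource definitions with the same names as those in the Prometheus Adapter, prometheus-adapter. This overlap in names means that running more than one adapter in the same cluster causes errors.

Installing the Prometheus Adapter in a cluster that previously had the Custom Metrics Stackdriver Adapter installed might throw errors such as FailedGetObjectMetric because of the colliding names. To resolve this, you might need to delete the v1beta1.external.metrics.k8s.io, v1beta1.custom.metrics.k8s.io, and v1beta2.custom.metrics.k8s.io apiservices previously registered by the Custom Metrics Adapter.

Troubleshooting tips:

Some Cloud Monitoring system metrics, such as Pub/Sub metrics, are delayed by 60 seconds or more. Because the Prometheus Adapter executes queries using the current timestamp, querying these metrics with the Prometheus Adapter might incorrectly return no data. To query delayed metrics, use the offset modifier in PromQL to shift the query's time offset by the necessary amount.
To verify that the frontend UI proxy is working as intended and that there are no permission issues, run the following command in a terminal: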
```
kubectl -n NAMESPACE_NAME port-forward svc/frontend 9090
```
Next, open another terminal and run the following command:
```
curl --silent 'localhost:9090/api/v1/series?match%5B%5D=up'
```
When the frontend UI proxy is working properly, the response in the second terminal is similar to the following:
```
curl --silent 'localhost:9090/api/v1/series?match%5B%5D=up' | jq .
{
  "status": "success",
  "data": [
     ...
  ]
}
```
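If you receive a 403 error, then the frontend UI proxy isn't properly configured. For information about how to resolve a 403 error, see the guide for configuring and authorizing a service account.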
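
For the tip about delayed metrics, the offset modifier goes directly after the vector selector in PromQL. The following sketch only builds the expression text. The helper name offset_selector is hypothetical, and the Pub/Sub metric name and the 3m offset are illustrative assumptions; choose an offset at least as large as the delay of the metric that you query.

```shell
# offset_selector SELECTOR OFFSET
# Hypothetical helper: appends a PromQL offset modifier to a vector selector.
offset_selector() {
  printf '%s offset %s\n' "$1" "$2"
}

# Assumptions: the metric name and the 3m offset are illustrative only.
offset_selector 'pubsub_googleapis_com:subscription_num_undelivered_messages' 3m
# prints: pubsub_googleapis_com:subscription_num_undelivered_messages offset 3m
```

In a prometheus-adapter rule, the same offset would be written inside the metricsQuery expression.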

To verify that the custom metrics apiserver is available, run the following command:

kubectl get apiservices.apiregistration.k8s.io v1beta1.custom.metrics.k8s.io

When the apiserver is available, the response is similar to the following:

$ kubectl get apiservices.apiregistration.k8s.io v1beta1.custom.metrics.k8s.io
NAME                            SERVICE                         AVAILABLE   AGE
v1beta1.custom.metrics.k8s.io   monitoring/prometheus-adapter   True        33m

To verify that your HPA is working as intended, run the following command:

$ kubectl describe hpa prometheus-example-app
Name:                                  prometheus-example-app
Namespace:                             default
Labels:                                <none>
Annotations:                           <none>
Reference:                             Deployment/prometheus-example-app
Metrics:                               ( current / target )
  "http_requests_per_second" on pods:  11500m / 10
Min replicas:                          1
Max replicas:                          10
Deployment pods:                       2 current / 2 desired
Conditions:
  Type            Status  Reason              Message
  ----            ------  ------              -------
  AbleToScale     True    ReadyForNewScale    recommended size matches current size
  ScalingActive   True    ValidMetricFound    the HPA was able to successfully calculate a replica count from pods metric http_requests_per_second
  ScalingLimited  False   DesiredWithinRange  the desired count is within the acceptable range
Events:
  Type     Reason               Age                   From                       Message
  ----     ------               ----                  ----                       -------
  Normal   SuccessfulRescale    47s                   horizontal-pod-autoscaler  New size: 2; reason: pods metric http_requests_per_second above target
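
In this output, the current value 11500m uses the Kubernetes quantity notation, in which the m suffix means thousandths. The value is therefore 11.5 requests per second per Pod against a target of 10, which is why the HPA scaled to 2 replicas. The following sketch converts such a value to a decimal number; the helper name quantity_to_decimal is hypothetical, and it handles only plain numbers and the m suffix.

```shell
# quantity_to_decimal VALUE
# Hypothetical helper: converts a milli-unit quantity such as 11500m to a
# decimal number; values without the m suffix are printed unchanged.
quantity_to_decimal() {
  case "$1" in
    *m) awk -v v="${1%m}" 'BEGIN { printf "%g\n", v / 1000 }' ;;
    *)  printf '%s\n' "$1" ;;
  esac
}

quantity_to_decimal 11500m
# prints: 11.5
```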

If the response contains a statement such as FailedGetPodsMetric, then the HPA is failing. The following shows a response to the describe call when the HPA is failing:

$ kubectl describe hpa prometheus-example-app
Name:                                  prometheus-example-app
Namespace:                             default
Reference:                             Deployment/prometheus-example-app
Metrics:                               ( current / target )
  "http_requests_per_second" on pods:   / 10
Min replicas:                          1
Max replicas:                          10
Deployment pods:                       1 current / 1 desired
Conditions:
  Type            Status  Reason               Message
  ----            ------  ------               -------
  AbleToScale     True    ReadyForNewScale     recommended size matches current size
  ScalingActive   False   FailedGetPodsMetric  the HPA was unable to compute the replica count: unable to get metric http_requests_per_second: unable to fetch metrics from custom metrics API: the server could not find the metric http_requests_per_second for pods
  ScalingLimited  False   DesiredWithinRange   the desired count is within the acceptable range
Events:
  Type     Reason               Age                   From                       Message
  ----     ------               ----                  ----                       -------
  Warning  FailedGetPodsMetric  104s (x11 over 16m)   horizontal-pod-autoscaler  unable to get metric http_requests_per_second: unable to fetch metrics from custom metrics API: the server could not find the metric http_requests_per_second for pods

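When the HPA is failing, make sure that you are generating metrics with the load-generator. You can check the custom metrics API directly by using the following command: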

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/" | jq .

A successful output looks like the following:

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/" | jq .
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": [
    {
      "name": "namespaces/http_requests_per_second",
      "singularName": "",
      "namespaced": false,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    },
    {
      "name": "pods/http_requests_per_second",
      "singularName": "",
      "namespaced": true,
      "kind": "MetricValueList",
      "verbs": [
        "get"
      ]
    }
  ]
}

If there are no metrics, the output contains no data under "resources", for example:

kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/" | jq .
{
  "kind": "APIResourceList",
  "apiVersion": "v1",
  "groupVersion": "custom.metrics.k8s.io/v1beta1",
  "resources": []
}
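
The check on the "resources" list can also be scripted. The following sketch reads the API response on standard input and reports whether a given metric is listed. The helper name has_custom_metric is hypothetical, the sample JSON is abbreviated for illustration, and the pattern match is a simple text search rather than a full JSON parse.

```shell
# has_custom_metric NAME
# Hypothetical helper: reads the custom metrics API response on stdin and
# succeeds only if a resource with the given name is listed.
has_custom_metric() {
  grep -q "\"name\": *\"$1\""
}

# Abbreviated sample response, used here instead of a live cluster.
sample='{"kind":"APIResourceList","resources":[{"name":"pods/http_requests_per_second"}]}'

if printf '%s\n' "$sample" | has_custom_metric 'pods/http_requests_per_second'; then
  echo "metric is registered"
else
  echo "metric is missing: generate load, then check the adapter rules"
fi
# prints: metric is registered
```

Against a live cluster, the output of the kubectl get --raw command shown earlier can be piped into has_custom_metric in place of the sample.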