Run full-stack workloads at scale on GKE

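This tutorial shows you how to run a web application backed by a highly available relational database at scale in Google Kubernetes Engine (GKE).

The sample application used in this tutorial is Bank of Anthos, an HTTP-based web application that simulates a bank's payment processing network. Bank of Anthos uses multiple services to function. This tutorial focuses on the website frontend and the relational PostgreSQL database that backs the Bank of Anthos services. To learn more about Bank of Anthos, including its architecture and the services it deploys, see Bank of Anthos on GitHub.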

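Objectives

Create and configure a GKE cluster.
Deploy a sample web application and a highly available PostgreSQL database.
Configure autoscaling of the web application and the database.
Simulate a traffic spike by using a load generator.
Observe how the services scale up and scale down.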

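Costs

In this document, you use the following billable components of Google Cloud:

To generate a cost estimate based on your projected usage, use the pricing calculator.

New Google Cloud users might be eligible for a free trial.

When you finish the tasks that are described in this document, you can avoid continued billing by deleting the resources that you created. For more information, see Clean up.

Before you begin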

Sign in to your Google Cloud account. If you're new to Google Cloud, create an account to evaluate how our products perform in real-world scenarios. New customers also get $300 in free credits to run, test, and deploy workloads.

Install the Google Cloud CLI.

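If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

To initialize the gcloud CLI, run the following command: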

gcloud init

Create or select a Google Cloud project.

Roles required to select or create a project

Select a project: Selecting a project doesn't require a specific IAM role—you can select any project that you've been granted a role on.
Create a project: To create a project, you need the Project Creator (roles/resourcemanager.projectCreator), which contains the resourcemanager.projects.create permission. Learn how to grant roles.

Create a Google Cloud project:
```
gcloud projects create PROJECT_ID
```
Replace PROJECT_ID with a name for the Google Cloud project you are creating.
Select the Google Cloud project that you created:
```
gcloud config set project PROJECT_ID
```
Replace PROJECT_ID with your Google Cloud project name.

Verify that billing is enabled for your Google Cloud project.

Enable the GKE API:

Roles required to enable APIs

To enable APIs, you need the Service Usage Admin IAM role (roles/serviceusage.serviceUsageAdmin), which contains the serviceusage.services.enable permission. Learn how to grant roles.

gcloud services enable container.googleapis.com

Install the Helm CLI.

Prepare the environment

Clone the sample repository used in this tutorial:

git clone https://github.com/GoogleCloudPlatform/bank-of-anthos.git
cd bank-of-anthos/

Set environment variables:

PROJECT_ID=PROJECT_ID
GSA_NAME=bank-of-anthos
GSA_EMAIL=bank-of-anthos@${PROJECT_ID}.iam.gserviceaccount.com
KSA_NAME=default

Replace PROJECT_ID with your Google Cloud project ID.
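
Because later commands build the service account email from the project ID, it can help to confirm that the placeholder was actually replaced. The following is a minimal sketch; the `check_env` function name and the project ID `my-sample-project` are hypothetical and not part of the Bank of Anthos repository.

```shell
# Hypothetical helper: fail early if the project ID is empty or still the placeholder.
check_env() {
  local project_id="$1"
  if [ -z "$project_id" ] || [ "$project_id" = "PROJECT_ID" ]; then
    echo "PROJECT_ID is not set to a real project ID" >&2
    return 1
  fi
  # Compose the service account email the same way GSA_EMAIL is defined above.
  echo "bank-of-anthos@${project_id}.iam.gserviceaccount.com"
}

# Example with an assumed project ID.
check_env "my-sample-project"
# prints bank-of-anthos@my-sample-project.iam.gserviceaccount.com
```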

Set up the cluster and service accounts

Create a cluster:

gcloud container clusters create-auto bank-of-anthos --location=us-central1

The cluster might take up to five minutes to start.

Create an IAM service account:

gcloud iam service-accounts create bank-of-anthos

Grant access to the IAM service account:

gcloud projects add-iam-policy-binding PROJECT_ID \
  --role roles/cloudtrace.agent \
  --member "serviceAccount:bank-of-anthos@PROJECT_ID.iam.gserviceaccount.com"
gcloud projects add-iam-policy-binding PROJECT_ID \
  --role roles/monitoring.metricWriter \
  --member "serviceAccount:bank-of-anthos@PROJECT_ID.iam.gserviceaccount.com"
gcloud iam service-accounts add-iam-policy-binding "bank-of-anthos@PROJECT_ID.iam.gserviceaccount.com" \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[default/default]"

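This step grants the following access:

roles/cloudtrace.agent: Write trace data, such as latency information, to Trace.
roles/monitoring.metricWriter: Write metrics to Cloud Monitoring.
roles/iam.workloadIdentityUser: Allow a Kubernetes service account to use Workload Identity Federation for GKE to act as the IAM service account.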
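
If you prefer to review the project-level bindings before running them, a dry-run loop is one option. This is only a sketch: the `print_bindings` function and the project ID `my-sample-project` are hypothetical, and the function prints the commands instead of executing them.

```shell
# Hypothetical dry-run helper: prints the two project-level role bindings for review.
print_bindings() {
  local project_id="$1"
  local gsa_email="bank-of-anthos@${project_id}.iam.gserviceaccount.com"
  local role
  for role in roles/cloudtrace.agent roles/monitoring.metricWriter; do
    echo "gcloud projects add-iam-policy-binding ${project_id} --role ${role} --member serviceAccount:${gsa_email}"
  done
}

# Example with an assumed project ID; nothing is sent to Google Cloud.
print_bindings "my-sample-project"
```

The roles/iam.workloadIdentityUser binding is left out of the loop because it is set on the service account itself rather than on the project.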

Configure the default Kubernetes service account in the default namespace to act as the IAM service account that you created:
```
kubectl annotate serviceaccount default \
    iam.gke.io/gcp-service-account=bank-of-anthos@PROJECT_ID.iam.gserviceaccount.com
```
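This lets Pods that use the default Kubernetes service account in the default namespace access the same Google Cloud resources as the IAM service account.

Deploy Bank of Anthos and PostgreSQL

In this section, you install Bank of Anthos and a PostgreSQL database in high availability (HA) mode so that you can autoscale replicas of the database server. To view the scripts, Helm chart, and Kubernetes manifests used in this section, see the Bank of Anthos repository on GitHub.

Deploy the database schema and a data definition language (DDL) script: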

kubectl create configmap initdb \
    --from-file=src/accounts/accounts-db/initdb/0-accounts-schema.sql \
    --from-file=src/accounts/accounts-db/initdb/1-load-testdata.sql \
    --from-file=src/ledger/ledger-db/initdb/0_init_tables.sql \
    --from-file=src/ledger/ledger-db/initdb/1_create_transactions.sh

Install PostgreSQL by using the sample Helm chart:

helm repo add bitnami https://charts.bitnami.com/bitnami
helm install accounts-db bitnami/postgresql-ha \
    --version 10.0.1 \
    --values extras/postgres-hpa/helm-postgres-ha/values.yaml \
    --set="postgresql.initdbScriptsCM=initdb" \
    --set="postgresql.replicaCount=1" \
    --wait

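This command creates a PostgreSQL cluster with an initial replica count of 1. Later in this tutorial, you scale the cluster based on incoming connections. This operation might take ten minutes or more to complete.

Deploy Bank of Anthos: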

kubectl apply -f extras/jwt/jwt-secret.yaml
kubectl apply -f extras/postgres-hpa/kubernetes-manifests

This operation might take a few minutes to complete.

Checkpoint: Validate your setup

Check that all Bank of Anthos Pods are running:

kubectl get pods

The output is similar to the following:

NAME                                  READY   STATUS
accounts-db-pgpool-57ffc9d685-c7xs8   3/3     Running
accounts-db-postgresql-0              1/1     Running
balancereader-57b59769f8-xvp5k        1/1     Running
contacts-54f59bb669-mgsqc             1/1     Running
frontend-6f7fdc5b65-h48rs             1/1     Running
ledgerwriter-cd74db4cd-jdqql          1/1     Running
pgpool-operator-5f678457cd-cwbhs      1/1     Running
transactionhistory-5b9b56b5c6-sz9qz   1/1     Running
userservice-f45b46b49-fj7vm           1/1     Running

Check that you can access the website frontend:
1. Get the external IP address of the frontend service:
```
kubectl get ingress frontend
```
  The output is similar to the following:
```
NAME       CLASS    HOSTS   ADDRESS         PORTS   AGE
frontend   <none>   *       203.0.113.9     80      12m
```
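2. In a browser, go to the external IP address. The Bank of Anthos sign-in page is displayed. If you're curious, explore the application.

  If you get a 404 error, wait a few minutes for the microservices to provision, and then try again.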
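
The wait-and-retry advice can also be scripted. The following is a minimal sketch of a generic retry helper; the `retry` function is hypothetical, and the commented `curl` call assumes a variable `LB_IP` that holds the Ingress address.

```shell
# Hypothetical helper: run a command until it succeeds or the attempts run out.
# Usage: retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
retry() {
  local attempts="$1" delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Sleep only if another attempt follows.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Possible use against the frontend (not executed here; LB_IP is an assumption):
#   retry 20 30 curl --fail --silent --output /dev/null "http://${LB_IP}/"
retry 3 0 true && echo "ready"
```

With `--fail`, curl exits with a non-zero status on HTTP errors such as 404, so the helper keeps polling until the frontend responds successfully.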

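Autoscale the web app and the PostgreSQL database

GKE Autopilot automatically scales the cluster compute resources based on the number of workloads in the cluster. To automatically scale the number of Pods in the cluster based on resource metrics, you must implement Kubernetes horizontal Pod autoscaling. You can use the built-in Kubernetes CPU and memory metrics, or you can use custom metrics taken from Cloud Monitoring, such as HTTP requests per second or the number of SELECT statements.

In this section, you do the following:

Configure horizontal Pod autoscaling for the Bank of Anthos microservices by using both built-in metrics and custom metrics.
Simulate load on the Bank of Anthos application to trigger autoscaling events.
Observe how the number of Pods and nodes in your cluster automatically scales up and down in response to the load.

Set up custom metric collection

To read custom metrics from Monitoring, you must deploy the Custom Metrics Stackdriver Adapter in your cluster.

Deploy the adapter: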

kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter.yaml

Configure the adapter to use Workload Identity Federation for GKE to get metrics:

Configure the IAM service account:

gcloud projects add-iam-policy-binding PROJECT_ID \
    --member "serviceAccount:bank-of-anthos@PROJECT_ID.iam.gserviceaccount.com" \
    --role roles/monitoring.viewer
gcloud iam service-accounts add-iam-policy-binding bank-of-anthos@PROJECT_ID.iam.gserviceaccount.com \
    --role roles/iam.workloadIdentityUser \
    --member "serviceAccount:PROJECT_ID.svc.id.goog[custom-metrics/custom-metrics-stackdriver-adapter]"

Annotate the Kubernetes service account that the adapter uses:

kubectl annotate serviceaccount custom-metrics-stackdriver-adapter \
    --namespace=custom-metrics \
    iam.gke.io/gcp-service-account=bank-of-anthos@PROJECT_ID.iam.gserviceaccount.com

Restart the adapter Deployment to propagate the changes:

kubectl rollout restart deployment custom-metrics-stackdriver-adapter \
    --namespace=custom-metrics

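Configure autoscaling for the database

When you deployed Bank of Anthos and PostgreSQL earlier in this tutorial, you deployed the database as a StatefulSet with one primary read/write replica to handle all incoming SQL statements. In this section, you configure horizontal Pod autoscaling to add new standby read-only replicas to handle incoming SELECT statements. A good way to reduce the load on each replica is to distribute SELECT statements, which are read operations. The PostgreSQL deployment includes a tool named Pgpool-II that performs this load balancing and improves the system's throughput.

PostgreSQL exports the SELECT statement metric as a Prometheus metric. You use a lightweight metrics exporter named prometheus-to-sd to send these metrics to Cloud Monitoring in a supported format.

View the HorizontalPodAutoscaler object: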

# Copyright 2022 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#      http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: accounts-db-postgresql
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 100
        periodSeconds: 5
      selectPolicy: Max
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: accounts-db-postgresql
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: External
    external:
      metric:
        name: custom.googleapis.com|mypgpool|pgpool2_pool_backend_stats_select_cnt
      target:
          type: AverageValue
          averageValue: "15"

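This manifest does the following:

Sets the maximum number of replicas during a scale-up to 5.
Sets the minimum number of replicas during a scale-down to 1.
Uses an external metric to make scaling decisions. In this sample, the metric is the number of SELECT statements. A scale-up event occurs if the average number of incoming SELECT statements per replica exceeds 15.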
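
For an AverageValue target, the autoscaler aims for roughly the metric total divided by the target, rounded up, and then bounded by minReplicas and maxReplicas. The sketch below illustrates that arithmetic with integer inputs only; the `desired_replicas_avg` function is hypothetical, and it ignores the controller's tolerance and stabilization behavior.

```shell
# Hypothetical helper: desired = ceil(total / target), clamped to [min, max].
# Usage: desired_replicas_avg TOTAL_METRIC TARGET_AVERAGE MIN MAX
desired_replicas_avg() {
  local total="$1" target="$2" min="$3" max="$4"
  # Integer ceiling division.
  local desired=$(( (total + target - 1) / target ))
  if [ "$desired" -lt "$min" ]; then
    desired="$min"
  fi
  if [ "$desired" -gt "$max" ]; then
    desired="$max"
  fi
  echo "$desired"
}

# Values from the manifest above: target 15, minReplicas 1, maxReplicas 5.
desired_replicas_avg 40 15 1 5    # prints 3
desired_replicas_avg 200 15 1 5   # prints 5 (capped at maxReplicas)
```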

Apply the manifest to the cluster:

kubectl apply -f extras/postgres-hpa/hpa/postgresql-hpa.yaml

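Configure autoscaling for the web interface

In Deploy Bank of Anthos and PostgreSQL, you deployed the Bank of Anthos web interface. When the number of users increases, the userservice Service consumes more CPU resources. In this section, you configure horizontal Pod autoscaling for the userservice Deployment when the existing Pods use more than 60% of their requested CPU, and for the frontend Deployment when the number of incoming HTTP requests to the load balancer exceeds 5 per second.

Configure autoscaling for the userservice Deployment

View the HorizontalPodAutoscaler manifest for the userservice Deployment: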

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: userservice
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 5
      selectPolicy: Max
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: userservice
  minReplicas: 5
  maxReplicas: 50
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60

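This manifest does the following:

Sets the maximum number of replicas during a scale-up to 50.
Sets the minimum number of replicas during a scale-down to 5.
Uses a built-in Kubernetes metric to make scaling decisions. In this sample, the metric is CPU utilization, and the target utilization is 60%, which avoids both over-utilization and under-utilization.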
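
For a Utilization target, the replica count scales with the ratio of observed to target utilization. The following sketch works through that calculation; the `desired_replicas_cpu` function is hypothetical, utilization values are whole percentages, and the 10% dead band assumes the Kubernetes default tolerance, which a cluster administrator can change.

```shell
# Hypothetical helper: desired = ceil(current * util / target), clamped to [min, max].
# Usage: desired_replicas_cpu CURRENT_REPLICAS UTIL_PERCENT TARGET_PERCENT MIN MAX
desired_replicas_cpu() {
  local current="$1" util="$2" target="$3" min="$4" max="$5"
  local diff=$((util - target))
  if [ "$diff" -lt 0 ]; then
    diff=$((0 - diff))
  fi
  local desired="$current"
  # Rescale only when utilization is more than 10% away from the target (assumed default).
  if [ $((diff * 10)) -gt "$target" ]; then
    desired=$(( (current * util + target - 1) / target ))
  fi
  if [ "$desired" -lt "$min" ]; then
    desired="$min"
  fi
  if [ "$desired" -gt "$max" ]; then
    desired="$max"
  fi
  echo "$desired"
}

# Values from the manifest above: target 60%, minReplicas 5, maxReplicas 50.
desired_replicas_cpu 17 71 60 5 50   # prints 21
desired_replicas_cpu 5 63 60 5 50    # prints 5 (within the tolerance band)
```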

Apply the manifest to the cluster:

kubectl apply -f extras/postgres-hpa/hpa/userservice.yaml

Configure autoscaling for the frontend Deployment

View the HorizontalPodAutoscaler manifest for the frontend Deployment:

---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend
spec:
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 5
      selectPolicy: Max
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 5
  maxReplicas: 25
  metrics:
    - type: External
      external:
        metric:
          name: loadbalancing.googleapis.com|https|request_count
          selector:
            matchLabels:
              resource.labels.forwarding_rule_name: FORWARDING_RULE_NAME
        target:
          type: AverageValue
          averageValue: "5"

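This manifest uses the following fields:

spec.scaleTargetRef: The Kubernetes resource to scale.
spec.minReplicas: The minimum number of replicas, which is 5 in this sample.
spec.maxReplicas: The maximum number of replicas, which is 25 in this sample.
spec.metrics.*: The metric to use. In this sample, this is the number of HTTP requests per second, which is a custom metric in Cloud Monitoring provided by the adapter that you deployed.
spec.metrics.external.metric.selector.matchLabels: The specific resource label to filter when autoscaling.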
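
The manifest's behavior.scaleUp policy (type Percent, value 100, periodSeconds 5) lets the replica count at most double in each 5-second period. The sketch below estimates the minimum number of periods needed to reach a target under that policy alone; the `min_scale_up_periods` function is hypothetical, and real scale-ups also depend on the autoscaler's sync interval and on Pod startup time.

```shell
# Hypothetical helper: count the doubling periods needed to go from CURRENT to TARGET replicas.
# Usage: min_scale_up_periods CURRENT TARGET
min_scale_up_periods() {
  local current="$1" target="$2" periods=0
  if [ "$current" -lt 1 ]; then
    echo "current replica count must be at least 1" >&2
    return 1
  fi
  while [ "$current" -lt "$target" ]; do
    # A 100% policy allows adding as many replicas as currently exist.
    current=$((current * 2))
    periods=$((periods + 1))
  done
  echo "$periods"
}

# From minReplicas 5 to maxReplicas 25: 5 -> 10 -> 20 -> 25.
min_scale_up_periods 5 25   # prints 3
```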

Find the name of the forwarding rule from the load balancer to the frontend Deployment:

export FW_RULE=$(kubectl get ingress frontend -o=jsonpath='{.metadata.annotations.ingress\.kubernetes\.io/forwarding-rule}')
echo $FW_RULE

The output is similar to the following:

k8s2-fr-j76hrtv4-default-frontend-wvvf7381

Add your forwarding rule to the manifest:
```
sed -i "s/FORWARDING_RULE_NAME/$FW_RULE/g" "extras/postgres-hpa/hpa/frontend.yaml"
```
This command replaces FORWARDING_RULE_NAME with your saved forwarding rule.
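
If the variable is empty, for example because the Ingress annotation was not populated yet, the substitution silently blanks out the label value. The following sketch guards against that case; the `render_placeholder` function and the sample value `k8s2-fr-example` are hypothetical, and the function writes the result to standard output instead of editing the file in place.

```shell
# Hypothetical helper: substitute a placeholder, refusing empty values.
# Usage: render_placeholder FILE PLACEHOLDER VALUE
render_placeholder() {
  local file="$1" placeholder="$2" value="$3"
  if [ -z "$value" ]; then
    echo "refusing to substitute an empty value for ${placeholder}" >&2
    return 1
  fi
  # Assumes the value contains no "/" characters, which holds for rule names and IP addresses.
  sed "s/${placeholder}/${value}/g" "$file"
}

# Demonstration on a temporary file rather than the real manifest.
demo_file="$(mktemp)"
printf 'resource.labels.forwarding_rule_name: FORWARDING_RULE_NAME\n' > "$demo_file"
render_placeholder "$demo_file" FORWARDING_RULE_NAME "k8s2-fr-example"
rm -f "$demo_file"
```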

Apply the manifest to the cluster:

kubectl apply -f extras/postgres-hpa/hpa/frontend.yaml

Checkpoint: Validate your autoscaling setup

Get the status of your HorizontalPodAutoscaler resources:

kubectl get hpa

The output is similar to the following:

NAME                     REFERENCE                            TARGETS             MINPODS   MAXPODS   REPLICAS   AGE
accounts-db-postgresql   StatefulSet/accounts-db-postgresql   10905m/15 (avg)     1         5         2          5m2s
contacts                 Deployment/contacts                  1%/70%              1         5         1          11m
frontend                 Deployment/frontend                  <unknown>/5 (avg)   5         25        1          34s
userservice              Deployment/userservice               0%/60%              5         50        5          4m56s

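At this point, you've set up your application and configured autoscaling. Your frontend and database can now scale based on the metrics that you provided.

Simulate load and observe GKE scaling

Bank of Anthos includes a loadgenerator Service that lets you simulate traffic to test how your application scales under load. In this section, you deploy the loadgenerator Service, generate load, and observe the resulting scaling.

Deploy the load testing generator

Create an environment variable with the IP address of the Bank of Anthos load balancer: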

export LB_IP=$(kubectl get ingress frontend -o=jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo $LB_IP

The output is similar to the following:

203.0.113.9

Add the IP address of the load balancer to the manifest:

sed -i "s/FRONTEND_IP_ADDRESS/$LB_IP/g" "extras/postgres-hpa/loadgenerator.yaml"

Apply the manifest to the cluster:

kubectl apply -f extras/postgres-hpa/loadgenerator.yaml

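The load generator begins adding one user every second, up to 250 users.

Simulate load

In this section, you use the load generator to simulate a traffic spike and observe how your replica count and node count scale up to accommodate the increased load over time. You can then end the test and observe how the replica count and node count scale down in response.

Expose the load generator web interface locally: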
```
kubectl port-forward svc/loadgenerator 8080
```
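If you see an error message, try again when the Pod is running.
In a browser, open the load generator web interface.
- If you're using a local shell, open a browser and go to http://127.0.0.1:8080.
- If you're using Cloud Shell, click Web preview, and then click Preview on port 8080.
Click the Charts tab to observe performance over time.

Open a new terminal window and watch the replica count of your horizontal Pod autoscalers: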

kubectl get hpa -w

The number of replicas increases as the load increases. The scale-up might take approximately ten minutes.

NAME                     REFERENCE                            TARGETS          MINPODS   MAXPODS   REPLICAS
accounts-db-postgresql   StatefulSet/accounts-db-postgresql   8326m/15 (avg)   1         5         5
contacts                 Deployment/contacts                  51%/70%          1         5         2
frontend                 Deployment/frontend                  5200m/5 (avg)    5         25        13
userservice              Deployment/userservice               71%/60%          5         50        17

Open another terminal window and check the number of nodes in the cluster:

gcloud container clusters list \
    --filter='name=bank-of-anthos' \
    --format='table(name, currentMasterVersion, currentNodeVersion, currentNodeCount)' \
    --location="us-central1"

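The number of nodes increases from the initial three nodes to accommodate the new replicas.
Open the load generator interface and click Stop to end the test.
Check the replica count and node count again and observe as the numbers decrease with the reduced load. The scale-down might take some time, because the default stabilization window for replicas in the Kubernetes HorizontalPodAutoscaler resource is five minutes. For more information, see Stabilization window.

Clean up

To avoid incurring charges to your Google Cloud account for the resources used in this tutorial, either delete the project that contains the resources, or keep the project and delete the individual resources.

Delete individual resources

Google Cloud creates resources, such as load balancers, based on the Kubernetes objects that you create. To delete all the resources from this tutorial, do the following:

Delete the sample Kubernetes resources: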

kubectl delete \
    -f extras/postgres-hpa/loadgenerator.yaml \
    -f extras/postgres-hpa/hpa \
    -f extras/postgres-hpa/kubernetes-manifests \
    -f extras/jwt/jwt-secret.yaml \
    -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter.yaml

Delete the PostgreSQL database:

helm uninstall accounts-db
kubectl delete pvc -l "app.kubernetes.io/instance=accounts-db"
kubectl delete configmaps initdb

Delete the GKE cluster and the IAM service account:

gcloud iam service-accounts delete "bank-of-anthos@PROJECT_ID.iam.gserviceaccount.com" --quiet
gcloud container clusters delete "bank-of-anthos" --location="us-central1" --quiet

Delete the project

Delete a Google Cloud project:

gcloud projects delete PROJECT_ID

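What's next

Learn about vertical Pod autoscaling, which you can use to automatically adjust resource requests for long-running workloads and get recommendations based on historical usage.
Learn more about horizontal Pod autoscaling.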