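Migrate clusters to use Node Agent

This document describes how to enable Node Agent for new and existing clusters to provide more secure cluster operations. Starting with version 1.33, Google Distributed Cloud for Bare Metal can switch from using Ansible over SSH for cluster operations to a more secure, agent-based model that uses Node Agent. Using Node Agent for cluster operations addresses a security concern in sensitive environments: the need for SSH access to customer nodes for management.

In the new model, a Node Agent binary runs on each node. Node Agent communicates with clients, such as controllers, over a secure gRPC channel to manage all node configuration activities. Google Distributed Cloud enforces mutual Transport Layer Security (mTLS) between the cluster controllers and Node Agent, and between bmctl and Node Agent, to authenticate and encrypt the gRPC connections.

The bmctl nodeagent commands make migrating existing clusters to Node Agent simple and reliable. These commands reduce manual work, improve consistency across nodes, and automate key tasks such as certificate creation and rotation. The bmctl commands run primarily over SSH, so administrators can deploy or redeploy the agent even when the cluster controllers are unhealthy or their standard communication channel is compromised.

Node Agent and the corresponding bmctl nodeagent commands are supported in Google Distributed Cloud for Bare Metal version 1.33.0 and later. You can enable Node Agent on existing clusters at version 1.33 or later, or enable it when you create a cluster at version 1.33 or later.

This page is for administrators, architects, and operators who manage the lifecycle of the underlying technical infrastructure. To learn more about common roles and example tasks referenced in Google Cloud content, see Common GKE Enterprise user roles and tasks.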

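Before you begin

Before you migrate a cluster to Node Agent mode, make sure that all cluster nodes meet the following requirements:

  • Each node has an open port dedicated to Node Agent. By default, Node Agent uses port 9192, but you can configure this port when you deploy or enable Node Agent, or when you install a new cluster. For more information, see Customize the Node Agent port.

  • Each node has containerd version 1.7 or later installed.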
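
The two requirements above can be checked on a node before you deploy. The following is a minimal sketch, not an official tool: the helper names (version_ge, containerd_version, port_in_use, preflight) are hypothetical, and it assumes that containerd, ss, and a sort that supports -V are available on the node. It checks only that nothing else already listens on the port; it does not inspect firewall rules.

```shell
#!/bin/sh
# Hypothetical preflight helpers; run on each cluster node before deployment.

# version_ge A B: succeeds when version A >= version B (relies on `sort -V`).
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# containerd_version: prints the third field of `containerd --version`,
# with any leading "v" removed (for example "1.7.13").
containerd_version() {
    containerd --version | awk '{print $3}' | sed 's/^v//'
}

# port_in_use PORT: succeeds when a process already listens on the TCP port.
port_in_use() {
    ss -ltn | awk '{print $4}' | grep -q ":$1\$"
}

# preflight [PORT]: checks both requirements; PORT defaults to 9192,
# the documented default Node Agent port.
preflight() {
    port="${1:-9192}"
    if version_ge "$(containerd_version)" "1.7"; then
        echo "containerd version OK"
    else
        echo "containerd 1.7 or later is required" >&2
        return 1
    fi
    if port_in_use "$port"; then
        echo "port $port is already in use" >&2
        return 1
    fi
    echo "port $port is free"
}
```

Run preflight with no arguments to check the default port, or pass a custom port number as the first argument.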

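Migrate to Node Agent mode

Migrating to Node Agent mode is a two-step process:

  1. Deploy Node Agent: deploy the Node Agent components to all nodes in the cluster.

  2. Enable Node Agent mode:

    • For existing clusters, use the bmctl nodeagent enable command to enable the mode.
    • For new clusters, add the enablement annotation and the corresponding credential paths to the cluster configuration file before you create the cluster.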

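Deploy Node Agent

The bmctl nodeagent deploy command uses SSH to deploy the Node Agent service to one or more target nodes in the specified cluster. The command installs or reinstalls Node Agent. It connects over SSH and performs the necessary steps, including transferring binaries, optionally generating and transferring certificates, and setting up the systemd service. Deploying Node Agent requires SSH access and sudo privileges on the target nodes.

You can specify the target nodes in several ways: directly with the --nodes flag, from the cluster configuration file with the --cluster flag, or by referencing the cluster custom resource. For more information about Node Agent commands and options, see the bmctl command reference.

Deploy in a fresh environment

For an initial deployment, download the nodeagentd binary and generate new certificate authorities (CAs). The following command retrieves the node list from the cluster configuration file. The --sa-key flag provides the credentials needed to download the nodeagentd binary from the Cloud Storage bucket.

  • To deploy Node Agent for the first time on a new cluster, use the following command: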

    bmctl nodeagent deploy \
        --pull-binaries true \
        --generate-ca-creds true \
        --cluster CLUSTER_NAME \
        --ssh-user USERNAME \
        --ssh-key SSH_KEY_PATH \
        --sa-key SERVICE_ACCOUNT_KEY_PATH
    

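    Replace the following:

    • CLUSTER_NAME: the name of the cluster whose nodes you want to deploy Node Agent to.

    • USERNAME: the username that has SSH access configured for the nodes. By default, SSH is configured to use root; if you set up a login user, use that username instead.

    • SSH_KEY_PATH: the path to the SSH private key file.

    • SERVICE_ACCOUNT_KEY_PATH: the path to the key file of a service account that has permission to pull images from the registry. By default, this is the JSON key file for the anthos-baremetal-gcr service account.

    The command output is similar to the following example: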

    Please check the logs at bmctl-workspace/demo-cluster/log/nodeagent_deploy-20250819-175703/nodeagent_deploy.log
    [2025-08-19 17:57:03+0000] INFO: Executing 'nodeagent deploy'...
    [2025-08-19 17:57:05+0000] -------------------- Deployment Plan --------------------
    [2025-08-19 17:57:05+0000]   Target Cluster:            demo-cluster
    [2025-08-19 17:57:05+0000]   SSH User:                  root
    [2025-08-19 17:57:05+0000]   SSH Key:                   rootSSH
    [2025-08-19 17:57:05+0000]   Concurrency:               25
    [2025-08-19 17:57:05+0000]   Generate Credentials:      true
    [2025-08-19 17:57:05+0000]   Deploy Credentials:        true
    [2025-08-19 17:57:05+0000]   Server Cert Validity Days: 1825
    [2025-08-19 17:57:05+0000]   Verify SSH Host Keys:      true
    [2025-08-19 17:57:05+0000]   Node Agent pull version:   1.33.0-gke.799
    [2025-08-19 17:57:05+0000]   Target Nodes Source:       cluster YAML
    [2025-08-19 17:57:05+0000]   Nodes Port:                9192
    [2025-08-19 17:57:05+0000]   Target Nodes (4):          10.200.0.2, 10.200.0.3, 10.200.0.4, 10.200.0.5
    [2025-08-19 17:57:05+0000] ---------------------------------------------------------
    Proceed with deployment? [y/N]: y
    [2025-08-19 17:57:07+0000] INFO: User confirmed.
    [2025-08-19 17:57:07+0000] Downloading Node Agent binary (1.33.0-gke.799)... OK
    [2025-08-19 17:57:08+0000] INFO: Node Agent binary pulled and stored at bmctl-workspace/bins/nodeagentd
    [2025-08-19 17:57:08+0000] INFO: Starting generate credentials (CAs and client credentials) phase...
    [2025-08-19 17:57:08+0000] Generating credentials for the cluster: demo-cluster, 2025-08-19T17:57:08Z
    [2025-08-19 17:57:08+0000] ------------ Credentials Options ------------
    [2025-08-19 17:57:08+0000] Cluster Name:           demo-cluster
    [2025-08-19 17:57:08+0000] Key Algorithm:          rsa
    [2025-08-19 17:57:08+0000] Key Length:             4096
    [2025-08-19 17:57:08+0000] CA Validity (days):     3650
    [2025-08-19 17:57:08+0000] Client Validity (days): 1825
    [2025-08-19 17:57:08+0000] Server CA CN:           Node Agent Server CA
    [2025-08-19 17:57:08+0000] Client CA CN:           Node Agent Client CA
    [2025-08-19 17:57:08+0000] Creds path:             bmctl-workspace/demo-cluster/nodeagent-creds
    [2025-08-19 17:57:08+0000] --------------------------------------------
    [2025-08-19 17:57:08+0000] Generating credentials... OK
    [2025-08-19 17:57:19+0000] Certificates have been created and stored in bmctl-workspace/demo-cluster/nodeagent-creds
    [2025-08-19 17:57:19+0000] INFO: Attempting to load CAs from: bmctl-workspace/demo-cluster/nodeagent-creds
    [2025-08-19 17:57:19+0000] INFO: Server CA loaded successfully. Subject: CN=Node Agent Server CA,O=GCD-SO,L=Sunnyvale,ST=California,C=US, Key Type: *rsa.PrivateKey
    [2025-08-19 17:57:19+0000] INFO: Client CA loaded successfully. Subject: CN=Node Agent Client CA,O=GCD-SO,L=Sunnyvale,ST=California,C=US, Key Type: *rsa.PrivateKey
    [2025-08-19 17:57:19+0000] ===============================================
    [2025-08-19 17:57:19+0000] --- Starting Artifact Preparation ---
    [2025-08-19 17:57:19+0000] Starting artifact preparation for 4 nodes (concurrency: 25)...
    [2025-08-19 17:57:23+0000] --- Finished Artifact Preparation ---
    [2025-08-19 17:57:23+0000] INFO: Preparation SUCCEEDED for node 10.200.0.2
    [2025-08-19 17:57:23+0000] INFO: Preparation SUCCEEDED for node 10.200.0.3
    [2025-08-19 17:57:23+0000] INFO: Preparation SUCCEEDED for node 10.200.0.4
    [2025-08-19 17:57:23+0000] INFO: Preparation SUCCEEDED for node 10.200.0.5
    [2025-08-19 17:57:23+0000] ===============================================
    [2025-08-19 17:57:23+0000] --- Starting Deployment Phase ---
    [2025-08-19 17:57:23+0000] INFO: Starting deployment to 4 nodes (Concurrency: 25)...
    [2025-08-19 17:57:36+0000] INFO: All host deployments finished.
    [2025-08-19 17:57:36+0000] INFO: --- Deployment Phase Completed Successfully ---
    [2025-08-19 17:57:36+0000]
    ===============================================
    --- Deployment Summary ---
      Host: 10.200.0.2, Status: SUCCESS
      Host: 10.200.0.3, Status: SUCCESS
      Host: 10.200.0.4, Status: SUCCESS
      Host: 10.200.0.5, Status: SUCCESS
    -----------------------------------------------
    Total Nodes Attempted: 4 | SUCCESS: 4 | FAILED: 0
    ===============================================
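
The deployment summary at the end of the output above lends itself to a scripted check. The following is a minimal sketch that assumes you saved the console output to a file and that the summary line keeps the "Total Nodes Attempted: N | SUCCESS: N | FAILED: N" format shown in the sample. The function names are hypothetical.

```shell
#!/bin/sh
# failed_count FILE: prints the FAILED count from the last deployment summary
# line in FILE. Prints nothing if no summary line is present.
failed_count() {
    sed -n 's/.*Total Nodes Attempted: [0-9]* | SUCCESS: [0-9]* | FAILED: \([0-9]*\).*/\1/p' "$1" | tail -n 1
}

# deploy_succeeded FILE: succeeds only when a summary line exists and it
# reports zero failed nodes.
deploy_succeeded() {
    [ "$(failed_count "$1")" = "0" ]
}
```

For example, deploy_succeeded deploy-output.txt (a hypothetical file name) returns a nonzero exit status when any node failed or when the summary is missing, so it can gate the enable step in a script.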
    

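Upgrade the Node Agent version

Node Agent upgrades are independent of cluster upgrades. To upgrade Node Agent, use the bmctl nodeagent deploy command with --pull-binaries set to true. When you upgrade Node Agent, set --generate-ca-creds to false to reuse the existing CAs instead of regenerating them. Regenerating the CAs requires updating the corresponding cluster credentials, which is the process used for credential rotation. The output is similar to that of a fresh deployment, but without the CA generation logs.

Upgrading Node Agent restarts the Node Agent process, which can interrupt any running jobs. Although most jobs recover through the retry mechanism, follow these steps to minimize potential disruption:

  1. Make sure that no cluster upgrade or other post-installation configuration activity is in progress.

  2. Verify that the cluster is in a running state.

  3. Start the Node Agent upgrade: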

    bmctl nodeagent deploy \
        --pull-binaries true \
        --generate-ca-creds false \
        --cluster CLUSTER_NAME \
        --ssh-user USERNAME \
        --ssh-key SSH_KEY_PATH \
        --sa-key SERVICE_ACCOUNT_KEY_PATH
    

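    Replace the following:

    • CLUSTER_NAME: the name of the cluster whose nodes you want to deploy Node Agent to.

    • USERNAME: the username that has SSH access configured for the nodes. By default, SSH is configured to use root; if you set up a login user, use that username instead.

    • SSH_KEY_PATH: the path to the SSH private key file.

    • SERVICE_ACCOUNT_KEY_PATH: the path to the key file of a service account that has permission to pull images from the registry. By default, this is the JSON key file for the anthos-baremetal-gcr service account.

    The command output is similar to the following example: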

    Please check the logs at bmctl-workspace/demo-cluster/log/nodeagent_deploy-20250819-180416/nodeagent_deploy.log
    [2025-08-19 18:04:16+0000] INFO: Executing 'nodeagent deploy'...
    [2025-08-19 18:04:18+0000] -------------------- Deployment Plan --------------------
    [2025-08-19 18:04:18+0000]   Target Cluster:            demo-cluster
    [2025-08-19 18:04:18+0000]   SSH User:                  root
    [2025-08-19 18:04:18+0000]   SSH Key:                   rootSSH
    [2025-08-19 18:04:18+0000]   Concurrency:               25
    [2025-08-19 18:04:18+0000]   Generate Credentials:      false
    [2025-08-19 18:04:18+0000]   Deploy Credentials:        true
    [2025-08-19 18:04:18+0000]   Server Cert Validity Days: 1825
    [2025-08-19 18:04:18+0000]   Verify SSH Host Keys:      true
    [2025-08-19 18:04:18+0000]   Node Agent pull version:   1.33.0-gke.799
    [2025-08-19 18:04:18+0000]   Target Nodes Source:       cluster YAML
    [2025-08-19 18:04:18+0000]   Nodes Port:                9192
    [2025-08-19 18:04:18+0000]   Target Nodes (4):          10.200.0.2, 10.200.0.3, 10.200.0.4, 10.200.0.5
    [2025-08-19 18:04:18+0000] ---------------------------------------------------------
    Proceed with deployment? [y/N]: y
    [2025-08-19 18:04:20+0000] INFO: User confirmed.
    [2025-08-19 18:04:20+0000] Downloading Node Agent binary (1.33.0-gke.799)... OK
    [2025-08-19 18:04:22+0000] INFO: Node Agent binary pulled and stored at bmctl-workspace/bins/nodeagentd
    [2025-08-19 18:04:22+0000] INFO: Attempting to load CAs from: bmctl-workspace/demo-cluster/nodeagent-creds
    [2025-08-19 18:04:22+0000] INFO: Server CA loaded successfully. Subject: CN=Node Agent Server CA,O=gcd-SO,L=Sunnyvale,ST=California,C=US, Key Type: *rsa.PrivateKey
    [2025-08-19 18:04:22+0000] INFO: Client CA loaded successfully. Subject: CN=Node Agent Client CA,O=gcd-SO,L=Sunnyvale,ST=California,C=US, Key Type: *rsa.PrivateKey
    [2025-08-19 18:04:22+0000] ===============================================
    [2025-08-19 18:04:22+0000] --- Starting Artifact Preparation ---
    [2025-08-19 18:04:22+0000] Starting artifact preparation for 4 nodes (concurrency: 25)...
    

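Deploy or redeploy to specific nodes

When you add or recover cluster nodes, you can deploy Node Agent to specific nodes instead of to every node in the cluster. Use the --nodes flag to specify the nodes for the deployment.

  • To deploy Node Agent to specific nodes, use the following command: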

    bmctl nodeagent deploy \
        --pull-binaries true \
        --cluster CLUSTER_NAME \
        --ssh-user USERNAME \
        --ssh-key SSH_KEY_PATH \
        --sa-key SERVICE_ACCOUNT_KEY_PATH \
        --nodes NODE_IP_ADDRESS_LIST
    

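    Replace the following:

    • CLUSTER_NAME: the name of the cluster whose nodes you want to deploy Node Agent to.

    • USERNAME: the username that has SSH access configured for the nodes. By default, SSH is configured to use root; if you set up a login user, use that username instead.

    • SSH_KEY_PATH: the path to the SSH private key file.

    • SERVICE_ACCOUNT_KEY_PATH: the path to the key file of a service account that has permission to pull images from the registry. By default, this is the JSON key file for the anthos-baremetal-gcr service account.

    • NODE_IP_ADDRESS_LIST: a comma-separated list of the IP addresses of the nodes that you want to deploy Node Agent to.

    The command output is similar to the following example: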

    Please check the logs at bmctl-workspace/demo-cluster/log/nodeagent_deploy-20250819-181751/nodeagent_deploy.log
    [2025-08-19 18:17:51+0000] INFO: Executing 'nodeagent deploy'...
    [2025-08-19 18:17:54+0000] -------------------- Deployment Plan --------------------
    [2025-08-19 18:17:54+0000]   Target Cluster:            demo-cluster
    [2025-08-19 18:17:54+0000]   SSH User:                  user
    [2025-08-19 18:17:54+0000]   SSH Key:                   SSH_KEY_PATH
    [2025-08-19 18:17:54+0000]   Concurrency:               25
    [2025-08-19 18:17:54+0000]   Generate Credentials:      false
    [2025-08-19 18:17:54+0000]   Deploy Credentials:        true
    [2025-08-19 18:17:54+0000]   Server Cert Validity Days: 1825
    [2025-08-19 18:17:54+0000]   Verify SSH Host Keys:      true
    [2025-08-19 18:17:54+0000]   Node Agent pull version:   1.33.0-gke.799
    [2025-08-19 18:17:54+0000]   Target Nodes Source:       nodes flag
    [2025-08-19 18:17:54+0000]   Nodes Port:                9192
    [2025-08-19 18:17:54+0000]   Target Nodes (3):          10.200.0.2, 10.200.0.3
    [2025-08-19 18:17:54+0000] ---------------------------------------------------------
    Proceed with deployment? [y/N]:
    

For the full list of options for the bmctl nodeagent deploy command, see nodeagent deploy in the bmctl command reference.
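
When many nodes are added or recovered at once, the comma-separated value for the --nodes flag can be built from a file instead of typed by hand. The following is a minimal sketch: the function name join_nodes and the file new-nodes.txt are hypothetical, and the file is assumed to hold one IP address per line.

```shell
#!/bin/sh
# join_nodes FILE: prints the non-empty, non-comment lines of FILE as a single
# comma-separated list, with stray spaces, tabs, and carriage returns removed.
join_nodes() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1" | tr -d ' \t\r' | paste -sd, -
}

# Example use with the flags documented above (new-nodes.txt is hypothetical):
#   bmctl nodeagent deploy \
#       --pull-binaries true \
#       --cluster CLUSTER_NAME \
#       --ssh-user USERNAME \
#       --ssh-key SSH_KEY_PATH \
#       --sa-key SERVICE_ACCOUNT_KEY_PATH \
#       --nodes "$(join_nodes new-nodes.txt)"
```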

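Enable Node Agent

After Node Agent is deployed to all nodes in the cluster, the enable command activates Node Agent mode on an existing, running cluster. The command also creates or updates the Node Agent credentials in the cluster.

Enable Node Agent for an existing, running cluster

You can enable Node Agent on existing clusters at version 1.33 and later.

  • To enable Node Agent on an existing cluster, use the following command: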

    ./bmctl nodeagent enable \
        --kubeconfig KUBECONFIG \
        --cluster CLUSTER_NAME \
        --ensure-status=true
    

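    Replace the following:

    • KUBECONFIG: the path to the kubeconfig file of the cluster for which you want to enable Node Agent.

    • CLUSTER_NAME: the name of the cluster for which you want to enable Node Agent.

    The command output is similar to the following example: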

    Please check the logs at bmctl-workspace/demo-cluster/log/nodeagent_enable-20250819-183058/nodeagent_enable.log
    [2025-08-19 18:30:58+0000] Enable Node Agent for cluster: demo-cluster
    [2025-08-19 18:31:00+0000] Update Node Agent credentials
    [2025-08-19 18:31:00+0000] ----------------------------------------------------------
    [2025-08-19 18:31:00+0000] Server CA certificate path: bmctl-workspace/demo-cluster/nodeagent-creds/server_ca_cert.pem
    [2025-08-19 18:31:00+0000] Server CA private key path: bmctl-workspace/demo-cluster/nodeagent-creds/server_ca_key.pem
    [2025-08-19 18:31:00+0000] Client CA certificate path: bmctl-workspace/demo-cluster/nodeagent-creds/client_ca_cert.pem
    [2025-08-19 18:31:00+0000] Client CA private key path: bmctl-workspace/demo-cluster/nodeagent-creds/client_ca_key.pem
    [2025-08-19 18:31:00+0000] Client certificate path: bmctl-workspace/demo-cluster/nodeagent-creds/client_cert.pem
    [2025-08-19 18:31:00+0000] Client private key path: bmctl-workspace/demo-cluster/nodeagent-creds/client_key.pem
    [2025-08-19 18:31:00+0000] ----------------------------------------------------------
    [2025-08-19 18:31:00+0000] Node Agent client credentials secret has been created/updated
    [2025-08-19 18:31:00+0000] Node Agent server CA secret has been created/updated
    [2025-08-19 18:31:00+0000] Node Agent client CA secret has been created/updated
    [2025-08-19 18:31:00+0000] Successfully created/updated Node Agent credentials secrets in namespace cluster-demo-cluster
    [2025-08-19 18:31:00+0000] Annotation 'baremetal.cluster.gke.io/node-agent-port' not found on cluster cluster-demo-cluster/demo-cluster, no removal needed.
    [2025-08-19 18:31:00+0000] Successfully enable Node Agent for cluster: demo-cluster
    [2025-08-19 18:31:00+0000] ----------------------------------------------------------
    [2025-08-19 18:31:00+0000] Verifying Node Agent status on all nodes...
    [2025-08-19 18:31:00+0000] --------------------- Total nodes: 3 ----------------------
    [2025-08-19 18:31:00+0000] node: control-0--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1577
    [2025-08-19 18:31:00+0000] node: control-1--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1578
    [2025-08-19 18:31:00+0000] node: control-2--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1581
    [2025-08-19 18:31:00+0000] ----------------------------------------------------------
    [2025-08-19 18:31:00+0000] Verified Node Agent status on all nodes in cluster
    

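New cluster installation

You can enable Node Agent when you create clusters at version 1.33 and later.

To enable Node Agent for a new cluster, follow these steps:

  1. For a new admin cluster, add the following credential file paths to the top section of the admin cluster configuration file: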

    nodeAgentServerCACertificatePath: bmctl-workspace/demo-cluster/nodeagent-creds/server_ca_cert.pem
    nodeAgentServerCAPrivateKeyPath: bmctl-workspace/demo-cluster/nodeagent-creds/server_ca_key.pem
    nodeAgentClientCACertificatePath: bmctl-workspace/demo-cluster/nodeagent-creds/client_ca_cert.pem
    nodeAgentClientCAPrivateKeyPath: bmctl-workspace/demo-cluster/nodeagent-creds/client_ca_key.pem
    nodeAgentClientCertificatePath: bmctl-workspace/demo-cluster/nodeagent-creds/client_cert.pem
    nodeAgentClientPrivateKeyPath: bmctl-workspace/demo-cluster/nodeagent-creds/client_key.pem
    
  2. Add the Node Agent enablement annotation to the cluster metadata section of the cluster configuration file:

    kind: Cluster
    metadata:
      annotations:
        baremetal.cluster.gke.io/enable-node-agent: ""
    
  3. Create the cluster by following the standard instructions.
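
Because the cluster configuration file in step 1 refers to six credential files by path, it can help to confirm that they all exist before you create the cluster. The following is a minimal sketch: the function name missing_creds is hypothetical, and the file names are taken from the example paths in step 1.

```shell
#!/bin/sh
# missing_creds DIR: reports every expected Node Agent credential file that is
# absent or empty in DIR, and returns nonzero if any is missing.
missing_creds() {
    creds_dir="$1"
    missing=0
    for f in server_ca_cert.pem server_ca_key.pem \
             client_ca_cert.pem client_ca_key.pem \
             client_cert.pem client_key.pem; do
        if [ ! -s "$creds_dir/$f" ]; then
            echo "missing or empty: $creds_dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, missing_creds bmctl-workspace/demo-cluster/nodeagent-creds succeeds only when all six files are present, which matches the directory that the sample deploy output reports for generated certificates.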

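For the full list of options for the bmctl nodeagent enable command, see nodeagent enable in the bmctl command reference.

Rotate credentials

The rotate-credentials command rotates the Node Agent credentials on the nodes and in the cluster, and it can also rotate the certificate authorities (CAs). The --generate-ca-creds flag tells the command to regenerate the CAs and to use the new CAs to sign the certificates for the servers (nodes) and the clients (controllers).

  • To rotate credentials and to regenerate and use new CAs, use the following command: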

    bmctl nodeagent rotate-credentials \
        --kubeconfig KUBECONFIG \
        --generate-ca-creds true \
        --cluster CLUSTER_NAME \
        --ssh-user USERNAME \
        --ssh-key SSH_KEY_PATH
    

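    Replace the following:

    • KUBECONFIG: the path to the kubeconfig file of the cluster whose Node Agent credentials you want to rotate.

    • CLUSTER_NAME: the name of the cluster whose Node Agent credentials you want to rotate.

    • USERNAME: the username that has SSH access configured for the nodes. By default, SSH is configured to use root; if you set up a login user, use that username instead.

    • SSH_KEY_PATH: the path to the SSH private key file.

    The command output is similar to the following example: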

    Please check the logs at bmctl-workspace/demo-cluster/log/nodeagent_rotate_credentials-20250819-184216/nodeagent_rotate_credentials.log
    [2025-08-19 18:42:16+0000] INFO: Executing 'nodeagent rotate-credentials'...
    [2025-08-19 18:42:18+0000] ------------------- Credentials Rotation  -------------------
    [2025-08-19 18:42:18+0000]   Target Cluster:            demo-cluster
    [2025-08-19 18:42:18+0000]   SSH User:                  root
    [2025-08-19 18:42:18+0000]   SSH Key:                   rootSSH
    [2025-08-19 18:42:18+0000]   Concurrency:               25
    [2025-08-19 18:42:18+0000]   Generate Credentials:      true
    [2025-08-19 18:42:18+0000]   Deploy Credentials:        true
    [2025-08-19 18:42:18+0000]   Server Cert Validity Days: 1825
    [2025-08-19 18:42:18+0000]   Verify SSH Host Keys:      true
    [2025-08-19 18:42:18+0000]   Target Nodes Source:       cluster CR
    [2025-08-19 18:42:18+0000]   Nodes Port:                9192
    [2025-08-19 18:42:18+0000]   Target Nodes (3):          10.200.0.2, 10.200.0.3, 10.200.0.4
    [2025-08-19 18:42:18+0000] ---------------------------------------------------------
    Proceed with credentials rotation? [y/N]: [2025-08-19 18:42:18+0000] INFO: Non-interactive mode enabled; automatically confirming.
    [2025-08-19 18:42:18+0000] INFO: Starting generate credentials (CAs and client credentials) phase...
    [2025-08-19 18:42:18+0000] Generating credentials for the cluster: demo-cluster, 2025-08-19T18:42:18Z
    [2025-08-19 18:42:18+0000] ------------ Credentials Options ------------
    [2025-08-19 18:42:18+0000] Cluster Name:           demo-cluster
    [2025-08-19 18:42:18+0000] Key Algorithm:          rsa
    [2025-08-19 18:42:18+0000] Key Length:             4096
    [2025-08-19 18:42:18+0000] CA Validity (days):     3650
    [2025-08-19 18:42:18+0000] Client Validity (days): 1825
    [2025-08-19 18:42:18+0000] Server CA CN:           Node Agent Server CA
    [2025-08-19 18:42:18+0000] Client CA CN:           Node Agent Client CA
    [2025-08-19 18:42:18+0000] Creds path:             bmctl-workspace/demo-cluster/nodeagent-creds
    [2025-08-19 18:42:18+0000] --------------------------------------------
    [2025-08-19 18:42:18+0000] Generating credentials... OK
    Credential directory 'bmctl-workspace/demo-cluster/nodeagent-creds' already exists. Do you want to back it up and continue? (y/N): y
    [2025-08-19 18:42:27+0000] INFO: User confirmed.
    [2025-08-19 18:42:27+0000] Credentials backup to bmctl-workspace/demo-cluster/nodeagent-creds_backup_20250819_184227
    [2025-08-19 18:42:27+0000] Certificates have been created and stored in bmctl-workspace/demo-cluster/nodeagent-creds
    [2025-08-19 18:42:27+0000] INFO: Attempting to load CAs from: bmctl-workspace/demo-cluster/nodeagent-creds
    [2025-08-19 18:42:27+0000] INFO: Server CA loaded successfully. Subject: CN=Node Agent Server CA,O=gcd-SO,L=Sunnyvale,ST=California,C=US, Key Type: *rsa.PrivateKey
    [2025-08-19 18:42:27+0000] INFO: Client CA loaded successfully. Subject: CN=Node Agent Client CA,O=gcd-SO,L=Sunnyvale,ST=California,C=US, Key Type: *rsa.PrivateKey
    [2025-08-19 18:42:27+0000] ===============================================
    [2025-08-19 18:42:34+0000] INFO: All host deployments finished.
    [2025-08-19 18:42:34+0000] INFO: --- Deployment Phase Completed Successfully ---
    [2025-08-19 18:42:34+0000]
    ===============================================
    --- Deployment Summary ---
      Host: 10.200.0.2, Status: SUCCESS
      Host: 10.200.0.3, Status: SUCCESS
      Host: 10.200.0.4, Status: SUCCESS
    -----------------------------------------------
    Total Nodes Attempted: 3 | SUCCESS: 3 | FAILED: 0
    ===============================================
    

For the full list of options for the bmctl nodeagent rotate-credentials command, see nodeagent rotate-credentials in the bmctl command reference.
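
To decide when a rotation is due, you can inspect the expiry of the certificates stored in the credentials directory. The following is a minimal sketch that assumes the openssl command-line tool is installed on the admin workstation and that the files are PEM-encoded X.509 certificates. The function names are hypothetical.

```shell
#!/bin/sh
# cert_valid_for CERT DAYS: succeeds when the certificate in CERT is still
# valid DAYS days from now. `openssl x509 -checkend` takes a number of seconds.
cert_valid_for() {
    openssl x509 -in "$1" -noout -checkend "$(( $2 * 86400 ))" >/dev/null
}

# cert_end_date CERT: prints the certificate's expiry (notAfter) date.
cert_end_date() {
    openssl x509 -in "$1" -noout -enddate | sed 's/^notAfter=//'
}
```

For example, cert_valid_for bmctl-workspace/demo-cluster/nodeagent-creds/client_cert.pem 90 fails once the client certificate has fewer than 90 days left, which could trigger a scheduled rotation. The 90-day threshold is an arbitrary illustration.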

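Check status

The status command provides information about the running state of Node Agent on the nodes. You can specify the target nodes directly with the --nodes flag, from the cluster configuration file with the --cluster flag, or by referencing the cluster custom resource.

When the nodes come from the cluster configuration file or the --nodes flag, credentials are retrieved from the local file system. When the node source is the cluster custom resource, credentials are retrieved from the cluster.

The Node Agent port is determined in the following order of precedence:

  1. The --port flag
  2. The kubeconfig file
  3. The cluster configuration file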
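
The precedence order above can be modeled as "first non-empty value wins". The following sketch only illustrates that rule and is not how bmctl is implemented. The function name resolve_port is hypothetical, and falling back to 9192 when no source sets a port is an assumption based on the documented default.

```shell
#!/bin/sh
# resolve_port FLAG_PORT KUBECONFIG_PORT CONFIG_PORT
# Prints the first non-empty argument, in precedence order. If all three are
# empty, prints 9192 (assumed fallback, the documented default port).
resolve_port() {
    for candidate in "$1" "$2" "$3"; do
        if [ -n "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    echo 9192
}
```

For example, resolve_port "" 9500 9300 prints 9500, because no --port value was given and the kubeconfig source outranks the cluster configuration file.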

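Verify Node Agent status

With only the --cluster flag, you can check Node Agent status for the nodes specified in the cluster configuration file.

  • To check Node Agent status based on the cluster configuration file, use the following command: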

    ./bmctl nodeagent status \
        --cluster CLUSTER_NAME
    

    Replace CLUSTER_NAME with the name of the cluster that you want to check.

    The command output is similar to the following example:

    Please check the logs at bmctl-workspace/demo-cluster/log/nodeagent_status-20250819-205707/nodeagent_status.log
    [2025-08-19 20:57:07+0000] Check Node Agent for cluster: demo-cluster
    [2025-08-19 20:57:09+0000] ----------------------------------------------------------
    [2025-08-19 20:57:09+0000] Verifying Node Agent status on all nodes...
    [2025-08-19 20:57:09+0000] Target Nodes Source: cluster YAML
    [2025-08-19 20:57:09+0000] --------------------- Total nodes: 4 ----------------------
    [2025-08-19 20:57:09+0000] node: control-0--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1175
    [2025-08-19 20:57:09+0000] node: control-1--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1174
    [2025-08-19 20:57:09+0000] node: control-2--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1176
    [2025-08-19 20:57:09+0000] node: worker-0--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1179
    [2025-08-19 20:57:09+0000] ----------------------------------------------------------
    [2025-08-19 20:57:09+0000] Verified Node Agent status on all nodes in cluster
    

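Verify Node Agent status from the cluster

With the --cluster flag combined with the --kubeconfig flag, you can check Node Agent status based on the cluster custom resource.

  • To check Node Agent status based on the cluster custom resource, use the following command: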

    ./bmctl nodeagent status \
        --cluster CLUSTER_NAME \
        --kubeconfig KUBECONFIG
    

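    Replace the following:

    • CLUSTER_NAME: the name of the cluster whose Node Agent status you want to check.

    • KUBECONFIG: the path to the kubeconfig file of the cluster whose Node Agent status you want to check.

    The command output is similar to the following example: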

    Please check the logs at bmctl-workspace/demo-cluster/log/nodeagent_status-20250819-205712/nodeagent_status.log
    [2025-08-19 20:57:12+0000] Check Node Agent for cluster: demo-cluster
    [2025-08-19 20:57:14+0000] ----------------------------------------------------------
    [2025-08-19 20:57:14+0000] Verifying Node Agent status on all nodes...
    [2025-08-19 20:57:14+0000] Target Nodes Source: cluster CR
    [2025-08-19 20:57:14+0000] --------------------- Total nodes: 3 ----------------------
    [2025-08-19 20:57:14+0000] node: control-0--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1180
    [2025-08-19 20:57:14+0000] node: control-1--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1179
    [2025-08-19 20:57:14+0000] node: control-2--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1180
    [2025-08-19 20:57:14+0000] ----------------------------------------------------------
    [2025-08-19 20:57:14+0000] Verified Node Agent status on all nodes in cluster
    

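Verify Node Agent status from nodes

With the --cluster flag combined with the --nodes flag, you can check Node Agent status for specific cluster nodes.

  • To check Node Agent status for specific nodes, use the following command: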

    ./bmctl nodeagent status \
        --cluster CLUSTER_NAME \
        --nodes NODE_IP_ADDRESS_LIST
    

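    Replace the following:

    • CLUSTER_NAME: the name of the cluster that contains the nodes you want to check.

    • NODE_IP_ADDRESS_LIST: a comma-separated list of the IP addresses of the nodes whose Node Agent status you want to check.

    The command output is similar to the following example: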

    Please check the logs at bmctl-workspace/demo-cluster/log/nodeagent_status-20250819-210050/nodeagent_status.log
    [2025-08-19 21:00:50+0000] Check Node Agent for cluster: demo-cluster
    [2025-08-19 21:00:53+0000] ----------------------------------------------------------
    [2025-08-19 21:00:53+0000] Verifying Node Agent status on all nodes...
    [2025-08-19 21:00:53+0000] Target Nodes Source: nodes flag
    [2025-08-19 21:00:53+0000] --------------------- Total nodes: 1 ----------------------
    [2025-08-19 21:00:53+0000] node: control-0--893f0567cb79efc-9b9ec55816170dcf.lab.anthos, version: 1.33.0-gke.799, OS: linux, uptime (seconds): 1399
    [2025-08-19 21:00:53+0000] ----------------------------------------------------------
    [2025-08-19 21:00:53+0000] Verified Node Agent status on all nodes in cluster
    

For the full list of options for the bmctl nodeagent status command, see nodeagent status in the bmctl command reference.
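
After an upgrade, the status output can confirm that every node reports the same Node Agent version. The following is a minimal sketch that assumes you saved the status output to a file and that each node line keeps the "node: NAME, version: VERSION, OS: ..." format shown in the samples. The function names are hypothetical.

```shell
#!/bin/sh
# node_versions FILE: prints one "NODE VERSION" pair per node line in FILE.
node_versions() {
    sed -n 's/.*node: \([^,]*\), version: \([^,]*\),.*/\1 \2/p' "$1"
}

# distinct_versions FILE: prints how many different versions the nodes report.
# A value greater than 1 means the nodes run mixed Node Agent versions.
distinct_versions() {
    node_versions "$1" | awk '{print $2}' | sort -u | wc -l | tr -d ' '
}
```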

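SSH user permissions

Non-root users can run the bmctl nodeagent commands. The user must have either full passwordless sudo privileges or an explicit passwordless sudo allowlist.

The explicit passwordless sudo allowlist for Node Agent contains the following permissions: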

# Permission to create the necessary folders and set permissions.
/bin/mkdir -p /etc/nodeagentd
/bin/chmod 0755 /etc/nodeagentd
/bin/mkdir -p /usr/local/bin
/bin/chmod 0755 /usr/local/bin
/bin/mkdir -p /etc/systemd/system
/bin/chmod 0755 /etc/systemd/system

# Permission to place the main application executable and link it.
/bin/rm -f /usr/local/bin/nodeagentd-*
/bin/touch /usr/local/bin/nodeagentd-*
/bin/cp -f /home/deployer/.deploy_tmp_*/* /usr/local/bin/nodeagentd-*
/bin/chmod 0755 /usr/local/bin/nodeagentd-*
/bin/rm -f /usr/local/bin/nodeagentd
/bin/ln -s /usr/local/bin/nodeagentd-* /usr/local/bin/nodeagentd

# Permission to place configuration files in /etc/nodeagentd and set permissions.
/bin/rm -f /etc/nodeagentd/*
/bin/touch /etc/nodeagentd/*
/bin/cp -f /home/deployer/.deploy_tmp_*/* /etc/nodeagentd/*
/bin/chmod 0600 /etc/nodeagentd/*
/bin/chmod 0644 /etc/nodeagentd/*

# Permission to place the systemd unit file.
/bin/rm -f /etc/systemd/system/nodeagentd.service
/bin/touch /etc/systemd/system/nodeagentd.service
/bin/cp -f /home/deployer/.deploy_tmp_*/* /etc/systemd/system/nodeagentd.service
/bin/chmod 0644 /etc/systemd/system/nodeagentd.service

# Permission to interact with systemd service.
/bin/systemctl daemon-reload
/bin/systemctl stop nodeagentd
/bin/systemctl start nodeagentd
/bin/systemctl enable --now nodeagentd

# Permission to remove the temporary files used for the deployment.
/bin/rm -f /home/deployer/.deploy_tmp_*/*
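
One way to apply the allowlist above is to turn it into sudoers rules. The following is a minimal sketch, not a vetted configuration: the function name sudoers_fragment is hypothetical, the allowlist is assumed to be saved one command per line in a file, and the deployer user name is inferred from the /home/deployer paths in the list. Validate any generated file with visudo -cf before you install it under /etc/sudoers.d.

```shell
#!/bin/sh
# sudoers_fragment USER FILE: prints one passwordless sudoers rule for USER per
# command in FILE. Blank lines and lines that start with "#" are skipped.
sudoers_fragment() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$2" |
    while IFS= read -r cmd; do
        printf '%s ALL=(root) NOPASSWD: %s\n' "$1" "$cmd"
    done
}

# Example (allowlist.txt and nodeagent-sudoers are hypothetical file names):
#   sudoers_fragment deployer allowlist.txt > nodeagent-sudoers
#   visudo -cf nodeagent-sudoers
```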

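SSH host key verification

Make sure that all nodes are added to the known_hosts file on the admin workstation. Otherwise, use the --enforce-host-key-verify=false flag to disable host key verification during deployment (nodeagent deploy) and credential rotation (nodeagent rotate-credentials).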
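
Before you run a deployment with host key verification on, you can list the nodes that have no known_hosts entry yet. The following is a minimal sketch that assumes OpenSSH's ssh-keygen is installed on the admin workstation. The function name hosts_missing_keys is hypothetical.

```shell
#!/bin/sh
# hosts_missing_keys KNOWN_HOSTS HOST...: prints each HOST that has no entry in
# the KNOWN_HOSTS file. `ssh-keygen -F` searches the file for a host name.
hosts_missing_keys() {
    kh_file="$1"
    shift
    for host in "$@"; do
        if ! ssh-keygen -F "$host" -f "$kh_file" >/dev/null 2>&1; then
            echo "$host"
        fi
    done
}
```

For example, hosts_missing_keys ~/.ssh/known_hosts 10.200.0.2 10.200.0.3 prints only the addresses that still need a host key. Add those keys through a channel you trust instead of turning off verification.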

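Customize the Node Agent port

Node Agent supports a custom port. Use the --port flag during deployment to specify the custom port. This propagates the setting to the Node Agent configuration on each node. The custom port must match the client configuration, as described in the following methods.

For existing clusters

To update an existing, running cluster, use the --port flag to specify the new custom port. This setting is propagated to the clients (controllers).

For new clusters

When you create a new cluster, add the following annotation to the cluster configuration to specify the custom port for Node Agent: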

kind: Cluster
metadata:
  annotations:
    baremetal.cluster.gke.io/node-agent-port: "10086"

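Performance

The deployment and enablement processes complete in less than one minute. Credential rotation takes about as long as a standard deployment, or less.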