Configuration updates for modernization

This document describes configuration updates you may need to make to your managed Cloud Service Mesh before modernizing your mesh to the TRAFFIC_DIRECTOR control plane from the ISTIOD control plane.

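The following is a list of configuration updates that might be necessary to prepare your cluster for modernization. See each section for update instructions:

  • Migrate from Istio secrets to multicluster_mode
  • Enable Workload Identity Federation for GKE
  • Enable managed container network interface (CNI)
  • Move away from unsupported binary usage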

For more information on the modernization workflow, see the Managed control plane modernization page.

Migrate from Istio secrets to multicluster_mode

Multi-cluster secrets are not supported when a cluster is using the TRAFFIC_DIRECTOR control plane. This section describes how to migrate from Istio multi-cluster secrets to multicluster_mode.

Istio secrets versus declarative API overview

Open source Istio multi-cluster endpoint discovery works by using istioctl or other tools to create a Kubernetes Secret in a cluster. This secret allows a cluster to load balance traffic to another cluster in the mesh. The ISTIOD control plane then reads this secret and begins routing traffic to that other cluster.

Cloud Service Mesh has a declarative API to control multi-cluster traffic instead of directly creating Istio secrets. This API treats Istio secrets as an implementation detail and is more reliable than creating Istio secrets manually. Future Cloud Service Mesh features will depend on the declarative API, and you won't be able to use those new features with Istio secrets directly. The declarative API is the only supported path forward.

If you are using Istio secrets, migrate to the declarative API as soon as possible. Note that the multicluster_mode setting configures each cluster to direct traffic to every other cluster in the mesh. Secrets allow a more flexible configuration, because you can choose for each cluster which other clusters in the mesh it directs traffic to. For a full list of the differences between the supported features of the declarative API and Istio secrets, see Supported features using Istio APIs.

Migrate from Istio secrets to declarative API

If you provisioned Cloud Service Mesh using automatic management with the fleet feature API, you don't need to follow these instructions. These steps only apply if you onboarded using asmcli --managed.

Note that this process changes the secrets that point to a cluster. During the process, the endpoints are removed and then re-added. Between the removal and the re-addition, traffic briefly reverts to routing locally instead of load balancing to other clusters. For more information, see the GitHub issue.

To move from using Istio secrets to the declarative API, follow these steps. Execute these steps at the same time or in close succession:

  1. Enable the declarative API for each cluster in the fleet where you want multi-cluster endpoint discovery by setting multicluster_mode=connected. Note that you need to explicitly set multicluster_mode=disconnected if you don't want the cluster to be discoverable.

    Use the following command to opt a cluster in to multi-cluster endpoint discovery:

     kubectl patch configmap/asm-options -n istio-system --type merge -p '{"data":{"multicluster_mode":"connected"}}'
    

    Use the following command to opt a cluster out of endpoint discovery:

     kubectl patch configmap/asm-options -n istio-system --type merge -p '{"data":{"multicluster_mode":"disconnected"}}'
    
  2. Delete old secrets.

    After you set multicluster_mode=connected on your clusters, each cluster has a new secret generated for every other cluster that also has multicluster_mode=connected set. Each secret is placed in the istio-system namespace and has the following format:

    istio-remote-secret-projects-PROJECT_NAME-locations-LOCATION-memberships-MEMBERSHIPS
    

    Each secret will also have the label istio.io/owned-by: mesh.googleapis.com applied.

    Once the new secrets are created, you can delete any secrets manually created with istioctl create-remote-secret:

    kubectl delete secret SECRET_NAME -n istio-system
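
The steps above can be sketched as a few shell helpers. This is only an illustration under assumptions: the function names are hypothetical, and it assumes each cluster is reachable through its own kubeconfig context. It applies the same patch shown in step 1 to several clusters in close succession, builds the expected name of a generated secret, and lists the secrets that carry the istio.io/owned-by label.

```shell
# Sketch only: these helper names are hypothetical and are not part of
# Cloud Service Mesh tooling.

# Build the name a generated secret is expected to have, following the
# format shown above.
generated_secret_name() {
  local project="$1" location="$2" membership="$3"
  printf 'istio-remote-secret-projects-%s-locations-%s-memberships-%s\n' \
    "$project" "$location" "$membership"
}

# Set multicluster_mode (connected or disconnected) on every kubeconfig
# context passed after the mode. Assumes one context per cluster.
set_multicluster_mode() {
  local mode="$1"; shift
  local ctx
  for ctx in "$@"; do
    kubectl --context "$ctx" patch configmap/asm-options -n istio-system \
      --type merge -p "{\"data\":{\"multicluster_mode\":\"${mode}\"}}"
  done
}

# List the secrets in one cluster that carry the label applied to
# generated secrets.
list_generated_secrets() {
  kubectl --context "$1" get secrets -n istio-system \
    -l istio.io/owned-by=mesh.googleapis.com -o name
}

# Usage (the context names are placeholders):
#   set_multicluster_mode connected ctx-cluster-a ctx-cluster-b
#   list_generated_secrets ctx-cluster-a
```

Confirming that the generated secrets exist before you delete the manually created ones keeps the window without cross-cluster endpoints as short as possible.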
    

Once migrated, check your request metrics to make sure they're routed as expected.

Enable Workload Identity Federation for GKE

Workload Identity Federation for GKE is the recommended secure way for Google Kubernetes Engine workloads to access Google Cloud services such as Compute Engine, BigQuery, and Machine Learning APIs. Because it uses IAM policies, Workload Identity Federation doesn't require manual configuration or less secure methods like service account key files. For more details on Workload Identity Federation, see How Workload Identity Federation for GKE works.

The following sections describe how to enable Workload Identity Federation.

Enable Workload Identity Federation on clusters

  1. Check whether Workload Identity Federation is enabled for your cluster. To do that, make sure that the GKE cluster has a workload identity pool configured, which is essential for IAM credential validation.

    Use the following command to check the workload identity pool set for a cluster:

    gcloud container clusters describe CLUSTER_NAME \
      --format="value(workloadIdentityConfig.workloadPool)"
    

    Replace CLUSTER_NAME with the name of your GKE cluster. If you haven't already specified a default zone or region for gcloud, you might also need to specify a --region or --zone flag when running this command.

  2. If the output is empty, follow the instructions in Update an existing cluster to enable workload identity on existing GKE clusters.
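
The check above can be wrapped in a small helper that interprets the output. This is a minimal sketch: the function names are hypothetical, and any extra arguments such as --region or --zone are passed through to gcloud unchanged.

```shell
# Sketch only: hypothetical helper names.

# Interpret the value returned by the describe command. An empty value
# means no workload identity pool is configured on the cluster.
workload_pool_status() {
  local pool="$1"
  if [ -n "$pool" ]; then
    echo "enabled: ${pool}"
  else
    echo "not enabled"
  fi
}

# Query one cluster and report its status. Arguments after the cluster
# name (for example --region or --zone) are passed to gcloud.
check_cluster_workload_identity() {
  local cluster="$1"; shift
  local pool
  pool="$(gcloud container clusters describe "$cluster" "$@" \
    --format="value(workloadIdentityConfig.workloadPool)")"
  workload_pool_status "$pool"
}

# Usage (CLUSTER_NAME and the region are placeholders):
#   check_cluster_workload_identity CLUSTER_NAME --region=us-central1
```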

Enable Workload Identity Federation on node pools

After Workload Identity Federation is enabled on a cluster, node pools must be configured to use the GKE metadata server.

  1. List all the node pools of a Standard cluster. Run the gcloud container node-pools list command:

    gcloud container node-pools list --cluster CLUSTER_NAME
    

    Replace CLUSTER_NAME with the name of your GKE cluster. If you haven't already specified a default zone or region for gcloud, you might also need to specify a --region or --zone flag when running this command.

  2. Verify that each node pool is using the GKE metadata server:

    gcloud container node-pools describe NODEPOOL_NAME \
        --cluster=CLUSTER_NAME \
        --format="value(config.workloadMetadataConfig.mode)"
    

    Replace the following:

    • NODEPOOL_NAME with the name of your node pool.
    • CLUSTER_NAME with the name of your GKE cluster.
  3. If the output doesn't contain GKE_METADATA, update the node pool using the Update an existing node pool guide.
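
For clusters with many node pools, the list and describe commands above can be combined in a loop. The following is a rough sketch with a hypothetical function name. It prints only the node pools whose mode doesn't contain GKE_METADATA.

```shell
# Sketch only: hypothetical helper name.

# Print the names of node pools that are not using the GKE metadata
# server. Arguments after the cluster name (for example --region or
# --zone) are passed to both gcloud commands.
node_pools_missing_gke_metadata() {
  local cluster="$1"; shift
  local pool mode
  for pool in $(gcloud container node-pools list --cluster "$cluster" "$@" \
      --format="value(name)"); do
    mode="$(gcloud container node-pools describe "$pool" \
      --cluster="$cluster" "$@" \
      --format="value(config.workloadMetadataConfig.mode)")"
    case "$mode" in
      *GKE_METADATA*) ;;      # already using the GKE metadata server
      *) echo "$pool" ;;      # empty or another mode: needs an update
    esac
  done
}

# Usage (CLUSTER_NAME and the region are placeholders):
#   node_pools_missing_gke_metadata CLUSTER_NAME --region=us-central1
```

An empty result means that every node pool in the cluster already reports GKE_METADATA.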

Enable managed container network interface (CNI)

This section guides you through enabling managed CNI for Cloud Service Mesh on Google Kubernetes Engine.

Managed CNI overview

Managed container network interface (CNI) is a Google-managed implementation of the Istio CNI. The CNI plugin streamlines pod networking by configuring iptables rules. This enables traffic redirection between applications and Envoy proxies, eliminating the need for privileged permissions for the init-container required to manage iptables.

The Istio CNI plugin replaces the istio-init container. The istio-init container was previously responsible for setting up the pod's network environment to enable traffic interception for the Istio sidecar. The CNI plugin performs the same network redirect function, but with the added benefit of reducing the need for elevated privileges, thereby enhancing security.

For enhanced security and reliability, and to simplify management and troubleshooting, managed CNI is required across all managed Cloud Service Mesh deployments.

Impact on init containers

Init containers are specialized containers that run before application containers for setup tasks. Setup tasks can include tasks such as downloading configuration files, communicating with external services, or performing pre-application initialization. Init containers that rely on network access might encounter issues when managed CNI is enabled in the cluster.

The pod setup process with managed CNI is as follows:

  1. The CNI plugin sets up pod network interfaces, assigns pod IPs, and redirects traffic to the Istio sidecar proxy, which hasn't started yet.
  2. All init containers execute and complete.
  3. The Istio sidecar proxy starts alongside the application containers.

Therefore, if an init container attempts to make outbound network connections or connect to services within the mesh, its network requests might be dropped or misrouted. This is because the Istio sidecar proxy, which manages network traffic for the pod, is not running when the requests are made. For more details, refer to the Istio CNI documentation.
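
To find workloads that might be affected, you can list the pods that run init containers other than the Istio-injected ones. The following is a sketch under assumptions: the function names are hypothetical, and the filter treats istio-init and istio-validation as the only Istio-injected init container names. It can't tell whether an init container actually uses the network, so treat the output as a list of candidates to review.

```shell
# Sketch only: hypothetical helper names.

# Print "namespace/pod: init1 init2 " for every pod, one pod per line.
list_pod_init_containers() {
  kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.metadata.namespace}{"/"}{.metadata.name}{": "}{range .spec.initContainers[*]}{.name}{" "}{end}{"\n"}{end}'
}

# Keep only pods that have at least one init container other than
# istio-init or istio-validation, and print those container names.
filter_custom_init_containers() {
  awk -F': ' '{
    n = split($2, names, " ")
    custom = ""
    for (i = 1; i <= n; i++)
      if (names[i] != "istio-init" && names[i] != "istio-validation")
        custom = custom " " names[i]
    if (custom != "") print $1 ":" custom
  }'
}

# Usage:
#   list_pod_init_containers | filter_custom_init_containers
```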

Enable managed CNI for your cluster

Follow the steps in this section to enable managed CNI on your cluster.

  1. Remove network dependencies from your init container. Consider the following alternatives:

    • Modify application logic or containers: You can modify your services to remove the dependency on init containers that require network requests or perform network operations within your application containers, after the sidecar proxy has started.
    • Use Kubernetes ConfigMaps or secrets: Store configuration data fetched by the network request in Kubernetes ConfigMaps or secrets and mount them into your application containers. For alternative solutions, refer to the Istio documentation.
  2. Enable managed CNI on your cluster:

    1. Make the following configuration changes:

      1. Run the following command to locate the controlPlaneRevision.

        kubectl get controlplanerevision -n istio-system
        
      2. In your ControlPlaneRevision (CPR) custom resource (CR), set the label mesh.cloud.google.com/managed-cni-enabled to true.

        kubectl label controlplanerevision CPR_NAME -n istio-system mesh.cloud.google.com/managed-cni-enabled=true --overwrite
        

        Replace CPR_NAME with the value under the NAME column from the output of the previous step.

      3. In the asm-options ConfigMap, set the ASM_OPTS value to CNI=on.

        kubectl patch configmap asm-options -n istio-system -p '{"data":{"ASM_OPTS":"CNI=on"}}'
        
      4. In your ControlPlaneRevision (CPR) custom resource (CR), set the label mesh.cloud.google.com/force-reprovision to true. This action triggers a control plane restart.

        kubectl label controlplanerevision CPR_NAME -n istio-system mesh.cloud.google.com/force-reprovision=true --overwrite
        
    2. Check the feature state. Retrieve the feature state using the following command:

      gcloud container fleet mesh describe --project FLEET_PROJECT_ID
      

      Replace FLEET_PROJECT_ID with the ID of your Fleet Host project. Generally, the FLEET_PROJECT_ID has the same name as the project.

      • Verify that the MANAGED_CNI_NOT_ENABLED condition is removed from servicemesh.conditions.
      • Note that it might take up to 20 minutes for the state to update. If the condition is still present, wait a few minutes and run the command again.
    3. Once the controlPlaneManagement.state is Active in the cluster's feature, restart the pods.
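
Because the feature state can take a while to update, the check in step 2 can be polled instead of re-run by hand. The following is a minimal sketch with hypothetical function names. It assumes that the condition name appears verbatim in the output of the describe command, and the default of 20 checks spaced 60 seconds apart is an arbitrary choice.

```shell
# Sketch only: hypothetical helper names.

# Succeed when the describe output read from stdin no longer contains
# the MANAGED_CNI_NOT_ENABLED condition.
managed_cni_condition_cleared() {
  ! grep -q 'MANAGED_CNI_NOT_ENABLED'
}

# Poll the fleet feature until the condition clears or the attempts run
# out. Arguments: fleet project ID, number of attempts, delay in seconds.
wait_for_managed_cni() {
  local project="$1" attempts="${2:-20}" delay="${3:-60}"
  local i=1 out
  while [ "$i" -le "$attempts" ]; do
    # Stop early if gcloud itself fails, so that an empty output is not
    # mistaken for a cleared condition.
    out="$(gcloud container fleet mesh describe --project "$project")" || return 2
    if printf '%s\n' "$out" | managed_cni_condition_cleared; then
      echo "MANAGED_CNI_NOT_ENABLED is no longer reported"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "condition still present after ${attempts} checks" >&2
  return 1
}

# Usage (FLEET_PROJECT_ID is a placeholder):
#   wait_for_managed_cni FLEET_PROJECT_ID
```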

Move away from unsupported binary usage

The TRAFFIC_DIRECTOR control plane supports only the distroless image type. This section suggests ways to make your deployment compatible with this image type.

Distroless Envoy proxy sidecar images

Cloud Service Mesh uses one of two types of Envoy proxy sidecar image, depending on your control plane configuration: an Ubuntu-based image that contains various binaries, and a distroless image. Distroless base images are minimal container images that prioritize security and resource optimization by including only essential components, which reduces the attack surface and helps prevent vulnerabilities. For more information, refer to the documentation on Distroless proxy image.

Modernization and binary compatibility

As part of modernization, Cloud Service Mesh is transitioning to a distroless sidecar image. This image has a minimal set of dependencies and is stripped of all non-essential executables, libraries, and debugging tools. It is therefore not possible to run a shell, curl, ping, or other debug utilities inside the container, for example through kubectl exec.

For a cluster to be eligible for modernization, the current usage should be compatible with the distroless sidecar image.

Make clusters compatible with distroless images

  • Remove references to any unsupported binaries (like bash or curl) from your configuration, particularly from Readiness, Startup, and Liveness probes, and from Lifecycle PostStart and PreStop hooks within the istio-proxy, istio-init, or istio-validation containers.
  • Consider alternatives like holdApplicationUntilProxyStarts for certain use cases.
  • For debugging, you can use ephemeral containers to attach to a running workload Pod. You can then inspect it and run custom commands. For an example, see Collecting Cloud Service Mesh logs.
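
As a starting point for the first item above, you can search exported manifests for binaries that the distroless image doesn't contain. The following is a rough sketch: the function name is hypothetical, and the list of binary names is only illustrative and should be extended for your environment. It matches text anywhere in the file, so review each hit to confirm that it belongs to a probe or lifecycle hook of a sidecar container.

```shell
# Sketch only: hypothetical helper name and illustrative binary list.

# Print file name, line number, and content of every line in the given
# manifest files that mentions one of the listed binaries as a whole
# word. grep exits with status 1 when nothing matches.
find_unsupported_binaries() {
  grep -nHEw 'bash|sh|curl|ping' "$@"
}

# Usage (the file name is a placeholder):
#   kubectl get deployments --all-namespaces -o yaml > deployments.yaml
#   find_unsupported_binaries deployments.yaml
```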

If you can't find a solution for your specific use case, contact Google Cloud Support as described in Getting support.