Resolving sidecar proxy/webhook issues in Cloud Service Mesh

This section explains common Cloud Service Mesh problems and how to resolve them. If you need additional assistance, see Getting support.

Cloud Service Mesh contains two webhooks:

  • The validating webhook ensures applied Istio configuration is valid.
  • The mutating webhook sets automatic sidecar injection on new pods.

A configuration issue in one of these webhooks might cause new pods to fail to start, or cause kubectl apply to return error messages.

Sidecar injection problems

If you have provisioned managed Cloud Service Mesh, then contact support.

Sidecar injection is not working correctly if you see any of the following:

  • Pods are scheduled without sidecars.
  • Pods that should have sidecars injected never appear in the output of kubectl get pods, even though the corresponding replica set is listed by kubectl get replicaset.

Use the following steps to troubleshoot sidecar injection problems.

  1. Verify that your namespace or pod has the correct injection label.

    If you are running single-revision Istio (the default), verify that your namespace or pod spec has the istio-injection=enabled label.

    If you are running multiple-revision Istio (for example, for zero-downtime migrations or multiple control planes), verify that your namespace or pod spec has the appropriate istio.io/rev=REVISION label, where REVISION is the Cloud Service Mesh revision label on istiod that corresponds to your selected Cloud Service Mesh version. For more information about revision labels, see Injecting sidecar proxies.

  2. Verify that your Istio sidecar injector webhook is present and has a CA bundle.

    The sidecar injector webhook (which is used for automatic sidecar injection) requires a CA bundle to establish secure connections with the API server and istiod. This CA bundle is patched into the configuration by istiod, but can sometimes be overwritten (for example, if you reapply the webhook configuration).

    You can verify the presence of the CA bundle using the following command. The command includes istio-sidecar-injector-asm-1214-5, which is specific to this version of Cloud Service Mesh. Ensure you use your actual revision if it differs.

    kubectl get mutatingwebhookconfiguration istio-sidecar-injector-asm-1214-5 -o=jsonpath='{.webhooks[0].clientConfig.caBundle}'

    If the output is not empty, the CA bundle is configured. If the CA bundle is missing, restart istiod to cause it to rescan the webhook and reinstall the CA bundle.

  3. Check for sidecar injection failures.

    If you have injection enabled, but are not seeing pods scheduling, check the status of the next higher level of abstraction. For example, if you are running a deployment but no pods are scheduling, check the status of the corresponding replica sets using the following command:

    kubectl -n my-namespace describe replicaset your-deployment-name

    If the replica set is present, check the events log at the bottom of the description for errors. If the error relates to sidecar injection, check the istiod logs for an indication of what is causing the error.

  4. If the problem persists, the issue might be any of the following:

    • Bad configuration passed to the injector
    • Firewall configuration problems
    • A problem in the Istio code itself

    See Troubleshooting Istio for additional diagnostic steps.
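The first steps above can be wrapped in small shell helpers. The following is a minimal sketch, not an official tool: the function names and their arguments are hypothetical, and it assumes kubectl is already configured for the affected cluster and that istio.io/rev is the revision label key in use.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for steps 1 to 3; the function names and arguments
# are illustrative and not part of Cloud Service Mesh.

# Step 1: show the injection labels on a namespace as extra columns.
# An empty column means the label is not set on that namespace.
show_injection_labels() {
  local namespace="$1"
  kubectl get namespace "$namespace" -L istio-injection -L istio.io/rev
}

# Step 2: report whether the injector webhook carries a CA bundle.
# Returns non-zero when the bundle is empty.
check_ca_bundle() {
  local webhook="$1" bundle
  bundle="$(kubectl get mutatingwebhookconfiguration "$webhook" \
    -o jsonpath='{.webhooks[0].clientConfig.caBundle}')"
  if [ -n "$bundle" ]; then
    echo "CA bundle present on $webhook"
  else
    echo "CA bundle missing on $webhook"
    return 1
  fi
}

# Step 3: describe the replica sets in a namespace whose names start
# with the given prefix (for example, the deployment name).
describe_replicasets() {
  local namespace="$1" prefix="$2"
  kubectl -n "$namespace" describe replicaset "$prefix"
}

# Example usage (placeholders, adjust to your cluster):
#   show_injection_labels my-namespace
#   check_ca_bundle istio-sidecar-injector-asm-1214-5
#   describe_replicasets my-namespace your-deployment-name
```

If check_ca_bundle reports a missing bundle, restarting istiod as described in step 2 might look like kubectl -n istio-system rollout restart deployment ISTIOD_DEPLOYMENT, where ISTIOD_DEPLOYMENT is a placeholder for the istiod deployment name in your installation.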

Envoy proxies don't receive configuration from istiod

There are several issues that can prevent proxies from receiving configuration from istiod.

  1. istiod has a problem of its own, such as an RBAC issue that prevents it from reading its configuration resource. In that case, istiod does not push configuration to the Envoy proxies.

  2. The discovery address provided to the sidecar injector is incorrect ('no healthy upstream' errors). If you see logs that mention gRPC config stream closed, no healthy upstream, check that the discovery address in the mesh ProxyConfig is correct and points to your istiod service.

  3. Invalid configuration is pushed to the proxy. In this case, configuration is successfully pushed to the proxy, but the configuration is invalid. You will see repeating messages similar to the following:

    Envoy proxy is NOT ready: config not received from Pilot (is Pilot running?): cds updates: 1 successful, 0 rejected; lds updates: 0 successful, 1 rejected

    In this example, cds is the Cluster Discovery Service (which reports 1 update pushed from istiod), and lds is the Listener Discovery Service (which reports 1 update rejected from istiod). Often you will see an earlier error message that explains the reason for the rejection, which usually starts with a warning about envoy configuration or similar.

    To fix the issue, investigate the cause of the rejected configuration. One common cause is bad EnvoyFilter resources. If no reason is obvious, submit a bug report with a configuration dump of the proxy.
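To find which discovery services report rejections, the readiness message shown above can be filtered from the sidecar logs. This is a minimal sketch: the function name rejected_xds_types is hypothetical, and it assumes the log lines follow the "<type> updates: N successful, M rejected" shape of the example message and that the sidecar container is named istio-proxy.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read proxy log lines on stdin and print each xDS
# type (cds, lds, ...) that reports at least one rejected update.
rejected_xds_types() {
  # Extract every "<type> updates: N successful, M rejected" fragment,
  # then keep the type name when the rejected count (field 5) is above zero.
  grep -oE '[a-z]+ updates: [0-9]+ successful, [0-9]+ rejected' |
    awk '$5 + 0 > 0 { print $1 }' |
    sort -u
}

# Example usage (POD_NAME and NAMESPACE are placeholders):
#   kubectl logs POD_NAME -n NAMESPACE -c istio-proxy | rejected_xds_types
```

For the configuration dump mentioned above, one option is kubectl exec POD_NAME -n NAMESPACE -c istio-proxy -- pilot-agent request GET config_dump, with the same placeholder names. Review the dump before you attach it to a bug report, because it can contain sensitive data.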

Pod creation fails

If you observe that pods are not being created successfully, look for error messages that might give clues to the root problem, using the following command:

kubectl describe replicaset YOUR_REPLICA_SET
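When a namespace contains many replica sets, it can be quicker to list only the warning events attached to them. The following sketch assumes the failures surface as Warning events on ReplicaSet objects; the function name replicaset_warnings is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list Warning events recorded against replica sets
# in one namespace, oldest first.
replicaset_warnings() {
  local namespace="$1"
  kubectl get events -n "$namespace" \
    --field-selector involvedObject.kind=ReplicaSet,type=Warning \
    --sort-by=.lastTimestamp
}

# Example usage (my-namespace is a placeholder):
#   replicaset_warnings my-namespace
```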

Common webhook error messages

Error messages output by the kubectl apply command can provide a hint about their root cause. See the following table for common error messages, their causes and potential resolutions.

| Error message | Cause | Resolution |
| --- | --- | --- |
| net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers) | This might be a network connectivity issue. | Ensure that your firewall rules provide connectivity to `istiod` on port 15017. |
| no endpoints available for service 'istiod' | This can occur if the `istiod` pod is not available or not ready. | Check the `istiod` pods to ensure they are running and ready. |
| Service "istiod" not found | This can occur if the `istiod` service does not exist. | Verify that your Istio installation was successful and correct. |
| x509: certificate signed by unknown authority | This might be a webhook certificate issue. | Check that caBundle is correctly set on the webhook. |
| Failed to update validatingwebhookconfiguration istio-validator-asm-[version-n]-istio-system (failurePolicy=Fail, resourceVersion=[version]): Operation cannot be fulfilled on "istio-validator-asm-[version-n]-istio-system": the object has been modified; please apply your changes to the latest version and try again. | A validating webhook from an old version of Istio or Cloud Service Mesh that has been uninstalled might be interfering with an upgrade or install. | Check all webhooks still in the cluster and remove any that reference versions which are no longer installed. |
| Error from server (InternalError): Internal error occurred: failed calling webhook "": Post "https://istiod-asm-1122-0.istio-system.svc:443/inject?timeout=10s": context deadline exceeded | For private clusters, port 15017 must be open. This error message indicates that port 15017 might not be open. | Ensure that your firewall rules provide connectivity to `istiod` on port 15017. For more information, see Opening a port on a private cluster. |
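Several of the resolutions above start with the same two checks: which Istio webhooks exist in the cluster, and whether istiod is present and ready. The helpers below are a minimal sketch of those checks. The function names are hypothetical, and the sketch assumes the control plane runs in the istio-system namespace and that the relevant resource names contain "istio" or "istiod".

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the function names are illustrative.

# List the webhook configurations whose names mention istio, to spot
# entries left over from versions that are no longer installed.
list_istio_webhooks() {
  kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations -o name |
    grep -i istio
}

# Show istiod pods, services, and endpoints in one namespace
# (istio-system by default), keeping the header rows for context.
istiod_status() {
  local namespace="${1:-istio-system}"
  kubectl get pods,services,endpoints -n "$namespace" |
    grep -E '^NAME|istiod'
}

# Example usage:
#   list_istio_webhooks
#   istiod_status istio-system
```

These helpers only read cluster state. Removing a stale webhook or changing firewall rules remains a manual decision based on what they print.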