Pod startup time
This section explains common Cloud Service Mesh problems with pod startup time and how to resolve them. If you need additional assistance, see Getting support.
Pod startup and Envoy configuration synchronization
A common issue observed during pod startup in some Cloud Service Mesh and Istio environments involves the synchronization between application readiness and Envoy proxy configuration. The problem arises from the concurrent startup of the application container and the Envoy sidecar. The application may signal readiness before the Envoy proxy has completed its initialization and received its configuration from the control plane. This creates a race condition where incoming requests are directed to an unconfigured Envoy proxy that is not ready to receive any traffic. This can lead to requests being dropped, misrouted, or handled incorrectly during the initial phase of application startup, since there is no sidecar to forward any traffic.
Mitigation strategies
The following sections describe methods that can mitigate this issue.
Global mesh configuration: holdApplicationUntilProxyStarts
The first option is to set holdApplicationUntilProxyStarts: true
in the Istio
mesh configuration. Note that it is off by default. The flag adds hooks to delay
application startup until the pod's proxy is ready to accept traffic.
Adding the configuration eliminates this race but may introduce a delay to application readiness times for new pods if it was not previously enabled.
Readiness probes
Another solution is to implement readiness probes that incorporate both application and Envoy health checks. Readiness probes inform Kubernetes when a pod is ready to accept traffic. Critically, the readiness probe logic should verify not only the application's readiness but also the Envoy proxy's status. This can be achieved by querying Envoy's administrator port for its health status. By combining both checks, Kubernetes prevents traffic from being directed to the pod until both the application and Envoy are fully initialized and configured.
This approach is more flexible and allows for more complicated startup and readiness logic, but it comes at the cost of more complexity.
Create the file healthcheck.sh
from the following code:
#!/bin/sh
APP_HEALTH=$(curl -s -o /dev/null -w "%{http_code}" \
http://localhost:8080/health)
ENVOY_HEALTH=$(curl -s -o /dev/null -w "%{http_code}" \
http://localhost:9901/ready)
if [[ "$APP_HEALTH" -eq 200 && "$ENVOY_HEALTH" -eq 200 ]]; then
exit 0
else
exit 1
fi
Replace the IP/ports with your application container and Envoy's.
The following YAML file defines a readiness probe that uses the script you previously create:
apiVersion: v1
kind: Pod
metadata:
name: my-app-with-envoy
spec:
containers:
- name: application
<…>
readinessProbe:
initialDelaySeconds: 15
periodSeconds: 10
failureThreshold: 3
exec:
command:
- /healthcheck.sh # using the script
- name: envoy
<…>