Cloud Service Mesh security best practices
This document describes best practices to establish and govern a secure Cloud Service Mesh configuration running on Google Kubernetes Engine (GKE). The guidance in the document goes beyond the settings used to configure and install Cloud Service Mesh and describes how you can use Cloud Service Mesh with other Google Cloud products and features to protect against the security threats that applications in a mesh may face.
The intended audience for this document includes administrators who manage policies in a Cloud Service Mesh and users who run services in a Cloud Service Mesh. The security measures described here are also useful for organizations that need to enhance the security of their service meshes to meet compliance requirements.
The document is organized as follows:
- Introduction
- Attack vectors and security risks
- Measures to protect a service mesh
  - Security architecture
  - Cluster security
  - Mesh edge security
  - Security for mesh administration and automation
  - Workload security
  - Security for sensitive user data and credentials
Introduction
Cloud Service Mesh provides features and tools that help you observe, manage, and secure services in a unified way. It takes an application-centric approach and uses trusted application identities rather than a network IP-focused approach. You can deploy a service mesh transparently without the need to modify existing application code. Cloud Service Mesh provides declarative control over network behavior, which helps to decouple the work of teams that are responsible for delivering and releasing application features from the responsibilities of administrators responsible for security and networking.
Cloud Service Mesh is based on the open source Istio service mesh, which enables sophisticated configurations and topologies. Depending on the structure of your organization, one or more teams or roles may be responsible for installing and configuring a mesh. The default Cloud Service Mesh settings are chosen to protect applications, but in some cases, you may need custom configurations or to grant exceptions by excluding certain apps, ports, or IP addresses from participating in a mesh. Having controls in place to govern mesh configurations and security exceptions is important.
Attack vectors and security risks
Attack vectors
Cloud Service Mesh security follows the zero-trust security model, which assumes that security threats originate from both inside and outside of an organization's security perimeter. Examples of attack types that may threaten applications in a service mesh include:
- Data exfiltration attacks. For example, attacks that eavesdrop on sensitive data or credentials from service-to-service traffic.
- Man-in-the-middle attacks. For example, a malicious service that masquerades as a legitimate service to obtain or modify the communication between services.
- Privilege escalation attacks. For example, attacks that use illicit access to elevated privileges to conduct operations in a network.
- Denial of service (DoS) attacks.
- Botnet attacks that try to compromise and manipulate services to launch attacks on other services.
The attacks can also be categorized based on the attack targets:
- Mesh internal network attacks. Attacks aimed at tampering, eavesdropping, or spoofing the mesh internal service-to-service or service-to-control-plane communication.
- Control plane attacks. Attacks aimed at causing the control plane to malfunction (such as a DoS attack), or exfiltrating sensitive data from the control plane.
- Mesh edge attacks. Attacks aimed at tampering, eavesdropping, or spoofing the communication at the mesh ingress or egress.
- Mesh operation attacks. Attacks aimed at the mesh operations. Attackers may try to obtain elevated privileges to conduct malicious operations in a mesh, such as modifying its security policies and workload images.
Security risks
Besides security attacks, a mesh also faces other security risks. The following list describes a few possible security risks:
- Incomplete security protection. A service mesh has not been configured with authentication and authorization policies to protect its security. For example, no authentication or authorization policies are defined for services in a mesh.
- Security policy exceptions. To accommodate their specific use cases, users may create security policy exceptions so that certain traffic (internal or external) is excluded from Cloud Service Mesh security policies. To handle such cases securely, see the section Securely handle Cloud Service Mesh policy exceptions.
- Neglect of image upgrades. Vulnerabilities may be discovered for the images used in a mesh. You need to keep the mesh component and workload images up-to-date with the latest vulnerability fixes.
- Lack of maintenance (no expertise or resources). The mesh software and policy configurations need regular maintenance to take advantage of the latest security protection mechanisms.
- Lack of visibility. Misconfiguration or insecure configurations of mesh policies and abnormal mesh traffic/operations are not brought to the attention of mesh administrators.
- Configuration drift. The configuration of policies in a mesh deviates from the source of truth.
Measures to protect a service mesh
This section presents an operating manual to secure service meshes.
Security architecture
The security of a service mesh depends on the security of the components at different layers of the mesh system and its applications. The high-level intention of the proposed Cloud Service Mesh security posture is to secure a service mesh through integrating multiple security mechanisms at different layers, which jointly achieve the overall system security under the zero-trust security model. The following diagram shows the proposed Cloud Service Mesh security posture.
Cloud Service Mesh provides security at multiple layers, including:
- Mesh edge security
  - Cloud Service Mesh ingress security provides access control for external traffic and secures external access to the APIs exposed by the services in the mesh.
  - Cloud Service Mesh egress security regulates the outbound traffic from internal workloads.
  - Cloud Service Mesh User Auth integrates with Google infrastructure to authenticate external calls from web browsers to the services that run web applications.
  - Cloud Service Mesh gateway certificate management protects and rotates the private keys and X.509 certificates used by Cloud Service Mesh ingress and egress gateways using Certificate Authority Service.
  - Cloud Armor can defend against external distributed denial of service (DDoS) and Layer 7 attacks. It serves as a Web Application Firewall (WAF) to protect the mesh from network attacks such as injection and remote code execution attacks.
  - VPC and VPC Service Controls protect the mesh edge through private network access controls.
- Cluster security
  - Cloud Service Mesh mutual TLS (mTLS) enforces workload-to-workload traffic encryption and authentication.
  - Managed CA, such as Cloud Service Mesh certificate authority and Certificate Authority Service, securely provisions and manages certificates used by the workloads.
  - Cloud Service Mesh authorization enforces access control for mesh services based on their identities and other attributes.
  - GKE Enterprise security dashboard provides monitoring of the configurations of security policies and Kubernetes Network Policies for the workloads.
  - Kubernetes Network Policy enforces Pod access control based on IP addresses, Pod labels, namespaces, and more.
  - Control plane security defends against attacks on the control plane. This protection prevents attackers from modifying, exploiting, or leaking service and mesh configuration data.
- Workload security
  - Stay up-to-date with Cloud Service Mesh security releases to ensure the Cloud Service Mesh binaries running in your mesh are free of publicly known vulnerabilities.
  - Workload Identity Federation for GKE enables workloads to obtain credentials to securely call Google services.
  - Cloud Key Management Service (Cloud KMS) secures sensitive data or credentials through Hardware Security Modules (HSM). For example, workloads can use Cloud KMS to store credentials or other sensitive data. CA Service, which is used to issue certificates to mesh workloads, supports per-customer and HSM-backed signing keys managed by Cloud KMS.
  - Kubernetes CNI (Container Network Interface) prevents privilege escalation attacks by eliminating the need for a privileged Cloud Service Mesh init container.
- Operator security
  - Kubernetes role-based access control (RBAC) restricts access to Kubernetes resources and confines operator permissions to mitigate attacks originating from malicious operators or operator impersonation.
  - GKE Enterprise Policy Controller validates and audits policy configurations in the mesh to prevent misconfigurations.
  - Google Cloud Binary Authorization ensures that the workload images in the mesh are the ones authorized by the administrators.
  - Google Cloud Audit Logging audits mesh operations.
The diagram below shows the communication and configuration flows with the integrated security solutions in Cloud Service Mesh.
Cluster security
Enable strict mutual TLS
A man-in-the-middle (MitM) attack tries to insert a malicious entity between two communicating parties in order to eavesdrop on or manipulate the communication. Cloud Service Mesh defends against MitM and data exfiltration attacks by enforcing mTLS authentication and encryption for all communicating parties. Permissive mode uses mTLS when both sides support it, but allows connections without mTLS. By contrast, strict mTLS requires that traffic be encrypted and authenticated with mTLS and does not allow plain text traffic.
For more information, see Cloud Service Mesh by example: mTLS | Enforcing mesh-wide mTLS.
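As a minimal sketch of mesh-wide strict mTLS, the following PeerAuthentication resource sets the mode to STRICT. It assumes that the mesh root namespace is istio-system and that your Cloud Service Mesh version serves the security.istio.io/v1beta1 API; verify both against your installation before applying it.

```yaml
# Mesh-wide strict mTLS: a PeerAuthentication named "default" in the
# root namespace (assumed here to be istio-system) applies to all workloads.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: default
  namespace: istio-system
spec:
  mtls:
    mode: STRICT   # reject plain text traffic between workloads
```

Moving from permissive to strict mode can break clients that do not yet have sidecars, so it is usually worth confirming in telemetry that all traffic already uses mTLS first.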
Enable access controls
Cloud Service Mesh security policies (such as authentication and authorization policies) should be enforced on all traffic in and out of the mesh unless there are strong justifications to exclude a service or Pod from Cloud Service Mesh security policies. In some cases, users may have legitimate reasons to bypass Cloud Service Mesh security policies for some ports and IP ranges, for example, to establish native connections with services not managed by Cloud Service Mesh. To secure Cloud Service Mesh in such use cases, see Securely handle Cloud Service Mesh policy exceptions.
Service access control is critical in preventing unauthorized access to services. mTLS enforcement encrypts and authenticates a request, but a mesh still needs Cloud Service Mesh authorization policies to enforce access control on services. For example, an authorization policy can reject an unauthorized request that comes from an authenticated client.
Cloud Service Mesh authorization policies provide a flexible way to configure access controls to defend your services against unauthorized access. Enforce authorization policies based on the authenticated identities derived from the authentication results: mTLS authentication and JSON Web Token (JWT) authentication should be used together with Cloud Service Mesh authorization policies.
Enforce Cloud Service Mesh authentication policies
JSON Web Token (JWT)
In addition to mTLS authentication, mesh administrators can require a service to authenticate and authorize requests based on JWT. Cloud Service Mesh does not act as a JWT provider but authenticates JWTs based on the configured JSON web key set (JWKS) endpoints. JWT authentication can be applied to ingress gateways for external traffic or to internal services for in-mesh traffic. JWT authentication can be combined with mTLS authentication when a JWT is used as a credential to represent the end caller and the requested service requires proof that it is being called on behalf of the end caller. Enforcing JWT authentication defends against attacks that access a service without valid credentials and on behalf of a real end user.
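The following sketch shows one way to require a valid JWT at an ingress gateway. The issuer, the JWKS URI, the gateway label istio: ingressgateway, and the istio-system namespace are placeholders chosen for illustration; substitute the values used by your identity provider and gateway deployment.

```yaml
# Validate JWTs presented to the ingress gateway (issuer and JWKS URI are hypothetical).
apiVersion: security.istio.io/v1beta1
kind: RequestAuthentication
metadata:
  name: ingress-jwt
  namespace: istio-system          # assumed gateway namespace
spec:
  selector:
    matchLabels:
      istio: ingressgateway        # assumed gateway Pod label
  jwtRules:
  - issuer: "https://issuer.example.com"
    jwksUri: "https://issuer.example.com/.well-known/jwks.json"
---
# RequestAuthentication alone only rejects invalid tokens; requests with no
# token still pass. This policy denies requests that carry no valid JWT identity.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: require-jwt
  namespace: istio-system
spec:
  selector:
    matchLabels:
      istio: ingressgateway
  action: DENY
  rules:
  - from:
    - source:
        notRequestPrincipals: ["*"]
```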
Cloud Service Mesh user authentication
Cloud Service Mesh user authentication is an integrated solution for browser-based end-user authentication and access control to your workloads. It integrates a service mesh with existing Identity Providers (IdP) to implement a standard web-based OpenID Connect (OIDC) login and consent flow and uses Cloud Service Mesh authorization policies for access control.
Cloud Service Mesh authorization policies control:
- Who or what is allowed to access a service.
- Which resources can be accessed.
- Which operations can be conducted on the allowed resources.
Authorization policies are a versatile way to configure access control based on the actual identities that services run as, application layer (Layer 7) properties of traffic (for example request headers), and network layer (Layer 3 and Layer 4) properties like IP ranges and ports.
Cloud Service Mesh authorization policies should be enforced based on authenticated identities derived from the authentication results to defend against unauthorized access to services or data.
By default, access to a service should be denied unless an authorization policy is explicitly defined to allow access to the service. See Authorization Policy Best Practices for examples of authorization policies that deny access requests.
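A default-deny posture can be sketched with an AuthorizationPolicy that has an empty spec. Placing it in the mesh root namespace (assumed here to be istio-system) applies it to every workload in the mesh; the policy name is illustrative.

```yaml
# An ALLOW policy with no rules matches nothing, so all requests are denied
# unless another ALLOW policy explicitly permits them.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-nothing
  namespace: istio-system
spec: {}
```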
Authorization policies should restrict trust as much as possible. For example, access to a service can be defined based on the individual URL paths that the service exposes, such that only service A can access the path /admin of service B.
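The path-based restriction described above might look like the following sketch. The namespaces, labels, and service account are hypothetical, and the trust domain portion of the principal (shown as cluster.local) depends on your mesh and certificate authority configuration. The sketch assumes a default-deny posture, so that any request not matched by an ALLOW rule is rejected.

```yaml
# Allow only service A's identity to call /admin on service B.
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: allow-a-to-b-admin
  namespace: ns-b                  # hypothetical namespace of service B
spec:
  selector:
    matchLabels:
      app: service-b               # hypothetical workload label
  action: ALLOW
  rules:
  - from:
    - source:
        # mTLS identity of service A; replace the trust domain with your own.
        principals: ["cluster.local/ns/ns-a/sa/service-a"]
    to:
    - operation:
        paths: ["/admin"]
```

Because the source is matched on an mTLS principal, this policy only works as intended when strict mTLS is enforced for service B.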
Authorization policies can be used together with Kubernetes Network Policies, which only operate at the network layer (Layer 3 and Layer 4) and control the network access for IP addresses and ports on Kubernetes Pods and Kubernetes namespaces.
Enforce token exchange for accessing mesh services
To defend against token replay attacks which steal tokens and re-use the stolen tokens to access mesh services, a token in a request from outside the mesh should be exchanged for a short-lived mesh-internal token at the mesh edge.
A request from outside the mesh to access a mesh service needs to include a token, such as JWT or cookie, in order to be authenticated and authorized by the mesh service. A token from outside the mesh may be long-lived. To defend against token replay attacks, a token from outside the mesh should be exchanged for a short-lived mesh-internal token with a limited scope at the ingress of the mesh. The mesh service authenticates a mesh-internal token and authorizes the access request based on the mesh-internal token.
Cloud Service Mesh supports integration with Identity-Aware Proxy (IAP), which generates a RequestContextToken (a short-lived mesh-internal token exchanged from an external token) used in Cloud Service Mesh for authorization. With token exchange, attackers cannot use a token stolen in the mesh to access services. The limited scope and lifetime of the exchanged token greatly reduce the chance of a token replay attack.
Securely handle Cloud Service Mesh policy exceptions
You may have special use cases for your service mesh. For example, you may need to expose a certain network port to plain text traffic. To accommodate specific usage scenarios, you may sometimes need to create exceptions to allow certain internal or external traffic to be excluded from Cloud Service Mesh security policies, which creates security concerns.
You may have legitimate reasons to bypass Cloud Service Mesh security policies for some ports and IP ranges. You can add annotations (such as excludeInboundPorts, excludeOutboundPorts, and excludeOutboundIPRanges) to Pods to exclude traffic from being handled by the Envoy sidecar. Besides annotations to exclude traffic, you may bypass the mesh altogether by deploying an application with sidecar injection disabled, for example, by adding the label sidecar.istio.io/inject="false" to the application Pod.
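To help reviewers recognize such exceptions in manifests, the following sketch shows a Deployment whose Pod template excludes one inbound port and one outbound IP range from the sidecar. The workload name, image, port, and CIDR range are hypothetical; the annotation keys are the Istio traffic.sidecar.istio.io annotations.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: legacy-app                 # hypothetical workload
  namespace: team-a
spec:
  selector:
    matchLabels:
      app: legacy-app
  template:
    metadata:
      labels:
        app: legacy-app
      annotations:
        # Traffic to port 9000 bypasses the sidecar: no mTLS, no authorization policy.
        traffic.sidecar.istio.io/excludeInboundPorts: "9000"
        # Outbound traffic to this range also bypasses the sidecar.
        traffic.sidecar.istio.io/excludeOutboundIPRanges: "10.0.0.0/24"
    spec:
      containers:
      - name: app
        image: registry.example.com/legacy-app:1.0
        ports:
        - containerPort: 9000
```

Any manifest that carries these annotations, or the sidecar.istio.io/inject: "false" label, is a candidate for the compensating controls described in this section.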
Bypassing Cloud Service Mesh security policies has a negative impact on overall system security. For example, if Cloud Service Mesh mTLS and authorization policies are bypassed for a network port by means of annotations, there will be no access control for the traffic on the port and eavesdropping or traffic modification may be possible. Furthermore, bypassing Cloud Service Mesh policies also affects non-security policies, such as network policies.
When Cloud Service Mesh security policy is bypassed for a port or IP (either intentionally or unintentionally), there should be other security measures in place to secure the mesh and monitor security exceptions, potential security loopholes, and overall security enforcement status. To secure your mesh in such scenarios you can:
- Make sure traffic bypassing the sidecars is natively encrypted and authenticated to prevent MitM attacks.
- Enforce Kubernetes Network Policies to limit the connectivity of ports with policy exceptions (for example, limit a port with policy exceptions to only allow traffic from another service in the same namespace) or to only allow traffic to go through the ports with Cloud Service Mesh security policy enforced.
- Enforce GKE Enterprise Policy Controller to automatically validate Cloud Service Mesh policies. For example, enforce that the Cloud Service Mesh sidecars are always injected to workloads.
Enforce Kubernetes Network Policies
Cloud Service Mesh builds upon the underlying platform (for example, Kubernetes). Thus, Cloud Service Mesh security is dependent upon the security of the underlying platform. For example, without control over who can update Kubernetes resources, a user may change the Kubernetes deployment of a service to bypass the sidecar of the service.
To form a strong security posture for a service mesh, the security mechanisms of the underlying platform should be enforced to work jointly with the Cloud Service Mesh security policies.
Kubernetes Network Policies operate at the network layer (Layer 3 and Layer 4) for IP addresses and ports on Kubernetes Pods and namespaces. Kubernetes Network Policies can be enforced in conjunction with Cloud Service Mesh policies to enhance the security of the mesh.
For example, the mesh administrator can configure Kubernetes Network Policies to only allow traffic to use ports with Cloud Service Mesh security policy enforced. If all traffic must be enforced with Cloud Service Mesh mTLS, the administrator may configure a Kubernetes network policy to only allow traffic on ports that are configured with Cloud Service Mesh mTLS policy. The mesh administrator may also configure Kubernetes Network Policies to limit connectivity of ports with policy exceptions. For example, limit the connectivity of such ports to be within a namespace.
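As an illustration of limiting a port that has a policy exception, the following NetworkPolicy sketch allows port 9000 only from Pods in the same namespace while leaving the mesh-protected port 8080 reachable from any source. The names, labels, and port numbers are hypothetical, and the policy has an effect only when network policy enforcement is enabled on the cluster.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: limit-excepted-port
  namespace: team-a                # hypothetical namespace
spec:
  podSelector:
    matchLabels:
      app: legacy-app              # hypothetical workload with a policy exception
  policyTypes:
  - Ingress
  ingress:
  # Port 9000 bypasses the sidecar, so restrict it to Pods in this namespace.
  - from:
    - podSelector: {}
    ports:
    - protocol: TCP
      port: 9000
  # Port 8080 is covered by mesh mTLS and authorization policies; a rule
  # without "from" allows it from all sources.
  - ports:
    - protocol: TCP
      port: 8080
```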
Secure control plane access
The Cloud Service Mesh control plane authenticates any clients that connect. Thus, only callers with valid credentials (Kubernetes JWT or X.509 certificates issued by allowed CAs) can access the Cloud Service Mesh control plane. TLS encrypts the connections between workloads and the Cloud Service Mesh control plane.
Besides the authentication mechanism, for in-cluster Cloud Service Mesh, Kubernetes network policies can be deployed to isolate the Cloud Service Mesh system namespace (by default istio-system) from unmanaged namespaces and clients outside of the mesh while allowing data planes to access the control plane. VPC firewall rules can prevent traffic outside of a cluster from reaching Istiod. With such network isolation measures, an attacker from outside of the mesh will be unable to access the control plane, even if the attacker has a valid credential. For managed control planes, Google handles the security for the control planes and such network isolation policies for control planes are not needed.
Enforce namespace boundaries
To prevent a user of one namespace from accessing/updating resources in an unauthorized namespace:
- Enforce access controls.
- Enforce Kubernetes Network Policies. If services in a namespace do not have traffic outside of the namespace, the mesh administrator should deploy a Kubernetes network policy that only allows traffic inside the namespace: no ingress or egress from the namespace.
- Enforce Kubernetes RBAC policies.
  - The roles of application administrators should be bound to a namespace.
  - Only allow mesh administrators to have ClusterRole.
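The namespace-only traffic rule in the list above can be sketched as the following NetworkPolicy. The namespace name is hypothetical. As written, the policy also blocks DNS lookups and, for in-cluster control planes, sidecar connections to the control plane, so a real deployment would likely need additional egress rules for those destinations.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: same-namespace-only
  namespace: team-a                # hypothetical namespace
spec:
  podSelector: {}                  # applies to every Pod in the namespace
  policyTypes:
  - Ingress
  - Egress
  ingress:
  - from:
    - podSelector: {}              # only Pods in this same namespace
  egress:
  - to:
    - podSelector: {}              # only Pods in this same namespace
  # Assumption: add further egress rules for DNS and the mesh control plane as needed.
```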
Enforce Kubernetes RBAC policies
The mesh administrators should enforce Kubernetes RBAC policies to control who is allowed to access and update Kubernetes resources. Kubernetes access control can mitigate security risks in the mesh. For example, unauthorized users should not be allowed to change Kubernetes deployments and bypass the Cloud Service Mesh policy enforcements. The roles of a user should be bound to a namespace so the user is not allowed to access any more namespaces than they need access to. For detailed guides and examples of configuring RBAC, refer to Configure role-based access control. After enabling Workload Identity Federation for GKE, you can also allow a Kubernetes service account to act as an IAM service account.
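A namespace-scoped role of the kind described above could be sketched as follows. The user, namespace, and role names are hypothetical. The Role grants permissions on Deployments only, so the bound user cannot modify Istio security resources such as AuthorizationPolicy or PeerAuthentication.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: app-deployer
  namespace: team-a                # permissions apply in this namespace only
rules:
- apiGroups: ["apps"]
  resources: ["deployments"]
  verbs: ["get", "list", "watch", "create", "update", "patch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: app-deployer-binding
  namespace: team-a
subjects:
- kind: User
  name: app-admin@example.com      # hypothetical application administrator
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: app-deployer
  apiGroup: rbac.authorization.k8s.io
```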
Mesh edge security
Because many attacks originate from outside of a cluster, ensuring security at the edge of the mesh is critical.
Cluster ingress access control
Cloud Service Mesh receives incoming external traffic through the ingress gateway. Services exposed by the ingress gateway potentially face attacks from external sources. Security administrators should always ensure that the services exposed to external traffic via ingress gateways are sufficiently secure to defend against attacks.
Ingress should enforce authentication and authorization for services exposed to external callers.
- Enforce cluster ingress security policies. When the cluster needs to receive external traffic, the mesh administrator should enforce ingress security policies, including Cloud Service Mesh gateway TLS, authentication, and authorization policies, to authenticate external requests and verify that external requests are authorized to access services exposed by the ingress gateway. Enforcing ingress security policies defends against attacks from outside of the mesh that try to access a service without valid credentials or permissions.
- Use Cloud Armor as a Web Application Firewall (WAF) to defend against web-based attacks (for example, injection attacks and remote code execution attacks). For more information, see From edge to mesh: Exposing service mesh applications through GKE Ingress.
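Gateway TLS, the first of the ingress security policies listed above, can be sketched with an Istio Gateway resource that serves HTTPS only. The host name, gateway label, namespace, and secret name are placeholders; the referenced secret is assumed to hold the server certificate and key and to live in the same namespace as the gateway Pods.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: Gateway
metadata:
  name: public-gateway
  namespace: istio-system          # assumed gateway namespace
spec:
  selector:
    istio: ingressgateway          # assumed gateway Pod label
  servers:
  - port:
      number: 443
      name: https
      protocol: HTTPS
    tls:
      mode: SIMPLE                 # terminate TLS at the gateway
      credentialName: app-example-com-cert   # hypothetical TLS secret
    hosts:
    - "app.example.com"
```

Authentication and authorization policies that select the same gateway Pods then decide which of the decrypted requests are forwarded into the mesh.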
Regulate cluster egress traffic
Cluster egress security is critical for mesh security because egress security policies can defend against data exfiltration attacks, enforce filtering of egress traffic, and enforce TLS origination for egress traffic. Security administrators should regulate and audit cluster egress traffic.
Besides using VPC firewall rules to restrict egress traffic, the mesh administrators should also enforce egress security policies for the cluster and configure its outbound traffic to go through egress gateways.
Egress policies can mitigate the following attacks:
- Data exfiltration attacks.
- Botnet attacks. Attackers can exploit service Pods whose CVEs are not patched. Compromised Pods can become part of a botnet that attackers control to send spam or launch DoS attacks.
Authorization policies applied to egress gateways can ensure that only authorized services are allowed to send traffic to particular hosts outside of the mesh. Meanwhile, for traffic leaving the mesh, instead of handling TLS origination at individual sidecars, TLS can be originated at egress gateways. This provides a uniform and more secure way to originate TLS traffic because the client certificates for mTLS can be isolated from the namespaces where applications run.
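As a rough sketch of such an egress authorization policy, the following resource allows only workloads from one namespace to reach one external host through the egress gateway. The namespace, gateway label, and host are hypothetical. The sketch assumes that sidecars reach the gateway over mTLS (so the source namespace is known) and that the gateway sees the HTTP host because it originates TLS itself; TLS passthrough traffic would need to be matched differently, for example on SNI. The Gateway, VirtualService, and DestinationRule resources that route traffic through the egress gateway are not shown.

```yaml
apiVersion: security.istio.io/v1beta1
kind: AuthorizationPolicy
metadata:
  name: egress-allow-team-a
  namespace: istio-system          # assumed egress gateway namespace
spec:
  selector:
    matchLabels:
      istio: egressgateway         # assumed egress gateway Pod label
  action: ALLOW
  rules:
  - from:
    - source:
        namespaces: ["team-a"]     # hypothetical namespace allowed to call out
    to:
    - operation:
        hosts: ["api.example.com"] # hypothetical external host
```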
Use a private cluster or VPC Service Controls to lock down external access
In addition to enforcing ingress and egress security policies, lock down external access by using a private cluster or VPC Service Controls wherever possible. While security policies are controlled by the mesh security administrators, the private cluster configuration or VPC Service Controls can be enforced by organization security administrators.
VPC Service Controls can be enforced to define a security perimeter for the services in order to:
- Restrict services from accessing outside resources.
- Restrict outsiders from accessing the services in a security perimeter.
VPC Service Controls help defend against data exfiltration attacks and prevent external attackers from accessing services inside a mesh.
Defend against external DDoS attacks
External DDoS attacks may overload ingress gateways and backend services, preventing legitimate requests from being handled. Cloud Armor can be used to defend against DDoS attacks. Cloud Armor defends against not only network layer (L3 and L4) DDoS attacks but also application layer (L7) DDoS attacks.
Security for mesh administration and automation
It is important to consider security for administrative operations and any automation that you build around your mesh, for example CI/CD. The following practices aim to ensure that the mesh can be safely operated without the risk of exposing services to additional attacks.
Segment the roles used for mesh operations
Following the same principle as role-based access control, users of a mesh should be classified according to their roles. Each role should only be granted the minimum set of privileges needed by the role.
For example, the set of users that make service deployments should not have privileges for updating authentication and authorization policies.
There are different categories of operators, for example, cluster operators and namespace operators. It is important to prevent privilege escalation by an operator, which may result in illicit access to unauthorized resources.
Kubernetes RBAC policies allow mesh administrators to limit resource access to authorized users only.
Automatically validate policy configurations
Operators may accidentally misconfigure Cloud Service Mesh policies, which can result in serious security incidents. To prevent misconfiguration and automatically validate Cloud Service Mesh policies, mesh administrators can use Policy Controller to enforce constraints on policy configurations.
To avoid placing too much trust in individuals with permissions to update Cloud Service Mesh security policies and to automate the validation of Cloud Service Mesh policies, the mesh administrators should implement constraints on Cloud Service Mesh policies using Policy Controller.
Policy Controller is based on the open source Gatekeeper project and can either be run as a Kubernetes admission controller to deny invalid resources from being applied or in audit mode so that administrators can be alerted to violations. Policy Controller can automatically validate the deployment of resources in the mesh, such as validating that the annotations on a deployment do not bypass Cloud Service Mesh policies, validating that Cloud Service Mesh policies are as expected, and validating that a deployment does not include root capabilities (such as NET_ADMIN and NET_RAW).
Policy Controller can also audit existing Cloud Service Mesh resources against constraints to detect policy misconfigurations.
The following are a few examples of GKE Enterprise Policy Controller enforcing security policies:
- Prevent Pods from running privileged containers.
- Only allow using images from specific repositories to prevent running unauthorized container images.
- Prohibit disabling TLS for all hosts and host subsets in Istio DestinationRules.
- Prohibit principals and namespaces in Istio AuthorizationPolicy rules from having a prefix from a specified list.
- Prohibit the creation of known resources that expose workloads to external IPs.
- Require Ingress resources to be HTTPS only.
- Require a read-only root file system on the container.
The constraint template library provided with Policy Controller contains a set of constraint templates that can be used with the out-of-the-box Cloud Service Mesh security constraints bundle to enforce specific Cloud Service Mesh security best practices, for example, authentication, authorization, and traffic policies. The following are a few example constraints included in the bundle:
- Enforce a mesh-level strict mTLS PeerAuthentication.
- Enforce that no PeerAuthentication can override strict mTLS.
- Enforce a mesh-level default-deny AuthorizationPolicy.
- Enforce the AuthorizationPolicy safe patterns.
- Enforce that Cloud Service Mesh sidecars are always injected into workloads.
To handle exceptions and break-glass situations, the mesh administrator may:
- Exclude a namespace from Policy Controller's admission webhook enforcement, but any violations are still reported in audit.
- Set the Constraint spec.enforcementAction to dryrun. The admission webhook will not prevent changes, but any violations are still reported in audit.
- Add exemption logic into the Constraint Template (example).
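The dryrun option in the list above can be sketched with a constraint based on the K8sPSPPrivilegedContainer template from the open source Gatekeeper library. The constraint name and the excluded namespace are illustrative, and the sketch assumes that the template is installed in the cluster.

```yaml
apiVersion: constraints.gatekeeper.sh/v1beta1
kind: K8sPSPPrivilegedContainer
metadata:
  name: no-privileged-containers   # illustrative name
spec:
  # dryrun: the admission webhook does not block changes, but violations
  # are still reported in audit results.
  enforcementAction: dryrun
  match:
    kinds:
    - apiGroups: [""]
      kinds: ["Pod"]
    excludedNamespaces: ["kube-system"]   # illustrative exemption
```

Removing the enforcementAction field, or setting it to deny, returns the constraint to blocking mode once the break-glass situation is over.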
Use a GitOps approach with Config Sync to prevent configuration drift
Configuration drift occurs when the configuration of policies in a mesh deviates from their source of truth. Config Sync can be used to prevent configuration drift.
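As a sketch of the GitOps approach, the following Config Sync RootSync object points a cluster at a Git repository that acts as the source of truth for mesh policies. The repository URL, branch, and directory are hypothetical, and auth: none assumes a repository that is readable without credentials; a private repository would need one of the other authentication types.

```yaml
apiVersion: configsync.gke.io/v1beta1
kind: RootSync
metadata:
  name: root-sync
  namespace: config-management-system
spec:
  sourceFormat: unstructured
  git:
    repo: https://github.com/example-org/mesh-policies   # hypothetical repository
    branch: main
    dir: policies                  # hypothetical directory holding mesh policy manifests
    auth: none
```

With this in place, manual changes to synced resources in the cluster are reverted to what the repository declares, which is how drift is contained.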
Enforce Audit Logging and monitoring
The mesh administrators should monitor the following:
- Cloud Audit Logging
- Cloud Service Mesh Audit Logging
- Policy constraint Audit Logging
- Anthos Config Sync
- Access logs
- Service-level metrics
- Accessing traces
These observability resources can be used to verify that the security configuration is working as expected and to monitor for any exceptions to security policy enforcement. Examples of such exceptions include access that did not go through sidecars and access that reached a service without valid credentials.
While open source observability software (for example, Prometheus) can be used with Cloud Service Mesh, we highly recommend using Google Cloud's operations suite (formerly Stackdriver). The built-in observability solution for Google Cloud provides logging, metric collection, monitoring, and alerting, which is fully managed and easy to use.
Protect the certificate authority for in-cluster certificates
By default Cloud Service Mesh uses a Google managed certificate authority (CA) called Cloud Service Mesh certificate authority.
If you are using the unmanaged Istio certificate authority (CA), which is hosted as part of Istiod, the CA signing key is stored in a Kubernetes secret and is accessible to operators who have access to the secret resource in the istio-system namespace. This is a risk, since an operator may be able to use the CA key independently of Istiod's CA and potentially sign workload certificates independently. There is also a risk that a self-managed CA signing key may be accidentally leaked due to an operational error.
To protect the CA signing key, the mesh administrator can upgrade the mesh to use Mesh CA or Certificate Authority Service (CA Service), which are secured and managed by Google (for example, Google handles CA key rotation). Compared with Mesh CA, CA Service additionally supports per-customer, HSM-backed signing keys through Cloud KMS backed by Cloud HSM.
Workload security
Workload security protects against attacks that compromise workload Pods and then use the compromised Pods to launch attacks against the cluster (for example, botnet attacks).
Restrict Pod privileges
A Kubernetes Pod may have privileges that impact other Pods on the node or the cluster. It is important to enforce security restrictions on workload Pods to prevent a compromised Pod from launching attacks against the cluster.
To enforce the least privilege principle for the workloads on a Pod:
- The services deployed in a mesh should run with as few privileges as possible.
- Kubernetes Pods running in privileged mode can manipulate network stacks and other kernel capabilities on the host. GKE Enterprise Policy Controller can be used to prevent Pods from running privileged containers.
- Cloud Service Mesh can be configured to use an init container to configure iptables traffic redirection to the sidecar. This requires the user making workload deployments to have privileges for deploying containers with NET_ADMIN and NET_RAW capabilities. To avoid the risk of running containers with elevated privileges, mesh administrators can instead enable the Istio CNI plugin for configuring traffic redirection to sidecars.
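The least-privilege guidance in the list above can be sketched with a Pod whose application container drops all capabilities and disallows privilege escalation. The Pod name, namespace, and image are hypothetical, and runAsNonRoot assumes that the image is built to run as a non-root user.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restricted-app             # hypothetical workload
  namespace: team-a
spec:
  securityContext:
    runAsNonRoot: true             # the Pod fails to start if the image runs as root
    seccompProfile:
      type: RuntimeDefault
  containers:
  - name: app
    image: registry.example.com/app:1.0
    securityContext:
      privileged: false
      allowPrivilegeEscalation: false
      readOnlyRootFilesystem: true
      capabilities:
        drop: ["ALL"]              # includes NET_ADMIN and NET_RAW
```

These settings apply to the application container only. Whether the injected mesh containers need NET_ADMIN and NET_RAW depends on whether the Istio CNI plugin is enabled.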
Secure container images
Attackers may launch attacks by exploiting vulnerable container images. Administrators should enforce Binary Authorization to verify the integrity of container images and ensure only trusted container images are deployed in the mesh.
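A Binary Authorization policy that blocks images lacking an attestation might be sketched as follows. PROJECT_ID and the attestor name are placeholders, and the sketch assumes that an attestor already exists and that your build pipeline creates attestations for trusted images.

```yaml
# Policy file sketch, in the format used for importing a Binary Authorization policy.
globalPolicyEvaluationMode: ENABLE           # keep Google-maintained system images allowed
defaultAdmissionRule:
  evaluationMode: REQUIRE_ATTESTATION
  enforcementMode: ENFORCED_BLOCK_AND_AUDIT_LOG
  requireAttestationsBy:
  - projects/PROJECT_ID/attestors/trusted-build-attestor   # hypothetical attestor
```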
Mitigate against mesh vulnerabilities
- Container Analysis. Container Analysis can scan and surface vulnerabilities on GKE workloads.
- CVE (Common Vulnerabilities and Exposures) handling. After a vulnerability is discovered in a container image, the mesh administrators should fix the vulnerability as soon as possible. For managed Cloud Service Mesh with managed data plane, Google automatically handles patching CVEs that impact the mesh images.
Use Workload Identity Federation for GKE to securely access Google services
Workload Identity Federation for GKE is the recommended way for mesh workloads to securely access Google services. The alternative of storing a service account key in a Kubernetes secret and using the service account key to access Google services is not as secure due to the risks of credential leakage, privilege escalation, information disclosure, and loss of non-repudiation.
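One common way to link a Kubernetes service account to an IAM service account is the iam.gke.io/gcp-service-account annotation, sketched below. The account names, namespace, and PROJECT_ID are placeholders. The sketch assumes that Workload Identity Federation for GKE is enabled on the cluster and that the IAM service account grants the Workload Identity User role (roles/iam.workloadIdentityUser) to this Kubernetes service account.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: orders-ksa                 # hypothetical Kubernetes service account
  namespace: team-a
  annotations:
    # IAM service account that Pods using this Kubernetes service account act as.
    iam.gke.io/gcp-service-account: orders-gsa@PROJECT_ID.iam.gserviceaccount.com
```

Pods that run as this service account obtain short-lived Google credentials at runtime, so no service account key needs to be stored in a Kubernetes secret.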
Monitor security status through security dashboard and telemetry
A service mesh may have security exceptions and potential loopholes. It is critical to surface and monitor the security status of a mesh, which includes the security policies enforced, security exceptions, and potential security loopholes in the mesh. GKE Enterprise security dashboard and telemetry can be used to surface and monitor the mesh security status.
Telemetry monitors the health and performance of services in a mesh, which enables mesh administrators to observe the behaviors of services (such as SLOs, abnormal traffic, service outage, topology).
GKE Enterprise security dashboard analyzes and visualizes the security policies applied to a workload in a service mesh, including access control policies (Kubernetes Network Policies, Binary Authorization policies, and service access control policies), and authentication policies (mTLS).
Security for sensitive user data and credentials
Sensitive user data or credentials can be vulnerable to attacks originating from Pods or malicious operations if they are stored in the cluster's persistent storage, such as in Kubernetes secrets or directly in Pods. They are also vulnerable to network attacks if they are transferred over the network for authentication to services.
- If possible, store sensitive user data and credentials in protected storage, such as Secret Manager and Cloud KMS.
- Designate separate namespaces for Kubernetes Pods that access sensitive data and define Kubernetes policies to make them inaccessible from other namespaces. Segment the roles used for operations and enforce namespace boundaries.
- Enforce token exchange to prevent the exfiltration of long-lived, highly-privileged tokens.
What's next
- Review best practices for using Cloud Service Mesh egress gateways on GKE clusters
- Configure transport security
- Update your authorization policies