Google Kubernetes Engine (GKE) provides many ways to help secure your workloads. Protecting workloads in GKE involves many layers of the stack, including the contents of your container image, the container runtime, the cluster network, and access to the cluster API server.
It's best to take a layered approach to protecting your clusters and workloads. You can apply the principle of least privilege to the level of access provided to your users and your applications. In each layer, your organization might need to make different tradeoffs between flexibility and security so that you can securely deploy and maintain your workloads. For example, some security settings might be too constraining for certain types of applications or use cases to function without significant refactoring.
This document provides an overview of each layer of your infrastructure, and shows how you can configure its security features to best suit your needs.
This document is for security specialists who define, govern, and implement policies and procedures to protect an organization's data from unauthorized access. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE Enterprise user roles and tasks.
Authentication and authorization
Kubernetes supports two types of accounts for authentication:
- User accounts are accounts that are known to Kubernetes, but are not managed by Kubernetes. For example, you can't create or delete them using `kubectl`.
- Service accounts are accounts that are created and managed by Kubernetes, but can only be used by Kubernetes-created entities, such as Pods.
In a GKE cluster, Kubernetes user accounts are managed by Google Cloud, and may be one of the following two types:
- Google Account
- Google Cloud service account
Once authenticated, you need to authorize these identities to create, read, update, or delete Kubernetes resources.
Despite the similar names, Kubernetes service accounts and Google Cloud service accounts are different entities. Kubernetes service accounts are part of the cluster in which they are defined and are typically used within that cluster. By contrast, Google Cloud service accounts are part of a Google Cloud project and can easily be granted permissions both within clusters and to the Google Cloud project itself, as well as to any Google Cloud resource using Identity and Access Management (IAM). This makes Google Cloud service accounts more powerful than Kubernetes service accounts; to follow the security principle of least privilege, you should use Google Cloud service accounts only when their capabilities are required.
To configure more granular access to Kubernetes resources at the cluster level or within Kubernetes namespaces, you use Role-Based Access Control (RBAC). RBAC allows you to create detailed policies that define which operations and resources users and service accounts can access. With RBAC, you can control access for Google Accounts, Google Cloud service accounts, and Kubernetes service accounts. To further simplify and streamline your authentication and authorization strategy for GKE, ensure that legacy Attribute-Based Access Control (ABAC) is disabled so that Kubernetes RBAC and IAM are the sources of truth.
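As an illustration, the following manifest sketches a minimal RBAC policy; the namespace, resource names, and user email are hypothetical placeholders:

```yaml
# Role granting read-only access to Pods in the "dev" namespace
# (namespace and names are hypothetical).
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: dev
  name: pod-reader
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list", "watch"]
---
# RoleBinding that grants the Role to a Google Account.
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  namespace: dev
  name: read-pods
subjects:
- kind: User
  name: user@example.com   # Google Account email (illustrative)
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: Role
  name: pod-reader
  apiGroup: rbac.authorization.k8s.io
```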
For more information:
- Read the GKE RBAC documentation.
- Learn about supported authentication methods when connecting to the Kubernetes API server in Authenticating to the Kubernetes API server.
Control plane security
In GKE, the Kubernetes control plane components are managed and maintained by Google. The control plane components host the software that runs the Kubernetes control plane, including the API server, scheduler, controller manager, and the etcd database where your Kubernetes configuration is persisted.
You can access the control plane using a DNS-based endpoint (recommended), IP-based endpoints, or both. If you use IP-based endpoints, you can protect the Kubernetes API server by using authorized networks and not enabling the external endpoint of the control plane. This lets you assign an internal IP address to the control plane and disable access on the external IP address. If you use a DNS-based endpoint, you can use IAM and VPC Service Controls to secure your control plane access with both identity and network-aware policies.
You can handle cluster authentication in Google Kubernetes Engine by using IAM as the identity provider. For information on authentication, see Authenticating to the Kubernetes API server.
Another way to help secure your control plane is to rotate your credentials on a regular basis. When credential rotation is initiated, the SSL certificates and cluster certificate authority are rotated. GKE automates this process, which also rotates your control plane's IP address.
For more information:
- Read more about control plane security.
- Read the Role-Based Access Control documentation.
- Follow the Credential Rotation guide.
Node security
GKE deploys your workloads on Compute Engine instances running in your Google Cloud project. These instances are attached to your GKE cluster as nodes. The following sections show you how to leverage the node-level security features available to you in Google Cloud.
Container-Optimized OS
By default, GKE nodes use Google's Container-Optimized OS as the operating system on which to run Kubernetes and its components. Container-Optimized OS implements several advanced features for enhancing the security of GKE clusters, including:
- Locked-down firewall
- Read-only filesystem where possible
- Limited user accounts and disabled root login
GKE Autopilot nodes always use Container-Optimized OS as the operating system.
Node upgrades
A best practice is to patch your OS on a regular basis. From time to time, security issues in the container runtime, Kubernetes itself, or the node operating system might require you to upgrade your nodes more urgently. When you upgrade a node, its software is upgraded to the latest versions.
GKE clusters support automatic upgrades. In Autopilot clusters, automatic upgrades are always enabled. You can also manually upgrade the nodes in a Standard cluster.
Protecting nodes from untrusted workloads
For clusters that run unknown or untrusted workloads, a good practice is to protect the operating system on the node from the untrusted workload running in a Pod.
For example, multi-tenant clusters such as software-as-a-service (SaaS) providers often execute unknown code submitted by their users. Security research is another application where workloads may need stronger isolation than nodes provide by default.
You can enable GKE Sandbox on your cluster to isolate untrusted workloads in sandboxes on the node. GKE Sandbox is built using gVisor, an open source project.
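After GKE Sandbox is enabled on a node pool, you opt a workload into the sandbox through its runtime class. A minimal sketch, assuming a Standard cluster with GKE Sandbox enabled (the Pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: untrusted-workload   # hypothetical name
spec:
  runtimeClassName: gvisor   # run this Pod in a GKE Sandbox (gVisor)
  containers:
  - name: app
    image: us-docker.pkg.dev/my-project/my-repo/app:latest  # illustrative image
```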
Securing instance metadata
GKE uses instance metadata from the underlying Compute Engine instances to provide nodes with credentials and configurations that are used to bootstrap nodes and to connect to the control plane. This metadata contains sensitive information that Pods on the node don't need access to, such as the node's service account key.
You can lock down sensitive instance metadata paths by using Workload Identity Federation for GKE. Workload Identity Federation for GKE enables the GKE metadata server in your cluster, which filters requests to sensitive fields such as `kube-env`.
Workload Identity Federation for GKE is always enabled in Autopilot clusters. In Standard clusters, Pods have access to instance metadata unless you manually enable Workload Identity Federation for GKE.
Network security
Most workloads running in GKE need to communicate with other services that could be running either inside or outside of the cluster. You can use several different methods to control what traffic is allowed to flow through your clusters and their Pods.
Limiting Pod-to-Pod communication
By default, all Pods in a cluster can be reached over the network via their Pod IP address. Similarly, by default, egress traffic allows outbound connections to any address accessible in the VPC into which the cluster was deployed.
Cluster administrators and users can lock down the ingress and egress connections created to and from the Pods in a namespace by using network policies. By default, when there are no network policies defined, all ingress and egress traffic is allowed to flow into and out of all Pods. Network policies allow you to use labels to define the traffic that is allowed to flow to and from your Pods.
Once a network policy is applied in a namespace, all traffic is dropped to and from Pods that don't match the configured labels. As part of creating clusters or namespaces, you can apply a default-deny policy to both ingress and egress of every Pod to ensure that all new workloads added to the cluster must explicitly authorize the traffic they require.
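For example, the following default-deny policy (the namespace is a hypothetical placeholder) drops all ingress and egress traffic for every Pod in the namespace until other policies explicitly allow it:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: dev              # hypothetical namespace
spec:
  podSelector: {}             # selects every Pod in the namespace
  policyTypes:
  - Ingress
  - Egress
  # No ingress or egress rules are listed, so all traffic is denied.
```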
For more information:
- Read more about network policies
- Follow the network policy tutorial
- Read more about default policies
Filtering load balanced traffic
To load balance your Kubernetes Pods with a network load balancer, you need to create a Service of type `LoadBalancer` that matches your Pods' labels. With the Service created, you have an external-facing IP address that maps to ports on your Kubernetes Pods. Filtering authorized traffic is achieved at the node level by kube-proxy, which filters based on IP address.
To configure this filtering, you can use the `loadBalancerSourceRanges` configuration of the Service object. With this configuration parameter, you can provide a list of CIDR ranges that you want to allow to access the Service. If you don't configure `loadBalancerSourceRanges`, all addresses are allowed to access the Service through its external IP address.
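A minimal sketch of such a Service; the name, labels, ports, and CIDR range are illustrative:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: web-frontend          # hypothetical name
spec:
  type: LoadBalancer
  selector:
    app: web                  # must match your Pods' labels
  ports:
  - port: 80
    targetPort: 8080
  # Only clients in these CIDR ranges can reach the Service's external IP.
  loadBalancerSourceRanges:
  - "203.0.113.0/24"          # example range; replace with your own
```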
For cases in which external access to the Service is not required, consider using an internal load balancer. The internal load balancer also respects `loadBalancerSourceRanges` when it is necessary to filter out traffic from inside the VPC.
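On GKE, you request an internal load balancer with an annotation on the Service; a sketch assuming the current `networking.gke.io` annotation (name and selector are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: internal-frontend     # hypothetical name
  annotations:
    # Provision an internal load balancer instead of an external one.
    networking.gke.io/load-balancer-type: "Internal"
spec:
  type: LoadBalancer
  selector:
    app: web
  ports:
  - port: 80
    targetPort: 8080
```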
For more information, follow the internal load balancing tutorial.
Securing your workloads
Kubernetes allows users to quickly provision, scale, and update container-based workloads. This section describes tactics that administrators and users can employ to limit the effect a running container can have on other containers in the same cluster, the nodes where containers can run, and the Google Cloud services enabled in users' projects.
Limiting Pod container process privileges
Limiting the privileges of containerized processes is important for the overall security of your cluster. GKE Autopilot clusters always restrict specific privileges, as described in Autopilot security capabilities.
GKE also allows you to set security-related options via the Security Context on both Pods and containers. These settings let you change security properties of your processes, such as the following (see the sketch after this list):
- User and group to run as
- Available Linux capabilities
- Ability to escalate privileges
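A minimal sketch of a Pod that drops privileges; the names, image, and numeric IDs are hypothetical:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: restricted-app        # hypothetical name
spec:
  securityContext:
    runAsUser: 1000           # run as a non-root user
    runAsGroup: 3000
    runAsNonRoot: true
  containers:
  - name: app
    image: us-docker.pkg.dev/my-project/my-repo/app:latest  # illustrative image
    securityContext:
      allowPrivilegeEscalation: false   # block privilege escalation
      capabilities:
        drop: ["ALL"]                   # drop all Linux capabilities
```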
To enforce these restrictions at the cluster level rather than at the Pod or container level, use the PodSecurity admission controller. Cluster administrators can use PodSecurity to ensure that all Pods in a cluster or namespace adhere to a predefined policy in the Pod Security Standards. You can also set custom Pod security policies at the cluster level by using Gatekeeper.
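PodSecurity enforcement is configured with labels on the namespace; a sketch with a hypothetical namespace name:

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: dev                   # hypothetical namespace
  labels:
    # Enforce the "restricted" Pod Security Standard for all Pods here.
    pod-security.kubernetes.io/enforce: restricted
```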
The GKE node operating systems, both Container-Optimized OS and Ubuntu, apply the default Docker AppArmor security policies to all containers started by Kubernetes. You can view the profile's template on GitHub. Among other things, the profile denies the following abilities to containers:
- Write files directly in `/proc/`
- Write to files that are not in a process ID directory (`/proc/<number>`)
- Write to files in `/proc/sys` other than `/proc/sys/kernel/shm*`
- Mount filesystems
For more information:
- Read the Pod Security Context documentation.
- Learn more about existing protections in the Container-Optimized OS AppArmor documentation.
Giving Pods access to Google Cloud resources
Your containers and Pods might need access to other resources in Google Cloud. There are three ways to do this.
Workload Identity Federation for GKE (recommended)
The most secure way to authorize Pods to access Google Cloud resources is with Workload Identity Federation for GKE. Workload Identity Federation for GKE allows a Kubernetes service account to run as an IAM service account. Pods that run as the Kubernetes service account have the permissions of the IAM service account.
Workload Identity Federation for GKE can be used with GKE Sandbox.
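The link between the two identities is an annotation on the Kubernetes service account; a sketch, where the account names and project are hypothetical:

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: app-ksa               # hypothetical Kubernetes service account
  namespace: dev
  annotations:
    # Pods running as this KSA act as this IAM service account
    # (hypothetical account and project).
    iam.gke.io/gcp-service-account: app-gsa@my-project.iam.gserviceaccount.com
```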
Node service account
In Standard clusters, your Pods can also authenticate to Google Cloud using the credentials of the service account used by the node's Compute Engine virtual machine (VM).
This approach is not compatible with GKE Sandbox because GKE Sandbox blocks access to the Compute Engine metadata server.
Service Account JSON key (not recommended)
You can grant credentials for Google Cloud resources to applications by using a service account JSON key. This approach is strongly discouraged because of the difficulty of securely managing account keys.
If you choose this method, use custom IAM service accounts for each application so that applications have the minimal necessary permissions. Grant each service account the minimum IAM roles that are needed for its paired application to operate successfully. Keeping the service accounts application-specific makes it easier to revoke access in the case of a compromise without affecting other applications. After you have assigned your service account the correct IAM roles, you can create a JSON service account key, and then mount the key into your Pod using a Kubernetes Secret.
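If you do use this approach, the key typically lives in a Secret mounted as a volume; a sketch with hypothetical names, assuming the Secret was created from a file named `key.json`:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: legacy-app            # hypothetical name
spec:
  volumes:
  - name: google-cloud-key
    secret:
      secretName: app-sa-key  # Secret created from the JSON key file
  containers:
  - name: app
    image: us-docker.pkg.dev/my-project/my-repo/app:latest  # illustrative image
    volumeMounts:
    - name: google-cloud-key
      mountPath: /var/secrets/google
      readOnly: true
    env:
    # Standard variable read by Google Cloud client libraries.
    - name: GOOGLE_APPLICATION_CREDENTIALS
      value: /var/secrets/google/key.json
```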
Using Binary Authorization
Binary Authorization is a service on Google Cloud that provides software supply-chain security for applications that run in the cloud. Binary Authorization works with images that you deploy to GKE from Artifact Registry or another container image registry.
With Binary Authorization enforcement, you can ensure that internal processes that safeguard the quality and integrity of your software have successfully completed before an application is deployed to your production environment. For instructions about creating a cluster with Binary Authorization enabled, visit Creating a cluster in the Binary Authorization documentation.
With Binary Authorization continuous validation (CV), container images associated with Pods are regularly monitored to ensure that they continue to conform to your evolving internal processes.
Audit logging
Audit logging provides a way for administrators to retain, query, process, and alert on events that occur in your GKE environments. Administrators can use the logged information to do forensic analysis, set up real-time alerting, or catalog how a fleet of GKE clusters is being used and by whom.
By default, GKE logs Admin Activity logs. You can optionally also log Data Access events, depending on the types of operations you are interested in inspecting.
For more information:
- Follow the GKE audit logging tutorial.
- Read more about Cloud Audit Logs.
Built-in security measures
GKE enforces specific restrictions on what you can do to system objects in your clusters. When you perform an operation like patching a workload, an admission webhook named GKE Warden validates your request against a set of restricted operations and decides whether to allow the request.
Autopilot cluster security measures
Autopilot clusters apply multiple security settings based on our expertise and industry best practices. For details, see Security measures in Autopilot.
Standard cluster security measures
Standard clusters are more permissive by default than Autopilot clusters. GKE Standard clusters have the following security settings:
- You can't update the ServiceAccount used by GKE-managed system workloads, such as workloads in the `kube-system` namespace.
- You can't bind the `cluster-admin` default ClusterRole to the `system:anonymous`, `system:unauthenticated`, or `system:authenticated` groups.