Best practices for using fleet Workload Identity Federation

As you know from Authenticate to APIs and services from fleet workloads, fleet-wide Workload Identity Federation is a powerful fleet feature that makes it simpler to set up authentication to Google Cloud for your applications across projects. However, it introduces access control considerations over and above those for regular Workload Identity Federation for GKE. This guide provides examples of these potential issues and shows how to organize your fleets to minimize the risk.

Before reading this guide, you should be familiar with the concepts described in Authenticate to APIs and services from fleet workloads.

For best practices around adopting other fleet features, see Plan fleet features.

Fleet and project identity pools

To understand why fleet-wide Workload Identity Federation requires careful adoption particularly when working with multi-project fleets, let's take a closer look at how regular Workload Identity Federation for GKE and fleet Workload Identity Federation work. In both cases, workloads authenticate using short-lived tokens generated by the clusters, with each cluster added as an identity provider to a special workload identity pool. Workloads running in a specific namespace can then share the same IAM identity across clusters.

Here's what happens with regular Workload Identity Federation for GKE when it's enabled for your clusters. Note that Workload Identity Federation for GKE is enabled for Autopilot clusters by default.

  1. GKE creates a Google-managed workload identity pool in the cluster's project: PROJECT_ID.svc.id.goog.
  2. GKE adds each cluster as an identity provider to the pool.
  3. As a result, workloads running in a specific namespace share the same IAM identity across clusters within a project. The identity is in this form: serviceAccount:PROJECT_ID.svc.id.goog[K8S_NAMESPACE/KSA_NAME].
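For example, granting an IAM role to that identity gives the matching workloads on every Workload Identity-enabled cluster in the project the same access. The following is a minimal sketch; the namespace (backend), Kubernetes service account (ksa), and role are hypothetical placeholders:

# Grant a role to all workloads that use Kubernetes service account "ksa"
# in namespace "backend" on any Workload Identity-enabled cluster in the project.
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="serviceAccount:PROJECT_ID.svc.id.goog[backend/ksa]" \
    --role="roles/storage.objectViewer"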

Fleet-wide Workload Identity Federation is automatically enabled when you add a cluster with Workload Identity Federation for GKE enabled to a fleet, including Autopilot clusters, Standard clusters with the feature explicitly enabled, and GKE clusters outside Google Cloud.

Here's what happens when a user registers a cluster with Workload Identity Federation for GKE enabled to a fleet:

  1. The Google-managed fleet-wide workload identity pool FLEET_PROJECT_ID.svc.id.goog is created in the fleet host project, if the pool doesn't already exist. This pool is the same as the project workload identity pool for the fleet host project.
  2. The cluster is added as an identity provider to the pool.
  3. As a result, workloads running in a specific namespace share the same IAM identity across clusters within the fleet. We refer to this as implicit sameness of fleet workload identities. The identity is in this form: serviceAccount:FLEET_PROJECT_ID.svc.id.goog[K8S_NAMESPACE/KSA_NAME]. Fleet workloads in different projects can then call Google APIs using the same identity for authentication.
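For example, an administrator might register a cluster from another project to the fleet and then grant a role to the single fleet-scoped identity; that one binding covers the matching workloads on every fleet member cluster, whichever project the cluster runs in. This is a sketch only, with hypothetical project IDs, cluster name, location, namespace, service account, and role:

# Register a GKE cluster from another project to the fleet hosted in
# FLEET_HOST_PROJECT_ID (names and location are placeholders).
gcloud container fleet memberships register cluster-2 \
    --project=FLEET_HOST_PROJECT_ID \
    --gke-uri=https://container.googleapis.com/v1/projects/WORKLOAD_PROJECT_ID/locations/us-central1/clusters/cluster-2 \
    --enable-workload-identity

# Grant a role to the fleet-scoped identity. Workloads in namespace "backend"
# that use Kubernetes service account "ksa" on any fleet member cluster, in
# any project, share this identity.
gcloud projects add-iam-policy-binding WORKLOAD_PROJECT_ID \
    --member="serviceAccount:FLEET_HOST_PROJECT_ID.svc.id.goog[backend/ksa]" \
    --role="roles/storage.objectViewer"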

As the preceding steps suggest, if the fleet only includes clusters from one project, and they are all registered to the fleet, then the result is the same as if you were just using Workload Identity Federation for GKE without fleets: all the clusters are identity providers in the project-wide workload identity pool, and workloads use the same identities they would use with Workload Identity Federation for GKE. However, when the fleet has member clusters in multiple projects, fleet Workload Identity Federation combines the per-project identity pools into a single fleet-wide identity pool, hosted in the fleet host project.

As you'll see in the following examples, complications can arise when the set of clusters in a project only partially overlaps with the set of clusters in that project that are fleet members.

Scenario 1: Single project fleet with all clusters registered

In this scenario, all the fleet's member clusters are in the fleet host project, and all the clusters in that project are members of the fleet.

Diagram showing a project with all clusters in the same fleet

As described in the previous section, using fleet-wide Workload Identity Federation in this scenario is the same as using regular Workload Identity Federation for GKE, and there is no additional risk.

Scenario 2: Single project fleet with some clusters registered

In this scenario, a fleet contains two clusters, both in the fleet host project, Project 1. The fleet host project also contains a third cluster that has Workload Identity Federation for GKE enabled but is not a fleet member.

Diagram showing a project with some clusters in the same fleet.

This means that:

  • Clusters 1, 2, and 3 are added by GKE to the project workload identity pool project-1.svc.id.goog.
  • Clusters 1 and 2 are added by the fleet to the fleet workload identity pool, which (because this is the fleet host project) is also the project workload identity pool project-1.svc.id.goog.

The administrator would like to grant permissions to workloads running in a namespace in all clusters across the fleet. They use serviceAccount:project-1.svc.id.goog[namespace/ksa] as the identity to grant access. However, workloads in that namespace on Cluster 3, which is not part of the fleet, now share the same access. This is because Cluster 3 is in the project workload identity pool, which (because this is the fleet host project) is the same as the fleet workload identity pool. In other words, the administrator might intend to give permissions only to clusters in a fleet, but because of how fleet Workload Identity Federation is implemented, non-fleet clusters might also gain access.

Possible mitigation

One possible solution is to create a dedicated project to host the fleet with no clusters in it, enforced by a custom organization policy constraint on the container API. This provides a clear separation between the fleet workload identity pool trust domain and the GKE project-level trust domains.
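For example, you could enforce the "no clusters in the fleet host project" rule with a custom organization policy constraint that denies GKE cluster creation in that project. This is a sketch only; the constraint name, organization ID, and project ID are placeholders, and it assumes custom constraints are available for the container.googleapis.com/Cluster resource in your organization:

# Hypothetical custom constraint that denies creation of GKE clusters.
cat > denyGkeClusterCreate.yaml <<'EOF'
name: organizations/ORG_ID/customConstraints/custom.denyGkeClusterCreate
resourceTypes:
- container.googleapis.com/Cluster
methodTypes:
- CREATE
condition: "true"
actionType: DENY
displayName: Deny GKE cluster creation
description: Keeps the dedicated fleet host project free of clusters.
EOF
gcloud org-policies set-custom-constraint denyGkeClusterCreate.yaml

# Enforce the constraint on the dedicated fleet host project only.
cat > denyGkeClusterCreate-policy.yaml <<'EOF'
name: projects/FLEET_HOST_PROJECT_ID/policies/custom.denyGkeClusterCreate
spec:
  rules:
  - enforce: true
EOF
gcloud org-policies set-policy denyGkeClusterCreate-policy.yaml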

Diagram showing two projects, one with clusters, one acting as the fleet host project

The administrator can then pick the appropriate trust domain when granting permissions to workloads. For example, they can use serviceAccount:project-0.svc.id.goog[namespace/ksa], where project-0 is the dedicated fleet host project, to grant permissions to a namespaced workload across the fleet. The non-fleet member Cluster 3 is not part of that workload identity pool in this setup, and so doesn't gain access.
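To make the distinction concrete, the following sketch contrasts the two grants in this layout; the role shown is a hypothetical placeholder:

# Fleet-scoped grant: matches workloads only on fleet member clusters
# (Clusters 1 and 2), because the dedicated fleet host project project-0
# contains no clusters of its own.
gcloud projects add-iam-policy-binding project-1 \
    --member="serviceAccount:project-0.svc.id.goog[namespace/ksa]" \
    --role="roles/storage.objectViewer"

# Project-scoped grant: matches workloads on every Workload Identity-enabled
# cluster in project-1, including the non-fleet Cluster 3.
gcloud projects add-iam-policy-binding project-1 \
    --member="serviceAccount:project-1.svc.id.goog[namespace/ksa]" \
    --role="roles/storage.objectViewer"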

This solution works for clusters on Google Cloud and attached clusters.

Scenario 3: Multi-project fleet with some clusters registered

In this scenario a fleet has members from two projects, Project 1 and Project 2.

Diagram showing a fleet with clusters from two projects.

The administrator would like to grant permissions to workloads running in a namespace in all clusters in Project 1, using regular Workload Identity Federation for GKE. However, because Project 1 is the fleet host project, its project workload identity pool is also the fleet workload identity pool, and Cluster 4 is registered to the fleet as an identity provider in that pool. As a result, workloads on Cluster 4 in Project 2 also get the same permissions.

Possible mitigation

As in the previous scenario, a possible mitigation here is to create a dedicated fleet host project with no clusters in it. Again, this lets the administrator distinguish between identities from the fleet identity pool and each cluster's project identity pool when setting up access control.

Scenario 4: Consider namespace sameness

Workload identity pools are not the only potential area for confusion when using fleet Workload Identity Federation. As you know from Plan fleet features, many fleet features including fleet Workload Identity Federation use the assumption of namespace sameness to simplify configuration and management across the fleet. This means that the feature treats namespaces with the same name across multiple fleet member clusters as if they were the same namespace. In this example, an administrator has granted permissions to workloads in the NS1 namespace running on fleet member clusters Cluster 1 and Cluster 2.

Diagram showing clusters from two projects with the same namespace.

However, a user has (accidentally or maliciously) created a namespace with the same name on another fleet member cluster. Because of the assumption of namespace sameness, workloads in that namespace automatically gain the same privileges as legitimate NS1 workloads in Cluster 1 and Cluster 2.

Possible mitigation

Set permissions so that only a small group of trusted users and service accounts can create namespaces in fleet member clusters.
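For example, on each fleet member cluster you could allow namespace creation only through a dedicated ClusterRole that is bound to a small, trusted group, and avoid granting broader roles (such as cluster-admin) that include namespace creation. The role, binding, and group names below are hypothetical:

# Hypothetical ClusterRole that allows managing namespaces.
kubectl create clusterrole namespace-manager \
    --verb=create,update,patch,delete \
    --resource=namespaces

# Bind the role only to a trusted administrators group.
kubectl create clusterrolebinding namespace-managers \
    --clusterrole=namespace-manager \
    --group=platform-admins@example.com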