This page shows you how to troubleshoot issues with Google Kubernetes Engine (GKE) service accounts.
Grant the required role for GKE to node service accounts
The IAM service accounts that your GKE nodes use must have all of the permissions that are included in the Kubernetes Engine Default Node Service Account (roles/container.defaultNodeServiceAccount) IAM role. If a GKE node service account is missing one or more of these permissions, GKE can't perform system tasks like the following:
- Send system and application logs from nodes to Cloud Logging.
- Send system and application metrics from nodes to Cloud Monitoring.
- Operate the performance profile of the Horizontal Pod Autoscaler.
Node service accounts might not have certain required permissions for reasons like the following:
- The organization enforces the iam.automaticIamGrantsForDefaultServiceAccounts organization policy constraint, which prevents Google Cloud from automatically granting IAM roles to default IAM service accounts.
- The IAM role that you grant to custom node service accounts doesn't include all of the required permissions that are included in the roles/container.defaultNodeServiceAccount role.
If your node service account is missing the permissions that GKE requires, you might see errors and notices like the following:
- In the Google Cloud console, on the Kubernetes clusters page, a Grant critical permissions error message appears in the Notifications column for a specific cluster.
- In the Google Cloud console, on the cluster details page for a specific cluster, the following error message appears:
  Grant roles/container.defaultNodeServiceAccount role to Node service account to allow for non-degraded operations.
- In Cloud Audit Logs, Admin Activity logs for Google Cloud APIs like monitoring.googleapis.com have the following values if the corresponding permissions to access those APIs are missing from the node service account:
  - Severity: ERROR
  - Message: Permission denied (or the resource may not exist)
- Logs for specific nodes are missing from Cloud Logging and the Pod logs for the logging agent on those nodes show 401 errors. To check these Pod logs for 401 errors, run the following command:
  [[ $(kubectl logs -l k8s-app=fluentbit-gke -n kube-system -c fluentbit-gke | grep -cw "Received 401") -gt 0 ]] && echo "true" || echo "false"
  If the output is true, then the system workload is experiencing 401 errors, which indicate a lack of permissions.
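The one-line check above reports only whether any logging agent Pod logged a 401 error. As a minimal sketch, the following helper reports which Pods and nodes are affected. The function name pods_with_401 is hypothetical, and the sketch assumes that kubectl is already configured for the affected cluster. It reuses the label selector, namespace, and container name from the command above.

```shell
# Hypothetical helper (not part of GKE tooling): print each logging agent Pod
# whose logs contain "Received 401", together with the node that it runs on.
pods_with_401() {
  kubectl get pods -n kube-system -l k8s-app=fluentbit-gke \
    -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.spec.nodeName}{"\n"}{end}' |
  while read -r pod node; do
    # Skip blank lines in the Pod list.
    [ -n "$pod" ] || continue
    # Count the log lines that contain "Received 401" as whole words.
    count=$(kubectl logs "$pod" -n kube-system -c fluentbit-gke </dev/null |
      grep -cw "Received 401")
    if [ "$count" -gt 0 ]; then
      echo "$pod on node $node: $count matching log lines"
    fi
  done
}
```

Run pods_with_401 with no arguments. If it prints nothing, no logging agent Pod reported a 401 error in its current logs.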
To resolve this issue, grant the Kubernetes Engine Default Node Service Account (roles/container.defaultNodeServiceAccount) role on the project to the service account that's causing the errors. Select one of the following options:
Console
To find the name of the service account that your nodes use, do the following:
Go to the Kubernetes clusters page:
In the cluster list, click the name of the cluster that you want to inspect.
Find the name of the node service account. You need this name later.
- For Autopilot mode clusters, in the Security section, find the Service account field.
- For Standard mode clusters, do the following:
- Click the Nodes tab.
- In the Node pools table, click a node pool name. The Node pool details page opens.
- In the Security section, find the Service account field.
If the value in the Service account field is default, your nodes use the Compute Engine default service account. If the value in this field is not default, your nodes use a custom service account.
To grant the Kubernetes Engine Default Node Service Account role to the service account, do the following:
Go to the Welcome page:
In the Project number field, click Copy to clipboard.
Go to the IAM page:
Click Grant access.
In the New principals field, specify the name of your node service account. If your nodes use the default Compute Engine service account, specify the following value:
PROJECT_NUMBER-compute@developer.gserviceaccount.com
Replace PROJECT_NUMBER with the project number that you copied.
In the Select a role menu, select the Kubernetes Engine Default Node Service Account role.
Click Save.
To verify that the role was granted, do the following:
- In the IAM page, click the View by roles tab.
- Expand the Kubernetes Engine Default Node Service Account section. A list of principals that have this role is displayed.
- Find your node service account in the list of principals.
gcloud
Find the name of the service account that your nodes use:
- For Autopilot mode clusters, run the following command:
  gcloud container clusters describe CLUSTER_NAME \
      --location=LOCATION \
      --flatten=autoscaling.autoprovisioningNodePoolDefaults.serviceAccount
- For Standard mode clusters, run the following command:
  gcloud container clusters describe CLUSTER_NAME \
      --location=LOCATION \
      --format="table(nodePools.name,nodePools.config.serviceAccount)"
If the output is default, your nodes use the Compute Engine default service account. If the output is not default, your nodes use a custom service account.
Find your Google Cloud project number:
gcloud projects describe PROJECT_ID \
    --format="value(projectNumber)"
Replace PROJECT_ID with your project ID.
The output is similar to the following:
12345678901
Grant the roles/container.defaultNodeServiceAccount role to the service account:
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member="SERVICE_ACCOUNT_NAME" \
    --role="roles/container.defaultNodeServiceAccount"
Replace SERVICE_ACCOUNT_NAME with the name of the service account, which you found in the previous step. If your nodes use the Compute Engine default service account, specify the following value:
serviceAccount:PROJECT_NUMBER-compute@developer.gserviceaccount.com
Replace PROJECT_NUMBER with the project number from the previous step.
Verify that the role was granted successfully:
gcloud projects get-iam-policy PROJECT_ID \
    --flatten="bindings[].members" \
    --filter=bindings.role:roles/container.defaultNodeServiceAccount \
    --format='value(bindings.members)'
The output is the name of your service account.
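The gcloud steps above depend on whether the nodes use the default or a custom service account. The following sketch shows one way to capture that branching in a small helper. The function names node_sa_member and grant_node_role are hypothetical. The sketch assumes that you already have the service account value and the project number from the earlier steps.

```shell
# Hypothetical helper: turn the service account value that the cluster reports
# into the member string for the --member flag.
# $1: service account value ("default" or a service account email)
# $2: project number
node_sa_member() {
  if [ "$1" = "default" ]; then
    # "default" means the Compute Engine default service account.
    echo "serviceAccount:$2-compute@developer.gserviceaccount.com"
  else
    echo "serviceAccount:$1"
  fi
}

# Hypothetical wrapper: grant the required role on the project.
# $1: project ID
# $2: service account value
# $3: project number
grant_node_role() {
  gcloud projects add-iam-policy-binding "$1" \
    --member="$(node_sa_member "$2" "$3")" \
    --role="roles/container.defaultNodeServiceAccount"
}
```

For example, grant_node_role my-project default 12345678901 would grant the role to the Compute Engine default service account of a project with that project number. The project ID and number here are placeholder values.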
Identify node service accounts that don't have required permissions
The following sections describe how to identify node service accounts that don't have the required permissions for GKE.
Identify clusters that have node service accounts with missing permissions
Use GKE recommendations of the NODE_SA_MISSING_PERMISSIONS recommender subtype to identify Autopilot and Standard clusters that have node service accounts with missing permissions. Recommender identifies only clusters that were created on or after January 1, 2024. To find and fix the missing permissions by using Recommender, do the following:
Find active recommendations in your project for the NODE_SA_MISSING_PERMISSIONS recommender subtype:
gcloud recommender recommendations list \
    --recommender=google.container.DiagnosisRecommender \
    --location LOCATION \
    --project PROJECT_ID \
    --format yaml \
    --filter="recommenderSubtype:NODE_SA_MISSING_PERMISSIONS"
Replace the following:
- LOCATION: the location to find recommendations in.
- PROJECT_ID: your Google Cloud project ID.
The output is similar to the following, which indicates that a cluster has a node service account with missing permissions:
associatedInsights:
# lines omitted for clarity
recommenderSubtype: NODE_SA_MISSING_PERMISSIONS
stateInfo:
  state: ACTIVE
targetResources:
- //container.googleapis.com/projects/12345678901/locations/us-central1/clusters/cluster-1
It might take up to 24 hours for the recommendation to appear. For detailed instructions, see view insights and recommendations.
For every cluster that's in the output of the previous step, find the associated node service accounts and grant the required role to those service accounts. For details, see the instructions in the Grant the required role for GKE to node service accounts section.
After you grant the required role to the identified node service accounts, the recommendation might persist for up to 24 hours unless you manually dismiss it.
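When the recommendation output lists many clusters, it can help to reduce each target resource name to a location and a cluster name. The following is a minimal sketch of that step. The function name clusters_from_targets is hypothetical, and the sketch assumes that target resources use the //container.googleapis.com/projects/.../locations/.../clusters/... form shown in the example output.

```shell
# Hypothetical helper: read Recommender output on stdin and print
# "LOCATION CLUSTER_NAME" for each GKE cluster resource name that it finds.
clusters_from_targets() {
  sed -n 's|.*//container\.googleapis\.com/projects/[^/]*/locations/\([^/]*\)/clusters/\([^/ ]*\).*|\1 \2|p'
}
```

To use the helper, pipe the YAML output of the gcloud recommender recommendations list command into clusters_from_targets. Each output line gives the values to use for LOCATION and CLUSTER_NAME when you describe the cluster.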
Identify all node service accounts with missing permissions
You can run a script that searches node pools in your project's Standard clusters
for any node service accounts that don't have the required permissions for GKE.
This script uses the gcloud CLI and the jq utility:
#!/bin/bash
# Set your project ID
project_id=PROJECT_ID
project_number=$(gcloud projects describe "$project_id" --format="value(projectNumber)")
declare -a all_service_accounts
declare -a sa_missing_permissions
# Function to check if a service account has a specific permission
# $1: project_id
# $2: service_account
# $3: permission
service_account_has_permission() {
  local project_id="$1"
  local service_account="$2"
  local permission="$3"

  local roles=$(gcloud projects get-iam-policy "$project_id" \
    --flatten="bindings[].members" \
    --format="table[no-heading](bindings.role)" \
    --filter="bindings.members:\"$service_account\"")

  for role in $roles; do
    if role_has_permission "$role" "$permission"; then
      echo "Yes" # Has permission
      return
    fi
  done

  echo "No" # Does not have permission
}
# Function to check if a role has the specific permission
# $1: role
# $2: permission
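role_has_permission() {
  local role="$1"
  local permission="$2"
  # jq prints the permissions as a JSON array of quoted strings. Match the
  # quoted permission literally (-F) so that "." isn't treated as a wildcard.
  gcloud iam roles describe "$role" --format="json" | \
    jq -r ".includedPermissions" | \
    grep -qF "\"$permission\""
}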
echo "--- 1. List all service accounts in all GKE node pools"
printf "%-60s| %-40s| %-40s| %-10s| %-20s\n" "service_account" "project_id" "cluster_name" "cluster_location" "nodepool_name"
while read -r cluster; do
  cluster_name=$(echo "$cluster" | awk '{print $1}')
  cluster_location=$(echo "$cluster" | awk '{print $2}')
  while read -r nodepool; do
    nodepool_name=$(echo "$nodepool" | awk '{print $1}')
    while read -r nodepool_details; do
      service_account=$(echo "$nodepool_details" | awk '{print $1}')
      # Node pools that report "default" use the Compute Engine default service account.
      if [[ "$service_account" == "default" ]]; then
        service_account="${project_number}-compute@developer.gserviceaccount.com"
      fi
      if [[ -n "$service_account" ]]; then
        printf "%-60s| %-40s| %-40s| %-10s| %-20s\n" "$service_account" "$project_id" "$cluster_name" "$cluster_location" "$nodepool_name"
        all_service_accounts+=( "${service_account}" )
      else
        printf "cannot find service account for node pool %s\t%s\t%s\t%s\n" "$project_id" "$cluster_name" "$cluster_location" "$nodepool_name"
      fi
    done <<< "$(gcloud container node-pools describe "$nodepool_name" --cluster "$cluster_name" --location "$cluster_location" --project "$project_id" --format="table[no-heading](config.serviceAccount)")"
  done <<< "$(gcloud container node-pools list --cluster "$cluster_name" --location "$cluster_location" --project "$project_id" --format="table[no-heading](name)")"
done <<< "$(gcloud container clusters list --project "$project_id" --format="value(name,location)")"
echo "--- 2. Check if service accounts have permissions"
unique_service_accounts=($(echo "${all_service_accounts[@]}" | tr ' ' '\n' | sort -u | tr '\n' ' '))
echo "Service accounts: ${unique_service_accounts[@]}"
printf "%-60s| %-40s| %-40s| %-20s\n" "service_account" "has_logging_permission" "has_monitoring_permission" "has_performance_hpa_metric_write_permission"
for sa in "${unique_service_accounts[@]}"; do
  logging_permission=$(service_account_has_permission "$project_id" "$sa" "logging.logEntries.create")
  time_series_create_permission=$(service_account_has_permission "$project_id" "$sa" "monitoring.timeSeries.create")
  metric_descriptors_create_permission=$(service_account_has_permission "$project_id" "$sa" "monitoring.metricDescriptors.create")
  if [[ "$time_series_create_permission" == "No" || "$metric_descriptors_create_permission" == "No" ]]; then
    monitoring_permission="No"
  else
    monitoring_permission="Yes"
  fi
  performance_hpa_metric_write_permission=$(service_account_has_permission "$project_id" "$sa" "autoscaling.sites.writeMetrics")
  printf "%-60s| %-40s| %-40s| %-20s\n" "$sa" "$logging_permission" "$monitoring_permission" "$performance_hpa_metric_write_permission"
  if [[ "$logging_permission" == "No" || "$monitoring_permission" == "No" || "$performance_hpa_metric_write_permission" == "No" ]]; then
    sa_missing_permissions+=( "${sa}" )
  fi
done
echo "--- 3. List all service accounts that don't have the above permissions"
if [[ "${#sa_missing_permissions[@]}" -gt 0 ]]; then
  # Print the heading once, then one service account per line.
  echo "Grant roles/container.defaultNodeServiceAccount to the following service accounts:"
  printf "%s\n" "${sa_missing_permissions[@]}"
else
  echo "All service accounts have the above permissions"
fi
This script only applies to GKE Standard mode node pools. If you use GKE Autopilot mode, see the Identify clusters that have node service accounts with missing permissions section.
After you identify the names of the service accounts with missing permissions, grant them the required role. For details, see the instructions in the Grant the required role for GKE to node service accounts section.
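If the script reports several service accounts, you might want to review the grant commands before you run them. The following sketch prints one command per service account without running anything. The function name print_grant_commands is hypothetical, and the sketch assumes that every role is granted on the same project.

```shell
# Hypothetical helper: for each service account email on stdin, print (but
# don't run) the command that grants the required role on the project.
# $1: project ID
print_grant_commands() {
  while read -r sa; do
    # Skip blank lines.
    [ -n "$sa" ] || continue
    printf 'gcloud projects add-iam-policy-binding %s --member="serviceAccount:%s" --role="roles/container.defaultNodeServiceAccount"\n' "$1" "$sa"
  done
}
```

For example, save the service account emails to a file with one email per line, then run print_grant_commands PROJECT_ID with that file as standard input. After you review the printed commands, run them individually.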
Restore the default service account to your Google Cloud project
GKE's default service account, container-engine-robot, can accidentally become unbound from a project. The Kubernetes Engine Service Agent role (roles/container.serviceAgent) is an Identity and Access Management (IAM) role that grants the service account the permissions to manage cluster resources. If you remove this role binding from the service account, the default service account becomes unbound from the project, which can prevent you from deploying applications and performing other cluster operations.
To see if the service account is removed from your project, you can use the Google Cloud console or Google Cloud CLI.
Console
In the Google Cloud console, go to the IAM & Admin page.
gcloud
Run the following command:
gcloud projects get-iam-policy PROJECT_ID
Replace PROJECT_ID with your project ID.
If the dashboard or the command doesn't display container-engine-robot among your service accounts, the role is unbound.
To restore the Kubernetes Engine Service Agent role (roles/container.serviceAgent) binding, run the following commands:
PROJECT_NUMBER=$(gcloud projects describe "PROJECT_ID" \
    --format 'get(projectNumber)')
gcloud projects add-iam-policy-binding PROJECT_ID \
    --member "serviceAccount:service-${PROJECT_NUMBER?}@container-engine-robot.iam.gserviceaccount.com" \
    --role roles/container.serviceAgent
Confirm that the role binding is restored:
gcloud projects get-iam-policy PROJECT_ID
If you see the service account name along with the container.serviceAgent role, the role binding is restored. For example:
- members:
  - serviceAccount:service-1234567890@container-engine-robot.iam.gserviceaccount.com
  role: roles/container.serviceAgent
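Scanning the full IAM policy by eye is error-prone in projects with many bindings. As a minimal sketch, the following helper checks for the binding directly. The function name service_agent_bound is hypothetical. The sketch reuses the --flatten, --filter, and --format pattern from the role verification command earlier on this page.

```shell
# Hypothetical helper: succeed if the GKE service agent holds the
# roles/container.serviceAgent role on the project.
# $1: project ID
# $2: project number
service_agent_bound() {
  agent="serviceAccount:service-$2@container-engine-robot.iam.gserviceaccount.com"
  # List one member per line for the role, then look for an exact match.
  gcloud projects get-iam-policy "$1" \
    --flatten="bindings[].members" \
    --filter="bindings.role:roles/container.serviceAgent" \
    --format="value(bindings.members)" |
  grep -qxF "$agent"
}
```

For example, service_agent_bound PROJECT_ID PROJECT_NUMBER && echo "bound" || echo "unbound" prints the state of the binding.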
Enable the Compute Engine default service account
The service account used for the node pool is usually the Compute Engine default service account. If this default service account is deactivated, your nodes might fail to register with the cluster.
To see if the service account is deactivated in your project, you can use the Google Cloud console or gcloud CLI.
Console
In the Google Cloud console, go to the IAM & Admin page.
gcloud
Run the following command:
gcloud iam service-accounts list --filter="NAME~'compute' AND disabled=true"
If the service account is deactivated, run the following command to enable the service account:
gcloud iam service-accounts enable PROJECT_NUMBER-compute@developer.gserviceaccount.com
Replace PROJECT_NUMBER with your project number.
For more information, see Troubleshoot node registration.
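To check the state of the Compute Engine default service account from a script, you can read the disabled field of the account directly. The following is a sketch under stated assumptions: the function name compute_default_sa_disabled is hypothetical, and the sketch assumes that gcloud prints True for the disabled field of a deactivated account and prints nothing or False otherwise.

```shell
# Hypothetical helper: succeed if the Compute Engine default service account
# is deactivated.
# $1: project number
compute_default_sa_disabled() {
  state=$(gcloud iam service-accounts describe \
    "$1-compute@developer.gserviceaccount.com" --format="value(disabled)")
  # Assumption: a deactivated account reports "True" (checked case-insensitively
  # on the first letter); anything else is treated as enabled.
  case "$state" in
    [Tt]rue) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, compute_default_sa_disabled PROJECT_NUMBER && echo "deactivated" reports a deactivated account, after which you can run the enable command shown earlier in this section.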
Error 400/403: Missing edit permissions on account
If your service account is deleted, you might see a missing edit permissions error. To learn how to troubleshoot this error, see Error 400/403: Missing edit permissions on account.
What's next
If you need additional assistance, reach out to Cloud Customer Care.