Review service health and incidents


When your Google Kubernetes Engine (GKE) clusters or applications experience issues, first determine whether the cause is specific to your environment or part of a wider Google Cloud service disruption. Time spent on local debugging is wasted if the root cause is a known platform incident.

Use this page to determine if an issue with your GKE cluster is caused by a wider Google Cloud service disruption. Learn where to find official status updates, personalized health events, and service incident insights from the following sources:

  • Google Cloud Service Health: status information for Google Cloud services, by region.
  • Personalized Service Health: service disruptions relevant to your projects.
  • Service incident insights and recommendations: GKE clusters that are affected by an ongoing service incident.

This information is important for Platform admins and operators, and for Application developers, who are troubleshooting and need to understand whether the issues they observe are linked to a broader Google Cloud service health event. For more information about the common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.

Review Google Cloud service health

The Google Cloud Service Health page provides status information about the services that are part of Google Cloud.

To review incidents related to GKE, go to the Google Cloud Service Health page.

Go to all incidents reported for Google Kubernetes Engine
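If you prefer to check for open GKE incidents from a script rather than the web page, the following is a minimal sketch. It assumes that you have already downloaded the public incident feed as JSON (assumed here to be the incidents.json feed published on status.cloud.google.com) and that each incident has an `affected_products` list with `title` entries and an `end` field that is empty while the incident is ongoing. Those field names and the sample data are assumptions for illustration; verify them against the live feed before relying on them.

```python
import json
from typing import Any

# Product title as assumed to appear in the feed's "affected_products" list.
GKE_PRODUCT = "Google Kubernetes Engine"


def open_gke_incidents(feed_json: str, product: str = GKE_PRODUCT) -> list[dict[str, Any]]:
    """Return incidents that list `product` as affected and have no end time."""
    incidents = json.loads(feed_json)
    matches = []
    for incident in incidents:
        titles = {p.get("title") for p in incident.get("affected_products", [])}
        # An absent or empty "end" value is treated as "still ongoing" (assumption).
        if product in titles and not incident.get("end"):
            matches.append(incident)
    return matches


# Hypothetical sample data; not real incidents.
SAMPLE_FEED = json.dumps([
    {"id": "hypothetical-1", "end": None,
     "affected_products": [{"title": "Google Kubernetes Engine"}]},
    {"id": "hypothetical-2", "end": "2024-01-01T00:00:00+00:00",
     "affected_products": [{"title": "Google Kubernetes Engine"}]},
    {"id": "hypothetical-3", "end": None,
     "affected_products": [{"title": "Google Compute Engine"}]},
])

if __name__ == "__main__":
    for incident in open_gke_incidents(SAMPLE_FEED):
        print(incident["id"])
```

The function only reads the two fields it needs, so it keeps working if the feed contains additional fields.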

Review Personalized Service Health

Personalized Service Health lets you identify Google Cloud service disruptions that are relevant to your projects. These disruptions are called service health events, and information about them is available in the Google Cloud console and a variety of integration points.

To review incidents related to GKE that are relevant to your projects, view service health events in the Personalized Service Health dashboard in the Google Cloud console.

Go to Personalized Service Health

You can filter incidents by service, location, relevance, and status. The dashboard also provides incident details such as scope of impact, symptoms, workarounds, and resolution progress updates. To get started, see Quickstart: View service health events in the Google Cloud console.
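The same four filters can be applied to service health events that you retrieve outside the console. The following sketch assumes that events are available as dictionaries shaped like the Service Health API's event resource, with `state`, `relevance`, and an `eventImpacts` list that contains `product.productName` and `location.locationName`. Treat these field names, the `filter_events` helper, and the sample events as assumptions for illustration rather than a definitive schema.

```python
from typing import Any, Optional


def filter_events(
    events: list[dict[str, Any]],
    service: Optional[str] = None,
    location: Optional[str] = None,
    relevance: Optional[str] = None,
    state: Optional[str] = None,
) -> list[dict[str, Any]]:
    """Keep events that match every filter that is not None."""

    def matches(event: dict[str, Any]) -> bool:
        impacts = event.get("eventImpacts", [])
        if service is not None and not any(
            i.get("product", {}).get("productName") == service for i in impacts
        ):
            return False
        if location is not None and not any(
            i.get("location", {}).get("locationName") == location for i in impacts
        ):
            return False
        if relevance is not None and event.get("relevance") != relevance:
            return False
        if state is not None and event.get("state") != state:
            return False
        return True

    return [e for e in events if matches(e)]


# Hypothetical sample events; not real service health data.
SAMPLE_EVENTS = [
    {"title": "hypothetical event A", "state": "ACTIVE", "relevance": "IMPACTED",
     "eventImpacts": [{"product": {"productName": "Google Kubernetes Engine"},
                       "location": {"locationName": "us-central1"}}]},
    {"title": "hypothetical event B", "state": "CLOSED", "relevance": "RELATED",
     "eventImpacts": [{"product": {"productName": "Google Kubernetes Engine"},
                       "location": {"locationName": "europe-west1"}}]},
    {"title": "hypothetical event C", "state": "ACTIVE", "relevance": "IMPACTED",
     "eventImpacts": [{"product": {"productName": "Cloud Storage"},
                       "location": {"locationName": "us-central1"}}]},
]

if __name__ == "__main__":
    active_gke = filter_events(
        SAMPLE_EVENTS, service="Google Kubernetes Engine", state="ACTIVE"
    )
    for event in active_gke:
        print(event["title"])
```

Filters that are left as `None` are ignored, which mirrors how the dashboard filters narrow the list only when you set them.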

Review service incident insights and recommendations

Service incident insights and recommendations let you identify GKE clusters that are impacted by an ongoing service incident.

To get service incident insights, view insights and recommendations for the GKE_RELIABILITY_INCIDENT subtype. You can get insights by using the Google Cloud console, the Google Cloud CLI, or the Recommender API. For more information, see View insights and recommendations.
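As an illustration of working with the subtype, the following sketch filters a list of insights down to active `GKE_RELIABILITY_INCIDENT` entries and collects the impacted cluster names. It assumes that you exported insights as JSON in the shape of the Recommender API's insight resource (for example, from the gcloud CLI with JSON output), with `insightSubtype`, `stateInfo.state`, and `targetResources` fields. The helper names and the sample insights are hypothetical, and the field names should be checked against your actual output.

```python
import json
from typing import Any

INCIDENT_SUBTYPE = "GKE_RELIABILITY_INCIDENT"


def incident_insights(insights_json: str) -> list[dict[str, Any]]:
    """Return active insights whose subtype marks a service incident."""
    insights = json.loads(insights_json)
    return [
        i for i in insights
        if i.get("insightSubtype") == INCIDENT_SUBTYPE
        # Dismissed insights are assumed to carry a state other than ACTIVE.
        and i.get("stateInfo", {}).get("state") == "ACTIVE"
    ]


def impacted_clusters(insights: list[dict[str, Any]]) -> list[str]:
    """Collect the last path segment of each target resource (the cluster name)."""
    clusters = set()
    for insight in insights:
        for resource in insight.get("targetResources", []):
            clusters.add(resource.rsplit("/", 1)[-1])
    return sorted(clusters)


# Hypothetical sample insights; project, cluster, and subtype values are made up
# except for GKE_RELIABILITY_INCIDENT itself.
SAMPLE_INSIGHTS = json.dumps([
    {"insightSubtype": "GKE_RELIABILITY_INCIDENT",
     "stateInfo": {"state": "ACTIVE"},
     "targetResources": [
         "//container.googleapis.com/projects/example-project/locations/us-central1/clusters/cluster-a"]},
    {"insightSubtype": "GKE_RELIABILITY_INCIDENT",
     "stateInfo": {"state": "DISMISSED"},
     "targetResources": [
         "//container.googleapis.com/projects/example-project/locations/us-central1/clusters/cluster-b"]},
    {"insightSubtype": "SOME_OTHER_SUBTYPE",
     "stateInfo": {"state": "ACTIVE"},
     "targetResources": [
         "//container.googleapis.com/projects/example-project/locations/us-central1/clusters/cluster-c"]},
])

if __name__ == "__main__":
    print(impacted_clusters(incident_insights(SAMPLE_INSIGHTS)))
```

Filtering on the subtype first keeps unrelated GKE insights out of the result, so the cluster list reflects only the ongoing service incident.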

Insights and recommendations include the following information:

  • Impacted cluster: a cluster that's impacted by the incident.
  • Incident name: an incident identifier for reference when you communicate with Cloud Customer Care.
  • Incident description: information about the incident from the incident response team.
  • Last effective time: the last time that information about the incident was updated.
  • Mitigation action: mitigation action that's recommended by the incident response team, if available.

The service incident insight remains visible until the Google Cloud incident response team mitigates the incident and determines that the insight is no longer relevant. Expect a delay between the time that the incident is mitigated and stops impacting your resources, and the time that the insight is removed. If you implemented a workaround and no longer want to see the insight, you can dismiss it.

What's next