This page describes how you can fine-tune your Google Kubernetes Engine (GKE) deployments to optimize performance and reliability by using Gemini Cloud Assist, an AI-powered collaborator for Google Cloud. Gemini assistance can include recommendations, code generation, and troubleshooting.
Among many other benefits, Gemini Cloud Assist can help you achieve the following:
- Reduce costs: identify idle resources, rightsize your deployments, and optimize autoscaling configurations to minimize unnecessary spending.
- Improve reliability and stability: proactively identify potential issues, like version skew or missing Pod Disruption Budgets, to prevent downtime and ensure application resilience.
- Optimize AI/ML workloads: get help with deploying, managing, and optimizing AI/ML workloads on GKE.
- Simplify troubleshooting: quickly analyze logs and pinpoint the root cause of errors, saving time and effort.
This page is for existing GKE users, and for Operators and Developers who provision and configure cloud resources and deploy apps and services. To learn more about common roles and example tasks referenced in Google Cloud content, see Common GKE Enterprise user roles and tasks.
Costs
- Gemini: While in Preview, there is no cost for using Gemini Cloud Assist.
- GKE: There are no additional costs for using Gemini Cloud Assist in GKE.
Before you begin
To begin using Gemini with GKE, complete the following prerequisites.
- Make sure that billing is enabled for your Google Cloud project.
- Ask your Identity and account admins to grant you the necessary permissions to access and modify your GKE resources.
- Follow the instructions provided in the Set up Gemini Cloud Assist guide to enable Gemini Cloud Assist in your project or folder, with specific Identity and Access Management (IAM) roles granted to your principal.
This guide assumes that you have a GKE cluster and, preferably, some deployments running.
Ask Gemini Cloud Assist
You can invoke Gemini Cloud Assist from the Google Cloud console. Gemini Cloud Assist lets you use natural language prompts to get help with tasks quickly and efficiently.
To open Gemini Cloud Assist from a GKE page, follow these steps:

1. In the Google Cloud console, on the project selector page, select a Google Cloud project where you enabled Gemini Cloud Assist.
2. In the Google Cloud console, go to a specific page in the Kubernetes Engine console, for example, the Kubernetes Engine Overview page.

   If you have a question about a specific resource, navigate first to the relevant page. For example, on the Clusters page, Gemini Cloud Assist can advise you about managing your clusters, monitoring your cluster health, and troubleshooting cluster issues. Using Gemini on a specific Google Cloud console page provides context for your questions. Gemini can then use this context, along with the overall project that you're in, to generate more tailored and accurate assistance.

3. To open the Gemini Cloud Assist pane, click spark Open or close Gemini AI chat in the toolbar.
4. If prompted, and you agree to the terms, click Accept.
5. Enter a prompt in the Gemini pane. For an example workflow of using Gemini to troubleshoot, see the following section.
For more information about using Gemini in the Google Cloud console, see Use Gemini Cloud Assist.
Example of using Gemini to troubleshoot
Gemini can help you troubleshoot issues in your GKE services.
1. In the Google Cloud console, go to the Workloads page.
2. Select the workload that you want to troubleshoot.
3. Click the Logs tab.
4. Click spark Open or close Gemini AI chat in the toolbar.
5. Enter a prompt to describe the issue that you're experiencing. For example, "My `accounts-db` database application is experiencing high latency". Gemini might ask for more context, such as the type of database and the scope of impact, including the operations and users affected by the latency. Gemini can then provide guidance to help you analyze the logs yourself, and offer troubleshooting suggestions.
6. Review and follow the suggestions to resolve the issue.
Example prompts for Gemini Cloud Assist
This section shows some real-world use cases and suggests the prompts that you can try asking Gemini. The actual responses you receive might be generic, or they might be personalized and actionable based on the unique state of your Google Cloud environment. The responses could include Google Cloud console links for reviewing and managing your Cloud resources, and links to the relevant documentation for further information.
Reduce costs
The following table describes the prompts you can use to help reduce costs.
| Prompt | Type of response |
|---|---|
| "How can I save costs on my GKE clusters without sacrificing performance?" | Recommendations to reduce spending, such as identifying idle resources, rightsizing your deployments, and optimizing your autoscaling configurations. |
| "I'm looking to upgrade my `my-docker-cluster` GKE cluster. Any recommendations?" | Suggestions to implement specific Kubernetes configurations and best practices before you upgrade. |
| "I have a large traffic spike coming in a couple of weeks on the `my-docker-cluster` cluster. Any recommendations?" | Recommendations to prepare the cluster for the traffic spike, such as reviewing your autoscaling configuration and capacity. |
| "Which of my GKE workloads don't have HPA enabled?" | The list of workloads that don't have the horizontal Pod autoscaler enabled. |
Improve reliability and stability
The following table describes the prompts you can use to help improve reliability and stability of your GKE workloads.
| Prompt | Type of response |
|---|---|
| "How can I make my GKE clusters more reliable and prevent downtime?" | Recommendations to improve reliability, such as addressing version skew and adding missing Pod Disruption Budgets. |
| "Show me how I can move my workloads from the `default` namespace on `my-cluster`." | Steps to move the workloads from the `default` namespace to a dedicated namespace. |
| "How do I ensure high availability for my running Pods?" | Recommendations such as running multiple replicas, spreading Pods across nodes and zones, and configuring Pod Disruption Budgets. |
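Several of the preceding responses mention Pod Disruption Budgets. The following manifest is a minimal sketch, assuming a Deployment whose Pods carry the hypothetical label `app: my-app`; it limits how many of those Pods a voluntary disruption, such as a node upgrade, can take down at once.

```yaml
# Minimal PodDisruptionBudget sketch. The selector label (app: my-app)
# and the minAvailable value are hypothetical examples.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2        # keep at least 2 Pods running during voluntary disruptions
  selector:
    matchLabels:
      app: my-app        # must match the labels on the Pods you want to protect
```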
Optimize GKE for AI/ML workloads
The following table describes the prompts you can use to get help with deploying, managing, and optimizing AI/ML workloads on GKE.
| Prompt | Type of response |
|---|---|
| "What are the recommended node pool configurations for running large-scale distributed TensorFlow training on GKE with GPUs?" | Recommendations to optimize distributed TensorFlow training on GKE, such as GPU machine type and node pool configuration. |
| "How do I use GPUs on GKE for training?" | An overview of the steps and considerations to configure a cluster and workloads to use GPUs. |
| "Give me an example of deploying a model serving container on GKE." | An example with sample code to deploy a model serving container on GKE. The example might incorporate best practices and helps ensure scalability. |
| "What metrics should I track to assess the effectiveness of my load balancing setup for inference?" | The list of metrics, such as traffic distribution, latency, error rates, and CPU and memory utilization, to gain insights into the performance and health of the load balancing setup. |
Simplify troubleshooting
The following table describes the prompts you can use to help quickly analyze logs and identify the root cause of errors, saving time and effort.
| Prompt | Type of response |
|---|---|
| "What's this error about? `Readiness probe failed: Get "https://10…./abcd": context deadline exceeded (Client.Timeout exceeded while awaiting headers)`" | An explanation that the kubelet failed to execute the readiness probe for the container within the defined timeout period, along with potential causes and troubleshooting actions. |
| "Why is my deployment `nettools` crashing with the error `ping: socket: Operation not permitted`?" | An explanation that the `ping` command requires the `CAP_NET_RAW` security context capability, and that, by default, containers in Kubernetes run with a restricted set of capabilities for security reasons. |
| "What does it mean when my Pod is unschedulable due to the error `Cannot schedule pods: No preemption victims found for incoming pod.`?" | An explanation of how Pod scheduling and preemption work in Kubernetes, and steps to troubleshoot why no preemption victim was found. |
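As an illustration of the `ping` error in the preceding table, one common fix is to add the `NET_RAW` capability back to the container's security context. The following is a minimal sketch with a hypothetical Pod name and image; because extra capabilities widen the attack surface, grant them only to workloads that genuinely need raw sockets.

```yaml
# Minimal sketch: grant NET_RAW so that ping can open raw sockets.
# The Pod name and image are hypothetical examples.
apiVersion: v1
kind: Pod
metadata:
  name: nettools
spec:
  containers:
  - name: nettools
    image: nicolaka/netshoot       # hypothetical network-debugging image
    command: ["sleep", "infinity"]
    securityContext:
      capabilities:
        add: ["NET_RAW"]           # required by ping for raw ICMP sockets
```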
What's next
- Learn how to write better prompts.
- Learn how to use the Gemini Cloud Assist panel.
- Read Use Gemini for AI assistance and development.
- Learn how Gemini for Google Cloud uses your data.