gcloud alpha container ai profiles

NAME
gcloud alpha container ai profiles - quickstart engine for GKE AI workloads
SYNOPSIS
gcloud alpha container ai profiles GROUP [GCLOUD_WIDE_FLAG ...]
DESCRIPTION
(ALPHA) The GKE Inference Quickstart simplifies deploying AI inference on Google Kubernetes Engine (GKE). It provides tailored profiles based on Google's internal benchmarks. You supply inputs such as your preferred open-source model (e.g., Llama, Gemma, or Mistral) and your application's performance target; from these, the quickstart generates accelerator choices with performance metrics, plus detailed, ready-to-deploy profiles for compute, load balancing, and autoscaling. The profiles are delivered as standard Kubernetes YAML manifests, which you can deploy as-is or modify.
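The end-to-end flow described above can be sketched with the subcommand groups listed under GROUPS. The subcommands shown are from this reference, but the specific flag names and values below are illustrative assumptions and may differ across gcloud releases:

```shell
# Requires the gcloud CLI with alpha components installed and an
# authenticated project. Flag names below are assumptions.

# 1. List open-source models supported by GKE Inference Quickstart.
gcloud alpha container ai profiles models list

# 2. Compare accelerator choices for a chosen model and model server
#    (model and server values here are hypothetical examples).
gcloud alpha container ai profiles accelerators list \
    --model=meta-llama/Llama-3.1-8B-Instruct \
    --model-server=vllm

# 3. Generate ready-to-deploy Kubernetes YAML manifests for one profile.
gcloud alpha container ai profiles manifests create \
    --model=meta-llama/Llama-3.1-8B-Instruct \
    --model-server=vllm \
    --accelerator-type=nvidia-l4 > manifests.yaml
```

The generated manifests.yaml can then be applied with kubectl apply -f manifests.yaml against a GKE cluster that has the chosen accelerator available.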
GCLOUD WIDE FLAGS
These flags are available to all commands: --help.

Run $ gcloud help for details.

GROUPS
GROUP is one of the following:
accelerators
(ALPHA) Manage supported accelerators for GKE Inference Quickstart.
manifests
(ALPHA) Generate optimized Kubernetes manifests.
model-and-server-combinations
(ALPHA) Manage supported model and model server combinations for GKE Inference Quickstart.
model-server-versions
(ALPHA) Manage supported model server versions for GKE Inference Quickstart.
model-servers
(ALPHA) Manage supported model servers for GKE Inference Quickstart.
models
(ALPHA) Manage supported models for GKE Inference Quickstart.
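The groups above form a discovery chain: pick a model, then a compatible model server, then a supported server version, before generating manifests. A minimal sketch of that chain, assuming list subcommands accept filter flags like --model and --model-server (the flag names and the Gemma model ID are hypothetical examples, not taken from this reference):

```shell
# Walk the discovery chain; flag names are illustrative assumptions.
gcloud alpha container ai profiles models list
gcloud alpha container ai profiles model-servers list \
    --model=google/gemma-2-9b-it
gcloud alpha container ai profiles model-server-versions list \
    --model=google/gemma-2-9b-it \
    --model-server=vllm
```

Each step narrows the options passed to the next, ending with inputs suitable for the manifests group.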
NOTES
This command is currently in alpha and might change without notice. If this command fails with API permission errors despite specifying the correct project, you might be trying to access an API with an invitation-only early access allowlist.