NAME
    gcloud container ai profiles list - list compatible accelerator profiles

SYNOPSIS
    gcloud container ai profiles list [--format=FORMAT] [--model=MODEL]
        [--model-server=MODEL_SERVER]
        [--model-server-version=MODEL_SERVER_VERSION]
        [--pricing-model=PRICING_MODEL]
        [--target-cost-per-million-input-tokens=TARGET_COST_PER_MILLION_INPUT_TOKENS]
        [--target-cost-per-million-output-tokens=TARGET_COST_PER_MILLION_OUTPUT_TOKENS]
        [--target-ntpot-milliseconds=TARGET_NTPOT_MILLISECONDS]
        [--target-ttft-milliseconds=TARGET_TTFT_MILLISECONDS]
        [--filter=EXPRESSION] [--limit=LIMIT] [--page-size=PAGE_SIZE]
        [--sort-by=[FIELD,…]] [--uri] [GCLOUD_WIDE_FLAG …]

DESCRIPTION
    This command lists all supported accelerators with their performance
    details. By default, the supported accelerators are displayed in a table
    format with select information for each accelerator. To see all details,
    use --format=yaml.
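
As a minimal sketch of the two output modes described above, the snippet below builds the default table invocation and the --format=yaml variant. The list_profiles helper is a hypothetical wrapper that is not part of gcloud; it only prints the command so it can be reviewed before anything is run.

```shell
# Hypothetical dry-run helper: prints the command instead of executing it.
# Remove `echo` to run the command against your own project.
list_profiles() {
  echo gcloud container ai profiles list "$@"
}

# Default output: a table with select information for each accelerator.
list_profiles

# Full details for every returned profile.
list_profiles --format=yaml
```

Printing first is a deliberate choice here: the real command needs an authenticated gcloud installation, which this sketch does not assume.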

    To list the supported models, model servers, and model server versions, run:
        gcloud container ai profiles models list
        gcloud container ai profiles model-servers list
        gcloud container ai profiles model-server-versions list

FLAGS
    --format=FORMAT
        The output format. Options are csvprofile, profile, and yaml. The
        default is profile, which displays the profile information in a table
        format, including cost conversions. csvprofile displays the profile
        information in CSV format.

    --model=MODEL
        The model.

    --model-server=MODEL_SERVER
        The model server. If not specified, this defaults to any model server.

    --model-server-version=MODEL_SERVER_VERSION
        The model server version. If not specified, this defaults to the
        latest version.

    --pricing-model=PRICING_MODEL
        The pricing model used to calculate token cost. Currently, the
        supported values are on-demand, spot, 3-years-cud, and 1-year-cud.

    --target-cost-per-million-input-tokens=TARGET_COST_PER_MILLION_INPUT_TOKENS
        The target cost per million input tokens to filter profiles by,
        specified in USD with up to 5 decimal places.

    --target-cost-per-million-output-tokens=TARGET_COST_PER_MILLION_OUTPUT_TOKENS
        The target cost per million output tokens to filter profiles by,
        specified in USD with up to 5 decimal places.

    --target-ntpot-milliseconds=TARGET_NTPOT_MILLISECONDS
        The target normalized time per output token (NTPOT) in milliseconds.
        NTPOT is measured as request_latency / output_tokens. If this flag is
        set, the command returns only accelerators that can meet the target
        NTPOT and displays their throughput performance at the target latency.
        Otherwise, the command returns all accelerators and displays their
        highest throughput performance.

    --target-ttft-milliseconds=TARGET_TTFT_MILLISECONDS
        The target time to first token (TTFT) in milliseconds. TTFT is the
        latency between sending a request and receiving the first output
        token. If this flag is set, the command returns only profiles that can
        meet the target TTFT and displays their throughput performance at the
        target latency. Otherwise, the command returns all profiles and
        displays their highest throughput performance.
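
The latency and cost flags above can be combined in one invocation. The sketch below first works through the NTPOT formula from the --target-ntpot-milliseconds description using a hypothetical measurement (a 2000 ms request that produced 100 output tokens), then builds a command that uses the result as a target. The list_profiles helper, the MODEL placeholder, and every number are assumptions for illustration; take real model names from gcloud container ai profiles models list.

```shell
# Hypothetical dry-run helper: prints the command instead of executing it.
# Remove `echo` to run the command against your own project.
list_profiles() {
  echo gcloud container ai profiles list "$@"
}

# Hypothetical measurement: one request took 2000 ms and produced 100 tokens.
request_latency_ms=2000
output_tokens=100

# NTPOT = request_latency / output_tokens (the division is exact here).
ntpot_ms=$((request_latency_ms / output_tokens))
echo "NTPOT: ${ntpot_ms} ms per output token"

# Keep only profiles that meet the NTPOT target and the output-token cost
# target (in USD), priced with the spot pricing model.
# MODEL is a placeholder, not a real model name.
list_profiles \
  --model="MODEL" \
  --pricing-model=spot \
  --target-ntpot-milliseconds="$ntpot_ms" \
  --target-cost-per-million-output-tokens=1.5
```

With a latency target set, the reported throughput is the throughput at that latency rather than the highest achievable throughput, so results with and without the target are not directly comparable.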

LIST COMMAND FLAGS
    --filter=EXPRESSION
        Apply a Boolean filter EXPRESSION to each resource item to be listed.
        If the expression evaluates True, then that item is listed. For more
        details and examples of filter expressions, run $ gcloud topic filters.
        This flag interacts with other flags that are applied in this order:
        --flatten, --sort-by, --filter, --limit.

    --limit=LIMIT
        Maximum number of resources to list. The default is unlimited. This
        flag interacts with other flags that are applied in this order:
        --flatten, --sort-by, --filter, --limit.

    --page-size=PAGE_SIZE
        Some services group resource list output into pages. This flag
        specifies the maximum number of resources per page. The default is
        determined by the service if it supports paging, otherwise it is
        unlimited (no paging). Paging may be applied before or after --filter
        and --limit depending on the service.

    --sort-by=[FIELD,…]
        Comma-separated list of resource field key names to sort by. The
        default order is ascending. Prefix a field with "~" for descending
        order on that field. This flag interacts with other flags that are
        applied in this order: --flatten, --sort-by, --filter, --limit.

    --uri
        Print a list of resource URIs instead of the default output, and
        change the command output to a list of URIs. If this flag is used with
        --format, the formatting is applied on this URI list. To display URIs
        alongside other keys instead, use the uri() transform.
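
Because --sort-by is applied before --limit, the two flags together give a "top N" listing. The sketch below illustrates that ordering. The list_profiles helper is hypothetical, and FIELD_NAME is a placeholder rather than a real field key; one way to find real keys is to inspect a single profile with --format=yaml first.

```shell
# Hypothetical dry-run helper: prints the command instead of executing it.
# Remove `echo` to run the command against your own project.
list_profiles() {
  echo gcloud container ai profiles list "$@"
}

# Step 1: show one profile in full to discover the available field keys.
list_profiles --limit=1 --format=yaml

# Step 2: sort descending ("~" prefix) on a chosen field, then keep three.
# FIELD_NAME is a placeholder; replace it with a key seen in step 1.
sort_field="FIELD_NAME"
list_profiles --sort-by="~${sort_field}" --limit=3
```

The quotes around the --sort-by value keep the shell from treating the leading "~" as a home-directory shortcut.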

GCLOUD WIDE FLAGS
    These flags are available to all commands: --access-token-file,
    --account, --billing-project, --configuration, --flags-file, --flatten,
    --format, --help, --impersonate-service-account, --log-http, --project,
    --quiet, --trace-token, --user-output-enabled, --verbosity.

    Run $ gcloud help for details.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-08-13 UTC.