Generative AI on Vertex AI provides a prebuilt model observability dashboard to
view the behavior, health, and performance of fully-managed models.
Fully-managed models, also known as Model as a Service (MaaS), are provided by
Google and include Google's Gemini models and partner
models with managed endpoints. Metrics from self-hosted models aren't included
in the dashboard.

Generative AI on Vertex AI automatically collects and reports activity from MaaS
models to help you quickly troubleshoot latency issues and monitor capacity.

Available monitoring metrics

The model observability dashboard displays a subset of metrics that are
collected by Cloud Monitoring, such as model requests per second (QPS), token
throughput, and first token latencies. View the dashboard to see all
the available metrics.

Use case

As an application developer, you can view how your users are interacting with
the models that you've exposed. For example, you can view how model usage (model
requests per second) and the compute intensity of user prompts (model invocation
latencies) are trending over time. Because these metrics are related to model
usage, you can also estimate the cost of running each model.

When an issue arises, you can quickly troubleshoot from the dashboard. You can
check if models are responding reliably and in a timely manner by viewing API
error rates, first token latencies, and token throughput.

Limitations

Vertex AI captures dashboard metrics only for API calls to a
model's endpoint. Metrics from Google Cloud console usage, such as
Vertex AI Studio, aren't added to the dashboard.

View the dashboard

1. In the Vertex AI section of the Google Cloud console, go to the
   Dashboard page.
2. In the Model observability section, click Show all metrics to view
   the model observability dashboard in the Google Cloud Observability console.
3. To view metrics for a specific model or in a particular location, set one or
   more filters at the top of the dashboard page.

Additional resources

For descriptions of each metric, see the "aiplatform" section on the
Google Cloud metrics page.
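
To make the dashboard metrics more concrete, the following minimal sketch shows how request rate, API error rate, token throughput, and first token latency relate to individual model calls. It is an illustration only and is not how Vertex AI or Cloud Monitoring computes these values. The `RequestRecord` type, its fields, the `summarize` function, and the sample numbers are all hypothetical, and the choice to count only successful calls toward throughput and latency is an assumption of this sketch.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class RequestRecord:
    """One hypothetical model API call, as an application might log it."""
    start_s: float          # request start, in seconds from the window start
    first_token_ms: float   # time until the first response token arrived
    output_tokens: int      # tokens returned by the model
    ok: bool                # False if the API call returned an error


def summarize(records: list[RequestRecord], window_s: float) -> dict[str, float]:
    """Aggregate per-request records into rates for one time window."""
    if not records or window_s <= 0:
        raise ValueError("need at least one record and a positive window")
    total = len(records)
    succeeded = [r for r in records if r.ok]
    return {
        # Model usage: all calls, successful or not, divided by window length.
        "requests_per_second": total / window_s,
        # Reliability: share of calls that returned an error.
        "error_rate": (total - len(succeeded)) / total,
        # Throughput: tokens from successful calls per second (assumption).
        "tokens_per_second": sum(r.output_tokens for r in succeeded) / window_s,
        # Responsiveness: median first token latency of successful calls.
        "median_first_token_ms": (
            median(r.first_token_ms for r in succeeded)
            if succeeded else float("nan")
        ),
    }


# Hypothetical sample: four calls observed in a 10-second window.
sample = [
    RequestRecord(0.5, 120.0, 200, True),
    RequestRecord(3.0, 180.0, 400, True),
    RequestRecord(6.2, 150.0, 0, False),
    RequestRecord(9.1, 240.0, 600, True),
]
summary = summarize(sample, window_s=10.0)
print(summary)
```

Tracking values like these over successive windows mirrors the trend view that the dashboard provides: a rising error rate or first token latency points to a reliability issue, while request rate and token throughput indicate usage.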
Last updated 2025-08-15 UTC.