This page describes how to use built-in metrics, custom metrics, and alerts to monitor your agents in Vertex AI Agent Engine.
Overview
You can monitor Vertex AI Agent Engine with Cloud Monitoring without any additional setup or configuration. Built-in agent metrics are automatically collected and visualized on Cloud Monitoring pages in the Google Cloud console.
Supported built-in metrics
The following agent metrics are supported and associated with the Vertex AI Agent Engine monitored resource aiplatform.googleapis.com/ReasoningEngine:
- Request count
- Request latencies
- Container CPU allocation time
- Container memory allocation time
Refer to the full list of AI Platform metrics for more details about metric types, units, labels, and latency and sampling periods.
View metrics for an agent
You can view your agent's built-in metrics using the Metrics Explorer:
To get permission to view metrics in Metrics Explorer, ask your administrator to grant you the Monitoring Viewer role (roles/monitoring.viewer) on your project.
Go to Metrics Explorer in the Google Cloud console:
Select your Google Cloud project.
Click Select a metric to open a search bar.
Enter Vertex AI Reasoning Engine in the search bar and click Vertex AI Reasoning Engine.
Click the Reasoning_engine metric category, then click a metric, such as Request count.
Optionally, set additional label filters, configure the aggregation, and adjust the time range.
By default, the chart in the Metrics Explorer for the Request count metric aligns data points with a default time interval and plots them as requests per second (a rate metric).
Query metrics for an agent
You can also query metrics through Monitoring Query Language (MQL), Prometheus Query Language (PromQL), or the Cloud Monitoring v3 API. MQL and PromQL offer more options for metric filtering, aggregation, and transformation, while the Cloud Monitoring API lets you programmatically list and query all raw data points.
Query metrics with MQL or PromQL
You can use MQL or PromQL to align and aggregate data points with a custom time interval and plot transformed data points as the absolute request count (instead of request-per-second):
MQL
fetch aiplatform.googleapis.com/ReasoningEngine
| metric 'aiplatform.googleapis.com/reasoning_engine/request_count'
| filter
(resource.reasoning_engine_id == 'RESOURCE_ID')
&& (metric.response_code == 'RESPONSE_CODE')
| align delta(10m)
| every 10m
PromQL
sum_over_time(
  increase(
    aiplatform_googleapis_com:reasoning_engine_request_count{
      monitored_resource='aiplatform.googleapis.com/ReasoningEngine',
      reasoning_engine_id='RESOURCE_ID',
      response_code='RESPONSE_CODE'
    }
    [10m]
  )
  [10m:10m]
)
You can query the error rate by calculating the ratio of requests that are labeled with certain error response codes (such as 500) to the total number of requests (the percentage of failed requests):
MQL
fetch aiplatform.googleapis.com/ReasoningEngine
| metric 'aiplatform.googleapis.com/reasoning_engine/request_count'
| filter resource.reasoning_engine_id == 'RESOURCE_ID'
| { filter metric.response_code == '500' ; ident }
| align rate(10m)
| every 10m
| group_by [], [value_request_count_aggregate: aggregate(value.request_count)]
| ratio
PromQL
sum_over_time(
  sum(
    rate(
      aiplatform_googleapis_com:reasoning_engine_request_count{
        monitored_resource='aiplatform.googleapis.com/ReasoningEngine',
        reasoning_engine_id='RESOURCE_ID',
        response_code='500'
      }
      [10m]
    )
  )
  [10m:10m]
)
/
sum_over_time(
  sum(
    rate(
      aiplatform_googleapis_com:reasoning_engine_request_count{
        monitored_resource='aiplatform.googleapis.com/ReasoningEngine',
        reasoning_engine_id='RESOURCE_ID'
      }
      [10m]
    )
  )
  [10m:10m]
)
For best practices and restrictions for ratio metrics, see About ratios of metrics. For an example of how to set an alert for the error rate metric, see Sample policies in JSON.
Query metrics with Cloud Monitoring API
You can use the Cloud Monitoring API to do the following:
- Get the Vertex AI Agent Engine monitored resource definition
- List available agent metric definitions
- Query time-series data for request_count
All agent metrics are associated with the Agent Engine monitored resource aiplatform.googleapis.com/ReasoningEngine.
You can invoke these APIs through the APIs Explorer, language-specific client libraries, or the command line. Refer to the documentation for reading metrics through the APIs Explorer and client libraries. The following examples demonstrate usage on the command line, specifically with the curl tool.
Get the Agent Engine monitored resource definition
The following command retrieves the definition of the monitored resource using projects.monitoredResourceDescriptors, as well as all available labels that can be used for filtering:
gcurl https://monitoring.googleapis.com/v3/projects/PROJECT_ID/monitoredResourceDescriptors/aiplatform.googleapis.com/ReasoningEngine
The labels should include resource_container, location, and reasoning_engine_id.
List available agent metric definitions
The following command uses projects.metricDescriptors to retrieve all metrics and label filters for Agent Engine:
gcurl https://monitoring.googleapis.com/v3/projects/PROJECT_ID/metricDescriptors?filter='metric.type=starts_with("aiplatform.googleapis.com/reasoning_engine")'
The result should include the definition for the following metrics as well as their specific labels:
aiplatform.googleapis.com/reasoning_engine/request_count
aiplatform.googleapis.com/reasoning_engine/request_latencies
aiplatform.googleapis.com/reasoning_engine/cpu/allocation_time
aiplatform.googleapis.com/reasoning_engine/memory/allocation_time
Query time-series data for request_count
You can use projects.timeSeries.list along with parameters like interval, filter, and aggregation to query time-series data.
The following example shows how to query the raw data points for the request_count metric for a specific agent instance during a specific time window:
gcurl https://monitoring.googleapis.com/v3/projects/PROJECT_ID/timeSeries?filter='metric.type="aiplatform.googleapis.com/reasoning_engine/request_count"%20AND%20resource.labels.reasoning_engine_id="RESOURCE_ID"&interval.endTime=2025-03-26T11:00:00-08:00&interval.startTime=2025-03-26T10:00:00-08:00'
Replace the following:
- PROJECT_ID: Your Google Cloud project ID.
- RESOURCE_ID: The Agent Engine instance ID. This is not always required; you can query across multiple Agent Engine instances within the same project.
- interval.startTime and interval.endTime: The start (inclusive) and end (exclusive) of the time interval, in RFC 3339 format. For example, "2025-03-26T11:22:33Z" for Coordinated Universal Time (UTC) and "2025-03-26T11:22:33-08:00" for Pacific Standard Time (PST). See RFC 3339 for the complete definition and more examples.
You should receive a response similar to the following:
{
  "timeSeries": [
    {
      "metric": {
        "labels": {
          "response_code": "200",
          "response_code_class": "2xx"
        },
        "type": "aiplatform.googleapis.com/reasoning_engine/request_count"
      },
      "resource": {
        "type": "aiplatform.googleapis.com/ReasoningEngine",
        "labels": {
          "reasoning_engine_id": "RESOURCE_ID",
          "location": "LOCATION",
          "project_id": "PROJECT_ID"
        }
      },
      "metricKind": "DELTA",
      "valueType": "INT64",
      "points": [
        {
          "interval": {
            "startTime": "2025-03-26T18:55:27.001Z",
            "endTime": "2025-03-26T18:56:27Z"
          },
          "value": {
            "int64Value": "25"
          }
        },
        {
          "interval": {
            "startTime": "2025-03-26T18:54:27.001Z",
            "endTime": "2025-03-26T18:55:27Z"
          },
          "value": {
            "int64Value": "36"
          }
        }
        // ... more data points ...
      ]
    }
    // ... potentially more time series with other response codes ...
  ],
  "unit": "1"
}
See projects.timeSeries.list for more details on the response format.
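If you prefer a client library to curl, the following minimal sketch queries the same request_count data points using the Python google-cloud-monitoring library; the project and resource IDs are placeholders, and the one-hour lookback window is an arbitrary choice:
import time

from google.cloud import monitoring_v3

# Placeholders, as in the curl example above.
PROJECT_ID = "PROJECT_ID"
RESOURCE_ID = "RESOURCE_ID"

client = monitoring_v3.MetricServiceClient()

# Query the last hour of request_count data points.
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {
        "start_time": {"seconds": now - 3600},
        "end_time": {"seconds": now},
    }
)

results = client.list_time_series(
    request={
        "name": f"projects/{PROJECT_ID}",
        "filter": (
            'metric.type = "aiplatform.googleapis.com/reasoning_engine/request_count" '
            f'AND resource.labels.reasoning_engine_id = "{RESOURCE_ID}"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)

# Print one line per data point, grouped by response code.
for series in results:
    response_code = series.metric.labels.get("response_code", "")
    for point in series.points:
        print(response_code, point.interval.end_time, point.value.int64_value)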
Create custom metrics for an agent
If the built-in agent metrics don't cover your specific use case, you can define custom metrics. You can create custom metrics using the following methods:
- Log-based metrics: Observe trends and patterns in a large volume of log entries.
- User-defined metrics: Metrics that aren't defined by Google Cloud, such as capturing application-specific data or client-side system data.
Log-based metrics
The following steps demonstrate how to create and use a log-based metric (tool_calling_count) for an example workflow where multiple agents call multiple tools, and you want to count tool invocations:
Configure your tool to write a log entry every time it's called, for example, "tool-<tool-id> invoked by agent-<agent-id>" (a minimal logging sketch follows these steps).
Create a new counter-type log-based metric through the Google Cloud console:
Go to Log-based Metrics page in the Google Cloud console:
In the User-defined metrics section, click Create metric. The Create log-based metric pane appears.
For Metric type, select Counter.
In the Details section, enter the Log-based metric name, for example, tool_calling_count. Optionally, enter the Description and Units.
In the Filter selection section, do the following:
In the Select project or log bucket drop-down list, select Project logs.
In the Build filter field, enter a log filter using the Logging query language, assuming that both tool and agent IDs are numeric. For example:
resource.type="aiplatform.googleapis.com/ReasoningEngine"
resource.labels.reasoning_engine_id="RESOURCE_ID"
textPayload =~ "tool-\d+ invoked by agent-\d+"
For the Labels section, add two new labels by clicking the Add label button.
For the first label, do the following:
In the Label name field, enter tool.
In the Field name field, enter textPayload.
In the Regular expression field, enter (tool-\d+) invoked by agent-\d+.
For the second label, do the following:
In the Label name field, enter agent.
In the Field name field, enter textPayload.
In the Regular expression field, enter tool-\d+ invoked by (agent-\d+).
Click Done.
Click Create metric.
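The following minimal sketch illustrates the first step, a tool that writes the expected log line with standard Python logging; it assumes that your agent's log output reaches Cloud Logging as textPayload, and the tool and agent IDs are hypothetical:
import logging

# Hypothetical IDs; the log line must match the filter and label
# regular expressions configured above (numeric tool and agent IDs).
TOOL_ID = "tool-42"
AGENT_ID = "agent-7"

logging.basicConfig(level=logging.INFO)

def lookup_weather(city: str) -> str:
    # Emit the log entry that the tool_calling_count metric counts.
    logging.info("%s invoked by %s", TOOL_ID, AGENT_ID)
    return f"The weather in {city} is sunny."  # placeholder tool logic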
To view the tool_calling_count metric and its associated logs, do the following:
Go to the Metrics Explorer page in the Google Cloud console:
Click Select a metric to open a search bar.
Enter Vertex AI Reasoning Engine in the search bar and click Vertex AI Reasoning Engine.
Click the Logs-based metrics metric category, then click Logging/user/tool_calling_count. Adjust the time range if necessary.
(Optional) Filter by the labels tool and agent:
To get the total invocation count for a specific tool across all agents, set the filter label tool to the value of that tool ID.
To get the total invocation count for a specific agent across all tools, set the filter label agent to the value of that agent ID.
Optionally, set Sum By to tool or agent to get the total count broken down by different tools or agents.
See Logging an agent for instructions on how to write agent logs, and Log-based metrics overview for more details on log-based metrics.
User-defined metrics
The following steps demonstrate how to create and use a user-defined metric (token_count) for an example workflow where multiple agents call multiple models, and you want to calculate the total count of consumed tokens (assuming that you track the number of tokens since application startup for each invoking agent and target model):
Define the custom metric type by calling projects.metricDescriptors.create with the following parameters:
name: a URL string, such as projects/PROJECT_ID
Request body: a MetricDescriptor object:
{
  "name": "token_count",
  "description": "Tokens consumed by models.",
  "displayName": "Token Count",
  "type": "custom.googleapis.com/token_count",
  "metricKind": "CUMULATIVE",
  "valueType": "INT64",
  "unit": "1",
  "labels": [
    {
      "key": "model",
      "valueType": "STRING",
      "description": "Model."
    },
    {
      "key": "agent",
      "valueType": "STRING",
      "description": "Agent."
    }
  ],
  "monitoredResourceTypes": [
    "generic_node"
  ]
}
The new metric token_count is created with the kind Cumulative, representing the total number of tokens since application startup. See Metric kinds and types for more details about Cumulative metrics. The labels model and agent represent the names of the target large language model (LLM) and the invoking agent.
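If you prefer a client library to calling the REST method directly, the following minimal sketch creates the same metric descriptor with the Python google-cloud-monitoring library (PROJECT_ID is a placeholder):
from google.api import label_pb2, metric_pb2
from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()

# Mirror the MetricDescriptor JSON shown above.
descriptor = metric_pb2.MetricDescriptor(
    type="custom.googleapis.com/token_count",
    metric_kind=metric_pb2.MetricDescriptor.MetricKind.CUMULATIVE,
    value_type=metric_pb2.MetricDescriptor.ValueType.INT64,
    unit="1",
    description="Tokens consumed by models.",
    display_name="Token Count",
    labels=[
        # LabelDescriptor value_type defaults to STRING.
        label_pb2.LabelDescriptor(key="model", description="Model."),
        label_pb2.LabelDescriptor(key="agent", description="Agent."),
    ],
    monitored_resource_types=["generic_node"],
)

descriptor = client.create_metric_descriptor(
    name="projects/PROJECT_ID", metric_descriptor=descriptor
)
print(f"Created {descriptor.name}")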
You can find the token_count metric in the Metrics Explorer:
Go to the Metrics Explorer page in the Google Cloud console:
Click Select a metric to open a search bar.
Enter Generic node in the search bar and click Custom metrics.
Click Token Count.
Write data points to the new metric by calling projects.timeSeries.create with the following parameters:
name: a URL string, such as projects/PROJECT_ID
Request body: a list of TimeSeries objects:
{
  "timeSeries": [
    {
      "metric": {
        "type": "custom.googleapis.com/token_count",
        "labels": {
          "model": "model-1",
          "agent": "agent-1"
        }
      },
      "resource": {
        "type": "generic_node",
        "labels": {
          "project_id": "PROJECT_ID",
          "node_id": "RESOURCE_ID",
          "namespace": "",
          "location": "us-central1"
        }
      },
      "points": [
        {
          "interval": {
            "startTime": "2025-03-26T10:00:00-08:00",
            "endTime": "2025-03-26T10:01:00-08:00"
          },
          "value": {
            "int64Value": 15
          }
        }
      ]
    },
    {
      "metric": {
        "type": "custom.googleapis.com/token_count",
        "labels": {
          "model": "model-1",
          "agent": "agent-2"
        }
      },
      "resource": {
        "type": "generic_node",
        "labels": {
          "project_id": "PROJECT_ID",
          "node_id": "RESOURCE_ID",
          "namespace": "",
          "location": "us-central1"
        }
      },
      "points": [
        {
          "interval": {
            "startTime": "2025-03-26T10:00:00-08:00",
            "endTime": "2025-03-26T10:01:00-08:00"
          },
          "value": {
            "int64Value": 20
          }
        }
      ]
    }
    // ... more time series ...
  ]
}
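The following minimal sketch writes the equivalent data point for one model-agent pair with the Python client library; the IDs and the token count are placeholders:
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()

# One series per (model, agent) pair; values are cumulative token counts.
series = monitoring_v3.TimeSeries()
series.metric.type = "custom.googleapis.com/token_count"
series.metric.labels["model"] = "model-1"
series.metric.labels["agent"] = "agent-1"
series.resource.type = "generic_node"
series.resource.labels["project_id"] = "PROJECT_ID"
series.resource.labels["node_id"] = "RESOURCE_ID"
series.resource.labels["namespace"] = ""
series.resource.labels["location"] = "us-central1"

# For a CUMULATIVE metric, startTime is when counting began
# (application startup) and endTime is the measurement time.
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {
        "start_time": {"seconds": now - 60},
        "end_time": {"seconds": now},
    }
)
point = monitoring_v3.Point({"interval": interval, "value": {"int64_value": 15}})
series.points = [point]

client.create_time_series(name="projects/PROJECT_ID", time_series=[series])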
Once the data points are uploaded through the Cloud Monitoring API, you can view the new metric token_count through the Google Cloud console:
Go to the Metrics Explorer page in the Google Cloud console:
Click Select a metric to open a search bar.
Enter Generic node in the search bar and click Custom metrics.
Click Token Count. Adjust the time range and configure label values for model or agent if necessary.
Configure alerts
You can use metrics in combination with alerts. See Alerting overview for more details.
The following example demonstrates how to create a threshold alert for the request_latencies metric so that you receive notifications when the latency crosses a predefined value for a specified duration:
Go to Alerting page in the Google Cloud console:
Click Create Policy. The Create alerting policy page opens.
For Policy configuration mode, select Builder.
In the Select a metric drop-down menu, select Vertex AI Reasoning Engine -> reasoning_engine -> Request Latency.
In the Add filters section, optionally configure filters (such as reasoning_engine_id or response_code).
In the Transform data section, set Rolling window and Rolling window function to values such as 5min and 99th percentile to monitor the 99th percentile of the request latency over a 5-minute alignment period.
Click Next.
In the Configure alert trigger section, do the following:
For Condition Types, select Threshold.
Select an Alert trigger, such as Any time series violates.
Select a Threshold position, such as Above threshold.
Enter a threshold value, such as 5000ms.
Click Next.
In the Configure notifications and finalize alert section, do the following:
Select one or more notification channels. See Manage notification channels for more details.
(Optional) Configure the notification subject, incident auto-close duration, application labels, policy labels, severity level, and additional documentation.
In the Name the alert policy section, set the policy name, such as latency-99p-alert.
Click Create policy.
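If you want to manage the same policy programmatically, the following minimal sketch creates a comparable alerting policy with the Python client library; the filter, window, and threshold mirror the console steps above, and the notification channel ID is a placeholder:
from google.cloud import monitoring_v3
from google.protobuf import duration_pb2

client = monitoring_v3.AlertPolicyServiceClient()

# 99th percentile of request latency over a 5-minute window, above 5000 ms.
condition = monitoring_v3.AlertPolicy.Condition(
    display_name="p99 request latency above 5000 ms",
    condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
        filter=(
            'resource.type = "aiplatform.googleapis.com/ReasoningEngine" AND '
            'metric.type = "aiplatform.googleapis.com/reasoning_engine/request_latencies"'
        ),
        aggregations=[
            monitoring_v3.Aggregation(
                alignment_period=duration_pb2.Duration(seconds=300),
                per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_PERCENTILE_99,
            )
        ],
        comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
        threshold_value=5000,
        duration=duration_pb2.Duration(seconds=0),
    ),
)

policy = monitoring_v3.AlertPolicy(
    display_name="latency-99p-alert",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[condition],
    # Placeholder channel; list yours with the NotificationChannelService.
    notification_channels=["projects/PROJECT_ID/notificationChannels/CHANNEL_ID"],
)

created = client.create_alert_policy(
    name="projects/PROJECT_ID", alert_policy=policy
)
print(f"Created {created.name}")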
In the event of an incident, see Incidents for metric-based alerting policies for more information on acknowledging and investigating the incident and muting the alert.
You can find more alert examples in Sample policies in JSON.