Monitor an agent

This page describes how to use built-in metrics, custom metrics, and alerts to monitor your agents in Vertex AI Agent Engine.

Overview

You can monitor Vertex AI Agent Engine with Cloud Monitoring without any additional setup or configuration. Built-in agent metrics are automatically collected and visualized in Cloud Monitoring pages in the Google Cloud console.

Supported built-in metrics

The following agent metrics are supported and associated with the Vertex AI Agent Engine monitored resource aiplatform.googleapis.com/ReasoningEngine:

  • Request count
  • Request latencies
  • Container CPU allocation time
  • Container memory allocation time

Refer to the full list of AI Platform metrics for more details about metric types, units, labels, and the latency and sampling periods.

View metrics for an agent

You can view your agent's built-in metrics using Metrics Explorer:

  1. To get permission to view metrics in Metrics Explorer, ask your administrator to grant you the Monitoring Viewer role (roles/monitoring.viewer) on your project.

  2. Go to Metrics Explorer in the Google Cloud console:

    Go to Metrics Explorer

  3. Select your Google Cloud project.

  4. Click Select a metric to open a search bar.

  5. Enter Vertex AI Reasoning Engine in the search bar and click Vertex AI Reasoning Engine.

  6. Click the Reasoning_engine metric category, then click a metric, such as Request count.

  7. Optionally, set additional label filters, configure the aggregation, and adjust the time range.

By default, the charts in Metrics Explorer for the Request count metric align data points over a default time interval and plot the data points as requests per second (a rate metric).

Query metrics for an agent

You can also query metrics through Monitoring Query Language (MQL), Prometheus Query Language (PromQL), or the Cloud Monitoring v3 API. MQL and PromQL offer more options for filtering, aggregating, and transforming metrics, while the Cloud Monitoring API lets you programmatically list and query all raw data points.

Query metrics with MQL or PromQL

You can use MQL or PromQL to align and aggregate data points over a custom time interval and plot the transformed data points as an absolute request count (instead of requests per second):

MQL

fetch aiplatform.googleapis.com/ReasoningEngine
  | metric 'aiplatform.googleapis.com/reasoning_engine/request_count'
  | filter
      (resource.reasoning_engine_id == 'RESOURCE_ID')
      && (metric.response_code == 'RESPONSE_CODE')
  | align delta(10m)
  | every 10m

PromQL

sum_over_time(
  increase(
      aiplatform_googleapis_com:reasoning_engine_request_count{
          monitored_resource='aiplatform.googleapis.com/ReasoningEngine',
          reasoning_engine_id='RESOURCE_ID',
          response_code='RESPONSE_CODE'
      }
      [10m]
  )
  [10m:10m]
)

You can query the error rate by calculating the ratio of requests labeled with certain error response codes (such as 500) to the total number of requests (the percentage of failed requests):

MQL

fetch aiplatform.googleapis.com/ReasoningEngine
  | metric 'aiplatform.googleapis.com/reasoning_engine/request_count'
  | filter resource.reasoning_engine_id == 'RESOURCE_ID'
  | { filter metric.response_code == '500' ; ident }
  | align rate(10m)
  | every 10m
  | group_by [], [value_request_count_aggregate: aggregate(value.request_count)]
  | ratio

PromQL

sum_over_time(
  sum(
    rate(
      aiplatform_googleapis_com:reasoning_engine_request_count{
        monitored_resource='aiplatform.googleapis.com/ReasoningEngine',
        reasoning_engine_id='RESOURCE_ID',
        response_code='500'
      }
      [10m]
    )
  )
  [10m:10m]
)
/
sum_over_time(
  sum(
    rate(
      aiplatform_googleapis_com:reasoning_engine_request_count{
        monitored_resource='aiplatform.googleapis.com/ReasoningEngine',
        reasoning_engine_id='RESOURCE_ID',
      }
      [10m]
    )
  )
  [10m:10m]
)

For best practices and restrictions for ratio metrics, see About ratios of metrics. For an example of how to set an alert for the error rate metric, see Sample policies in JSON.

Query metrics with the Cloud Monitoring API

You can use the Cloud Monitoring API to do the following:

  • Get the Vertex AI Agent Engine monitored resource definition

  • List available agent metric definitions

  • Query time-series data for request_count

All agent metrics are associated with the Agent Engine monitored resource aiplatform.googleapis.com/ReasoningEngine.

You can invoke these APIs through APIs Explorer, language-specific client libraries, or the command line. Refer to the documentation for reading metrics through APIs Explorer and client libraries. The following examples demonstrate usage on the command line with the curl tool, where gcurl is an alias for curl that attaches an OAuth 2.0 access token, as described in the Cloud Monitoring API documentation.

Get the Agent Engine monitored resource definition

The following command retrieves the definition of the monitored resource using the projects.monitoredResourceDescriptors method, including all available labels that can be used for filtering:

gcurl https://monitoring.googleapis.com/v3/projects/PROJECT_ID/monitoredResourceDescriptors/aiplatform.googleapis.com/ReasoningEngine

The labels should include resource_container, location, and reasoning_engine_id.
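
If you prefer a client library to curl, the following minimal Python sketch, assuming the google-cloud-monitoring package is installed (PROJECT_ID is a placeholder), retrieves the same descriptor:

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()

# Monitored-resource descriptor names use the format:
# projects/PROJECT_ID/monitoredResourceDescriptors/RESOURCE_TYPE
name = (
    "projects/PROJECT_ID/monitoredResourceDescriptors/"
    "aiplatform.googleapis.com/ReasoningEngine"
)

descriptor = client.get_monitored_resource_descriptor(name=name)
print(descriptor.type)
for label in descriptor.labels:
    print(f"{label.key}: {label.description}")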

List available agent metric definitions

The following command uses the projects.metricDescriptors method to retrieve all metric definitions and their label filters for Agent Engine:

gcurl https://monitoring.googleapis.com/v3/projects/PROJECT_ID/metricDescriptors?filter='metric.type=starts_with("aiplatform.googleapis.com/reasoning_engine")'

The result should include the definitions for the following metrics, as well as their specific labels:

  • aiplatform.googleapis.com/reasoning_engine/request_count
  • aiplatform.googleapis.com/reasoning_engine/request_latencies
  • aiplatform.googleapis.com/reasoning_engine/cpu/allocation_time
  • aiplatform.googleapis.com/reasoning_engine/memory/allocation_time
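
The same listing is available through the Python client library. A minimal sketch, under the same assumptions as above:

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()

# List all metric descriptors whose type starts with the
# reasoning_engine prefix, mirroring the curl example above.
results = client.list_metric_descriptors(
    request={
        "name": "projects/PROJECT_ID",
        "filter": 'metric.type = starts_with("aiplatform.googleapis.com/reasoning_engine")',
    }
)
for descriptor in results:
    print(descriptor.type, descriptor.metric_kind, descriptor.value_type)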

Query time-series data for request_count

You can use projects.timeSeries.list along with parameters like interval, filter, and aggregation to query time-series data.

The following example shows how to query the raw data points for the request_count metric for a specific agent instance during a specific time window:

gcurl https://monitoring.googleapis.com/v3/projects/PROJECT_ID/timeSeries?filter='metric.type="aiplatform.googleapis.com/reasoning_engine/request_count"%20AND%20resource.labels.reasoning_engine_id="RESOURCE_ID"&interval.endTime=2025-03-26T11:00:00-08:00&interval.startTime=2025-03-26T10:00:00-08:00'

Replace the following:

  • PROJECT_ID: Your Google Cloud project ID.
  • RESOURCE_ID: The Agent Engine instance ID. This filter is optional; you can omit it to query across multiple Agent Engine instances within the same project.
  • interval.startTime and interval.endTime: The start (inclusive) and end (exclusive) of the time interval, in RFC 3339 format. For example, "2025-03-26T11:22:33Z" for Coordinated Universal Time (UTC) and "2025-03-26T11:22:33-08:00" for Pacific Standard Time (PST). See the complete definition and more examples in RFC 3339.

You should receive a response similar to the following:

{
  "timeSeries": [
    {
      "metric": {
        "labels": {
          "response_code": "200",
          "response_code_class": "2xx"
        },
        "type": "aiplatform.googleapis.com/reasoning_engine/request_count"
      },
      "resource": {
        "type": "aiplatform.googleapis.com/ReasoningEngine",
        "labels": {
          "reasoning_engine_id": "RESOURCE_ID",
          "location": "LOCATION",
          "project_id": "PROJECT_ID"
        }
      },
      "metricKind": "DELTA",
      "valueType": "INT64",
      "points": [
        {
          "interval": {
            "startTime": "2025-03-26T18:55:27.001Z",
            "endTime": "2025-03-26T18:56:27Z"
          },
          "value": {
            "int64Value": "25"
          }
        },
        {
          "interval": {
            "startTime": "2025-03-26T18:54:27.001Z",
            "endTime": "2025-03-26T18:55:27Z"
          },
          "value": {
            "int64Value": "36"
          }
        }
        // ... more data points ...
      ]
    }
    // ... potentially more time series with other response codes ...
  ],
  "unit": "1"
}

See projects.timeSeries.list for more details on the response format.
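
The same query can be issued through the Python client library. A minimal sketch that reads the last hour of raw data points (PROJECT_ID and RESOURCE_ID are placeholders):

import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient()

now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {
        "start_time": {"seconds": now - 3600},
        "end_time": {"seconds": now},
    }
)

results = client.list_time_series(
    request={
        "name": "projects/PROJECT_ID",
        "filter": (
            'metric.type="aiplatform.googleapis.com/reasoning_engine/request_count" '
            'AND resource.labels.reasoning_engine_id="RESOURCE_ID"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    for point in series.points:
        print(point.interval.end_time, point.value.int64_value)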

Create custom metrics for an agent

If the built-in agent metrics don't cover your specific use case, you can define custom metrics. You can create custom metrics using the following methods:

  • Log-based metrics: Observe trends and patterns in a large volume of log entries.

  • User-defined metrics: Metrics that aren't defined by Google Cloud, such as metrics that capture application-specific data or client-side system data.

Log-based metrics

The following steps demonstrate how to create and use a log-based metric (tool_calling_count) for an example workflow where multiple agents call multiple tools, and you want to count tool invocations:

  1. Configure your tool to write a log entry every time it's called, for example, "tool-<tool-id> invoked by agent-<agent-id>". A minimal sketch follows these steps.

  2. Create a new counter-type log-based metric through the Google Cloud console:

    1. Go to the Log-based Metrics page in the Google Cloud console:

      Go to Log-based Metrics

    2. In the User-defined metrics section, click Create metric. The Create log-based metric pane appears.

    3. For Metric type, select Counter.

    4. In the Details section, enter the Log-based metric name, for example, tool_calling_count. Optionally, enter a Description and Units.

    5. In the Filter selection section, do the following:

      1. In the Select project or log bucket drop-down list, select Project logs.

      2. In the Build filter field, enter the log filter using the Logging query language. For example, assuming both tool and agent IDs are numeric:

        resource.type="aiplatform.googleapis.com/ReasoningEngine"
        resource.labels.reasoning_engine_id="RESOURCE_ID"
        textPayload =~ "tool-\d+ invoked by agent-\d+"
        
    6. In the Labels section, add two new labels by clicking the Add label button.

      1. For the first label, do the following:

        1. In the Label name field, enter tool.

        2. In the Field name field, enter textPayload.

        3. In the Regular expression field, enter (tool-\d+) invoked by agent-\d+.

      2. For the second label, do the following:

        1. In the Label name field, enter agent.

        2. In the Field name field, enter textPayload.

        3. In the Regular expression field, enter tool-\d+ invoked by (agent-\d+).

      3. Click Done.
    7. Click Create metric.

  3. To view the tool_calling_count metric and its associated logs, do the following in the Google Cloud console:

    1. Go to the Metrics Explorer page in the Google Cloud console:

      Go to Metrics Explorer

    2. Click Select a metric to open a search bar.

    3. Enter Vertex AI Reasoning Engine in the search bar and click Vertex AI Reasoning Engine.

    4. Click the Logs-based metrics metric category, then click Logging/user/tool_calling_count. Adjust the time range if necessary.

    5. (Optional) Filter by the labels tool and agent.

      • To get the total invocation count for a specific tool across all agents, set the filter label tool to the value of that tool ID.

      • To get the total invocation count for a specific agent across all tools, set the filter label agent to the value of that agent ID.

      Optionally, set Sum by to tool or agent to get the total count broken down by tool or agent.
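
A minimal Python sketch for step 1, assuming numeric tool and agent IDs and that your agent's logs are routed to Cloud Logging (see Logging an agent); record_tool_call is a hypothetical helper name:

import logging

# Hypothetical helper: call this inside your tool so that each
# invocation emits a log line matching the tool_calling_count filter.
def record_tool_call(tool_id: int, agent_id: int) -> None:
    logging.info("tool-%d invoked by agent-%d", tool_id, agent_id)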

See Logging an agent for instructions on how to write agent logs, and Log-based metrics overview for more details on log-based metrics.

User-defined metrics

The following steps demonstrate how to create and use a user-defined metric (token_count) for an example workflow where multiple agents call multiple models and you want to calculate the total number of consumed tokens. The example assumes that you track, for each invoking agent and target model, the number of tokens consumed since application startup:

  1. Define the custom metric type by calling projects.metricDescriptors.create with the following parameters:

    • name: a URL string, such as projects/PROJECT_ID

    • Request body: a MetricDescriptor object:

      {
        "name": "token_count",
        "description": "Token Consumed by models.",
        "displayName": "Token Count",
        "type": "custom.googleapis.com/token_count",
        "metricKind": "CUMULATIVE",
        "valueType": "INT64",
        "unit": "1",
        "labels": [
          {
            "key": "model",
            "valueType": "STRING",
            "description": "Model."
          },
          {
            "key": "agent",
            "valueType": "STRING",
            "description": "Agent."
          }
        ],
        "monitoredResourceTypes": [
          "generic_node"
        ]
      }
      

      The new token_count metric is created with the CUMULATIVE kind, representing the total number of tokens since application startup. See Metric kinds and types for more details about cumulative metrics. The labels model and agent represent the names of the target large language model (LLM) and the invoking agent. For a client-library version of this call, see the sketch after the following Metrics Explorer steps.

    You can find the token_count metric in the Metrics Explorer:

      1. Go to the Metrics Explorer page in the Google Cloud console:

        Go to Metrics Explorer

      2. Click Select a metric to open a search bar.

      3. Enter Generic node in the search bar and click Custom metrics.

      4. Click Token Count.
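
    Step 1 can also be performed with the client library. A minimal Python sketch, assuming the google-cloud-monitoring package is installed (PROJECT_ID is a placeholder):

      from google.api import label_pb2 as ga_label
      from google.api import metric_pb2 as ga_metric
      from google.cloud import monitoring_v3

      client = monitoring_v3.MetricServiceClient()

      descriptor = ga_metric.MetricDescriptor()
      descriptor.type = "custom.googleapis.com/token_count"
      descriptor.display_name = "Token Count"
      descriptor.description = "Tokens consumed by models."
      descriptor.metric_kind = ga_metric.MetricDescriptor.MetricKind.CUMULATIVE
      descriptor.value_type = ga_metric.MetricDescriptor.ValueType.INT64
      descriptor.unit = "1"

      # The model and agent labels distinguish the target LLM and the
      # invoking agent, matching the request body above.
      for key, description in (("model", "Model."), ("agent", "Agent.")):
          label = ga_label.LabelDescriptor()
          label.key = key
          label.value_type = ga_label.LabelDescriptor.ValueType.STRING
          label.description = description
          descriptor.labels.append(label)

      descriptor = client.create_metric_descriptor(
          name="projects/PROJECT_ID", metric_descriptor=descriptor
      )
      print(descriptor.name)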

  2. Write data points to the new metric by calling projects.timeSeries.create with the following parameters (a client-library sketch follows the request body):

    • name: a URL string, such as projects/PROJECT_ID

    • Request body: a list of TimeSeries objects:

      {
        "timeSeries": [
          {
            "metric": {
              "type": "custom.googleapis.com/token_count",
              "labels": {
                "model": "model-1",
                "agent": "agent-1"
              }
            },
            "resource": {
              "type": "generic_node",
              "labels": {
                "project_id": "PROJECT_ID",
                "node_id": "RESOURCE_ID",
                "namespace": "",
                "location": "us-central1"
              }
            },
            "points": [
              {
                "interval": {
                  "startTime": "2025-03-26T10:00:00-08:00",
                  "endTime": "2025-03-26T10:01:00-08:00"
                },
                "value": {
                  "int64Value": 15
                }
              }
            ]
          },
          {
            "metric": {
              "type": "custom.googleapis.com/token_count",
              "labels": {
                "model": "model-1",
                "agent": "agent-2"
              }
            },
            "resource": {
              "type": "generic_node",
              "labels": {
                "project_id": "PROJECT_ID",
                "node_id": "RESOURCE_ID",
                "namespace": "",
                "location": "us-central1"
              }
            },
            "points": [
              {
                "interval": {
                  "startTime": "2025-03-26T10:00:00-08:00",
                  "endTime": "2025-03-26T10:01:00-08:00"
                },
                "value": {
                  "int64Value": 20
                }
              }
            ]
          }
          // ... more time series ...
        ]
      }
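
    Step 2 can likewise be done with the client library. A minimal Python sketch that writes the first of the two time series above (values and labels are placeholders):

      import time

      from google.cloud import monitoring_v3

      client = monitoring_v3.MetricServiceClient()

      series = monitoring_v3.TimeSeries()
      series.metric.type = "custom.googleapis.com/token_count"
      series.metric.labels["model"] = "model-1"
      series.metric.labels["agent"] = "agent-1"
      series.resource.type = "generic_node"
      series.resource.labels["project_id"] = "PROJECT_ID"
      series.resource.labels["node_id"] = "RESOURCE_ID"
      series.resource.labels["namespace"] = ""
      series.resource.labels["location"] = "us-central1"

      # For a CUMULATIVE metric, keep start_time fixed at the moment the
      # counter began and advance end_time with each write.
      now = int(time.time())
      interval = monitoring_v3.TimeInterval(
          {
              "start_time": {"seconds": now - 60},
              "end_time": {"seconds": now},
          }
      )
      point = monitoring_v3.Point(
          {"interval": interval, "value": {"int64_value": 15}}
      )
      series.points = [point]

      client.create_time_series(name="projects/PROJECT_ID", time_series=[series])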
      
  3. After the data points are uploaded through the Cloud Monitoring API, you can view the new token_count metric in the Google Cloud console:

    1. Go to the Metrics Explorer page in the Google Cloud console:

      Go to Metrics Explorer

    2. Click Select a metric to open a search bar.

    3. Enter Generic node in the search bar and click Custom metrics.

    4. Click Token Count. Adjust the time range and configure label values for model or agent if necessary.

Configure alerts

You can use metrics in combination with alerts. See Alerting overview for more details.

The following example demonstrates how to create a threshold alert for the request_latencies metric so that you receive notifications when the latency crosses a predefined value for a specified duration. A client-library sketch appears at the end of this section:

  1. Go to the Alerting page in the Google Cloud console:

    Go to Alerting

  2. Click Create Policy. The Create alerting policy page opens.

    1. For Policy configuration mode, select Builder.

    2. In the Select a metric drop-down menu, select Vertex AI Reasoning Engine -> reasoning_engine -> Request Latency.

    3. In the Add filters section, optionally configure filters (such as reasoning_engine_id or response_code).

    4. In the Transform data section, set Rolling window and Rolling window function to values such as 5min and 99th percentile, to monitor the 99th percentile of the request latency over a 5-minute alignment period.

    5. Click Next.

  3. In the Configure alert trigger section, do the following:

    1. For Condition type, select Threshold.

    2. Select an Alert trigger, such as Any time series violates.

    3. Select a Threshold position, such as Above threshold.

    4. Enter a threshold value, such as 5000ms.

    5. Click Next.

  4. In the Configure notifications and finalize alert section, do the following:

    1. Select one or more notification channels. See Manage notification channels for more details.

    2. (Optional) Configure the notification subject, incident auto-close duration, application labels, policy labels, severity level, and additional documentation.

    3. Set the policy name in the Name the alert policy section, such as latency-99p-alert.

    4. Click Create policy.

If an incident occurs, see Incidents for metric-based alerting policies for more information on acknowledging and investigating the incident and muting the alert.

You can find more alert examples in Sample policies in JSON.
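
The same alert policy can be created programmatically. A minimal Python sketch using the google-cloud-monitoring client library, mirroring the console steps above (PROJECT_ID and the notification channel are placeholders):

from google.cloud import monitoring_v3

client = monitoring_v3.AlertPolicyServiceClient()

# Alert when the 99th-percentile request latency over a 5-minute
# window exceeds 5000 ms (request_latencies is reported in ms).
condition = monitoring_v3.AlertPolicy.Condition(
    display_name="p99 request latency above 5000ms",
    condition_threshold=monitoring_v3.AlertPolicy.Condition.MetricThreshold(
        filter=(
            'metric.type="aiplatform.googleapis.com/reasoning_engine/request_latencies" '
            'AND resource.type="aiplatform.googleapis.com/ReasoningEngine"'
        ),
        comparison=monitoring_v3.ComparisonType.COMPARISON_GT,
        threshold_value=5000,
        duration={"seconds": 0},
        aggregations=[
            monitoring_v3.Aggregation(
                alignment_period={"seconds": 300},
                per_series_aligner=monitoring_v3.Aggregation.Aligner.ALIGN_PERCENTILE_99,
            )
        ],
    ),
)

policy = monitoring_v3.AlertPolicy(
    display_name="latency-99p-alert",
    combiner=monitoring_v3.AlertPolicy.ConditionCombinerType.OR,
    conditions=[condition],
    # Attach one or more notification channels, for example:
    # notification_channels=["projects/PROJECT_ID/notificationChannels/CHANNEL_ID"],
)

policy = client.create_alert_policy(name="projects/PROJECT_ID", alert_policy=policy)
print(policy.name)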