Monitor a deployed index

Vertex AI provides two metrics for monitoring the IndexEndpoint of a deployed index:

aiplatform.googleapis.com/matching_engine/current_shards

The current number of shards of the DeployedIndex. As data is added and deleted, Vector Search automatically reshards the index to maintain optimal performance, so this value can change over time.

aiplatform.googleapis.com/matching_engine/current_replicas

The total number of active replica servers used by the DeployedIndex. To match query volume, Vector Search automatically scales the number of replica servers up or down, within the minimum and maximum replica settings specified when the index was deployed.

If the index has multiple shards, each shard can be served by a different number of replica servers. This metric reports the total number of replica servers across all shards of the index.
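
As a minimal sketch of how these two metrics might be referenced in practice, the Python snippet below builds a Cloud Monitoring filter string for each metric type and illustrates how a total replica count relates to per-shard counts. The helper names (`metric_filter`, `total_replicas`) and the shard names and replica counts are hypothetical and used for illustration only; the snippet does not call any Google Cloud API.

```python
# Metric types named on this page.
SHARDS_METRIC = "aiplatform.googleapis.com/matching_engine/current_shards"
REPLICAS_METRIC = "aiplatform.googleapis.com/matching_engine/current_replicas"


def metric_filter(metric_type: str) -> str:
    """Build a Cloud Monitoring filter string that selects one metric type."""
    return f'metric.type = "{metric_type}"'


def total_replicas(replicas_per_shard: dict) -> int:
    """Sum per-shard replica counts, mirroring how current_replicas is described."""
    return sum(replicas_per_shard.values())


# Hypothetical deployed index with three shards (illustrative values, not real data).
example = {"shard-0": 2, "shard-1": 3, "shard-2": 2}

print(metric_filter(SHARDS_METRIC))
print(metric_filter(REPLICAS_METRIC))
print(total_replicas(example))  # 2 + 3 + 2 = 7
```

A filter string of this form can typically be pasted into Metrics Explorer or passed to a Cloud Monitoring time series query. Any additional label constraints (for example, to narrow results to a single deployed index) should be checked against the labels actually exposed for these metrics in your project.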

Last updated 2025-10-31 UTC.