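Service level objectives overview
Note: This guide only supports Cloud Service Mesh with Istio APIs and does not support Google Cloud APIs. For more information, see Cloud Service Mesh overview (/service-mesh/docs/overview).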
Service level objectives (SLOs) are a core tool in the Google service monitoring
toolkit. SLOs give you a concise, low-noise signal of the overall health of your
services. Cloud Service Mesh lets you set SLOs for your services, and monitor
and alert on your services in terms of those SLOs.
To monitor the health of a service, you need to understand which behaviors
matter for that service and how to measure and evaluate those behaviors. A
service level indicator (SLI) is a quantitative measure of some aspect of the
service. Typical SLIs are:
- Latency: How long it takes to return a response to a request, usually measured
  in milliseconds (ms). Latency is typically presented as an aggregate: the raw
  data is collected over a period of time and summarized as percentiles.
  Cloud Service Mesh displays a Latency graph on the Metrics page
  for each of your services. The Latency graph shows you the latency over time,
  which can help you determine a latency threshold or upper bound for a service.
- Availability: The fraction of the time that a service responds successfully.
  This is typically presented as the ratio of the number of successful responses
  to the total number of responses. The Error rate graph on the Metrics
  page can help you determine the availability of each service.
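As a rough illustration of how these two SLIs can be derived from raw request data, the following Python sketch computes latency percentiles and an availability ratio. The function names, the sample values, and the nearest-rank percentile method are assumptions made for this example; they are not how Cloud Service Mesh computes the graphs on the Metrics page.

```python
import math


def percentile(values, pct):
    """Return the nearest-rank percentile of a list of samples.

    The result is the smallest sample such that at least pct percent
    of all samples are less than or equal to it.
    """
    if not values:
        raise ValueError("at least one sample is required")
    ordered = sorted(values)
    # Nearest-rank method: rank is 1-based, so clamp it to at least 1.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def availability(successful, total):
    """Return the ratio of successful responses to total responses."""
    if total <= 0:
        raise ValueError("total must be positive")
    return successful / total


# Hypothetical latency samples (ms) collected over some period of time.
latencies_ms = [120, 95, 310, 180, 140, 220, 101, 450, 130, 160]

p50 = percentile(latencies_ms, 50)   # median latency
p95 = percentile(latencies_ms, 95)   # tail latency
ratio = availability(successful=9970, total=10000)

print(f"p50={p50} ms, p95={p95} ms, availability={ratio:.3%}")
```

Percentiles are used here rather than a mean because a few very slow requests can hide behind an average that still looks healthy.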
An SLO is a target value for a service level that is measured by an SLI. An SLO
can be represented as: SLI ≤ upper_bound or SLI ≥ lower_bound. SLOs are
measurable goals for performance over a period of time. For example, you might
have requirements like the following for some of your services:
- Latency can exceed 300 ms in no more than 5% of requests over a rolling
  30-day period.
- The system must have 99% availability measured over a calendar week.
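The two example requirements can be read as the `SLI ≤ upper_bound` and `SLI ≥ lower_bound` forms described earlier. The sketch below shows one way to evaluate them in Python. The function names and sample numbers are hypothetical, and the code assumes the inputs already cover the relevant compliance period (a rolling 30 days or a calendar week); it does not call any Cloud Service Mesh or Cloud Monitoring API.

```python
def latency_slo_met(latencies_ms, threshold_ms=300.0, max_slow_fraction=0.05):
    """Check an upper-bound SLO on the fraction of slow requests.

    The SLI is the fraction of requests slower than threshold_ms;
    the SLO holds when SLI <= max_slow_fraction.
    """
    if not latencies_ms:
        raise ValueError("at least one latency sample is required")
    slow = sum(1 for value in latencies_ms if value > threshold_ms)
    return slow / len(latencies_ms) <= max_slow_fraction


def availability_slo_met(successful, total, lower_bound=0.99):
    """Check a lower-bound SLO on availability.

    The SLI is successful / total; the SLO holds when SLI >= lower_bound.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    return successful / total >= lower_bound


# Hypothetical window: 96 fast requests and 4 slow ones (4% above 300 ms).
window_latencies_ms = [150.0] * 96 + [420.0] * 4
print(latency_slo_met(window_latencies_ms))

# Hypothetical week: 9,950 successful responses out of 10,000.
print(availability_slo_met(successful=9950, total=10000))
```

Keeping the threshold and the target as parameters makes it easy to try different bounds against the same data before settling on an SLO.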
You can set and view SLOs for your services based on their telemetry data on the
Health page. You can then create alerts in
Cloud Monitoring to warn you if a service isn't
performing as expected.
What's next
Learn more about SLOs from Site Reliability Engineering at Google:
- [Site Reliability Engineering](https://sre.google/sre-book/service-level-objectives/)
- [The Site Reliability Workbook](https://sre.google/workbook/implementing-slos/)

- [Designing SLOs](/service-mesh/docs/observability/design-slo)
- [Creating SLOs](/service-mesh/docs/observability/create-slo)
- [Monitoring SLOs](/service-mesh/docs/observability/monitor-slo)

Last updated 2025-08-28 UTC.