# Alerting overview
Alerts help you stay informed about the health and performance of your
air-gapped deployments. They provide timely notifications when specific
conditions are met, letting you do the following:
- **Proactively address issues**: Detect and respond to problems before they impact users or business operations.
- **Reduce downtime**: Minimize service disruptions by taking corrective action quickly.
- **Maintain service levels**: Ensure your applications meet performance and availability targets.
- **Gain operational insights**: Identify trends and patterns in your environment to optimize resource utilization and performance.
This page provides an overview of creating and managing alerts in
Google Distributed Cloud (GDC) air-gapped environments. It explains how to use monitoring
data to proactively identify and respond to critical events within your
applications and infrastructure.
Alerting policy types
---------------------
Metric-based alerting policies track monitoring data and notify specific people
when a resource meets a pre-established condition. For example, an alerting
policy that monitors the CPU utilization of a virtual machine might send a
notification when utilization stays above a threshold for a set period.
Alternatively, a policy that monitors an uptime check might notify both on-call
and development teams.
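A metric-based policy of this kind can be expressed as a Prometheus alerting rule. The following sketch assumes node-exporter-style `node_cpu_seconds_total` metrics and a `PrometheusRule` custom resource; the resource and alert names are illustrative, not GDC-specific values:

```yaml
# Illustrative PrometheusRule: fire when a machine's CPU stays above
# 90% busy for 10 minutes. Metric and resource names are assumptions.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: vm-cpu-alerts
  namespace: monitoring
spec:
  groups:
    - name: cpu.rules
      rules:
        - alert: HighCpuUtilization
          # 1 minus the average idle fraction = the busy fraction per instance.
          expr: 1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) > 0.9
          for: 10m            # condition must hold for 10 minutes before firing
          labels:
            severity: critical
          annotations:
            summary: "CPU above 90% on {{ $labels.instance }} for 10 minutes"
```

The `for` clause keeps short spikes from paging anyone: the expression must stay true for the whole duration before the alert fires.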
On the other hand, to monitor recurring events in your logs over time, use
log-based metrics to create alerting policies. Log-based metrics generate
numerical data from logging data. Log-based metrics are suitable when you want
to do any of the following:
- Count the message occurrences in your logs, like a warning or error. Receive a notification when the number of events crosses a threshold.
- Observe trends in your data, like latency values in your logs. Receive a notification if the values change unacceptably.
- Create charts to display the numeric data extracted from your logs.
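As a sketch of the first case, an alerting rule can watch a counter that your log pipeline emits for matching entries. The metric name `app_error_log_entries_total` and its labels are hypothetical; the counter itself would come from your log-based metric configuration:

```yaml
# Hypothetical rule over a log-derived counter: notify when error log
# entries exceed 5 per second, averaged over 10 minutes.
groups:
  - name: log-metrics.rules
    rules:
      - alert: ErrorLogRateHigh
        expr: sum by (namespace) (rate(app_error_log_entries_total[10m])) > 5
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Error log rate above 5/s in namespace {{ $labels.namespace }}"
```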
In GDC, alerts can generate pages and tickets for
critical errors. Pages require immediate attention from an operator, while
tickets are less urgent.
Key components
--------------
The GDC alerting service uses the following components:
- **Prometheus**: An open-source monitoring system widely used for collecting and storing metrics. It provides a powerful query language (PromQL) for defining alert rules.
- **Monitoring platform**: A managed monitoring service that collects metrics from various sources, including Prometheus. It offers advanced features like Grafana dashboards, custom metrics, and alerting.
- **Alertmanager**: A component responsible for receiving, processing, and routing alerts. It supports grouping, silencing, and inhibiting alerts to reduce noise and improve efficiency.
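To illustrate the Alertmanager role, the following configuration sketch routes critical alerts to a pager-style receiver and everything else to a ticket queue, with grouping and an inhibition rule to cut noise. The receiver endpoints and addresses are placeholders, not GDC defaults:

```yaml
# Illustrative Alertmanager configuration; endpoints are placeholders.
route:
  group_by: ['alertname', 'cluster']   # batch related alerts into one notification
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: ticket-queue               # default: less urgent tickets
  routes:
    - match:
        severity: critical
      receiver: oncall-pager           # pages need immediate operator attention
receivers:
  - name: oncall-pager
    webhook_configs:
      - url: 'http://pager-gateway.example.internal/v1/alert'
  - name: ticket-queue
    email_configs:                     # requires global SMTP settings, omitted here
      - to: 'ops-tickets@example.com'
inhibit_rules:
  # Suppress warnings for an instance that is already paging as critical.
  - source_match:
      severity: critical
    target_match:
      severity: warning
    equal: ['alertname', 'instance']
```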
Alerting workflow
-----------------
GDC provides an alerting framework that integrates with
various monitoring tools and services. The typical workflow involves the
following stages:
1. **Data collection**: Use tools like Prometheus and Fluent Bit to collect metrics and logs from your applications, infrastructure, and Kubernetes clusters.
2. **Monitoring**: Store and visualize the collected data in Grafana dashboards.
3. **Alerting rules**: Define alert rules based on specific conditions, such as CPU usage exceeding a threshold or application errors exceeding a certain rate.
4. **Alertmanager**: Alertmanager receives alerts triggered by the defined rules and handles notification routing and silencing.
5. **Notifications**: Receive alerts through various channels, such as email, messages, or webhooks.
Best practices
--------------
When setting up alerts, consider the following best practices:
- **Define clear and actionable alerts**: Ensure your alerts provide specific information about the issue and suggest appropriate actions.
- **Set appropriate severity levels**: Categorize alerts based on their impact and urgency to prioritize response efforts.
- **Avoid alert fatigue**: Fine-tune your alert rules to minimize false positives and unnecessary notifications.
- **Test your alerts regularly**: Verify that your alerts are triggered correctly and notifications are delivered as expected.
- **Document your alerting strategy**: Document your alert rules, notification channels, and escalation procedures.
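One way to act on the testing recommendation is Prometheus's `promtool test rules` command, which replays synthetic time series against your rule files. The file and alert names below are hypothetical; adapt them to the rules you actually deploy:

```yaml
# tests.yaml -- run with: promtool test rules tests.yaml
# Assumes cpu-rules.yaml defines a HighCpuUtilization alert that fires
# when an instance's CPU stays above 90% busy for 10 minutes.
rule_files:
  - cpu-rules.yaml
evaluation_interval: 1m
tests:
  - interval: 1m
    input_series:
      # The CPU reports zero idle time for the whole window.
      - series: 'node_cpu_seconds_total{instance="vm-1", mode="idle"}'
        values: '0+0x20'
    alert_rule_test:
      - eval_time: 15m
        alertname: HighCpuUtilization
        exp_alerts:
          - exp_labels:
              severity: critical
              instance: vm-1
```

Running such tests in CI catches rules that silently stop matching after a metric rename or label change.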
Last updated 2025-08-29 UTC.