Best practices for managing database fleet health

To optimize your database fleet health, we recommend that you follow these best practices for monitoring, availability, and data protection.

Monitoring

Database monitoring, which entails tracking a database's performance and resources, is critical for maintaining the health of a database management system.

Perform regular, frequent database health checks

To keep your databases healthy, we recommend that you check the Database Center dashboard regularly (for example, once a week). Database Center doesn't notify you about database health issues, so a regular review of your organization's databases helps you proactively identify and resolve them.

Availability configuration

You can edit configurations to improve the durability and reliability of your databases.

Ensure your resource is failover-protected

When a resource is available in multiple zones, it is considered highly available (HA) and is protected by automatic failover. The purpose of an HA configuration is to reduce downtime when a zone or instance becomes unavailable. This might happen during a zonal outage or when a resource runs out of memory. With high availability, your data remains available to client applications during such outages.

Configuring a resource to have high availability provides data redundancy within a region. Highly available resources have a primary node and a standby node, each in a different zone. Through synchronous replication to each zone's persistent disk, all writes made to the primary node are replicated to disks in both zones before a transaction is reported as committed. In the event of a node or zone failure, the standby node becomes the new primary node, and users are rerouted to the new primary node. This process is called a failover.
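The commit and failover sequence described above can be sketched as a toy model. This is only an illustration under simplifying assumptions: the Node and HACluster classes, the zone names, and the in-memory lists that stand in for persistent disks are all hypothetical, and none of this is a real database or cloud API.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    zone: str
    disk: list = field(default_factory=list)  # stands in for the zone's persistent disk
    healthy: bool = True


class HACluster:
    """Toy primary/standby pair in two zones (hypothetical, for illustration only)."""

    def __init__(self, primary_zone: str, standby_zone: str):
        self.primary = Node(primary_zone)
        self.standby = Node(standby_zone)

    def write(self, record: str) -> bool:
        # If the primary is down, fail over before accepting the write.
        if not self.primary.healthy:
            self.failover()
        # Synchronous replication: the record is written to the disk of every
        # healthy node before the transaction is reported as committed.
        for node in (self.primary, self.standby):
            if node.healthy:
                node.disk.append(record)
        return True  # committed

    def failover(self) -> None:
        # The standby becomes the new primary; clients are then routed to it.
        if not self.standby.healthy:
            raise RuntimeError("no healthy standby to fail over to")
        self.primary, self.standby = self.standby, self.primary


cluster = HACluster("zone-a", "zone-b")
cluster.write("txn-1")
cluster.write("txn-2")

cluster.primary.healthy = False  # simulate a failure in zone-a
cluster.failover()
print(cluster.primary.zone, cluster.primary.disk)
```

Because every committed write already exists on the standby's disk, the new primary in this sketch serves the same committed data after the failover.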

Use cross-region replication

When a resource group is available in multiple regions, it uses cross-region replication, a feature that asynchronously replicates data and applications across regions. We recommend that you use cross-region replication for the following reasons:

  • Disaster recovery: If the region for a primary resource becomes unavailable, you can promote a secondary resource in another region to become the primary resource and use it to serve requests.
  • Geographically distributed data: Locating your data closer to the applications that need the data can reduce read latency.
  • Geographic load balancing: If slow or overloaded connections occur in one region, you can route traffic to another region.
  • Improved read performance: Provisioning read-only resources around the world can improve capacity and performance in those areas.
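The disaster recovery case can be sketched with a toy model of asynchronous replication. The Region and CrossRegionReplication classes and the region names below are hypothetical and exist only for this illustration. The sketch also shows a general property of asynchronous replication: writes that have not yet reached a secondary are not present on it if it is promoted.

```python
from collections import deque


class Region:
    def __init__(self, name: str):
        self.name = name
        self.data = []


class CrossRegionReplication:
    """Toy model of asynchronous replication (hypothetical, not a real API)."""

    def __init__(self, primary: str, secondaries: list):
        self.primary = Region(primary)
        self.secondaries = {name: Region(name) for name in secondaries}
        # Writes committed on the primary but not yet applied to each secondary.
        self.pending = {name: deque() for name in secondaries}

    def write(self, record: str) -> None:
        # Asynchronous: the primary commits immediately and does not wait
        # for any secondary region to receive the write.
        self.primary.data.append(record)
        for queue in self.pending.values():
            queue.append(record)

    def replicate(self, name: str) -> None:
        # Apply all queued writes to one secondary region.
        queue = self.pending[name]
        while queue:
            self.secondaries[name].data.append(queue.popleft())

    def lag(self, name: str) -> int:
        # Number of committed writes the secondary has not applied yet.
        return len(self.pending[name])

    def promote(self, name: str) -> Region:
        # Disaster recovery: the secondary becomes the primary and serves
        # requests. Writes still queued for it are discarded.
        self.pending.pop(name)
        self.primary = self.secondaries.pop(name)
        return self.primary


crr = CrossRegionReplication("region-1", ["region-2"])
crr.write("order-1")
crr.replicate("region-2")
crr.write("order-2")              # committed on region-1 only so far
print(crr.lag("region-2"))        # writes not yet replicated
new_primary = crr.promote("region-2")
print(new_primary.name, new_primary.data)
```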

Data protection

Data protection helps safeguard your organization's data against loss, manipulation, and unauthorized access.

Enable automated backups

Backups protect your data from loss or damage. If a database resource experiences a problem, you can restore it to a previous state. Enable automated backups for any resource that contains data you can't afford to lose.
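As a minimal sketch, a check like the following could flag resources that don't have automated backups enabled. The inventory format, the "automated_backups" field, and the resource names are assumptions made for this example; they don't correspond to a real export format or API.

```python
def missing_automated_backups(fleet):
    """Return the names of resources without automated backups enabled.

    `fleet` is a hypothetical inventory: a list of dicts with a "name" key
    and an optional boolean "automated_backups" key. A missing key is
    treated as "not enabled" so that unknown resources are flagged.
    """
    return [r["name"] for r in fleet if not r.get("automated_backups", False)]


# Hypothetical inventory, for illustration only.
fleet = [
    {"name": "orders-db", "automated_backups": True},
    {"name": "analytics-db", "automated_backups": False},
    {"name": "sessions-db"},  # setting unknown, so it is flagged
]

print(missing_automated_backups(fleet))
```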

Set up long backup retention windows

Your backup retention settings determine the window during which you can recover data after errors, corruption, or loss. The longer your backup retention period, the larger your recovery window is for that resource.
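The relationship between retention and the recovery window can be illustrated with a short calculation. This sketch assumes a simple time-based retention policy and one backup per day. The function names, the dates, and the schedule are hypothetical, and real retention settings may be expressed differently (for example, as a number of backups to keep).

```python
from datetime import datetime, timedelta


def retained_backups(backup_times, now, retention):
    """Return the backups that are still kept under a time-based retention period."""
    cutoff = now - retention
    return sorted(t for t in backup_times if cutoff <= t <= now)


def recovery_window(backup_times, now, retention):
    """Return how far back you can restore, or None if no backup is retained."""
    kept = retained_backups(backup_times, now, retention)
    if not kept:
        return None
    return now - kept[0]


# Assumed schedule: one backup per day for the last 30 days.
now = datetime(2024, 1, 31)
backups = [now - timedelta(days=d) for d in range(30)]

for days in (7, 30):
    window = recovery_window(backups, now, timedelta(days=days))
    print(f"retention={days}d -> oldest restorable point is {window.days} days back")
```

With the assumed daily schedule, a 7-day retention period lets you restore up to 7 days back, while a 30-day retention period reaches the oldest backup in the example, taken 29 days earlier.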