[[["易于理解","easyToUnderstand","thumb-up"],["解决了我的问题","solvedMyProblem","thumb-up"],["其他","otherUp","thumb-up"]],[["很难理解","hardToUnderstand","thumb-down"],["信息或示例代码不正确","incorrectInformationOrSampleCode","thumb-down"],["没有我需要的信息/示例","missingTheInformationSamplesINeed","thumb-down"],["翻译问题","translationIssue","thumb-down"],["其他","otherDown","thumb-down"]],["最后更新时间 (UTC):2025-09-03。"],[],[],null,["This page explains how Memorystore for Valkey's architecture supports and provides\nhigh availability (HA). This page also explains recommended configurations that contribute\nto improved instance performance and stability.\n| **Note:** For more information about region-specific considerations, see [Geography and regions](/docs/geography-and-regions#regions_and_zones).\n\nHigh availability\n\nMemorystore for Valkey is built on a highly available architecture where your clients access managed Memorystore for Valkey nodes directly. Your clients do this by connecting to individual endpoints, as described in [Connect to a Memorystore for Valkey instance](/memorystore/docs/valkey/connect-instance).\n\nConnecting to shard(s) directly provides the following benefits:\n\n- Direct connection avoids intermediate hops, which minimizes the round-trip time (client latency) between your client and the Valkey node.\n\n- In Cluster Mode Enabled, direct connection avoids any single point of failure because each shard is designed to fail independently. For example, if traffic from multiple clients overloads a slot (keyspace chunk), shard failure limits the impact to the shard responsible for serving the slot.\n\nRecommended configurations\n\nWe recommend creating highly available multi-zone instances as opposed to [single-zone instances](/memorystore/docs/valkey/single-zone-instances) because of the better reliability they provide. However, if you choose to provision an instance without replicas, we recommend choosing a single-zone instance. For more information, see [Choose a single-zone instance if your instance doesn't use replicas](/memorystore/docs/valkey/general-best-practices#choose_a_single-zone_instance_if_your_instance_doesnt_use_replicas).\n\nTo enable high availability for your instance, you must provision at least 1 replica node for every shard. You can do this when [Creating the instance](/memorystore/docs/valkey/create-instances), or you can [Scale the replica count](/memorystore/docs/valkey/scale-replica-count) to at least 1 replica per shard. Replicas provide [Automatic failover](/memorystore/docs/valkey/ha-and-replicas#automatic_failover) during planned maintenance and unexpected shard failure.\n\nYou should configure your client according to the guidance in [Client best practices](/memorystore/docs/valkey/general-best-practices#client_best_practices). Using recommended best practices allows your client to handle the following items for your instance automatically and without any downtime:\n\n- The role (automatic failovers)\n\n- The endpoint (node replacement)\n\n- Cluster Mode Enabled-related slot assignment changes (consumer scale out and in)\n\nReplicas\n\nA highly available Memorystore for Valkey instance is a regional resource. This means that Memorystore for Valkey distributes primary and replica nodes of shards across multiple zones to safeguard against a zonal outage. 
### Cluster Mode Enabled Instance shapes

The following diagrams illustrate shapes for Cluster Mode Enabled instances:

- With 3 shards and 0 replicas per node
- With 3 shards and 1 replica per node
- With 3 shards and 2 replicas per node

### Cluster Mode Disabled Instance shapes

The following diagrams illustrate shapes for Cluster Mode Disabled instances:

- With 2 replicas

## Automatic failover

Automatic failovers within a shard can occur due to [maintenance](/memorystore/docs/valkey/about-maintenance) or an unexpected failure of the primary node. During a failover, a replica is promoted to be the primary. You can configure replicas explicitly. The service can also temporarily provision extra replicas during internal maintenance to avoid any downtime.

Automatic failovers prevent data loss during maintenance updates. For details about automatic failover behavior during maintenance, see [Automatic failover behavior during maintenance](/memorystore/docs/valkey/about-maintenance#automatic_failover_behavior_during_maintenance).

### Failover and node repair duration

Automatic failovers can take on the order of tens of seconds for unplanned events such as a primary node process crash or a hardware failure. During this time the system detects the failure and elects a replica to be the new primary.

Node repair can take on the order of minutes for the service to replace the failed node. This is true for all primary and replica nodes. For instances that aren't highly available (no replicas provisioned), repairing a failed primary node also takes on the order of minutes.

### Client behavior during an unplanned failover

Client connections are likely to be reset, depending on the nature of the failure. After automatic recovery, connections should be retried with [exponential backoff](/memorystore/docs/valkey/exponential-backoff) to avoid overloading primary and replica nodes.

Clients using replicas for read throughput should be prepared for a temporary degradation in capacity until the failed node is automatically replaced.

### Lost writes

During a failover resulting from an unexpected failure, acknowledged writes may be lost due to the asynchronous nature of Valkey's replication protocol.

Client applications can leverage the Valkey `WAIT` command to improve real-world data safety.
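As a hedged illustration of both points above, the sketch below configures a `redis-py` client with exponential-backoff retries and then calls `WAIT` after a critical write. The endpoint, key, replica count, and timeout are assumed placeholder values, and `WAIT` narrows, but does not eliminate, the window in which an acknowledged write can be lost.

```python
# Sketch: retry with exponential backoff after a failover, and use WAIT to
# wait for replica acknowledgement of a critical write.
# The endpoint, key, and numbers below are illustrative placeholders.
import redis
from redis.backoff import ExponentialBackoff
from redis.retry import Retry

client = redis.Redis(
    host="10.0.0.2",   # placeholder: instance endpoint
    port=6379,
    # Retry commands with exponential backoff instead of hammering the nodes
    # while a failover or node replacement is in progress.
    retry=Retry(ExponentialBackoff(cap=10, base=0.1), retries=5),
    retry_on_error=[redis.exceptions.ConnectionError, redis.exceptions.TimeoutError],
)

client.set("order:42", "confirmed")

# WAIT blocks until the preceding write is acknowledged by at least the given
# number of replicas, or until the timeout (in milliseconds) expires.
acked = client.wait(1, 100)
if acked < 1:
    # Replication did not catch up in time; decide how to handle this case.
    print("write not yet acknowledged by a replica")
```

In Cluster Mode Enabled, `WAIT` applies per primary, so issue it on the connection that served the write.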
## Keyspace impact of a single zone outage

This section describes the impact of a single zone outage on a Memorystore for Valkey instance.

### Multi-zone instances

- **HA instances:** If a zone has an outage, the entire keyspace is available for reads and writes, but because some read replicas are unavailable, the read capacity is reduced. We strongly recommend over-provisioning cluster capacity so that the instance has enough read capacity in the rare event of a single zone outage. Once the outage is over, replicas in the affected zone are restored and the read capacity of the cluster returns to its configured value. For more information, see [Patterns for scalable and reliable apps](/architecture/scalable-and-resilient-apps).

- **Non-HA instances (no replicas):** If a zone has an outage, the portion of the keyspace that is provisioned in the affected zone undergoes a data flush and is unavailable for writes or reads for the duration of the outage. Once the outage is over, primaries in the affected zone are restored and the capacity of the cluster returns to its configured value.

### Single-zone instances

- **Both HA and non-HA instances:** If the zone that the instance is provisioned in has an outage, the cluster is unavailable and data is flushed. If a different zone has an outage, the cluster continues to serve read and write requests.

## Best practices

This section describes best practices for high availability and replicas.

### Add a replica

Adding a replica requires an RDB snapshot. RDB snapshots use a process fork and a ['copy-on-write' mechanism](https://valkey.io/topics/persistence/) to take a snapshot of node data. Depending on the pattern of writes to nodes, the used memory of the nodes grows as pages touched by the writes are copied. The memory footprint can be up to double the size of the data in the node.

To ensure that nodes have sufficient memory to complete the snapshot, keep or set [`maxmemory`](/memorystore/docs/valkey/supported-instance-configurations#modifiable_configuration_parameters) at 80% of the node capacity so that 20% is reserved for overhead. Reserving this overhead, along with monitoring snapshots, helps you manage your workload so that snapshots succeed. Also, when you add replicas, lower write traffic as much as possible. For more information, see [Monitor memory usage for an instance](/memorystore/docs/valkey/general-best-practices#monitor-memory-usage).

| **Note:** If you add a replica to an instance that uses more than 80% of the instance's maximum memory, then the operation fails and you receive an error message.
|
| To resolve this issue, reduce your instance's memory usage in one of the following ways:
|
| - Lower your instance's memory footprint to less than 80% of the node's maximum allowed memory.
| - [Scale the node type](/memorystore/docs/valkey/scale-instance-capacity#scale_the_node_type) for your instance so that it has more memory.
|
| After your instance's memory usage is below the 80% threshold, add the replica again.
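As an illustrative pre-check, not an authoritative procedure, the sketch below reads `INFO memory` with `redis-py` and flags usage above the 80% threshold before you attempt to add a replica. The endpoint is a placeholder, and the Memorystore monitoring guidance linked above remains the recommended way to track memory usage.

```python
# Sketch: compare a node's memory usage against the 80% threshold before
# adding a replica. The endpoint is a placeholder; prefer Memorystore
# monitoring metrics for ongoing tracking.
import redis

client = redis.Redis(host="10.0.0.2", port=6379)  # placeholder endpoint

mem = client.info("memory")
used = mem["used_memory"]
limit = mem.get("maxmemory", 0)

if limit > 0:
    ratio = used / limit
    print(f"memory usage: {ratio:.0%} of maxmemory")
    if ratio > 0.80:
        # Above the recommended threshold: lower the footprint or scale the
        # node type before adding a replica, otherwise the operation may fail.
        print("reduce memory usage below 80% before adding a replica")
else:
    print("maxmemory is not set; compare used_memory against node capacity instead")
```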