
Redis Sentinel: Node Failure Recovery, Monitoring, and Automation

Posted by admin on 2024-04-18 in Digital

Redis Sentinel is a distributed monitoring system that monitors the health status of multiple Redis nodes and automates failover and recovery when a node fails.

Election Mechanism

Redis Sentinel does not assign the initial roles: the primary and its replicas are set up through ordinary replication configuration. When the primary fails, the Sentinels elect a leader among themselves, and that leader promotes one of the replicas to be the new primary, keeping the deployment available.

Node Failure Detection

Each Sentinel periodically sends PING commands to the Redis nodes it monitors to check their health. If a node does not return a valid reply within the configured down-after-milliseconds period, that Sentinel marks the node as down from its own point of view.

Subjective and Objective Down States

Redis Sentinel uses two concepts, subjective down and objective down, to determine if a node has failed.

Subjective Down: A single Sentinel marks a node as subjectively down when the node has not returned a valid reply to PING for longer than down-after-milliseconds.
Objective Down: A primary is marked as objectively down when at least quorum Sentinels (the number configured for that primary) report it as subjectively down.
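
The two states can be illustrated with a small model. This is only a sketch of the decision rule, not Sentinel's actual code: the SentinelView class, the field names, and the timestamps are hypothetical, and the timeout and quorum values are chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class SentinelView:
    """What one Sentinel knows about a monitored node (hypothetical model)."""
    name: str
    last_valid_reply_ms: int  # timestamp of the last valid PING reply


def is_subjectively_down(view: SentinelView, now_ms: int, down_after_ms: int) -> bool:
    # One Sentinel's private opinion: no valid reply for longer than the timeout.
    return now_ms - view.last_valid_reply_ms > down_after_ms


def is_objectively_down(views: list[SentinelView], now_ms: int,
                        down_after_ms: int, quorum: int) -> bool:
    # Objective down: at least `quorum` Sentinels hold the subjective-down opinion.
    votes = sum(is_subjectively_down(v, now_ms, down_after_ms) for v in views)
    return votes >= quorum


views = [
    SentinelView("sentinel-1", last_valid_reply_ms=1_000),
    SentinelView("sentinel-2", last_valid_reply_ms=2_000),
    SentinelView("sentinel-3", last_valid_reply_ms=39_000),
]
print(is_objectively_down(views, now_ms=40_000, down_after_ms=30_000, quorum=2))
```

With these sample values two of the three Sentinels have not heard from the node for more than 30 seconds, so a quorum of 2 is reached; raising the quorum to 3 would leave the node only subjectively down.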

Failover

When the primary node is marked as objectively down, the Sentinels elect a leader among themselves, and the leader selects a replica to promote to primary.

The leader election resembles Raft's leader election: a Sentinel becomes leader only if a majority of the Sentinels vote for it, so a failover can start only while a majority of Sentinels can reach one another.

Automated Failover Recovery

After choosing the replica to promote, the leader Sentinel carries out the failover:

Sentinel sends a SLAVEOF NO ONE command (REPLICAOF NO ONE in Redis 5 and later) to the selected replica, making it the primary.
Sentinel sends SLAVEOF commands to the other replicas, making them replicas of the new primary.
Sentinel publishes the new primary's address so that clients can reconnect to it; it does not edit client configuration itself.
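
The order of the reconfiguration commands can be sketched as a pure function. This is an illustration only: failover_commands is a hypothetical helper that builds command strings and sends nothing, and the addresses are sample values.

```python
def failover_commands(new_primary, other_replicas):
    """Return (target, command) pairs in the order the steps are applied.

    new_primary and each entry of other_replicas are (host, port) tuples.
    """
    host, port = new_primary
    # Step 1: the chosen replica stops replicating and becomes the primary.
    commands = [(new_primary, "SLAVEOF NO ONE")]
    # Step 2: every remaining replica is pointed at the new primary.
    for replica in other_replicas:
        commands.append((replica, f"SLAVEOF {host} {port}"))
    return commands


for target, command in failover_commands(("127.0.0.1", 6381), [("127.0.0.1", 6380)]):
    print(target, command)
```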

Data Synchronization and Replication

During failover, the remaining replicas resynchronize with the new primary. Because Redis replication is asynchronous, writes that the old primary acknowledged but had not yet propagated can be lost.

Redis Sentinel uses Redis's replication mechanism to achieve data synchronization and replication. Replicas connect to the new primary node and receive data updates through a replication stream.

Client Redirection

When a failover occurs, clients might have disconnected from the primary node that was marked as unavailable.

Sentinel does not send MOVED or ASK redirections; those belong to Redis Cluster. Instead, a Sentinel-aware client asks a Sentinel for the current primary with SENTINEL get-master-addr-by-name, or subscribes to the +switch-master event on a Sentinel, and then reconnects to the address it receives.
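
Sentinel announces a completed failover on its +switch-master pub/sub channel. The sketch below parses such a message, assuming the payload has the form "<master-name> <old-ip> <old-port> <new-ip> <new-port>"; SwitchMaster and parse_switch_master are hypothetical names, and the sample payload is made up for illustration.

```python
from typing import NamedTuple


class SwitchMaster(NamedTuple):
    name: str
    old_addr: tuple[str, int]
    new_addr: tuple[str, int]


def parse_switch_master(payload: str) -> SwitchMaster:
    # Assumed payload layout: name, old ip, old port, new ip, new port.
    name, old_ip, old_port, new_ip, new_port = payload.split()
    return SwitchMaster(name, (old_ip, int(old_port)), (new_ip, int(new_port)))


event = parse_switch_master("mymaster 127.0.0.1 6379 127.0.0.1 6381")
print(event.new_addr)  # the address a client should reconnect to
```

A client would typically drop its connection to old_addr and open a new one to new_addr when such an event arrives.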

Ensuring High Availability

Master-Replica Replication

Redis Sentinel improves the system's availability and reliability by using master-replica replication, replicating data across multiple nodes.

If the primary node fails, replicas can still serve reads; they are read-only by default, so writes are unavailable until a replica has been promoted.

Multi-Node Monitoring

Redis Sentinel can simultaneously monitor multiple Redis nodes and automate handling when a node fails.

Because several Sentinels watch the same nodes, one Sentinel's mistaken view is not enough to trigger a failover, and monitoring continues if a Sentinel goes down.

Automated Failover

Redis Sentinel minimizes the need for manual intervention by automating failover, which shortens recovery time.

The leader election ensures that only one Sentinel drives a given failover, and replication brings the remaining replicas in line with the new primary.

Conclusion

Redis Sentinel is a robust monitoring and failover tool that monitors the health status of Redis nodes in real-time and automates failover and recovery when a node fails.

With its subjective and objective down states, Sentinel avoids acting on a single Sentinel's mistaken view of a node. Automated failover followed by resynchronization of the replicas keeps the deployment available, although writes not yet replicated at the time of failure can be lost.

By configuring and using Redis Sentinel judiciously, you can improve your system's reliability and fault tolerance.

The Sentinel Mechanism in Redis

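Overview of the Redis Sentinel documentation: the Sentinel system manages multiple Redis servers and performs three tasks: monitoring, notification, and automatic failover.

How is it used?

Starting Sentinel

Sentinel can be started either with the redis-sentinel program or with the redis-server program running in Sentinel mode. Both start a Sentinel instance, and both require a configuration file: Sentinel uses the file to save its current state and reloads it to restore that state after a restart. Note: if no configuration file is given, or the given file is not writable, Sentinel refuses to start.

How is Sentinel configured?

The Redis source tree includes a fully commented example Sentinel configuration file. Only a minimal configuration is needed to run a Sentinel; its first directive tells Sentinel which primary to monitor, and the other options share a common basic format.

Now let's try it hands-on and see what Sentinel can do. In the test directory, create three Sentinel configuration files with vi (here each Sentinel port is the Redis server port with a 2 prepended). Each file contains a few simple settings: the Sentinel port, the name of the monitored primary (mymaster), its host (127.0.0.1), its port (6379), and a quorum of 2, meaning two Sentinels must agree before the primary is switched and failover is performed. Note that all three files monitor the Redis server on port 6379.

First start a Redis server on 6379 as the primary, then start 6380 and 6381 as two replicas. Now for the main course: start Sentinel. Its output shows two replicas following the primary; monitoring the primary is enough for Sentinel to find out how many replicas there are. Then start the other two Sentinels.

Now a small experiment: what happens to the two replicas if the primary (6379) is shut down? After the primary stops, both replicas report for a while that the connection to the primary is refused. The Sentinels do not start voting immediately, because the default timeout is fairly long; once it expires they hold an election, and in this run 6381 was chosen as the new primary. To confirm, data set on 6381 can be read from 6380, which shows that 6380 now follows 6381 and that Sentinel performed the failover automatically.

Looking at the configuration files again, they no longer match the two lines originally written: Sentinel rewrites its own configuration file, which now records 6381 as the primary.

One more question: how do Sentinels discover each other? Through publish/subscribe. Each Sentinel publishes a hello message on the __sentinel__:hello channel of the primary and replicas it monitors and subscribes to the same channel, so it learns about the other Sentinels watching the same primary. PSUBSCRIBE can be used to watch these messages.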
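
The three configuration files from this walkthrough can be generated rather than typed by hand. The sketch below only builds the file contents; sentinel_config is a hypothetical helper, the values are the ones stated in the walkthrough (mymaster, 127.0.0.1, 6379, quorum 2), and the file names are left to the reader because the source does not give them.

```python
def sentinel_config(sentinel_port: int, master_name: str = "mymaster",
                    host: str = "127.0.0.1", master_port: int = 6379,
                    quorum: int = 2) -> str:
    """Return the two-line minimal Sentinel configuration as text."""
    return (
        f"port {sentinel_port}\n"
        f"sentinel monitor {master_name} {host} {master_port} {quorum}\n"
    )


# Sentinel ports follow the walkthrough's convention: Redis port with a 2 prepended.
for redis_port in (6379, 6380, 6381):
    sentinel_port = int(f"2{redis_port}")
    print(sentinel_config(sentinel_port))
```

All three generated files monitor the same primary on port 6379; only the Sentinel's own port differs.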

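Understanding Redis (8): The Sentinel Mechanism

Sentinel's core function is automatic failover of the primary node.

[Figure: logical diagram of a typical Sentinel cluster monitoring a Redis deployment]

Redis Sentinel consists of several Sentinel nodes, which brings two benefits:

1. Failure detection is performed jointly by multiple Sentinel nodes, which effectively prevents false positives.
2. Even if some Sentinel nodes are unavailable, the Sentinel cluster as a whole remains available.

Sentinel provides the following functions:
1. Monitoring: each Sentinel node monitors the data nodes (the Redis primary and replicas) and the other Sentinel nodes.
2. Notification: Sentinel nodes notify the application of failover results.
3. Failover: a replica is promoted to primary, and the correct primary/replica relationships are maintained afterwards.
4. Configuration provider: in Sentinel mode, a client connects to the set of Sentinel nodes when it initializes and obtains the primary's address from them.

Monitoring and automatic failover let Sentinel detect a primary failure promptly and complete the switch; the configuration-provider and notification functions show up only in the interaction with clients.

1. How it works

Monitoring

Sentinel nodes need to monitor the state of the primary, the replicas, and the other Sentinel nodes. Sentinels discover one another through Redis's pub/sub system. In total, Redis Sentinel runs three periodic monitoring tasks to discover and monitor the nodes.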

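Subjective and Objective Down

Subjective Down

Every second, each Sentinel node sends a PING command to the data nodes as a heartbeat. When a node has given no valid reply for longer than down-after-milliseconds, the Sentinel judges it failed; this is called subjective down.

Objective Down

Objective down means that enough Sentinel nodes consider the primary down for the judgment to count as objective. "Enough" is the quorum value configured for that primary, which is not automatically a majority. A majority, n/2+1 with integer division (3 of 5 Sentinels, 6 of 10), is what the leader election that authorizes the failover additionally requires. Note: deploy at least three Sentinel nodes; with fewer, losing a single Sentinel can leave no majority to authorize a failover.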
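
The majority arithmetic can be checked with a few lines of Python. This is a minimal sketch: majority and survives_one_sentinel_loss are hypothetical helper names, and integer division is assumed for n/2.

```python
def majority(sentinel_count: int) -> int:
    # n/2 + 1 with integer division, as in the examples above.
    return sentinel_count // 2 + 1


def survives_one_sentinel_loss(sentinel_count: int) -> bool:
    # Can the remaining Sentinels still form a majority of the full set?
    return sentinel_count - 1 >= majority(sentinel_count)


for n in (2, 3, 5, 10):
    print(n, majority(n), survives_one_sentinel_loss(n))
```

With two Sentinels the majority is 2, so losing either one leaves no majority; three is the smallest count that tolerates the loss of one Sentinel.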

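Sentinel Leader Election

Once the primary is objectively down, the Sentinel nodes elect a leader to carry out the actual failover. Redis uses a Raft-like algorithm for this Sentinel leader election.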
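
The vote count can be sketched as follows. This is a simplified model under stated assumptions, not Sentinel's implementation: each Sentinel is assumed to cast one vote per election round, and the winning threshold is assumed to be the larger of the majority and the configured quorum; elect_leader and the Sentinel names are hypothetical.

```python
from collections import Counter
from typing import Optional


def elect_leader(votes: dict[str, str], sentinel_count: int,
                 quorum: int) -> Optional[str]:
    """votes maps each voting Sentinel to the candidate it voted for."""
    # Assumed rule: the winner needs a majority of all Sentinels and at least quorum votes.
    needed = max(sentinel_count // 2 + 1, quorum)
    tally = Counter(votes.values())
    if not tally:
        return None
    candidate, count = tally.most_common(1)[0]
    return candidate if count >= needed else None


votes = {"sentinel-1": "sentinel-1", "sentinel-2": "sentinel-1", "sentinel-3": "sentinel-3"}
print(elect_leader(votes, sentinel_count=3, quorum=2))
```

If no candidate reaches the threshold, no leader is chosen in that round and the election has to be retried later.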

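Failover

The elected leader Sentinel is responsible for the failover, that is, for switching the primary and replica roles. It first selects one replica to become the new primary, based on information about each replica.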
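
The text does not list the selection criteria. The sketch below assumes the ordering described in the Redis Sentinel documentation: disconnected replicas and replicas with priority 0 are skipped, then the lowest priority value wins, then the largest replication offset, then the lexicographically smallest run ID. The Replica class and all sample values are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    addr: str
    priority: int      # assumed: lower is preferred, 0 means "never promote"
    repl_offset: int   # how much of the primary's data the replica has received
    run_id: str
    connected: bool = True


def select_replica(replicas: list[Replica]) -> Optional[Replica]:
    candidates = [r for r in replicas if r.connected and r.priority != 0]
    if not candidates:
        return None
    # Lower priority first, then higher offset, then smaller run ID as tie-breaker.
    return min(candidates, key=lambda r: (r.priority, -r.repl_offset, r.run_id))


replicas = [
    Replica("127.0.0.1:6380", priority=100, repl_offset=500, run_id="b"),
    Replica("127.0.0.1:6381", priority=100, repl_offset=900, run_id="c"),
]
print(select_replica(replicas).addr)
```

Here both replicas have the same priority, so the one that has received more of the primary's data is chosen.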

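Note: the leader Sentinel obtains a configuration epoch for the failover it was authorized to perform. The epoch is essentially a version number, and each primary/replica switch must use a unique one. The other Sentinels update their own primary configuration according to this version.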
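
The version rule can be sketched as a comparison of epochs. This is an illustrative model only: MasterConfig and merge_config are hypothetical names, and the rule assumed here is that a Sentinel adopts an announced configuration only when its epoch is greater than the one it already holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterConfig:
    epoch: int                # configuration epoch, acting as a version number
    addr: tuple[str, int]     # address of the primary in this configuration


def merge_config(current: MasterConfig, incoming: MasterConfig) -> MasterConfig:
    # Assumed rule: the configuration with the greater epoch wins.
    return incoming if incoming.epoch > current.epoch else current


old = MasterConfig(epoch=1, addr=("127.0.0.1", 6379))
new = MasterConfig(epoch=2, addr=("127.0.0.1", 6381))
print(merge_config(old, new).addr)
```

A stale announcement carrying an older epoch is ignored, which is why each failover needs its own unique epoch.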
