Redis-Sentinel
Tags: Redis

Sentinel

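With plain master-slave replication, a master failure has to be handled manually: someone must pick one of the slaves and promote it to be the new master. Sentinel automates this process.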

1. Redis Sentinel Architecture

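Sentinel does not store data. It is only responsible for detecting failures, performing failover, and notifying clients. Running several Sentinel nodes keeps the judgement about a node's state fair and makes Sentinel itself highly available. Clients do not need to know in advance which node is the real master: instead of being configured with a fixed Redis address, they ask Sentinel, which tells them who the current master is, and they then connect to that master.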

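Failover

Monitoring multiple groups

A single set of Sentinels can monitor several master/slave groups; each group is identified in the configuration by its master-name.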
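As a rough illustration of monitoring several groups from one Sentinel, the sketch below renders a Sentinel configuration with one `sentinel monitor` line per master-name. The helper `render_sentinel_conf`, the group names `orders-master` and `cache-master`, and their addresses are hypothetical, chosen only for this example.

```python
def render_sentinel_conf(port, groups, down_after_ms=30000):
    """Render a minimal Sentinel config that monitors several master groups.

    `groups` maps a master-name to (ip, port, quorum). All names and
    addresses passed in are illustrative assumptions, not real hosts.
    """
    lines = [f"port {port}"]
    for name, (ip, master_port, quorum) in groups.items():
        lines.append(f"sentinel monitor {name} {ip} {master_port} {quorum}")
        lines.append(f"sentinel down-after-milliseconds {name} {down_after_ms}")
    return "\n".join(lines) + "\n"


conf = render_sentinel_conf(
    26379,
    {
        "orders-master": ("192.168.91.136", 7000, 2),  # hypothetical group
        "cache-master": ("192.168.91.136", 8000, 2),   # hypothetical group
    },
)
print(conf)
```

Each master-name gets its own settings, so timeouts and quorum can differ per group.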

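2. Installation and Configuration

Step 1: Configure and start the master and slave nodes

Step 2: Configure and start Sentinel to monitor the master. (Sentinel is a special mode of Redis.)

Step 3: Create the configuration files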

redis-7000.conf:

port 7000
daemonize yes
pidfile /usr/local/redis/data/redis-7000.pid
logfile "7000.log"
dir "/usr/local/redis/data"
protected-mode no

redis-7001.conf:

port 7001
daemonize yes
pidfile /usr/local/redis/data/redis-7001.pid
logfile "7001.log"
dir "/usr/local/redis/data"
slaveof 192.168.91.136 7000
protected-mode no

redis-7002.conf:

port 7002
daemonize yes
pidfile /usr/local/redis/data/redis-7002.pid
logfile "7002.log"
dir "/usr/local/redis/data"
slaveof 192.168.91.136 7000
protected-mode no

Step 4: Edit the sentinel.conf configuration

cp sentinel.conf config/ # copy the template into the config directory
cat sentinel.conf | grep -v "#" | grep -v "^$" > redis-sentinel-26379.conf # strip comments and blank lines, then write the result to a new file

redis-sentinel-26379.conf

port 26379
daemonize yes
dir /usr/local/redis/data
logfile "26379.log"
sentinel monitor mymaster 192.168.91.136 7000 2
sentinel down-after-milliseconds mymaster 30000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
protected-mode no

Step 5: Start Sentinel

# from the redis directory
./bin/redis-sentinel ./config/redis-sentinel-26379.conf

Step 6: Check that it is running

# from the redis directory
./bin/redis-cli -p 26379

Step 7: Inspect the configuration file

# from the config directory
cat redis-sentinel-26379.conf

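You will find that Sentinel has rewritten the configuration file (the listing below appears to have been captured after a failover, since the monitored master is now 7002):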

port 26379
daemonize yes
dir "/usr/local/redis/data"
logfile "26379.log"
sentinel myid 5d7bfef985963848542600940b543b81db96d9b1
sentinel monitor mymaster 192.168.91.136 7002 2
sentinel down-after-milliseconds mymaster 10000
sentinel config-epoch mymaster 1
protected-mode no
# Generated by CONFIG REWRITE
sentinel leader-epoch mymaster 1
sentinel known-slave mymaster 192.168.91.136 7000
sentinel known-slave mymaster 192.168.91.136 7001
sentinel current-epoch 0

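Compared with the original file, some default settings have been removed, and new entries have been added from about line 9 onward (the section marked "Generated by CONFIG REWRITE").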
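To read the rewritten state programmatically, something like the following sketch could be used. It is a minimal parser written for this article, not a Redis tool: the function name `parse_sentinel_conf` is hypothetical, and it assumes the file monitors a single master.

```python
def parse_sentinel_conf(text):
    """Extract the monitored master, quorum and known slaves from config text.

    Assumes a single monitored master; other directives are ignored.
    """
    state = {"master": None, "quorum": None, "slaves": []}
    for raw in text.splitlines():
        parts = raw.split()
        if len(parts) < 2 or parts[0] != "sentinel":
            continue  # skip comments, blank lines and non-sentinel directives
        if parts[1] == "monitor" and len(parts) == 6:
            state["master"] = (parts[3], int(parts[4]))
            state["quorum"] = int(parts[5])
        # Newer Redis versions write "known-replica" instead of "known-slave".
        elif parts[1] in ("known-slave", "known-replica") and len(parts) == 5:
            state["slaves"].append((parts[3], int(parts[4])))
    return state


rewritten = """\
sentinel monitor mymaster 192.168.91.136 7002 2
# Generated by CONFIG REWRITE
sentinel known-slave mymaster 192.168.91.136 7000
sentinel known-slave mymaster 192.168.91.136 7001
"""
state = parse_sentinel_conf(rewritten)
print(state["master"], state["slaves"])
```

Run against the listing above, this reports 7002 as the master and 7000 and 7001 as its slaves.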

Step 8: Copy the Sentinel configuration file

sed "s/26379/26380/g" redis-sentinel-26379.conf > redis-sentinel-26380.conf
sed "s/26379/26381/g" redis-sentinel-26379.conf > redis-sentinel-26381.conf

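Then start these two Sentinels as well. If the file was copied after Sentinel had already rewritten it, delete the sentinel myid line from the copies first, so that each Sentinel generates its own ID.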

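3. Client Connection

3.1 How It Works

Step 1:

The client is given the set of Sentinel addresses and the masterName. It iterates over the Sentinel set until it finds a Sentinel node that is reachable.

Step 2:

Having found a usable node (call it sentinel-k), the client asks it for the master's address, for example with sentinel get-master-addr-by-name <masterName>. Sentinel keeps this information current by querying the master periodically.

Step 3:

Once the client has the master's address, it runs role or info replication against that node to verify that it really is a master.

Step 4:

Sentinel detects when the master changes. Clients learn of the change through publish/subscribe: the client subscribes to a Sentinel channel (such as +switch-master), receives the new master's address, and reconnects to it.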
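Steps 1 to 3 can be sketched as a small lookup loop. This is a simulation rather than real client code: `resolve_master`, `fake_ask`, and `fake_role` are hypothetical names, and the two callables stand in for the network calls a real client library would make.

```python
def resolve_master(sentinels, master_name, ask_sentinel, check_role):
    """Return the (host, port) of the current master, or raise if none found.

    ask_sentinel(sentinel, master_name) -> (host, port), or raises ConnectionError
    check_role(address) -> role string such as "master" or "slave"
    Both callables are stand-ins for real network calls.
    """
    for sentinel in sentinels:
        try:
            address = ask_sentinel(sentinel, master_name)
        except ConnectionError:
            continue  # this Sentinel is unreachable, try the next one
        if address is not None and check_role(address) == "master":
            return address  # role verified on the node itself
    raise RuntimeError(f"no Sentinel gave a verified master for {master_name}")


# Simulated cluster: the first Sentinel is down, the others know the master.
def fake_ask(sentinel, name):
    if sentinel == ("192.168.91.136", 26379):
        raise ConnectionError("sentinel unreachable")
    return ("192.168.91.136", 7002)


def fake_role(address):
    return "master" if address == ("192.168.91.136", 7002) else "slave"


sentinel_set = [("192.168.91.136", port) for port in (26379, 26380, 26381)]
master = resolve_master(sentinel_set, "mymaster", fake_ask, fake_role)
print(master)
```

Step 4 (subscribing for master changes) is left out of the sketch; in practice the client library handles it.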

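3.2 Integration Process

  1. The set of Sentinel addresses
  2. The masterName
  3. Sentinel is not a proxy: the client connects to the master directly, and there is no need to ask Sentinel for the master on every request as long as the master has not changed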

3.3 Jedis

JedisSentinelPool sentinelPool = new JedisSentinelPool(masterName,sentinelSet,poolConfig,timeout);
Jedis jedis=null;
try{
    jedis=sentinelPool.getResource();
    //jedis command
}catch(Exception e){
    logger.error(e.getMessage(),e);
}finally{
    if(jedis != null){
        jedis.close();
    }
}

3.4 redis-py

from redis.sentinel import Sentinel
sentinel = Sentinel([('localhost',26379),('localhost',26380),('localhost',26381)],socket_timeout=0.1)
sentinel.discover_master('mymaster')
sentinel.discover_slaves('mymaster')

4. Failover

Test code:

package com.liuyao;

import lombok.extern.slf4j.Slf4j;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.JedisSentinelPool;

import java.util.HashSet;
import java.util.Random;
import java.util.Timer;
import java.util.concurrent.TimeUnit;

/**
 * Created By liuyao on 2018/6/14 17:03.
 */
@Slf4j
public class RedisSentinelFailoverTest {
    public static void main(String[] args) {
        String masterName = "mymaster";
        HashSet<String> sentinels = new HashSet<>();
        sentinels.add("192.168.91.136:26379");
        sentinels.add("192.168.91.136:26380");
        sentinels.add("192.168.91.136:26381");
        JedisSentinelPool jedisSentinelPool = new JedisSentinelPool(masterName, sentinels);
        int n = 0;
        while (true) {

            Jedis jedis = null;
            try {
                jedis = jedisSentinelPool.getResource();
                int index = new Random().nextInt(10000);
                String key = "k-" + index;
                String value = "v-" + index;
                jedis.set(key, value);
                n++;
                if (n % 100 == 0) {
                    log.info("{},{}", key, jedis.get(key));
                }
                TimeUnit.MILLISECONDS.sleep(10);
            } catch (Exception e) {
                log.info("{}", e.getMessage());
            } finally {
                if (jedis != null) {
                    jedis.close();
                }
            }
        }
    }
}

Test output:

Jun 14, 2018 10:45:17 PM redis.clients.jedis.JedisSentinelPool initSentinels
INFO: Trying to find master from available Sentinels...
Jun 14, 2018 10:45:17 PM redis.clients.jedis.JedisSentinelPool initSentinels
INFO: Redis master running at 192.168.91.136:7000, starting Sentinel listeners...
Jun 14, 2018 10:45:18 PM redis.clients.jedis.JedisSentinelPool initPool
INFO: Created JedisPool to master at 192.168.91.136:7000
22:45:19.075 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-4553,v-4553
22:45:20.114 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-3739,v-3739
22:45:21.148 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-4176,v-4176
22:45:22.182 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-1727,v-1727
22:45:23.222 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-7951,v-7951
22:45:24.255 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-2284,v-2284
22:45:25.292 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-1070,v-1070
22:45:26.335 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-6078,v-6078
22:45:27.381 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-9107,v-9107
22:45:28.420 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-6236,v-6236
22:45:29.456 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-8493,v-8493
22:45:30.506 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-6578,v-6578
22:45:31.551 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-1610,v-1610
22:45:32.607 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-2974,v-2974
22:45:33.651 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-5087,v-5087
22:45:34.697 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-3468,v-3468
22:45:35.736 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-2507,v-2507
22:45:36.782 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-7331,v-7331
22:45:37.824 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-4611,v-4611
22:45:38.880 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-926,v-926
22:45:39.912 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-2248,v-2248
22:45:40.955 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-6121,v-6121
22:45:41.999 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-2163,v-2163
22:45:43.039 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-4506,v-4506
22:45:44.082 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-3175,v-3175
22:45:45.132 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-3269,v-3269
22:45:46.189 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-4802,v-4802
22:45:47.227 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-471,v-471
22:45:47.999 [main] INFO com.liuyao.RedisSentinelFailoverTest - Unexpected end of stream.
22:45:49.004 [main] INFO com.liuyao.RedisSentinelFailoverTest - Could not get a resource from the pool
22:45:50.015 [main] INFO com.liuyao.RedisSentinelFailoverTest - Could not get a resource from the pool
22:45:51.016 [main] INFO com.liuyao.RedisSentinelFailoverTest - Could not get a resource from the pool
22:45:52.028 [main] INFO com.liuyao.RedisSentinelFailoverTest - Could not get a resource from the pool
22:45:53.039 [main] INFO com.liuyao.RedisSentinelFailoverTest - Could not get a resource from the pool
22:45:54.041 [main] INFO com.liuyao.RedisSentinelFailoverTest - Could not get a resource from the pool
22:45:55.053 [main] INFO com.liuyao.RedisSentinelFailoverTest - Could not get a resource from the pool
22:45:56.055 [main] INFO com.liuyao.RedisSentinelFailoverTest - Could not get a resource from the pool
22:45:57.057 [main] INFO com.liuyao.RedisSentinelFailoverTest - Could not get a resource from the pool
22:45:58.059 [main] INFO com.liuyao.RedisSentinelFailoverTest - Could not get a resource from the pool
Jun 14, 2018 10:45:58 PM redis.clients.jedis.JedisSentinelPool initPool
INFO: Created JedisPool to master at 192.168.91.136:7002
Jun 14, 2018 10:45:58 PM redis.clients.jedis.JedisSentinelPool initPool
INFO: Created JedisPool to master at 192.168.91.136:7002
22:45:59.060 [main] INFO com.liuyao.RedisSentinelFailoverTest - Could not get a resource from the pool
22:45:59.328 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-4156,v-4156
22:46:00.375 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-3059,v-3059
22:46:01.423 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-1983,v-1983
22:46:02.469 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-4415,v-4415
22:46:03.518 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-8249,v-8249
22:46:04.560 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-5494,v-5494
22:46:05.605 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-3830,v-3830
22:46:06.652 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-2295,v-2295
22:46:07.701 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-609,v-609
22:46:08.742 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-6854,v-6854
22:46:09.783 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-4389,v-4389
22:46:10.832 [main] INFO com.liuyao.RedisSentinelFailoverTest - k-6122,v-6122

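During the run above, the original master (7000) was shut down partway through. After a short interval (roughly 10 seconds in this log), Sentinel automatically switched the master to 7002.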
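The log shows that every request fails during the failover window. One common way for a client to ride this out is to retry with a pause between attempts. The sketch below illustrates that idea; it is not part of the test above, the helper name `call_with_retry` and its defaults are assumptions, and the failing operation is simulated.

```python
import time


def call_with_retry(operation, attempts=15, delay=1.0, sleep=time.sleep):
    """Run `operation`, retrying on failure so a failover window can pass.

    `attempts` (at least 1) and `delay` are illustrative defaults, not tuned values.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except Exception as error:  # a real client would catch its connection errors
            last_error = error
            sleep(delay)
    raise last_error


# Simulated operation: fails 3 times (master down), then succeeds (new master).
calls = {"count": 0}


def flaky_set():
    calls["count"] += 1
    if calls["count"] <= 3:
        raise ConnectionError("Could not get a resource from the pool")
    return "OK"


result = call_with_retry(flaky_set, sleep=lambda seconds: None)  # skip real sleeping
print(result, calls["count"])
```

The total retry budget (attempts times delay) should comfortably exceed down-after-milliseconds plus the time the failover itself takes.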

4.1 Log Analysis

7000:

7001:

7002:

26379:

26380:

26381:

5. Three Scheduled Tasks

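  1. Every 10 seconds, each Sentinel runs info against the master and the slaves
    1. Discovers slave nodes: we never listed the slaves in the Sentinel configuration; Sentinel learns about them from the master's info output
    2. Confirms the master-slave relationships

  2. Every 2 seconds, each Sentinel exchanges information through a channel on the master (pub/sub)
    1. The exchange happens on the __sentinel__:hello channel
    2. Each Sentinel shares its "view" of the nodes along with its own information

  3. Every 1 second, each Sentinel pings the other Sentinels and the Redis nodes
    1. This is the heartbeat check and the basis for deciding that a node has failed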
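The three intervals can be pictured with a toy schedule. This is a simplified model, assuming all three timers are aligned to whole seconds starting from zero, which a real Sentinel does not guarantee; the function name `due_tasks` is hypothetical.

```python
def due_tasks(second):
    """Return the Sentinel periodic tasks that fire at a given whole second."""
    tasks = ["ping"]           # every 1 second: heartbeat
    if second % 2 == 0:
        tasks.append("hello")  # every 2 seconds: pub/sub exchange on the hello channel
    if second % 10 == 0:
        tasks.append("info")   # every 10 seconds: INFO against master and slaves
    return tasks


for second in range(1, 11):
    print(second, due_tasks(second))
```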

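6. Subjective Down and Objective Down

The masterName distinguishes between multiple master groups. The quorum is the number of Sentinels that must agree: a node is only declared objectively down once the quorum is reached. The number of Sentinel nodes is usually odd, with the quorum set to n/2 + 1.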

sentinel monitor <masterName> <ip> <port> <quorum>
sentinel monitor myMaster 192.168.91.136 6379 2

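Subjective down applies to both masters and slaves: if a Sentinel receives no valid reply to its ping within 30 seconds, it considers the node down.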

sentinel down-after-milliseconds <masterName> <timeout>
sentinel down-after-milliseconds mymaster 30000

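**Subjective down:** a single Sentinel node's own "opinion" that a Redis node has failed

**Objective down:** the Sentinel nodes reach a consensus that a Redis node has failed (agreement is reached once at least quorum Sentinels report it as down)

Sentinels run sentinel is-master-down-by-addr against each other to ask whether the other Sentinels also consider the master to be down.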
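The two decisions can be written as two small predicates. This is a minimal sketch of the rules described above, not Sentinel's actual implementation; the function names and the timestamps are made up for illustration.

```python
def is_subjectively_down(last_valid_reply_ms, now_ms, down_after_ms=30000):
    """One Sentinel's own view: no valid PING reply for longer than the timeout."""
    return now_ms - last_valid_reply_ms > down_after_ms


def is_objectively_down(votes, quorum):
    """Shared view: at least `quorum` Sentinels report the master as down."""
    return sum(1 for down in votes if down) >= quorum


now = 100_000
# Time of the last valid reply seen by each of three Sentinels (made-up values).
last_replies = [60_000, 65_000, 95_000]
votes = [is_subjectively_down(t, now) for t in last_replies]
print(votes, is_objectively_down(votes, quorum=2))
```

Here two of the three Sentinels have not heard from the master for more than 30 seconds, which meets a quorum of 2.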

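7. Leader Election

**Why:** only one Sentinel node should carry out the failover

**Election:** every Sentinel tries to become the leader by sending the sentinel is-master-down-by-addr command

  1. Each Sentinel that has marked the master as subjectively down sends the command to the other Sentinels, asking to be made leader.
  2. A Sentinel that receives the request grants it if it has not already agreed to another Sentinel's request; otherwise it rejects it.
  3. If a Sentinel finds that its votes amount to more than half of the Sentinel set and also reach the quorum, it becomes the leader.
  4. If no Sentinel wins the round (for example because the votes are split), the election is repeated after a waiting period.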
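The voting rule in the list above can be simulated in a few lines. This is a sketch of one election round under simplifying assumptions (every vote request arrives, in a fixed order); `elect_leader` and the Sentinel names `s1` to `s3` are hypothetical.

```python
def elect_leader(requests, sentinel_count, quorum):
    """Simulate one election round.

    `requests` is a list of (voter, candidate) pairs in arrival order; each
    voter grants its vote to the first candidate that asks and rejects the rest.
    Returns the winning candidate, or None if nobody has enough votes.
    """
    granted = {}  # voter -> candidate it voted for
    for voter, candidate in requests:
        granted.setdefault(voter, candidate)  # first request wins the vote
    tally = {}
    for candidate in granted.values():
        tally[candidate] = tally.get(candidate, 0) + 1
    needed = max(sentinel_count // 2 + 1, quorum)  # majority and quorum
    winners = [c for c, n in tally.items() if n >= needed]
    return winners[0] if winners else None


# s1 and s2 both campaign; each votes for itself first, and s3 hears s1 first.
requests = [("s1", "s1"), ("s2", "s2"), ("s3", "s1"),
            ("s3", "s2"), ("s1", "s2"), ("s2", "s1")]
print(elect_leader(requests, sentinel_count=3, quorum=2))
```

Because each voter grants only one vote per round, at most one candidate can reach a majority.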

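8. Failover

  1. Select a suitable slave node to become the new master.

  2. Run slaveof no one on that slave to promote it to master.

  3. Send commands to the remaining slaves so that they become slaves of the new master; how many of them resynchronize at the same time is governed by the parallel-syncs parameter.

  4. Record the old master as a slave in the configuration and keep "watching" it; when it comes back, instruct it to replicate from the new master.

Selecting a suitable slave node:

  1. Choose the slave with the highest priority as set by slave-priority (in Redis, a lower non-zero value means a higher priority); if one stands out, return it, otherwise continue. (This is usually left unconfigured, unless you want a particular, better-provisioned machine to be preferred.)
  2. Choose the slave with the largest replication offset (the most complete copy of the data); if one stands out, return it, otherwise continue. (The master is already down and some data may be missing, so the slave with the largest offset is the best candidate.)
  3. Choose the slave with the smallest runid. The runid is effectively random, so this only serves as a deterministic tie-breaker.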
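The three selection rules amount to a sort with a compound key. The sketch below covers only those three rules; a real Sentinel also excludes slaves that are disconnected or too stale before ranking them. The function name `pick_new_master` and the sample values are assumptions.

```python
def pick_new_master(slaves):
    """Pick the promotion candidate from a list of slave descriptions.

    Each slave is a dict with `priority` (lower value is preferred, 0 means
    never promote), `offset` (replication offset) and `runid`.
    """
    candidates = [s for s in slaves if s["priority"] != 0]
    if not candidates:
        return None
    # Compound key: lowest priority value, then largest offset, then smallest runid.
    return min(candidates, key=lambda s: (s["priority"], -s["offset"], s["runid"]))


slaves = [  # made-up values for illustration
    {"name": "7001", "priority": 100, "offset": 4200, "runid": "b7e1"},
    {"name": "7002", "priority": 100, "offset": 4310, "runid": "c93a"},
]
print(pick_new_master(slaves)["name"])
```

With equal priorities, the slave on 7002 wins because its replication offset is larger.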

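9. Operations

9.1 Master Node Maintenance

To perform a manual failover, run the command below on any one of the Sentinels; a new master is set directly and the failover is carried out.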

sentinel failover <masterName>
