2. How it works
Single Redis Sentinel architecture:
Multiple Redis Sentinel architecture:
With multiple sentinels, they not only monitor the master and slave databases simultaneously, but also monitor one another.
3. The master-slave environment before configuring Redis Sentinel
We start from an environment with one master and multiple slaves:

[root@master redis-master-slave]# ps -ef | grep redis
root      2027     1  0 07:30 ?        00:00:00 /usr/local/bin/redis-server 127.0.0.1:6380
root      2031     1  0 07:31 ?        00:00:00 /usr/local/bin/redis-server 127.0.0.1:6381
root      2036     1  0 07:31 ?        00:00:00 /usr/local/bin/redis-server 127.0.0.1:6382
127.0.0.1:6380> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6381,state=online,offset=113,lag=1
slave1:ip=127.0.0.1,port=6382,state=online,offset=113,lag=1
master_repl_offset:113
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:112
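An INFO reply is just lines of key:value pairs, so it is easy to consume programmatically. A minimal illustrative sketch (not from the original article); the sample text is the reply shown above:

```python
# Parse the text of an "INFO replication" reply into a dict.
# In practice this text would come from redis-cli or a client library.
info_text = """\
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6381,state=online,offset=113,lag=1
slave1:ip=127.0.0.1,port=6382,state=online,offset=113,lag=1
master_repl_offset:113
"""

def parse_info(text):
    """Turn 'key:value' lines into a dict, skipping '#' section headers."""
    result = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        result[key] = value
    return result

info = parse_info(info_text)
print(info["role"])              # -> master
print(info["connected_slaves"])  # -> 2
```

A monitoring script can use this, for example, to alert when connected_slaves drops below the expected count.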
4. Configuring the sentinel
Copy sentinel.conf from the source distribution into a directory of your choice; here it goes into /opt/redis/redis-master-slave:

[root@master redis-master-slave]# cp /opt/redis/redis-3.2.1/sentinel.conf /opt/redis/redis-master-slave/
Edit sentinel.conf, changing

sentinel monitor mymaster 127.0.0.1 6379 2

to

sentinel monitor mymaster 127.0.0.1 6380 1
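For context, the fields of the monitor directive are: a name for the master, its address and port, and the quorum, i.e. how many sentinels must agree that the master is unreachable before it is marked objectively down. With a single sentinel running, a quorum of 1 is the only value that can ever be reached:

```
# sentinel monitor <master-name> <ip> <port> <quorum>
sentinel monitor mymaster 127.0.0.1 6380 1
```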
Start the sentinel process:
[root@master redis-master-slave]# /usr/local/bin/redis-sentinel /opt/redis/redis-master-slave/sentinel.conf
From the startup output we can see:
1. The sentinel is up, with ID 236f14b361fc5a0dc0621cf88823ed6e6252b2f3.
2. A monitor was registered for the master database.
3. Two slaves were discovered (so the slaves never need to be listed in the sentinel's configuration: you only point it at the master, and it discovers the slaves on its own).
5. Slave Redis crash test
Kill the process of one of the slaves, 6382:

[root@master hadoop]# ps -ef | grep redis
root      2027     1  0 07:30 ?        00:00:02 /usr/local/bin/redis-server 127.0.0.1:6380
root      2031     1  0 07:31 ?        00:00:02 /usr/local/bin/redis-server 127.0.0.1:6381
root      2036     1  0 07:31 ?        00:00:02 /usr/local/bin/redis-server 127.0.0.1:6382
root      2043  1937  0 07:32 pts/1    00:00:00 /usr/local/bin/redis-cli -p 6380
root      2044  1950  0 07:32 pts/2    00:00:00 /usr/local/bin/redis-cli -p 6381
root      2045  1965  0 07:32 pts/3    00:00:00 /usr/local/bin/redis-cli -p 6382
root      2461  1817  0 08:14 pts/0    00:00:00 /usr/local/bin/redis-sentinel *:26379 [sentinel]
root      2471  2179  0 08:18 pts/4    00:00:00 grep redis
[root@master hadoop]# kill -9 2036
About 30 seconds later the sentinel console prints:

2461:X 18 Jul 08:19:49.922 # +sdown slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6380

This shows the sentinel has detected that the slave I just killed is down.
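These log lines follow a fixed pattern: an event name (+sdown, -sdown, +reboot, ...), the instance type, its name, address and port, and for slaves the master they belong to after an @. As an illustrative sketch (my own, covering only the lines shown in this article), they can be picked apart with a regex:

```python
import re

# Sentinel event lines look like:
#   <event> <instance-type> <name> <ip> <port> [@ <master-name> <ip> <port>]
# Minimal regex for the log lines quoted in this article (not a full parser).
LINE_RE = re.compile(
    r"[#*] (?P<event>[+-]\S+) (?P<itype>\S+) (?P<name>\S+) (?P<ip>\S+) (?P<port>\d+)"
)

line = ("2461:X 18 Jul 08:19:49.922 # +sdown slave 127.0.0.1:6382 "
        "127.0.0.1 6382 @ mymaster 127.0.0.1 6380")

m = LINE_RE.search(line)
print(m.group("event"), m.group("itype"), m.group("name"))
# -> +sdown slave 127.0.0.1:6382
```

Tailing the sentinel log and matching on the event field is a simple way to drive alerts on +sdown events.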
Restart the 6382 instance we just killed:
[root@master redis-master-slave]# /usr/local/bin/redis-server /opt/redis/redis-master-slave/6382/redis.conf
Watch the sentinel console output:

2461:X 18 Jul 08:24:27.580 * +reboot slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6380
2461:X 18 Jul 08:24:27.631 # -sdown slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6380
As we can see, the slave has rejoined replication; the -sdown event means the instance is no longer considered subjectively down, i.e. it has recovered.
127.0.0.1:6380> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6381,state=online,offset=67317,lag=0
slave1:ip=127.0.0.1,port=6382,state=online,offset=67184,lag=0
master_repl_offset:67317
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:67316
6. Master Redis crash test
Kill the master redis process and watch the sentinel console:

2461:X 18 Jul 08:29:19.276 # +sdown master mymaster 127.0.0.1 6380 [the master is down]
2461:X 18 Jul 08:29:19.276 # +odown master mymaster 127.0.0.1 6380 #quorum 1/1
2461:X 18 Jul 08:29:19.276 # +new-epoch 1
2461:X 18 Jul 08:29:19.276 # +try-failover master mymaster 127.0.0.1 6380 [attempting a failover]
2461:X 18 Jul 08:29:19.282 # +vote-for-leader 236f14b361fc5a0dc0621cf88823ed6e6252b2f3 1 [voting for a sentinel leader; with only one sentinel, it elects itself]
2461:X 18 Jul 08:29:19.282 # +elected-leader master mymaster 127.0.0.1 6380 [leader elected]
2461:X 18 Jul 08:29:19.282 # +failover-state-select-slave master mymaster 127.0.0.1 6380 [selecting one of the slaves as the new master]
2461:X 18 Jul 08:29:19.345 # +selected-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380 [6381 selected as the new master]
2461:X 18 Jul 08:29:19.345 * +failover-state-send-slaveof-noone slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380 [sending the SLAVEOF NO ONE command]
2461:X 18 Jul 08:29:19.401 * +failover-state-wait-promotion slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380 [waiting for the promotion]
2461:X 18 Jul 08:29:20.291 # +promoted-slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380 [6381 promoted to master]
2461:X 18 Jul 08:29:20.291 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6380
2461:X 18 Jul 08:29:20.361 * +slave-reconf-sent slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6380
2461:X 18 Jul 08:29:20.666 * +slave-reconf-inprog slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6380
2461:X 18 Jul 08:29:21.694 * +slave-reconf-done slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6380
2461:X 18 Jul 08:29:21.770 # +failover-end master mymaster 127.0.0.1 6380 [failover complete]
2461:X 18 Jul 08:29:21.770 # +switch-master mymaster 127.0.0.1 6380 127.0.0.1 6381 [master switched from 6380 to 6381]
2461:X 18 Jul 08:29:21.771 * +slave slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6381 [6382 added as a slave of 6381]
2461:X 18 Jul 08:29:21.771 * +slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381 [6380 added as a slave of 6381]
2461:X 18 Jul 08:29:51.828 # +sdown slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381 [6380 is detected as down; the sentinel waits for it to come back]
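The +switch-master event carries both the old and the new master address, which is the key piece of information for anything that needs to follow a failover. A small illustrative sketch (my own) extracting both from the log line above:

```python
# +switch-master format:
#   +switch-master <master-name> <old-ip> <old-port> <new-ip> <new-port>
line = ("2461:X 18 Jul 08:29:21.770 # +switch-master mymaster "
        "127.0.0.1 6380 127.0.0.1 6381")

# Take everything after the event name and split on whitespace.
fields = line.split("+switch-master ", 1)[1].split()
name, old_ip, old_port, new_ip, new_port = fields
print(f"master '{name}' moved from {old_ip}:{old_port} to {new_ip}:{new_port}")
# -> master 'mymaster' moved from 127.0.0.1:6380 to 127.0.0.1:6381
```

This is also why clients should ask the sentinel for the current master address (SENTINEL get-master-addr-by-name) rather than hard-coding it.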
The replication info confirms that 6381 is now the master, with one slave:

127.0.0.1:6381> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6382,state=online,offset=35213,lag=1
master_repl_offset:35346
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:35345
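The offset fields quantify how far a slave trails the master: master_repl_offset is how many bytes the master has written to the replication stream, and each slave's offset is how much of it that slave has processed. Using the numbers above (an illustrative calculation, not from the original article):

```python
# Replication lag in bytes = master's offset minus the slave's offset.
# Values taken from the INFO replication output above.
master_repl_offset = 35346
slave0_offset = 35213

lag_bytes = master_repl_offset - slave0_offset
print(lag_bytes)  # -> 133
```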
Bring 6380 back up:
[root@master redis-master-slave]# /usr/local/bin/redis-server /opt/redis/redis-master-slave/6380/redis.conf

Watch the sentinel console output:

2461:X 18 Jul 08:42:35.202 # -sdown slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
2461:X 18 Jul 08:42:45.207 * +convert-to-slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381

The old master 6380 is not restored as master; instead the sentinel converts it into a slave of the new master 6381.
Check the replication info:

127.0.0.1:6381> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6382,state=online,offset=56432,lag=0
slave1:ip=127.0.0.1,port=6380,state=online,offset=56432,lag=0
master_repl_offset:56432
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:56431
7. Configuring multiple monitors
Edit sentinel.conf and configure two monitors:

sentinel monitor mymaster2 127.0.0.1 6381 2
sentinel monitor mymaster 127.0.0.1 6381 1

Both entries point at the current master 6381; mymaster2 uses a quorum of 2 while mymaster uses a quorum of 1.
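The quorum difference matters: a master is only marked objectively down (+odown), and failover only starts, once at least `quorum` sentinels report it as subjectively down. With a single running sentinel, mymaster (quorum 1) can fail over but mymaster2 (quorum 2) never can. A toy sketch of that rule (my own illustration):

```python
# Objective down (+odown) is declared only when at least `quorum`
# sentinels report the master as subjectively down (+sdown).
def reaches_odown(sdown_votes, quorum):
    """True if enough sentinels agree the master is down."""
    return sdown_votes >= quorum

running_sentinels = 1
print(reaches_odown(running_sentinels, quorum=1))  # mymaster  -> True
print(reaches_odown(running_sentinels, quorum=2))  # mymaster2 -> False
```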
Start the sentinel service:
[root@master redis-master-slave]# /usr/local/bin/redis-sentinel /opt/redis/redis-master-slave/sentinel.conf
[root@master redis-master-slave]# ps -ef | grep redis
root      2031     1  0 07:31 ?        00:00:05 /usr/local/bin/redis-server 127.0.0.1:6381
root      2486     1  0 08:24 ?        00:00:02 /usr/local/bin/redis-server 127.0.0.1:6382
root      2536     1  0 08:42 ?        00:00:01 /usr/local/bin/redis-server 127.0.0.1:6380
root      2579  1817  0 08:58 pts/0    00:00:00 /usr/local/bin/redis-sentinel *:26379 [sentinel]
root      2603  2179  0 09:01 pts/4    00:00:00 grep redis
Kill the slave redis 6382 and watch the sentinel console output:

2579:X 18 Jul 09:01:56.999 # +sdown slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster2 127.0.0.1 6381
2579:X 18 Jul 09:01:56.999 # +sdown slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6381

Bring 6382 back:

2579:X 18 Jul 09:10:40.223 * +reboot slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster2 127.0.0.1 6381
2579:X 18 Jul 09:10:40.275 # -sdown slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster2 127.0.0.1 6381
2579:X 18 Jul 09:10:40.478 * +reboot slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6381
2579:X 18 Jul 09:10:40.561 # -sdown slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6381

Note that each event is reported once per monitor (mymaster and mymaster2).
Stop the master redis:
2579:X 18 Jul 09:12:29.324 # +sdown master mymaster2 127.0.0.1 6381
2579:X 18 Jul 09:12:29.324 # +sdown master mymaster 127.0.0.1 6381
2579:X 18 Jul 09:12:29.324 # +odown master mymaster 127.0.0.1 6381 #quorum 1/1
2579:X 18 Jul 09:12:29.324 # +new-epoch 2
2579:X 18 Jul 09:12:29.324 # +try-failover master mymaster 127.0.0.1 6381
2579:X 18 Jul 09:12:29.329 # +vote-for-leader 236f14b361fc5a0dc0621cf88823ed6e6252b2f3 2
2579:X 18 Jul 09:12:29.329 # +elected-leader master mymaster 127.0.0.1 6381
2579:X 18 Jul 09:12:29.329 # +failover-state-select-slave master mymaster 127.0.0.1 6381
2579:X 18 Jul 09:12:29.382 # +selected-slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
2579:X 18 Jul 09:12:29.382 * +failover-state-send-slaveof-noone slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
2579:X 18 Jul 09:12:29.459 * +failover-state-wait-promotion slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
2579:X 18 Jul 09:12:30.113 # +promoted-slave slave 127.0.0.1:6380 127.0.0.1 6380 @ mymaster 127.0.0.1 6381
2579:X 18 Jul 09:12:30.113 # +failover-state-reconf-slaves master mymaster 127.0.0.1 6381
2579:X 18 Jul 09:12:30.162 * +slave-reconf-sent slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6381
2579:X 18 Jul 09:12:30.338 * +slave-reconf-inprog slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6381
2579:X 18 Jul 09:12:31.343 * +slave-reconf-done slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6381
2579:X 18 Jul 09:12:31.410 # +failover-end master mymaster 127.0.0.1 6381
2579:X 18 Jul 09:12:31.410 # +switch-master mymaster 127.0.0.1 6381 127.0.0.1 6380
2579:X 18 Jul 09:12:31.411 * +slave slave 127.0.0.1:6382 127.0.0.1 6382 @ mymaster 127.0.0.1 6380
2579:X 18 Jul 09:12:31.411 * +slave slave 127.0.0.1:6381 127.0.0.1 6381 @ mymaster 127.0.0.1 6380

Notice that only mymaster reaches +odown and fails over: mymaster2 has a quorum of 2, which a single sentinel can never satisfy.
Check the replication info:

127.0.0.1:6380> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6382,state=online,offset=3671,lag=1
master_repl_offset:3671
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:3670