**Environment**

| Product | Version |
| --- | --- |
| Redis Sentinel | 3.0.1 (bundled with Redis) |
| HAProxy | 1.5.12 |
**Purpose**

Users can easily set up and configure Redis Sentinel to allow automatic failover between the master and slave Redis replica instances for high availability, using an HAProxy server as the front end.
Redis Sentinel is designed to monitor and manage automatic failover of Redis instances by performing the following tasks, according to the Redis Sentinel documentation (http://redis.io/topics/sentinel):

* Monitoring: Sentinel constantly checks that your master and slave instances are working as expected.
* Notification: Sentinel can notify the system administrator, or another computer program, via an API, that something is wrong with one of the monitored Redis instances.
* Automatic failover: If a master is not working as expected, Sentinel can start a failover process in which a slave is promoted to master. The other slaves are reconfigured to use the new master, and the applications using the Redis server are informed of the new address to use when connecting.
* Configuration provider: Sentinel acts as a source of authority for clients' service discovery. Clients connect to Sentinels to ask for the address of the current Redis master responsible for a given service. If a failover occurs, Sentinels report the new address (see the example query after this list).
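As a minimal illustration of the configuration-provider role, a client can ask Sentinel directly for the address of the current master. The master name (`master01`) and Sentinel port (`26379`) used here are the values configured later in this article:
```
# Ask Sentinel which host/port currently holds the master role for "master01".
# After a failover, this returns the address of the newly promoted master.
redis-cli -p 26379 SENTINEL get-master-addr-by-name master01
```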
We can use Sentinel to monitor the Redis master and handle automatic failover by promoting the slave to master when the master goes down.
HAProxy 1.5+ comes with a built-in TCP health check feature that can be used with Redis to support automatic failover. To avoid having to change the Redis IP/port in the front-end client application after each failover, set up HAProxy with a TCP health check that tests whether a Redis instance is a master or a slave, so that it only proxies connections to the active master instance.
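As a quick sanity check of the same logic outside HAProxy, you can run the test by hand with redis-cli against one of the backend instances; a healthy master must answer PING and report `role:master` in its replication info:
```
# Manual equivalent of the HAProxy tcp-check used later in this article
redis-cli -p 6379 ping                           # healthy instance replies: PONG
redis-cli -p 6379 info replication | grep role   # only the master reports: role:master
```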
For this simple demo, we are using Redis 3.0.1 and HAProxy 1.5.12 as the front-end load balancer for a tcServer/Tomcat7 server that caches users' HTTP sessions, with automatic failover between the backend Redis servers, all running on an Ubuntu 14.04 host.
Layout diagram for Redis Sentinel and HAProxy for TCP load balancing between Redis servers:
Client app ---6378---> HAProxy ---6379/6380---> < Redis_6379 | Redis_6380 >
* HAProxy - A front-end TCP load balancer for proxying incoming client application connections on port 6378 to backend Redis master server.
* Redis_6379 - Master server on port 6379
* Redis_6380 - Slave server on port 6380
* Sentinel - Sentinel running on port 26379 to monitor the master and slave servers and promote the slave to the master role when the master is down.
If you do not have the Redis master/slave or HAProxy servers installed yet, see:
* [How to setup Redis Master and Slave replication](https://discuss.zendesk.com/hc/en-us/articles/205309278-How-to-setup-Redis-Master-and-Salve-replication) KB article
* [Web Server Load-Balancing with HAProxy on Ubuntu 14.04 ](https://www.howtoforge.com/tutorial/ubuntu-load-balancer-haproxy/)
**Procedure**
**Setup and deploy HAProxy load balancer for Redis HA**
1) Update the /etc/haproxy/haproxy.cfg file with the following parameter values. Comments (lines starting with #) are recommended to document the configuration.
```
# Specifies the TCP timeouts used by the frontend ft_redis:
# the max time to wait for a connection attempt to a server to succeed, and the
# max inactivity time on the server and client sides while waiting for acknowledgements or data.
defaults REDIS
    mode tcp
    timeout connect 3s
    timeout server 6s
    timeout client 6s

# Specifies the listening socket for accepting client connections, using the default
# REDIS TCP timeouts and the backend bk_redis TCP health check.
frontend ft_redis
    bind *:6378 name redis
    default_backend bk_redis

# Specifies the backend Redis proxy server TCP health check settings.
# Ensures incoming connections are only forwarded to a reachable master.
backend bk_redis
    option tcp-check
    tcp-check connect
    tcp-check send PING\r\n
    tcp-check expect string +PONG
    tcp-check send info\ replication\r\n
    tcp-check expect string role:master
    tcp-check send QUIT\r\n
    tcp-check expect string +OK
    server redis_6379 localhost:6379 check inter 1s
    server redis_6380 localhost:6380 check inter 1s
```
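Before starting HAProxy, you can validate the configuration syntax; the path below assumes the default Ubuntu location used in step 1:
```
# -c checks the configuration and exits; -f points at the configuration file
haproxy -c -f /etc/haproxy/haproxy.cfg
```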
2) Start up the HAProxy server
```
/etc/init.d/haproxy start
```
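To confirm the proxy is forwarding to the current master (assuming the Redis master on port 6379 from the prerequisite setup is already running), query Redis through HAProxy's front-end port 6378; `tcp_port` shows which backend instance actually answered:
```
redis-cli -p 6378 ping                          # expect: PONG
redis-cli -p 6378 info server | grep tcp_port   # expect: tcp_port:6379 while redis_6379 is the master
```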
**Setup and deploy Redis Sentinel for automatic failover between the Redis Master and Slave servers**
1) Create a /etc/redis/sentinel_26379.conf file with the following lines:
```
# Specifies the Sentinel port, log file, and work directory, and runs it as a daemon service.
port 26379
daemonize yes
pidfile "/var/run/sentinel_26379.pid"
logfile "/var/log/sentinel_26379.log"
dir /var/lib/redis/sentinel_26379

# Monitor the redis_6379 master running on port 6379 and consider it in the ODOWN state when at
# least 1 sentinel (the quorum) agrees, in order to start a failover.
# Sentinel auto-discovers the slaves and rewrites this configuration file to add them,
# so you do not need to specify slaves. Also note that the configuration file is rewritten
# when a slave is promoted to the master state.
sentinel monitor master01 localhost 6379 1

# Specifies the password used to authenticate with the master or slaves if required.
# For Redis instances mixed with 'auth' and 'nonauth', the same password must be set
# in all the instances. The demo Redis servers require no password.
#sentinel auth-pass master01 mypass

# Specifies the number of milliseconds after which the master, slave or sentinel is considered
# down and unreachable in the SDOWN state.
sentinel down-after-milliseconds master01 3000

# Specifies the failover timeout in milliseconds, which is used in the following ways:
# * The time needed to restart a failover after a previous failover attempt on the same master
# * The time needed for a slave replicating from a wrong master, according to the current
#   configuration, to be forced to replicate from the right master
# * The time needed to cancel a failover that is already in progress but did not produce any
#   configuration change (the slave has not yet acknowledged SLAVEOF NO ONE)
# * The max time a failover in progress waits for all the slaves to be reconfigured as slaves
#   of the new master
sentinel failover-timeout master01 6000

# Specifies how many slaves can be reconfigured to point to the new master at the same time
# during the failover. A low number is recommended if the slaves are used to serve queries,
# to avoid all of them becoming unreachable while synchronizing with the new master.
sentinel parallel-syncs master01 1
```
2) Create an executable Sentinel service script at /etc/init.d/sentinel_26379
```
#!/bin/bash
# Start/Stop/Restart script for Redis Sentinel
NAME=`basename ${0}`
EXEC=/usr/local/bin/redis-server
PIDFILE="/var/run/${NAME}.pid"
CONF="/etc/redis/${NAME}.conf"
PID=`cat $PIDFILE 2> /dev/null`
case "$1" in
start)
echo "Starting $NAME ..."
touch $PIDFILE
exec $EXEC $CONF --sentinel --pidfile $PIDFILE
;;
stop)
echo "Stopping $NAME PID: $PID ..."
kill $PID
;;
restart)
echo "Restarting $NAME ..."
$0 stop
sleep 2
$0 start
;;
*)
echo "Usage $0 {start|stop|restart}"
;;
esac
```
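Make the script executable so it can be run as an init service:
```
chmod +x /etc/init.d/sentinel_26379
```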
3) Create a Sentinel work directory and start up the Sentinel, Redis master, and slave services
```
mkdir /var/lib/redis/sentinel_26379
/etc/init.d/sentinel_26379 start
/etc/init.d/redis_6379 start
/etc/init.d/redis_6380 start
```
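Sentinel should now be monitoring the redis_6379 master. A quick way to verify, using the master name configured above:
```
redis-cli -p 26379 ping                        # expect: PONG
redis-cli -p 26379 SENTINEL master master01    # detailed state of the monitored master
```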
**Testing HAProxy and Redis automatic failover to make sure they work as expected.**
Run `tail -f /var/log/sentinel_26379.log` on a separate console to monitor the Sentinel log messages. Check the redis_6379 replication info:
```
> redis-cli -p 6379 info replication
# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
```
The server is a master without a slave.
The sentinel_26379.log shows that the redis_6379 master is up and running (the `-sdown` and `-odown` entries indicate it has exited the SDOWN and ODOWN states):
```
...
26956:X 11 May 23:05:23.632 # -sdown master master01 ::1 6379
26956:X 11 May 23:05:23.632 # -odown master master01 ::1 6379
```
Check the redis_6380 replication info:
```
> redis-cli -p 6380 info replication
# Replication
role:slave
master_host:127.0.0.1
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:42577
slave_priority:100
slave_read_only:1
connected_slaves:0
...
```
The server is a read-only slave, and its master is on host IP 127.0.0.1 and port 6379.
The sentinel_26379.log shows it detected the slave redis_6380 and rewrote the Sentinel configuration file to add it to the redis_6379 master:
```
26956:X 11 May 23:07:34.172 * +slave slave [::1]:6380 ::1 6380 @ master01 ::1 6379
26956:X 11 May 23:07:44.244 * +fix-slave-config slave [::1]:6380 ::1 6380 @ master01 ::1 6379
```
If it does not start up as a slave server, make sure its configuration file has this directive
```
slaveof localhost 6379
```
and restart it. Then check the redis_6379 replication info:
```
> redis-cli -p 6379 info replication
# Replication
role:master
connected_slaves:1
slave0:ip=127.0.0.1,port=6380,state=online,offset=60477,lag=1
master_repl_offset:60477
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:60476
```
Now, we have a master server with one running slave connected.
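Optionally, confirm that writes sent through HAProxy reach the master and replicate to the slave. The key name `ha:test` is just an illustrative value for this demo:
```
# Write through HAProxy (port 6378, proxied to the current master) ...
redis-cli -p 6378 set ha:test hello     # expect: OK
# ... and read the replicated value directly from the slave on 6380
redis-cli -p 6380 get ha:test           # expect: hello
```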
**Let us test the automatic failover by simulating a crash on the master server and letting Sentinel automatically promote the slave to the master role.**
Crash the redis_6379 master:
```
> redis-cli -p 6379 debug segfault
Error: Server closed the connection
```
Check the redis_6380 slave replication info:
```
> redis-cli -p 6380 info replication
# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
```
The server has been converted to the master role with no slave connected.
The sentinel_26379.log shows it detected that the master redis_6379 was down and, based on the quorum of 1, elected itself leader to perform the failover, promoting the slave redis_6380 to the master role by making it a slave of no one. It also rewrote the Sentinel configuration to record the failed master as a slave and redis_6380 as the new master.
```
26956:X 11 May 23:09:52.566 # +sdown master master01 ::1 6379
26956:X 11 May 23:09:52.566 # +odown master master01 ::1 6379 #quorum 1/1
26956:X 11 May 23:09:52.566 # +new-epoch 5
26956:X 11 May 23:09:52.566 # +try-failover master master01 ::1 6379
26956:X 11 May 23:09:52.571 # +vote-for-leader 49c1cc2d0492c9bf60ad0dfc8258ee01986fae4b 5
26956:X 11 May 23:09:52.571 # +elected-leader master master01 ::1 6379
26956:X 11 May 23:09:52.571 # +failover-state-select-slave master master01 ::1 6379
26956:X 11 May 23:09:52.633 # +selected-slave slave [::1]:6380 ::1 6380 @ master01 ::1 6379
26956:X 11 May 23:09:52.634 * +failover-state-send-slaveof-noone slave [::1]:6380 ::1 6380 @ master01 ::1 6379
26956:X 11 May 23:09:52.717 * +failover-state-wait-promotion slave [::1]:6380 ::1 6380 @ master01 ::1 6379
26956:X 11 May 23:09:53.621 # +promoted-slave slave [::1]:6380 ::1 6380 @ master01 ::1 6379
26956:X 11 May 23:09:53.622 # +failover-state-reconf-slaves master master01 ::1 6379
26956:X 11 May 23:09:53.692 # +failover-end master master01 ::1 6379
26956:X 11 May 23:09:53.692 # +switch-master master01 ::1 6379 ::1 6380
26956:X 11 May 23:09:53.693 * +slave slave [::1]:6379 ::1 6379 @ master01 ::1 6380
26956:X 11 May 23:09:56.726 # +sdown slave [::1]:6379 ::1 6379 @ master01 ::1 6380
```
When we fix and start up the crashed redis_6379 server, it should become a slave and be automatically added to the redis_6380 master.
```
/etc/init.d/redis_6379 start
```
Wait a few seconds before checking the server's replication info:
```
> redis-cli -p 6379 info replication
# Replication
role:slave
master_host:::1
master_port:6380
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:4122
slave_priority:100
slave_read_only:1
connected_slaves:0
...
```
The Sentinel log shows it detected the server and converted it into a slave of the redis_6380 master:
```
26956:X 11 May 23:23:59.963 # -sdown slave [::1]:6379 ::1 6379 @ master01 ::1 6380
26956:X 11 May 23:24:09.944 * +convert-to-slave slave [::1]:6379 ::1 6379 @ master01 ::1 6380
```
Check redis_6380 replication info:
```
> redis-cli -p 6380 info replication
# Replication
role:master
connected_slaves:1
slave0:ip=::1,port=6379,state=online,offset=61410,lag=1
master_repl_offset:61410
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:61409
```
The new failover master server now has one redis_6379 slave connected. We can also connect to Sentinel on port 26379 to monitor and manage the Redis master/slave servers. For example, list all monitored masters and their state:
```
> redis-cli -p 26379 sentinel masters
1) 1) "name"
2) "master01"
3) "ip"
4) "::1"
5) "port"
6) "6380"
7) "runid"
8) "d43ff591a1fe556fcd345eb462588c6570c47df5"
9) "flags"
10) "master"
11) "pending-commands"
12) "0"
13) "last-ping-sent"
14) "0"
15) "last-ok-ping-reply"
16) "278"
17) "last-ping-reply"
18) "278"
19) "down-after-milliseconds"
20) "3000"
21) "info-refresh"
22) "3451"
23) "role-reported"
24) "master"
25) "role-reported-time"
26) "1159212"
27) "config-epoch"
28) "5"
29) "num-slaves"
30) "1"
31) "num-other-sentinels"
32) "0"360
33) "quorum"
34) "1"
35) "failover-timeout"
36) "9000"
37) "parallel-syncs"
38) "1"
```
For other sentinel commands, see [http://redis.io/topics/sentinel-clients](http://redis.io/topics/sentinel-clients)
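For example, still using the master name `master01` from this demo, you can list the slaves Sentinel knows about or trigger a manual failover without waiting for the master to fail:
```
redis-cli -p 26379 SENTINEL slaves master01     # list the slaves attached to master01
redis-cli -p 26379 SENTINEL failover master01   # force a failover of master01
```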
**Perform one more test to make sure our HAProxy, Sentinel and Redis HA setup with automatic failover is working correctly.**
On a separate console, use a redis-cli client to connect to the HAProxy server on port 6378, which proxies to the current Redis master, and print its replication information every 2 seconds:
```
> while true; do redis-cli -p 6378 info replication; redis-cli -p 6378 info server | grep tcp_port; sleep 2; clear; done
# Replication
role:master
connected_slaves:1
slave0:ip=::1,port=6380,state=online,offset=1741860,lag=0
master_repl_offset:1741860
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:693285
repl_backlog_histlen:1048576
tcp_port:6380
```
Notice that we are getting the replication info of the redis_6380 master through HAProxy. Let's crash this master to see whether auto-failover works and we can still connect to the new master. On another console, crash the master:
```
> redis-cli -p 6380 debug segfault
Error: Server closed the connection
```
Back on the other console with redis-cli client connecting to HAProxy:
```
Error: Server closed the connection
```
It shows the master on port 6380 is down. After a couple of seconds, the automatic failover completes and we should be able to connect to the newly promoted master server running on port 6379 and get its replication info.
```
# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
tcp_port:6379
```
Now restart the crashed redis_6380:
```
/etc/init.d/redis_6380 start
```
It will automatically be added to the new master as a slave.
```
# Replication
role:master
connected_slaves:1
slave0:ip=::1,port=6380,state=online,offset=1282,lag=1
master_repl_offset:1405
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:1404
tcp_port:6379
```
We have successfully set up HAProxy with Redis master/slave replicas for HA, using Sentinel for automatic failover.
**Additional Information**
* [How to setup Redis Session Manager on tcServer/Tomcat](https://discuss.zendesk.com/hc/en-us/articles/206085337-How-to-setup-Redis-Session-Manager-on-tcServer-Tomcat) KB article
* [How to setup Redis HA Cluster ](https://discuss.zendesk.com/hc/en-us/articles/205309278-How-to-setup-Redis-Master-and-Salve-replication)KB article
* [http://www.symantec.com/connect/blogs/configuring-redis-high-availability](http://www.symantec.com/connect/blogs/configuring-redis-high-availability)