Issue with Redis replication while transferring rdb file from master to slave.

manohar ht

unread,

Apr 2, 2014, 6:45:11 AM4/2/14

to redi...@googlegroups.com

Hi,

we are facing issue with redis replication wherein the transfer of rdb file from master to slave after bgsave has been completed slows down to very low speed to ~20KB/sec, even though first 900MB of data transfer is done at 20 MB/sec, while the transfer of rdb slows down, redis takes lot of time to execute simple auth command and we start seeing clients timeouts. We have a dataset of 47GB and both master and slave are running on redis 2.6.5. Master is a production server with approx 1.5k client and with ~20k ops/sec under normal load. Client output buffer limit is disabled.

There is no issue with the network as i see no packet drops and normal file transfer over network can be done at 64MB/sec.

Has anyone come across something similar, or any suggestions ? This is a bit critical.

Thanks,

Manohar.

Josiah Carlson

unread,

Apr 2, 2014, 12:11:21 PM4/2/14

to redi...@googlegroups.com

Have you checked the memory use on your master? Have you checked the outgoing buffer sizes on your master?

Based on what you are describing, it's possible that your master is running out of memory during the sync process and is swapping. Without more information, this is what I suspect is happening:

1. An rdb is created and starts streaming to your slave

2. All write commands to your master are held in an in-memory backlog

3. The backlog grows big enough to cause Redis to start swapping

4. Network IO drops because the master is busy swapping

Now, there could certainly be something else going on. Can you get the output of INFO while this is going on and post it here?

- Josiah

--
You received this message because you are subscribed to the Google Groups "Redis DB" group.
To unsubscribe from this group and stop receiving emails from it, send an email to redis-db+u...@googlegroups.com.
To post to this group, send email to redi...@googlegroups.com.
Visit this group at http://groups.google.com/group/redis-db.
For more options, visit https://groups.google.com/d/optout.

manohar ht

unread,

Apr 2, 2014, 1:05:49 PM4/2/14

to redi...@googlegroups.com

Hi Josiah,

Thanks for replying, Below is the output of INFO command while this data transfer to slave was going on.

# Server

redis_version:2.6.5

redis_git_sha1:00000000

redis_git_dirty:0

redis_mode:standalone

os:Linux 2.6.32-5-amd64 x86_64

arch_bits:64

multiplexing_api:epoll

gcc_version:4.7.1

process_id:3298

run_id:d580c2f14057eb4fcac5ea39bdf6df64f8affb42

tcp_port:6379

uptime_in_seconds:27884216

uptime_in_days:322

lru_clock:1230993

# Clients

connected_clients:702

client_longest_output_list:4061

client_biggest_input_buf:0

blocked_clients:97

# Memory

used_memory:47263968248

used_memory_human:44.02G

used_memory_rss:51173629952

used_memory_peak:50351466720

used_memory_peak_human:46.89G

used_memory_lua:31744

mem_fragmentation_ratio:1.08

mem_allocator:jemalloc-3.0.0

# Persistence

loading:0

rdb_changes_since_last_save:299638

rdb_bgsave_in_progress:0

rdb_last_save_time:1396429699

rdb_last_bgsave_status:ok

rdb_last_bgsave_time_sec:682

rdb_current_bgsave_time_sec:-1

aof_enabled:0

aof_rewrite_in_progress:0

aof_rewrite_scheduled:0

aof_last_rewrite_time_sec:-1

aof_current_rewrite_time_sec:-1

aof_last_bgrewrite_status:ok

# Stats

total_connections_received:58068338

total_commands_processed:373271899948

instantaneous_ops_per_sec:9803

rejected_connections:0

expired_keys:2335356

evicted_keys:0

keyspace_hits:266945144550

keyspace_misses:129989805166

pubsub_channels:2

pubsub_patterns:0

latest_fork_usec:2552065

# Replication

role:master

connected_slaves:1

slave0:[x.x.x.x],6379,send_bulk

# CPU

used_cpu_sys:18310397952.00

used_cpu_user:138492768.00

used_cpu_sys_children:223836.31

used_cpu_user_children:1814577.50

# Keyspace

db0:keys=60125716,expires=32

db1:keys=310,expires=0

(0.67s)

Process state:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND

3298 redis 20 0 50.5g 47g 1028 D 0 75.7 5124692h redis-server

Master and Slave are 64bit, 64GB machines on gigabit network. The machine was not swapping while doing data transfer. Client output buffer soft and hard limit on master was set to 0. RDB snapshot is 14G on disk.

Thanks,

Manohar.

Josiah Carlson

unread,

Apr 2, 2014, 1:42:30 PM4/2/14

to redi...@googlegroups.com

I'm not able to determine the cause of the slowdown based on the INFO output. But one thing I noticed is that it takes about 11 minutes to create the snapshot, and at 20 megs/second, another 10+ minutes to transfer the snapshot. But once the snapshot is transferred, remember that the slave needs to *load* that data, which could take a long time.

Have you examined the slave to determine it's state during all of this? Thinking about it, it's possible that the sending of the snapshot has completed, and the 20k/second transfer is what it takes to sync write commands from the master of the slave. Can you check INFO output on the slave there? Does db0/db1 have a number of keys that is growing over time? What does the Redis log say for both your master and slave?

- Josiah

manohar ht

unread,

Apr 2, 2014, 2:06:38 PM4/2/14

to redi...@googlegroups.com

Hi Josiah,

The issue is the transfer itself doesn't complete, after BGSAVE completes on master, dump transfer starts from master to slave, but slows down immediately and this leads to client disconnections and slow command execution on master and we had to break replication to recover back. We tried twice with same result once with client-output-buffer-limit enabled and later by setting no limit on it.

- Manohar.

manohar ht

unread,

Apr 2, 2014, 2:10:34 PM4/2/14

to redi...@googlegroups.com

Also tried looking into redis logs on both master and slave. I found no errors logged. The rdb transfers continues but with slow transfer rates by which time many issues started showing up on master.

Josiah Carlson

unread,

Apr 2, 2014, 2:13:20 PM4/2/14

to redi...@googlegroups.com

... what other issues are showing up on your Master?

- Josiah

manohar ht

unread,

Apr 2, 2014, 2:42:59 PM4/2/14

to redi...@googlegroups.com

2 main issues

- Client disconnections.