[ERR] ... CLUSTERDOWN but the cluster isn't down

Joey Ayap

Jan 27, 2017, 7:02:50 AM
to Redis DB
We were resharding data to a new cluster node, encountered a problem, and are now stuck in a situation probably caused by a bug. When trying to reshard, we get this message:

[ERR] Calling MIGRATE: ERR Target instance replied with error: CLUSTERDOWN The cluster is down

But the cluster is up! Below are the steps we followed.

First we created an empty node on a new, separate server, then we added it to our existing Redis cluster:

server1-ip:port master connected
server2-ip:port master connected
server3-ip:port master connected
server4-ip:port master connected
server5-ip:port master connected
new-server-ip:port master connected

We started to reshard data from server1-ip:port to new-server-ip:port using this command -> "./redis-trib.rb reshard --from --to --slots --yes ::". Partway through, we encountered this error:

Moving slot 7402 from 6f70203705a1f26b561f39a600930f7b22dfeb98
Moving slot 7403 from 6f70203705a1f26b561f39a600930f7b22dfeb98
Moving slot 6904 from server1-ip:port to new-server-ip:port: ........$
Moving slot 6905 from server1-ip:port to new-server-ip:port: ........$
Moving slot 6906 from server1-ip:port to new-server-ip:port: ........$
Moving slot 6907 from server1-ip:port to new-server-ip:port: ........$
Moving slot 6908 from server1-ip:port to new-server-ip:port: ........$
Moving slot 6909 from server1-ip:port to new-server-ip:port: ........$
[ERR] Calling MIGRATE: IOERR error or timeout reading to target instance
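
For reference, the general form of the reshard invocation above is roughly the following (the actual node IDs and slot count are omitted from the paste):

./redis-trib.rb reshard --from <source-node-id> --to <target-node-id> --slots <number-of-slots> --yes <host>:<port>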

We tried to fix/check for open slots using the command ./redis-trib.rb fix ip:port before restarting the resharding.

Performing Cluster Check (using node new-server-ip:port)
M: 80570f4d791d9834bd28322c25337be00e1370b2 new-server-ip:port
slots:6904-6909 (6 slots) master
0 additional replica(s)
M: 9527684833c252c5dd0ee5f44afa13730cb689ee server2-ip:port
slots:0-50 (51 slots) master
0 additional replica(s)
M: 8b6accb0259089f4f5fc3942b34fb6b7fcbde33e server5-ip:port
slots:51-592,6566-6903 (880 slots) master
0 additional replica(s)
M: 5b887a2fc38eade4b6366b4d1de2926733e082d2 server3-ip:port
slots:926-3318 (2393 slots) master
0 additional replica(s)
M: 6f70203705a1f26b561f39a600930f7b22dfeb98 server1-ip:port
slots:6910-16383 (9474 slots) master
0 additional replica(s)
M: 0a52eec580372bd365351be0b0833dbd364aa633 server4-ip:port
slots:593-925,3319-6565 (3580 slots) master
0 additional replica(s)
[OK] All nodes agree about slots configuration.
Check for open slots...
Check slots coverage...
[OK] All 16384 slots covered.

We restarted the resharding and it started successfully, but then we encountered this error:

Moving slot 7007 from 6f70203705a1f26b561f39a600930f7b22dfeb98
Moving slot 7008 from 6f70203705a1f26b561f39a600930f7b22dfeb98
Moving slot 7009 from 6f70203705a1f26b561f39a600930f7b22dfeb98
Moving slot 6910 from server1-ip:port to new-server-ip:port: ..............................$
Moving slot 6911 from server1-ip:port to new-server-ip:port: ..............................$
Moving slot 6912 from server1-ip:port to new-server-ip:port: ..............................$
[ERR] Calling MIGRATE: ERR Target instance replied with error: CLUSTERDOWN The cluster is down

But actually the cluster isn't down:

9527684833c252c5dd0ee5f44afa13730cb689ee server2-ip:port master - 0 1485250688989 2 connected 0-50
5b887a2fc38eade4b6366b4d1de2926733e082d2 server3-ip:port master - 0 1485250686984 3 connected 926-3318
80570f4d791d9834bd28322c25337be00e1370b2 new-server-ip:port myself,master - 0 0 6 connected 6904-6911 [6912-<-6f70203705a1f26b561f39a600930f7b22dfeb98]
8b6accb0259089f4f5fc3942b34fb6b7fcbde33e server5-ip:port master - 0 1485250687986 5 connected 51-592 6566-6903
6f70203705a1f26b561f39a600930f7b22dfeb98 server1-ip:port master - 0 1485250689993 1 connected 6912-16383
0a52eec580372bd365351be0b0833dbd364aa633 server4-ip:port master - 0 1485250688989 4 connected 593-925 3319-6565

We tried to fix it again by running ./redis-trib.rb fix ip:port, but it gave us this error:

Performing Cluster Check (using node new-server-ip:port)
M: 80570f4d791d9834bd28322c25337be00e1370b2 new-server-ip:port
slots:6904-6911 (8 slots) master
0 additional replica(s)
M: 9527684833c252c5dd0ee5f44afa13730cb689ee server2-ip:port
slots:0-50 (51 slots) master
0 additional replica(s)
M: 5b887a2fc38eade4b6366b4d1de2926733e082d2 server3-ip:port
slots:926-3318 (2393 slots) master
0 additional replica(s)
M: 8b6accb0259089f4f5fc3942b34fb6b7fcbde33e server5-ip:port
slots:51-592,6566-6903 (880 slots) master
0 additional replica(s)
M: 6f70203705a1f26b561f39a600930f7b22dfeb98 server1-ip:port
slots:6912-16383 (9472 slots) master
0 additional replica(s)
M: 0a52eec580372bd365351be0b0833dbd364aa633 server4-ip:port
slots:593-925,3319-6565 (3580 slots) master
0 additional replica(s)
[OK] All nodes agree about slots configuration.
Check for open slots...
[WARNING] Node new-server-ip:port has slots in importing state (6912).
[WARNING] Node server1-ip:port has slots in migrating state (6912).
[WARNING] The following slots are open: 6912
Fixing open slot 6912
Set as migrating in: server1-ip:port
Set as importing in: new-server-ip:port
Moving slot 6912 from server1-ip:port to new-server-ip:port:
[ERR] Calling MIGRATE: ERR Target instance replied with error: CLUSTERDOWN The cluster is down
 
Info for server1-ip:port - SOURCE NODE

Server

redis_version:3.2.3
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:4992f89db2d932d
redis_mode:cluster
os:Linux 3.13.0-37-generic x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.8.2
process_id:25284
run_id:eeb0be947760b033df999a84b1f1024ffc56f94d
tcp_port:7010
uptime_in_seconds:6719679
uptime_in_days:77
hz:10
lru_clock:8854109
executable:/home/cybranding/redis-3.2.3/redis-stable/src/redis-server
config_file:/etc/redis_cluster_client2/redis-3.2.3/7010/redis.conf

Clients

connected_clients:6
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

Memory

used_memory:263262791176
used_memory_human:245.18G
used_memory_rss:222207938560
used_memory_rss_human:206.95G
used_memory_peak:263262843256
used_memory_peak_human:245.18G
total_system_memory:405738954752
total_system_memory_human:377.87G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:0.84
mem_allocator:jemalloc-4.0.3

Persistence

loading:0
rdb_changes_since_last_save:3477248820
rdb_bgsave_in_progress:0
rdb_last_save_time:1478529438
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:12415
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_current_size:76954766881
aof_base_size:71475261210
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0

Stats

total_connections_received:135923
total_commands_processed:1624882108
instantaneous_ops_per_sec:121
total_net_input_bytes:183344702562
total_net_output_bytes:238996158132
instantaneous_input_kbps:7.65
instantaneous_output_kbps:0.94
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:2696602
evicted_keys:0
keyspace_hits:293331974
keyspace_misses:4634274
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:8247933
migrate_cached_sockets:0

Replication

role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

CPU

used_cpu_sys:228998.14
used_cpu_user:106213.70
used_cpu_sys_children:13948.03
used_cpu_user_children:38121.80

Cluster

cluster_enabled:1

Keyspace

db0:keys=157638834,expires=32133,avg_ttl=38497283

Info for new-server-ip:port - TARGET NODE

Server

redis_version:3.2.3
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:b5038506891fcfe5
redis_mode:cluster
os:Linux 4.4.0-47-generic x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:5.4.0
process_id:29729
run_id:be9a3b0fa9e56dd78829f432189cc3faed2b70a4
tcp_port:7015
uptime_in_seconds:600025
uptime_in_days:6
hz:10
lru_clock:8853916
executable:/root/redis-3.2.3/redis-3.2.3/src/redis-server
config_file:/etc/redis_cluster_client2/7015/redis.conf

Clients

connected_clients:5
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

Memory

used_memory:197574704
used_memory_human:188.42M
used_memory_rss:209297408
used_memory_rss_human:199.60M
used_memory_peak:399048784
used_memory_peak_human:380.56M
total_system_memory:270378438656
total_system_memory_human:251.81G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:1.06
mem_allocator:jemalloc-4.0.3

Persistence

loading:0
rdb_changes_since_last_save:173468
rdb_bgsave_in_progress:0
rdb_last_save_time:1484648899
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_current_size:71610854
aof_base_size:64129446
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0

Stats

total_connections_received:4477
total_commands_processed:56480
instantaneous_ops_per_sec:0
total_net_input_bytes:3772430822
total_net_output_bytes:200708212
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:217
evicted_keys:0
keyspace_hits:3981
keyspace_misses:403
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0

Replication

role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

CPU

used_cpu_sys:317.34
used_cpu_user:209.47
used_cpu_sys_children:0.00
used_cpu_user_children:0.00

Cluster

cluster_enabled:1

Keyspace

db0:keys=150389,expires=28,avg_ttl=37790580

Salvatore Sanfilippo

Jan 27, 2017, 7:19:19 AM
to redi...@googlegroups.com
Hello, you can likely "solve" it by modifying the Redis Cluster nodes'
configuration so that they don't require all the slots to be covered
in order to accept writes:

cluster-require-full-coverage no

However this does not solve the root cause of your issue, which is that
nodes from time to time detect other nodes as down. This could be due
to network issues, a node timeout that is set too short, latency
issues of any kind (see the Redis docs and the LATENCY DOCTOR command),
and so forth.
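
For example, to get a quick read on latency events on the source node, something along these lines should work (latency monitoring has to be enabled first via latency-monitor-threshold; the 100 ms threshold here is just an illustrative value):

./redis-cli -p <port> config set latency-monitor-threshold 100
./redis-cli -p <port> latency doctor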

For your convenience, I'm cutting & pasting here the relevant part
of the redis.conf comment documenting the full coverage feature:

# By default Redis Cluster nodes stop accepting queries if they detect there
# is at least an hash slot uncovered (no available node is serving it).
# This way if the cluster is partially down (for example a range of hash slots
# are no longer covered) all the cluster becomes, eventually, unavailable.
# It automatically returns available as soon as all the slots are covered again.
#
# However sometimes you want the subset of the cluster which is working,
# to continue to accept queries for the part of the key space that is still
# covered. In order to do so, just set the cluster-require-full-coverage
# option to no.
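
If you would rather not restart the nodes, the same setting can most likely also be applied at runtime on each node (worth double checking on your version):

./redis-cli -p <port> config set cluster-require-full-coverage no
./redis-cli -p <port> config rewrite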

Cheers,
Salvatore



--
Salvatore 'antirez' Sanfilippo
open source developer - Redis Labs https://redislabs.com

"If a system is to have conceptual integrity, someone must control the
concepts."
— Fred Brooks, "The Mythical Man-Month", 1975.

Joey Ayap

Jan 28, 2017, 5:08:11 PM
to Redis DB
Thanks Salvatore. We checked the cluster info and found that all nodes are communicating normally (they're on a local network in the same data center). We also noticed that there are no uncovered hash slots:

./redis-cli -p <port> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:6
cluster_current_epoch:6
cluster_my_epoch:6
cluster_stats_messages_sent:2398975
cluster_stats_messages_received:239867

We temporarily fixed the situation by using the redis-cli command 'CLUSTER SETSLOT <slot-number> STABLE' on both nodes (source and target) to clear any importing/migrating state from the hash slot, and then restarted the resharding, manually excluding the slot that was previously creating problems - it probably contains a key which is too big. The resharding is working correctly for now, but the error message we got is still weird.
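
Concretely, that amounted to running something like the following against both instances (the slot number and ports are placeholders):

./redis-cli -h server1-ip -p <port> cluster setslot <slot-number> stable
./redis-cli -h new-server-ip -p <port> cluster setslot <slot-number> stable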

Joey Ayap

Jan 29, 2017, 6:45:52 AM
to Redis DB
We were continuing with the resharding and this happened. I don't know if it's unrelated to the other problem, but we had to restart the instance using kill -9.

=== REDIS BUG REPORT START: Cut & paste starting from here ===
25284:M 29 Jan 00:31:38.424 # Redis 3.2.3 crashed by signal: 11
25284:M 29 Jan 00:31:38.424 # Crashed running the instuction at: 0x468d4d
25284:M 29 Jan 00:31:38.424 # Accessing address: (nil)
25284:M 29 Jan 00:31:38.424 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------
EIP:
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](migrateCloseSocket+0x5d)[0x468d4d]

Backtrace:
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](logStackTrace+0x29)[0x45e699]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](sigsegvHandler+0xaa)[0x45ebca]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fac7e856340]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](migrateCloseSocket+0x5d)[0x468d4d]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](migrateCommand+0x6d6)[0x469526]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](call+0x85)[0x426fb5]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](processCommand+0x367)[0x42a0e7]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](processInputBuffer+0x105)[0x436d05]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](aeProcessEvents+0x218)[0x421428]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](aeMain+0x2b)[0x4216db]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](main+0x410)[0x41e690]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fac7e4a1ec5]
/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster][0x41e902]

------ INFO OUTPUT ------
# Server
redis_version:3.2.3
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:4992f89db2d932d
redis_mode:cluster
os:Linux 3.13.0-37-generic x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.8.2
process_id:25284
run_id:eeb0be947760b033df999a84b1f1024ffc56f94d
tcp_port:7010
uptime_in_seconds:7120457
uptime_in_days:82
hz:10
lru_clock:9254887
executable:/home/cybranding/redis-3.2.3/redis-stable/src/redis-server
config_file:/etc/redis_cluster_client2/redis-3.2.3/7010/redis.conf

# Clients
connected_clients:6
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0

# Memory
used_memory:257715868152
used_memory_human:240.02G
used_memory_rss:223309766656
used_memory_rss_human:207.97G
used_memory_peak:263668325640
used_memory_peak_human:245.56G
total_system_memory:405738954752
total_system_memory_human:377.87G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:0.87
mem_allocator:jemalloc-4.0.3

# Persistence
loading:0
rdb_changes_since_last_save:3552986998
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:12540
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_current_size:71141120009
aof_base_size:70485665759
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0

# Stats
total_connections_received:140670
total_commands_processed:1735135802
instantaneous_ops_per_sec:357
total_net_input_bytes:195579247727
total_net_output_bytes:261868247391
instantaneous_input_kbps:60.94
instantaneous_output_kbps:76.45
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:2878786
evicted_keys:0
keyspace_hits:337331319
keyspace_misses:4846665
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:6778542
migrate_cached_sockets:1

# Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

# CPU
used_cpu_sys:249059.19
used_cpu_user:111903.34
used_cpu_sys_children:15743.88
used_cpu_user_children:41666.80

# Commandstats
cmdstat_get:calls=100326378,usec=2161972276,usec_per_call=21.55
cmdstat_setex:calls=3232580,usec=878423039,usec_per_call=271.74
cmdstat_incr:calls=91918869,usec=576561183,usec_per_call=6.27
cmdstat_sadd:calls=201398,usec=5693857,usec_per_call=28.27
cmdstat_sismember:calls=64643923,usec=1533471811,usec_per_call=23.72
cmdstat_zincrby:calls=991678281,usec=83128353759,usec_per_call=83.83
cmdstat_zrevrange:calls=9440971,usec=3053766884,usec_per_call=323.46
cmdstat_zscore:calls=151045084,usec=4531229670,usec_per_call=30.00
cmdstat_expire:calls=320723836,usec=1720504651,usec_per_call=5.36
cmdstat_scan:calls=26,usec=2272424,usec_per_call=87400.92
cmdstat_ping:calls=109,usec=272,usec_per_call=2.50
cmdstat_info:calls=132128,usec=29907934,usec_per_call=226.36
cmdstat_cluster:calls=952552,usec=238006779,usec_per_call=249.86
cmdstat_migrate:calls=839655,usec=6599582721,usec_per_call=7859.87
cmdstat_command:calls=12,usec=173464,usec_per_call=14455.33

# Cluster
cluster_enabled:1

# Keyspace
db0:keys=149860920,expires=19375,avg_ttl=43967332
hash_init_value: 1479098466

------ CLIENT LIST OUTPUT ------
id=140307 addr=server1-ip:34104 fd=19 name= age=28407 idle=3 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events$
id=140646 addr=server1-ip:40171 fd=15 name= age=1651 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=$
id=140647 addr=server1-ip:40172 fd=10 name= age=1651 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=$
id=140667 addr=new-server-ip:34886 fd=22 name= age=243 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events$
id=140670 addr=server2-ip:46584 fd=11 name= age=132 idle=132 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r $
id=140626 addr=server2-ip:56675 fd=16 name= age=2484 idle=85 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r $

------ CURRENT CLIENT INFO ------
id=140667 addr=new-server-ip:34886 fd=22 name= age=243 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events$
argv[0]: 'DEL'
argv[1]: 'nht.tw.urls.dec:流し猫'

------ REGISTERS ------
25284:M 29 Jan 00:31:38.455 #
RAX:00007f7a0a62992b RBX:b0e6a781e3726574
RCX:0000000000000002 RDX:0000000000000001
RDI:00007f7a0a629944 RSI:00000000004e7857
RBP:00007f7a0a6307a0 RSP:00007fff37faa460
R8 :0000000000000006 R9 :00007fac7e00d5a0
R10:0000000000000000 R11:00007fac7e200180
R12:0000000000000001 R13:00007f6d357d2000
R14:0000000000000002 R15:0000000000000000
RIP:0000000000468d4d EFL:0000000000010202
CSGSFS:0000000000000033
25284:M 29 Jan 00:31:38.455 # (00007fff37faa46f) -> 0000000000000038
25284:M 29 Jan 00:31:38.455 # (00007fff37faa46e) -> 0000000003fb9fc2
25284:M 29 Jan 00:31:38.455 # (00007fff37faa46d) -> 0000000000000000
25284:M 29 Jan 00:31:38.455 # (00007fff37faa46c) -> 00007f993e577870
25284:M 29 Jan 00:31:38.455 # (00007fff37faa46b) -> 0000000000000000
25284:M 29 Jan 00:31:38.455 # (00007fff37faa46a) -> 00007fac7ef7d690
25284:M 29 Jan 00:31:38.455 # (00007fff37faa469) -> 0000000100000000
25284:M 29 Jan 00:31:38.455 # (00007fff37faa468) -> 00007f6d00000002
25284:M 29 Jan 00:31:38.455 # (00007fff37faa467) -> 00007f6d5c4d20e8
25284:M 29 Jan 00:31:38.455 # (00007fff37faa466) -> 00007f993e577840
25284:M 29 Jan 00:31:38.455 # (00007fff37faa465) -> 00007f6d00000006
25284:M 29 Jan 00:31:38.455 # (00007fff37faa464) -> 00007f9900000000
25284:M 29 Jan 00:31:38.455 # (00007fff37faa463) -> 0000000000469526
25284:M 29 Jan 00:31:38.455 # (00007fff37faa462) -> 0000000000000001
25284:M 29 Jan 00:31:38.455 # (00007fff37faa461) -> 00007f993e577870
25284:M 29 Jan 00:31:38.455 # (00007fff37faa460) -> 00007f6d357d2000

------ FAST MEMORY TEST ------
25284:M 29 Jan 00:31:38.456 # Bio thread for job type #0 terminated
25284:M 29 Jan 00:31:38.456 # Bio thread for job type #1 terminated
*** Preparing to test memory region 724000 (94208 bytes)
*** Preparing to test memory region 1c3d000 (135168 bytes)
*** Preparing to test memory region 7f6d02c00000 (272625565696 bytes)
*** Preparing to test memory region 7fac7c9ff000 (8388608 bytes)
*** Preparing to test memory region 7fac7d200000 (14680064 bytes)
*** Preparing to test memory region 7fac7e000000 (4194304 bytes)
*** Preparing to test memory region 7fac7e841000 (20480 bytes)
*** Preparing to test memory region 7fac7ea60000 (16384 bytes)
*** Preparing to test memory region 7fac7ef7d000 (16384 bytes)
*** Preparing to test memory region 7fac7ef89000 (4096 bytes)
*** Preparing to test memory region 7fac7ef8a000 (8192 bytes)
*** Preparing to test memory region 7fac7ef8e000 (4096 bytes)

Tuco

Jan 29, 2017, 10:41:19 AM
to Redis DB
Hi Joey, 

I have had a similar experience while resharding using redis-trib.rb, getting errors related to timeouts (even after increasing the timeout) and other errors.
Given my lack of expertise with Ruby, I managed to write a simple Java program (using the Lettuce library) which moves slots using the steps below.
It is essentially the same as the resharding process described on the Redis site, and it seems to work every time.

for each slot to move
     set destination node to be importing slot from source node
     //destination.clusterSetSlotImporting(slot, sourceNodeId);

     set source node to be migrating slot to destination node
     //source.clusterSetSlotMigrating(slot, destinationNodeId);

     get all keys in slot
     //List<String> keys = source.clusterGetKeysInSlot(slot, 100000000);

     migrate all keys from source to destination in batches of 1000, with an appropriately large timeout (e.g. 3600 secs)
     //source.migrate(destHost, destPort, 0, 5000000, migrateArgs);

     set on the source node that the slot belongs to destination
     //source.clusterSetSlotNode(slot, destinationNodeId);

     set on the destination node that the slot belongs to destination
     //destination.clusterSetSlotNode(slot, destinationNodeId);
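
The same sequence can also be driven directly from redis-cli if that is easier; roughly (hosts, ports, node IDs and the key list are placeholders, and the MIGRATE step is repeated until CLUSTER GETKEYSINSLOT returns no more keys):

./redis-cli -h <dest-host> -p <dest-port> cluster setslot <slot> importing <source-node-id>
./redis-cli -h <source-host> -p <source-port> cluster setslot <slot> migrating <dest-node-id>
./redis-cli -h <source-host> -p <source-port> cluster getkeysinslot <slot> 1000
./redis-cli -h <source-host> -p <source-port> migrate <dest-host> <dest-port> "" 0 3600000 keys <key1> <key2> ...
./redis-cli -h <source-host> -p <source-port> cluster setslot <slot> node <dest-node-id>
./redis-cli -h <dest-host> -p <dest-port> cluster setslot <slot> node <dest-node-id>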


Hope it helps.
