[ERR] ... CLUSTERDOWN but the cluster isn't down


Joey Ayap

Jan 27, 2017, 7:02:50 AM

to Redis DB

We were resharding data to a new cluster node, hit a problem, and are now stuck in a situation that is probably caused by a bug. When we try to reshard, we get this message:

[ERR] Calling MIGRATE: ERR Target instance replied with error: CLUSTERDOWN The cluster is down

But the cluster is up! Below are the steps we followed.

First we created an empty node on our new, separate server, then added it to our existing Redis cluster:

server1-ip:port master connected

server2-ip:port master connected

server3-ip:port master connected

server4-ip:port master connected

server5-ip:port master connected

new-server-ip:port master connected

We started resharding data from server1-ip:port to new-server-ip:port using this command -> "./redis-trib.rb reshard --from --to --slots --yes ::". We encountered an error:

Moving slot 7402 from 6f70203705a1f26b561f39a600930f7b22dfeb98

Moving slot 7403 from 6f70203705a1f26b561f39a600930f7b22dfeb98

Moving slot 6904 from server1-ip:port to new-server-ip:port: ........$

Moving slot 6905 from server1-ip:port to new-server-ip:port: ........$

Moving slot 6906 from server1-ip:port to new-server-ip:port: ........$

Moving slot 6907 from server1-ip:port to new-server-ip:port: ........$

Moving slot 6908 from server1-ip:port to new-server-ip:port: ........$

Moving slot 6909 from server1-ip:port to new-server-ip:port: ........$

[ERR] Calling MIGRATE: IOERR error or timeout reading to target instance

We tried to check for and fix open slots using the command ./redis-trib.rb fix ip:port before restarting the resharding.

Performing Cluster Check (using node new-server-ip:port)

M: 80570f4d791d9834bd28322c25337be00e1370b2 new-server-ip:port

slots:6904-6909 (6 slots) master

0 additional replica(s)

M: 9527684833c252c5dd0ee5f44afa13730cb689ee server2-ip:port

slots:0-50 (51 slots) master

0 additional replica(s)

M: 8b6accb0259089f4f5fc3942b34fb6b7fcbde33e server5-ip:port

slots:51-592,6566-6903 (880 slots) master

0 additional replica(s)

M: 5b887a2fc38eade4b6366b4d1de2926733e082d2 server3-ip:port

slots:926-3318 (2393 slots) master

0 additional replica(s)

M: 6f70203705a1f26b561f39a600930f7b22dfeb98 server1-ip:port

slots:6910-16383 (9474 slots) master

0 additional replica(s)

M: 0a52eec580372bd365351be0b0833dbd364aa633 server4-ip:port

slots:593-925,3319-6565 (3580 slots) master

0 additional replica(s)

[OK] All nodes agree about slots configuration.

Check for open slots...

Check slots coverage...

[OK] All 16384 slots covered.
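
The per-node slot counts in the check output above can be cross-checked by hand. Below is a minimal Python sketch that does the arithmetic; the helper name `count_slots` and the `CHECK_OUTPUT` table are made up for illustration, and the slot ranges are copied from the output above.

```python
def count_slots(spec):
    """Count hash slots in a redis-trib style spec such as '51-592,6566-6903'."""
    total = 0
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            total += int(end) - int(start) + 1  # ranges are inclusive
        else:
            total += 1  # a single slot listed on its own
    return total


# Slot specs copied from the check output above (node labels are shorthand).
CHECK_OUTPUT = {
    "new-server": "6904-6909",
    "server2": "0-50",
    "server5": "51-592,6566-6903",
    "server3": "926-3318",
    "server1": "6910-16383",
    "server4": "593-925,3319-6565",
}

if __name__ == "__main__":
    for node, spec in CHECK_OUTPUT.items():
        print(node, count_slots(spec))
    print("total", sum(count_slots(s) for s in CHECK_OUTPUT.values()))
```

The total matches the 16384 slots reported as covered, so at this point the slot assignment itself was consistent.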

We restarted the resharding. It started successfully, but then we hit another error:

Moving slot 7007 from 6f70203705a1f26b561f39a600930f7b22dfeb98

Moving slot 7008 from 6f70203705a1f26b561f39a600930f7b22dfeb98

Moving slot 7009 from 6f70203705a1f26b561f39a600930f7b22dfeb98

Moving slot 6910 from server1-ip:port to new-server-ip:port: ..............................$

Moving slot 6911 from server1-ip:port to new-server-ip:port: ..............................$

Moving slot 6912 from server1-ip:port to new-server-ip:port: ..............................$

[ERR] Calling MIGRATE: ERR Target instance replied with error: CLUSTERDOWN The cluster is down

But actually the cluster isn't down:

9527684833c252c5dd0ee5f44afa13730cb689ee server2-ip:port master - 0 1485250688989 2 connected 0-50

5b887a2fc38eade4b6366b4d1de2926733e082d2 server3-ip:port master - 0 1485250686984 3 connected 926-3318

80570f4d791d9834bd28322c25337be00e1370b2 new-server-ip:port myself,master - 0 0 6 connected 6904-6911 [6912-<-6f70203705a1f26b561f39a600930f7b22dfeb98]

8b6accb0259089f4f5fc3942b34fb6b7fcbde33e server5-ip:port master - 0 1485250687986 5 connected 51-592 6566-6903

6f70203705a1f26b561f39a600930f7b22dfeb98 server1-ip:port master - 0 1485250689993 1 connected 6912-16383

0a52eec580372bd365351be0b0833dbd364aa633 server4-ip:port master - 0 1485250688989 4 connected 593-925 3319-6565
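
The bracketed token `[6912-<-6f70...]` in the listing above marks slot 6912 as still being imported by the new node. A minimal Python sketch, assuming the usual CLUSTER NODES layout (eight fixed fields followed by slot tokens), that pulls such open slots out of the text; `open_slots` and `SAMPLE` are illustrative names, not part of any Redis tool. As far as I know, a node only reports importing/migrating tokens on its own `myself` line, so the source node's migrating state would not appear in output taken from the target.

```python
import re

# "[slot-<-node_id]" means importing from node_id,
# "[slot->-node_id]" means migrating to node_id.
OPEN_SLOT = re.compile(r"^\[(\d+)-([<>])-([0-9a-f]+)\]$")


def open_slots(cluster_nodes_text):
    """Return (node_id, slot, state, peer_id) for every open slot token."""
    found = []
    for line in cluster_nodes_text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a complete CLUSTER NODES line
        node_id = fields[0]
        for token in fields[8:]:  # slot tokens follow the 8 fixed fields
            match = OPEN_SLOT.match(token)
            if match:
                slot, arrow, peer = match.groups()
                state = "importing" if arrow == "<" else "migrating"
                found.append((node_id, int(slot), state, peer))
    return found


# Two lines copied from the listing above.
SAMPLE = """\
80570f4d791d9834bd28322c25337be00e1370b2 new-server-ip:port myself,master - 0 0 6 connected 6904-6911 [6912-<-6f70203705a1f26b561f39a600930f7b22dfeb98]
6f70203705a1f26b561f39a600930f7b22dfeb98 server1-ip:port master - 0 1485250689993 1 connected 6912-16383
"""

if __name__ == "__main__":
    for entry in open_slots(SAMPLE):
        print(entry)
```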

We tried to fix it again by running ./redis-trib.rb fix ip:port, but it gave us this error:

Performing Cluster Check (using node new-server-ip:port)

M: 80570f4d791d9834bd28322c25337be00e1370b2 new-server-ip:port

slots:6904-6911 (8 slots) master

0 additional replica(s)

M: 9527684833c252c5dd0ee5f44afa13730cb689ee server2-ip:port

slots:0-50 (51 slots) master

0 additional replica(s)

M: 5b887a2fc38eade4b6366b4d1de2926733e082d2 server3-ip:port

slots:926-3318 (2393 slots) master

0 additional replica(s)

M: 8b6accb0259089f4f5fc3942b34fb6b7fcbde33e server5-ip:port

slots:51-592,6566-6903 (880 slots) master

0 additional replica(s)

M: 6f70203705a1f26b561f39a600930f7b22dfeb98 server1-ip:port

slots:6912-16383 (9472 slots) master

0 additional replica(s)

M: 0a52eec580372bd365351be0b0833dbd364aa633 server4-ip:port

slots:593-925,3319-6565 (3580 slots) master

0 additional replica(s)

[OK] All nodes agree about slots configuration.

Check for open slots...

[WARNING] Node new-server-ip:port has slots in importing state (6912).

[WARNING] Node server1-ip:port has slots in migrating state (6912).

[WARNING] The following slots are open: 6912

Fixing open slot 6912

Set as migrating in: server1-ip:port

Set as importing in: new-server-ip:port

Moving slot 6912 from server1-ip:port to new-server-ip:port:

[ERR] Calling MIGRATE: ERR Target instance replied with error: CLUSTERDOWN The cluster is down

Info for server1-ip:port - SOURCE NODE

Server

redis_version:3.2.3

redis_git_sha1:00000000

redis_git_dirty:0

redis_build_id:4992f89db2d932d

redis_mode:cluster

os:Linux 3.13.0-37-generic x86_64

arch_bits:64

multiplexing_api:epoll

gcc_version:4.8.2

process_id:25284

run_id:eeb0be947760b033df999a84b1f1024ffc56f94d

tcp_port:7010

uptime_in_seconds:6719679

uptime_in_days:77

hz:10

lru_clock:8854109

executable:/home/cybranding/redis-3.2.3/redis-stable/src/redis-server

config_file:/etc/redis_cluster_client2/redis-3.2.3/7010/redis.conf

Clients

connected_clients:6

client_longest_output_list:0

client_biggest_input_buf:0

blocked_clients:0

Memory

used_memory:263262791176

used_memory_human:245.18G

used_memory_rss:222207938560

used_memory_rss_human:206.95G

used_memory_peak:263262843256

used_memory_peak_human:245.18G

total_system_memory:405738954752

total_system_memory_human:377.87G

used_memory_lua:37888

used_memory_lua_human:37.00K

maxmemory:0

maxmemory_human:0B

maxmemory_policy:noeviction

mem_fragmentation_ratio:0.84

mem_allocator:jemalloc-4.0.3

Persistence

loading:0

rdb_changes_since_last_save:3477248820

rdb_bgsave_in_progress:0

rdb_last_save_time:1478529438

rdb_last_bgsave_status:ok

rdb_last_bgsave_time_sec:-1

rdb_current_bgsave_time_sec:-1

aof_enabled:1

aof_rewrite_in_progress:0

aof_rewrite_scheduled:0

aof_last_rewrite_time_sec:12415

aof_current_rewrite_time_sec:-1

aof_last_bgrewrite_status:ok

aof_last_write_status:ok

aof_current_size:76954766881

aof_base_size:71475261210

aof_pending_rewrite:0

aof_buffer_length:0

aof_rewrite_buffer_length:0

aof_pending_bio_fsync:0

aof_delayed_fsync:0

Stats

total_connections_received:135923

total_commands_processed:1624882108

instantaneous_ops_per_sec:121

total_net_input_bytes:183344702562

total_net_output_bytes:238996158132

instantaneous_input_kbps:7.65

instantaneous_output_kbps:0.94

rejected_connections:0

sync_full:0

sync_partial_ok:0

sync_partial_err:0

expired_keys:2696602

evicted_keys:0

keyspace_hits:293331974

keyspace_misses:4634274

pubsub_channels:0

pubsub_patterns:0

latest_fork_usec:8247933

migrate_cached_sockets:0

Replication

role:master

connected_slaves:0

master_repl_offset:0

repl_backlog_active:0

repl_backlog_size:1048576

repl_backlog_first_byte_offset:0

repl_backlog_histlen:0

CPU

used_cpu_sys:228998.14

used_cpu_user:106213.70

used_cpu_sys_children:13948.03

used_cpu_user_children:38121.80

Cluster

cluster_enabled:1

Keyspace

db0:keys=157638834,expires=32133,avg_ttl=38497283

Info for new-server-ip:port - TARGET NODE

Server

redis_version:3.2.3

redis_git_sha1:00000000

redis_git_dirty:0

redis_build_id:b5038506891fcfe5

redis_mode:cluster

os:Linux 4.4.0-47-generic x86_64

arch_bits:64

multiplexing_api:epoll

gcc_version:5.4.0

process_id:29729

run_id:be9a3b0fa9e56dd78829f432189cc3faed2b70a4

tcp_port:7015

uptime_in_seconds:600025

uptime_in_days:6

hz:10

lru_clock:8853916

executable:/root/redis-3.2.3/redis-3.2.3/src/redis-server

config_file:/etc/redis_cluster_client2/7015/redis.conf

Clients

connected_clients:5

client_longest_output_list:0

client_biggest_input_buf:0

blocked_clients:0

Memory

used_memory:197574704

used_memory_human:188.42M

used_memory_rss:209297408

used_memory_rss_human:199.60M

used_memory_peak:399048784

used_memory_peak_human:380.56M

total_system_memory:270378438656

total_system_memory_human:251.81G

used_memory_lua:37888

used_memory_lua_human:37.00K

maxmemory:0

maxmemory_human:0B

maxmemory_policy:noeviction

mem_fragmentation_ratio:1.06

mem_allocator:jemalloc-4.0.3

Persistence

loading:0

rdb_changes_since_last_save:173468

rdb_bgsave_in_progress:0

rdb_last_save_time:1484648899

rdb_last_bgsave_status:ok

rdb_last_bgsave_time_sec:-1

rdb_current_bgsave_time_sec:-1

aof_enabled:1

aof_rewrite_in_progress:0

aof_rewrite_scheduled:0

aof_last_rewrite_time_sec:-1

aof_current_rewrite_time_sec:-1

aof_last_bgrewrite_status:ok

aof_last_write_status:ok

aof_current_size:71610854

aof_base_size:64129446

aof_pending_rewrite:0

aof_buffer_length:0

aof_rewrite_buffer_length:0

aof_pending_bio_fsync:0

aof_delayed_fsync:0

Stats

total_connections_received:4477

total_commands_processed:56480

instantaneous_ops_per_sec:0

total_net_input_bytes:3772430822

total_net_output_bytes:200708212

instantaneous_input_kbps:0.00

instantaneous_output_kbps:0.00

rejected_connections:0

sync_full:0

sync_partial_ok:0

sync_partial_err:0

expired_keys:217

evicted_keys:0

keyspace_hits:3981

keyspace_misses:403

pubsub_channels:0

pubsub_patterns:0

latest_fork_usec:0

migrate_cached_sockets:0

Replication

role:master

connected_slaves:0

master_repl_offset:0

repl_backlog_active:0

repl_backlog_size:1048576

repl_backlog_first_byte_offset:0

repl_backlog_histlen:0

CPU

used_cpu_sys:317.34

used_cpu_user:209.47

used_cpu_sys_children:0.00

used_cpu_user_children:0.00

Cluster

cluster_enabled:1

Keyspace

db0:keys=150389,expires=28,avg_ttl=37790580

Salvatore Sanfilippo

Jan 27, 2017, 7:19:19 AM

to redi...@googlegroups.com

Hello, you can likely "solve" it by modifying the Redis Cluster nodes'
configuration so that nodes do not require all the slots to be covered
in order to accept writes:

cluster-require-full-coverage no

However, this does not solve the root cause of your issue, which is that
nodes from time to time detect other nodes as down. This could be due
to network issues, a node timeout that is set too short, or latency
issues of any kind (see the Redis docs and the LATENCY DOCTOR command),
and so forth.

For your convenience, I'm pasting here the relevant part of the
redis.conf comment documenting the full coverage feature:

# By default Redis Cluster nodes stop accepting queries if they detect there
# is at least an hash slot uncovered (no available node is serving it).
# This way if the cluster is partially down (for example a range of hash slots
# are no longer covered) all the cluster becomes, eventually, unavailable.
# It automatically returns available as soon as all the slots are covered again.
#
# However sometimes you want the subset of the cluster which is working,
# to continue to accept queries for the part of the key space that is still
# covered. In order to do so, just set the cluster-require-full-coverage
# option to no.

Cheers,
Salvatore


--
Salvatore 'antirez' Sanfilippo
open source developer - Redis Labs https://redislabs.com

"If a system is to have conceptual integrity, someone must control the
concepts."
— Fred Brooks, "The Mythical Man-Month", 1975.

Joey Ayap

Jan 28, 2017, 5:08:11 PM

to Redis DB

Thanks Salvatore. We checked the cluster info and found that all nodes are communicating normally (they're on a local network in the same data center). We also noticed that there are no uncovered hash slots:

./redis-cli -p <port> cluster info

cluster_state:ok

cluster_slots_assigned:16384

cluster_slots_ok:16384

cluster_slots_pfail:0

cluster_slots_fail:0

cluster_known_nodes:6

cluster_size:6

cluster_current_epoch:6

cluster_my_epoch:6

cluster_stats_messages_sent:2398975

cluster_stats_messages_received:239867
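
A check like the one above is easy to repeat on every node, since each node reports only its own view of the cluster. A minimal Python sketch, assuming the `key:value` line format shown above; `parse_cluster_info` and `looks_healthy` are illustrative names, and the health criteria are just the fields discussed here, not an official definition.

```python
def parse_cluster_info(text):
    """Turn CLUSTER INFO output into a dict, converting numeric values to int."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue
        key, value = line.split(":", 1)
        info[key] = int(value) if value.isdigit() else value
    return info


def looks_healthy(info, total_slots=16384):
    """True if this node sees state ok, all slots assigned, none pfail/fail."""
    return (
        info.get("cluster_state") == "ok"
        and info.get("cluster_slots_assigned") == total_slots
        and info.get("cluster_slots_ok") == total_slots
        and info.get("cluster_slots_pfail") == 0
        and info.get("cluster_slots_fail") == 0
    )


# Values copied from the output above.
SAMPLE = """\
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:6
"""

if __name__ == "__main__":
    print(looks_healthy(parse_cluster_info(SAMPLE)))
```

A healthy snapshot from one node does not rule out that another node (here, the MIGRATE target) briefly saw the cluster as down, so it may be worth sampling every node during the reshard.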

We temporarily fixed the situation by running the redis-cli command 'CLUSTER SETSLOT <slot-number> STABLE' on both nodes (source and target) to clear the importing/migrating state from the hash slot. We then restarted the resharding, manually excluding the slot that had been causing problems; it probably contains a key that is too big. The resharding is working correctly for now, but the error message we got is still weird.
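
For reference, the workaround can be written down as one redis-cli invocation per node. The Python sketch below only builds the argument lists and does not contact any server; the function name is made up, the hosts are the placeholders used in this thread, and the ports are taken from the tcp_port values in the INFO output earlier (treat them as assumptions for your own setup).

```python
def setslot_stable_commands(slot, nodes, cli="./redis-cli"):
    """Build one redis-cli argv per (host, port) pair for CLUSTER SETSLOT <slot> STABLE."""
    if not 0 <= slot <= 16383:
        raise ValueError("slot must be in 0..16383")
    return [
        [cli, "-h", host, "-p", str(port),
         "CLUSTER", "SETSLOT", str(slot), "STABLE"]
        for host, port in nodes
    ]


if __name__ == "__main__":
    # Source and target of the stuck slot; host names are placeholders.
    nodes = [("server1-ip", 7010), ("new-server-ip", 7015)]
    for argv in setslot_stable_commands(6912, nodes):
        print(" ".join(argv))
```

Each printed line can be run by hand, or passed to `subprocess.run` once the host names are replaced with real addresses.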

Joey Ayap

Jan 29, 2017, 6:45:52 AM

to Redis DB

We continued with the resharding and this happened. I don't know whether it's related to the other problem, but we had to restart the instance using kill -9.

=== REDIS BUG REPORT START: Cut & paste starting from here ===

25284:M 29 Jan 00:31:38.424 # Redis 3.2.3 crashed by signal: 11

25284:M 29 Jan 00:31:38.424 # Crashed running the instuction at: 0x468d4d

25284:M 29 Jan 00:31:38.424 # Accessing address: (nil)

25284:M 29 Jan 00:31:38.424 # Failed assertion: <no assertion failed> (<no file>:0)

------ STACK TRACE ------

EIP:

/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](migrateCloseSocket+0x5d)[0x468d4d]

Backtrace:

/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](logStackTrace+0x29)[0x45e699]

/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](sigsegvHandler+0xaa)[0x45ebca]

/lib/x86_64-linux-gnu/libpthread.so.0(+0x10340)[0x7fac7e856340]

/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](migrateCloseSocket+0x5d)[0x468d4d]

/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](migrateCommand+0x6d6)[0x469526]

/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](call+0x85)[0x426fb5]

/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](processCommand+0x367)[0x42a0e7]

/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](processInputBuffer+0x105)[0x436d05]

/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](aeProcessEvents+0x218)[0x421428]

/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](aeMain+0x2b)[0x4216db]

/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster](main+0x410)[0x41e690]

/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fac7e4a1ec5]

/home/cybranding/redis-3.2.3/redis-stable/src/redis-server *:<port> [cluster][0x41e902]

------ INFO OUTPUT ------

# Server

redis_version:3.2.3

redis_git_sha1:00000000

redis_git_dirty:0

redis_build_id:4992f89db2d932d

redis_mode:cluster

os:Linux 3.13.0-37-generic x86_64

arch_bits:64

multiplexing_api:epoll

gcc_version:4.8.2

process_id:25284

run_id:eeb0be947760b033df999a84b1f1024ffc56f94d

tcp_port:7010

uptime_in_seconds:7120457

uptime_in_days:82

hz:10

lru_clock:9254887

executable:/home/cybranding/redis-3.2.3/redis-stable/src/redis-server

config_file:/etc/redis_cluster_client2/redis-3.2.3/7010/redis.conf

# Clients

connected_clients:6

client_longest_output_list:0

client_biggest_input_buf:0

blocked_clients:0

# Memory

used_memory:257715868152

used_memory_human:240.02G

used_memory_rss:223309766656

used_memory_rss_human:207.97G

used_memory_peak:263668325640

used_memory_peak_human:245.56G

total_system_memory:405738954752

total_system_memory_human:377.87G

used_memory_lua:37888

used_memory_lua_human:37.00K

maxmemory:0

maxmemory_human:0B

maxmemory_policy:noeviction

mem_fragmentation_ratio:0.87

mem_allocator:jemalloc-4.0.3

# Persistence

loading:0

rdb_changes_since_last_save:3552986998

rdb_last_bgsave_status:ok

rdb_last_bgsave_time_sec:-1

rdb_current_bgsave_time_sec:-1

aof_enabled:1

aof_rewrite_in_progress:0

aof_rewrite_scheduled:0

aof_last_rewrite_time_sec:12540

aof_current_rewrite_time_sec:-1

aof_last_bgrewrite_status:ok

aof_last_write_status:ok

aof_current_size:71141120009

aof_base_size:70485665759

aof_pending_rewrite:0

aof_buffer_length:0

aof_rewrite_buffer_length:0

aof_pending_bio_fsync:0

aof_delayed_fsync:0

# Stats

total_connections_received:140670

total_commands_processed:1735135802

instantaneous_ops_per_sec:357

total_net_input_bytes:195579247727

total_net_output_bytes:261868247391

instantaneous_input_kbps:60.94

instantaneous_output_kbps:76.45

rejected_connections:0

sync_full:0

sync_partial_ok:0

sync_partial_err:0

expired_keys:2878786

evicted_keys:0

keyspace_hits:337331319

keyspace_misses:4846665

pubsub_channels:0

pubsub_patterns:0

latest_fork_usec:6778542

migrate_cached_sockets:1

# Replication

role:master

connected_slaves:0

master_repl_offset:0

repl_backlog_active:0

repl_backlog_size:1048576

repl_backlog_first_byte_offset:0

repl_backlog_histlen:0

# CPU

used_cpu_sys:249059.19

used_cpu_user:111903.34

used_cpu_sys_children:15743.88

used_cpu_user_children:41666.80

# Commandstats

cmdstat_get:calls=100326378,usec=2161972276,usec_per_call=21.55

cmdstat_setex:calls=3232580,usec=878423039,usec_per_call=271.74

cmdstat_incr:calls=91918869,usec=576561183,usec_per_call=6.27

cmdstat_sadd:calls=201398,usec=5693857,usec_per_call=28.27

cmdstat_sismember:calls=64643923,usec=1533471811,usec_per_call=23.72

cmdstat_zincrby:calls=991678281,usec=83128353759,usec_per_call=83.83

cmdstat_zrevrange:calls=9440971,usec=3053766884,usec_per_call=323.46

cmdstat_zscore:calls=151045084,usec=4531229670,usec_per_call=30.00

cmdstat_expire:calls=320723836,usec=1720504651,usec_per_call=5.36

cmdstat_scan:calls=26,usec=2272424,usec_per_call=87400.92

cmdstat_ping:calls=109,usec=272,usec_per_call=2.50

cmdstat_info:calls=132128,usec=29907934,usec_per_call=226.36

cmdstat_cluster:calls=952552,usec=238006779,usec_per_call=249.86

cmdstat_migrate:calls=839655,usec=6599582721,usec_per_call=7859.87

cmdstat_command:calls=12,usec=173464,usec_per_call=14455.33

# Cluster

cluster_enabled:1

# Keyspace

db0:keys=149860920,expires=19375,avg_ttl=43967332

hash_init_value: 1479098466

------ CLIENT LIST OUTPUT ------

id=140307 addr=server1-ip:34104 fd=19 name= age=28407 idle=3 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events$

id=140646 addr=server1-ip:40171 fd=15 name= age=1651 idle=1 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=$

id=140647 addr=server1-ip:40172 fd=10 name= age=1651 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=$

id=140667 addr=new-server-ip:34886 fd=22 name= age=243 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events$

id=140670 addr=server2-ip:46584 fd=11 name= age=132 idle=132 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r $

id=140626 addr=server2-ip:56675 fd=16 name= age=2484 idle=85 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=0 obl=0 oll=0 omem=0 events=r $

------ CURRENT CLIENT INFO ------

id=140667 addr=new-server-ip:34886 fd=22 name= age=243 idle=0 flags=N db=0 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=0 events$

argv[0]: 'DEL'

argv[1]: 'nht.tw.urls.dec:流し猫'

------ REGISTERS ------

25284:M 29 Jan 00:31:38.455 #

RAX:00007f7a0a62992b RBX:b0e6a781e3726574

RCX:0000000000000002 RDX:0000000000000001

RDI:00007f7a0a629944 RSI:00000000004e7857

RBP:00007f7a0a6307a0 RSP:00007fff37faa460

R8 :0000000000000006 R9 :00007fac7e00d5a0

R10:0000000000000000 R11:00007fac7e200180

R12:0000000000000001 R13:00007f6d357d2000

R14:0000000000000002 R15:0000000000000000

RIP:0000000000468d4d EFL:0000000000010202

CSGSFS:0000000000000033

25284:M 29 Jan 00:31:38.455 # (00007fff37faa46f) -> 0000000000000038

25284:M 29 Jan 00:31:38.455 # (00007fff37faa46e) -> 0000000003fb9fc2

25284:M 29 Jan 00:31:38.455 # (00007fff37faa46d) -> 0000000000000000

25284:M 29 Jan 00:31:38.455 # (00007fff37faa46c) -> 00007f993e577870

25284:M 29 Jan 00:31:38.455 # (00007fff37faa46b) -> 0000000000000000

25284:M 29 Jan 00:31:38.455 # (00007fff37faa46a) -> 00007fac7ef7d690

25284:M 29 Jan 00:31:38.455 # (00007fff37faa469) -> 0000000100000000

25284:M 29 Jan 00:31:38.455 # (00007fff37faa468) -> 00007f6d00000002

25284:M 29 Jan 00:31:38.455 # (00007fff37faa467) -> 00007f6d5c4d20e8

25284:M 29 Jan 00:31:38.455 # (00007fff37faa466) -> 00007f993e577840

25284:M 29 Jan 00:31:38.455 # (00007fff37faa465) -> 00007f6d00000006

25284:M 29 Jan 00:31:38.455 # (00007fff37faa464) -> 00007f9900000000

25284:M 29 Jan 00:31:38.455 # (00007fff37faa463) -> 0000000000469526

25284:M 29 Jan 00:31:38.455 # (00007fff37faa462) -> 0000000000000001

25284:M 29 Jan 00:31:38.455 # (00007fff37faa461) -> 00007f993e577870

25284:M 29 Jan 00:31:38.455 # (00007fff37faa460) -> 00007f6d357d2000

------ FAST MEMORY TEST ------

25284:M 29 Jan 00:31:38.456 # Bio thread for job type #0 terminated

25284:M 29 Jan 00:31:38.456 # Bio thread for job type #1 terminated

*** Preparing to test memory region 724000 (94208 bytes)

*** Preparing to test memory region 1c3d000 (135168 bytes)

*** Preparing to test memory region 7f6d02c00000 (272625565696 bytes)

*** Preparing to test memory region 7fac7c9ff000 (8388608 bytes)

*** Preparing to test memory region 7fac7d200000 (14680064 bytes)

*** Preparing to test memory region 7fac7e000000 (4194304 bytes)

*** Preparing to test memory region 7fac7e841000 (20480 bytes)

*** Preparing to test memory region 7fac7ea60000 (16384 bytes)

*** Preparing to test memory region 7fac7ef7d000 (16384 bytes)

*** Preparing to test memory region 7fac7ef89000 (4096 bytes)

*** Preparing to test memory region 7fac7ef8a000 (8192 bytes)

*** Preparing to test memory region 7fac7ef8e000 (4096 bytes)


Tuco

Jan 29, 2017, 10:41:19 AM

to Redis DB

Hi Joey,

I have had a similar experience while resharding with redis-trib.rb: errors related to timeouts (even after increasing the timeout), as well as some other errors.

Given my lack of expertise with Ruby, I wrote a simple Java program (using the Lettuce library) that moves slots using the steps below.

It is essentially the same as the resharding process described on the Redis site, and it seems to work every time.

for each slot to move:

  set the destination node to be importing the slot from the source node
  //destination.clusterSetSlotImporting(slot, sourceNodeId);

  set the source node to be migrating the slot to the destination node
  //source.clusterSetSlotMigrating(slot, destinationNodeId);

  get all keys in the slot
  //List<String> keys = source.clusterGetKeysInSlot(slot, 100000000);

  migrate all keys from source to destination in batches of 1000, with an appropriate timeout (set to a large number, like 3600 secs)
  //source.migrate(destHost, destPort, 0, 5000000, migrateArgs);

  set on the source node that the slot belongs to the destination
  //source.clusterSetSlotNode(slot, destinationNodeId);

  set on the destination node that the slot belongs to the destination
  //destination.clusterSetSlotNode(slot, destinationNodeId);

Hope it helps.
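
For readers who want to see the sequence as raw Redis commands, here is a minimal Python sketch of the steps listed above. It is not the Lettuce program from this post: it only builds the command list for one slot and sends nothing. The names `batches` and `plan_slot_move` are made up, the key list is assumed to come from CLUSTER GETKEYSINSLOT, and the 3,600,000 ms default is simply the "3600 secs" figure mentioned above.

```python
def batches(keys, size):
    """Yield consecutive chunks of at most `size` keys."""
    for i in range(0, len(keys), size):
        yield keys[i:i + size]


def plan_slot_move(slot, source_id, dest_id, dest_host, dest_port, keys,
                   batch_size=1000, timeout_ms=3600000):
    """Return (node, command) pairs in the order described in the steps above."""
    s = str(slot)
    steps = [
        ("destination", ["CLUSTER", "SETSLOT", s, "IMPORTING", source_id]),
        ("source", ["CLUSTER", "SETSLOT", s, "MIGRATING", dest_id]),
    ]
    for batch in batches(keys, batch_size):
        # Multi-key form: MIGRATE host port "" db timeout KEYS key [key ...]
        steps.append(("source", ["MIGRATE", dest_host, str(dest_port), "", "0",
                                 str(timeout_ms), "KEYS"] + list(batch)))
    steps.append(("source", ["CLUSTER", "SETSLOT", s, "NODE", dest_id]))
    steps.append(("destination", ["CLUSTER", "SETSLOT", s, "NODE", dest_id]))
    return steps


if __name__ == "__main__":
    # Hypothetical keys; in practice they come from CLUSTER GETKEYSINSLOT.
    demo_keys = ["key:%d" % i for i in range(2500)]
    plan = plan_slot_move(6912,
                          "6f70203705a1f26b561f39a600930f7b22dfeb98",
                          "80570f4d791d9834bd28322c25337be00e1370b2",
                          "new-server-ip", 7015, demo_keys)
    for node, command in plan:
        print(node, " ".join(command[:7]), "(%d args)" % len(command))
```

Keeping the plan separate from execution makes it easy to log or review each step before sending it with whatever client library is in use.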
