We were resharding data to a new cluster node when we ran into a problem, and we are now stuck in a situation that is probably caused by a bug. When we try to reshard, we get this message:
[ERR] Calling MIGRATE: ERR Target instance replied with error: CLUSTERDOWN The cluster is down
But the cluster is up! Below are the steps we followed.
First we created an empty node on our new, separate server, then we added it to our existing Redis cluster:
server1-ip:port master connected
server2-ip:port master connected
server3-ip:port master connected
server4-ip:port master connected
server5-ip:port master connected
new-server-ip:port master connected
We started resharding data from server1-ip:port to new-server-ip:port using this command: "./redis-trib.rb reshard --from --to --slots --yes ::". We encountered an error:
Moving slot 7402 from 6f70203705a1f26b561f39a600930f7b22dfeb98
Moving slot 7403 from 6f70203705a1f26b561f39a600930f7b22dfeb98
Moving slot 6904 from server1-ip:port to new-server-ip:port: ........$
Moving slot 6905 from server1-ip:port to new-server-ip:port: ........$
Moving slot 6906 from server1-ip:port to new-server-ip:port: ........$
Moving slot 6907 from server1-ip:port to new-server-ip:port: ........$
Moving slot 6908 from server1-ip:port to new-server-ip:port: ........$
Moving slot 6909 from server1-ip:port to new-server-ip:port: ........$
[ERR] Calling MIGRATE: IOERR error or timeout reading to target instance
Before restarting the resharding, we tried to check for and fix open slots using the command ./redis-trib.rb fix ip:port:
Performing Cluster Check (using node new-server-ip:port)
M: 80570f4d791d9834bd28322c25337be00e1370b2 new-server-ip:port
slots:6904-6909 (6 slots) master
0 additional replica(s)
M: 9527684833c252c5dd0ee5f44afa13730cb689ee server2-ip:port
slots:0-50 (51 slots) master
0 additional replica(s)
M: 8b6accb0259089f4f5fc3942b34fb6b7fcbde33e server5-ip:port
slots:51-592,6566-6903 (880 slots) master
0 additional replica(s)
M: 5b887a2fc38eade4b6366b4d1de2926733e082d2 server3-ip:port
slots:926-3318 (2393 slots) master
0 additional replica(s)
M: 6f70203705a1f26b561f39a600930f7b22dfeb98 server1-ip:port
slots:6910-16383 (9474 slots) master
0 additional replica(s)
M: 0a52eec580372bd365351be0b0833dbd364aa633 server4-ip:port
slots:593-925,3319-6565 (3580 slots) master
0 additional replica(s)
[OK] All nodes agree about slots configuration.
Check for open slots...
Check slots coverage...
[OK] All 16384 slots covered.
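
As a cross-check of the "All 16384 slots covered" line, the slot ranges printed above can be added up by hand or with a few lines of Python. This is only a sketch: the helper name `expand_slots` is made up for illustration, and the input lines are copied from the check output above.

```python
TOTAL_SLOTS = 16384  # fixed size of the Redis Cluster slot space

def expand_slots(slots_line):
    """Expand a line such as 'slots:51-592,6566-6903 (880 slots) master'
    into the list of individual slot numbers it covers."""
    spec = slots_line.split("slots:", 1)[1].split(" ", 1)[0]
    slots = []
    for part in spec.split(","):
        if not part:  # a node that owns no slots prints an empty list
            continue
        start, _, end = part.partition("-")
        slots.extend(range(int(start), int(end or start) + 1))
    return slots

# "slots:" lines copied from the redis-trib check output above
CHECK_LINES = [
    "slots:6904-6909 (6 slots) master",
    "slots:0-50 (51 slots) master",
    "slots:51-592,6566-6903 (880 slots) master",
    "slots:926-3318 (2393 slots) master",
    "slots:6910-16383 (9474 slots) master",
    "slots:593-925,3319-6565 (3580 slots) master",
]

all_slots = [s for line in CHECK_LINES for s in expand_slots(line)]
fully_covered = set(all_slots) == set(range(TOTAL_SLOTS))
no_overlap = len(all_slots) == len(set(all_slots))
print(len(all_slots), fully_covered, no_overlap)
```

Comparing the list against a set catches both gaps and slots claimed by two nodes, which a plain sum of the per-node counts would not.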
We restarted the resharding. It started successfully, but then we encountered another error:
Moving slot 7007 from 6f70203705a1f26b561f39a600930f7b22dfeb98
Moving slot 7008 from 6f70203705a1f26b561f39a600930f7b22dfeb98
Moving slot 7009 from 6f70203705a1f26b561f39a600930f7b22dfeb98
Moving slot 6910 from server1-ip:port to new-server-ip:port: ..............................$
Moving slot 6911 from server1-ip:port to new-server-ip:port: ..............................$
Moving slot 6912 from server1-ip:port to new-server-ip:port: ..............................$
[ERR] Calling MIGRATE: ERR Target instance replied with error: CLUSTERDOWN The cluster is down
But actually the cluster isn't down:
9527684833c252c5dd0ee5f44afa13730cb689ee server2-ip:port master - 0 1485250688989 2 connected 0-50
5b887a2fc38eade4b6366b4d1de2926733e082d2 server3-ip:port master - 0 1485250686984 3 connected 926-3318
80570f4d791d9834bd28322c25337be00e1370b2 new-server-ip:port myself,master - 0 0 6 connected 6904-6911 [6912-<-6f70203705a1f26b561f39a600930f7b22dfeb98]
8b6accb0259089f4f5fc3942b34fb6b7fcbde33e server5-ip:port master - 0 1485250687986 5 connected 51-592 6566-6903
6f70203705a1f26b561f39a600930f7b22dfeb98 server1-ip:port master - 0 1485250689993 1 connected 6912-16383
0a52eec580372bd365351be0b0833dbd364aa633 server4-ip:port master - 0 1485250688989 4 connected 593-925 3319-6565
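
In the CLUSTER NODES output above, the bracketed entry `[6912-<-6f70...]` on the new node's line marks slot 6912 as importing from server1. A minimal sketch for pulling such markers out of the output follows; the function name `find_open_slots` is hypothetical, and the sample lines are copied from above. Note that Redis only prints importing/migrating markers on the line flagged `myself`, so the output would need to be collected from each node separately to see both sides.

```python
import re

# [slot-<-node_id] means "importing from node_id";
# [slot->-node_id] means "migrating to node_id".
OPEN_SLOT = re.compile(r"^\[(\d+)-([<>])-([0-9a-f]+)\]$")

def find_open_slots(cluster_nodes_output):
    """Return (address, state, slot, peer_node_id) for every open slot marker."""
    found = []
    for line in cluster_nodes_output.strip().splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # not a complete CLUSTER NODES line
        address = fields[1]
        for token in fields[8:]:  # slot entries start after the link-state field
            match = OPEN_SLOT.match(token)
            if match:
                slot, arrow, peer = match.groups()
                state = "importing" if arrow == "<" else "migrating"
                found.append((address, state, int(slot), peer))
    return found

# Two lines copied from the CLUSTER NODES output above
SAMPLE = """
80570f4d791d9834bd28322c25337be00e1370b2 new-server-ip:port myself,master - 0 0 6 connected 6904-6911 [6912-<-6f70203705a1f26b561f39a600930f7b22dfeb98]
6f70203705a1f26b561f39a600930f7b22dfeb98 server1-ip:port master - 0 1485250689993 1 connected 6912-16383
"""

for entry in find_open_slots(SAMPLE):
    print(entry)
```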
We tried to fix it again by running ./redis-trib.rb fix ip:port, but it gave us this error:
Performing Cluster Check (using node new-server-ip:port)
M: 80570f4d791d9834bd28322c25337be00e1370b2 new-server-ip:port
slots:6904-6911 (8 slots) master
0 additional replica(s)
M: 9527684833c252c5dd0ee5f44afa13730cb689ee server2-ip:port
slots:0-50 (51 slots) master
0 additional replica(s)
M: 5b887a2fc38eade4b6366b4d1de2926733e082d2 server3-ip:port
slots:926-3318 (2393 slots) master
0 additional replica(s)
M: 8b6accb0259089f4f5fc3942b34fb6b7fcbde33e server5-ip:port
slots:51-592,6566-6903 (880 slots) master
0 additional replica(s)
M: 6f70203705a1f26b561f39a600930f7b22dfeb98 server1-ip:port
slots:6912-16383 (9472 slots) master
0 additional replica(s)
M: 0a52eec580372bd365351be0b0833dbd364aa633 server4-ip:port
slots:593-925,3319-6565 (3580 slots) master
0 additional replica(s)
[OK] All nodes agree about slots configuration.
Check for open slots...
[WARNING] Node new-server-ip:port has slots in importing state (6912).
[WARNING] Node server1-ip:port has slots in migrating state (6912).
[WARNING] The following slots are open: 6912
Fixing open slot 6912
Set as migrating in: server1-ip:port
Set as importing in: new-server-ip:port
Moving slot 6912 from server1-ip:port to new-server-ip:port:
[ERR] Calling MIGRATE: ERR Target instance replied with error: CLUSTERDOWN The cluster is down
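
Since the CLUSTERDOWN reply comes from the target instance, one thing that may be worth comparing is the `cluster_state` field that each node itself reports in CLUSTER INFO. The sketch below assumes `redis-cli` is on the PATH and that the nodes need no authentication; the function names and the host/port values in the usage note are placeholders, not values from this cluster.

```python
import subprocess

def parse_cluster_info(text):
    """Turn 'key:value' lines from CLUSTER INFO into a dict of strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            info[key] = value
    return info

def cluster_state(host, port):
    """Ask one node for its own view of the cluster ('ok' or 'fail')."""
    result = subprocess.run(
        ["redis-cli", "-h", host, "-p", str(port), "cluster", "info"],
        capture_output=True, text=True, check=True,
    )
    return parse_cluster_info(result.stdout).get("cluster_state")
```

It could be called once per node, for example `cluster_state("new-server-ip", 7015)` with the real address substituted, to see whether the target's own view differs from what the other nodes report.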
Info for server1-ip:port - SOURCE NODE
Server
redis_version:3.2.3
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:4992f89db2d932d
redis_mode:cluster
os:Linux 3.13.0-37-generic x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:4.8.2
process_id:25284
run_id:eeb0be947760b033df999a84b1f1024ffc56f94d
tcp_port:7010
uptime_in_seconds:6719679
uptime_in_days:77
hz:10
lru_clock:8854109
executable:/home/cybranding/redis-3.2.3/redis-stable/src/redis-server
config_file:/etc/redis_cluster_client2/redis-3.2.3/7010/redis.conf
Clients
connected_clients:6
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
Memory
used_memory:263262791176
used_memory_human:245.18G
used_memory_rss:222207938560
used_memory_rss_human:206.95G
used_memory_peak:263262843256
used_memory_peak_human:245.18G
total_system_memory:405738954752
total_system_memory_human:377.87G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:0.84
mem_allocator:jemalloc-4.0.3
Persistence
loading:0
rdb_bgsave_in_progress:0
rdb_last_save_time:1478529438
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:12415
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_current_size:76954766881
aof_base_size:71475261210
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0
Stats
total_connections_received:135923
total_commands_processed:1624882108
instantaneous_ops_per_sec:121
total_net_input_bytes:183344702562
total_net_output_bytes:238996158132
instantaneous_input_kbps:7.65
instantaneous_output_kbps:0.94
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:2696602
evicted_keys:0
keyspace_hits:293331974
keyspace_misses:4634274
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:8247933
migrate_cached_sockets:0
Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
CPU
used_cpu_sys:228998.14
used_cpu_user:106213.70
used_cpu_sys_children:13948.03
used_cpu_user_children:38121.80
Cluster
cluster_enabled:1
Keyspace
db0:keys=157638834,expires=32133,avg_ttl=38497283
Info for new-server-ip:port - TARGET NODE
Server
redis_version:3.2.3
redis_git_sha1:00000000
redis_git_dirty:0
redis_build_id:b5038506891fcfe5
redis_mode:cluster
os:Linux 4.4.0-47-generic x86_64
arch_bits:64
multiplexing_api:epoll
gcc_version:5.4.0
process_id:29729
run_id:be9a3b0fa9e56dd78829f432189cc3faed2b70a4
tcp_port:7015
uptime_in_seconds:600025
uptime_in_days:6
hz:10
lru_clock:8853916
executable:/root/redis-3.2.3/redis-3.2.3/src/redis-server
config_file:/etc/redis_cluster_client2/7015/redis.conf
Clients
connected_clients:5
client_longest_output_list:0
client_biggest_input_buf:0
blocked_clients:0
Memory
used_memory:197574704
used_memory_human:188.42M
used_memory_rss:209297408
used_memory_rss_human:199.60M
used_memory_peak:399048784
used_memory_peak_human:380.56M
total_system_memory:270378438656
total_system_memory_human:251.81G
used_memory_lua:37888
used_memory_lua_human:37.00K
maxmemory:0
maxmemory_human:0B
maxmemory_policy:noeviction
mem_fragmentation_ratio:1.06
mem_allocator:jemalloc-4.0.3
Persistence
loading:0
rdb_changes_since_last_save:173468
rdb_bgsave_in_progress:0
rdb_last_save_time:1484648899
rdb_last_bgsave_status:ok
rdb_last_bgsave_time_sec:-1
rdb_current_bgsave_time_sec:-1
aof_enabled:1
aof_rewrite_in_progress:0
aof_rewrite_scheduled:0
aof_last_rewrite_time_sec:-1
aof_current_rewrite_time_sec:-1
aof_last_bgrewrite_status:ok
aof_last_write_status:ok
aof_current_size:71610854
aof_base_size:64129446
aof_pending_rewrite:0
aof_buffer_length:0
aof_rewrite_buffer_length:0
aof_pending_bio_fsync:0
aof_delayed_fsync:0
Stats
total_connections_received:4477
total_commands_processed:56480
instantaneous_ops_per_sec:0
total_net_input_bytes:3772430822
total_net_output_bytes:200708212
instantaneous_input_kbps:0.00
instantaneous_output_kbps:0.00
rejected_connections:0
sync_full:0
sync_partial_ok:0
sync_partial_err:0
expired_keys:217
evicted_keys:0
keyspace_hits:3981
keyspace_misses:403
pubsub_channels:0
pubsub_patterns:0
latest_fork_usec:0
migrate_cached_sockets:0
Replication
role:master
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
CPU
used_cpu_sys:317.34
used_cpu_user:209.47
used_cpu_sys_children:0.00
used_cpu_user_children:0.00
Cluster
cluster_enabled:1
Keyspace
db0:keys=150389,expires=28,avg_ttl=37790580
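
One detail in the two INFO dumps that may or may not be related: the source node reports mem_fragmentation_ratio:0.84, meaning its resident set size is smaller than used_memory. The ratio can be recomputed from the figures above, assuming it is simply used_memory_rss divided by used_memory (the function name below is made up for illustration):

```python
def fragmentation_ratio(used_memory_rss, used_memory):
    """Assumed definition: resident set size divided by allocator-reported usage."""
    return used_memory_rss / used_memory

# used_memory_rss and used_memory copied from the INFO output above
source_ratio = fragmentation_ratio(222207938560, 263262791176)  # server1 (source)
target_ratio = fragmentation_ratio(209297408, 197574704)        # new server (target)

print(f"source: {source_ratio:.2f}, target: {target_ratio:.2f}")
```

The Redis INFO documentation describes used_memory being well above RSS as a sign that part of the dataset may have been swapped out by the operating system, which can add latency. Whether that plays any role in the MIGRATE timeout here is not established by the output above.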