Hi all, I have a PRODUCTION cluster with 2 nodes:
dog node info
Id Size Used Avail Use%
0 1.7 TB 142 GB 1.6 TB 8%
1 3.4 TB 142 GB 3.3 TB 4%
Total 5.1 TB 284 GB 4.9 TB 5%
I noticed that a recovery has started:
node sheep01
Jul 19 22:33:22 ERROR [main] check_request_epoch(158) old node version 5, 4 (READ_PEER)
Jul 19 22:33:26 INFO [main] recover_object_main(864) object recovery progress 1%
Jul 19 22:33:29 INFO [main] recover_object_main(864) object recovery progress 2%
Jul 19 22:33:37 INFO [main] recover_object_main(864) object recovery progress 3%
Jul 19 22:33:49 INFO [main] recover_object_main(864) object recovery progress 4%
...
Jul 19 22:47:27 INFO [main] recover_object_main(864) object recovery progress 98%
Jul 19 22:47:34 INFO [main] recover_object_main(864) object recovery progress 99%
node sheep02
Jul 19 22:33:22 ERROR [io 10120] default_read_from_path(227) failed to read object 80a86f00004a6f, path=/mnt/sheep/0/0080a86f00004a6f, offset=3096576, size=102400, result=-1, Input/output error
Jul 19 22:33:22 ERROR [io 10120] err_to_sderr(79) oid=80a86f00004a6f, Input/output error
Jul 19 22:33:22 INFO [main] md_remove_disk(360) /mnt/sheep/0 from multi-disk array
Jul 19 22:33:22 INFO [main] zk_leave(1036) leaving from cluster
Jul 19 22:33:22 ERROR [main] check_request_epoch(158) old node version 5, 4 (READ_PEER)
It seems the disk was removed from the multi-disk array because of an I/O error on node sheep02.
This is the only disk sheepdog is working on, apart from the metadata directory.
Anyway, the node is up and still shows the /mnt/sheep/0 device:
dog node md info --all
Id Size Used Avail Use% Path
Node 0:
0 1.7 TB 142 GB 1.6 TB 8% /mnt/sheep/0
Node 1:
0 3.4 TB 142 GB 3.3 TB 4% /mnt/sheep/0
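To check whether the disk itself is actually failing (just my assumption at this point), I was planning to look at the kernel log and SMART data on sheep02, and to try reading the object file named in the error directly. The /dev/sdb device name below is a guess for the disk backing /mnt/sheep/0; it will need adjusting:

```shell
# Look for I/O errors reported by the kernel for the backing disk
dmesg | grep -i -E 'i/o error|medium error'

# SMART health and attributes (assuming /dev/sdb backs /mnt/sheep/0 -- adjust!)
smartctl -H -A /dev/sdb

# Try to read the exact object/offset that failed in the sheep02 log
dd if=/mnt/sheep/0/0080a86f00004a6f of=/dev/null bs=1 skip=3096576 count=102400
```

If dd fails with the same Input/output error, I would take that as confirmation the disk is bad rather than a sheepdog issue.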
What do you think about it?
Here is some more info.
Sheepdog daemon version 0.9.0_327_gdc0496e
These are the sheep options (they are the same on both nodes, except for --myaddr):
sheep \
  -n /var/lib/sheepdog,/mnt/sheep/0 \
  --cluster zookeeper:192.168.6.111:2181,192.168.6.112:2181,192.168.6.80:2181 \
  --myaddr 192.168.6.112 \
  --ioaddr host=192.168.5.112,port=3333
dog cluster info -v
Cluster status: running, auto-recovery enabled
Cluster store: plain with 2 redundancy policy
Cluster vnode mode: disk
Cluster created at Tue Mar 8 17:46:48 2016
Epoch Time Version
2016-07-19 22:33:22 5 [192.168.6.80:7000(1), 192.168.6.111:7000(1)]
2016-03-08 19:14:52 4 [192.168.6.80:7000(1), 192.168.6.111:7000(1), 192.168.6.112:7000(1)]
2016-03-08 19:13:55 3 [192.168.6.111:7000(1), 192.168.6.112:7000(1)]
2016-03-08 19:13:31 2 [192.168.6.80:7000(1), 192.168.6.111:7000(1), 192.168.6.112:7000(1)]
2016-03-08 17:46:48 1 [192.168.6.111:7000(1), 192.168.6.112:7000(1)]
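If the disk really is dying, my plan (untested, based on my reading of the dog manpage, so please correct me if this is wrong) would be to replace the disk and plug it back into the multi-disk array once recovery has finished:

```shell
# On sheep02, after replacing the failed disk and remounting /mnt/sheep/0:
# drop the stale entry (if still registered) and re-add the disk.
dog node md unplug /mnt/sheep/0
dog node md plug /mnt/sheep/0

# Then verify it is back in the array
dog node md info --all
```

Does that sound like the right procedure, or is a full node restart needed for the sheep to rejoin the cluster at the current epoch?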