BeeOND - some node fails

Pinkesh Valdria

Jun 18, 2021, 3:17:25 PM
to fhgfs...@googlegroups.com
I created a 5-node BeeOND cluster on local NVMe drives. The storage and client services run on all nodes, and the management (MGS) and metadata (MDS) services run on node1.

If my 5th node, which holds storage, dies and is not recoverable, how do I start BeeOND on the remaining nodes so I can still access the data stored on them?

If I run the command below with the original 5 nodes in nodefile.rdma, it fails.
If I edit nodefile.rdma to remove the lost node (node5), will it work? Will the data on the other nodes remain, will it be deleted, or will the start fail because the management service has a different number of storage nodes registered?


sudo beeond start -n /home/opc/nodefile.rdma -m 1  -d /mnt/localdisk/ -c /mnt/beeond -f /etc/beegfs/tuning




--
Thanks,
Pinkesh Valdria

Pinkesh Valdria

Jun 19, 2021, 3:55:01 AM
to fhgfs...@googlegroups.com
Found a fix. To debug, add set -x to the beeond shell script so you can see where it fails.

First, list the storage nodes registered with the management service:

beegfs-ctl --sysMgmtdHost=inst-prwiv-direct-marten-rdma.local.rdma --connPortShift=1000 --listnodes --nodetype=storage
inst-prwiv-direct-marten [ID: 3]
inst-sfy8z-direct-marten [ID: 4]
inst-iqkak-direct-marten [ID: 6]
inst-l7d8u-direct-marten [ID: 7]
inst-rdr22-direct-marten [ID: 8]

Then remove the lost node by its ID (8 in this case):

sudo beegfs-ctl --sysMgmtdHost=inst-prwiv-direct-marten-rdma.local.rdma --connPortShift=1000  --removenode --nodetype=storage 8
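Instead of reading the ID off the listing by eye, it can be extracted from the --listnodes output. This is a minimal sketch, assuming the output keeps the "hostname [ID: N]" format shown above; the function name node_id_for_host is hypothetical and not part of BeeGFS.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `beegfs-ctl --listnodes --nodetype=storage`
# output on stdin and prints the numeric ID of the given host.
# ASSUMPTION: each line looks like "inst-rdr22-direct-marten [ID: 8]".
# Exits non-zero if the host is not in the listing.
node_id_for_host() {
  local host="$1"
  awk -v h="$host" '
    $1 == h {
      id = $3                 # third field is e.g. "8]"
      gsub(/[^0-9]/, "", id)  # keep only the digits
      print id
      found = 1
    }
    END { exit !found }'
}

# Example usage (host name and flags taken from the listing above):
#   id=$(beegfs-ctl --sysMgmtdHost=inst-prwiv-direct-marten-rdma.local.rdma \
#          --connPortShift=1000 --listnodes --nodetype=storage \
#        | node_id_for_host inst-rdr22-direct-marten)
#   sudo beegfs-ctl --sysMgmtdHost=inst-prwiv-direct-marten-rdma.local.rdma \
#        --connPortShift=1000 --removenode --nodetype=storage "$id"
```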

Then run the beeond start command with only 4 nodes in the nodefile.   

sudo beeond start -n /home/opc/nodefile.rdma -m 1  -d /mnt/localdisk/ -c /mnt/beeond -f /etc/beegfs/tuning
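The 4-node nodefile can also be produced from the original one rather than edited by hand. A minimal sketch, assuming nodefile.rdma lists one hostname per line and the lost host is passed exactly as it is spelled in that file; drop_host_from_nodefile and the .orig path are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: writes a copy of a BeeOND nodefile with one host removed.
# ASSUMPTION: one hostname per line; surrounding whitespace is ignored because
# awk compares the first field ($1). OUTFILE must differ from NODEFILE,
# otherwise the shell truncates the input before awk reads it.
drop_host_from_nodefile() {
  local nodefile="$1" host="$2" outfile="$3"
  # Print every line whose first field is not the lost host.
  awk -v h="$host" '$1 != h' "$nodefile" > "$outfile"
}

# Example usage (the dead host name is a placeholder):
#   cp /home/opc/nodefile.rdma /home/opc/nodefile.rdma.orig
#   drop_host_from_nodefile /home/opc/nodefile.rdma.orig <dead-host> /home/opc/nodefile.rdma
#   sudo beeond start -n /home/opc/nodefile.rdma -m 1 -d /mnt/localdisk/ -c /mnt/beeond -f /etc/beegfs/tuning
```

Keeping the .orig copy makes it easy to restore the full node list if the lost node is later replaced.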