Hi,
I noticed that when starting beeond with pdsh, there is still a serial part.
In that serial part, the STATUSFILE on each node is updated one after another via ssh.
This results in a pretty long startup for beeond, e.g. around 4 minutes on 256 nodes.
I think this can be improved: instead of initiating an ssh connection to each node,
write the STATUSFILE locally on the node where beeond is invoked and then copy it to the nodes.
The copy itself can be done in parallel.
Is there a reason why the update of the STATUSFILE is done serially?
I tried the modification: first writing the files locally and then copying them with "scp" and "GNU parallel".
I got a very good speedup on the first try:
on 256 nodes, startup takes only around 1 minute.
I only looked at speeding up the startup process,
since I'm stopping beeond through epilogue scripts on each node with stoplocal.
Of course this is a quick and dirty way of doing it;
for production usage it should be checked and maybe improved:
---------------------------------------------------------------
@@ -966,7 +968,8 @@
fi
INFO="${HOST},${SERVICE},${DATAPATH},${LOGFILE},${PIDFILE}"
- execute_ssh_cmd ${HOST} "echo ${INFO} >> ${STATUSFILE}"
+ #execute_ssh_cmd ${HOST} "echo ${INFO} >> ${STATUSFILE}"
+ echo ${INFO} >> /var/tmp/beeond/${HOST}-`basename ${STATUSFILE}`
}
### internal functions for beegfs-ondemand stop ###
@@ -1045,6 +1048,7 @@
execute_ssh_cmd ${HOST} "${DELETESTATUSFILECMD}"
done
fi
+ rm /var/tmp/beeond/*
}
unmount_clients()
@@ -1245,6 +1249,14 @@
# give the management daemon some time to get all information from servers
sleep 8
+
+ #Copy the statusfiles to nodes
+ IFS=","
+ for HOST in ${ALLNODES}
+ do
+ echo "scp -q -i /root/.ssh/adafs /var/tmp/beeond/${HOST}-beeond.tmp ${HOST}:/tmp/beeond.tmp"
+ done | parallel -j 32
+ unset IFS
# take all hosts as client
# port shift and config path may be empty, but that's ok
----------------------------------------------------------------
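
The copy step from the patch above could also be sketched without GNU parallel, assuming an xargs that supports -P (GNU and BSD xargs both do). The function names gen_copy_cmds and run_parallel are made up for illustration and are not part of beeond. The paths are taken from the patch, and the "-i" key option is left out here. This is only a rough sketch, not a tested replacement:

```shell
# Hypothetical helper: print one scp command per host of a comma-separated
# node list. The staged file is expected at <staging_dir>/<host>-<basename>,
# which matches the naming used in the patch.
gen_copy_cmds() {
    nodes="$1"
    staging_dir="$2"
    remote_file="$3"
    base="$(basename "${remote_file}")"
    printf '%s\n' "${nodes}" | tr ',' '\n' | while IFS= read -r host; do
        # skip empty entries, e.g. from a trailing comma
        [ -n "${host}" ] || continue
        printf 'scp -q %s/%s-%s %s:%s\n' \
            "${staging_dir}" "${host}" "${base}" "${host}" "${remote_file}"
    done
}

# Hypothetical helper: read one command per line from stdin and run up to
# $1 of them at the same time (assumes xargs with -P support).
run_parallel() {
    xargs -P "$1" -I{} sh -c '{}'
}

# Example call, not executed here:
#   gen_copy_cmds "${ALLNODES}" /var/tmp/beeond /tmp/beeond.tmp | run_parallel 32
```

Splitting the node list with tr avoids changing IFS for the rest of the script, so there is nothing to unset afterwards.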
What do you think of this idea?
Best regards
M.Soysal