Phyre2 server nodes rebooting - how to avoid

Powell, Harold

May 20, 2021, 5:03:48 AM
to Phyre
Hi
The observant amongst you will have noticed that the server nodes that Phyre2 runs on occasionally reboot themselves. Despite months of investigation, none of our tech people has been able to find a cause (apart from "old age"). Mostly this doesn't cause a problem - we manage ~1000 runs a day, and about once every 3 weeks one of the nodes reboots and takes ~20 - 50 jobs with it (so that's ~0.1 - 0.2% of jobs). I'm currently restarting these jobs "manually", but it is a little disconcerting.
We do have a new test node that (a few) people can use - but only if they ask specifically and if they are prepared for slightly different answers (I've had to update some of the underlying programs so that they will run on modern machines). This is currently in testing, so not available to all - but if you are interested in using ("testing") our new reliable node before we replace all of the older nodes sometime in the next few months, please let me know by e-mailing me directly.
The ideal person would be someone who submits "more than a few but not loads of" jobs and is prepared to spend a little time comparing results between jobs run on the old and new nodes. You don't have to be ideal to take part...
Harry Powell