Hi all!
I've just pushed through the promised hotfix[1,2] to address a lot of issues concerning Iris in distributed environments. With this version I was able to run a system of 50 nodes (x8 cores) on top of the Google Compute Engine ok. Note, I haven't done crash testing for when nodes start failing, so there may be some hidden bugs there. Will attend to thorough testing as soon as time allows.
Beside the fixes I've added a little more logging output, each node reporting it's own ID on boot, as well as topic/group creation/subscription/unsubscription/termination events to be able to see if something's off. Leveled logging is still planned, just not for a hotfix release.
I've also reduced the bootstrap scan time from 250ms to 100 to allow faster peer discovery in sparse address spaces.
Happy hacking,
Peter
PS: The race condition I talked about earlier turned out to be caused by a Go issue[3] combined with a misconfigured VM date/time + ntp daemon. So you should definitely watch out for this issue in running Iris code.