Right now we have four app servers, all using Hazelcast, with the network configuration done via multicast. The only thing the application uses Hazelcast for right now is session syncing. Somewhat unfortunately, I've noticed that we don't even use Hazelcast's built-in mechanism for that; instead, it seems they shoehorned their own session syncing/sharing on top of a distributed map, but I digress.

Regardless, the issue we're experiencing is that on our QA server, after light use throughout the day, one of our app servers appears to disconnect from the cluster without warning. This then causes a flood of redo messages in the log, so many that finding useful information becomes really difficult, even with Splunk. The node doesn't appear to recover, and we end up having to restart all of the Tomcat instances in the cluster to get back to normal.

I appreciate that this is virtually no information to go on, which is why I mostly focused this post on whether there is a clean upgrade path from 2.2 to 2.6.7 (the latest in the 2.x branch as of now).
Thanks,
Joe