I've uploaded a trace here ...
Our servers settle down a bit in the evenings, and the errors are much less frequent then. In this case, there is a pretty big gap in time between the previous request and this one.
Some topology, in case it is useful: 10 web servers (5 in each of two data centers) and 10 nodes in a C* cluster (5 in each of two data centers).
I notice from the errors that the connection attempts only go to 5 of the 10 possible nodes. My presumption is that it is trying to stay near, i.e. within the local data center. I will verify the IPs and their C* DC affiliation.
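For the IP check, something like the rough sketch below is what I have in mind. It is only an illustration: the IP addresses, the data center names, and the `dc_counts` helper are all made-up placeholders, and the IP-to-DC mapping is assumed to be copied by hand from something like `nodetool status`. It just tallies which data center each attempted host belongs to.

```python
from collections import Counter

# Hypothetical mapping of node IP -> data center name.
# Placeholder values only; the real ones would come from the cluster itself.
NODE_DC = {
    "10.0.1.1": "dc1",
    "10.0.1.2": "dc1",
    "10.0.2.1": "dc2",
    "10.0.2.2": "dc2",
}


def dc_counts(attempted_ips, node_dc):
    """Count connection attempts per data center.

    IPs missing from the mapping are tallied under "unknown".
    """
    return Counter(node_dc.get(ip, "unknown") for ip in attempted_ips)


if __name__ == "__main__":
    # Hypothetical list of hosts taken from the error messages.
    attempted = ["10.0.1.1", "10.0.1.2", "10.0.1.1"]
    for dc, count in sorted(dc_counts(attempted, NODE_DC).items()):
        print(f"{dc}: {count} attempt(s)")
```

If every attempted host lands in a single data center, that would be consistent with the "staying near" presumption. If the attempts are split across both, something else is going on.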
Any thoughts on the attached trace and the behavior?
ze