Hi Mark,
We investigated with AWS and found that no Redis instances crashed during our tests. However, there is always a trickle of requests (roughly 50 out of every 3000 per second, intermittently) that fail with this error.
The stack trace shows that the error always occurs on the first Redis 'get' on an io.lettuce.core.cluster.api.StatefulRedisClusterConnection. We reuse this connection. On one particular client, the error occurred 16343 times out of 531725 requests (about 3.1%).
The other interesting thing is that the error is always localized to a few of the Docker instances that the Java/Lettuce client runs on. Most of the Docker instances run error-free, but a few don't.
We have also tried ClusterTopologyRefreshOptions, with a periodic topology refresh every 10 minutes.
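For reference, a periodic refresh like the one described above can be configured roughly as follows. This is a minimal sketch rather than the exact production code: the class name, the buildClient helper, the seed URI, and the key are placeholders, and it assumes Lettuce 5 or later on the classpath.

```java
import io.lettuce.core.cluster.ClusterClientOptions;
import io.lettuce.core.cluster.ClusterTopologyRefreshOptions;
import io.lettuce.core.cluster.RedisClusterClient;
import io.lettuce.core.cluster.api.StatefulRedisClusterConnection;

import java.time.Duration;

public class RefreshConfigSketch {

    // Hypothetical helper: builds a cluster client that refreshes its
    // view of the cluster topology every 10 minutes.
    static RedisClusterClient buildClient(String seedUri) {
        ClusterTopologyRefreshOptions refreshOptions = ClusterTopologyRefreshOptions.builder()
                .enablePeriodicRefresh(Duration.ofMinutes(10))
                .build();

        ClusterClientOptions clientOptions = ClusterClientOptions.builder()
                .topologyRefreshOptions(refreshOptions)
                .build();

        RedisClusterClient client = RedisClusterClient.create(seedUri);
        client.setOptions(clientOptions);
        return client;
    }

    public static void main(String[] args) {
        // Placeholder URI; replace with a real cluster seed node.
        RedisClusterClient client = buildClient("redis://localhost:6379");

        // One connection is created and reused for all requests.
        StatefulRedisClusterConnection<String, String> connection = client.connect();
        try {
            // The 'get' call is where the failing requests surface in the stack trace.
            String value = connection.sync().get("some-key");
            System.out.println(value);
        } finally {
            connection.close();
            client.shutdown();
        }
    }
}
```

The options are set on the client before connect() is called, so the reused connection is created with the refresh settings in place.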
Does this help diagnose the problem?
Many thanks.