After migrating from a very old driver version (2.X) to the newest one (3.6.0), we started receiving a lot of OperationTimedOut exceptions, raised due to client timeouts (specifically, here when called as a callback from here).
We have observed that the error rate correlates with the load on the machine running the code: the more loaded it is, the more errors we receive. Some errors still occur even when a reasonably large host is running a single Python process, so we are looking for a way to mitigate the issue.
So, what is the suggested strategy? Should we just retry the request until it succeeds (all the read/write operations we do are idempotent)? Or should we look further into the issue and try to find the root cause? The version of C* we have deployed is 2.1.15.1403.
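To make the retry option concrete, here is a minimal sketch of what we have in mind. It is not our production code: the helper name `execute_with_retry`, the attempt count, and the backoff delays are all placeholders chosen for illustration, and it is only safe because our statements are idempotent.

```python
import time


def execute_with_retry(run, retry_on, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call run() until it succeeds or the attempts are used up.

    run      -- zero-argument callable that performs the request
    retry_on -- exception class (or tuple of classes) treated as retryable
    sleep    -- injectable so the backoff can be observed in tests
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return run()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last timeout to the caller
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
```

With the driver this would be called roughly as `execute_with_retry(lambda: session.execute(stmt), OperationTimedOut)`, where `OperationTimedOut` is imported from the `cassandra` package and `session` and `stmt` are our existing session and statement. The backoff is there so that retries do not add even more load to an already busy host.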
Thanks!