Recovery from OperationTimedOut during reads/writes


Roman Bodnarchuk

Sep 12, 2016, 4:00:48 PM
to DataStax Python Driver for Apache Cassandra User Mailing List
After migration to the newest driver version (3.6.0) from some very old one (2.X) we started to receive quite a lot of OperationTimedOut exceptions, raised due to client timeouts (specifically, here when called as a callback from here).

We have observed that the error rate correlates with load on the machine running the code: the more loaded it is, the more errors we receive.  Some errors still occur even when a fairly large host is running a single Python process, so we are looking for a way to mitigate the issue.

So, what is the suggested strategy?  Just retry the request until it succeeds (all the read/write operations we do are idempotent)?  Or should we look further into the issue and try to find the root cause?  The version of C* we have deployed is 2.1.15.1403.
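
For illustration, the retry option asked about above could look roughly like the sketch below. It is not a recommendation from the driver maintainers: `execute_with_retry`, `max_attempts`, and `base_delay` are hypothetical names, and the fallback exception class exists only so the snippet runs without the driver installed. Retrying like this is only reasonable because the operations are idempotent.

```python
import time

try:
    from cassandra import OperationTimedOut
except ImportError:
    # Stand-in so the sketch runs without the driver installed (assumption).
    class OperationTimedOut(Exception):
        pass


def execute_with_retry(run_query, max_attempts=3, base_delay=0.1):
    """Call run_query(), retrying on client-side timeouts.

    run_query is any zero-argument callable that issues the request.
    The delay doubles after each failed attempt; the last failure is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return run_query()
        except OperationTimedOut:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
```

With a real session this would be called as `execute_with_retry(lambda: session.execute(statement))`. Bounding the number of attempts keeps a persistently failing request from looping forever, which would otherwise hide the root cause.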

Thanks!

Alan Boudreault

Sep 14, 2016, 10:51:58 AM
to python-dr...@lists.datastax.com
Hello Roman,

We would like more details about your use case so we can check whether something important changed between versions 2 and 3. It is hard to help with this kind of error with so little information.

What exact 2.x version were you using before?
When you say "errors correlates with load on the machine running the code", are you talking about the machine load in general or the load of the Python process itself?
Do you have any data processing callbacks?

It might be a good idea to investigate further to find the root cause.


--

Alan Boudreault
Software Engineer (Drivers) | alan.bo...@datastax.com


Roman Bodnarchuk

Sep 21, 2016, 3:23:12 PM
to DataStax Python Driver for Apache Cassandra User Mailing List
Hello Alan,

The exact version we were on before is 2.1.2.

When talking about the load, I meant the overall load averages (LAs) of the machine.  We are using execute_async to get the response future instance, and then add a callback and an errback.

Another interesting thing is that changing the timeout from 60s (our default setting for Session) to 10s doesn't change the number of OperationTimedOut errors; it remains the same.  This suggests that something makes the response future "stuck", so that it never finishes.
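
One way to check this is to record how long each request actually waited before its callback or errback fired. The sketch below assumes the `execute_async` plus callback/errback pattern described above; `submit_with_timing`, `on_rows`, and `on_error` are hypothetical names, not part of the driver.

```python
import time


def submit_with_timing(session, statement, on_rows, on_error, timeout=10.0):
    """Submit a statement asynchronously and report elapsed time to the handlers.

    on_rows(rows, elapsed_s) is called on success,
    on_error(exc, elapsed_s) on failure (including OperationTimedOut).
    """
    started = time.time()

    def _on_success(rows):
        on_rows(rows, time.time() - started)

    def _on_failure(exc):
        on_error(exc, time.time() - started)

    # timeout here is the per-request client-side timeout in seconds.
    future = session.execute_async(statement, timeout=timeout)
    future.add_callbacks(_on_success, _on_failure)
    return future
```

If the elapsed time passed to `on_error` always sits close to the configured timeout, whatever value is chosen, that would be consistent with the future never completing, as suspected above.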

I am going to dig into this further, and will share any findings.

We are using Python 2.7 and Ubuntu 14.04. 

Thanks!