Reconnect logic stuck in fast infinite loop

Kostas Chalikias

Jan 25, 2018, 11:17:56 AM
to python-dr...@lists.datastax.com
Hello everyone, here is something we noticed a few minutes ago in production with version 3.8.1 of the driver. One of our nodes went down (the server was powered off), and the driver handled it gracefully, reconnecting once the node came back. About 20 minutes later the server was powered off again, and at that point multiple client processes on several other machines got stuck in a very fast reconnection loop (many attempts per second).

Is this perhaps a known issue? Thanks!

Multiple reconnect attempts, eventually reaching the maximum backoff:

WARNING:cassandra.pool:Error attempting to reconnect to 78.129.246.150, scheduling retry in 600.0 seconds: [Errno 111] Tried connecting to [('78.129.246.150', 9042)]. Last error: Connection refused

The node comes back:

INFO:cassandra.cluster:Host 78.129.246.150 may be up; will prepare queries and open connection pool
INFO:cassandra.cluster:Connection pools established for node 78.129.246.150

The node goes away again, and things go crazy:

WARNING:cassandra.connection:Heartbeat failed for connection (140356915457808) to 78.129.246.150
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.

Alan Boudreault

Jan 25, 2018, 1:51:17 PM
to python-dr...@lists.datastax.com
Hello Kostas,

This is indeed unexpected. Have you seen this issue more than once? I will try to reproduce it and see what comes up.

Regards,
Alan

--
You received this message because you are subscribed to the Google Groups "DataStax Python Driver for Apache Cassandra User Mailing List" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python-driver-user+unsub...@lists.datastax.com.



--

Alan Boudreault
Software Engineer (Drivers) | alan.bo...@datastax.com


Kostas Chalikias

Jan 25, 2018, 2:02:02 PM
to python-dr...@lists.datastax.com
Hi Alan, thanks for getting back to me.

I had never seen this issue before. I am not sure we have ever had nodes go away, come back, and go away again within such a short time, and therefore within the lifetime of a single client process.
However, this time it happened across multiple clients (perhaps more than 10).

Some other data points:

- We use the LibevConnection class
- On the first disconnect, the logs say:

WARNING:cassandra.cluster:Host 78.129.246.150 has been marked down
WARNING:cassandra.pool:Error attempting to reconnect to 78.129.246.150, scheduling retry in 2.0 seconds: [Errno 111] Tried connecting to [('78.129.246.150', 9042)]. Last error: Connection refused
WARNING:cassandra.pool:Error attempting to reconnect to 78.129.246.150, scheduling retry in 4.0 seconds: [Errno 111] Tried connecting to [('78.129.246.150', 9042)]. Last error: Connection refused
WARNING:cassandra.pool:Error attempting to reconnect to 78.129.246.150, scheduling retry in 8.0 seconds: [Errno 111] Tried connecting to [('78.129.246.150', 9042)]. Last error: Connection refused

and so on.

- On the second disconnect, the logs say something different (as in my previous email):

WARNING:cassandra.connection:Heartbeat failed for connection (140356915457808) to 78.129.246.150
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.
WARNING:cassandra.pool:Failed reconnecting 78.129.246.150. Retrying.

Makes me wonder whether the first disconnect/reconnect leaves something fragile behind...
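The two excerpts above show two different retry patterns: after the first disconnect each retry waits twice as long as the previous one (2.0, 4.0, 8.0 ... up to the 600.0-second cap seen earlier), while after the second disconnect the retries come back-to-back. The sketch below is a rough, hypothetical illustration of why that difference matters; it is not the driver's code. It assumes a 1.0-second base delay, a 600.0-second cap, and about 1 ms per refused connection attempt, and the helpers `exponential_schedule` and `attempts_within` are made up for this example.

```python
import itertools


def exponential_schedule(base_delay=1.0, max_delay=600.0):
    """Yield retry delays that double after every failure, capped at max_delay.

    Hypothetical helper: base_delay=1.0 and max_delay=600.0 are assumptions
    picked to match the 2.0, 4.0, 8.0 ... 600.0-second delays in the logs.
    """
    delay = base_delay
    while True:
        yield min(delay, max_delay)
        if delay < max_delay:  # stop growing once the cap is reached
            delay *= 2


def attempts_within(window, delays, attempt_cost=0.001):
    """Count how many reconnect attempts fit into `window` seconds.

    Hypothetical helper: each attempt first waits for the next value from
    `delays`, then spends `attempt_cost` seconds failing (assumed ~1 ms for
    a refused TCP connection).
    """
    elapsed, attempts = 0.0, 0
    for delay in delays:
        elapsed += delay + attempt_cost
        if elapsed > window:
            break
        attempts += 1
    return attempts


if __name__ == "__main__":
    # Backoff: the first few delays, then the number of attempts in one minute.
    print(list(itertools.islice(exponential_schedule(), 12)))
    print(attempts_within(60.0, exponential_schedule()))
    # Immediate resubmission: no delay at all between failed attempts.
    print(attempts_within(60.0, itertools.repeat(0.0)))
```

Under these assumptions, the backoff schedule makes 5 attempts in the first minute, while immediate resubmission makes roughly 60,000, which would be consistent with the "many attempts per second" seen after the second disconnect.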

Let me know how I can help & thanks again.


Alan Boudreault

Jan 26, 2018, 9:12:09 AM
to python-dr...@lists.datastax.com
Hi, 

I tried to reproduce it, without luck. It looks like you hit a very rare case in which the pool keeps trying to replace the connection, which would imply that, for some reason, the node was not marked DOWN. My recommendation would be to upgrade the driver to the latest version. If you see the issue again, try running at least one client in DEBUG mode and send us the full logs so we can inspect exactly what's going on under the hood.
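Running a client in DEBUG mode, as suggested above, can be done with Python's standard `logging` module. A minimal sketch, assuming the driver logs under the `cassandra` logger hierarchy (as the `cassandra.cluster`, `cassandra.pool` and `cassandra.connection` names in the logs above suggest); `enable_driver_debug_logging` is a hypothetical helper name, and the format string is just one reasonable choice.

```python
import logging
import sys


def enable_driver_debug_logging(stream=sys.stderr):
    """Send DEBUG-level records from the "cassandra" loggers to `stream`.

    Hypothetical helper; assumes the driver's modules log via loggers named
    "cassandra.<module>", e.g. cassandra.pool and cassandra.connection.
    """
    handler = logging.StreamHandler(stream)
    # Include time and thread name so interleaved reconnect attempts can be told apart.
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(levelname)s:%(name)s:%(threadName)s:%(message)s"))
    logger = logging.getLogger("cassandra")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(handler)
    return logger


if __name__ == "__main__":
    enable_driver_debug_logging()
    # Child loggers inherit the DEBUG level and propagate to the handler above.
    logging.getLogger("cassandra.pool").debug("debug logging enabled")
```

Calling it once at process start-up, before creating the `Cluster`, should capture the connection and pool activity around the next outage. DEBUG output is verbose, so enabling it on a single client keeps the volume manageable.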

Regards,
Alan
