Re: Application hung retrying with cassandra driver v3.15 in DR scenario.

Alan Boudreault

Aug 5, 2020, 8:29:11 AM
to DataStax Python Driver for Apache Cassandra User Mailing List
Hello,

Many useful bug fixes have indeed been made since 3.15, such as PYTHON-1044, though it is hard to tell whether you are hitting that particular issue. You can see the full list of changes here:


I know you mentioned you couldn't test the latest version, but I just want to make it clear that 3.24 is backward compatible with 3.15.

Regards,
Alan

On Tue, Aug 4, 2020 at 3:55 PM Suvidha Kancharla <suvid...@gmail.com> wrote:
Hi,

Our Cassandra cluster has two DCs, and we were testing whether our application is DR (disaster recovery) compliant, where a disaster means one DC is completely down while the other DC is up. The application performs reads and writes at each_quorum, with a custom retry policy that retries on the next host for the first 15 ReadTimeout, WriteTimeout, and Unavailable errors and then throws an error on the 16th retry. The application code catches this error and re-executes the query at local_quorum.
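
The retry-then-fallback flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the application's actual code: `retry_decision`, `execute_with_fallback`, `QueryFailed`, and `fake_run_query` are hypothetical names, and the driver is replaced by a plain callable so the sketch runs on its own. In the real driver, the decision would presumably live in the `on_read_timeout`, `on_write_timeout`, and `on_unavailable` hooks of a `RetryPolicy` subclass, with `retry_num` starting at 0 as in the log output below.

```python
from typing import Any, Callable

MAX_RETRIES = 15  # assumption: retry_num 0..14 retries on the next host


class QueryFailed(Exception):
    """Hypothetical stand-in for the error raised once retries are exhausted."""


def retry_decision(retry_num: int, max_retries: int = MAX_RETRIES) -> str:
    """Decide what to do after a ReadTimeout, WriteTimeout, or Unavailable error."""
    # The first 15 retries move to the next host; the 16th gives up.
    return "retry_next_host" if retry_num < max_retries else "rethrow"


def execute_with_fallback(run_query: Callable[[str, str], Any], statement: str) -> Any:
    """Try each_quorum first; if the retries are exhausted, fall back to local_quorum."""
    try:
        return run_query(statement, "each_quorum")
    except QueryFailed:
        # One DC is unreachable, so only the local DC's quorum is required.
        return run_query(statement, "local_quorum")


def fake_run_query(statement: str, consistency: str) -> str:
    """Hypothetical executor simulating a cluster with one DC down."""
    if consistency == "each_quorum":
        raise QueryFailed("retries exhausted at each_quorum")
    return f"{statement} executed at {consistency}"


result = execute_with_fallback(fake_run_query, "INSERT ...")
```

Note that the fallback only helps if the first call actually raises; if the driver never returns, as reported below, the `except` branch is never reached.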

During the DR test, we got a few WriteTimeout errors (as expected), but after that the driver never returned to the application code, i.e., the driver seems to hang while retrying connections, with the following logs:

WARNING /opt/bb/lib/python3.6/site-packages/cassandra/cluster.py:1520 cassandra.cluster Host 10.32.192.37 has been marked down
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/pool.py:293 cassandra.pool Error attempting to reconnect to 10.32.192.84, scheduling retry in 2.0 seconds: [Errno 111] Tried connecting to [('10.32.192.84', 1445)]. Last error: Connection refused
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/pool.py:293 cassandra.pool Error attempting to reconnect to 10.32.192.37, scheduling retry in 2.0 seconds: [Errno 111] Tried connecting to [('10.32.192.37', 1445)]. Last error: Connection refused
WARNING /bb/bin/package/lib/python3.6/site-packages/application/cassandra_policies.py:68 Cassandra timeout error - WriteTimeoutPolicy, retry_num=0, write_type=0 required_responses=4, received_responses=3 
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/pool.py:293 cassandra.pool Error attempting to reconnect to 10.32.192.84, scheduling retry in 4.0 seconds: [Errno 111] Tried connecting to [('10.32.192.84', 1445)]. Last error: Connection refused
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/pool.py:293 cassandra.pool Error attempting to reconnect to 10.32.192.37, scheduling retry in 4.0 seconds: [Errno 111] Tried connecting to [('10.32.192.37', 1445)]. Last error: Connection refused

WARNING /opt/bb/lib/python3.6/site-packages/cassandra/pool.py:293 cassandra.pool Error attempting to reconnect to 10.32.192.84, scheduling retry in 8.0 seconds: [Errno 111] Tried connecting to [('10.32.192.84', 1445)]. Last error: Connection refused
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/pool.py:293 cassandra.pool Error attempting to reconnect to 10.32.192.37, scheduling retry in 8.0 seconds: [Errno 111] Tried connecting to [('10.32.192.37', 1445)]. Last error: Connection refused
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/connection.py:1042 cassandra.connection Heartbeat failed for connection (139705598533704) to 10.32.207.14
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/connection.py:1042 cassandra.connection Heartbeat failed for connection (139705598488136) to 10.32.219.100
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/connection.py:1042 cassandra.connection Heartbeat failed for connection (139705598515632) to 10.32.197.107
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/connection.py:1042 cassandra.connection Heartbeat failed for connection (139705598516808) to 10.32.192.5
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/cluster.py:1520 cassandra.cluster Host 10.32.192.37 has been marked down
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/cluster.py:1520 cassandra.cluster Host 10.32.192.69 has been marked down
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/cluster.py:1520 cassandra.cluster Host 10.32.194.79 has been marked down
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/pool.py:293 cassandra.pool Error attempting to reconnect to 10.32.192.39, scheduling retry in 2.0 seconds: [Errno 111] Tried connecting to [('10.32.192.39', 1445)]. Last error: Connection refused
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/pool.py:293 cassandra.pool Error attempting to reconnect to 10.32.192.5, scheduling retry in 2.0 seconds: [Errno 111] Tried connecting to [('10.32.192.5', 1445)]. Last error: Connection refused
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/cluster.py:2812 cassandra.cluster [control connection] Error connecting to 10.32.207.63:
Traceback (most recent call last):
  File "/opt/bb/lib/python3.6/site-packages/cassandra/cluster.py", line 2805, in _reconnect_internal
    return self._try_connect(host)
  File "/opt/bb/lib/python3.6/site-packages/cassandra/cluster.py", line 2827, in _try_connect
    connection = self._cluster.connection_factory(host.address, is_control_connection=True)
  File "/opt/bb/lib/python3.6/site-packages/cassandra/cluster.py", line 1205, in connection_factory
    return self.connection_class.factory(address, self.connect_timeout, *args, **kwargs)
  File "/opt/bb/lib/python3.6/site-packages/cassandra/connection.py", line 341, in factory
    raise OperationTimedOut("Timed out creating connection (%s seconds)" % timeout)
cassandra.OperationTimedOut: errors=Timed out creating connection (5 seconds), last_host=None
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/connection.py:1042 cassandra.connection Heartbeat failed for connection (139705596591016) to 10.32.192.19
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/pool.py:293 cassandra.pool Error attempting to reconnect to 10.32.197.53, scheduling retry in 4.0 seconds: [Errno 111] Tried connecting to [('10.32.197.53', 1445)]. Last error: Connection refused
WARNING /opt/bb/lib/python3.6/site-packages/cassandra/pool.py:293 cassandra.pool Error attempting to reconnect to 10.32.193.148, scheduling retry in 8.0 seconds: [Errno 111] Tried connecting to [('10.32.193.148', 1445)]. Last error: Connection refused

..
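
To make the reconnection pattern in the logs above easier to read, the per-host retry delays can be extracted with a short script. This is only a sketch: the regular expression is written against the message format shown in these particular log lines, and `reconnect_delays` and `SAMPLE` are hypothetical names. The sample is abbreviated from the first reconnect warnings for 10.32.192.84 and 10.32.192.37.

```python
import re
from collections import defaultdict

# Matches the "reconnect" warnings emitted by cassandra.pool in the logs above.
RECONNECT_RE = re.compile(
    r"Error attempting to reconnect to (?P<host>[\d.]+), "
    r"scheduling retry in (?P<delay>[\d.]+) seconds"
)


def reconnect_delays(log_text: str) -> dict:
    """Map each host to the list of scheduled retry delays, in log order."""
    delays = defaultdict(list)
    # finditer also copes with two warnings that ended up on one line.
    for match in RECONNECT_RE.finditer(log_text):
        delays[match.group("host")].append(float(match.group("delay")))
    return dict(delays)


SAMPLE = """\
cassandra.pool Error attempting to reconnect to 10.32.192.84, scheduling retry in 2.0 seconds: [Errno 111]
cassandra.pool Error attempting to reconnect to 10.32.192.37, scheduling retry in 2.0 seconds: [Errno 111]
cassandra.pool Error attempting to reconnect to 10.32.192.84, scheduling retry in 4.0 seconds: [Errno 111]
cassandra.pool Error attempting to reconnect to 10.32.192.37, scheduling retry in 4.0 seconds: [Errno 111]
cassandra.pool Error attempting to reconnect to 10.32.192.84, scheduling retry in 8.0 seconds: [Errno 111]
cassandra.pool Error attempting to reconnect to 10.32.192.37, scheduling retry in 8.0 seconds: [Errno 111]
"""

delays_by_host = reconnect_delays(SAMPLE)
```

For these lines the delays double on each attempt (2.0, 4.0, 8.0 seconds) for both hosts, which suggests the reconnection threads themselves are still making progress in the background.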

Is there a known issue with the DataStax Python driver v3.15 that could make the driver hang when some nodes go down? Unfortunately, we cannot use the latest driver yet.

Thanks!


--
Alan Boudreault
