Re: Cassandra with uwsgi and django raise NoHostAvailable error after long time normal running

117 views
Skip to first unread message

Alan Boudreault

unread,
May 20, 2020, 7:23:40 AM5/20/20
to DataStax Python Driver for Apache Cassandra User Mailing List
Hi Yi Zhou,

From the error, it seems that a shutdown is happening on the Session. Can you enable debug logs and see if you see something useful there?

Alan

On Tue, May 19, 2020 at 4:49 PM Yi Zhou <zhouy...@gmail.com> wrote:
I use Cassandra with uwsgi multiprocess mode and django, it worked very well for a lone time, But i got a exception recently

Traceback (most recent call last):

  File "cassandra/cluster.py", line 2615, in cassandra.cluster.Session.execute

    return self.execute_async(query, parameters, trace, custom_payload, timeout, execution_profile, paging_state, host, execute_as).result()

  File "cassandra/cluster.py", line 4871, in cassandra.cluster.ResponseFuture.result

    raise self._final_exception

NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 10.10.10.72:9042 dc1>: ConnectionException('Pool is shutdown',), <Host: 10.10.10.71:9042 dc1>: ConnectionException('Pool is shutdown',), <Host: 10.10.10.69:9042 dc1>: ConnectionException('Pool is shutdown',), <Host: 10.10.10.70:9042 dc1>: ConnectionException('Pool is shutdown',), <Host: 10.10.10.68:9042 dc1>: ConnectionException('Pool is shutdown',)})

It happend by accident, sometime work well about 1 or 2 hour then raise exception suddenly,sometimes may work well all the day. The exception will disapeared when the uwsgi worker restart. 
when the exception happened , the server side system log was normal without any error msg. So it should be the client side error.
In my case:
python 2.7
uwsgi 1.9.17
django 1.5.7
cassandra-driver 3.23
cassandra server 3.11.5
we have a 5 nodes cassandra cluster
the num of connections form client to server almost 6000+, contains offline script and online web server.
this error is very strange, becasue we have use cassandra very well almost half a year without any raised exception, This error happened first about 2 days ago.

  

 

--
To unsubscribe from this group and stop receiving emails from it, send an email to python-driver-u...@lists.datastax.com.


--
Alan Boudreault

Yi Zhou

unread,
May 20, 2020, 7:30:02 AM5/20/20
to DataStax Python Driver for Apache Cassandra User Mailing List
I have open the debug log and i saw log as below
2020-05-19 17:05:18,533  threading.py[line:810] WARNING Heartbeat failed for connection (140137991247760) to 10.10.10.69:9042                                                                                                              
2020-05-19 17:06:27,414  threading.py[line:810] WARNING Heartbeat failed for connection (140137987175824) to 10.10.10.70:9042                                                                                                              
2020-05-19 17:06:28,001  threading.py[line:810] WARNING Heartbeat failed for connection (140137995318608) to 10.10.10.72:9042                                                                                                              
2020-05-19 17:06:28,106  threading.py[line:810] WARNING Heartbeat failed for connection (140138553865488) to 10.10.10.68:9042                                                                                                              
2020-05-19 17:06:28,107  threading.py[line:810] WARNING Heartbeat failed for connection (140138553218128) to 10.10.10.71:9042                                                                                                              
2020-05-19 17:06:28,111  threading.py[line:810] WARNING Heartbeat failed for connection (140138553220368) to 10.10.10.69:9042                                                                                                              
2020-05-19 17:06:28,114  threading.py[line:810] WARNING Heartbeat failed for connection (140138553541712) to 10.10.10.70:9042                                                                                                              
2020-05-19 17:06:28,117  threading.py[line:810] WARNING Heartbeat failed for connection (140138553541264) to 10.10.10.72:9042                                                                                                              
2020-05-19 17:06:28,118  threading.py[line:810] WARNING Heartbeat failed for connection (140138000056528) to 10.10.10.68:9042                                                                                                              
2020-05-19 17:06:28,120  threading.py[line:810] WARNING Heartbeat failed for connection (140138000057872) to 10.10.10.71:9042                                                                                                              
2020-05-19 17:06:28,122  threading.py[line:810] WARNING Heartbeat failed for connection (140138000058896) to 10.10.10.69:9042                                                                                                              
2020-05-19 17:06:28,146  threading.py[line:810] WARNING Heartbeat failed for connection (140138000058576) to 10.10.10.70:9042                                                                                                              
2020-05-19 17:06:28,147  threading.py[line:810] WARNING Heartbeat failed for connection (140140147110864) to 10.10.10.72:9042                                                                                                              
2020-05-19 17:06:28,148  threading.py[line:810] DEBUG Defuncting connection (140137995319824) to 10.10.10.71:9042: errors=Connection heartbeat timeout after -8.8481798172 seconds, last_host=10.10.10.71:9042                             
2020-05-19 17:06:28,149  libevreactor.py[line:287] DEBUG Closing connection (140137995319824) to 10.10.10.71:9042                                                                                                                          
2020-05-19 17:06:28,149  libevreactor.py[line:291] DEBUG Closed socket to 10.10.10.71:9042                                                                                                                                                 
2020-05-19 17:06:28,152  threading.py[line:810] DEBUG Defunct or closed connection (140137995319824) returned to pool, potentially marking host 10.10.10.71:9042 as down                                                                   
2020-05-19 17:06:28,154  threading.py[line:810] DEBUG Defuncting connection (140137991247760) to 10.10.10.69:9042: errors=Connection heartbeat timeout after -56.42085886 seconds, last_host=10.10.10.69:9042                              
2020-05-19 17:06:28,155  libevreactor.py[line:287] DEBUG Closing connection (140137991247760) to 10.10.10.69:9042                                                                                                                          
2020-05-19 17:06:28,156  libevreactor.py[line:291] DEBUG Closed socket to 10.10.10.69:9042                                                                                                                                                 
2020-05-19 17:06:28,156  threading.py[line:810] DEBUG Defunct or closed connection (140137991247760) returned to pool, potentially marking host 10.10.10.69:9042 as down                                                                   
2020-05-19 17:06:28,157  threading.py[line:810] DEBUG Defuncting connection (140137987175824) to 10.10.10.70:9042: errors=Connection heartbeat timeout after -150.997974873 seconds, last_host=10.10.10.70:9042                            
2020-05-19 17:06:28,157  libevreactor.py[line:287] DEBUG Closing connection (140137987175824) to 10.10.10.70:9042                                                                                                                          
2020-05-19 17:06:28,158  libevreactor.py[line:291] DEBUG Closed socket to 10.10.10.70:9042                                                                                                                                                 
2020-05-19 17:06:28,158  threading.py[line:810] DEBUG Defunct or closed connection (140137987175824) returned to pool, potentially marking host 10.10.10.70:9042 as down                                                                   
2020-05-19 17:06:28,159  threading.py[line:810] DEBUG Defuncting connection (140137995318608) to 10.10.10.72:9042: errors=Connection heartbeat timeout after -151.599088907 seconds, last_host=10.10.10.72:9042                            
2020-05-19 17:06:28,159  libevreactor.py[line:287] DEBUG Closing connection (140137995318608) to 10.10.10.72:9042
2020-05-19 17:06:28,160  libevreactor.py[line:291] DEBUG Closed socket to 10.10.10.72:9042                           
2020-05-19 17:06:28,160  threading.py[line:810] DEBUG Defunct or closed connection (140137995318608) returned to pool, potentially marking host 10.10.10.72:9042 as down
2020-05-19 17:06:28,161  threading.py[line:810] DEBUG Defuncting connection (140138553865488) to 10.10.10.68:9042: errors=Connection heartbeat timeout after -151.798864841 seconds, last_host=10.10.10.68:9042
2020-05-19 17:06:28,161  libevreactor.py[line:287] DEBUG Closing connection (140138553865488) to 10.10.10.68:9042
2020-05-19 17:06:28,161  libevreactor.py[line:291] DEBUG Closed socket to 10.10.10.68:9042                           
2020-05-19 17:06:28,162  threading.py[line:810] DEBUG Defunct or closed connection (140138553865488) returned to pool, potentially marking host 10.10.10.68:9042 as down                                                                   
2020-05-19 17:06:28,162  threading.py[line:810] DEBUG Defuncting connection (140138553218128) to 10.10.10.71:9042: errors=Connection heartbeat timeout after -151.82218194 seconds, last_host=10.10.10.71:9042
2020-05-19 17:06:28,163  libevreactor.py[line:287] DEBUG Closing connection (140138553218128) to 10.10.10.71:9042
2020-05-19 17:06:28,163  libevreactor.py[line:291] DEBUG Closed socket to 10.10.10.71:9042                                                                                                                                                 2020-05-19 17:06:28,163  threading.py[line:810] DEBUG Defunct or closed connection (140138553218128) returned to pool, potentially marking host 10.10.10.71:9042 as down                                                                   
2020-05-19 17:06:28,164  threading.py[line:810] DEBUG Defuncting connection (140138553220368) to 10.10.10.69:9042: errors=Connection heartbeat timeout after -151.824490786 seconds, last_host=10.10.10.69:9042
2020-05-19 17:06:28,164  libevreactor.py[line:287] DEBUG Closing connection (140138553220368) to 10.10.10.69:9042
2020-05-19 17:06:28,164  libevreactor.py[line:291] DEBUG Closed socket to 10.10.10.69:9042
2020-05-19 17:06:28,165  threading.py[line:810] DEBUG Defunct or closed connection (140138553220368) returned to pool, potentially marking host 10.10.10.69:9042 as down
2020-05-19 17:06:28,165  threading.py[line:810] DEBUG Defuncting connection (140138553541712) to 10.10.10.70:9042: errors=Connection heartbeat timeout after -151.828993797 seconds, last_host=10.10.10.70:9042
2020-05-19 17:06:28,166  libevreactor.py[line:287] DEBUG Closing connection (140138553541712) to 10.10.10.70:9042
2020-05-19 17:06:28,166  libevreactor.py[line:291] DEBUG Closed socket to 10.10.10.70:9042
2020-05-19 17:06:28,166  threading.py[line:810] DEBUG Defunct or closed connection (140138553541712) returned to pool, potentially marking host 10.10.10.70:9042 as down
2020-05-19 17:06:28,167  threading.py[line:810] DEBUG Defuncting connection (140138553541264) to 10.10.10.72:9042: errors=Connection heartbeat timeout after -151.831508875 seconds, last_host=10.10.10.72:9042
2020-05-19 17:06:28,167  libevreactor.py[line:287] DEBUG Closing connection (140138553541264) to 10.10.10.72:9042
2020-05-19 17:06:28,167  libevreactor.py[line:291] DEBUG Closed socket to 10.10.10.72:9042
2020-05-19 17:06:28,168  threading.py[line:810] DEBUG Defunct or closed connection (140138553541264) returned to pool, potentially marking host 10.10.10.72:9042 as down
2020-05-19 17:06:28,168  threading.py[line:810] DEBUG Defuncting connection (140138000056528) to 10.10.10.68:9042: errors=Connection heartbeat timeout after -151.833234787 seconds, last_host=10.10.10.68:9042                            
2020-05-19 17:06:28,168  libevreactor.py[line:287] DEBUG Closing connection (140138000056528) to 10.10.10.68:9042                                                                                                                          
2020-05-19 17:06:28,169  libevreactor.py[line:291] DEBUG Closed socket to 10.10.10.68:9042                                                                                                                                                 
2020-05-19 17:06:28,169  threading.py[line:810] DEBUG Defunct or closed connection (140138000056528) returned to pool, potentially marking host 10.10.10.68:9042 as down                                                                   
2020-05-19 17:06:28,170  threading.py[line:810] DEBUG Defuncting connection (140138000057872) to 10.10.10.71:9042: errors=Connection heartbeat timeout after -151.835705996 seconds, last_host=10.10.10.71:9042                            
2020-05-19 17:06:28,170  libevreactor.py[line:287] DEBUG Closing connection (140138000057872) to 10.10.10.71:9042                                                                                                                          
2020-05-19 17:06:28,170  libevreactor.py[line:291] DEBUG Closed socket to 10.10.10.71:9042                                                                                                                                                 
2020-05-19 17:06:28,171  threading.py[line:810] DEBUG Defunct or closed connection (140138000057872) returned to pool, potentially marking host 10.10.10.71:9042 as down                                                                   
2020-05-19 17:06:28,171  threading.py[line:810] DEBUG Defuncting connection (140138000058896) to 10.10.10.69:9042: errors=Connection heartbeat timeout after -151.836911917 seconds, last_host=10.10.10.69:9042                            
2020-05-19 17:06:28,172  libevreactor.py[line:287] DEBUG Closing connection (140138000058896) to 10.10.10.69:9042                                                                                                                          
2020-05-19 17:06:28,172  libevreactor.py[line:291] DEBUG Closed socket to 10.10.10.69:9042                                                                                                                                                 
2020-05-19 17:06:28,172  threading.py[line:810] DEBUG Defunct or closed connection (140138000058896) returned to pool, potentially marking host 10.10.10.69:9042 as down                                                                   
2020-05-19 17:06:28,173  threading.py[line:810] DEBUG Defuncting connection (140138000058576) to 10.10.10.70:9042: errors=Connection heartbeat timeout after -151.861776829 seconds, last_host=10.10.10.70:9042                            
2020-05-19 17:06:28,173  libevreactor.py[line:287] DEBUG Closing connection (140138000058576) to 10.10.10.70:9042                                                                                                                          
2020-05-19 17:06:28,173  libevreactor.py[line:291] DEBUG Closed socket to 10.10.10.70:9042                                                                                                                                                 
2020-05-19 17:06:28,174  threading.py[line:810] DEBUG Defunct or closed connection (140138000058576) returned to pool, potentially marking host 10.10.10.70:9042 as down                                                                   
2020-05-19 17:06:28,174  threading.py[line:810] DEBUG Defuncting connection (140140147110864) to 10.10.10.72:9042: errors=Connection heartbeat timeout after -151.862592936 seconds, last_host=10.10.10.72:9042                            
2020-05-19 17:06:28,175  libevreactor.py[line:287] DEBUG Closing connection (140140147110864) to 10.10.10.72:9042                                                                                                                          
2020-05-19 17:06:28,175  libevreactor.py[line:291] DEBUG Closed socket to 10.10.10.72:9042                                                                                                                                                 
2020-05-19 17:06:28,175  threading.py[line:810] DEBUG Defunct or closed connection (140140147110864) returned to pool, potentially marking host 10.10.10.72:9042 as down                                                                   
2020-05-19 17:06:29,174  base.py[line:212] ERROR Internal Server Error: /feeds/news                                                                                                                                                
Traceback (most recent call last):                                                                                                                                                                                                                                                            
  File "cassandra/cluster.py", line 2615, in cassandra.cluster.Session.execute                                                                                                                                                             
    return self.execute_async(query, parameters, trace, custom_payload, timeout, execution_profile, paging_state, host, execute_as).result()                                                                   
  File "cassandra/cluster.py", line 4871, in cassandra.cluster.ResponseFuture.result                             
    raise self._final_exception                                                                                      
NoHostAvailable: ('Unable to complete the operation against any hosts', {<Host: 10.10.10.72:9042 dc1>: ConnectionException('Pool is shutdown',), <Host: 10.10.10.71:9042 dc1>: ConnectionException('Pool is shutdown',), <Host: 10.10.10.69
:9042 dc1>: ConnectionException('Pool is shutdown',), <Host: 10.10.10.70:9042 dc1>: ConnectionException('Pool is shutdown',), <Host: 10.10.10.68:9042 dc1>: ConnectionException('Pool is shutdown',)})      

It seems like the heartbeat cost 150s+ then got  timeout and the client shutdown all connections

在 2020年5月20日星期三 UTC+8下午7:23:40,Alan Boudreault写道:
To unsubscribe from this group and stop receiving emails from it, send an email to python-driver-user+unsub...@lists.datastax.com.

Alan Boudreault

unread,
May 20, 2020, 8:08:48 AM5/20/20
to DataStax Python Driver for Apache Cassandra User Mailing List
It is strange that your timeout value in the error message is negative (-151.x seconds). You should see something like that (e.g for an idle_heartbeat_timeout value of 30):

DEBUG:cassandra.connection:Defuncting connection (140556320875912) to 127.0.0.1:9042: errors=Connection heartbeat timeout after 30 seconds, last_host=127.0.0.1:9042

Can you double check your configuration and see how you initialize the cluster?

To unsubscribe from this group and stop receiving emails from it, send an email to python-driver-u...@lists.datastax.com.


--
Alan Boudreault

--
To unsubscribe from this group and stop receiving emails from it, send an email to python-driver-u...@lists.datastax.com.

Yi Zhou

unread,
May 20, 2020, 8:21:13 AM5/20/20
to DataStax Python Driver for Apache Cassandra User Mailing List
cluster = Cluster(contact_points=['10.10.10.68','10.10.10.69'],

                  protocol_version=4,

                  port=9042,

                  load_balancing_policy=RoundRobinPolicy())

mydb = cluster.connect('mydb')
the idle_heartbeat_timeout was deault 30 seconds.

Yi Zhou

unread,
May 20, 2020, 8:27:55 AM5/20/20
to DataStax Python Driver for Apache Cassandra User Mailing List
By the way, my web django application manage by uwsgi  run in multiprocess mode with 40 cores and 120 uwsgi worker, and the worker are busy sometimes with 100% cpu usage. Maybe this was the reason for raising exception.

在 2020年5月20日星期三 UTC+8下午8:21:13,Yi Zhou写道:

Alan Boudreault

unread,
May 20, 2020, 9:08:06 AM5/20/20
to DataStax Python Driver for Apache Cassandra User Mailing List
That is a possibility. I looked in the code and I think that if the system is in a very bad shape it could happen to see a negative value. That will need more debugging on your end. If you think the driver is in cause, try to downgrade cassandra-driver and see how it goes. That will at least give us an indication. Also make sure there is nothing shared between processes. They all need to have their own session.

--
To unsubscribe from this group and stop receiving emails from it, send an email to python-driver-u...@lists.datastax.com.
Reply all
Reply to author
Forward
0 new messages