NoHostAvailableException frequently


Arun Chaitanya

Aug 5, 2015, 4:11:38 AM
to DataStax Java Driver for Apache Cassandra User Mailing List
Hello everyone,

Our team is using Cassandra 2.1.1. We use a single-node instance for development purposes.
At any given time, there can be ~100 connections open from different developer environments.

We frequently get a NoHostAvailableException when we try to connect through our web application. There is no problem when we use cqlsh.

After some searching on the web, I updated our driver from 2.1.5 to 2.1.7, but this did not solve our problem.

Can someone tell us what is wrong?

Relevant Error log:

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /xxxxxxxxxxxxxxx:9042 (com.datastax.driver.core.OperationTimedOutException: [/xxxxxxxxxxxxxxxx:9042] Operation timed out))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:227) ~[cassandra-driver-core-2.1.7.jar:na]
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:82) ~[cassandra-driver-core-2.1.7.jar:na]
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1307) ~[cassandra-driver-core-2.1.7.jar:na]
at com.datastax.driver.core.Cluster.init(Cluster.java:159) ~[cassandra-driver-core-2.1.7.jar:na]
at com.datastax.driver.core.Cluster.connect(Cluster.java:249) ~[cassandra-driver-core-2.1.7.jar:na]


Our relevant Java code:
      
            this.cluster = Cluster.builder()
                    .withPort(setting.getNativeTransportPort())
                    .withTimestampGenerator(ControllableTimestampGenerator.INSTANCE)
                    .addContactPoint(setting.getHost())
                    .build();

            this.session = this.cluster.connect();

Thanks

Arun Chaitanya

Aug 6, 2015, 12:57:28 AM
to java-dri...@lists.datastax.com
Any idea why this is happening?

It seems that no one else is facing a similar issue, so maybe my settings are wrong.


Andrew Tolbert

Aug 6, 2015, 9:32:48 PM
to DataStax Java Driver for Apache Cassandra User Mailing List
Hi Arun,

This exception:

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /xxxxxxxxxxxxxxx:9042 (com.datastax.driver.core.OperationTimedOutException: [/xxxxxxxxxxxxxxxx:9042] Operation timed out))
at com.datastax.driver.core.ControlConnection.reconnectInternal(ControlConnection.java:227) ~[cassandra-driver-core-2.1.7.jar:na]
at com.datastax.driver.core.ControlConnection.connect(ControlConnection.java:82) ~[cassandra-driver-core-2.1.7.jar:na]
at com.datastax.driver.core.Cluster$Manager.init(Cluster.java:1307) ~[cassandra-driver-core-2.1.7.jar:na]
at com.datastax.driver.core.Cluster.init(Cluster.java:159) ~[cassandra-driver-core-2.1.7.jar:na]
at com.datastax.driver.core.Cluster.connect(Cluster.java:249) ~[cassandra-driver-core-2.1.7.jar:na]

indicates a timeout for your request on that host. This error means the host did not complete the query within SocketOptions#getReadTimeoutMillis(). Since you are not explicitly configuring this, it means a query is not completing within 12 seconds (the default).

Some questions:
  1. Do you have an idea of which queries are timing out? If so, can you share an example?
  2. How many nodes are in your cluster? This exception seems to indicate that only one node was tried. Is it possible that nodes are being marked down?
  3. Take a look at your Cassandra system.log files to see if any nodes are detected as DOWN. It may also be that your nodes are under high GC activity, which would show up in your logs (look for 'GCInspector' entries reporting long-running GC events).
  4. What is 'read_request_timeout_in_ms' set to in the cassandra.yaml file on your Cassandra nodes? If it is larger than 12 seconds, that could be an issue, as the driver may be marking queries as timed out before giving Cassandra a chance to respond, and in turn marking a host down. If you have configured read_request_timeout_in_ms to be greater than 12000 ms, you should consider setting the driver's read timeout (SocketOptions#getReadTimeoutMillis()) to be larger than that value.
You may also want to consider turning on the Slow Query Logger to see how long your queries are taking when they don't time out. If you have queries taking many seconds, there may be a problem with your C* cluster and/or your data model.
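
Point 4 above boils down to comparing two timeouts. Here is a minimal JDK-only sketch of that check. It assumes nothing about the driver beyond the 12-second default quoted above; the class name, method names, and the safety margin are hypothetical and chosen only for illustration.

```java
final class TimeoutCheck {
    // Driver-side default read timeout quoted in this thread (12 seconds).
    static final int DRIVER_DEFAULT_READ_TIMEOUT_MS = 12_000;

    /** True when the driver would stop waiting no later than the server does. */
    static boolean driverGivesUpFirst(int driverReadTimeoutMs, int serverReadRequestTimeoutMs) {
        return driverReadTimeoutMs <= serverReadRequestTimeoutMs;
    }

    /**
     * Suggests a driver read timeout: the server timeout plus a margin
     * (the margin is an assumption, not a documented recommendation),
     * but never less than the driver default.
     */
    static int suggestedDriverReadTimeoutMs(int serverReadRequestTimeoutMs, int marginMs) {
        return Math.max(DRIVER_DEFAULT_READ_TIMEOUT_MS, serverReadRequestTimeoutMs + marginMs);
    }

    public static void main(String[] args) {
        // Value of read_request_timeout_in_ms from cassandra.yaml, passed as an argument.
        int serverTimeoutMs = args.length > 0 ? Integer.parseInt(args[0]) : 10_000;
        System.out.println("driver gives up first: "
                + driverGivesUpFirst(DRIVER_DEFAULT_READ_TIMEOUT_MS, serverTimeoutMs));
        System.out.println("suggested driver read timeout (ms): "
                + suggestedDriverReadTimeoutMs(serverTimeoutMs, 2_000));
    }
}
```

The suggested value is what you would then pass to SocketOptions#setReadTimeoutMillis when building the Cluster.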

Andy

Arun Chaitanya

Aug 11, 2015, 11:06:00 PM
to java-dri...@lists.datastax.com
Hello Andrew,

Thank you for your reply.


Some questions:
  1. Do you have an idea of which queries are timing out? If so, can you share an example?
We haven't queried yet. This happens at the time of establishing the connection. Not always, but sometimes.
We have multiple developers trying to connect to this database from their applications during development. For some of them this fails.
  2. How many nodes are in your cluster? This exception seems to indicate that only one node was tried. Is it possible that nodes are being marked down?
This is our development database, so we are using only one node. Do you suggest using more nodes?
  3. Take a look at your Cassandra system.log files to see if any nodes are detected as DOWN. It may also be that your nodes are under high GC activity, which would show up in your logs (look for 'GCInspector' entries reporting long-running GC events).
I think the node is under high GC activity (screenshot from VisualVM attached). Any idea why this is happening?
  4. What is 'read_request_timeout_in_ms' set to in the cassandra.yaml file on your Cassandra nodes? If it is larger than 12 seconds, that could be an issue, as the driver may be marking queries as timed out before giving Cassandra a chance to respond, and in turn marking a host down. If you have configured read_request_timeout_in_ms to be greater than 12000 ms, you should consider setting the driver's read timeout (SocketOptions#getReadTimeoutMillis()) to be larger than that value.
cassandra.yaml has the default value (10s). Does this timeout include network latency as well?

You may also want to consider turning on the Slow Query Logger to see how long your queries are taking when they don't time out. If you have queries taking many seconds, there may be a problem with your C* cluster and/or your data model.

Sure, I will try turning this on.

Thanks,
Selection_016.png

Kevin Gallardo

Aug 12, 2015, 8:54:13 AM
to java-dri...@lists.datastax.com
Hi,

Indeed, according to the stack trace, this timeout occurs while the connection to the cluster is being established. Have you set a particular value for SocketOptions.connectTimeoutMillis? Judging by the code you posted, it seems not, so the default connectTimeoutMillis of 5000 ms applies.

Do you have an idea of which queries are causing the node to do so much GC work?

It is possible that, if all clients are simultaneously executing requests that produce massive load (and cause frequent garbage collections), the node becomes overloaded and is then unable to respond to new connections within the 5000 ms limit.
I see two options that could help. The first is to increase connectTimeoutMillis. This is the obvious one, but it does not address the possibility that the server is being misused (more on this below).
The other option applies if your requests produce the load you expect and the cluster configuration is fine, but the cluster still gets overloaded by all the requests, causing timeouts in the driver. In that case, it may be reasonable to take advantage of Cassandra's scalability and add more nodes to your cluster. That would allow both Cassandra and the driver to distribute the load across the nodes, so that the nodes can respond to connections from the driver in time.

As for your last question about the 10 s: it is a strict limit that is not affected by network latency. It is the time a node waits for a read request to complete on the Cassandra side, whether or not the network has high latency.
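
One way to test the slow-connection hypothesis above independently of the driver is to time a plain TCP connect to the node's native transport port from an affected developer machine. The sketch below uses only the JDK. The class and method names are hypothetical, and the 5000 ms value simply mirrors the connect timeout mentioned above. Note that a successful TCP connect only shows that the operating system accepted the socket. It does not show that Cassandra answered any protocol message.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

final class ConnectProbe {
    /**
     * Attempts a plain TCP connect and returns the elapsed milliseconds,
     * or -1 if the connect failed or did not finish within timeoutMs.
     */
    static long probeMillis(String host, int port, int timeoutMs) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return (System.nanoTime() - start) / 1_000_000;
        } catch (IOException e) {
            // Covers refused connections, unreachable hosts, and connect timeouts.
            return -1;
        }
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "127.0.0.1";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 9042;
        // A few attempts, since the failures in this thread are intermittent.
        for (int attempt = 1; attempt <= 3; attempt++) {
            System.out.println("attempt " + attempt + ": "
                    + probeMillis(host, port, 5000) + " ms");
        }
    }
}
```

If the TCP connect is consistently fast while the driver still times out, that would suggest the node is slow to answer requests on an established connection (for example during long GC pauses), not slow to accept sockets.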


Hope this information can help you a little bit more.
Cheers.




--
Kevin Gallardo, 
Drivers and Tools Team
DataStax.

Rand Saleh

Jul 1, 2020, 6:47:07 AM
to DataStax Java Driver for Apache Cassandra User Mailing List, kevin.g...@datastax.com
I have exactly the same issue!
What was the solution?
I tried setting the socket option to be greater than read_request_timeout_in_ms, and I still get com.datastax.driver.core.exceptions.NoHostAvailableException:
All host(s) tried for query failed (tried: 10.10.9.119:9042). The node 10.10.9.119 works fine through cqlsh.
Here's my code:
Cluster c = Cluster.builder()
    .addContactPoint(ip)
    .withPort(9042)
    .withSocketOptions(new SocketOptions().setConnectTimeoutMillis(6000))
    .build();
System.out.println(c.getMetadata());
It throws the exception when trying to get the metadata or run any other query.