Driver selecting node having timeouts


Joseph Anish Alex

Sep 26, 2016, 12:29:55 PM
to java-dri...@lists.datastax.com
Hi,

We faced a situation today where one particular node was continuously timing out when used as coordinator. Errors on the client side looked like this:

09/26/2016 04:04:30,984  DEBUG (cluster1-nio-worker-6) [ERROR] UC  [cluster1] [hostname/IP.addr] Query error after 2006 ms: [1 bound values] SELECT * FROM "XYZ" WHERE k1=?;
09/26/2016 04:21:03,506  DEBUG (cluster1-nio-worker-6) [ERROR] UC  [cluster1] [hostname/IP.addr] Query error after 2009 ms: [1 bound values] SELECT * FROM "ABC" WHERE k2=?;
etc..

Queries hitting other nodes were fine, though occasionally slow (threshold 1000 ms).

Is there any way to tell the driver to avoid using this node, or ideally, should it figure this out on its own?

The node in question was having issues and was eventually restarted. The client timeouts stopped once this node was shutting down (i.e., no longer accepting CQL connections).

We are using DataStax driver 2.1.4 and DSE Cassandra 4.8.7 (Apache 2.1).

Thanks,
Joseph

Olivier Michallat

Sep 26, 2016, 5:37:11 PM
to java-dri...@lists.datastax.com
Hi Joseph,

LatencyAwarePolicy might help: it keeps track of the average latency of each node and tries to query the best-performing nodes first.

I would also suggest the following, but you'll need to upgrade as they were introduced in later versions of the driver:
- speculative executions (2.1.6): queries another node if the initial coordinator is too slow
- error-aware policy (3.1.0): ignores nodes with an error rate above a given threshold.
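
To illustrate what "keeps track of the average latency of each node" could look like, here is a minimal, self-contained sketch in plain Java. It is a simplified model and not the driver's LatencyAwarePolicy code: the class name LatencyTracker, the smoothing weight, and the exclusion threshold are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of latency-aware host filtering (hypothetical, NOT the driver's code).
final class LatencyTracker {
    private final double alpha;              // weight given to the newest sample (assumed value)
    private final double exclusionThreshold; // hosts slower than best * threshold are skipped
    private final Map<String, Double> averages = new LinkedHashMap<>();

    LatencyTracker(double alpha, double exclusionThreshold) {
        this.alpha = alpha;
        this.exclusionThreshold = exclusionThreshold;
    }

    // Fold one observed latency into the host's moving average.
    // The first sample for a host becomes its initial average.
    void record(String host, double latencyMs) {
        averages.merge(host, latencyMs, (old, sample) -> old + alpha * (sample - old));
    }

    // Return the hosts whose average is within exclusionThreshold times the best average.
    List<String> eligibleHosts() {
        double best = Double.MAX_VALUE;
        for (double avg : averages.values()) {
            best = Math.min(best, avg);
        }
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Double> e : averages.entrySet()) {
            if (e.getValue() <= best * exclusionThreshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }
}
```

With averages of 20 ms, 25 ms and 2000 ms and a threshold of 2.0, only the first two hosts stay eligible, which is the behaviour you would want for a coordinator that keeps timing out. In the driver itself, the policy wraps a child load balancing policy and is created through LatencyAwarePolicy.builder(...).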

--

Olivier Michallat

Driver & tools engineer, DataStax



Joseph Anish Alex

Sep 27, 2016, 5:24:06 AM
to java-dri...@lists.datastax.com
Thanks, Olivier. Will DowngradingConsistencyRetryPolicy help? We have RF 3 and CL of LOCAL_QUORUM, and the read timeout says 1 replica out of 2 was available.

Related question: how can we specify the CLs separately for read and write queries? I didn't see an option, but isn't this a common requirement for apps? I had used the Hector library (before moving to CQL) and it supported this.

Olivier Michallat

Sep 27, 2016, 1:07:11 PM
to java-dri...@lists.datastax.com
> We have RF 3 and CL of LOCAL_QUORUM and the read timeout says 1 replica out of 2 was available.

It looks like the node was isolated from the rest of the cluster.

DowngradingConsistencyRetryPolicy would have downgraded the CL to ONE, allowing the query to succeed. But it would have read from the failing node only, therefore returning potentially stale data.
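
The arithmetic behind this can be sketched in a few lines of plain Java. This assumes the RF of 3 applies to the local datacenter; QuorumMath and its methods are hypothetical names for illustration, not part of the driver.

```java
// Worked arithmetic for the quorum case above (hypothetical helper, not driver code).
final class QuorumMath {
    // A quorum is a strict majority of the replicas: floor(RF / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // A consistency level can be met only if enough replicas respond.
    static boolean canSatisfy(int requiredReplicas, int aliveReplicas) {
        return aliveReplicas >= requiredReplicas;
    }
}
```

With RF 3 the quorum is 2, so a coordinator that sees only 1 live replica cannot satisfy LOCAL_QUORUM, but it can satisfy ONE, which is why the downgrade would succeed while reading from a single, possibly stale, replica.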

> How can we specify the CLs separately for reads and write queries?

The driver does not know what is a read and what is a write; it just passes query strings to Cassandra for execution without parsing them. You have to set the CL yourself on each statement using Statement.setConsistencyLevel.
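
Since only the application knows which statements are reads and which are writes, one option is to keep that choice in a single place on the application side. The following is a minimal sketch in plain Java under that assumption: QueryKind and ConsistencyDefaults are hypothetical names, and the levels are held as plain strings so the example compiles without the driver.

```java
import java.util.EnumMap;
import java.util.Map;

// The application, not the driver, decides what counts as a read or a write.
enum QueryKind { READ, WRITE }

// Hypothetical application-side holder for per-kind consistency levels.
final class ConsistencyDefaults {
    private final Map<QueryKind, String> levels = new EnumMap<>(QueryKind.class);

    ConsistencyDefaults(String readLevel, String writeLevel) {
        levels.put(QueryKind.READ, readLevel);
        levels.put(QueryKind.WRITE, writeLevel);
    }

    // Name of the consistency level to apply to a statement of the given kind.
    String levelFor(QueryKind kind) {
        return levels.get(kind);
    }
}
```

With the driver on the classpath, the returned name could be converted with ConsistencyLevel.valueOf(...) and passed to Statement.setConsistencyLevel just before the statement is executed.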

--

Olivier Michallat

Driver & tools engineer, DataStax

