Frequent Write TimeOuts Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout


Naresh Yadav

Aug 1, 2013, 2:15:33 AM
to java-dri...@lists.datastax.com
I have migrated all my code from Thrift to CQL, using the DataStax Java driver 1.0.1 and Cassandra 1.2.6.

With Thrift I was getting frequent timeouts from the start and could not proceed. After adopting CQL and redesigning my tables accordingly, I had more success and fewer timeouts.

With that I was able to insert large volumes of data that had not worked with Thrift. But past a certain point, once the data folder reached around 3.5 GB, I started getting frequent write timeout exceptions; even a use case that worked earlier now throws a timeout.

The exact exception is:

Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
    at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:214)
    at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:169)
    at com.datastax.driver.core.Session.execute(Session.java:107)
    at com.datastax.driver.core.Session.execute(Session.java:76)


Infrastructure :

I am using a SINGLE-node Cassandra, with this yaml tweaked for timeouts; everything else is default:

# How long the coordinator should wait for read operations to complete
read_request_timeout_in_ms: 30000
# How long the coordinator should wait for seq or index scans to complete
range_request_timeout_in_ms: 30000
# How long the coordinator should wait for writes to complete
write_request_timeout_in_ms: 30000
# How long the coordinator should wait for truncates to complete
# (This can be much longer, because unless auto_snapshot is disabled
# we need to flush first so we can snapshot before removing the data.)
truncate_request_timeout_in_ms: 60000
# The default timeout for other, miscellaneous operations
request_timeout_in_ms: 30000

I tried increasing these timeouts drastically, but it made no difference. As a second attempt I set cluster.getConfiguration().getSocketOptions().setConnectTimeoutMillis(30000); which also did not help.

Please HELP me: I need to support a heavy write load in Cassandra, with around 100 parallel threads.

Thanks in advance.
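One common way to keep ~100 writer threads from overrunning a single node is to bound the number of writes in flight. Below is a minimal sketch using plain java.util.concurrent (not code from this thread); the writeToCassandra method is a stand-in for session.execute so the example stays self-contained:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class ThrottledWrites {
    static final int MAX_IN_FLIGHT = 32;                 // cap on concurrent writes
    static final Semaphore permits = new Semaphore(MAX_IN_FLIGHT);
    static final AtomicInteger completed = new AtomicInteger();

    // Stand-in for session.execute(insertStatement) in the real application.
    static void writeToCassandra(int row) {
        completed.incrementAndGet();
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(100);
        int totalWrites = 1000;
        for (int i = 0; i < totalWrites; i++) {
            final int row = i;
            permits.acquire();                           // blocks once 32 writes are in flight
            pool.submit(() -> {
                try {
                    writeToCassandra(row);
                } finally {
                    permits.release();                   // free a slot for the next write
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println("completed: " + completed.get()); // prints "completed: 1000"
    }
}
```

Raising the timeouts only hides the backlog; a throttle like this keeps the node from being asked to do more work than it can acknowledge in time.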

Alex Popescu

Aug 1, 2013, 12:40:26 PM
to java-dri...@lists.datastax.com
Naresh,

The fact that you were seeing timeouts before makes me think this is less about the driver per se and more about the writes you are making. Could you add some details about the numbers you are currently seeing:

1. writes/second
2. concurrency level
3. size of the rows you are writing

:- a)

Naresh Yadav

Aug 4, 2013, 2:14:13 PM
to java-dri...@lists.datastax.com
hi alex,

Yesterday I replied with further details, but by mistake I clicked "Reply to author", so that reply may have gone to your inbox instead of the list. Please reply based on that, and if possible include it in this thread.

thanks

Naresh

Alex Popescu

Aug 5, 2013, 12:40:13 PM
to java-dri...@lists.datastax.com
Naresh,

Can I post those details here so everyone can see them and think about your question?

thanks,

:- a)

Naresh Yadav

Aug 5, 2013, 1:24:18 PM
to java-dri...@lists.datastax.com
hi alex,

I do not have my own reply with the stats; I searched my Sent folder but could not find it anywhere (maybe Google does not save those), though you should have it in your inbox.

Anyway, if you can suggest a solution to my problem I will be very thankful to you. Please give me pointers to try.

Thanks
Naresh

Alex Popescu

Aug 5, 2013, 1:39:24 PM
to java-dri...@lists.datastax.com
Naresh gave permission to reproduce here some details he sent me. Here's his message:

[quote]
Sorry for the late reply. Here are more details of my case. I am running a use case that stores Combinations (my project's terminology) in Cassandra. Currently I am testing the storage of 2.5 lakh (250,000) combinations with 100 parallel threads, each thread storing one combination. In the real case I need to support many crores (tens of millions), but that would need different hardware and a multi-node cluster.

Storing ONE combination takes around 2 seconds and involves:

527 INSERT INTO queries
506 UPDATE queries
954 SELECT queries

100 parallel threads are storing 100 combinations at a time.

I have found the behaviour of the WRITE TIMEOUTS to be random: sometimes it works up to 2 lakh (200,000) combinations and then throws timeouts, and sometimes it does not work even for 10k combinations. I am lost as to what the real problem is; please give me a direction to understand it. PLEASE HELP ME, it is very critical for my delivery.
[/quote]
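Taken at face value, those numbers imply a very high aggregate query rate for a single node. A quick back-of-the-envelope check, assuming only the figures quoted above:

```java
public class ThroughputEstimate {
    public static void main(String[] args) {
        int inserts = 527, updates = 506, selects = 954;
        int queriesPerCombination = inserts + updates + selects; // 1987 queries
        double secondsPerCombination = 2.0;   // observed time to store one combination
        int parallelThreads = 100;            // one combination per thread

        // Each thread issues ~1987 queries over ~2 s => ~994 queries/s per thread.
        double perThreadRate = queriesPerCombination / secondsPerCombination;
        // 100 threads in parallel => ~99,350 queries/s hitting a single node.
        double aggregateRate = perThreadRate * parallelThreads;

        System.out.printf("queries per combination: %d%n", queriesPerCombination);
        System.out.printf("aggregate rate: ~%.0f queries/s%n", aggregateRate);
    }
}
```

Roughly 100,000 mixed reads and writes per second is far beyond what a single laptop disk can sustain, which is consistent with the I/O-limit diagnosis in the replies.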

Alex Popescu

Aug 5, 2013, 1:44:34 PM
to java-dri...@lists.datastax.com

Naresh,

We will try to look into this, but as I've already said, this seems to be more of a Cassandra question, and your chances of getting a good answer on their mailing list or IRC room are much better than here.

IMO it sounds like you might be hitting your hardware I/O limits (considering how many inserts/updates you are doing). Given the number of updates, it might also be related to compaction and garbage collection kicking in. Unfortunately, my best advice for now is to ask this question, with all the details you've provided, on the Cassandra mailing list.

sorry for not being able to be more helpful,

:- a)

Naresh Yadav

Aug 5, 2013, 1:53:01 PM
to java-dri...@lists.datastax.com
OK, I will try there.

I have increased the default read/write timeouts in cassandra.yaml, BUT I cannot figure out how to set read/write timeouts on the DataStax Java driver client, on the Cluster object.


Naresh


Michael Figuiere

Aug 5, 2013, 2:56:28 PM
to java-dri...@lists.datastax.com
Naresh,

The WriteTimeoutException is triggered by the Cassandra node itself, because it was not able to answer your write query within the defined rpc_timeout. As Alex mentioned, you have likely reached some I/O limits in the environment you're working on. As writes only use sequential I/O, you shouldn't see any WriteTimeout except perhaps during some compactions; if you do see such timeouts, it is likely because you have a mixed workload with reads. If you have such a mixed workload, you should:
- Make sure that you use a separate disk for the commitlog. This reduces the pressure on the SSTable disk, since most of that disk's I/O will then be dedicated to reads, memtable flushes, and SSTable compactions.
- Try to tune concurrent_reads in cassandra.yaml to reduce the pressure on the disk. The comments in cassandra.yaml are fairly explicit about it:

# For workloads with more data than can fit in memory, Cassandra's
# bottleneck will be reads that need to fetch data from
# disk. "concurrent_reads" should be set to (16 * number_of_drives) in
# order to allow the operations to enqueue low enough in the stack
# that the OS and drives can reorder them.
#
# On the other hand, since writes are almost never IO bound, the ideal
# number of "concurrent_writes" is dependent on the number of cores in
# your system; (8 * number_of_cores) is a good rule of thumb.
concurrent_reads: 32
concurrent_writes: 32

- If the above doesn't help, or if you just have a laptop with a single disk, try increasing the timeouts in cassandra.yaml, e.g.:

# How long the coordinator should wait for read operations to complete
read_request_timeout_in_ms: 10000
# How long the coordinator should wait for seq or index scans to complete
range_request_timeout_in_ms: 10000
# How long the coordinator should wait for writes to complete
write_request_timeout_in_ms: 10000
# How long a coordinator should continue to retry a CAS operation
# that contends with other proposals for the same row
cas_contention_timeout_in_ms: 1000
# How long the coordinator should wait for truncates to complete
# (This can be much longer, because unless auto_snapshot is disabled
# we need to flush first so we can snapshot before removing the data.)
truncate_request_timeout_in_ms: 60000
# The default timeout for other, miscellaneous operations
request_timeout_in_ms: 10000
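The concurrent_reads/concurrent_writes rules of thumb quoted above can be expressed directly. A small sketch (the drive count is illustrative, not from this thread):

```java
public class ConcurrencyTuning {
    // concurrent_reads: 16 * number_of_drives, per the cassandra.yaml comments
    static int recommendedConcurrentReads(int numberOfDrives) {
        return 16 * numberOfDrives;
    }

    // concurrent_writes: 8 * number_of_cores, per the cassandra.yaml comments
    static int recommendedConcurrentWrites(int numberOfCores) {
        return 8 * numberOfCores;
    }

    public static void main(String[] args) {
        int drives = 1;  // e.g. a laptop with a single disk, as in this thread
        int cores = Runtime.getRuntime().availableProcessors();
        System.out.println("concurrent_reads: " + recommendedConcurrentReads(drives));
        System.out.println("concurrent_writes: " + recommendedConcurrentWrites(cores));
    }
}
```

For a single-disk laptop this suggests lowering concurrent_reads from the default 32 down to 16.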

You don't have any timeout to increase on the driver side, as the driver will wait until the coordinator node (your single node, in your case) times out.
  
Michael 
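Regarding the first suggestion above (a separate disk for the commitlog): the relevant cassandra.yaml settings are the data and commitlog directories. A sketch with illustrative default paths, not taken from this thread:

```yaml
# Point the commitlog at a different physical disk than the data
# directories, so sequential commitlog writes don't compete with
# SSTable reads, memtable flushes, and compactions.
data_file_directories:
    - /var/lib/cassandra/data
commitlog_directory: /var/lib/cassandra/commitlog
```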

Naresh Yadav

Aug 5, 2013, 9:52:19 PM
to java-dri...@lists.datastax.com
Michael,

Thank you very much for explaining in detail. Now I have a direction to try a few things.

Currently I have a single laptop with a single disk and two drives (C:/ and D:/). First I will try putting the commit log on a different drive (C:/), and then I will try configuring it on an external disk connected via USB.

I also tried increasing all the timeouts in the yaml, even to 124 s from the default 10 s, but that only delays the timeout exception; it still comes eventually.

One strange thing: it once worked for 2 lakh (200,000) combinations, and now it throws timeouts even after 1k combinations. I tried a fresh installation as well; I am a little clueless as to what has changed.

thanks again for giving me direction...


Naresh


Naresh Yadav

Aug 7, 2013, 9:58:17 AM
to java-dri...@lists.datastax.com
I ran the Cassandra server in DEBUG mode; here is a log containing the errors:

http://pastebin.com/rW0B4MD0

Please have a look and guide me. Thanks!


Naresh


Michael Figuiere

Aug 7, 2013, 11:12:18 AM
to java-dri...@lists.datastax.com
Hi Naresh,

I feel at this point it is mostly a matter of tuning your Cassandra configuration for your hardware setup and your workload, so I'd suggest continuing this conversation on the cassandra-user mailing list, where you may find additional suggestions beyond the basic advice we can give you here.


Michael 