Delayed mutator.execute()


felix

Dec 9, 2011, 9:57:39 AM
to sca...@googlegroups.com
Hi,

I have a small application which receives messages from multiple remote clients, parses these messages and stores them into Cassandra using Pelops.
I write every single message with mutator.writeColumn followed by mutator.execute.

This is kinda slow and generates moderate pressure on the Cassandra servers.

My first idea was to start a separate thread which periodically (every second) runs mutator.execute. The message creation still uses mutator.writeColumn, but no longer followed by an execute.

When I try this, it always leads to a

org.scale7.cassandra.pelops.exceptions.PelopsException: java.util.ConcurrentModificationException

during the mutator.execute call.

So my question is: how can I avoid flushing my writes to Cassandra on every single message received?

Greetings

Stefan Majer


Dominic Williams

Dec 9, 2011, 6:43:05 PM
to sca...@googlegroups.com

Hi Felix, the Mutator already writes batches:

1. Write some number of messages to the Mutator (maybe keep a counter).

2. Execute the Mutator to send all the batched-up messages at once (for example, when the counter hits some threshold).
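The counter-based batching described above could be sketched roughly like this. The `BatchWriter` class and its flush callback are purely illustrative (not part of the Pelops API); the callback is where a real DAO would call `mutator.execute(...)`:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Buffers writes in memory and flushes them once a counter hits a threshold.
class BatchWriter<T> {
    private final int threshold;
    private final Consumer<List<T>> flushAction; // stand-in for the expensive network call
    private final List<T> buffer = new ArrayList<>();

    BatchWriter(int threshold, Consumer<List<T>> flushAction) {
        this.threshold = threshold;
        this.flushAction = flushAction;
    }

    void write(T message) {
        buffer.add(message);             // cheap in-memory step, like mutator.writeColumn
        if (buffer.size() >= threshold) {
            flush();                     // expensive step, like mutator.execute
        }
    }

    void flush() {
        if (!buffer.isEmpty()) {
            flushAction.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }
}
```

With a threshold of, say, 100, only every hundredth message pays the round-trip cost; a final `flush()` on shutdown catches the remainder.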

Best Dominic

Dan Washusen

Dec 11, 2011, 10:47:04 PM
to sca...@googlegroups.com
The Mutator isn't thread safe, which is why you're seeing the concurrent modification errors...  If you want to continue down that path then I'd suggest having an instance per thread (see ThreadLocal); alternatively, a producer-consumer pattern could be used (many threads generating messages, one thread writing to Cassandra in larger batches).
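The producer-consumer variant mentioned above can be sketched with a `BlockingQueue`: many threads enqueue messages cheaply, and a single consumer thread drains them and writes each batch. The `writeBatch` callback here is a hypothetical stand-in for "create one Mutator, write the batch, execute", which keeps each Mutator confined to the consumer thread:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Producers enqueue; one consumer thread drains and writes batches, so the
// (non-thread-safe) writer object is only ever touched by a single thread.
class BatchingConsumer<T> {
    private final BlockingQueue<T> queue = new LinkedBlockingQueue<>();
    private final Thread consumer;
    private volatile boolean running = true;

    BatchingConsumer(int maxBatch, Consumer<List<T>> writeBatch) {
        consumer = new Thread(() -> {
            List<T> batch = new ArrayList<>();
            while (running || !queue.isEmpty()) {
                try {
                    T first = queue.poll(100, TimeUnit.MILLISECONDS);
                    if (first == null) continue;        // nothing waiting; re-check
                    batch.add(first);
                    queue.drainTo(batch, maxBatch - 1); // grab whatever else is queued
                    writeBatch.accept(new ArrayList<>(batch)); // one flush per batch
                    batch.clear();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
            }
        });
        consumer.start();
    }

    void submit(T message) { queue.add(message); }   // safe from many threads

    void shutdown() throws InterruptedException {
        running = false;    // consumer drains remaining items, then exits
        consumer.join();
    }
}
```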

However, before you start changing your code, have you verified the cause of the slowness?  Cassandra should be able to do many thousands of writes per second, so the first thing I'd do is check whether Cassandra has been optimized for your scenario (MemTables are a good place to start: http://wiki.apache.org/cassandra/MemtableThresholds#Memtable_Thresholds).

Cheers,
Dan

felix

Dec 12, 2011, 2:05:28 PM
to sca...@googlegroups.com
Hi,

I already found the reason for the writes getting slower over time: I was using a single Mutator and didn't create a new one after execute. I changed my whole application (in fact only my DAOs) to create a new Mutator instance for every write. Now the write speed stays at the same pace the whole day long.

I'm in the process of writing a batch interface for my DAOs where the calling client can send many writes to Cassandra, and I create a Mutator per thread. In another thread I collect all created Mutator instances and execute on all of them every second.

With this approach I can write ~100,000 rows/sec on my decent laptop, but I'm not sure if this is the way to go, as I can't read back as many rows as I wrote. I'm still writing tests to figure out what's going on.

Do you have any pointers to examples of this pattern?

Greetings
Stefan

Dan Washusen

Dec 12, 2011, 6:45:10 PM
to sca...@googlegroups.com
Unless you have a real performance issue, I'd suggest keeping it simple.  Have each thread create a new Mutator instance and write the values to Cassandra for each unit of work.  You should be able to do a LOT of writes per second using that method, and it keeps your code clean, lean and easy to test…
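The "fresh instance per unit of work" lifecycle described above can be sketched as follows. `FakeMutator` is a hypothetical stub standing in for a Pelops Mutator; only the lifecycle is the point: the mutator is created, filled and executed inside a single method call, so it is never shared between threads or reused after execute:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch of "one Mutator per unit of work". FakeMutator is an illustrative
// stand-in, not the Pelops API; it just counts how many instances were made.
class PerUnitOfWorkDao {
    static final AtomicInteger mutatorsCreated = new AtomicInteger();

    static class FakeMutator {
        FakeMutator() { mutatorsCreated.incrementAndGet(); }
        void writeColumn(String rowKey, String name, String value) { /* batched in memory */ }
        void execute() { /* one network round trip for the whole batch */ }
    }

    // One unit of work: a fresh mutator, some writes, exactly one execute.
    void saveMessages(String rowKey, List<String> messages) {
        FakeMutator mutator = new FakeMutator(); // fresh instance, never reused
        for (String msg : messages) {
            mutator.writeColumn(rowKey, "msg", msg);
        }
        mutator.execute();                       // flush once per unit of work
    }
}
```

Because the mutator never escapes the method, there is nothing to synchronize and nothing to go stale, which is what keeps the code easy to test.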

If you do start seeing performance issues then I'd suggest you start tuning Cassandra for your scenario (the default setup is conservative).

Cheers,
-- 
Dan Washusen
Make big files fly

felix

Dec 13, 2011, 3:35:54 AM
to sca...@googlegroups.com
Hi,

thanks for the advice, I was on the same track. I will check this again once we have real performance issues.

Greetings

Stefan