Andrew Swan
Jan 17, 2012, 11:42:08 PM
to Scale 7 - Libraries and systems for scalable computing
I'm new to Pelops and Cassandra in general, so please forgive me if
this question is naive. I've Googled a little and nothing relevant
popped up.
I was wondering if there are any plans to implement automatic batch
flushing in Pelops. Under the current API, you:
1. obtain a Mutator
2. register one or more operations (e.g. puts and deletions) with it
3. execute it
For large numbers of operations (e.g. during bulk loads), postponing
all operations until execute is called leads to problems such as
running out of memory. This could be solved by the Mutator being able
to automatically flush any outstanding operations upon certain
criteria being reached, for example:
* a given time period has elapsed
* a given number of operations are outstanding
* a given number of bytes are waiting to be written
These criteria would be encapsulated in, say, a FlushCriteria class
that would be passed to the factory method that creates the Mutator.
The client's workflow would then change to:
1. obtain a Mutator, passing the desired FlushCriteria
2. register one or more operations (e.g. puts and deletions) with it
3. flush it explicitly on completion (to execute any operations that
weren't automatically flushed during step 2)
The most trivial FlushCriteria would be FlushCriteria.NEVER, which
replicates the current behaviour of not sending any operations to
Cassandra until the batch has been fully loaded into memory.
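To make the idea concrete, here is a rough sketch of what I have in mind. None of this is existing Pelops API: FlushCriteria, AutoFlushingBuffer, and the sink callback (which stands in for "execute the batch against Cassandra") are all hypothetical names, and the per-operation byte size is assumed to be supplied by the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical flush criteria; not part of the Pelops API. */
final class FlushCriteria {
    /** Never flushes automatically, i.e. the current behaviour. */
    static final FlushCriteria NEVER =
            new FlushCriteria(Long.MAX_VALUE, Integer.MAX_VALUE, Long.MAX_VALUE);

    final long maxAgeMillis;  // flush once the oldest pending operation is this old
    final int maxOperations;  // flush once this many operations are pending
    final long maxBytes;      // flush once this many bytes are pending

    FlushCriteria(long maxAgeMillis, int maxOperations, long maxBytes) {
        this.maxAgeMillis = maxAgeMillis;
        this.maxOperations = maxOperations;
        this.maxBytes = maxBytes;
    }

    boolean shouldFlush(long ageMillis, int operations, long bytes) {
        return ageMillis >= maxAgeMillis
                || operations >= maxOperations
                || bytes >= maxBytes;
    }
}

/** Hypothetical buffer that a Mutator could delegate to. */
final class AutoFlushingBuffer<T> {
    private final FlushCriteria criteria;
    private final Consumer<List<T>> sink; // stand-in for executing a batch
    private final List<T> pending = new ArrayList<>();
    private long pendingBytes;
    private long oldestMillis;

    AutoFlushingBuffer(FlushCriteria criteria, Consumer<List<T>> sink) {
        this.criteria = criteria;
        this.sink = sink;
    }

    /** Registers an operation and flushes if any criterion is now met. */
    void add(T operation, long sizeBytes) {
        if (pending.isEmpty()) {
            oldestMillis = System.currentTimeMillis();
        }
        pending.add(operation);
        pendingBytes += sizeBytes;
        long age = System.currentTimeMillis() - oldestMillis;
        if (criteria.shouldFlush(age, pending.size(), pendingBytes)) {
            flush();
        }
    }

    /** Sends any outstanding operations to the sink; a no-op when empty. */
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // hand over a copy of the batch
        pending.clear();
        pendingBytes = 0;
    }
}
```

Note that in this sketch the time criterion is only checked when an operation is added; a real implementation would probably want a background timer as well, and would need to think about thread safety, neither of which I've attempted here.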
This would be a relatively simple enhancement; has anything similar
already been considered?