Batch Changes

Tyler Hobbs

Oct 18, 2010, 6:12:25 PM

to pycass...@googlegroups.com

I would like to do a couple of things with the batch interface:

1. Change the default batch queue size to 0 (unbounded), if not
remove it entirely.

2. Stop clearing the batch after a send.

Here's my reasoning for the changes:

Lowering overhead is not the primary purpose of batch operations;
in fact, large, infrequent batches actually perform worse than
spreading writes out evenly.

What batches *are* useful for is replayability in the event of a failure.
However, clearing after a send() removes this ability. I don't think
clear() should be provided as an operation, either. I tend to favor
encouraging creation of new batches instead.
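
A minimal sketch of that idea, under stated assumptions: `ReplayableBatch`, `make_flaky_transport`, and the `transport` callable are hypothetical names used only for illustration, with `transport` standing in for the real batch_mutate call. None of this is pycassa's actual API.

```python
class ReplayableBatch:
    """Hypothetical batch that keeps its mutations after send()."""

    def __init__(self, transport):
        # transport: any callable that accepts a list of mutations
        # (an assumed stand-in for the real batch_mutate call).
        self._transport = transport
        self._mutations = []

    def insert(self, key, columns):
        self._mutations.append((key, dict(columns)))
        return self

    def mutations(self):
        # Return a copy so callers cannot alter the buffer.
        return list(self._mutations)

    def send(self):
        # The buffer is deliberately NOT cleared and there is no clear(),
        # so a failed send() can be replayed by calling send() again.
        self._transport(list(self._mutations))


def make_flaky_transport(failures, delivered):
    """Build a transport that raises `failures` times, then records batches."""
    state = {"remaining": failures}

    def transport(mutations):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise IOError("simulated failure")
        delivered.append(mutations)

    return transport


delivered = []
batch = ReplayableBatch(make_flaky_transport(1, delivered))
batch.insert("row1", {"col": "val"})
try:
    batch.send()
except IOError:
    batch.send()  # replay the same batch after the failure
```

In this sketch, new work would go into a freshly created `ReplayableBatch` rather than a cleared one.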

Since I wasn't the one responsible for the original batch operations,
I wanted to check with everyone on this. Is there some advantage
here that the current version offers that I am missing?

- Tyler

Eric Evans

Oct 19, 2010, 11:22:33 AM

to pycass...@googlegroups.com

On Mon, Oct 18, 2010 at 5:12 PM, Tyler Hobbs <ty...@riptano.com> wrote:
> What batches *are* useful for is replayability in the event of a failure.
> However, clearing after a send() removes this ability. I don't think
> clear() should be provided as an operation, either. I tend to favor
> encouraging creation of new batches instead.

As Devil's Advocate, isn't it the current behavior that an exception
encountered in send() will result in the buffer *not* being cleared?
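
The behavior described here can be sketched as follows. This is an illustration of "clear only on success", not a copy of the library's code: `ClearOnSuccessBatch`, `make_flaky_transport`, and `transport` are hypothetical names, and `transport` is an assumed stand-in for the real batch_mutate call.

```python
class ClearOnSuccessBatch:
    """Hypothetical batch whose buffer survives a failed send()."""

    def __init__(self, transport):
        # transport: any callable that accepts a list of mutations.
        self._transport = transport
        self._buffer = []

    def insert(self, key, columns):
        self._buffer.append((key, dict(columns)))
        return self

    def pending(self):
        return len(self._buffer)

    def send(self):
        # If the transport raises, the next line never runs,
        # so the buffer is left intact for a later retry.
        self._transport(list(self._buffer))
        self._buffer = []


def make_flaky_transport(failures, delivered):
    """Build a transport that raises `failures` times, then records batches."""
    state = {"remaining": failures}

    def transport(mutations):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise IOError("simulated failure")
        delivered.append(mutations)

    return transport


delivered = []
batch = ClearOnSuccessBatch(make_flaky_transport(1, delivered))
batch.insert("row1", {"col": "val"})
try:
    batch.send()
except IOError:
    pass
pending_after_failure = batch.pending()

batch.send()  # retry succeeds and only now is the buffer cleared
pending_after_success = batch.pending()
```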

--
Eric Evans
john.er...@gmail.com

Daniel Lundin

Oct 19, 2010, 12:03:45 PM

to pycass...@googlegroups.com

Yeah, that sounds about right.

A typical use case for batch ops is as an async Actor:

* Create batch mutator. Keep it around as a long-lived object.
* Continuously feed/send it updates.
* Periodically, as the buffer overflows, ops will be sent "automatically".

To enable replay, wouldn't it be simplest just to do something like:

* On send, do NOT clear immediately, but set a flag marking the
buffer as done *before doing the (presumably failing) batch_mutate*.
* On (the next) update, check the flag - and if set clear the buffer
before proceeding.

This will allow for replay/resend on error. Just call 'send' on the
mutator again, before doing additional updates.
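
A rough sketch of the flag approach, under stated assumptions: `FlaggedBatch`, `make_flaky_transport`, and `transport` are hypothetical names, and `transport` stands in for the real batch_mutate call. The flag is set before the call, as in the steps above.

```python
class FlaggedBatch:
    """Hypothetical batch that defers clearing until the next update."""

    def __init__(self, transport):
        # transport: any callable that accepts a list of mutations.
        self._transport = transport
        self._buffer = []
        self._sent = False

    def insert(self, key, columns):
        if self._sent:
            # The old contents were already handed to send(); drop them now.
            self._buffer = []
            self._sent = False
        self._buffer.append((key, dict(columns)))
        return self

    def send(self):
        # Mark the buffer as done before the call, but do not clear it,
        # so calling send() again resends the same mutations.
        self._sent = True
        self._transport(list(self._buffer))


def make_flaky_transport(failures, delivered):
    """Build a transport that raises `failures` times, then records batches."""
    state = {"remaining": failures}

    def transport(mutations):
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise IOError("simulated failure")
        delivered.append(mutations)

    return transport


delivered = []
batch = FlaggedBatch(make_flaky_transport(1, delivered))
batch.insert("row1", {"col": "a"})
try:
    batch.send()
except IOError:
    batch.send()  # replay before any further update
batch.insert("row2", {"col": "b"})  # flag is set, so the old buffer is dropped
batch.send()
```

Because the flag is set before the call, an update made after a failed send and before the resend would discard the failed mutations. Setting the flag only after the transport returns would keep them instead, which is a different design choice.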

I suppose one might even add some sort of automatic retry strategy,
but I don't think that particular hammer belongs in the API.

/d

Tyler Hobbs

Oct 19, 2010, 6:15:26 PM

to pycass...@googlegroups.com

I agree that automatic replay is probably not a good thing; users should
handle that individually.

Did you mean to suggest setting the flag *after* a successful send?

Would you object to disabling autosend/autoclear by default but still
providing those as options?

- Tyler
