Issue with ObsoleteVersionException

gxm

Aug 26, 2010, 1:03:21 PM

to project-voldemort

I'm running Voldemort 0.81.
I've been seeing occasional InsufficientOperationalNodesExceptions on
puts when doing a 3+ hour test with our application in EC2. I
increased the routing timeout to a full minute, yet I still see the
error.

voldemort.store.InsufficientOperationalNodesException: 1 writes
succeeded, but 2 are required.

This seems very odd to me. I have replication set to 3 and required
writes set to 2, so the error means that 2 nodes did not respond
within a minute, while thousands of other puts to the same Voldemort
cluster are succeeding.

Digging into the RoutedStore.put code, I see that
ObsoleteVersionException is ignored: it is treated neither as a
success nor as an explicit failure. It therefore becomes a silent
failure, which leads to the scenario above:
catch (ObsoleteVersionException e) {
    // ignore this completely here
    // this means that a higher version was able
    // to write on this node and should be termed
    // as clean success.
}
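To make the counting problem concrete, here is a minimal Java sketch. It is not the actual RoutedStore code: the class QuorumCountSketch, the ReplicaResult enum, and both counting methods are hypothetical. It assumes the scenario from this post (replication 3, required writes 2) in which the first replica accepts the write and the other two report ObsoleteVersionException because a concurrent writer already stored a newer version there.

```java
import java.util.List;

// Hypothetical, simplified model of quorum counting in a routed put.
// Not Voldemort's RoutedStore; all names here are illustrative.
public class QuorumCountSketch {

    // Outcome of a put on a single replica (hypothetical).
    public enum ReplicaResult { OK, OBSOLETE_VERSION, FAILED }

    // Behavior described in the post: an obsolete-version response is dropped,
    // counted neither as a success nor as a failure.
    public static int countIgnoringObsolete(List<ReplicaResult> results) {
        int successes = 0;
        for (ReplicaResult r : results) {
            if (r == ReplicaResult.OK) {
                successes++;
            }
        }
        return successes;
    }

    // Proposed behavior: a newer version already on the node counts as a success.
    public static int countObsoleteAsSuccess(List<ReplicaResult> results) {
        int successes = 0;
        for (ReplicaResult r : results) {
            if (r == ReplicaResult.OK || r == ReplicaResult.OBSOLETE_VERSION) {
                successes++;
            }
        }
        return successes;
    }

    public static boolean quorumMet(int successes, int requiredWrites) {
        return successes >= requiredWrites;
    }

    public static void main(String[] args) {
        // Replication 3, required writes 2; the first replica accepts the write
        // and the other two already hold a newer version (assumed scenario).
        List<ReplicaResult> results = List.of(
                ReplicaResult.OK,
                ReplicaResult.OBSOLETE_VERSION,
                ReplicaResult.OBSOLETE_VERSION);
        int required = 2;
        int dropped = countIgnoringObsolete(results);
        int counted = countObsoleteAsSuccess(results);
        System.out.println("ignored: " + dropped + " succeeded, quorum met = " + quorumMet(dropped, required));
        System.out.println("counted: " + counted + " succeeded, quorum met = " + quorumMet(counted, required));
    }
}
```

With the exception dropped, only 1 of the 3 responses counts and the quorum check fails, which matches the "1 writes succeeded, but 2 are required" message; counting it as a success yields 3.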

In my case, the design of the store in question involves multiple
simultaneous writers, which occasionally leads to an
ObsoleteVersionException. I wasn't worried about that, because
DefaultStoreClient handles ObsoleteVersionException by retrying in
applyUpdate.
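For readers unfamiliar with the retry pattern, here is a sketch of optimistic read-modify-write with retries in plain Java. It is a hypothetical stand-in, not DefaultStoreClient: the RetrySketch class, its integer versions (a simplification of Voldemort's vector clocks), and its applyUpdate(key, update, maxTries) method are illustrative assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical in-memory model of optimistic concurrency with client-side retries.
// Not Voldemort's DefaultStoreClient; names and signatures are illustrative.
public class RetrySketch {

    // Stand-in for Voldemort's exception of the same name.
    public static class ObsoleteVersionException extends RuntimeException {
        public ObsoleteVersionException(String msg) { super(msg); }
    }

    // A value with a simple integer version (a simplification of a vector clock).
    public static final class Versioned {
        public final String value;
        public final long version;
        Versioned(String value, long version) { this.value = value; this.version = version; }
    }

    private final Map<String, Versioned> data = new HashMap<>();

    public synchronized Versioned get(String key) {
        return data.getOrDefault(key, new Versioned(null, 0));
    }

    // Accept the write only if it was based on the currently stored version.
    public synchronized void put(String key, String value, long basedOnVersion) {
        long current = get(key).version;
        if (basedOnVersion != current) {
            throw new ObsoleteVersionException(
                    "based on version " + basedOnVersion + ", but current is " + current);
        }
        data.put(key, new Versioned(value, current + 1));
    }

    // Read-modify-write; on ObsoleteVersionException, re-read and retry up to maxTries times.
    public boolean applyUpdate(String key, UnaryOperator<String> update, int maxTries) {
        for (int attempt = 0; attempt < maxTries; attempt++) {
            Versioned current = get(key);
            try {
                put(key, update.apply(current.value), current.version);
                return true;
            } catch (ObsoleteVersionException e) {
                // Another writer got in first; loop around and re-read.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        RetrySketch store = new RetrySketch();
        store.put("k", "a", 0);
        boolean[] interfered = {false};
        boolean ok = store.applyUpdate("k", old -> {
            if (!interfered[0]) {
                // Simulate a concurrent writer landing during the first attempt only.
                interfered[0] = true;
                Versioned cur = store.get("k");
                store.put("k", cur.value + "x", cur.version);
            }
            return old + "b";
        }, 3);
        System.out.println(ok + " " + store.get("k").value); // prints "true axb"
    }
}
```

In main, the first attempt is rejected as obsolete because the simulated writer bumped the version, and the second attempt succeeds on top of the competing write. This only works if the obsolete-version error actually reaches the client, which is why a silently dropped exception on the server side matters.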

I think that any ObsoleteVersionExceptions should be thrown up the
stack. Claiming that it is a "success" in the RoutedStore feels like
business logic.

I have to fix this issue for my application, and I'd be happy to
contribute the fix as a patch release to 0.81 and/or on master.

gxm

Sep 3, 2010, 5:20:11 PM

to project-voldemort

On Aug 26, 10:03 am, gxm <moull...@gmail.com> wrote:
> I think that any ObsoleteVersionExceptions should be thrown up the
> stack. Claiming that it is a "success" in the RoutedStore feels like
> business logic.

I've done some more testing with the Voldemort code and now realize
that RoutedStore effectively uses the first node as a lock for
detecting obsolete versions, so the comment quoted above can be ignored.

However, ObsoleteVersionException still needs to be explicitly counted
as a success.
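Here is a minimal sketch of the put flow described in this follow-up, under stated assumptions: the first replica is written before the others, an ObsoleteVersionException from that first node propagates to the caller (so a client-side retry can handle it), and the same exception from any later replica is counted as a success because a newer version is already there. MasterFirstPutSketch, its Replica interface, and routedPut are hypothetical names; this sequential loop also ignores the routing timeout and any parallelism the real code uses.

```java
import java.util.List;

// Hypothetical model of a routed put that writes to the first replica before
// the others, as described in the thread. Not Voldemort's RoutedStore.
public class MasterFirstPutSketch {

    public static class ObsoleteVersionException extends RuntimeException {
        public ObsoleteVersionException(String msg) { super(msg); }
    }

    public static class InsufficientOperationalNodesException extends RuntimeException {
        public InsufficientOperationalNodesException(String msg) { super(msg); }
    }

    // One replica's write; throws ObsoleteVersionException if it already holds a newer version.
    public interface Replica {
        void put(long version);
    }

    public static int routedPut(List<Replica> replicas, long version, int requiredWrites) {
        // The first node acts as the lock: an obsolete version here is a genuine
        // conflict, so the exception propagates to the caller.
        replicas.get(0).put(version);
        int successes = 1;

        // Remaining nodes: an obsolete version means a newer write already landed,
        // so it is counted as a success instead of being silently dropped.
        for (Replica r : replicas.subList(1, replicas.size())) {
            try {
                r.put(version);
                successes++;
            } catch (ObsoleteVersionException e) {
                successes++;
            } catch (RuntimeException e) {
                // Any other failure is not counted.
            }
        }
        if (successes < requiredWrites) {
            throw new InsufficientOperationalNodesException(
                    successes + " writes succeeded, but " + requiredWrites + " are required.");
        }
        return successes;
    }

    public static void main(String[] args) {
        Replica accepts = v -> { };
        Replica alreadyNewer = v -> { throw new ObsoleteVersionException("newer version present"); };
        // Scenario from the first post: first node accepts, both others hold a newer version.
        System.out.println(routedPut(List.of(accepts, alreadyNewer, alreadyNewer), 5L, 2) + " writes counted");
    }
}
```

Under these assumptions, the scenario from the first post now meets the quorum, while a conflict on the first node still surfaces to the client.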
