gxm
Aug 26, 2010, 1:03:21 PM
to project-voldemort
I'm running Voldemort 0.81.
I've been seeing occasional InsufficientOperationalNodesExceptions on
puts when doing a 3+ hour test with our application in EC2. I
increased the routing timeout to a full minute, yet I still see the
error.
voldemort.store.InsufficientOperationalNodesException: 1 writes
succeeded, but 2 are required.
This seems very odd to me. I have replication set to 3 and required
writes set to 2, so the error implies that 2 nodes failed to respond
within a minute, while thousands of other puts were succeeding against
the same Voldemort cluster.
Digging into the RoutedStore.put code, I see that
ObsoleteVersionException is ignored: it is counted neither as a success
nor as an explicit failure, so it becomes a silent failure, which leads
to my scenario above.
catch(ObsoleteVersionException e) {
    // ignore this completely here
    // this means that a higher version was able
    // to write on this node and should be termed as clean
    // success.
}
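To make the effect concrete, here is a minimal, self-contained sketch of the counting behavior described above. It is a simplified model, not the actual RoutedStore code: the class QuorumPutSketch, the Replica interface, and the nested exception classes are hypothetical stand-ins that only reuse the Voldemort names. It assumes one replica accepts the write and two reject it as obsolete.

```java
import java.util.Arrays;
import java.util.List;

// Simplified model of a quorum put; NOT the real RoutedStore implementation.
class QuorumPutSketch {

    // Hypothetical stand-ins for the Voldemort exception types.
    static class ObsoleteVersionException extends RuntimeException {
        ObsoleteVersionException(String msg) { super(msg); }
    }

    static class InsufficientOperationalNodesException extends RuntimeException {
        InsufficientOperationalNodesException(String msg) { super(msg); }
    }

    // Hypothetical replica: either accepts the write or rejects it as obsolete.
    interface Replica {
        void put(String key, String value);
    }

    // Counts successes; an obsolete-version rejection is swallowed and
    // therefore counted as neither a success nor a failure.
    static int put(List<Replica> replicas, int requiredWrites, String key, String value) {
        int successes = 0;
        for (Replica replica : replicas) {
            try {
                replica.put(key, value);
                successes++;
            } catch (ObsoleteVersionException e) {
                // ignored, mirroring the catch block quoted above
            }
        }
        if (successes < requiredWrites) {
            throw new InsufficientOperationalNodesException(
                successes + " writes succeeded, but " + requiredWrites + " are required.");
        }
        return successes;
    }

    public static void main(String[] args) {
        Replica accepts = (k, v) -> { };
        Replica obsolete = (k, v) -> {
            throw new ObsoleteVersionException("a newer version already exists");
        };
        try {
            // Replication of 3, required writes of 2, two replicas report obsolete.
            put(Arrays.asList(accepts, obsolete, obsolete), 2, "k", "v");
        } catch (InsufficientOperationalNodesException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In this model all three replicas respond promptly, yet the put still fails with "1 writes succeeded, but 2 are required.", which matches the message I am seeing.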
In my case, the design of the store in question involves multiple
simultaneous writers, which will lead to the occasional
ObsoleteVersionException. I wasn't worried about that, because the
DefaultStoreClient handles retries for ObsoleteVersionException by
using applyUpdate.
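For readers unfamiliar with that pattern, the retry behavior I am relying on looks roughly like the sketch below. It is only modeled loosely on applyUpdate: VersionedCounter, UpdateAction, and the applyUpdate method here are hypothetical, simplified stand-ins written for this message, not the Voldemort client classes. The point is that the retry only works if the ObsoleteVersionException actually reaches the client.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Simplified model of an optimistic read-modify-write retry loop.
class ApplyUpdateSketch {

    // Hypothetical stand-in for the Voldemort exception type.
    static class ObsoleteVersionException extends RuntimeException {
        ObsoleteVersionException(String msg) { super(msg); }
    }

    // Hypothetical single-key store that rejects writes based on a stale version.
    static class VersionedCounter {
        private int version = 0;
        private int value = 0;

        // Returns {version, value}.
        synchronized int[] get() {
            return new int[] { version, value };
        }

        synchronized void put(int expectedVersion, int newValue) {
            if (expectedVersion != version) {
                throw new ObsoleteVersionException(
                    "expected version " + expectedVersion + " but store is at " + version);
            }
            version++;
            value = newValue;
        }
    }

    // Hypothetical update callback: read the current value, then write a new one.
    interface UpdateAction {
        void update(VersionedCounter store);
    }

    // Re-runs the action until it succeeds or maxTries is exhausted.
    static boolean applyUpdate(VersionedCounter store, UpdateAction action, int maxTries) {
        for (int attempt = 0; attempt < maxTries; attempt++) {
            try {
                action.update(store);
                return true;
            } catch (ObsoleteVersionException e) {
                // Lost the race to another writer; loop to re-read and retry.
            }
        }
        return false;
    }

    public static void main(String[] args) {
        VersionedCounter store = new VersionedCounter();
        AtomicInteger attempts = new AtomicInteger();

        boolean ok = applyUpdate(store, s -> {
            int[] current = s.get();
            // Simulate a competing writer sneaking in during the first attempt only.
            if (attempts.incrementAndGet() == 1) {
                s.put(current[0], 10);
            }
            s.put(current[0], current[1] + 1);
        }, 3);

        System.out.println("ok=" + ok + " attempts=" + attempts.get()
            + " value=" + store.get()[1]);
    }
}
```

Here the first attempt loses to the competing writer and the second attempt succeeds against the fresh version, so the run prints ok=true attempts=2 value=11.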
I think that any ObsoleteVersionExceptions should be thrown up the
stack. Claiming that it is a "success" in the RoutedStore feels like
business logic.
I have to correct this issue for my application. I'd be happy to do
so as a patch release to 0.81, and/or on the master.