Read repair fail


Eishay Smith

Jan 20, 2010, 2:29:47 PM
to project-voldemort
I saw lots of these exceptions in the stdout of a Voldemort client:
Exception in thread "voldemort-client-thread-10" java.util.NoSuchElementException
        at java.util.HashMap$HashIterator.nextEntry(Unknown Source)
        at java.util.HashMap$KeyIterator.next(Unknown Source)
        at com.google.common.collect.StandardMultimap$WrappedCollection$WrappedIterator.next(StandardMultimap.java:491)
        at voldemort.store.routed.ReadRepairer.getRepairs(ReadRepairer.java:109)
        at voldemort.store.routed.RoutedStore$5.run(RoutedStore.java:576)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Unknown Source)
Should we see it in the log output?
Why do you think it happens?

To debug it I raised the log level of Voldemort's RoutedStore logger to debug and saw tons of "read repair failed" messages of two types: obsolete version and InvalidMetadataException.
Why is this logged at debug level? Isn't it an error that should show up at error level?
My Voldemort client uses three different stores. It would be nice to have more metadata in the logs, since it's hard to isolate the problematic store (if there is one).
Why do you think I see these exceptions?

1.
voldemort.store.routed.RoutedStore$5,Doing read repair on node 0 for key '[116, 95, 83, 71, 80]' with version version(2:1933, 4:3).
voldemort.store.routed.RoutedStore$5,Read repair failed:
voldemort.store.InvalidMetadataException: client routing strategy not in sync with store routing strategy!
        at sun.reflect.GeneratedConstructorAccessor63.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(Unknown Source)
        at java.lang.reflect.Constructor.newInstance(Unknown Source)
        at voldemort.utils.ReflectUtils.callConstructor(ReflectUtils.java:117)
        at voldemort.utils.ReflectUtils.callConstructor(ReflectUtils.java:104)
        at voldemort.store.ErrorCodeMapper.getError(ErrorCodeMapper.java:63)
        at voldemort.client.protocol.vold.VoldemortNativeClientRequestFormat.checkException(VoldemortNativeClientRequestFormat.java:166)
        at voldemort.client.protocol.vold.VoldemortNativeClientRequestFormat.readPutResponse(VoldemortNativeClientRequestFormat.java:156)
        at voldemort.store.socket.SocketStore.put(SocketStore.java:134)
        at voldemort.store.socket.SocketStore.put(SocketStore.java:47)
        at voldemort.store.logging.LoggingStore.put(LoggingStore.java:122)
        at voldemort.store.routed.RoutedStore$5.run(RoutedStore.java:582)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(Unknown Source)
2.
voldemort.store.routed.RoutedStore$5,Doing read repair on node 4 for key '[53, 53, 51, 55, 55, 53, 49, 48, 56]' with version version(1:14, 3:1).
voldemort.store.routed.RoutedStore$5,Read repair cancelled due to obsolete version on node 0 for key '[116, 95, 72, 66, 67]' with version version(0:5, 4:3800): Key 745f484243 version(0:5, 4:3800) is obsolete, it is no greater than the current version of version(0:5, 4:3800).
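
The keys in the debug lines above are printed as arrays of signed byte values, which makes it hard to tell which store a key belongs to. As a rough aid, here is a small sketch that turns such a logged array back into text. It assumes the keys are UTF-8 strings, which nothing in the logs confirms; KeyDecoder and decode are hypothetical names and not part of Voldemort.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Voldemort.
class KeyDecoder {

    // Parses a logged key such as "[116, 95, 83, 71, 80]" into bytes and
    // decodes them as UTF-8 (an assumption about how the keys were serialized).
    static String decode(String logged) {
        String inner = logged.trim();
        inner = inner.substring(1, inner.length() - 1); // strip '[' and ']'
        String[] parts = inner.split(",");
        byte[] bytes = new byte[parts.length];
        for (int i = 0; i < parts.length; i++) {
            // The log prints signed byte values, so parseByte is sufficient.
            bytes[i] = Byte.parseByte(parts[i].trim());
        }
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(decode("[116, 95, 83, 71, 80]"));
        System.out.println(decode("[116, 95, 72, 66, 67]"));
        System.out.println(decode("[53, 53, 51, 55, 55, 53, 49, 48, 56]"));
    }
}
```

For the three keys above this prints t_SGP, t_HBC and 553775108. The second one agrees with the hex form 745f484243 shown in the obsolete-version message.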

The client does both reads and writes. There are multiple threads, but no two threads update the same key.
I ran put and get on the same keys from the command line and got no error messages.

Thanks,
Eishay

bhupesh bansal

Jan 20, 2010, 3:40:00 PM
to project-...@googlegroups.com
The voldemort.store.InvalidMetadataException looks fishy. It is thrown when the client thinks the key should go to node A, but node A should not contain that key (either mastered or replicated).

Read repair checks the routing strategy and should send requests to the right servers only. Did you change the cluster.xml after the client was started?

Best
Bhupesh




Eishay Smith

Jan 20, 2010, 3:47:47 PM
to project-...@googlegroups.com
> Did you change the cluster.xml after the client was started?
No, I actually restarted it to have the log4j debug setup.
All the servers use the same configs.

ijuma

Jan 21, 2010, 2:43:17 AM
to project-voldemort
On Jan 20, 8:47 pm, Eishay Smith <eis...@gmail.com> wrote:
> > Did you change the cluster.xml after the client was started?
>
> No, I actually restarted it to have the log4j debug setup.
> All the servers use the same configs.

I think we need a bug report for this. In addition to the odd
InvalidMetadataException, there is also the problem with the
NoSuchElementException.

Ismael

Jay Kreps

Jan 22, 2010, 1:02:41 AM
to project-...@googlegroups.com
Hi Eishay,

The obsolete version message is at debug level because it just means the
value has already been repaired. This commonly happens when there are two
concurrent reads or something similar: you end up with two concurrent
repairs attempting to apply the same repair, one of which must fail. But
that logging was only supposed to cover obsolete versions, and it looks
like it is hiding a concurrency bug (the NoSuchElementException). So there
are two problems: (1) exceptions are not being logged and (2) the
NoSuchElementException itself.
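
To make the obsolete version case concrete, here is a minimal sketch of a vector clock comparison. It is not Voldemort's actual VectorClock code: ClockCompare, compare and isObsolete are hypothetical names, and treating an equal clock as obsolete is an assumption based on the "no greater than the current version" wording in the log.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical illustration only; not Voldemort's VectorClock implementation.
class ClockCompare {

    enum Order { BEFORE, AFTER, EQUAL, CONCURRENT }

    // Compares clock a to clock b. Each clock maps node id -> counter;
    // a node missing from a clock counts as 0.
    static Order compare(Map<Integer, Long> a, Map<Integer, Long> b) {
        boolean aBigger = false;
        boolean bBigger = false;
        Set<Integer> nodes = new HashSet<>(a.keySet());
        nodes.addAll(b.keySet());
        for (Integer node : nodes) {
            long av = a.getOrDefault(node, 0L);
            long bv = b.getOrDefault(node, 0L);
            if (av > bv) aBigger = true;
            if (bv > av) bBigger = true;
        }
        if (aBigger && bBigger) return Order.CONCURRENT;
        if (aBigger) return Order.AFTER;
        if (bBigger) return Order.BEFORE;
        return Order.EQUAL;
    }

    // Assumption: a write is obsolete when its clock is not greater than
    // the stored one, that is, when it is BEFORE or EQUAL.
    static boolean isObsolete(Map<Integer, Long> incoming, Map<Integer, Long> current) {
        Order order = compare(incoming, current);
        return order == Order.BEFORE || order == Order.EQUAL;
    }

    public static void main(String[] args) {
        // version(0:5, 4:3800) as in the log line from the original report
        Map<Integer, Long> current = new HashMap<>();
        current.put(0, 5L);
        current.put(4, 3800L);

        // A second repair carrying the same clock arrives after the first.
        Map<Integer, Long> secondRepair = new HashMap<>(current);

        System.out.println(isObsolete(secondRepair, current));
    }
}
```

Under these assumptions the second repair carries the same clock as the value already stored, so it is rejected as obsolete. That is the harmless case described above.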

-Jay


Alex Feinberg

Jan 28, 2010, 7:41:48 PM
to project-...@googlegroups.com
Bhupesh and I have also observed this in our own test environment last
week. I've filed an issue for this:

http://code.google.com/p/project-voldemort/issues/detail?id=198

Thanks,
- Alex

ijuma

Feb 2, 2010, 7:51:36 AM
to project-voldemort
On Jan 29, 12:41 am, Alex Feinberg <feinb...@gmail.com> wrote:
> Bhupesh and I have also observed this in our own test environment last
> week. I've filed an issue for this:
>
> http://code.google.com/p/project-voldemort/issues/detail?id=198

Also see:

http://code.google.com/p/project-voldemort/issues/detail?id=211

Ismael

ijuma

Feb 3, 2010, 4:41:06 AM
to project-voldemort

Eishay,

The problem below[1] is now fixed in master. It looks like issue #211:
unnecessary read repair calls were invoked during getAll calls, and
they failed with that error message once they reached the server.

Ismael

[1] voldemort.store.routed.RoutedStore$5,Doing read repair on node 4

Eishay Smith

Feb 3, 2010, 10:22:58 AM
to project-...@googlegroups.com
Thanks Ismael!
We do use getAll a lot.
