I'm using voldemort 0.57 with six nodes and multiple stores.
The problematic store has:
<replication-factor>4</replication-factor>
<required-reads>1</required-reads>
<preferred-reads>1</preferred-reads>
<preferred-writes>4</preferred-writes>
<required-writes>2</required-writes>
<retention-days>1</retention-days>
Keys are Strings and values are protobuf.
Average read time is 2 ms, using getAll with 1-10 keys. I'm happy with
that part.
Writing a fresh key/value (a key that was never used) takes 5 ms.
There is a set of a few thousand keys that are updated and read all
the time.
My cluster got into a state where, when I try to update an existing
key, I get something like:
============
voldemort.versioning.ObsoleteVersionException: Key 745f41444749 version
(1:14) is obsolete, it is no greater than the current version of
version(1:2404).
============
The voldemort log looks like this:
============
voldemort.store.routed.RoutedStore$3,Error in PUT on node 2(<snip>)
voldemort.versioning.ObsoleteVersionException: Key 745f41444749 version
(1:17) is obsolete, it is no greater than the current version of
version(1:2404).
...
Error in PUT on node 4(<snip>)
voldemort.versioning.ObsoleteVersionException: Key 745f41444749 version
(1:17) is obsolete, it is no greater than the current version of
version(1:2404).
... <after 15 sec>
voldemort.store.routed.RoutedStore,Timed out waiting for put # 3 of 3
to succeed.
============
The server's log looks clean while I'm doing the transaction, but at
some point I got a few BDB lock exceptions:
===========
Exception in thread "main" voldemort.store.StorageInitializationException: com.sleepycat.je.EnvironmentLockedException: (JE 3.3.82) A je.lck file exists in <snip>/data/bdb The environment can not be locked for single writer access.
    at voldemort.store.bdb.BdbStorageConfiguration.getStore(BdbStorageConfiguration.java:113)
    at voldemort.server.storage.StorageService.getStorageEngine(StorageService.java:311)
    at voldemort.server.storage.StorageService.startInner(StorageService.java:154)
    at voldemort.server.AbstractService.start(AbstractService.java:63)
    at voldemort.server.VoldemortServer.startInner(VoldemortServer.java:180)
    at voldemort.server.AbstractService.start(AbstractService.java:63)
    at com.kaching.supermap.SuperMapServer.start(SuperMapServer.java:119)
    at com.kaching.supermap.SuperMapServer.main(SuperMapServer.java:194)
Caused by: com.sleepycat.je.EnvironmentLockedException: (JE 3.3.82) A je.lck file exists in <snip>/data/bdb The environment can not be locked for single writer access.
    at com.sleepycat.je.log.FileManager.lockEnvironment(FileManager.java:1724)
    at com.sleepycat.je.log.FileManager.<init>(FileManager.java:260)
    at com.sleepycat.je.dbi.EnvironmentImpl.<init>(EnvironmentImpl.java:327)
    at com.sleepycat.je.dbi.DbEnvPool.getEnvironment(DbEnvPool.java:147)
    at com.sleepycat.je.Environment.<init>(Environment.java:210)
    at com.sleepycat.je.Environment.<init>(Environment.java:150)
    at voldemort.store.bdb.BdbStorageConfiguration.getEnvironment(BdbStorageConfiguration.java:143)
    at voldemort.store.bdb.BdbStorageConfiguration.getStore(BdbStorageConfiguration.java:101)
===========
Any ideas of how to approach it?
Sorry for the delay, catching up on email.
The lock exception indicates that a writer thread failed to obtain the
lock in a given period of time. This can happen naturally if too many
slow writes occur simultaneously, but it is usually not a problem. One
issue we had was overaggressive locking in a previous version of
Voldemort, where we acquired a write lock even when reading, which
exacerbated the problem. We fixed this, so upgrading will likely
ameliorate the lock timeout issue.
I don't think the obsolete version exception is related to the lock
timeout. An exception about an obsolete version is usually an
application level error--two threads trying to simultaneously
overwrite the same value, loser gets an exception rather than
clobbering the new value, which is the right behavior. However, your
exception gives the version of the write as 17, but the version of the
value in the store as 2404, and it seems unlikely that 2404 - 17
updates occurred in between the read and the update (right?). You
aren't somehow manually setting the version, are you?
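
To make the rule above concrete, here is a minimal sketch of the
"loser gets an exception" behavior. It is not Voldemort's code: the
class VersionedStoreSketch, its ObsoleteVersion exception, and the use
of a single long in place of a vector clock are all assumptions made
for illustration. It only mimics the rule that a put must carry a
version greater than the one already stored, using the numbers from
the log (17 against 2404).

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for one node's versioned store.
// A single long replaces the vector clock (e.g. "1:17" becomes 17).
class VersionedStoreSketch {

    // Hypothetical exception, named after the one in the log above.
    static class ObsoleteVersion extends RuntimeException {
        ObsoleteVersion(String message) {
            super(message);
        }
    }

    private final Map<String, Long> versions = new HashMap<>();
    private final Map<String, String> values = new HashMap<>();

    // Current version of the key, or 0 if it was never written.
    long getVersion(String key) {
        return versions.getOrDefault(key, 0L);
    }

    String get(String key) {
        return values.get(key);
    }

    // Accept the write only if its version is strictly newer than the
    // stored one; otherwise reject it instead of clobbering the value.
    void put(String key, String value, long version) {
        long current = getVersion(key);
        if (version <= current) {
            throw new ObsoleteVersion("version " + version
                    + " is obsolete, it is no greater than the current version "
                    + current);
        }
        versions.put(key, version);
        values.put(key, value);
    }

    public static void main(String[] args) {
        VersionedStoreSketch store = new VersionedStoreSketch();
        store.put("745f41444749", "fresh", 2404);
        try {
            // A writer that based its update on a much older read.
            store.put("745f41444749", "stale", 17);
        } catch (ObsoleteVersion e) {
            System.out.println("rejected: " + e.getMessage());
        }
        System.out.println("stored value: " + store.get("745f41444749"));
    }
}
```

In this sketch the second put is rejected and the stored value is left
untouched, which is the behavior described above. A writer that hits
this would normally re-read the key and retry with the newer version.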
One suspicion I had was that maybe there was some way that we could
incorrectly deserialize the version number. I tried to reproduce it
with the following test VectorClock.testIncrementAndSerialize():
    public void testIncrementAndSerialize() {
        int node = 1;
        VectorClock vc = getClock(node);
        assertEquals(node, vc.getMaxVersion());
        int increments = 3000;
        for(int i = 0; i < increments; i++) {
            vc.incrementVersion(node, 45);
            // serialize
            vc = new VectorClock(vc.toBytes());
        }
        assertEquals(increments + 1, vc.getMaxVersion());
    }
However this test passes. So I am a bit mystified.
-Jay
Indeed. I've been using preferred reads = 2 since then to avoid such
problems.
Ismael
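
For reference, the arithmetic behind the read and write settings can
be sketched as below. This is only the textbook quorum overlap
condition (reads + writes > replicas) applied to the numbers from the
store definition at the top of the thread. The class and method names
are made up for illustration, and it is an assumption, not something
confirmed in this thread, that stale single-replica reads explain the
obsolete version errors.

```java
// Hypothetical helper: checks the classic quorum overlap condition.
class QuorumSketch {

    // Every set of `reads` replicas is guaranteed to share at least one
    // replica with every set of `writes` replicas only when
    // reads + writes > replicas.
    static boolean readOverlapsWrite(int replicas, int reads, int writes) {
        return reads + writes > replicas;
    }

    public static void main(String[] args) {
        int replicas = 4; // replication-factor

        // required-reads = 1, required-writes = 2
        System.out.println("R=1, W=2: " + readOverlapsWrite(replicas, 1, 2));

        // reads = 2, required-writes = 2
        System.out.println("R=2, W=2: " + readOverlapsWrite(replicas, 2, 2));

        // reads = 2, and all preferred-writes = 4 succeeded
        System.out.println("R=2, W=4: " + readOverlapsWrite(replicas, 2, 4));
    }
}
```

With only the required two writes, neither one nor two reads is
guaranteed to overlap the latest write. Reading from two replicas
still lowers the chance of acting on a stale version, because the
client can pick the newer of the two.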