// assumption: the post does not show where assertEquals comes from; JUnit 4 is assumed here
import static org.junit.Assert.assertEquals;

import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;
import voldemort.store.InsufficientOperationalNodesException;
import voldemort.versioning.ObsoleteVersionException;
import voldemort.versioning.Versioned;

public class ObsTest {
    static final int count = 127;
    static final String bootstrapUrl = "tcp://localhost:6666";
    static final ClientConfig config = new ClientConfig()
            .setBootstrapUrls(bootstrapUrl);
    static final StoreClientFactory factory = new SocketStoreClientFactory(config);
    static final String key = "1";

    // Serializes the counters as "a,b,c," (note the trailing comma).
    public String myToString(int[] array) {
        StringBuilder sb = new StringBuilder();
        for (int x : array)
            sb.append(x + ",");
        return sb.toString();
    }

    static int[] newIntArray(int num) {
        return new int[num];
    }

    // Parses the format written by myToString; split() drops the trailing empty field.
    static int[] newIntArray(String sb) {
        String sa[] = sb.split(",");
        int[] array = new int[sa.length];
        int i = 0;
        for (String s : sa)
            array[i++] = Integer.parseInt(s);
        return array;
    }

    class Worker extends Thread {
        final Logger log = Logger.getLogger("testVoldemort");
        final StoreClient<String, String> client = factory
                .getStoreClient("test");
        int num;

        public void run() {
            for (int i = 0; i < count; i++) {
                try {
                    // Read-modify-write: bump this worker's slot and write it
                    // back at the version that was read.
                    Versioned<String> value = client.get(key);
                    int[] di = newIntArray(value.getValue());
                    di[num]++;
                    value.setObject(myToString(di));
                    client.put(key, value);
                } catch (ObsoleteVersionException e) {
                    log.info("Obsolete Version, " + num + ", " + i);
                    i--; // this retries the operation....
                } catch (InsufficientOperationalNodesException e) {
                    log.info("Insufficient Nodes, " + num + ", " + i);
                    // the operation seemed to work. Do not need to retry?!?....
                }
            }
        }

        Worker(int num) {
            this.num = num;
        }
    }

    ObsTest(int numThreads) throws InterruptedException {
        // Reset the shared record to numThreads zeroed counters.
        StoreClient<String, String> client = factory.getStoreClient("test");
        client.delete(key);
        String value = myToString(new int[numThreads]);
        client.put(key, value);
        Worker[] threads = new Worker[numThreads];
        final Logger log = Logger.getLogger("testVoldemort");
        int i = 0;
        log.info("Alloc " + numThreads);
        for (Worker t : threads)
            threads[i] = new Worker(i++);
        log.info("Start " + numThreads);
        for (Worker t : threads)
            t.start();
        log.info("Join " + numThreads);
        for (Worker t : threads)
            t.join();
        // Every worker should have incremented its own slot exactly count times.
        value = client.getValue(key);
        int[] v = newIntArray(value);
        i = 0;
        for (Worker t : threads)
            assertEquals(count, v[i++]);
    }

    static public void main(String[] args) throws InterruptedException {
        BasicConfigurator.configure();
        Logger.getRootLogger().setLevel(Level.INFO);
        for (int i = 1; i < 10; i++)
            new ObsTest(i);
    }
}
On Tue, Jun 7, 2011 at 4:39 PM, James Hughes <james.hug...@gmail.com> wrote:
> In the following multi-threaded demonstration code I get timeouts and
> Insufficient Nodes Exceptions when no node failures have happened.
> The
> This has occurred in the past on 0.81 and most recently on 0.90
> using voldemort-voldemort-46a0ec8.
> Any suggestions as to what I am doing wrong would be appreciated.
> --
> You received this message because you are subscribed to the Google Groups "project-voldemort" group.
> To post to this group, send email to project-voldemort@googlegroups.com.
> To unsubscribe from this group, send email to project-voldemort+unsubscribe@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/project-voldemort?hl=en.
Default JVM settings. The timeout seems to be 10 seconds, and I do not
see high CPU time during the pauses, so I find it hard to think that
this is a timeout because of GC. Any suggestions as to what I should
try for settings?
Also, I am using vanilla test_config1 and test_config2 with no
messages of problems.
Additionally, I do not get this problem with 1 thread.
Jim
On Jun 7, 4:48 pm, Alex Feinberg <feinb...@gmail.com> wrote:
> Is your client or server experiencing garbage collection?
> What JVM settings are you using?
> Thanks,
> - Alex
The fact that you're seeing it with multiple threads but not with a single thread indicates that this is likely a client-side GC issue. Can you enable GC logging and see how long the individual pauses are on the client? You may want to enable GC logging on the server as well.
For the server, I suggest the following JVM settings (this is on a machine with 32 GB of RAM):
On Tue, Jun 7, 2011 at 5:13 PM, James Hughes <james.hug...@gmail.com> wrote:
> Default JVM settings. The timeout seems to be 10 seconds, and I do not
> see high CPU time during the pauses, so I find it hard to think that
> this is a timeout because of GC. Any suggestions as to what I should
> try for settings?
> Also, I am using vanilla test_config1 and test_config2 with no
> messages of problems.
> Additionally, I do not get this problem with 1 thread.
> Jim
A couple of thoughts:
You are only using a single key for all operations among 45 threads.
This will lead to significant contention for the puts, and can lead to
a number of issues, including timeouts,
InsufficientOperationalNodesException and ObsoleteVersionException.
When you log an exception, use the exception as the second parameter,
log.info("msg", e) to get additional information.
You may want to increase the default max_connections from 6 to a
number significantly higher, such as the number of threads you
create. You can monitor SocketPool via jmx (use jconsole) to see how
long threads are waiting for a socket.
I'm curious about your intended usage pattern, and what this test is
intended to demonstrate.
Hi Greg and Alex: I will try to answer both in sort of reverse order.
This program is to test the corner case where several machines contend
for the same record. The program is the smallest program I can think
of to demonstrate the problem.
The motivation is a set of larger problems that use Voldemort as a
shared database with a goal of using the versioned put as an atomic
operation. As you can tell, I am being a bit coy about the exact
application. Each of our larger problems has thousands of lines of code,
so discussing them in this forum would be more difficult, so I created
this test program to provide the smallest application that
demonstrates what the larger programs are seeing.
We fully expected to see many ObsoleteVersionExceptions, but we did
not expect to receive InsufficientOperationalNodesExceptions as a normal
congestion indication, and the documentation did not suggest how to
handle this if it occurs. If this is indeed normal, OK, I just want to
know.
Even though the program can go up to 10 worker threads, with just 2
threads we get both the ObsoleteVersionExceptions (expected) and the
timeout/InsufficientOperationalNodesExceptions.
Additionally, we have seen this problem on large scale systems and on
a single machine. This program fails on my laptop communicating to 2
Voldemort services (also on the same laptop). The machine has 8GB of
RAM and is not busy at all. Am I correct that with just 2 threads and
the test_config1/2 (a 222 store), there should never be more than 4
operations outstanding at a time?
I changed to log.info(msg, e) as suggested and get the following.
The ObsoleteVersionException is expected and there are many of these:
107154 [Thread-16] INFO testVoldemort - Obsolete Version, 3, 41
voldemort.versioning.ObsoleteVersionException: Key 31 version(0:283) is obsolete, it is no greater than the current version of version(0:283).
The timeout and the InsufficientOperationalNodesException always
happen together and are much less frequent:
107263 [Thread-17] WARN voldemort.store.routed.RoutedStore - Timed out waiting for put # 1 of 1 to succeed.
107263 [Thread-17] INFO testVoldemort - Insufficient Nodes, 4, 24
voldemort.store.InsufficientOperationalNodesException: 1 writes succeeded, but 2 are required.
    at voldemort.store.routed.RoutedStore.put(RoutedStore.java:776)
    at voldemort.store.routed.RoutedStore.put(RoutedStore.java:72)
    at voldemort.store.DelegatingStore.put(DelegatingStore.java:68)
    at voldemort.store.stats.StatTrackingStore.put(StatTrackingStore.java:90)
    at voldemort.store.serialized.SerializingStore.put(SerializingStore.java:109)
    at voldemort.store.DelegatingStore.put(DelegatingStore.java:68)
    at voldemort.client.DefaultStoreClient.put(DefaultStoreClient.java:208)
    at com.jims.ObsTest.ObsTest$Worker.run(ObsTest.java:61)
On Jun 8, 8:11 am, gxm <moull...@gmail.com> wrote:
> You are only using a single key for all operations among 45 threads.
> This will lead to significant contention for the puts, and can lead to
> a number of issues, including timeouts,
> InsufficientOperationalNodesException and ObsoleteVersionException.
> When you log an exception, use the exception as the second parameter,
> log.info("msg", e) to get additional information.
> You may want to increase the default max_connections from 6 to a
> number significantly higher, such as the number of threads you
> create. You can monitor SocketPool via jmx (use jconsole) to see how
> long threads are waiting for a socket.
> I'm curious about your intended usage pattern, and what this test is
> intended to demonstrate.
I enabled client GC logging. The GC runs just after launch (from Eclipse)
and there is no other GC during the runs with 1 or 2 workers. The run with
2 workers gets the timeout. Here is the complete output.
I realize this is a lot to ask, but can someone run the program in
their environment and tell me if you have the same issues?
1 [main] INFO voldemort.client.DefaultStoreClient - bootstrapping metadata.
[GC 17024K->977K(83008K), 0.0074006 secs]
198 [main] INFO testVoldemort - Alloc 1
198 [main] INFO voldemort.client.DefaultStoreClient - bootstrapping metadata.
213 [main] INFO testVoldemort - Start 1
213 [main] INFO testVoldemort - Join 1
378 [main] INFO voldemort.client.DefaultStoreClient - bootstrapping metadata.
394 [main] INFO testVoldemort - Alloc 2
394 [main] INFO voldemort.client.DefaultStoreClient - bootstrapping metadata.
407 [main] INFO voldemort.client.DefaultStoreClient - bootstrapping metadata.
419 [main] INFO testVoldemort - Start 2
420 [main] INFO testVoldemort - Join 2
428 [Thread-3] INFO testVoldemort - Obsolete Version, 0, 1
voldemort.versioning.ObsoleteVersionException: Key 31 version(0:3) is obsolete, it is no greater than the current version of version(0:3).
446 [Thread-3] INFO testVoldemort - Obsolete Version, 0, 3
voldemort.versioning.ObsoleteVersionException: Key 31 version(0:7) is obsolete, it is no greater than the current version of version(0:7).
454 [Thread-3] INFO testVoldemort - Obsolete Version, 0, 5
voldemort.versioning.ObsoleteVersionException: Key 31 version(0:12) is obsolete, it is no greater than the current version of version(0:12).
455 [Thread-3] INFO testVoldemort - Obsolete Version, 0, 5
voldemort.versioning.ObsoleteVersionException: Key 31 version(0:13) is obsolete, it is no greater than the current version of version(0:13).
15456 [Thread-4] WARN voldemort.store.routed.RoutedStore - Timed out waiting for put # 1 of 1 to succeed.
15457 [Thread-4] INFO testVoldemort - Insufficient Nodes, 1, 7
voldemort.store.InsufficientOperationalNodesException: 1 writes succeeded, but 2 are required.
    at voldemort.store.routed.RoutedStore.put(RoutedStore.java:776)
    at voldemort.store.routed.RoutedStore.put(RoutedStore.java:72)
    at voldemort.store.DelegatingStore.put(DelegatingStore.java:68)
    at voldemort.store.stats.StatTrackingStore.put(StatTrackingStore.java:90)
    at voldemort.store.serialized.SerializingStore.put(SerializingStore.java:109)
    at voldemort.store.DelegatingStore.put(DelegatingStore.java:68)
    at voldemort.client.DefaultStoreClient.put(DefaultStoreClient.java:208)
    at com.jims.ObsTest.ObsTest$Worker.run(ObsTest.java:61)
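The single [GC ...] line in this output reports a pause of about 7 ms, while the gap before the timeout (timestamps 455 to 15456) is roughly 15 seconds. One way to cross-check this from inside the client, without relying on GC logging flags, is to sample the JVM's own collector counters around a suspect operation. The sketch below uses only the standard java.lang.management API. The class name GcProbe and its helper methods are hypothetical, and the Runnable merely stands in for the get/put loop.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcProbe {
    /** Total milliseconds spent in GC so far, summed over all collectors. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime(); // -1 if the collector does not report it
            if (t > 0) total += t;
        }
        return total;
    }

    /** Runs the work and returns {elapsedMillis, gcMillis} for that interval. */
    static long[] measure(Runnable work) {
        long gcBefore = totalGcMillis();
        long start = System.nanoTime();
        work.run();
        long elapsed = (System.nanoTime() - start) / 1_000_000;
        return new long[] { elapsed, totalGcMillis() - gcBefore };
    }

    public static void main(String[] args) {
        long[] r = measure(() -> {
            // placeholder for the client.get / client.put loop
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 100_000; i++) sb.append(i % 10);
        });
        System.out.println("elapsed ms = " + r[0] + ", of which GC ms = " + r[1]);
    }
}
```

If the elapsed time is in the seconds while the GC share stays near zero, GC on the client is an unlikely explanation for the stall. The same measurement would have to be repeated on the servers before ruling GC out entirely.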
In version 0.81, if you modify RoutedStore.put so that the ignored
ObsoleteVersionException instead does this:
successes.incrementAndGet();
recordSuccess(node, startNsLocal);
you will no longer get InsufficientOperationalNodesException for what
are really ObsoleteVersionExceptions.
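A minimal sketch of the accounting described here, assuming two replicas and two required writes (the "222" store mentioned earlier in the thread). This is not the RoutedStore code. The class QuorumSketch, the Outcome enum and the method names are all hypothetical, and the sketch only models how an obsolete-version reply from a replica is tallied.

```java
import java.util.Arrays;
import java.util.List;

class QuorumSketch {
    /** What a single replica reports for one put (hypothetical model). */
    enum Outcome { OK, OBSOLETE, FAILED }

    /**
     * Counts successes across replicas. When countObsoleteAsSuccess is false,
     * an OBSOLETE reply is silently dropped and never counted.
     */
    static int successes(List<Outcome> replies, boolean countObsoleteAsSuccess) {
        int ok = 0;
        for (Outcome o : replies) {
            if (o == Outcome.OK) ok++;
            else if (o == Outcome.OBSOLETE && countObsoleteAsSuccess) ok++;
        }
        return ok;
    }

    /** Returns the message a caller would see for the given tally. */
    static String result(List<Outcome> replies, int requiredWrites,
                         boolean countObsoleteAsSuccess) {
        int ok = successes(replies, countObsoleteAsSuccess);
        return ok >= requiredWrites
                ? "put accepted"
                : ok + " writes succeeded, but " + requiredWrites + " are required.";
    }

    public static void main(String[] args) {
        // First replica accepted the write; the second already holds a newer version.
        List<Outcome> replies = Arrays.asList(Outcome.OK, Outcome.OBSOLETE);
        System.out.println(result(replies, 2, false)); // 1 writes succeeded, but 2 are required.
        System.out.println(result(replies, 2, true));  // put accepted
    }
}
```

In the logs earlier in the thread the exception is always preceded by a "Timed out waiting for put" warning, which would be consistent with the caller waiting for an acknowledgement that is never recorded.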
Also, you may want to use StoreClient.applyUpdate so you don't have to
manually deal with ObsoleteVersionExceptions.
class VMClient[K,V](delegate: StoreClient[K,V]) with StoreDelta[K,V] {
  var factory : SocketStoreClientFactory = _
  private val maxtries = 50;
  override def applyDelta[D](key: K,delta: D,newValue: (V,D) => V) {
    debug("applying delta")
    //nifty code here for trying to apply a delta maxtries times
    var tried = 0
    var updated = false
    var next_update: Versioned[V] = get_?(key) match {
      case Some(v) => v
      case None => {throw new VoldemortWrapperException}
    }
    //Guard against null values from Voldemort
    if (! (null == next_update)) {
      next_update setObject newValue(next_update getValue,delta);
      //We are going to try maxtries until updated
      while (!((tried > maxtries) || (updated))) {
        tried += 1
        try {
          //This will throw an exception if our data is stale
          put(key,next_update)
          updated = true
          debug("delta applied")
        }
        catch {
          //Stale data, let's try and reconcile
          case o : ObsoleteVersionException => {
            debug("ObsoleteVersionException, retry")
            get_?(key) match {
              case Some(v) => {
                v setObject(newValue(v getValue,delta))
                next_update = v
              }
              case None => {throw new VoldemortWrapperException}
            }
          }
        }
      }
    }
    //This will also be thrown if the key didn't exist
    if (! updated) {throw new UpdateFailedException}
    debug("applied delta to Voldemort store")
  }
UpdateFailedException is thrown in as a marker (it is just a subclass of Exception). You can probably rewrite this quite easily in Java using inner classes, and it does precisely what you want: an atomic update (which in this case is tried 50 times). The idea is this: get a Versioned value of type V for a key of type K. Then apply a function (this would be your inner class in Java) that takes a value and a delta of type D and returns a new value. E.g. you store a Java-serialized class with a field that is a List[Int], and D is an Int that is added to that List[Int]. Your new value would be an object of type V with an updated list, which would then be stored.
If it sounds complicated, that's because it's a sophisticated transformation, as you noticed ;-)
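A rough Java rendering of the same retry idea, kept self-contained by swapping the Voldemort client for a tiny in-memory stand-in. Everything named here is hypothetical (DeltaSketch, VersionedBox, Snapshot, StaleVersionException, applyDelta). The stand-in rejects a put that was computed from a version other than the current one, which is the condition ObsoleteVersionException signals in this thread.

```java
import java.util.function.BiFunction;

class DeltaSketch {
    /** Stand-in for ObsoleteVersionException (hypothetical). */
    static class StaleVersionException extends RuntimeException {}

    /** Immutable value plus the version it was read at. */
    static final class Snapshot<V> {
        final V value;
        final long version;
        Snapshot(V value, long version) { this.value = value; this.version = version; }
    }

    /** In-memory stand-in for a single versioned key (hypothetical). */
    static final class VersionedBox<V> {
        private V value;
        private long version;
        VersionedBox(V initial) { this.value = initial; }
        synchronized Snapshot<V> get() { return new Snapshot<>(value, version); }
        /** Accepts the write only if it was computed from the current version. */
        synchronized void put(V newValue, long readVersion) {
            if (readVersion != version) throw new StaleVersionException();
            value = newValue;
            version++;
        }
    }

    /** Read, apply the delta, write; on a stale write re-read and try again. */
    static <V, D> boolean applyDelta(VersionedBox<V> box, D delta,
                                     BiFunction<V, D, V> newValue, int maxTries) {
        for (int tried = 0; tried < maxTries; tried++) {
            Snapshot<V> s = box.get();
            try {
                box.put(newValue.apply(s.value, delta), s.version);
                return true;
            } catch (StaleVersionException e) {
                // another writer got in first; loop to re-read and retry
            }
        }
        return false; // caller decides how to report exhaustion
    }

    public static void main(String[] args) throws InterruptedException {
        VersionedBox<Integer> box = new VersionedBox<>(0);
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 127; i++) {
                    // effectively unbounded retries for this demo
                    applyDelta(box, 1, (v, d) -> v + d, Integer.MAX_VALUE);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) w.join();
        System.out.println(box.get().value); // 4 * 127 = 508
    }
}
```

The delta function is re-applied to the freshly read value on every attempt, so a retry never writes a result computed from stale data. A real caller would pass a bounded maxTries and treat a false return as a failed update.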
> In version 0.81, if you modify RoutedStore.put so that the ignored
> ObsoleteVersionException instead does this:
> successes.incrementAndGet();
> recordSuccess(node, startNsLocal);
> you will no longer get InsufficientOperationalNodesException for what
> are really ObsoleteVersionExceptions.
> Also, you may want to use StoreClient.applyUpdate so you don't have to
> manually deal with ObsoleteVersionExceptions.
> Cheers,
> Greg