Insufficient Nodes Exception
From: James Hughes <james.hug...@gmail.com>
Date: Tue, 7 Jun 2011 16:39:48 -0700 (PDT)
Local: Tues, Jun 7 2011 7:39 pm
Subject: Insufficient Nodes Exception
In the following multithreaded demonstration code I get timeouts and
InsufficientOperationalNodesExceptions even though no node failures
have happened.

This has occurred in the past on 0.81 and most recently on 0.90
using voldemort-voldemort-46a0ec8.

Any suggestions as to what I am doing wrong would be appreciated.

Sincerely

Jim

-----------------------8<-----------------------------------

package com.jims.ObsTest;

import static org.junit.Assert.*;

import java.lang.String;

import org.apache.log4j.BasicConfigurator;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;

import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;
import voldemort.store.InsufficientOperationalNodesException;
import voldemort.versioning.ObsoleteVersionException;
import voldemort.versioning.Versioned;

public class ObsTest {
        static final int count = 127;
        static final String bootstrapUrl = "tcp://localhost:6666";
        static final ClientConfig config = new ClientConfig()
                        .setBootstrapUrls(bootstrapUrl);
        static final StoreClientFactory factory =
                        new SocketStoreClientFactory(config);
        static final String key = "1";

        public String myToString(int[] array) {
                StringBuilder sb = new StringBuilder();
                for (int x : array)
                        sb.append(x + ",");
                return sb.toString();
        }

        static int[] newIntArray(int num) {
                return new int[num];
        }

        static int[] newIntArray(String sb) {
                String sa[] = sb.split(",");
                int[] array = new int[sa.length];
                int i = 0;
                for (String s : sa)
                        array[i++] = Integer.parseInt(s);
                return array;
        }

        class Worker extends Thread {
                final Logger log = Logger.getLogger("testVoldemort");
                final StoreClient<String, String> client = factory
                                .getStoreClient("test");
                int num;

                public void run() {
                        for (int i = 0; i < count; i++) {
                                try {
                                        Versioned<String> value = client.get(key);
                                        int[] di = newIntArray(value.getValue());
                                        di[num]++;
                                        value.setObject(myToString(di));
                                        client.put(key, value);
                                } catch (ObsoleteVersionException e) {
                                        log.info("Obsolete Version, " + num + ", " + i);
                                        i--; // this retries the operation....
                                } catch (InsufficientOperationalNodesException e) {
                                        log.info("Insufficient Nodes, " + num + ", " + i);
                                        // the operation seemed to work. Do not need to retry?!?....
                                }
                        }
                }

                Worker(int num) {
                        this.num = num;
                }
        }

        ObsTest(int numThreads) throws InterruptedException {
                StoreClient<String, String> client = factory.getStoreClient("test");
                client.delete(key);
                String value = myToString(new int[numThreads]);
                client.put(key, value);
                Worker[] threads = new Worker[numThreads];
                final Logger log = Logger.getLogger("testVoldemort");
                int i = 0;
                log.info("Alloc " + numThreads);
                // Each worker gets its own slot index in the shared counter array
                for (i = 0; i < numThreads; i++)
                        threads[i] = new Worker(i);
                log.info("Start " + numThreads);
                for (Worker t : threads)
                        t.start();
                log.info("Join  " + numThreads);
                for (Worker t : threads)
                        t.join();
                value = client.getValue(key);
                int[] v = newIntArray(value);
                i = 0;
                for (Worker t : threads)
                        assertEquals(count, v[i++]);
        }

        static public void main(String[] args) throws InterruptedException {
                BasicConfigurator.configure();
                Logger.getRootLogger().setLevel(Level.INFO);
                for (int i = 1; i < 10; i++)
                        new ObsTest(i);
        }
}

 
From: Alex Feinberg <feinb...@gmail.com>
Date: Tue, 7 Jun 2011 16:48:55 -0700
Local: Tues, Jun 7 2011 7:48 pm
Subject: Re: [project-voldemort] Insufficient Nodes Exception
Is your client or server experiencing garbage collection?

What JVM settings are you using?

Thanks,
- Alex


 
From: James Hughes <james.hug...@gmail.com>
Date: Tue, 7 Jun 2011 17:13:16 -0700 (PDT)
Local: Tues, Jun 7 2011 8:13 pm
Subject: Re: Insufficient Nodes Exception
Default JVM settings. The timeout seems to be 10 seconds, and I do not
see high CPU time during the pauses, so I find it hard to think that
this is a timeout because of GC. Any suggestions as to what I should
try for settings?

Also, I am using vanilla test_config1 and test_config2 with no
messages of problems.

Additionally, I do not get this problem with 1 thread.

Jim



 
From: Alex Feinberg <feinb...@gmail.com>
Date: Tue, 7 Jun 2011 17:17:06 -0700
Local: Tues, Jun 7 2011 8:17 pm
Subject: Re: [project-voldemort] Re: Insufficient Nodes Exception
The fact that you're seeing it with multiple threads but not with a
single thread indicates that it is likely a client-side GC issue. Can
you enable GC logging and see how long the individual pauses are on the
client? You may want to enable GC logging on the server as well.
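
Besides JVM GC logging (for example the standard -verbose:gc flag), GC activity can also be checked from inside the client process. The following is only a minimal sketch using the JDK's management beans; the class name GcReport and the helper gcTotals are made up for illustration. Note that it reports accumulated collection time, not the length of individual pauses, so it complements GC logging rather than replacing it.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcReport {
    /** Sums collection counts and accumulated collection time (ms) over all collectors. */
    static long[] gcTotals() {
        long count = 0;
        long timeMs = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both getters return -1 when the collector does not report the value.
            count += Math.max(0, gc.getCollectionCount());
            timeMs += Math.max(0, gc.getCollectionTime());
        }
        return new long[] { count, timeMs };
    }

    public static void main(String[] args) {
        long[] before = gcTotals();
        // ... run the workload under test here ...
        long[] after = gcTotals();
        System.out.println("GC runs: " + (after[0] - before[0])
                + ", GC time ms: " + (after[1] - before[1]));
    }
}
```

If the GC time accumulated during a run is far below the observed stall, GC is an unlikely explanation for the timeout.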

For the server, I suggest the following JVM settings (this is on a
machine with 32 GB of RAM):

-Xmx18G -server -XX:NewSize=2048m -XX:MaxNewSize=2048m
-XX:+UseConcMarkSweepGC -XX:+UseParNewGC
-XX:CMSInitiatingOccupancyFraction=70 -XX:SurvivorRatio=2

This is with bdb.cache.size=10g.

You may want to scale it down accordingly, based on the desired bdb
cache size and the amount of RAM in the machine.

Thanks,
- Alex


 
From: gxm <moull...@gmail.com>
Date: Wed, 8 Jun 2011 08:11:40 -0700 (PDT)
Local: Wed, Jun 8 2011 11:11 am
Subject: Re: Insufficient Nodes Exception
A couple of thoughts:

You are only using a single key for all operations among 45 threads.
This will lead to significant contention for the puts, and can lead to
a number of issues, including timeouts,
InsufficientOperationalNodesException and ObsoleteVersionException.
When you log an exception, use the exception as the second parameter,
log.info("msg", e) to get additional information.

You may want to increase the default max_connections from 6 to a
number significantly higher, such as the number of threads you
create.  You can monitor SocketPool via jmx (use jconsole) to see how
long threads are waiting for a socket.

I'm curious about your intended usage pattern, and what this test is
intended to demonstrate.

Cheers,

Greg


 
From: James Hughes <james.hug...@gmail.com>
Date: Wed, 8 Jun 2011 14:01:45 -0700 (PDT)
Local: Wed, Jun 8 2011 5:01 pm
Subject: Re: Insufficient Nodes Exception
Hi Greg and Alex: I will try to answer both in sort of reverse order.

This program is to test the corner case where several machines contend
for the same record. The program is the smallest program I can think
of to demonstrate the problem.

The motivation is a set of larger programs that use Voldemort as a
shared database with a goal of using the versioned put as an atomic
operation. As you can tell, I am being a bit coy about the exact
application. Each of our larger programs has thousands of lines of
code, so discussing them in this forum would be difficult. I created
this test program as the smallest application that demonstrates what
the larger programs are seeing.

We fully expected to see many ObsoleteVersionExceptions, but we did
not expect to receive InsufficientOperationalNodesExceptions as a
normal congestion indication, and the documentation does not suggest
how to handle this if it occurs. If this is indeed normal, OK, I just
want to know.

Even though the program can go up to 10 worker threads, with just 2
threads we get both the ObsoleteVersionExceptions (expected) and the
timeout/InsufficientOperationalNodesExceptions.

Additionally, we have seen this problem on large scale systems and on
a single machine. This program fails on my laptop communicating to 2
Voldemort services (also on the same laptop). The machine has 8GB of
RAM and is not busy at all. Am I correct that with just 2 threads and
the test_config1/2 (a 222 store), there should never be more than 4
operations outstanding at a time?

I changed to log.info(msg, e) as suggested and get the following.
The ObsoleteVersionException is expected and there are many of these:

107154 [Thread-16] INFO testVoldemort  - Obsolete Version, 3, 41
voldemort.versioning.ObsoleteVersionException: Key 31 version(0:283) is obsolete, it is no greater than the current version of version(0:283).

The timeout, and the InsufficientOperationalNodesException always
happen together and are much less frequent.

107263 [Thread-17] WARN voldemort.store.routed.RoutedStore  - Timed out waiting for put # 1 of 1 to succeed.
107263 [Thread-17] INFO testVoldemort  - Insufficient Nodes, 4, 24
voldemort.store.InsufficientOperationalNodesException: 1 writes succeeded, but 2 are required.
        at voldemort.store.routed.RoutedStore.put(RoutedStore.java:776)
        at voldemort.store.routed.RoutedStore.put(RoutedStore.java:72)
        at voldemort.store.DelegatingStore.put(DelegatingStore.java:68)
        at voldemort.store.stats.StatTrackingStore.put(StatTrackingStore.java:90)
        at voldemort.store.serialized.SerializingStore.put(SerializingStore.java:109)
        at voldemort.store.DelegatingStore.put(DelegatingStore.java:68)
        at voldemort.client.DefaultStoreClient.put(DefaultStoreClient.java:208)
        at com.jims.ObsTest.ObsTest$Worker.run(ObsTest.java:61)



 
From: James Hughes <james.hug...@gmail.com>
Date: Wed, 8 Jun 2011 16:07:22 -0700 (PDT)
Local: Wed, Jun 8 2011 7:07 pm
Subject: Re: Insufficient Nodes Exception
I enabled client GC logging. The GC runs just after launch (from
Eclipse) and there is no other GC during the runs with 1 or 2 workers.
The run with 2 workers gets the timeout. Here is the complete output.

I realize this is a lot to ask, but can someone run the program in
their environment and tell me if you have the same issues?

1 [main] INFO voldemort.client.DefaultStoreClient  - bootstrapping metadata.
[GC 17024K->977K(83008K), 0.0074006 secs]
198 [main] INFO testVoldemort  - Alloc 1
198 [main] INFO voldemort.client.DefaultStoreClient  - bootstrapping metadata.
213 [main] INFO testVoldemort  - Start 1
213 [main] INFO testVoldemort  - Join  1
378 [main] INFO voldemort.client.DefaultStoreClient  - bootstrapping metadata.
394 [main] INFO testVoldemort  - Alloc 2
394 [main] INFO voldemort.client.DefaultStoreClient  - bootstrapping metadata.
407 [main] INFO voldemort.client.DefaultStoreClient  - bootstrapping metadata.
419 [main] INFO testVoldemort  - Start 2
420 [main] INFO testVoldemort  - Join  2
428 [Thread-3] INFO testVoldemort  - Obsolete Version, 0, 1
voldemort.versioning.ObsoleteVersionException: Key 31 version(0:3) is obsolete, it is no greater than the current version of version(0:3).
446 [Thread-3] INFO testVoldemort  - Obsolete Version, 0, 3
voldemort.versioning.ObsoleteVersionException: Key 31 version(0:7) is obsolete, it is no greater than the current version of version(0:7).
454 [Thread-3] INFO testVoldemort  - Obsolete Version, 0, 5
voldemort.versioning.ObsoleteVersionException: Key 31 version(0:12) is obsolete, it is no greater than the current version of version(0:12).
455 [Thread-3] INFO testVoldemort  - Obsolete Version, 0, 5
voldemort.versioning.ObsoleteVersionException: Key 31 version(0:13) is obsolete, it is no greater than the current version of version(0:13).
15456 [Thread-4] WARN voldemort.store.routed.RoutedStore  - Timed out waiting for put # 1 of 1 to succeed.
15457 [Thread-4] INFO testVoldemort  - Insufficient Nodes, 1, 7
voldemort.store.InsufficientOperationalNodesException: 1 writes succeeded, but 2 are required.
        at voldemort.store.routed.RoutedStore.put(RoutedStore.java:776)
        at voldemort.store.routed.RoutedStore.put(RoutedStore.java:72)
        at voldemort.store.DelegatingStore.put(DelegatingStore.java:68)
        at voldemort.store.stats.StatTrackingStore.put(StatTrackingStore.java:90)
        at voldemort.store.serialized.SerializingStore.put(SerializingStore.java:109)
        at voldemort.store.DelegatingStore.put(DelegatingStore.java:68)
        at voldemort.client.DefaultStoreClient.put(DefaultStoreClient.java:208)
        at com.jims.ObsTest.ObsTest$Worker.run(ObsTest.java:61)



 
From: gxm <moull...@gmail.com>
Date: Thu, 9 Jun 2011 07:59:01 -0700 (PDT)
Local: Thurs, Jun 9 2011 10:59 am
Subject: Re: Insufficient Nodes Exception
I knew this issue sounded familiar.
http://groups.google.com/group/project-voldemort/browse_thread/thread...

In version 0.81, if you modify RoutedStore.put so that the ignored
ObsoleteVersionException instead does this:
    successes.incrementAndGet();
    recordSuccess(node, startNsLocal);

you will no longer get InsufficientOperationalNodesException for what
are really ObsoleteVersionExceptions.
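
The counting effect described here can be modeled without Voldemort at all. The sketch below is not the RoutedStore code; the class QuorumModel, the Reply enum, and both helper methods are hypothetical names used only to illustrate the idea. It assumes two replicas and two required writes, matching the "1 writes succeeded, but 2 are required" message in the logs.

```java
import java.util.Arrays;
import java.util.List;

public class QuorumModel {
    /** Possible outcomes of a put sent to one replica (hypothetical model). */
    enum Reply { OK, OBSOLETE_VERSION, TIMEOUT }

    /**
     * Counts replies treated as successful writes. When countObsolete is true,
     * an OBSOLETE_VERSION reply is counted as a node that answered, mirroring
     * the change suggested in the post.
     */
    static int countSuccesses(List<Reply> replies, boolean countObsolete) {
        int successes = 0;
        for (Reply r : replies) {
            if (r == Reply.OK || (countObsolete && r == Reply.OBSOLETE_VERSION)) {
                successes++;
            }
        }
        return successes;
    }

    /** True when the quorum check would report insufficient nodes. */
    static boolean insufficientNodes(List<Reply> replies, int requiredWrites,
                                     boolean countObsolete) {
        return countSuccesses(replies, countObsolete) < requiredWrites;
    }

    public static void main(String[] args) {
        // Two racing writers: one replica accepts this put, the other
        // already holds the rival writer's version and rejects it.
        List<Reply> replies = Arrays.asList(Reply.OK, Reply.OBSOLETE_VERSION);
        System.out.println(insufficientNodes(replies, 2, false)); // true
        System.out.println(insufficientNodes(replies, 2, true));  // false
    }
}
```

In this model every node is healthy and responding, yet the first policy still reports too few writes, which is consistent with seeing the exception when no node has failed.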

Also, you may want to use StoreClient.applyUpdate so you don't have to
manually deal with ObsoleteVersionExceptions.

Cheers,

Greg


 
From: Maarten Koopmans <maarten.koopm...@gmail.com>
Date: Thu, 9 Jun 2011 19:15:59 +0200
Local: Thurs, Jun 9 2011 1:15 pm
Subject: Re: [project-voldemort] Re: Insufficient Nodes Exception

Look at this Scala class:

class VMClient[K,V](delegate: StoreClient[K,V]) extends StoreDelta[K,V] {
  var factory: SocketStoreClientFactory = _
  private val maxtries = 50

  // get_?, put, and debug are not shown in this excerpt
  override def applyDelta[D](key: K, delta: D, newValue: (V, D) => V) {
    debug("applying delta")
    // Try to apply the delta, retrying while the data is stale
    var tried = 0
    var updated = false
    var next_update: Versioned[V] = get_?(key) match {
      case Some(v) => v
      case None => throw new VoldemortWrapperException
    }

    // Guard against null values from Voldemort
    if (next_update != null) {
      next_update.setObject(newValue(next_update.getValue, delta))
      // Keep trying until the update succeeds or the tries run out
      while (!((tried > maxtries) || updated)) {
        tried += 1
        try {
          // This will throw an exception if our data is stale
          put(key, next_update)
          updated = true
          debug("delta applied")
        } catch {
          // Stale data, let's try and reconcile
          case o: ObsoleteVersionException => {
            debug("ObsoleteVersionException, retry")
            get_?(key) match {
              case Some(v) => {
                v.setObject(newValue(v.getValue, delta))
                next_update = v
              }
              case None => throw new VoldemortWrapperException
            }
          }
        }
      }
    }

    // This will also be thrown if the key didn't exist
    if (!updated) throw new UpdateFailedException
    debug("applied delta to Voldemort store")
  }
}

The UpdateFailedException is thrown in as a marker (it is just a subclass
of Exception). You can probably rewrite this quite easily in Java using
inner classes, and it does precisely what you want: an atomic update (in
this case with up to 50 tries). The idea is this: get a Versioned value of
type V from a key of type K. Then apply a function (this would be your
inner class in Java) that takes a value and a delta of type D and returns
a new value. E.g. you store a Java-serialized class with a field that is a
List[Int], and D is an Int that is added to that List[Int]. Your new value
would be an object of type V with an updated list, which would then be
stored.

If it sounds complicated, that's because it's a sophisticated
transformation, as you noticed ;-)
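
As a rough illustration of the same idea in Java, here is a minimal, self-contained sketch. It does not use the Voldemort client: Cell is an in-memory stand-in for a single versioned key, StaleVersionException stands in for an obsolete-version error, and every name in it (DeltaRetry, Cell, applyDelta, and so on) is hypothetical. A real client would do the get and put over the network instead of through an AtomicReference.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BiFunction;

public class DeltaRetry {
    /** Immutable value plus version counter; a stand-in for a versioned record. */
    static final class Versioned<V> {
        final V value;
        final long version;
        Versioned(V value, long version) { this.value = value; this.version = version; }
    }

    /** Thrown when the stored version has moved on since the read. */
    static final class StaleVersionException extends RuntimeException {}

    /** Single-key in-memory store that rejects a put based on a stale read. */
    static final class Cell<V> {
        private final AtomicReference<Versioned<V>> ref;
        Cell(V initial) { ref = new AtomicReference<>(new Versioned<>(initial, 0)); }
        Versioned<V> get() { return ref.get(); }
        void put(Versioned<V> read, V newValue) {
            Versioned<V> next = new Versioned<>(newValue, read.version + 1);
            if (!ref.compareAndSet(read, next)) throw new StaleVersionException();
        }
    }

    /** Reads, applies newValue(current, delta), writes; re-reads and retries when stale. */
    static <V, D> void applyDelta(Cell<V> cell, D delta,
                                  BiFunction<V, D, V> newValue, int maxTries) {
        for (int tried = 0; tried < maxTries; tried++) {
            Versioned<V> current = cell.get();
            try {
                cell.put(current, newValue.apply(current.value, delta));
                return;
            } catch (StaleVersionException e) {
                // Another writer won the race; loop to re-read and reapply the delta.
            }
        }
        throw new IllegalStateException("update failed after " + maxTries + " tries");
    }

    public static void main(String[] args) throws InterruptedException {
        Cell<Integer> counter = new Cell<>(0);
        Runnable work = () -> {
            for (int i = 0; i < 127; i++) applyDelta(counter, 1, Integer::sum, 1000);
        };
        Thread a = new Thread(work), b = new Thread(work);
        a.start(); b.start();
        a.join(); b.join();
        System.out.println(counter.get().value); // 254
    }
}
```

The delta function is re-applied to the freshly read value on every retry, so a lost race never overwrites the other writer's increment.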

Best, Maarten


 
From: James Hughes <james.hug...@gmail.com>
Date: Mon, 13 Jun 2011 14:58:55 -0700 (PDT)
Local: Mon, Jun 13 2011 5:58 pm
Subject: Re: Insufficient Nodes Exception
Can you send me a diff? I want to make sure the code goes into the
right place.

Thanks!



 