Updated: Problems with two memcached Java clients: spy and gwhalin

Shi Yu

Oct 16, 2010, 8:27:24 PM10/16/10
to memc...@googlegroups.com, spymem...@googlegroups.com
Following up on my previous question: I tried to read those 6 million
<key,value> records out. Both APIs are capable, but the spymemcached API
is faster (16 minutes) than Whalin's v2.5.1 distribution (24
minutes) (http://github.com/gwhalin/Memcached-Java-Client/downloads).
Since spymemcached cannot insert that many, I ended up with a strange
hybrid setup: Whalin's API to store, and spymemcached to
read ...

I am really curious why spy cannot store up to 6 million...


On Sat, Oct 16, 2010 at 12:59 PM, Shi Yu <she...@gmail.com> wrote:
> Hi,
>
> I have two problems when using the memcached Java clients spymemcached
> (http://code.google.com/p/spymemcached/) and the gwhalin Java client
> for memcached (http://github.com/gwhalin/Memcached-Java-Client). I
> found that spymemcached failed to store more than 4.3 million
> records, sometimes only 3.7 million (please see my code below). There was
> no error and no exception; the code simply stopped at the
> 4.3 million mark and never reached the final line. In contrast, the
> gwhalin Java client was able to insert 6 million records without
> problem; however, comparing the speed of inserting the first 4 million
> records, the gwhalin client is much slower than spymemcached.
> The memcached server is set up with the command
> "./memcached -d -m 4000 -l 127.0.0.1 -p 11211" and I think there is no
> problem on the server side. What is the problem here? Should I adjust
> any settings? Thanks.
>
> -Shi
>
>
> //spymemcached code
> public static void main(String[] args) throws Exception {
>     MemcachedClient mc = new MemcachedClient(
>             new InetSocketAddress("ocuic32.research", 11211));
>     mc.flush();
>     System.out.println("Memchaced flushed ...");
>     int count = 0;
>     for (int i = 0; i < 6000000; i++) {
>         String a = "String" + i;
>         String b = "Value" + i;
>
>         mc.add(a, i, (String) b);
>         count++;
>         if (String.valueOf(count).endsWith("00000"))
>             System.out.println(count + " elements added.");
>     }
>
>     System.out.println("done " + count + " records inserted");
>     // spymemcached isn't able to reach this line
> }
>
>
>
> //gwhalin memcached code
> public static void main(String[] args) throws Exception {
>     BasicConfigurator.configure();
>     String[] servers = { "ocuic32.research:11211" };
>     SockIOPool pool = SockIOPool.getInstance();
>     pool.setServers( servers );
>     pool.setFailover( true );
>     pool.setInitConn( 10 );
>     pool.setMinConn( 5 );
>     pool.setMaxConn( 250 );
>     pool.setMaintSleep( 30 );
>     pool.setNagle( false );
>     pool.setSocketTO( 3000 );
>     pool.setAliveCheck( true );
>     pool.initialize();
>
>     MemcachedClient mcc = new MemcachedClient();
>     mcc.flushAll();
>     int count = 0;
>     int maxlength = 0;
>     //while((line=br.readLine())!=null){
>
>     for (int i = 0; i < 6000000; i++) {
>         String a = "String" + i;
>         String b = "Value" + i;
>         String sha1_ad1 = AeSimpleSHA1.SHA1(a);
>         mcc.set(sha1_ad1, (String) b);
>         count++;
>         if (String.valueOf(count).endsWith("00000"))
>             System.out.println(count + " elements added.");
>     }
>
>     System.out.println("done " + count + " records inserted");
>     // gwhalin's client is able to reach this line, but very slow
> }

Boris Partensky

Oct 16, 2010, 8:53:19 PM10/16/10
to memc...@googlegroups.com, spymem...@googlegroups.com
> I am really curious why spy cannot store up to 6 million...

I'd definitely spend some more time analyzing what's going on, if I
were you, before going down that road. Turn on GC logging
(-verbose:gc) and see if the app is heavily GC-ing when the program
"stops"; dump threads (kill -3 <pid>), etc.

Dustin

Oct 16, 2010, 9:33:51 PM10/16/10
to memcached

On Oct 16, 5:27 pm, Shi Yu <shee...@gmail.com> wrote:

> I am really curious why spy cannot store up to 6 million...

I'm quite sure you can add more than 6M items to something. Part of
the problem might be that you're adding to an in-memory queue as fast
as possible, not checking results, and not slowing down when you blow
past the maximum queue depth. That's not "normal" use of the simple
cache operation APIs.

If you just want to go fast, use the CacheLoader API:

http://dustin.github.com/java-memcached-client/apidocs/net/spy/memcached/util/CacheLoader.html

That will ensure you're balancing the speed of the fast loop in the
JVM against the time it takes to get over the network.
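The failure mode described here, a producer filling an in-memory queue faster than the network drains it, can be illustrated without memcached at all. The following is a hypothetical sketch using only java.util.concurrent; the queue capacity and consumer delay are invented numbers for illustration, not spymemcached's actual internals:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

public class QueueOverflowDemo {

    // Returns how many of 'total' offers a bounded queue accepts when
    // the consumer (standing in for the network writer) drains more
    // slowly than the producer fills.
    public static int fastProducerAccepted(int total, int capacity) throws Exception {
        final BlockingQueue<String> opQueue = new ArrayBlockingQueue<String>(capacity);
        Thread consumer = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        opQueue.take();                 // "send" one op...
                        TimeUnit.MILLISECONDS.sleep(1); // ...slower than the fill loop
                    }
                } catch (InterruptedException e) {
                    // drained enough; stop
                }
            }
        });
        consumer.start();

        int accepted = 0;
        for (int i = 0; i < total; i++) {
            // A fire-and-forget producer that never checks results: once the
            // queue is full, everything else is dropped here (or, in a real
            // client, buffered somewhere else in memory).
            if (opQueue.offer("op" + i)) {
                accepted++;
            }
        }
        consumer.interrupt();
        return accepted;
    }

    public static void main(String[] args) throws Exception {
        int accepted = fastProducerAccepted(100000, 1000);
        System.out.println(accepted + " of 100000 ops accepted without backpressure");
    }
}
```

With backpressure (an offer with a timeout, or periodically blocking on a result) the accepted count reaches the total; without it, almost everything beyond the queue's capacity is lost or must be buffered, which is where the memory goes.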

Shi Yu

Oct 16, 2010, 9:45:13 PM10/16/10
to memc...@googlegroups.com
I have also tried the CacheLoader API; it throws a Java GC error. The
thing I haven't tried is splitting the 6 million records into several
objects and trying CacheLoader again. But I don't think it should be that
fragile and complicated. I have spent a whole day on this issue, so for
now I just rely on the hybrid approach to finish the work. But I would be
very interested to hear any solution to this issue.

Shi

Dustin

Oct 16, 2010, 11:23:54 PM10/16/10
to memcached

On Oct 16, 6:45 pm, Shi Yu <shee...@gmail.com> wrote:
> I have also tried the CacheLoader API, it pops a java GC error. The
> thing I haven't tried is to separate 6 million records into several
> objects and try CacheLoader. But I don't think it should be that
> fragile and complicated. I have spent a whole day on this issue, now I
> just rely the hybrid approach to finish the work. But I would be very
> interested to hear any solution to solve this issue.

I cannot make any suggestions as to why you got an error without
knowing what you did and what error you got.

I would not expect the same code that you posted to work without a lot
of memory, tweaking, and a very fast network, since you're just filling
an output queue as fast as Java will allow you.

You didn't share any code using CacheLoader, so I can only guess as
to how you may have used it to get an error. There are three
different methods you can use -- did you try to create a map with six
million values and then pass it to the CacheLoader API? That would
very likely give you an out-of-memory error.

You could also be taxing the GC considerably by converting integers
to strings to compute the modulus, if your JVM doesn't do proper escape
analysis.

I can assure you there's no magic that will make it fail to load six
million records through the API as long as you account for the
realities of your network (which CacheLoader does for you) and your
available memory.

Shi Yu

Oct 17, 2010, 12:28:54 AM10/17/10
to memc...@googlegroups.com, spymem...@googlegroups.com
Okay. I have to empty the useful data from my memcached server to run
the experiment again. The method's code is as follows.

public static void mapload() throws Exception {

    MemcachedClient mc = new MemcachedClient(
            new InetSocketAddress("ocuic32.research", 11211));
    mc.flush();
    System.out.println("Memchaced flushed ...");

    CacheLoader cl = new CacheLoader(mc);
    System.out.println("Cache loader created ...");
    Map<String, String> map1 = new HashMap<String, String>();
    Map<String, String> map2 = new HashMap<String, String>();
    Map<String, String> map3 = new HashMap<String, String>();

    for (int i = 0; i < 1999999; i++) {
        map1.put("key" + i, "value" + i);
    }
    try {
        cl.loadData(map1);
        System.out.println("map1 loaded");
    } catch (Exception e1) {
        e1.printStackTrace();
    }

    map1 = null;

    for (int i = 2000000; i < 3999999; i++) {
        map2.put("key" + i, "value" + i);
    }
    try {
        cl.loadData(map2);
        System.out.println("map2 loaded");
    } catch (Exception e2) {
        e2.printStackTrace();
    }

    map2 = null;

    for (int i = 4000000; i < 5999999; i++) {
        map3.put("key" + i, "value" + i);
    }
    try {
        cl.loadData(map3);
        System.out.println("map3 loaded");
    } catch (Exception e3) {
        e3.printStackTrace();
    }
    map3 = null;

    System.out.println("All done");
}

And I run it with the following java command on a 64-bit Unix machine
which has 8G of memory. I separated the Map into three parts, and it
still failed. TBH I think there is some bug in the spymemcached input
method. With Whalin's API there is no problem at all with only a 2G heap
size; it is just a little bit slower, but that's definitely better than
being stuck for 6 hours on a buggy API.

java -Xms4G -Xmx4G -classpath ./lib/spymemcached-2.5.jar MemchacedLoad

Here is the error output:

2010-10-16 22:40:50.959 INFO net.spy.memcached.MemcachedConnection:
Added {QA sa=ocuic32.research/192.168.136.36:11211, #Rops=0, #Wops=0,
#iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect
queue
Memchaced flushed ...
Cache loader created ...
2010-10-16 22:40:50.989 INFO net.spy.memcached.MemcachedConnection:
Connection state changed for sun.nio.ch.SelectionKeyImpl@25fa1bb6
map1 loaded
map2 loaded
java.lang.OutOfMemoryError: Java heap space
at sun.nio.cs.UTF_8.newEncoder(UTF_8.java:51)
at java.lang.StringCoding$StringEncoder.<init>(StringCoding.java:215)
at java.lang.StringCoding$StringEncoder.<init>(StringCoding.java:207)
at java.lang.StringCoding.encode(StringCoding.java:266)
at java.lang.String.getBytes(String.java:947)
at net.spy.memcached.KeyUtil.getKeyBytes(KeyUtil.java:20)
at net.spy.memcached.protocol.ascii.OperationImpl.setArguments(OperationImpl.java:86)
at net.spy.memcached.protocol.ascii.BaseStoreOperationImpl.initialize(BaseStoreOperationImpl.java:48)
at net.spy.memcached.MemcachedConnection.addOperation(MemcachedConnection.java:601)
at net.spy.memcached.MemcachedConnection.addOperation(MemcachedConnection.java:582)
at net.spy.memcached.MemcachedClient.addOp(MemcachedClient.java:277)
at net.spy.memcached.MemcachedClient.asyncStore(MemcachedClient.java:314)
at net.spy.memcached.MemcachedClient.set(MemcachedClient.java:691)
at net.spy.memcached.util.CacheLoader.push(CacheLoader.java:92)
at net.spy.memcached.util.CacheLoader.loadData(CacheLoader.java:61)
at net.spy.memcached.util.CacheLoader.loadData(CacheLoader.java:75)
at MemchacedLoad.mapload(MemchacedLoad.java:90)
at MemchacedLoad.main(MemchacedLoad.java:159)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:165)
at org.apache.hadoop.mapred.JobShell.run(JobShell.java:54)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:79)
at org.apache.hadoop.mapred.JobShell.main(JobShell.java:68)

Shi

Shi Yu

Oct 17, 2010, 2:05:47 AM10/17/10
to spymem...@googlegroups.com, memc...@googlegroups.com
Okay, nice to know that. Is there any asynchronous writing method in
spy? I don't understand: I set the first two maps to null while
writing the third one, so do you mean the client still reads over all
the existing records on the server when inserting new data? Or is it a
heap-size error on the server side?

On Sun, Oct 17, 2010 at 12:36 AM, Jonathan Leech <jona...@gmail.com> wrote:
> Sounds like bug 125 to me. Your thread inserting the records can write them
> to the queue faster than they are written from the queue to memcached, the
> queue fills up with more and more records, and each one takes longer than
> the last to get written. If you are also bouncing up against the upper limit
> of the heap, then the VM will compound the issue spending a lot of time
> garbage collecting.
>
>
>
> On Oct 16, 2010, at 6:53 PM, Boris Partensky <boris.p...@gmail.com>

>> --
>> You received this message because you are subscribed to the Google Groups
>> "spymemcached" group.
>> To post to this group, send email to spymem...@googlegroups.com.
>> To unsubscribe from this group, send email to
>> spymemcached...@googlegroups.com.
>> For more options, visit this group at
>> http://groups.google.com/group/spymemcached?hl=en.
>>
>

Kelvin Edmison

Oct 17, 2010, 2:11:08 AM10/17/10
to memc...@googlegroups.com, spymem...@googlegroups.com
Shi,

Be careful when you start calling it a buggy API, especially given the
quality of the code you presented in your initial test case. Your
bugs-per-LOC was pretty high.

However, it seems that you did in fact stumble into a bug in the Spy client,
but only because you did no error checking at all.

Dustin,
While trying to re-create this problem and point out the various errors in
his code, I found that, in his test case, if I did not call Future.get() to
verify the result of each set, the spymemcached client leaked memory. Given
that the spymemcached wiki says that fire-and-forget is a valid mode of
usage, this appears to be a bug.

Here's my testcase against spymemcached-2.5.jar:
'java -cp .:./spymemcached-2.5.jar FutureResultLeak true' leaks memory and
will eventually die OOM.
'java -cp .:./spymemcached-2.5.jar FutureResultLeak false' does not leak and
runs to completion.

Here's the code. It's based on Shi's testcase so he and I now share the
blame for code quality :)

----------------------
import net.spy.memcached.*;
import java.lang.*;
import java.net.*;
import java.util.concurrent.*;

public class FutureResultLeak {

    public static void main(String[] args) throws Exception {

        boolean leakMemory = false;
        if (args.length >= 1) {
            leakMemory = Boolean.valueOf(args[0]);
        }
        System.out.println("Testcase will "
                + (leakMemory ? "leak memory" : "not leak memory"));
        MemcachedClient mc = new MemcachedClient(
                new InetSocketAddress("localhost", 11211));
        mc.flush();
        System.out.println("Memcached flushed ...");
        int count = 0;
        int logInterval = 100000;
        int itemExpiryTime = 600;
        long intervalStartTime = System.currentTimeMillis();

        for (int i = 0; i < 6000000; i++) {
            String a = "String" + i;
            String b = "Value" + i;

            Future<Boolean> f = mc.add(a, itemExpiryTime, b);
            if (!leakMemory) {
                f.get();
            }
            count++;
            if (count % logInterval == 0) {
                long elapsed = System.currentTimeMillis() - intervalStartTime;
                double itemsPerSec = logInterval * 1.0 / elapsed;
                System.out.println(count + " elements added in " + elapsed
                        + " (" + itemsPerSec + " per sec).");
                intervalStartTime = System.currentTimeMillis();
            }
        }

        System.out.println("done " + count + " records inserted");

        mc.shutdown(60, TimeUnit.SECONDS);
    }
}
----------------------


Regards,
Kelvin

Shi Yu

Oct 17, 2010, 2:24:43 AM10/17/10
to memc...@googlegroups.com, spymem...@googlegroups.com
Kelvin,

This is the year 2010 and computer programs should not be that fragile.
And I believe my code is just a quick, simple toy problem, trying to find
out why I failed so many times on my real problem. Before I posted my
problem, I checked and searched many documents; I read through the API,
and there is no clear instruction telling me what I should do to
prevent such an error. I don't have time to probe an API for bugs on
purpose; I am doing NLP POS tagging and I have exactly 6 million stemmed
words to store. Fortunately or unfortunately for me, that number exactly
triggers the failure, so I had to spend 6 hours finding out the reason.
Actually the spy client is the first API I tried; as I pointed out in my
first post, it is fast, but there is an error. For a normal
end-product API, I don't think memory-leak issues should have to be
considered by the user.

Shi

dormando

Oct 17, 2010, 2:35:53 AM10/17/10
to memc...@googlegroups.com, spymem...@googlegroups.com
I think you're supposed to read to the point where it says "queues stuff
in memory before sending to the server" and extrapolate that writing to
the queue too fast is a bad thing.

Dustin

Oct 17, 2010, 2:45:10 AM10/17/10
to memcached

On Oct 16, 9:28 pm, Shi Yu <shee...@gmail.com> wrote:
>         Map<String,String> map1 = new HashMap<String,String>();
>         Map<String,String> map2 = new HashMap<String,String>();
>         Map<String,String> map3 = new HashMap<String,String>();

You're loading at least four million strings into two million hash
table entries on a 64-bit system. Each hash table entry contains a
pointer to a key, a pointer to a value, a pointer to another entry,
and an integer copy of the hash code. That's a huge
amount of memory just to load up a generated set of data.

If you instead either implemented an iterator as a generator to
dynamically do what that large map is doing or just used the
CacheLoader.push method in a way similar to how you were doing add
before, I suspect you'd have no problems and significantly less memory
consumption.
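The generator idea can be sketched as a lazy Iterator that fabricates each entry on demand, so no multi-million-entry HashMap ever exists. The CacheLoader wiring in the comment is hypothetical (it needs a live server and assumes CacheLoader's iterator overload); the iterator itself is self-contained:

```java
import java.util.AbstractMap;
import java.util.Iterator;
import java.util.Map;

// Lazily generates "keyN" -> "valueN" pairs instead of materializing a
// giant HashMap up front; memory use stays O(1) regardless of the count.
public class GeneratedEntries implements Iterator<Map.Entry<String, String>> {

    private final int limit;
    private int next = 0;

    public GeneratedEntries(int limit) {
        this.limit = limit;
    }

    public boolean hasNext() {
        return next < limit;
    }

    public Map.Entry<String, String> next() {
        int i = next++;
        return new AbstractMap.SimpleImmutableEntry<String, String>(
                "key" + i, "value" + i);
    }

    public static void main(String[] args) {
        // With a live server, the hypothetical wiring would be roughly:
        //   CacheLoader cl = new CacheLoader(mc);
        //   cl.loadData(new GeneratedEntries(6000000));
        Iterator<Map.Entry<String, String>> it = new GeneratedEntries(3);
        while (it.hasNext()) {
            Map.Entry<String, String> e = it.next();
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

The point of the design is that each entry becomes garbage as soon as the loader has pushed it, instead of three two-million-entry maps pinning the heap at once.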

Shi Yu

Oct 17, 2010, 2:51:28 AM10/17/10
to memc...@googlegroups.com
Hi Dustin. I have to get back to my work now, so I probably won't spend
more time on this issue. Please, before you make suggestions, try an
experiment loading more than 6 million records using the same API. I
would be happy to hear how you do it. I now fully rely on Whalin's API;
it can handle 14 million records without any problem.

Shi

Dustin

Oct 17, 2010, 3:05:33 AM10/17/10
to memcached

On Oct 16, 11:11 pm, Kelvin Edmison <kel...@kindsight.net> wrote:

>   while trying to re-create this problem and point out the various errors in
> his code, I found that, in his test case, if I did not call Future.get() to
> verify the result of the set, the spyMemcached client leaked memory.  Given
> that the Spymemcached wiki says that fire-and-forget is a valid mode of
> usage, this appears to be a bug.

I'm not entirely sure that's a memory leak. I would expect an OOM
in the case where you're not calling f.get(), but not in the other,
simply because the latter keeps the queue as small as possible every
step along the way.

Fire and forget *is* valid for most of the kinds of things people do
in an application, but I wouldn't say that using the class as a write
only bulk data loader while completely ignoring system limitations is
valid. It has come up mostly with people doing tests in a tight loop,
or in this case, a bulk data load.

The thing that CacheLoader does is actually *very* close to what
you've got there, except it backs off on full queues assuming it gets
an exception.

I think the problem is that we have a change that slows down the
input queue to keep it full for the case where people aren't
necessarily bulk loading, but are keeping it pretty full in general. I
think even that would work if you had a smaller operation queue, or if
you just set the minimum blocking timeout to 0 and used the CacheLoader.
Perhaps the bug is that CacheLoader has no way to override the
queue-offer timeout and just take the exception.

In either case, if you just wait on the result of a set every once
in a while (e.g. every 250,000 sets, do an f.get()), then it should
get bursty, but just work.
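That periodic-check pattern is easy to sketch. Since running it for real needs a memcached server, the client's asynchronous set is stood in for here by a single-threaded executor (the executor and counter are illustrative scaffolding; with the real client, the submit call would be mc.set(key, exp, value)):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

public class PeriodicGetDemo {

    // Issues 'total' asynchronous "stores", blocking on one result every
    // 'checkEvery' operations, so the producer can never run more than
    // about 'checkEvery' operations ahead of the writer.
    public static int storeAll(int total, int checkEvery) throws Exception {
        ExecutorService fakeClient = Executors.newSingleThreadExecutor();
        final AtomicInteger stored = new AtomicInteger();
        Future<Boolean> last = null;
        for (int i = 0; i < total; i++) {
            // With spymemcached: Future<Boolean> f = mc.set(key, exp, value);
            last = fakeClient.submit(new Callable<Boolean>() {
                public Boolean call() {
                    stored.incrementAndGet();
                    return Boolean.TRUE;
                }
            });
            if (i % checkEvery == checkEvery - 1) {
                last.get(); // periodic backpressure: let the queue drain
            }
        }
        if (last != null) {
            last.get(); // FIFO executor, so this implies all prior ops finished
        }
        fakeClient.shutdown();
        return stored.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(storeAll(10000, 250) + " records stored");
    }
}
```

The occasional get() caps the producer's lead over the writer, which is exactly why mixing even a tiny fraction of result checks into a bulk load keeps the client's queue bounded.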

Dustin

Oct 17, 2010, 3:16:16 AM10/17/10
to memcached

On Oct 16, 11:24 pm, Shi Yu <shee...@gmail.com> wrote:
> Kelvin.
>
> This is year 2010 and computer programs should not be that fragile.
> And I believe my code is just a fast simple toy problem trying to find
> out why I failed too many times in my real problem. Before I post my
> problem, I checked and searched many documents, I read through the API
> and there is no clear instruction telling me what should I do to
> prevent such an error. I don't have time to bug an API on purpose, I
> am doing NLP pos tagging and I have exactly 6 million stemmed word to
> store. Fortunately or unlucky to me, that number exactly triggers the
> failure so I had to spend 6 hours finding out the reason. Actually spy
> client is the first API I tried, as I pointed out in my first post, it
> is fast, however, there is an error. I don't think for a normal
> end-product API, the memory leak issue should be considered by the
> user.

I agree, but I don't see anything that I would consider a memory
leak. I see a few things generating massive amounts of data and
storing it in memory faster than it can get processed.

Unfortunately, attempting to make this easier in the common case has
made it more confusing. A while back, instead of slowing down queue
insertion, it would just fail and tell you that you were overflowing the
queue. That made it easy to understand when and how to back off. Now
it just lets you use up memory on the client side excessively by
keeping the op queue completely full, which in turn keeps the read
and write operations completely full, which means you need a tremendous
amount of memory to do anything at all.

I'm sure you can see that using memcached as a write-only data store
is a bit of an edge case. It's come up enough for CacheLoader to have
been written in the first place, but a typical application is reading
more than it's writing. This situation you're running into doesn't
happen if your filler thread has as much as 0.01% reads mixed in with
its writes (or even just checks the return occasionally).

If you do one of the following, you will be fine:

1) Set the op queue max block time to 0 and either
   a) build an iterator over your data and use CacheLoader.load with
      that iterator, or
   b) use CacheLoader.push.
2) Every one or two hundred thousand operations, check a return
   value.

Shi Yu

Oct 17, 2010, 3:20:50 AM10/17/10
to memc...@googlegroups.com
Thanks. I will try it when there is time and a free server available.
Currently my servers are all loaded with work data. I am not going to
flush and reload everything again and again :D

Dustin

Oct 17, 2010, 3:38:03 AM10/17/10
to memcached

On Oct 16, 11:51 pm, Shi Yu <shee...@gmail.com> wrote:
> Hi Dustin. I have to go on my work now so I probably wont spend any
> time on this issue. Please, before you suggest, try some experiment to
> load more than 6 million records using the same API. I would be happy
> to hear how you do that.  I now fully rely on Whalin's API, it can
> handle 14 million records without any problem.

I do actually try the code now and then -- I have this in the source
tree which I think is very close to what you were trying to do:

http://github.com/dustin/java-memcached-client/blob/master/src/test/manual/net/spy/memcached/test/LoaderTest.java

I modified that as I suggested to set the op timeout to 0 and the
number of items to 14,000,000 (which you mentioned here). Running
both the server and the client on my fairly old MBP, it reported
144534ms (which is a bit over 97k sets/second).

Shi Yu

Oct 17, 2010, 3:57:44 AM10/17/10
to memc...@googlegroups.com
Thanks Dustin, marked and will try again.