Voldemort Performance on Windows.


swatkatz

Jul 13, 2009, 4:23:24 PM
to project-voldemort
Hi,

I did a quick performance test running Voldemort on Windows Server
2003 and I am getting numbers nowhere near what everyone else seems to
be getting. Please see my setup information below -

The server is running on Windows Server 2003 64-bit with 8 GB RAM.
The database is on a 15K RPM SCSI drive in RAID 1.

Voldemort Server is running using -

D:\voldemort-0.51>java -server -Xmx2G -Dcom.sun.management.jmxremote voldemort.server.VoldemortServer d:\voldemort-0.51\config\testChannelDownload

I am running only one node with 4 partitions.

The config is as follows -

# The ID of *this* particular cluster node
node.id=0
max.threads=100
############### DB options ######################
http.enable=true
socket.enable=true
# BDB
bdb.sync.transactions=false
bdb.cache.size=1000MB
bdb.max.logfile.size=500MB

stores.xml

<stores>
  <store>
    <name>test</name>
    <persistence>bdb</persistence>
    <routing>client</routing>
    <replication-factor>1</replication-factor>
    <required-reads>1</required-reads>
    <required-writes>1</required-writes>
    <key-serializer>
      <type>string</type>
      <schema-info>UTF-8</schema-info>
    </key-serializer>
    <value-serializer>
      <type>string</type>
      <schema-info>UTF-8</schema-info>
    </value-serializer>
  </store>
</stores>

I populated the data using 1 thread on a second Windows Server 2003 64-bit machine and put about 100,000 keys with 160K values each. The DB size is about 29.8 GB. The initial DB population was quite fast, under 40ms per put.

Then I started load testing from the second Windows Server 2003 64-bit machine, connected to the server running Voldemort through a gigabit switch. For the load test I used 50 threads, each doing 1000 gets on a random key between 1 and 100,000.

The numbers are just awful :( I am seeing an average read time close to 500ms and the standard deviation is extremely high. I repeated this test 3 times and the average was about the same each time.

Is it something related to how I have things set up? Any tips on what I can do to get better performance? Or am I expecting too much, and this is what I should see given the size of my values (160K)?

Thanks,
Mohan

Rob Adams

Jul 13, 2009, 5:38:32 PM
to project-...@googlegroups.com
What is this "Windows" you speak of?

Erich Nachbar

Jul 13, 2009, 6:49:25 PM
to project-...@googlegroups.com
I also saw bigger swings initially until I switched to a parallel GC for the *server* and *client*.

Check the JVM settings at the bottom of this page: 

Geir Magnusson Jr.

Jul 13, 2009, 7:04:05 PM
to project-...@googlegroups.com
Truly awful :)

1) What version of java?

2) 160k is much bigger than I've tested, so I don't have a feel. But
what are the real numbers?

3) Why only 1000 iterations? That's way too short to get any meaningful info (IMO). You should let the clients just run and measure the rate later: you need to let the JIT on both client and server do its thing, which takes iterations, and you also need to let BDB fill its cache, which may be too small.

4) BDB cache is your friend - maybe you should consider doubling the cache size and doubling the heap size, and seeing what difference that makes.

Jay Kreps

Jul 13, 2009, 10:29:48 PM
to project-...@googlegroups.com
Are you CPU, network, or disk bound?

-Jay

On Mon, Jul 13, 2009 at 1:23 PM, swatkatz<mohan...@gmail.com> wrote:

Rob Adams

Jul 13, 2009, 11:29:15 PM
to project-...@googlegroups.com
I think point 3 is the big one here. In my tests the Voldemort performance really goes way up a few seconds after startup, once it all settles.

swatkatz

Jul 14, 2009, 9:16:01 AM
to project-voldemort
To answer some of your questions -

1. Version of Java -

java version "1.6.0_14"
Java(TM) SE Runtime Environment (build 1.6.0_14-b08)
Java HotSpot(TM) 64-Bit Server VM (build 14.0-b16, mixed mode)

2. 160k is much bigger than I've tested, so I don't have a feel. But what are the real numbers?

Not sure what you mean by real numbers. Can I post CSV files to this discussion forum? If so, I can upload the CSV files from each iteration.

3. Why only 1000 iterations?

Can easily increase it. What do you suggest? 10,000? 100,000?
I repeated each test 3 times, one after another. So it was actually 3 iterations, each of 1000 parallel reads by 50 threads.

4. BDB and Java settings -

I will make the appropriate Java GC and BDB cache settings changes and repeat this test.

5. Are you CPU, network, or disk bound?

CPU usage is < 5%. I was initially network bound, but then I switched to a gigabit switch between the client and the server, after which I wasn't network bound. Disk bound - I am not sure; I will look at the counters when I repeat the test, although SCSI 15K RPM in RAID 1 normally gives me amazing disk reads.

I have a feeling there might be a bottleneck somewhere related to multiple threads. Like I said, with one thread doing writes the performance was pretty good. The minute I go to 50 threads it just falls apart.

Also, is it worth doing this with the latest snapshot of Voldemort? I am using the latest release, which is 0.51.

Geir Magnusson Jr.

Jul 14, 2009, 9:18:20 AM
to project-...@googlegroups.com
Oh - yes - first thing, just clone Voldemort and build it. Use that.

Tatu Saloranta

Jul 14, 2009, 2:13:01 PM
to project-...@googlegroups.com
On Tue, Jul 14, 2009 at 6:16 AM, swatkatz<mohan...@gmail.com> wrote:
>
> To answer some of your questions -
...

> 3. Why only 1000 iterations?
>
> Can easily increase it. What do you suggest ? 10,000 ? 100,000 ?
>  I repeated each test 3 times one after another. So it was actually 3
> iterations each of 1000 parallel reads by 50 threads.

The important part is really the time you let things run, not so much the number of iterations.
It takes multiple seconds for HotSpot to kick in: typically the first 5 or so seconds of a run (or more for complex systems) should be discarded completely.
Or just dynamically find the steady-state throughput, the point at which the rate does not vary significantly.

A simple way would be, for example, to let it run for 5 seconds to warm up, then measure for the next 30 seconds (or whatever).
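A rough sketch of that idea (illustrative only; the 5s/30s windows are arbitrary and doOneGet() is a placeholder for whatever operation is being benchmarked, e.g. a single client.get()):

public class WarmupBenchmark {

    // Placeholder for one benchmarked operation, e.g. a single client.get(...)
    static void doOneGet() { /* ... */ }

    public static void main(String[] args) {
        // Warm-up phase: run for ~5 seconds and throw the numbers away
        long warmupEnd = System.currentTimeMillis() + 5 * 1000;
        while (System.currentTimeMillis() < warmupEnd) {
            doOneGet();
        }

        // Measurement phase: run for ~30 seconds and count completed requests
        long start = System.currentTimeMillis();
        long measureEnd = start + 30 * 1000;
        long requests = 0;
        while (System.currentTimeMillis() < measureEnd) {
            doOneGet();
            requests++;
        }
        double seconds = (System.currentTimeMillis() - start) / 1000.0;
        System.out.println("Throughput: " + (requests / seconds) + " req/s");
    }
}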

...


> I have a feeling there might be a bottleneck somewhere related to
> multiple threads. Like I said one thread doing writes, the performance
> was pretty good. The minute I go to 50 threads it just falls apart.

Java threading does not scale particularly well for a large number of
active threads (blocked threads are less problematic until there are
thousands of them). So this could well be causing lower throughput;
often you do not get improved throughput beyond a couple of threads per
core (or maybe even just one thread per core).

-+ Tatu +-

ijuma

Jul 14, 2009, 3:31:26 PM
to project-voldemort
On Jul 14, 7:13 pm, Tatu Saloranta <tsalora...@gmail.com> wrote:
> Java threading does not scale particularly well for a large number of
> active threads (blocked threads are less problematic until there are
> thousands of them). So this could well be causing lower throughput;
> often you do not get improved throughput beyond a couple of threads per
> core (or maybe even just one thread per core).

I think this is expected in any case (not specific to Java). If all
threads are doing actual work, then there is little point in having
more than 1-2 per logical core. The common case is that a lot of them
are blocked waiting for something though.

Ismael

Jay Kreps

Jul 14, 2009, 10:43:00 PM
to project-...@googlegroups.com
What are the client settings you are using? If you aren't maxing out resources then maybe you need to up the client connections or threads. All the HotSpot stuff is good advice, but if you aren't seeing any CPU usage then that is not the problem.

-Jay

On Tue, Jul 14, 2009 at 6:16 AM, swatkatz<mohan...@gmail.com> wrote:

swatkatz

Jul 15, 2009, 8:54:41 AM
to project-voldemort
The client settings I am using are -

int numThreads = 100;
int maxQueuedRequests = 100;
int maxConnectionsPerNode = 100;
int maxTotalConnections = 100;
int maxTimeout = 120*1000;
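For reference, here is a minimal sketch of how a read-test client with settings like these might be wired up (assuming the SocketStoreClientFactory/StoreClient client API and the "test" store from the stores.xml above; the bootstrap URL and the per-request bookkeeping are placeholders):

import java.util.Random;
import java.util.concurrent.TimeUnit;

import voldemort.client.ClientConfig;
import voldemort.client.SocketStoreClientFactory;
import voldemort.client.StoreClient;
import voldemort.client.StoreClientFactory;
import voldemort.versioning.Versioned;

public class ReadTest {
    public static void main(String[] args) {
        ClientConfig config = new ClientConfig();
        config.setBootstrapUrls("tcp://myurl:6666");      // placeholder server URL
        config.setMaxConnectionsPerNode(100);
        config.setMaxTotalConnections(100);
        config.setMaxThreads(100);
        config.setSocketTimeout(120 * 1000, TimeUnit.MILLISECONDS);

        StoreClientFactory factory = new SocketStoreClientFactory(config);
        StoreClient<String, String> client = factory.getStoreClient("test");

        Random random = new Random();
        for (int i = 0; i < 1000; i++) {
            String key = String.valueOf(random.nextInt(100000) + 1); // random key in [1, 100000]
            long start = System.nanoTime();
            Versioned<String> value = client.get(key);
            long elapsedMs = (System.nanoTime() - start) / 1000000;
            // record elapsedMs per request for average / percentile stats
        }
    }
}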

Hopefully I will be able to repeat all the tests today with a larger
number of iterations and the appropriate JVM and BDB settings changes.

BTW the Voldemort server is running on a Dell server with 2 dual-core Intel Xeon processors.

swatkatz

Jul 15, 2009, 3:49:07 PM
to project-voldemort
Finished my first iteration after a warmup. Results are not good.

Used the following settings -

1. Latest snapshot build.

2. JVM settings -

-Xms6G -Xmx6G -XX:NewSize=2048m -XX:MaxNewSize=2048m -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70

3. Server Properties -

# The ID of *this* particular cluster node
node.id=0
max.threads=100
############### DB options ######################
http.enable=false
socket.enable=true
# BDB
bdb.sync.transactions=false
bdb.cache.size=3072MB
bdb.max.logfile.size=500MB

4. Client settings -

int numThreads = 100;
int maxQueuedRequests = 100;
int maxConnectionsPerNode = 100;
int maxTotalConnections = 100;
int maxTimeout = 120 * 1000;
String bootstrapUrl = "tcp://myurl:6666";

ClientConfig config = new ClientConfig();
config.setBootstrapUrls(bootstrapUrl);
config.setConnectionTimeout(maxTimeout, TimeUnit.MILLISECONDS);
config.setMaxConnectionsPerNode(maxConnectionsPerNode);
config.setMaxTotalConnections(maxTotalConnections);
config.setMaxThreads(numThreads);
config.setMaxQueuedRequests(maxQueuedRequests);
config.setRoutingTimeout(maxTimeout, TimeUnit.MILLISECONDS);
config.setSocketTimeout(maxTimeout, TimeUnit.MILLISECONDS);

5. Database contains 100,000 keys with values that are 160K each.

Ran about 10,000 iterations on 50 threads and discarded the results.

Then ran 10,000 iterations on 50 threads again.

Average read time = 292.55ms
95th Percentile = 969
Standard Deviation - 1112.2 !!!

So the numbers are all over the place.

CPU usage stays below 20%
Network utilization < 1 %
RAM - still had about 68MB free throughout the test
Disk - Average queue size ranged between 0 and 10.

Running the client with -Xmx2G -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70 as well.

If I just run one thread I get good numbers.

Not sure what I am doing wrong. I so want this to work so that I can
start using it in production.

swatkatz

Jul 15, 2009, 4:03:02 PM
to project-voldemort
Second iteration was even worse

Average read time = 316.6ms
95th Percentile = 1016
Standard Deviation - 1277.949 !!!

Running with only 10 threads now - will post results soon.

bhupesh bansal

Jul 15, 2009, 4:04:24 PM
to project-...@googlegroups.com
BDB is not great for large values; please check the other threads where this has been discussed.
You might be hitting the disk too hard in this test.

Here is my back-of-the-envelope calculation:

100K keys with 160K value each = 16 G
BDB cache = 6G

disk hit = 37.5 %
disk speed = 15000 RPM
disk seeks per value read = 160K/4K (page-size) = 40
MAX disk value read per sec = 15000/40 = 375 !!

I think 50 threads will thrash the disk very badly, and I think the disk will keep spinning and give the kind of behavior you are seeing right now.
What QPS are you seeing with 1 thread / 5 threads / 10 threads / 50 threads?

Best
Bhupesh

swatkatz

Jul 15, 2009, 4:20:31 PM
to project-voldemort
With 10 threads -

Average read time = 120.145ms
95th Percentile = 141
Standard Deviation - 1321.146 !!!

With 1 Thread -

Average read time = 18.6ms
95th Percentile = 31
Standard Deviation - 206.59 !

Rob Adams

Jul 15, 2009, 4:51:15 PM
to project-...@googlegroups.com
95th percentile is 141, average is 120.  How do you get 1321 as the standard deviation?  If this is a normal distribution, 95th percentile is 2 sigma, so sigma should be 10.  If it's not normal, of course standard deviation has no meaning.  What distribution are you really seeing here?

Jay Kreps

Jul 15, 2009, 9:06:47 PM
to project-...@googlegroups.com
Do you see similar results with smaller values? Can you enable gc
logging and make sure that is not the issue? I do not have a great
deal of experience with performance on Windows but I would not expect
much difference.

-Jay

swatkatz

Jul 16, 2009, 8:06:53 AM
to project-voldemort
Will try it out and let you know. I'm thinking of doing this same test with SQL Server on the same hardware: write values of 160K each to an image field in a table and then read them out. Not sure what that will prove, but at least it will give me a set of numbers to compare the results with.

I will also try Voldemort with smaller values, as well as large values with fewer keys so that my DB is < 6GB.

Question - does Voldemort support gzip serialization out of the box? My content of 160K zips nicely to 7K since it's all plain text.

On Jul 15, 9:06 pm, Jay Kreps <jay.kr...@gmail.com> wrote:
> Do you see similar results with smaller values? Can you enable gc
> logging and make sure that is not the issue? I do not have a great
> deal of experience with performance on Windows but I would not expect
> much difference.
>
> -Jay
>
>
>
> On Wed, Jul 15, 2009 at 1:51 PM, Rob Adams<read...@readams.net> wrote:
> > 95th percentile is 141, average is 120.  How do you get 1321 as the standard
> > deviation?  If this is a normal distribution, 95th percentile is 2 sigma, so
> > sigma should be 10.  If it's not normal, of course standard deviation has no
> > meaning.  What distribution are you really seeing here?
>

ijuma

Jul 16, 2009, 9:43:38 AM
to project-voldemort
On Jul 16, 1:06 pm, swatkatz <mohanrao...@gmail.com> wrote:
> Question - does Voldemort support gzip serialization out of the box ?
> My content of 160K zips nicely to 7K since it's all plain text.

There is a GzipStore that uses GZIPInputStream and GZIPOutputStream under the covers. Not sure if there's a property to have it wired up automatically or if you need to do it yourself.
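If wiring up GzipStore turns out to be awkward, a simpler experiment is to gzip the value on the client before the put and gunzip it after the get. A minimal sketch using java.util.zip (just a suggestion, not a built-in Voldemort feature; the store's value-serializer would then need to carry bytes, e.g. the identity serializer, instead of a UTF-8 string):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipCodec {

    // Compress a value before client.put(key, compress(bytes))
    static byte[] compress(byte[] plain) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bos);
        gz.write(plain);
        gz.close();                 // close() finishes the gzip stream
        return bos.toByteArray();
    }

    // Decompress a value after client.get(key)
    static byte[] decompress(byte[] zipped) throws IOException {
        GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(zipped));
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        int n;
        while ((n = gz.read(buf)) != -1) {
            bos.write(buf, 0, n);
        }
        gz.close();
        return bos.toByteArray();
    }
}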

Ismael

swatkatz

Jul 17, 2009, 11:29:27 AM
to project-voldemort
In case anyone is interested: with SQL Server *Standard Edition* on the same machines that I tested Voldemort on. Everything is exactly the same in my test setup - same disks. Max memory allowed for SQL Server = 6GB.

50 threads, 10,000 iterations each, randomly reading from 100,000 keys.

Average – 65ms
95th – 93
Standard Deviation - 90.288

I'll keep digging around JVM and Voldemort settings to see why I get
such poor performance from it.

Jay Kreps

Jul 17, 2009, 2:51:54 PM
to project-...@googlegroups.com
Hi,

I have an idea of what the problem is and why you are seeing such high
variance in request time. We recently upgraded the connection pool
library in voldemort, which is commons-pool. Can you check which
version of this you are using? We detected some problems with
deadlocks in the commons-pool 1.5. Since you are seeing very low CPU
usage, I am wondering if maybe you are running into this issue. What
version of commons pool do you have? Do you see the same problem with
commons pool 1.2?

-Jay

swatkatz

Jul 17, 2009, 3:07:28 PM
to project-voldemort
It is using commons pool 1.5.1. I will downgrade to 1.2 and see how
that goes.

Thanks.

swatkatz

Jul 20, 2009, 2:36:15 PM
to project-voldemort
So no luck after downgrading to commons-pool 1.2. The numbers are very similar to the first test: average around 300ms with a high standard deviation and 95th percentile.

What I did was delete the commons-pool 1.5.1 jar from both the client and server directories and replace it with the 1.2 jar, then restart both the server and the client. Is that OK, or should I have rebuilt the client and server?

Rob Adams

Jul 20, 2009, 2:41:32 PM
to project-...@googlegroups.com
Do you have a Linux box somewhere you could possibly run comparisons on?

Holger Hoffstätte

Jul 20, 2009, 3:16:15 PM
to project-...@googlegroups.com
swatkatz wrote:
> So no luck after downgrading to commons pool 1.2. Numbers are very

Actually you probably should have upgraded to 1.5.2 to fix some bugs :), but that's apparently not the problem. A different shot in the dark - the TCP stack. Start regedit and, for all instances of the following keys under HKEY_LOCAL_MACHINE, find/post all their values:

DefaultReceiveWindow
DefaultSendWindow
GlobalMaxTcpWindowSize
TcpWindowSize
SackOpts
Tcp1323Opts

That might give a clue.

swatkatz

Jul 20, 2009, 3:19:52 PM
to project-voldemort
Yes, that's my last resort :) In fact I was just trying to find some spare hardware that I could put Linux on.

Windows would have been perfect though, because we already run SQL Server on Windows and this would have replaced those SQL Servers.

Rob Adams

Jul 20, 2009, 3:47:36 PM
to project-...@googlegroups.com
Maybe look into tuning filesystem parameters? Also try defragging? Can you tune the OS filesystem caches?

Linux does have better filesystem performance than Windows generally, sometimes by an order of magnitude.

Mohan Rao

Jul 21, 2009, 9:07:33 AM
to project-...@googlegroups.com
Hi Holger,
 
Actually the TCP settings cannot be an issue because I did this test with the same hardware but using SQL Server and it works great and gives me great results. So it is either Java, BDB or multi-threading issues in the Voldemort Client/Server.

2009/7/20 Holger Hoffstätte <holger.ho...@googlemail.com>

Holger Hoffstätte

Jul 21, 2009, 12:47:56 PM
to project-...@googlegroups.com
Mohan Rao wrote:
> Actually the TCP settings cannot be an issue because I did this test
> with the same hardware but using SQL Server and it works great and gives
> me great results. So it is either Java, BDB or multi-threading issues in
> the Voldemort Client/Server.

Microsoft apps (and others) frequently set custom options or even bypass
the regular Win32 API in favor of internal APIs. I don't know whether the
existing Java client does any of this (which should be easy to add, like
e.g. disabling Nagle or setting the receive/send buffer size), but finding
this out and/or seeing the values in any case cannot hurt. There is a huge
number of knobs that could affect the behaviour that you're seeing, and
normally Java is just as fast on Windows as on Linux (minus thread CPU
affinity but let's take it slow here). I do agree that bdb/IO looks like a
more likely culprit, but the point was to rule out seemingly unrelated
factors.

I also assume there are no virus scanners enabled. :-)

-h

swatkatz

Aug 11, 2009, 12:30:26 PM
to project-voldemort
Hi,

So I am not having much luck with this :) I tried repeating the test on Linux - I started a couple of large EC2 instances, one to run the server and the other to run the client. I even downgraded to commons-pool 1.2. Both instances are running 64-bit Java 1.6. BDB cache size - 4GB. Using all the recommended JVM settings. Still getting about 900ms average per read, reading from 100K keys with values of 160K each.

What performance are others seeing, and for what value size? I'm going to try with a reduced value size of about 2K and see how that works. Also, any ideas on how I can enable multi-version concurrency control in the BDB configuration?

On Jul 21, 12:47 pm, Holger Hoffstätte
<holger.hoffstae...@googlemail.com> wrote:
> Mohan Rao wrote:
> > Actually the TCP settings cannot be an issue because I did this test
> > with the same hardware but using SQL Server and it works great and gives
> > me great results. So it is either Java, BDB or multi-threading issues in
> > the Voldemort Client/Server.
>
> Microsoft apps (and others) frequently set custom options or even bypass
> the regular Win32 API in favor of internal APIs. I don't know whether the
> existing Java client does any of this (which should be easy to add, like
> e.g. disabling Nagle or setting the receive/send buffer size), but finding
> this out and/or seeing the values in any case cannot hurt. There is a huge
> number of knobs that could affect the behaviour that you're seeing, and
> normally Java is just as fast on Windows as on Linux (minus thread CPU

swatkatz

Aug 12, 2009, 8:55:54 AM
to project-voldemort
So further update on this -

If my data size is 2K then I get awesome performance.

With 100K keys - 50-thread reads: average about 5.16ms, 95th percentile about 24ms. 50-thread writes: average about 12ms, 95th percentile about 40ms.
With 500K keys - 50-thread reads: average about 6.34ms, 95th percentile about 23ms. 50-thread writes: average about 13.26ms, 95th percentile about 41ms.

It starts getting worse at about 5M keys. I am running more iterations to get final numbers on this, but so far I saw the following numbers for reads with 50 threads: average 229ms and 95th percentile about 927ms.

So it looks like once the overall database size grows beyond what can fit in the cache, the performance starts degrading. All this is great, but my problem is that my data size is approx 160K - I can compress it down to 32K, which is still fairly large from what I can see. Is there any other BDB optimization that you would recommend or that is possible? E.g. can I set the BIN size for BDB to something larger, so that my 32K value fits in one BIN and it doesn't have to read from multiple BINs to return my data? Ideally I would like to store millions of keys of 32K each on one server; if I get all the tuning right, then I can size the RAM and CPU for that one server.

Regarding the published performance numbers on the project-voldemort website - can someone please tell me what value size was used and how many keys were in the database?

p.s. - I have done similar tests with Cassandra and found that
Voldemort is twice as fast as Cassandra under the same test. Haven't
done the Cassandra test with 2K keys - probably something I will do
when I get some time.

ijuma

Aug 13, 2009, 6:21:38 AM
to project-voldemort
On Aug 12, 1:55 pm, swatkatz <mohanrao...@gmail.com> wrote:
> So looks like as the overall database size increases beyond to what
> can fit in the cache the performance starts degrading. All this is
> great but my problem is that my data size is approx 160K - I can
> compress it and it comes to 32K compressed which is still fairly large
> from what I can see. Is there any other BDB optimization that you
> would recommend or is possible ?

One option is to use MySQL instead of BDB as the store. MySQL handles
big values better.

Ismael

Jay Kreps

Aug 14, 2009, 3:12:39 AM
to project-...@googlegroups.com
That test was done with very small values (like 300 bytes, the db record size). With small values the seek time and network time dominate and req/sec is the proper metric; for big values the overall disk and network throughput is the limitation, so you need to think in terms of MB/second, and latencies will be much, much worse. You can definitely get better big-value performance out of BDB with some tuning; the Oracle website and forums have some info on this, though I am not sure whether you are IO bound in your large-value test.
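To put rough numbers on that (illustrative arithmetic only, using figures already quoted in this thread): at 160KB per value, Bhupesh's estimate of ~375 disk reads/sec corresponds to roughly 375 * 160KB ≈ 60 MB/s of disk throughput, and a gigabit link tops out around 125 MB/s, i.e. under ~800 values/sec even at full line rate. At 2KB per value the same pipes can move tens of thousands of values per second, which is part of why the small-value numbers look so different.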

-Jay

ijuma

Aug 17, 2009, 2:12:59 PM
to project-voldemort
On Aug 12, 1:55 pm, swatkatz <mohanrao...@gmail.com> wrote:
> All this is great but my problem is that my data size is approx 160K - I can
> compress it and it comes to 32K compressed which is still fairly large
> from what I can see.

By the way, with the latest code in master you can perform experiments
that involve gzip compression by adding the following to the
appropriate serializer:

<compression>
  <type>gzip</type>
</compression>
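For example, the value-serializer from the stores.xml earlier in this thread would become something like this (a sketch; double-check the exact element placement against the current docs):

<value-serializer>
  <type>string</type>
  <schema-info>UTF-8</schema-info>
  <compression>
    <type>gzip</type>
  </compression>
</value-serializer>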

Best,
Ismael