performance decreased dramatically for big value size

zjwang

unread,

Jan 8, 2012, 9:34:32 PM1/8/12

to project-voldemort

I test the write performance by voldemort performance tool, NRW set
322, bdb set WRITE_NO_SYNC, no compression and string serialization.
For different value size the write performance decreased
dramatically. The results like this:

1k: 14000ops/sec
10k:1960ops/sec
100k:202ops/sec
1m: 20ops/sec

There is no limit for network bandwidth. All the data will be wrote
into system cache in bdb layer. Why the write performance is linear
decreasing with value size? Who can tell me the reason for this? Is it
the limit of bdb-je performance? Thanks in advance!

Robert Butler

unread,

Jan 9, 2012, 10:04:59 AM1/9/12

to project-...@googlegroups.com

At first glance,

1) Linear scaling is actually pretty good. If all you are seeing is a linear decrease then Voldemort is scaling well. According to the numbers below, it's actually slightly better than linear. I'm not sure why you would expect much better than that.

2) At the 1m value size, it looks like you are writing about 20mb/sec write speed which is pretty good. I don't know what hardware you are running on, but that's going to be the peak of what mosts disks can handle, even without write sync. In certain, expensive hardware configurations you will be able to get more. These seem like reasonable numbers to me. That said, I don't have a lot of experience with Voldemort yet so I may be off.

- Robert

Robert Butler

President
Pancake Technology, LLC

972.861.0525

P.O. Box 271416
Flower Mound, TX 75027

@roberttheiv

http://www.linkedin.com/in/roberttheiv

--
You received this message because you are subscribed to the Google Groups "project-voldemort" group.
To post to this group, send email to project-...@googlegroups.com.
To unsubscribe from this group, send email to project-voldem...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/project-voldemort?hl=en.

zjwang

unread,

Jan 9, 2012, 10:23:50 PM1/9/12

to project-voldemort

Thank you for your explaination.

My linux server is 48G RAM, when I set bdb to write with sync and test
the write operation with the value of 1m, the result of " iostat -x 1"
is like this:

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
avgrq-sz avgqu-sz await r_await w_await svctm %util
cciss/c0d0 0.00 9961.00 0.00 490.00 0.00 41804.00
170.63 0.39 0.79 0.00 0.79 0.14 6.80

The average wkB/s is between 30m/s and 40m/s.

When I set bdb to write without sync and that means the data will be
wrote to the operation system cache, and the system will decide when
to flush the data into disk.
The result for write_no_sync is like this:

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s
avgrq-sz avgqu-sz await r_await w_await svctm %util
cciss/c0d0 0.00 49202.00 0.00 2237.00 0.00 205756.00
183.96 18.05 8.07 0.00 8.07 0.07 16.40

The wkB/s is between 100-200m/s every five seconds, maybe the
operation system flush data every five seconds.

I am confused of one thing. For write without sync, the process of
writing data into cache and flushing cache into disk is asynchronous,
and the cache size of operation system is about 40g, so it can hold
more data. I expect the throughput be higher.

Would you kindly give me the further explaination of disk factor? I
will appreciate it very much.

On Jan 9, 10:04 am, Robert Butler <rbut...@pancaketech.com> wrote:
> At first glance,
>
> 1) Linear scaling is actually pretty good. If all you are seeing is a linear decrease then Voldemort is scaling well. According to the numbers below, it's actually slightly better than linear. I'm not sure why you would expect much better than that.
>
> 2) At the 1m value size, it looks like you are writing about 20mb/sec write speed which is pretty good. I don't know what hardware you are running on, but that's going to be the peak of what mosts disks can handle, even without write sync. In certain, expensive hardware configurations you will be able to get more. These seem like reasonable numbers to me. That said, I don't have a lot of experience with Voldemort yet so I may be off.
>
> - Robert
>
> Robert Butler
> President
> Pancake Technology, LLC
>
> 972.861.0525
> P.O. Box 271416
> Flower Mound, TX 75027
>

> @roberttheivhttp://www.linkedin.com/in/roberttheiv

>
> On Jan 8, 2012, at 8:34 PM, zjwang wrote:
>
>
>
> > I test the write performance by voldemort performance tool, NRW set
> > 322, bdb set WRITE_NO_SYNC, no compression and string serialization.
> > For different value size the write performance decreased
> > dramatically. The results like this:
>
> > 1k: 14000ops/sec
> > 10k:1960ops/sec
> > 100k:202ops/sec
> > 1m: 20ops/sec
>
> > There is no limit for network bandwidth. All the data will be wrote
> > into system cache in bdb layer. Why the write performance is linear
> > decreasing with value size? Who can tell me the reason for this? Is it
> > the limit of bdb-je performance? Thanks in advance!
>
> > --
> > You received this message because you are subscribed to the Google Groups "project-voldemort" group.
> > To post to this group, send email to project-...@googlegroups.com.
> > To unsubscribe from this group, send email to project-voldem...@googlegroups.com.

> > For more options, visit this group athttp://groups.google.com/group/project-voldemort?hl=en.- Hide quoted text -
>
> - Show quoted text -

Robert Butler

unread,

Jan 9, 2012, 11:13:05 PM1/9/12

to project-...@googlegroups.com

If you are seeing the OS flush at 100-200m/s every 5 seconds or so, that says to me that the issue isn't disk bound. If the problem was disk limited, you would see sustained write speeds. It sounds like you have plenty of cache available too. What is your CPU doing during these tests? How many cores and at what percentage are they running?

I found a paper on benchmarking bdb here: http://www.oracle.com/technetwork/database/berkeleydb/berkeley-db-perf-128909.pdf

They had a 1Ghz system writing 40 byte records in memory (no disk at all) at about 30mb/s at peak performance. While I'm guessing your hardware is much better than that, I don't know what kind of Voldemort configuration you have setup and you are going to tend to be well below the peak, native bdb performance due to the scalability and availability features built into Voldemort.

What does your Voldemort setup look like? Do you just have the 1 server? Do you have multiple nodes running on that server and how many?

- Robert

Robert Butler

President
Pancake Technology, LLC

972.861.0525

P.O. Box 271416
Flower Mound, TX 75027

@roberttheiv

http://www.linkedin.com/in/roberttheiv

zjwang

unread,

Jan 10, 2012, 3:11:11 AM1/10/12

to project-voldemort

Thank you for your bdb test link.

My test is based on one client and three servers, one linux server
just has one voldemort node. The NWR is set as 322, that means every
write operation will write to three servers. There are 16 cores for
each server and client. The running percentage of cpu for both client
and server is less than 20%, so it is not cpu limited.

I configured the max connection for each node as 50 for the client,
and this is the default, increasing this number will not bring better
performance.

I guess it is not disk and cpu limited, will it be linear when writing
different size data into system cache? I will further work and test on
it , and welcome any issues or suggestions.

On Jan 9, 11:13 pm, Robert Butler <rbut...@pancaketech.com> wrote:
> If you are seeing the OS flush at 100-200m/s every 5 seconds or so, that says to me that the issue isn't disk bound. If the problem was disk limited, you would see sustained write speeds. It sounds like you have plenty of cache available too. What is your CPU doing during these tests? How many cores and at what percentage are they running?
>

> I found a paper on benchmarking bdb here:http://www.oracle.com/technetwork/database/berkeleydb/berkeley-db-per...

>
> They had a 1Ghz system writing 40 byte records in memory (no disk at all) at about 30mb/s at peak performance. While I'm guessing your hardware is much better than that, I don't know what kind of Voldemort configuration you have setup and you are going to tend to be well below the peak, native bdb performance due to the scalability and availability features built into Voldemort.
>
> What does your Voldemort setup look like? Do you just have the 1 server? Do you have multiple nodes running on that server and how many?
>
> - Robert
>
> Robert Butler
> President
> Pancake Technology, LLC
>
> 972.861.0525
> P.O. Box 271416
> Flower Mound, TX 75027
>

> @roberttheivhttp://www.linkedin.com/in/roberttheiv

> >>> For more options, visit this group athttp://groups.google.com/group/project-voldemort?hl=en.-Hide quoted text -

Robert Butler

unread,

Jan 10, 2012, 7:55:21 AM1/10/12

to project-...@googlegroups.com

You are probably hitting the limit of your system for a single client. The data size and the latency between servers, bandwidth, CPU, etc. are stacking up. There is no single hotspot in your system, which is good. The way to get more throughput in the system is to introduce more clients writing data.

Your single client is consuming < 20% CPU, servers are less than 20%, your servers are syncing to disk at probably at max speed every 5 seconds which is about 20% capacity, your are sending at least 2x20mb/s or 8x40 mbits/sec from the client which is about 32% of a gigabit Ethernet connection. Individually none of those are maxed out. However, none of those happen simultaneously.

Roughly speaking, first the client sends the data, then the server reads the data, then it processes the data, then it writes it to cache, then it sends a response, then the client reads the response. To make it worse, the client is sending to more than one server and has to wait for success from the minimum number before it can continue, which will introduce even more latency. I have of course exaggerated the latency in this scenario, but the performance numbers you are seeing don't seem unreasonable to me. Can you add more clients? A distributed system like Voldemort is designed for multiple clients and is decidedly inefficient for a single point of access.

Anyway, that's what I'm seeing based on my understanding of how the system works. I'm not an expert in Voldemort by any means, and have pretty much reached the limit of my understanding on the particular system. I'm just getting started and digging into Voldemort recently.

- Robert

Sent from my iPad

zjwang

unread,

Jan 10, 2012, 11:35:42 PM1/10/12

to project-voldemort

You are right. The testing performance is actually the maximum for one
client , and one client can not measure the maximum throughput of a
server.

When I add more threads in one client or run more clients in one
machine, the total throughput will not be improved.
But when I run more clients in different machines, every client will
reach the same throughput as before, so the total throughput is
improved greatly.

Based on the above tests ,I guess the limit of one client is related
with cpu cores, although the cpu usage is less than 20% for one
client, more threads can not bring performance improvements because of
limit of cpu cores. And there should be a rough calculation for the
proper number of threads based on cpu cores and other factors in order
to get best performance.

In addition, adding more clients can get better performance until
reaching the limit of a server.

If my understanding has something wrong, welcome to indicate and I can
learn more from you.

On 1月10日, 下午8时55分, Robert Butler <rbut...@pancaketech.com> wrote:
> You are probably hitting the limit of your system for a single client. The data size and the latency between servers, bandwidth, CPU, etc. are stacking up. There is no single hotspot in your system, which is good. The way to get more throughput in the system is to introduce more clients writing data.
>
> Your single client is consuming < 20% CPU, servers are less than 20%, your servers are syncing to disk at probably at max speed every 5 seconds which is about 20% capacity, your are sending at least 2x20mb/s or 8x40 mbits/sec from the client which is about 32% of a gigabit Ethernet connection. Individually none of those are maxed out. However, none of those happen simultaneously.
>
> Roughly speaking, first the client sends the data, then the server reads the data, then it processes the data, then it writes it to cache, then it sends a response, then the client reads the response. To make it worse, the client is sending to more than one server and has to wait for success from the minimum number before it can continue, which will introduce even more latency. I have of course exaggerated the latency in this scenario, but the performance numbers you are seeing don't seem unreasonable to me. Can you add more clients? A distributed system like Voldemort is designed for multiple clients and is decidedly inefficient for a single point of access.
>
> Anyway, that's what I'm seeing based on my understanding of how the system works. I'm not an expert in Voldemort by any means, and have pretty much reached the limit of my understanding on the particular system. I'm just getting started and digging into Voldemort recently.
>
> - Robert
>
> Sent from my iPad
>

> >>>>> For more options, visit this group athttp://groups.google.com/group/project-voldemort?hl=en.-Hidequoted text -

>
> >>>> - Show quoted text -
>
> >>> --
> >>> You received this message because you are subscribed to the Google Groups "project-voldemort" group.
> >>> To post to this group, send email to project-...@googlegroups.com.
> >>> To unsubscribe from this group, send email to project-voldem...@googlegroups.com.
> >>> For more options, visit this group athttp://groups.google.com/group/project-voldemort?hl=en.-Hide quoted text -
>
> >> - Show quoted text -
>
> > --
> > You received this message because you are subscribed to the Google Groups "project-voldemort" group.
> > To post to this group, send email to project-...@googlegroups.com.
> > To unsubscribe from this group, send email to project-voldem...@googlegroups.com.

> > For more options, visit this group athttp://groups.google.com/group/project-voldemort?hl=en.- 隐藏被引用文字 -
>
> - 显示引用的文字 -

zjwang

unread,

Jan 11, 2012, 4:23:07 AM1/11/12

to project-voldemort

I find that the limit is actually network bandwidth. I test the
bandwidth between client and server, and the result is about 60m/s.

The write process for NRW 322 is like this:
First the client will choose the first server as master from three
servers, and then send the request.
When the client gets the response from the server and it success, then
it will send the requests to other two servers parallel.
As long as client gets the successful response from one of the two
servers, then this write process will end and regard as success.

The time is related with at least two network transfer. The testing
result is 20ops/sec for the value size 1m.

When I set the NRW as 111, that means each request will write just one
server, and the testing result is 60ops/sec for the value size 1m of
one client.
And it can be seen that it reach the limit of network bandwidth.

So for one client in my above test, the main limit is network
bandwidth, not cpu cores as I mentioned last posts.

> > >>>>> For more options, visit this group athttp://groups.google.com/group/project-voldemort?hl=en.-Hidequotedtext -

>
> > >>>> - Show quoted text -
>
> > >>> --
> > >>> You received this message because you are subscribed to the Google Groups "project-voldemort" group.
> > >>> To post to this group, send email to project-...@googlegroups.com.
> > >>> To unsubscribe from this group, send email to project-voldem...@googlegroups.com.
> > >>> For more options, visit this group athttp://groups.google.com/group/project-voldemort?hl=en.-Hidequoted text -
>
> > >> - Show quoted text -
>
> > > --
> > > You received this message because you are subscribed to the Google Groups "project-voldemort" group.
> > > To post to this group, send email to project-...@googlegroups.com.
> > > To unsubscribe from this group, send email to project-voldem...@googlegroups.com.

> > > For more options, visit this group athttp://groups.google.com/group/project-voldemort?hl=en.-隐藏被引用文字 -
>
> > - 显示引用的文字 -- 隐藏被引用文字 -
>
> - 显示引用的文字 -

Reply all

Reply to author

Forward