Message from discussion performance decreased dramatically for big value size

Date: Tue, 10 Jan 2012 20:35:42 -0800 (PST)
Subject: Re: performance decreased dramatically for big value size
From: zjwang <wangzhijiang...@yahoo.com.cn>
To: project-voldemort <project-voldemort@googlegroups.com>

You are right. The performance I measured is actually the maximum for one
client, and a single client cannot measure the maximum throughput of a
server.
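
A rough check of this, using the ops/sec figures from the first post in
this thread (quoted further down): converting them to payload bytes per
second shows the rate levelling off instead of continuing to fall. This is
only a sketch; it assumes "1k" means 1000 bytes and "1m" means 1,000,000
bytes, and the function name is made up for illustration.

```python
# Write results from the first post in the thread: value size -> ops/sec.
# Assumption: "1k" = 1000 bytes, "1m" = 1,000,000 bytes.
RESULTS = {1_000: 14000, 10_000: 1960, 100_000: 202, 1_000_000: 20}


def throughput_mb_per_sec(value_bytes, ops_per_sec):
    """Payload throughput in MB/s, with 1 MB taken as 1,000,000 bytes."""
    return value_bytes * ops_per_sec / 1_000_000


for size, ops in RESULTS.items():
    print(f"{size:>9} bytes: {throughput_mb_per_sec(size, ops):5.1f} MB/s")
```

From 10k upward the payload rate stays at about 20 MB/s, which is what a
fixed bytes-per-second ceiling for one client would look like.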

When I add more threads to one client, or run more clients on the same
machine, the total throughput does not improve.
But when I run more clients on different machines, every client reaches
the same throughput as before, so the total throughput improves greatly.

Based on the above tests, I guess the limit of one client is related to
its CPU cores: although CPU usage is less than 20% for one client, more
threads do not improve performance because of the limit on CPU cores.
There should be a rough calculation for the proper number of threads,
based on CPU cores and other factors, in order to get the best
performance.
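
One commonly cited starting point for such a calculation is
threads = cores x target utilization x (1 + wait time / compute time).
The sketch below only illustrates that heuristic; it is not something
taken from Voldemort, and the function name and the example numbers are
assumptions, not measurements from these tests.

```python
import math


def suggested_threads(cores, target_utilization, wait_time, compute_time):
    """Estimate a thread count from a generic pool-sizing heuristic.

    cores              -- number of CPU cores on the client machine
    target_utilization -- desired CPU utilization, between 0 and 1
    wait_time          -- time a request spends waiting (network, disk)
    compute_time       -- time a request spends on the CPU (same unit)
    """
    if compute_time <= 0:
        raise ValueError("compute_time must be positive")
    estimate = cores * target_utilization * (1 + wait_time / compute_time)
    return max(1, math.ceil(estimate))


# Hypothetical numbers: 16 cores, 50% target utilization, and requests
# that wait three times as long as they compute.
print(suggested_threads(16, 0.5, 3.0, 1.0))
```

Since adding threads did not help in the tests above, a number like this
would only be a first guess to be checked by measurement.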

In addition, adding more clients gives better performance until the limit
of the server is reached.
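
A very simplified model of this behaviour: total throughput grows with
the number of clients until it hits the server's ceiling. The per-client
figure of 20 ops/sec is the 1m result from the first post; the server
limit of 100 ops/sec is a made-up number for illustration, since no
server limit was measured in these tests.

```python
def total_throughput(n_clients, per_client_ops, server_limit_ops):
    """Aggregate ops/sec, assuming clients scale linearly up to a cap."""
    return min(n_clients * per_client_ops, server_limit_ops)


PER_CLIENT_OPS = 20     # 1m values, one client (from the first post)
SERVER_LIMIT_OPS = 100  # hypothetical ceiling, not a measured value

for n in range(1, 9):
    total = total_throughput(n, PER_CLIENT_OPS, SERVER_LIMIT_OPS)
    print(f"{n} client(s): {total} ops/sec")
```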

If there is something wrong in my understanding, please point it out so I
can learn more from you.

On Jan 10, 8:55 pm, Robert Butler <rbut...@pancaketech.com> wrote:
> You are probably hitting the limit of your system for a single client.
> The data size and the latency between servers, bandwidth, CPU, etc. are
> stacking up. There is no single hotspot in your system, which is good.
> The way to get more throughput in the system is to introduce more
> clients writing data.
>
> Your single client is consuming < 20% CPU, your servers are at less than
> 20%, your servers are syncing to disk, probably at max speed, every 5
> seconds, which is about 20% capacity, and you are sending at least
> 2x20 MB/s, or 8x40 Mbit/s, from the client, which is about 32% of a
> gigabit Ethernet connection. Individually none of those are maxed out.
> However, none of those happen simultaneously.
>
> Roughly speaking, first the client sends the data, then the server reads
> the data, then it processes the data, then it writes it to cache, then it
> sends a response, then the client reads the response. To make it worse,
> the client is sending to more than one server and has to wait for success
> from the minimum number before it can continue, which will introduce even
> more latency. I have of course exaggerated the latency in this scenario,
> but the performance numbers you are seeing don't seem unreasonable to me.
> Can you add more clients? A distributed system like Voldemort is designed
> for multiple clients and is decidedly inefficient for a single point of
> access.
>
> Anyway, that's what I'm seeing based on my understanding of how the
> system works. I'm not an expert in Voldemort by any means, and have
> pretty much reached the limit of my understanding on the particular
> system. I'm just getting started and digging into Voldemort recently.
>
> - Robert
>
> Sent from my iPad
>
> On Jan 10, 2012, at 2:11 AM, zjwang <wangzhijiang...@yahoo.com.cn> wrote:
>
>
>
> > Thank you for your bdb test link.
>
> > My test is based on one client and three servers; each Linux server
> > runs just one Voldemort node. The NWR is set to 322, which means every
> > write operation is written to three servers. There are 16 cores on
> > each server and on the client. CPU usage on both the client and the
> > servers is less than 20%, so it is not CPU limited.
>
> > I configured the max connections per node as 50 for the client, which
> > is the default; increasing this number does not bring better
> > performance.
>
> > I guess it is limited by neither disk nor CPU. Should it be linear
> > when writing data of different sizes into the system cache? I will
> > work and test further on it, and welcome any questions or suggestions.
>
> > On Jan 9, 11:13 pm, Robert Butler <rbut...@pancaketech.com> wrote:
> >> If you are seeing the OS flush at 100-200m/s every 5 seconds or so,
> >> that says to me that the issue isn't disk bound. If the problem was
> >> disk limited, you would see sustained write speeds. It sounds like
> >> you have plenty of cache available too. What is your CPU doing during
> >> these tests? How many cores and at what percentage are they running?
>
> >> I found a paper on benchmarking bdb here:
> >> http://www.oracle.com/technetwork/database/berkeleydb/berkeley-db-per...
>
> >> They had a 1Ghz system writing 40 byte records in memory (no disk at
> >> all) at about 30mb/s at peak performance. While I'm guessing your
> >> hardware is much better than that, I don't know what kind of
> >> Voldemort configuration you have setup and you are going to tend to
> >> be well below the peak, native bdb performance due to the scalability
> >> and availability features built into Voldemort.
>
> >> What does your Voldemort setup look like? Do you just have the 1
> >> server? Do you have multiple nodes running on that server and how
> >> many?
>
> >> - Robert
>
> >> Robert Butler
> >> President
> >> Pancake Technology, LLC
>
> >> 972.861.0525
> >> P.O. Box 271416
> >> Flower Mound, TX 75027
>
> >> @roberttheiv http://www.linkedin.com/in/roberttheiv
>
> >> On Jan 9, 2012, at 9:23 PM, zjwang wrote:
>
> >>> Thank you for your explanation.
>
> >>> My Linux server has 48G RAM. When I set bdb to write with sync and
> >>> test the write operation with a value size of 1m, the output of
> >>> "iostat -x 1" looks like this:
>
> >>> Device:       rrqm/s    wrqm/s   r/s      w/s  rkB/s      wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> >>> cciss/c0d0      0.00   9961.00  0.00   490.00   0.00   41804.00   170.63     0.39  0.79    0.00    0.79  0.14  6.80
>
> >>> The average write rate (wkB/s) is between 30 MB/s and 40 MB/s.
>
> >>> When I set bdb to write without sync, the data is written to the
> >>> operating system cache, and the system decides when to flush the
> >>> data to disk.
> >>> The result for write_no_sync looks like this:
>
> >>> Device:       rrqm/s    wrqm/s   r/s      w/s  rkB/s      wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
> >>> cciss/c0d0      0.00  49202.00  0.00  2237.00   0.00  205756.00   183.96    18.05  8.07    0.00    8.07  0.07 16.40
>
> >>> The wkB/s is between 100 and 200 MB/s every five seconds; maybe the
> >>> operating system flushes data every five seconds.
>
> >>> I am confused about one thing. For writes without sync, writing data
> >>> into the cache and flushing the cache to disk are asynchronous, and
> >>> the operating system cache is about 40g, so it can hold more data. I
> >>> expected the throughput to be higher.
>
> >>> Would you kindly give me a further explanation of the disk factor? I
> >>> would appreciate it very much.
>
> >>> On Jan 9, 10:04 am, Robert Butler <rbut...@pancaketech.com> wrote:
> >>>> At first glance,
>
> >>>> 1) Linear scaling is actually pretty good. If all you are seeing is
> >>>> a linear decrease then Voldemort is scaling well. According to the
> >>>> numbers below, it's actually slightly better than linear. I'm not
> >>>> sure why you would expect much better than that.
>
> >>>> 2) At the 1m value size, it looks like you are writing at about
> >>>> 20 MB/sec, which is pretty good. I don't know what hardware you are
> >>>> running on, but that's going to be the peak of what most disks can
> >>>> handle, even without write sync. In certain, expensive hardware
> >>>> configurations you will be able to get more. These seem like
> >>>> reasonable numbers to me. That said, I don't have a lot of
> >>>> experience with Voldemort yet so I may be off.
>
> >>>> - Robert
>
> >>>> On Jan 8, 2012, at 8:34 PM, zjwang wrote:
>
> >>>>> I tested write performance with the Voldemort performance tool,
> >>>>> with NRW set to 322, bdb set to WRITE_NO_SYNC, no compression, and
> >>>>> string serialization. The write performance decreased dramatically
> >>>>> as the value size grew. The results look like this:
>
> >>>>> 1k:   14000 ops/sec
> >>>>> 10k:  1960 ops/sec
> >>>>> 100k: 202 ops/sec
> >>>>> 1m:   20 ops/sec
>
> >>>>> There is no limit on network bandwidth. All the data is written
> >>>>> into the system cache in the bdb layer. Why does the write
> >>>>> performance decrease linearly with value size? Can anyone tell me
> >>>>> the reason for this? Is it a limit of bdb-je performance? Thanks in
> >>>>> advance!
>
> >>>>> --
> >>>>> You received this message because you are subscribed to the Google Groups "project-voldemort" group.
> >>>>> To post to this group, send email to project-voldemort@googlegroups.com.
> >>>>> To unsubscribe from this group, send email to project-voldemort+unsubscribe@googlegroups.com.
> >>>>> For more options, visit this group at http://groups.google.com/group/project-voldemort?hl=en.
>