slow drbd?


rudi

Apr 9, 2008, 11:34:55 AM
to ganeti
Hey list,

ganeti works like a charm now (e.g. setting up instances, failover
between the nodes etc.)

But DRBD is SLOW! The two-node cluster consists of identical hardware:
quad-core Xeons, 3ware RAID-5, 4 GB RAM and two e1000 NICs.

Maybe it's not even DRBD, but using dd inside a DomU (reading from
/dev/zero and writing to /tmp/somefile) gives me a throughput of only
~5-10 MB/s.

Also, running the initial sync does not take the sync rate above
1-3 MB/s, according to /proc/drbd. The syncer rate is set to 100M on
both nodes.

I don't have the exact figures at hand right now, but the real
performance of the underlying RAID-5 was *much* higher than that!

Does anybody have any hints on that? I'm using only Debian Etch
packages, plus backports.org for DRBD (8.0.7) and Ganeti.

Regards,
rudi

Iustin Pop

Apr 9, 2008, 11:49:56 AM
to gan...@googlegroups.com
On Wed, Apr 09, 2008 at 08:34:55AM -0700, rudi wrote:
>
> Hey list,
>
> ganeti works like a charm now (e.g. setting up instances, failover
> between the nodes etc.)
>
> But drbd is SLOW! The two-node cluster consists of equal hardware,
> QuadCore Xeons, 3ware RAID5, 4GB mem and two e1000 nics
>
> Maybe it's not even DRBD, but using dd inside a DomU (reading from /
> dev/zero and writing to /tmp/somefile) gives me a performance of
> ~5-10MB/sec
>
> Also, running the initial sync does not take the sync rate above 1-3MB/
> sec, according to /proc/drbd. the syncer-rate is set to 100M on both
> nodes.

what does "iostat -x /dev/sd? 2 2000" give you while the sync is
running?

> I don't have the exact figures at hand right now, but the real
> performance of the underlying RAID-5 was *much* higher than that!

Ah, I think I have an idea what the problem is... If I'm not mistaken,
DRBD writes synchronously to disk, and this doesn't allow the RAID-5 to
buffer any full stripes, so it has to do a read-modify-write for every
4k block or so.
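
(Purely as an illustration, assuming a 64k chunk size on a 4-disk array,
i.e. 192k of data per stripe: a lone 4k synchronous write cannot be
merged into a full stripe, so the controller has to read the old data
block and the old parity, recompute the parity, and write both back --
four I/Os for 4k of payload, instead of one streaming write per full
stripe.)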

Do you have write cache enabled on the 3wares?

I'd say, do the following:
- run dd on dom0 against the LV and see what you get (disks unmounted)
- run dd on dom0 against the drbd device (disks mounted but instance
not started)
- run dd on domU

If I'm right about the RAID-5 caching, I would expect that running dd
without caching (oflag=direct) gives you bad results on dom0 too
(against the LV). If not, I'll try to help more.

thanks for the feedback, by the way.

iustin

rudi

Apr 9, 2008, 12:58:39 PM
to ganeti
Hey iustin,

thanks again for your fast reply :)

On Apr 9, 5:49 pm, ius...@google.com (Iustin Pop) wrote:
> On Wed, Apr 09, 2008 at 08:34:55AM -0700, rudi wrote:
>
> > Hey list,
>
> > ganeti works like a charm now (e.g. setting up instances, failover
> > between the nodes etc.)
>
> > But drbd is SLOW! The two-node cluster consists of equal hardware,
> > QuadCore Xeons, 3ware RAID5, 4GB mem and two e1000 nics
>
> > Maybe it's not even DRBD, but using dd inside a DomU (reading from /
> > dev/zero and writing to /tmp/somefile) gives me a performance of
> > ~5-10MB/sec
>
> > Also, running the initial sync does not take the sync rate above 1-3MB/
> > sec, according to /proc/drbd. the syncer-rate is set to 100M on both
> > nodes.
>
> what does "iostat -x /dev/sd? 2 2000" give you while the sync is
> running?
>
> > I don't have the exact figures at hand right now, but the real
> > performance of the underlying RAID-5 was *much* higher than that!
>
> Ah, I think I have an idea what the problem is... If I'm not mistaken,
> DRBD writes synchronously to disk, and this doesn't allow the RAID-5 to
> buffer any full stripes, so it has to do a read-modify-write for every
> 4k block or so.
>
> Do you have write cache enabled on the 3wares?
Right now yes - but only for testing purposes because I don't have a
BBU installed :/

>
> I'd say, do the following:
> - run dd on dom0 against the LV and see what you get (disks unmounted)

xen01:~# dd if=/dev/zero of=/dev/xenvg/testLV bs=1K count=1M
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 80.7683 seconds, 13.3 MB/s

interesting...this is rather "slow"

> - run dd on dom0 against the drbd device (disks mounted by instance
> not started)

xen01:~# dd if=/dev/zero of=/dev/drbd2 bs=1K count=1M
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 299.625 seconds, 3.6 MB/s

now it's getting awkward...
On top of that, there was only very little disk activity during the dd
run. Both controllers have 256 MB cache built in.

> - run dd on domU
test1:/tmp# dd if=/dev/zero of=/tmp/somefile bs=1K count=1M
1048576+0 records in
1048576+0 records out
1073741824 bytes (1.1 GB) copied, 220.198 seconds, 4.9 MB/s

(test1 is currently located on xen02)


>
> I would expect that, if I'm right about the raid5 caching, if you run dd
> without caching (oflag=direct), you get bad results on dom0 too (against
> the LV). If not, I'll try to help more.

Oh, and FYI: I followed the instructions from the Ganeti installation
guide and restricted dom0 to 512M of memory and 1 of the 4 cores
available on both nodes.

Leonardo Rodrigues de Mello

Apr 9, 2008, 1:52:07 PM
to gan...@googlegroups.com
Hi everyone,

Some years ago I read a paper, "Performance comparison between
iSCSI and other hardware and software solutions" [1], and in that paper
the author found that hardware RAID-5 is slower than Linux software
RAID-5, and that RAID-10 (i.e. RAID-1 + RAID-0) performs better than RAID-5.

I have done similar tests and got the same results.

Rudi, I believe you have some problem with your disks or your
controller. The performance you measured with direct access to the
device in dom0 is really slow.

There is a network improvement between Xen 3.0 and Xen >= 3.1; this
performance boost in networking is really IMPRESSIVE! [2]

Test the network between the two machines using, for example, NetPIPE:
# apt-get install netpipe-tcp
# man NPtcp
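
A minimal run would look something like this (the hostnames are just
placeholders): start the receiver on one node, then point the
transmitter at it from the other:
node1# NPtcp
node2# NPtcp -h node1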

[1] === www.slac.stanford.edu/econf/C0303241/proc/papers/TUDP001.PDF
[2] === http://wiki.xen-br.org/index.php?title=Comparação_de_i/o_de_rede_das_versões_3.0.3_e_3.1.0


2008/4/9, rudi <Rudol...@gmail.com>:


--
Leonardo Rodrigues de Mello
jabber: l...@lmello.eu.org

rudi

Apr 10, 2008, 5:29:06 AM
to ganeti


On Apr 9, 5:49 pm, ius...@google.com (Iustin Pop) wrote:
Just FYI, I added another test with bonnie on one of the instances
(1 GB RAM, 2 cores assigned):
test1:~# bonnie -d /tmp -s 2048 -u 0


At the same time I issued "watch -n 2 iostat -kx" on the Dom0 (512 MB
RAM, 1 core assigned):

Linux 2.6.18-6-xen-amd64 (xen02.megabit.local)    10.04.2008

avg-cpu:  %user  %nice %system %iowait  %steal  %idle
           0,32   0,00    0,30   11,54    0,02  87,82

Device:  rrqm/s  wrqm/s   r/s    w/s   rkB/s   wkB/s  avgrq-sz  avgqu-sz   await  svctm  %util
sda       27,57   32,69  3,16  16,97  172,23  657,74     82,46     17,74  881,22   9,26  18,64

Iustin Pop

Apr 10, 2008, 5:58:30 AM
to gan...@googlegroups.com
On Wed, Apr 09, 2008 at 09:58:39AM -0700, rudi wrote:
> > I'd say, do the following:
> > - run dd on dom0 against the LV and see what you get (disks unmounted)
>
> xen01:~# dd if=/dev/zero of=/dev/xenvg/testLV bs=1K count=1M
> 1048576+0 records in
> 1048576+0 records out
> 1073741824 bytes (1.1 GB) copied, 80.7683 seconds, 13.3 MB/s
>
> interesting...this is rather "slow"

It is slow, and it was done in cached mode. Can you repeat this and add
the parameter 'oflag=direct' and change bs=1K to bs=4K? If I'm not
mistaken, this should show the max speed you will get from DRBD.
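
Something along these lines (reusing the LV from your test above; the
count is only chosen so that about 1 GB gets written):
xen01:~# dd if=/dev/zero of=/dev/xenvg/testLV bs=4K count=262144 oflag=direct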

> > - run dd on dom0 against the drbd device (disks mounted by instance
> > not started)
>
> xen01:~# dd if=/dev/zero of=/dev/drbd2 bs=1K count=1M
> 1048576+0 records in
> 1048576+0 records out
> 1073741824 bytes (1.1 GB) copied, 299.625 seconds, 3.6 MB/s

yeah, I think here drbd forces sync to disk and this is why it's so
slow....

> now it's getting awkward...
> On top of that, there was only very little disk activity during the dd
> run. Both controllers have 256 MB cache built in.

yes, but what happens - again, I think, I'm not sure - is that DRBD
forces a cache flush in order to get the data to the disk platters.

it might be just that raid5 doesn't play nice with the drbd block size
:( or that 3ware is slow on raid5 as Leonardo said.

On the other hand, since DRBD itself provides redundancy, I would
recommend that you configure the drives in JBOD mode, export each one,
and make one big VG from all the drives. As long as you have monitoring
of disk failures on your server and are able to run 'gnt-instance
replace-disks' quickly, the window of non-redundancy will be small.
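
A rough sketch of that layout, assuming the controller exposes the
drives as /dev/sdb through /dev/sde (the device names are only
placeholders):
# pvcreate /dev/sdb /dev/sdc /dev/sdd /dev/sde
# vgcreate xenvg /dev/sdb /dev/sdc /dev/sdd /dev/sde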

Which makes me wonder if you are using RAID-5 because you think DRBD
(which implements RAID-1 across physical machines) is not safe enough?
Ganeti was certainly designed to work with 'cheap' (i.e. not smart
hardware RAID) controllers.

If you do tests with RAID-0 too, I'd be very curious how the
performance changes.

thanks,
iustin

rudi

Apr 10, 2008, 9:05:34 AM
to ganeti
Hm, no, I don't trust DRBD... yet ;) And on top of that I don't trust
regular SATA drives any more. In the past 12 months I had to do tons
of SATA hard drive RMAs :/

>
> If you do tests with raid0 too, I'd be very curious as how the
> performance changes.
Hm, I did some tests comparing RAID-0 to RAID-5 (with the 3ware
card, using dbench) before I did the Xen setup on the boxes... Of
course there was a performance gain, but it wasn't as huge as one
might expect.

>
> thanks,
> iustin

OK here's what I did just now:

I removed the 3ware on one of the nodes and replaced it with a plain
Linux software RAID-5. It's not done syncing yet, but I already have
some better numbers:

After cloning the data from the other node back to the software-RAID
one, I re-added the node to the cluster and set up an instance (primary
side on the 3ware controller). During the initial DRBD syncs I
repeatedly reached transfer rates of 30 MB/s (wooooow!). I'm quite sure
there were no or almost no writes involved on the primary node during
that time.

After the successful setup of the instance I ran "bonnie -d /tmp -s
2048 -u 0" inside the DomU and the speed broke down to ~5 MB/s (since
writes on the 3ware are now involved again). I will now replace the
disk subsystem of this server with a software RAID too, and post the
numbers to this list.
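
For reference, building such an array and putting the volume group on
it looks roughly like this (drive names and chunk size here are just
placeholders):
# mdadm --create /dev/md0 --level=5 --raid-devices=4 --chunk=64 /dev/sd[b-e]
# pvcreate /dev/md0
# vgcreate xenvg /dev/md0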


Regards,
Rudi

scakkia

Apr 10, 2008, 6:35:15 PM
to gan...@googlegroups.com
Hi,
On 09 Apr 2008, at 18:58, rudi wrote:

>> Do you have write cache enabled on the 3wares?
>
> Right now yes - but only for testing purposes because I don't have a
> BBU installed :/


I think the missing BBU is the cause of the poor performance.

Without the BBU the controller puts the unit in write-through mode
(it doesn't use the cache) and disables write-back mode (best
performance) until you install the BBU and charge it (about 12 hours).

Also, for best performance, enable the "I/O cache" and "adaptive read"
options.
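
If these are 3ware 9000-series controllers, the unit cache setting can
be checked and, once the BBU is charged, switched back with tw_cli,
roughly like this (the controller and unit numbers are just examples):
# tw_cli /c0/u0 show
# tw_cli /c0/u0 set cache=on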

by
Daniele 
 