what does "iostat -x /dev/sd? 2 2000" give you while the sync is
running?
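For reference, the interesting part of that output is the extended-statistics columns (a sketch; exact column names vary with the sysstat version installed):

```shell
# Sample extended device stats every 2 seconds while the sync runs.
# (/dev/sd? globs single-letter devices: sda, sdb, ...)
iostat -x /dev/sd? 2 2000

# Columns worth watching in the -x output:
#   r/s, w/s   - read/write requests per second
#   await      - average ms a request spends queued plus being serviced
#   %util      - fraction of time the device was busy
# High await and high read activity on the RAID member disks during a
# pure write workload would point at read-modify-write overhead.
```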
> I don't have the exact figures at hand right now, but the real
> performance of the underlying RAID-5 was *much* higher than that!
Ah, I think I have an idea what the problem is... If I'm not mistaken,
DRBD writes synchronously to disk, and this doesn't allow the RAID-5 to
buffer full stripes, so it has to do a read-modify-write for every 4k
block or so.
Do you have write cache enabled on the 3wares?
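If you want to check, the 3ware CLI can show the unit's cache setting (a sketch; the controller/unit IDs c0/u0 are examples, check `tw_cli show` for yours):

```shell
# List controllers and units (IDs below are examples).
tw_cli show
tw_cli /c0/u0 show all        # look for the write cache line
tw_cli /c0/u0 set cache=on    # only safe with a BBU installed
```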
I'd say, do the following:
- run dd on dom0 against the LV and see what you get (disks unmounted)
- run dd on dom0 against the drbd device (disks mounted by instance
not started)
- run dd on domU
I would expect that, if I'm right about the RAID-5 caching, running dd
without caching (oflag=direct) will give you bad results on dom0 too
(against the LV). If not, I'll try to help more.
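The three tests above would look something like this (a sketch; the device names are examples, and the of= targets are destructive, so only point them at disks whose data you can lose):

```shell
# 1) dom0 against the LV directly (instance stopped, disks unmounted);
#    oflag=direct bypasses the page cache so the RAID path is measured.
dd if=/dev/zero of=/dev/xenvg/test-lv bs=1M count=1024 oflag=direct

# 2) dom0 against the drbd device (disks attached, instance not started)
dd if=/dev/zero of=/dev/drbd2 bs=1M count=1024 oflag=direct

# 3) then run the same dd from inside the domU against its virtual disk
```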
thanks for the feedback, by the way.
iustin
Some years ago I read a paper, "Performance comparison between
iSCSI and other hardware and software solutions" [1], in which the
author found that hardware RAID-5 is slower than Linux software
RAID-5, and that RAID-10 (or RAID-1 + RAID-0) performs better than RAID-5.
I have done similar tests and got the same results.
Rudi, I believe you have a problem with your disks or your
controller. The performance you measured with direct access to the
device in dom0 is really slow.
There was a network improvement between Xen 3.0 and Xen >= 3.1; this
performance boost is really impressive! [2]
Test the network between the two machines using, for example, NetPIPE:
#apt-get install netpipe-tcp
#man NPtcp
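Usage is roughly the following (receiver first, then transmitter; the hostname is an example):

```shell
# on the receiving machine (e.g. xen01), start the listener:
NPtcp

# on the transmitting machine, point at the receiver:
NPtcp -h xen01
# NetPIPE then reports throughput across a range of message sizes.
```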
[1] www.slac.stanford.edu/econf/C0303241/proc/papers/TUDP001.PDF
[2] http://wiki.xen-br.org/index.php?title=Comparação_de_i/o_de_rede_das_versões_3.0.3_e_3.1.0
2008/4/9, rudi <Rudol...@gmail.com>:
--
Leonardo Rodrigues de Mello
jabber: l...@lmello.eu.org
> > - run dd on dom0 against the drbd device (disks mounted by instance
> > not started)
>
> xen01:~# dd if=/dev/zero of=/dev/drbd2 bs=1K count=1M
> 1048576+0 records in
> 1048576+0 records out
> 1073741824 bytes (1.1 GB) copied, 299.625 seconds, 3.6 MB/s
yeah, I think drbd forces a sync to disk here, and that's why it's so
slow...
> now it's getting awkward...
> On top of that, there was very little disk activity during the dd
> run. Both controllers have 256 MB of cache built in.
yes, but what happens - again, I think, I'm not sure - is that drbd
forces a cache flush in order to get the data onto the disk platters.
it might just be that RAID-5 doesn't play nicely with the drbd block
size :( or that the 3ware is slow at RAID-5, as Leonardo said.
On the other hand, since drbd itself provides redundancy, I would
recommend that you configure the drives in JBOD mode, export each one,
and make one big VG from all the drives. As long as you monitor for
disk failures on your server and are able to run 'gnt-instance
replace-disks' quickly, you will have only a short window of non-redundancy.
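That setup would look roughly like this (a sketch; the device names are examples, and xenvg is assumed as the volume group name, which is Ganeti's default):

```shell
# Export each disk as a single-drive unit (JBOD) on the controller,
# then build one volume group from all of them:
pvcreate /dev/sda /dev/sdb /dev/sdc /dev/sdd
vgcreate xenvg /dev/sda /dev/sdb /dev/sdc /dev/sdd

# DRBD on LVs from this VG provides the redundancy; after a disk
# failure, rebuild the mirror with gnt-instance replace-disks
# (see the manpage for the exact flags in your Ganeti version).
```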
Which makes me wonder: are you using RAID-5 because you think drbd
(which implements RAID-1 across physical machines) is not safe enough?
Ganeti was certainly designed to work with 'cheap' (i.e. not smart
hardware raid) controllers.
If you do tests with RAID-0 too, I'd be very curious as to how the
performance changes.
thanks,
iustin
> Do you have write cache enabled on the 3wares?
Right now, yes - but only for testing purposes, because I don't have a
BBU installed :/