Are you aware that the readahead settings count not only on the
initiator side but also on the target side?
Bart.
Can you try to copy the file from the iSCSI target to /dev/null?
This will narrow down whether the problem is with the iSCSI target (or the
read mechanism in the iSCSI layer) or with your local disk.
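Something along these lines would do (a minimal sketch; /dev/sdX stands
for the iSCSI-exported device and /dev/md0 for the local array, adjust
to your setup):

  # read 1 GB straight off the iSCSI block device, bypassing the filesystem
  dd if=/dev/sdX of=/dev/null bs=1M count=1024
  # the same read from the local disk array, for comparison
  dd if=/dev/md0 of=/dev/null bs=1M count=1024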
Is the 'local disk array' a software RAID?
A. Eijkhoudt wrote:
> On Sep 12, 3:49 pm, Konrad Rzeszutek <kon...@virtualiron.com> wrote:
>> Can you try to copy the file from the iSCSI target to /dev/null?
>
> Yes, no difference. It's still going at a decidedly unimpressive 1.5MB/
> sec.
So what about a Windows client? How fast is it going?
Stefan
Again 2c.
Thanks for all the work devs!
I've been struggling with a similar problem for a while now. My write
speeds are around 110M/s whereas, even following A. Eijkhoudt's advice,
I've only been able to get 34M/s reads.
The actual setup is an experimental play rig where I'm trying to see
how usable/practical it is to provide an HA Linux client on top of two
Xen Linux servers. To provide HA storage I was playing with having
the two servers export LVs via iSCSI and having the client RAID1 these
for redundancy. So the client will see two exported targets - one
being on the same physical machine but on the virtual network, the
other on the alternate server across the physical network. The idea
then is to get the client to RAID1 the exported targets for HA and
therefore have the ability to migrate the client between Xen servers
for client HA, hopefully getting away from any single points of
failure. The setup works as expected except for read speeds.
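For reference, the client side is assembled along these lines (a rough
sketch only; portal addresses and device names are placeholders, not
the actual values from the rig):

  # log in to the two exported targets, one per server
  iscsiadm -m discovery -t sendtargets -p 10.0.0.1
  iscsiadm -m discovery -t sendtargets -p 10.0.0.2
  iscsiadm -m node -L all
  # mirror the two exported disks; LVM is then layered on top of the mirror
  mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda /dev/sdb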
All the physical hardware is the same. All machines are standard Sun
v20z's with 4 cores and 8G of RAM, Broadcom GE NICs, the standard
internal U320 SCSI HDDs, and external eSATA II drives connected via a
Silicon Image eSATA controller.
All machines are running amd64 Gentoo Linux using a 2.6.18-xen kernel
and Xen-3.3. I'm unable to provide the IET and Open-iSCSI versions at
the moment as I'm away from the play rig, but I can provide them if
anyone is interested.
* The target is IET running on a Xen dom0 running Gentoo Linux, with
the internal drives in a S/W RAID0 presented via LVM2. The iSCSI
target drive is an LV exported via blockio. Read speeds directly off
the internal SCSI drives' S/W RAID0 md0 are in the range of
~140-150M/s with write speeds of ~90-100M/s. Read & write speeds off
the LV are identical to the raw md0 speeds. Read speeds directly off
the external eSATA II drives' S/W RAID0 md1 are in the range of
~450M/s with write speeds of ~220M/s. Read & write speeds off the LV
for the external eSATA II drives are identical to the raw md1 speeds.
* The network is GE with 9k MTU and at the moment is just a cross-over
cable between the servers. Tests using iperf show consistent TCP
throughput of 118M/s over the wire. Tests using dd & netcat (roughly
as in the sketch after this list) also show 117M/s for
(RAID0 md0 svr 1)->network->(/dev/null svr 2) and 95M/s for
(RAID md0 svr 1)->network->(RAID md0 svr 2). The speeds are identical
for (Xen Client 1)->(Xen dom0 srv 1)->network->(Xen dom0 srv 2)->(Xen
Client 2), so Xen doesn't seem to penalize these scenarios.
* The initiator is Open-iSCSI running on the Xen client. The exported
targets are then put into a S/W RAID1, and the resulting md1 is divided
using client-side LVM2, presenting the LVs to the client.
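The dd & netcat test mentioned above was along these lines (a sketch;
host name, port and transfer size are placeholders):

  # on the receiving server: listen and discard
  nc -l -p 5000 > /dev/null
  # on the sending server: stream the raw md0 device across the wire
  dd if=/dev/md0 bs=1M count=4096 | nc svr2 5000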
So the software stack is quite layered. However, this doesn't seem to
be a problem for writes (110M/s), just reads (30-34M/s). Initially
writes were at ~110M/s but reads were ~1M/s.
Setting a read-ahead of 8K and an MTU of 9K, as well as using the noop
scheduler for all block devices at all levels on both the target and
initiator, pulled the read speed up to ~30M/s. This is with the
recommended TCP tweaks (e.g. RX/TX buffer sizes of 16M etc.).
Increasing the read-ahead to 16K for all block devices at all levels on
both sides resulted in read speeds of ~34M/s. Increasing read-ahead
and/or the TCP tweaks further doesn't seem to make any further
improvement. Write speed is unaffected.
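For anyone wanting to try the same, those tweaks are of this general
form (a hedged sketch, assuming the 8K/16K figures are blockdev
--setra values, which are in 512-byte sectors; /dev/sdX is a
placeholder for each device in the stack):

  # read-ahead and noop elevator per block device
  blockdev --setra 16384 /dev/sdX
  echo noop > /sys/block/sdX/queue/scheduler
  # 16M TCP send/receive buffers
  sysctl -w net.core.rmem_max=16777216
  sysctl -w net.core.wmem_max=16777216
  sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
  sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"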
Playing with the IET and Open-iSCSI config files and changing the
read-ahead and max recv/xmit block sizes to anything other than the
defaults resulted in 18K/s writes and reads. However, turning on
immediate data produces an initial burst of 100M/s for about 2s and
then drops back to ~30M/s. Changing the IET compatibility setting
essentially kills everything, i.e. any read or write hangs with no
messages in syslog.
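The settings in question live roughly here (illustrative values only,
not the numbers actually tried; the IQN is made up, the LV path is the
one from the results below, and the parameter names are the standard
iSCSI negotiation keys):

  # /etc/ietd.conf on the target
  Target iqn.2008-09.rig:iscsi_test
          Lun 0 Path=/dev/vg_md0/iscsi_test,Type=blockio
          MaxRecvDataSegmentLength  262144
          ImmediateData             Yes

  # /etc/iscsi/iscsid.conf on the initiator
  node.session.iscsi.ImmediateData = Yes
  node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144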
Bonnie++ tests on the client md1 created from the exported targets are
a mixed bag - the first couple of tests vary greatly but they then
settle down to also show ~100M/s writes and ~34M/s reads.
A software RAID rebuild of the md1 created from the exported targets
results in an average rebuild speed of ~90M/s, while a check of the
md1 results in an average check speed of ~26M/s.
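(For those following along, the check/rebuild figures come from
watching /proc/mdstat; a quick sketch, assuming the md resync speed
limits are left near their defaults:)

  # kick off a consistency check of the mirror and watch the speed
  echo check > /sys/block/md1/md/sync_action
  cat /proc/mdstat
  # the md layer throttles resync; these sysctls show the floor/ceiling in KB/s
  sysctl dev.raid.speed_limit_min dev.raid.speed_limit_max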
Breaking the client S/W RAID1 apart and testing both targets
individually results in write speeds of ~70-90M/s to both and read
speeds of ~20-26M/s from both, no matter where they are located -
either via the virtual network on the same physical box or across the
physical network to the alternate box.
About the only thing I have not tried so far is modifying the
Cyl/Hd/Sec values on the client for the exported iSCSI targets to see
if alignment is a problem. However, I'm unsure at the moment what
values I'd have to set these to, given the multiple layers.
Does anyone have any further ideas, comments or suggestions? Has
anyone tried this before?
Thanks for your time.
Adrian Head.
----------
From: "A. Eijkhoudt" <zaanpeng...@gmail.com>
Date: Sep 13, 1:42 am
Subject: Extremely slow read performance, but write speeds are near-perfect!
To: open-iscsi
Maybe not the advice you are looking for, but did you already have a
look at the SCST iSCSI target implementation? It's faster than IET
and better maintained. There is also a section about how to tune
read-ahead settings in the file scst/README. See also
http://scst.sourceforge.net/ for more information.
Bart.
Does it do MC/S or Error Recovery Level 2 or 3?
--
Tracy Reed
http://tracyreed.org
Not that I know of. Would you like to see these features implemented in SCST?
(CC'd Vladislav Bolkhovitin, SCST maintainer)
Bart.
Are you using partitions on the client iSCSI devices? Are your partitions
aligned, for example, to a 64k boundary? Have you tried without partitions?
I'd measure with the raw iSCSI device (/dev/sdX) first, and then start
adding the additional layers (md-raid, lvm, etc).
-- Pasi
Spent quite a few days & nights playing and was finally able to get
the following results (from 140 tests of bonnie++):
Block Write:
Min 93.35 MB/s
Avg 98.57 MB/s
Max 103.22 MB/s
Block Read:
Min 127.79 MB/s
Avg 136.45 MB/s
Max 142.27 MB/s
This is for the Xen Linux client reading and writing to a reiserfs
file-system on top of LVM on top of SW RAID1 across two iSCSI-exported
drives that exist as LVs on top of SW RAID0 on the targets. One target
is on the Xen host machine, the other is over the real network.
After reading the AoE alignment paper referenced from this mailing
list, I was a little confused as to where the realignment took place.
I've assumed that you change these values on the exported drive and
not on the source drive. I did play with changing the heads/sectors
values on the exported drives. When I did this, if I created a
partition with fdisk I got significantly slower writes and reads. If
I changed the heads/sectors values but didn't create a partition and
just had SW RAID use the whole disk, I got a 17% improvement in write
speeds as per the RAID rebuild speed reported by "/proc/mdstat". The
file-system tests with dd also showed a 13% improvement, but the raw
LV read test suffered a 22% decrease.
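(For what it's worth, one common way of overriding the reported
geometry is fdisk's -H/-S switches; a hedged sketch only, with purely
illustrative values, since exactly how the change was applied on this
rig isn't spelled out above:)

  # pretend the exported disk has 64 heads and 32 sectors per track
  fdisk -H 64 -S 32 /dev/sdX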
The read and write values are the average of 10 runs of dd, whereas
the RAID1 check value is the average value from /proc/mdstat during
one rebuild session. This is all from the point of view of the Xen
client only.
                                      Without     With alignment on
                                      alignment   exported target
Read /dev/sda                         143 MB/s    145 MB/s    1.38 % inc
Read /dev/sdb                         103 MB/s    107 MB/s    3.74 % inc
Read /dev/md0                         106 MB/s    106 MB/s    no change
Read /dev/vg_md0/iscsi_test           145 MB/s    113 MB/s    22.07 % dec
Write /mnt/iscsi_test/iscsi_test.raw  106 MB/s    106 MB/s    no change
Read /mnt/iscsi_test/iscsi_test.raw   92.9 MB/s   107 MB/s    13.18 % inc
RAID1 check/rebuild                    63 MB/s     76 MB/s    17.11 % inc
Some observations:
* When running iSCSI across a Xen virtual bridged network you can
almost get raw disk speed (sda is exported from the Xen host, and
147MB/s is the raw speed for the same test run within the host against
the raw disk).
* Although some layers might be low (e.g. md0 = 106MB/s), the actual
workable file-system speed might be quite reasonable and faster than
expected (e.g. iscsi_test.raw = 107MB/s).
* Therefore, it seems quite pointless pontificating over the speeds of
the various layers - only the file-system layer counts.
* By extension you only need to optimise a couple of key layers to
improve the results - not all layers. For example, I did try using
blockdev --setra 65536 on all layers on both target and initiator and
the speeds dropped through the floor. Exactly which layers should
have blockdev optimised is yet to be determined. It appears at the
moment that on the target the raw disks need blockdev tuning, as do
the iSCSI-exported disks on the initiator (see the sketch after this
list). It also appears that blockdev on md0 can be of some help but
is a bit iffy - depending on the value it can either drastically
improve the speed or crash it through the floor. The LVM automatic
read-ahead may also give better results than a blanket blockdev in
some cases.
* Changing the heads/sectors values for the exported disk does
improve speed. Just don't use a partition, as that crashes the speed
once the geometry change has been made - use the complete disk, at
least for SW RAID.
* On writes the network is 99% utilised (as per nettop), but on reads,
even for the speeds above, the network is only ~80% utilised. I'm not
sure why the big difference. Maybe I have to change the target from
IET to something else, or maybe I have to tweak the network
optimisation settings. But why it is better in one direction than the
other, given that most things are equal, is a mystery. I'm now also
looking at what Xen interactions there might be.
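The narrower read-ahead tuning described in the blockdev observation
above boils down to something like this (device names and the --setra
value are placeholders; which layers actually need it is, as noted,
still being determined):

  # on the target: the raw member disks only
  blockdev --setra 16384 /dev/sda
  blockdev --setra 16384 /dev/sdb
  # on the initiator: the iSCSI-exported disks only, leaving md/LVM at defaults
  blockdev --setra 16384 /dev/sda
  blockdev --setra 16384 /dev/sdb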
Areas for further effort:
* The raw disks have 512-byte sectors and are using either 4K or 128K
SW RAID0 chunks on the targets***. The Xen client RAID1 of the
exported disks is using a chunk size of 32K. It would be interesting
to see what aligning all these chunk values would do for performance,
or even aligning the disk chunk values against the network MTU.
* Use the LVM dm-mirror code instead of the SW RAID1 code to see what
performance that might give on the Xen client across the iSCSI-exported
drives (see the sketch after this list).
* Try to find out which blockdev values to use on the different layers,
or which layers to optimise and which not, to get the best file-system
speeds. Early tests have resulted in reads of 110MB/s or better in
some situations, but the results are currently not consistent.
* Try to optimise the network further to try to get the full 117MB/s
for a /dev/sdb read. I can reach this speed using dd > netcat >
netcat > dd so it should be possible with iSCSI as the PCI bus
doesn't seem to be a limiting factor.
* Work out why there is such a big difference in network utilisation
between read and write, and look into what effect Xen might have on
this.
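The dm-mirror variant mentioned above would look roughly like this (a
sketch under the assumption that the two iSCSI-exported disks appear as
/dev/sda and /dev/sdb on the client; --mirrorlog core keeps the mirror
log in memory so no third device is needed):

  pvcreate /dev/sda /dev/sdb
  vgcreate vg_test /dev/sda /dev/sdb
  lvcreate -m1 --mirrorlog core -L 20G -n mirror_test vg_test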
Using RAID1 on exported iSCSI disks at least looks like a reasonable
solution for redundancy. I've already encountered a situation where
an eSATA cable was dislodged from the Xen host and the client kept
running uninterrupted using the exported iSCSI disk across the real
network without any noticeable effects - in fact I didn't notice for a
couple of days. Re-adding was easy and trouble-free and almost as fast
as it would have been with a raw physical disk. I might even look at
this as a solution for backups in the future (have RAID1 across 3 or
more exported disks, then to take a backup, break the mirror).
Thanks for people's advice and suggestions.
Adrian.
***Why the difference in SW RAID0 chunk sizes between targets? I've
got a bug in my Xen kernel where, if I use SW RAID0 chunk sizes smaller
than 128K on the Xen host, I get file-system corruption on the Xen
clients. I'm not sure why, as I have not found anyone willing to help
me drill down and find out. Ext2/3 is worse than reiserfs: reiserfs
has survived for ~6 months, whereas ext3 will kill everything in
seconds.***