Extremely slow read performance, but write speeds are near-perfect!


A. Eijkhoudt

Sep 11, 2008, 8:29:27 PM
to open-iscsi, r.z...@hva.nl
Hello and thanks in advance for reading,

We've been having serious read performance issues with Open-iSCSI.

I'll list the specs of our setup first:

The SAN:

- Dual-Core Xeon with 2GB RAM, running Microsoft Windows Storage
Server 2003 R2
- 5TB of storage, sliced into two 1.5TB partitions and one 2.0TB partition.
- Dual Broadcom NC324i NICs
- 1Gbit uplink to switch, using the first Broadcom card.

The client machine experiencing the problem:

- Quad-Core Opteron with 8GB RAM, running Gentoo Linux, kernel
2.6.24-gentoo-r5
- It uses the 2TB iSCSI target, formatted as xfs and used as
backup/restore for local storage
- Dual Broadcom BCM5708 NICs
- Intel 82545GM fibre NIC
- Open-iSCSI version 2.0-870-rc1 (latest version)
- 1Gbit uplink to switch, using the Intel NIC (fibre).

The switch:

- Cisco 3750G, 24xCopper Gigabit switch with three fibre-modules

The problem we're experiencing is this:

When we copy a file from the local disk array to the iSCSI target, the
write performance is amazing: we max out the gigabit link immediately
(>100MB/sec writes easily). When we copy a file from the iSCSI target
to the local disk array however, performance is absolutely *dreadful*:
1-4MB/sec. This also completely floods the client machine with I/O
requests (as seen by running 'htop'): the load jumps to ~6-8 and it
becomes very slow and almost unresponsive to commands.

Here's what I've tried so far to rectify the problem, based on what I
could find online:

- Enabled Flow Control on the NICs: no change in speed.
- Switched to the copper NIC on the client machine: no change in speed.
- Tried different read-ahead settings on the SCSI device with 'blockdev
--setra ...' (4096-65536). This only helps initially: we get a short
burst of good speed, and then it dies again. It also hurts the random
read/write speeds. (See the sketch after this list.)
- Tried different programs: it makes no difference, the problem occurs
with bonnie++, cp, scp, mv, etc.
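
For reference, the flow-control and read-ahead changes were made
roughly like this (eth0 and /dev/sdb are only example names; adjust
them to your NIC and iSCSI disk):

  # check and enable Ethernet flow control on the NIC
  ethtool -a eth0
  ethtool -A eth0 rx on tx on

  # check and change the read-ahead (in 512-byte sectors) of the iSCSI disk
  blockdev --getra /dev/sdb
  blockdev --setra 16384 /dev/sdb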

Can anyone help me figure this out, or point me in the right direction?

Bart Van Assche

Sep 12, 2008, 2:08:08 AM
to open-...@googlegroups.com, r.z...@hva.nl
On Fri, Sep 12, 2008 at 2:29 AM, A. Eijkhoudt <zaanp...@gmail.com> wrote:
> - Use different read-ahead settings on the SCSI device 'blockdev --
> setra ...' (4096-65536). This only initially solves the problem: we
> get a short burst of good speed, and then it dies again. It kills the
> random reads/write speeds, however.

Are you aware that not only the readahead settings on the initiator
side count, but also those on the target side?

Bart.

A. Eijkhoudt

Sep 12, 2008, 5:21:38 AM
to open-iscsi


On Sep 12, 8:08 am, "Bart Van Assche" <bart.vanass...@gmail.com>
wrote:
> Are you aware that not only the readahead settings on the initiator
> side count, but also on the target side ?

I would assume there are, but I don't see any option to change them
under Windows Storage Server. Wouldn't the write speed be just as bad
if this were the cause?

- Arnim.

Konrad Rzeszutek

Sep 12, 2008, 9:49:16 AM
to open-...@googlegroups.com, r.z...@hva.nl
> When we copy a file from the local disk array to the iSCSI target, the
> write performance is amazing: we max out the gigabit link immediately
> (>100MB/sec writes easily). When we copy a file from the iSCSI target
> to the local disk array however, performance is absolutely *dreadful*:

Can you try to copy the file from the iSCSI target to /dev/null?

This will narrow down whether the problem is with the iSCSI target (or the
read mechanism in the iSCSI layer) or with your local disk.
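
For example, something along these lines (assuming /dev/sdd is the
iSCSI disk on the initiator; adjust the device name and size to your
setup):

  # read 4GB straight off the iSCSI block device, bypassing the local array
  dd if=/dev/sdd of=/dev/null bs=1M count=4096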

Is the 'local disk array' a software RAID ?

A. Eijkhoudt

Sep 12, 2008, 10:52:33 AM
to open-iscsi
On Sep 12, 3:49 pm, Konrad Rzeszutek <kon...@virtualiron.com> wrote:
> Can you try to copy the file from the iSCSI target to /dev/null?

Yes, no difference. It's still going at a decidedly unimpressive
1.5MB/sec.

> Is the 'local disk array' a software RAID ?

No, it's a hardware RAID5 setup over 12 disks, on the HP ProLiant's
internal Compaq RAID controller.

Stefan de Konink

Sep 12, 2008, 10:55:25 AM
to open-...@googlegroups.com

A. Eijkhoudt wrote:


> On Sep 12, 3:49 pm, Konrad Rzeszutek <kon...@virtualiron.com> wrote:
>> Can you try to copy the file from the iSCSI target to /dev/null?
>
> Yes, no difference. It's still going at a decidedly unimpressive
> 1.5MB/sec.

So what about a Windows client? How fast is it going?


Stefan

D.A.

Sep 12, 2008, 11:42:40 AM
to open-iscsi


On Sep 12, 1:29 am, "A. Eijkhoudt" <zaanpeng...@gmail.com> wrote:
> Hello and thanks in advance for reading,
>
> We've been having serious read performance issues with Open-iSCSI.
>

I am having a similar problem, with good write performance and
dreadful read performance, but with bursts of good speed.

I am not entirely sure that this will be the final answer (too many
dead ends already), but in my case it seems the cause was a mismatched
block size between the target and initiator devices (fixed along with
upgrading the iSCSI target). The target device had a block size of
4096 bytes, while on the initiator side the block size was 512 bytes.
The issue seemed to manifest itself particularly with multipathing.
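
A quick way to compare what the two sides report (the device name is
only an example; on the initiator this would be the iSCSI disk, on the
target the backing device):

  # logical sector size and soft block size as seen by the kernel
  blockdev --getss /dev/sdc
  blockdev --getbsz /dev/sdc
  cat /sys/block/sdc/queue/hw_sector_size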

In my case I am using two Ubuntu boxes (open-iscsi v2.0-865 and IET
v0.4.15/0.4.16). I am using software RAID (this is a test box) with
LVM, and the LV is presented as the iSCSI target.

Just my 2c.

D.A.

Ben Lake

Sep 12, 2008, 8:38:06 PM
to open-...@googlegroups.com
More 2c, for everyone. I had unacceptable speeds in general when I first
started using open-iscsi, mainly manifested when reading from my IET
target (both Ubuntu, BTW). One issue was, as another user mentioned,
that I was reading from a fast target (2x 3GHz Xeons) and writing to a
slower, more heavily taxed system (1x 2.4GHz P4) running software RAID.
Once I realised that, the speed issue made more sense: the bottleneck
was writing to the software RAID. Anyhow... I enabled jumbo frames on
the NICs (MTU > 1504), specifically an MTU of 7000, as that is all the
client NIC could handle (9000 would have been best). This produced
consistent reads from the target at 800Mbps-1Gbps (I was surprised too;
I used iperf for the tests), and write speeds of around 500-650Mbps.
A sketch of the changes follows below.
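
For anyone who wants to try the same, the MTU change and the iperf
check were along these lines (eth0, the 7000-byte MTU and the address
are examples; both ends of the link and the switch have to support the
larger frames):

  # raise the MTU on the initiator and target NICs
  ip link set dev eth0 mtu 7000

  # verify raw TCP throughput between the two boxes
  iperf -s                  # on the target
  iperf -c 192.168.1.10     # on the initiator, pointing at the target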

Again 2c.

Thanks for all the work devs!

A. Eijkhoudt

Sep 12, 2008, 9:42:23 PM
to open-iscsi
Thank you all very much!

The suggested combinations of network configuration & iSCSI settings
seem to have solved the problem, even though initially the speed
fluctuated very heavily (no idea why ;)). I've run 3 consecutive full
Bonnie++ tests now and tried different combinations, and these are the
best results I could get so far:

Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
abcdefghijklmn  16G   423  99 105115 22 35580  18   756  99 113453 31 367.5   7
Latency             20001us     500ms     520ms   20001us     110ms   90001us
Version 1.93c       ------Sequential Create------ --------Random Create--------
abcdefghijklmn      -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16  4106  29 +++++ +++  3244  10   888  95 +++++ +++  2829  12

The system load now sticks at 1.00 (1 CPU/core used at 100%) as well.

Hopefully this discussion/thread will help others in the future!

Heady

Nov 21, 2008, 3:16:05 PM
to open-...@googlegroups.com
Folks,

I've been struggling with a similar problem for a while now. My write
speeds are around 110M/s whereas, even following A. Eijkhoudt's advice,
I've only been able to get 34M/s reads.

The actual setup is an experimental play rig where I'm trying to see
how usable/practical it is to provide an HA Linux client on top of two
Xen Linux servers. To provide HA storage I was playing with having
the two servers export LVs via iSCSI and having the client RAID1 these
for redundancy. So the client will see two exported targets - one
being on the same physical machine but on the virtual network, the
other on the alternative server across the physical network. The idea
then is to get the client to RAID1 the exported targets for HA and
therefore have the ability to migrate the client between Xen servers
for client HA, hopefully getting away from any single points of
failure. The setup works as expected except for read speeds.

All the physical hardware is the same. All machines are standard Sun
v20z's with 4 cores and 8G RAM, Broadcom GE NICs and the standard
internal U320 SCSI HDDs, plus external eSATA II drives connected via a
Silicon Image eSATA controller.

All machines are running amd64 Gentoo Linux using a 2.6.18-xen kernel
and Xen-3.3. I'm unable to provide IET and Open-iSCSI versions at the
moment as I'm away from the play rig. However, I can provide them if
anyone is interested.

* The target is IET running on a Xen dom0 running Gentoo Linux, with
the internal drives in a S/W RAID0 presented via LVM2. The iSCSI
target drive is an LV exported via blockio. Read speeds directly
off the internal SCSI drives' S/W RAID0 (md0) are in the range of
~140-150M/s, with write speeds of ~90-100M/s. Read & write speeds
off the LV are identical to the raw md0 speeds. Read speeds directly
off the external eSATA II drives' S/W RAID0 (md1) are in the range of
~450M/s, with write speeds of ~220M/s. Read & write speeds off the LV
for the external eSATA II drives are identical to the raw md1 speeds.

* The network is GE with 9k MTU and at the moment is just a cross-over
cable between servers. Tests using iperf show consistent TCP
throughput of 118M/s over the wire. Tests using dd & netcat also show
117M/s for (RAID0 md0 svr 1)->network->(/dev/null svr 2) and speeds of
95M/s for (RAID md0 svr 1)->network->(RAID md0 svr 2). The speeds are
identical for (Xen Client 1)->(Xen dom0 srv 1)->network->(Xen dom0 srv
2)->(Xen Client 2), so Xen doesn't seem to penalize these scenarios.
(A sketch of these baseline tests follows this list.)

* The initiator is Open-iSCSI running on the Xen client. The exported
targets are assembled into a S/W RAID1, and the resulting md1 is then
divided using client-side LVM2, presenting the LVs to the client.
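
The baseline network tests mentioned above were essentially the
following (host names, port and sizes are illustrative, and the netcat
options vary between netcat versions):

  # raw TCP throughput between the two dom0s
  iperf -s                                          # on svr2
  iperf -c svr2                                     # on svr1

  # disk -> network -> /dev/null with dd and netcat
  nc -l -p 5001 > /dev/null                         # on svr2
  dd if=/dev/md0 bs=1M count=4096 | nc svr2 5001    # on svr1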

So the software stack is quite layered. However, this doesn't seem to
be a problem for writes (110M/s), just for reads (30-34M/s).

Initially writes were at ~110M/s but reads were ~1M/s.

Setting a read-ahead of 8K and an MTU of 9K, as well as using the noop
scheduler for all block devices at all levels on both the target and
initiator, pulled the read speed up to ~30M/s. This is with the
recommended TCP tweaks (e.g. RX/TX buffer sizes of 16M etc.).
Increasing read-ahead to 16K for all block devices at all levels on
both sides resulted in read speeds of ~34M/s. Increasing read-ahead
and/or the TCP tweaks further doesn't seem to make any further
improvements. Write speed is unaffected. (See the sketch below.)
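
For completeness, the tuning described above amounts to roughly the
following on each box (the device name is an example; "8K" read-ahead
here means 8192 sectors as reported by blockdev --getra, applied to
every block device in the stack):

  # noop elevator and 8K-sector read-ahead per block device
  echo noop > /sys/block/sdb/queue/scheduler
  blockdev --setra 8192 /dev/sdb

  # TCP tweaks: 16M maximum socket buffers
  sysctl -w net.core.rmem_max=16777216
  sysctl -w net.core.wmem_max=16777216
  sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
  sysctl -w net.ipv4.tcp_wmem="4096 65536 16777216"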

Playing with the IET and Open-iSCSI config files and changing the
read-ahead and max recv/xmit block sizes to anything other than the
defaults resulted in 18K/s writes and reads. However, turning on
immediate data (ImmediateData) produces an initial burst of 100M/s for
about 2s and then it returns to ~30M/s. Changing the IET compatibility
setting essentially kills everything, i.e. any read or write hangs
with no messages in syslog.

Bonnie++ tests on the client md1 created from the exported targets are
a mixed bag - the first couple of tests vary greatly but then settle
down to also show ~100M/s writes and ~34M/s reads.

A rebuild of the md1 created from the exported targets (the Linux
kernel md resync) results in an average rebuild speed of ~90M/s, while
a check of the md1 results in an average check speed of ~26M/s.

Breaking the client S/W RAID1 apart and testing both targets
individually results in write speeds of ~70-90M/s to both and read
speeds of ~20-26M/s from both, no matter where they are located:
either via the virtual network on the same physical box or across the
physical network to the alternate box.

About the only thing I have not tried so far is modifying the
Cyl/Hd/Sec values on the client for the exported iSCSI targets, to see
if alignment is a problem. However, I'm unsure at the moment what
values I'd have to set these to, given the multiple layers.


Does anyone have any further ideas, comments or suggestions? Has
anyone tried this before?

Thanks for your time.

Adrian Head.

Bart Van Assche

Nov 22, 2008, 3:44:09 AM
to Heady, open-...@googlegroups.com
On Fri, Nov 21, 2008 at 9:16 PM, Heady <adrianh...@googlemail.com> wrote:
> * The target is IET running on a Xen dom0 running Gentoo Linux with [ ... ]

Maybe not the advice you are looking for, but did you already have a
look at the SCST iSCSI target implementation? It's faster than IET
and better maintained. There is also a section about how to tune
read-ahead settings in the file scst/README. See also
http://scst.sourceforge.net/ for more information.

Bart.

Tracy Reed

Nov 22, 2008, 4:58:14 AM
to open-...@googlegroups.com, Heady
On Sat, Nov 22, 2008 at 09:44:09AM +0100, Bart Van Assche spake thusly:

> Maybe not the advice you are looking for, but did you already have a
> look at the SCST iSCSI target implementation ? It's faster than IET
> and better maintained. There is also a section about how to tune

Does it do MC/S or Error Recovery Level 2 or 3?

--
Tracy Reed
http://tracyreed.org

Bart Van Assche

Nov 22, 2008, 7:44:04 AM
to open-...@googlegroups.com, Heady, Vladislav Bolkhovitin, Tracy Reed

Not that I know of. Would you like to see these features implemented in SCST?

(CC'd Vladislav Bolkhovitin, SCST maintainer)

Bart.

Pasi Kärkkäinen

Nov 25, 2008, 6:09:49 AM
to open-...@googlegroups.com
On Fri, Nov 21, 2008 at 12:16:05PM -0800, Heady wrote:
>
> Folks,
>
> I've been struggling with a similar problem for a while now. My write
> speeds are around 110M/s whereas, even following A. Eijkhoudt's advice
> I've only been able to get 34M/s reads.
>
> * The initiator is Open-iSCSI running on the Xen client. The exported
> targets are then S/W RAID1 and the resulting md1 is then divided using
> client side LVM2 presenting the LVs to the client.
>
> So the software stack is quite layered. However, this doesn't seem to
> be a problem for writes (110M/s) just reads (30-34M/s).
>

Are you using partitions on the client iSCSI devices? Are your partitions
aligned, for example, to a 64k boundary? Have you tried without partitions?

I'd measure with the raw iSCSI device (/dev/sdX) first, and then start
adding additional layers (md-raid, lvm, etc).
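
In other words, measure each layer separately, something like this
(/dev/sdc, /dev/md1 and /dev/vg0/lv_test are example names; drop the
page cache between runs so you measure the device rather than cached
data):

  sync; echo 3 > /proc/sys/vm/drop_caches
  dd if=/dev/sdc of=/dev/null bs=1M count=4096          # raw iSCSI disk
  dd if=/dev/md1 of=/dev/null bs=1M count=4096          # md raid on top
  dd if=/dev/vg0/lv_test of=/dev/null bs=1M count=4096  # LVM on top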

-- Pasi

Adrian Head

Nov 28, 2008, 5:46:23 PM
to open-...@googlegroups.com
Just a quick update on where I got to regarding this.

Spent quite a few days & nights playing and was finally able to get
the following results (from 140 tests of bonnie++):

Block Write:
Min 93.35 MB/s
Avg 98.57 MB/s
Max 103.22 MB/s

Block Read:
Min 127.79 MB/s
Avg 136.45 MB/s
Max 142.27 MB/s

This is for the Xen Linux client reading and writing to a reiserfs
file-system on top of LVM, on top of SW RAID1 across two iSCSI-exported
drives that exist as an LV on top of SW RAID0 on the targets. One
target is on the Xen host machine, the other is over the real network.

After reading the AoE alignment paper referenced from this mailing
list, I was a little confused as to where the realignment took place.
I've assumed that you change these values on the exported drive and
not on the source drive. I did play with changing the heads/sectors
values on the exported drives. When I did this, if I created a
partition with fdisk I got significantly slower writes and reads. If
I changed the heads/sectors values but didn't create a partition and
just had SW RAID use the whole disk, I received a 17% improvement in
write speeds as per the RAID rebuild speed reported by /proc/mdstat.
The file-system tests with dd also showed a 13% improvement, but the
raw LV read test suffered a 22% decrease.

The read and write values are the average from 10 runs of dd, whereas
the RAID1 check value is the average value from /proc/mdstat during
one rebuild session. This is all from the point of view of the Xen
client only.
                                       Without          With alignment on
                                       alignment (MB/s) exported target (MB/s)
Read  /dev/sda                         143              145            1.38 %  Inc
Read  /dev/sdb                         103              107            3.74 %  Inc
Read  /dev/md0                         106              106            0.00 %  No change
Read  /dev/vg_md0/iscsi_test           145              113           22.07 %  Dec
Write /mnt/iscsi_test/iscsi_test.raw   106              106            0.00 %  No change
Read  /mnt/iscsi_test/iscsi_test.raw   92.9             107           13.18 %  Inc
RAID1 check/rebuild                    63               76            17.11 %  Inc

Some observations:
* When running iSCSI across a Xen virtual bridged network you can
almost get raw disk speed (sda is exported from the Xen host, and
147MB/s is the raw speed for the test within the host on the raw
disk).
* Although some layers might be low (e.g. md0 = 106MB/s), the actual
workable file-system speed might be quite reasonable and faster than
expected (e.g. iscsi_test.raw = 107MB/s).
* Therefore, it seems quite pointless pontificating over the speeds of
the various layers - only the file-system layer counts.
* By extension you only need to optimise a couple of key layers to
improve the results - not all layers. For example, I did try using
blockdev --setra 65536 on all layers on both target and initiator and
the speeds dropped through the floor. Exactly which layers should
have blockdev optimised is yet to be determined. It appears at the
moment that on the target the raw disks need blockdev, as well as the
iSCSI exported disks on the initiator. It also appears that blockdev
on md0 is of some help but is a bit iffy: it can either drastically
improve the speed or crash the speed through the floor, depending on
the value. The LVM auto read-ahead may also give better results than
a blanket blockdev in some cases.
* Changing the heads/sectors values for the exported disk does
improve speed. Just don't use a partition, as that crashes the speed
once the geometry change has been made - use the complete disk, at
least for SW RAID.
* On writes the network is 99% utilised (as per nettop), but on reads,
even at the speeds above, the network is only ~80% utilised. I'm not
sure why there is such a big difference. Maybe I have to change the
target from IET to something else, or maybe I have to tweak the
network optimisation settings. But why it is better in one direction
than the other, given that most things are equal, is a mystery. I'm
now also looking at what Xen interactions there might be.

Areas for further effort:
* The raw disks have 512-byte sectors and are using either 4K or 128K
SW RAID0 chunks on the targets***. The Xen client RAID1 of the
exported disks is using a chunk size of 32K. It would be interesting
to see what aligning all these chunk values would do for performance.
Even aligning the disk chunk values with the network MTU.
* Use the LVM dm-mirror code instead of the SW RAID1 code to see what
performance that might give on the Xen client across the iSCSI
exported drives.
* Try to find out which blockdev values to use on the different
layers, or which layers to optimise and which not, to get the best
file-system speeds. Early tests have resulted in 110MB/s or better
reads under some situations - but the results are currently not
consistent.
* Try to optimise the network further to get the full 117MB/s for a
/dev/sdb read. I can reach this speed using dd > netcat > netcat > dd,
so it should be possible with iSCSI as the PCI bus doesn't seem to be
a limiting factor.
* Work out why there is such a big difference in network utilisation
between read and write, and look into what effect Xen might have on
this.

Using RAID1 on exported iSCSI disks at least looks like a reasonable
solution for redundancy. I've already encountered a situation where
an eSATA cable was dislodged from the Xen host and the client kept
running uninterrupted, using the exported iSCSI across the real
network, without any noticeable effects. In fact I didn't notice for
a couple of days. And re-adding was easy and trouble-free, and almost
as fast as it would have been with a raw physical disk. I might even
look at this as a solution for backups in the future (having RAID1
across 3 or more exported disks - to take a backup, break the mirror).

Thanks for people's advice and suggestions.

Adrian.

***Why the difference in SW RAID0 chunk sizes between targets? I've
got a bug in my Xen kernel where, if I use SW RAID0 chunk sizes smaller
than 128K on the Xen host, I get file-system corruption on the Xen
clients. I'm not sure why, as I have not found anyone willing to help
me drill down to find out. Ext2/3 is worse than reiserfs: reiserfs has
survived for ~6 months, whereas ext3 will kill everything in
seconds.***
