dramatic performance drop after Linux 2.6.38 (tested 2.6.39 and 3.0.1)


Heinrich Langos

Sep 8, 2011, 3:06:04 AM
to open-iscsi
Hi there,

I am using the open-iscsi initiator to access a storage back end for
my Xen-based virtualization infrastructure.
Since the current 3.0.x Linux kernel finally has everything that I
need to run the host system (Dom0) without any additional patches, I
thought I'd give it a try and see if I can replace the 2.6.32 hosts
that cause a lot of trouble when mixing Xen + iSCSI + multipath.

This is raw "dd" throughput for reading ~30 GB from an iSCSI storage
via a dedicated 1 Gbit/s Ethernet link.

2.6.32 : 102 MB/s
2.6.38 : 100 MB/s
2.6.39 : 44 MB/s
3.0.1 : 43 MB/s

Seems like between 2.6.38 and 2.6.39 the iSCSI performance got pretty
much thrown out the window ...

If I were into conspiracy theories I'd suspect that the Core-iSCSI
guys are out to prove the superior performance of their initiator by
slowing down the open-iSCSI competitor. ;-)
(See http://linux-iscsi.org/wiki/Core-iSCSI#Performance )

I tested with Debian Squeeze on bare metal to remove Xen from the
equation and I tested without multipath to avoid those complications
too.

Kernel 2.6.32 is the default Squeeze kernel, 2.6.38 and 2.6.39 were
from squeeze-backports and 3.0.1 was built from the vanilla kernel
sources using make-kpkg. The storage is a Dell / Equallogic PS4000.

Please drop me a line if you can confirm or refute those findings.

On the positive side I have to say that the 3.0.1 kernel is much more
stable in my Xen + iSCSI + multipath setup. (Though I haven't yet tested
it with the load that I have on the 2.6.32 machine.)

Mike Christie

Sep 8, 2011, 5:36:24 PM
to open-...@googlegroups.com, Heinrich Langos
On 09/08/2011 02:06 AM, Heinrich Langos wrote:
> Hi there,
>
> I am using the open-iscsi initiator to access a storage back end for
> my Xen-based virtualization infrastructure.
> Since the current 3.0.x Linux kernel finally has everything that I
> need to run the host system (Dom0) without any additional patches, I
> thought I'd give it a try and see if I can replace the 2.6.32 hosts
> that cause a lot of trouble when mixing Xen + iSCSI + multipath.
>
> This is raw "dd" throughput for reading ~30 GB from an iSCSI storage
> via a dedicated 1 Gbit/s Ethernet link.
>
> 2.6.32 : 102 MB/s
> 2.6.38 : 100 MB/s
> 2.6.39 : 44 MB/s
> 3.0.1 : 43 MB/s
>

I can replicate this now. For some reason I only see it with 1 gig. I
think my 10 gig setups that I have been testing with are limited by
something else.

Doing git bisect now to track down the change that caused the problem.

Mike Christie

Sep 8, 2011, 10:23:49 PM
to open-...@googlegroups.com, Heinrich Langos

I did not find anything really major in the iscsi code. But I did notice
that if I just disable iptables throughput goes from about 5 MB/s back
up to 85 MB/s.

If you disable iptables, do you see something similar?

Mike Christie

Sep 8, 2011, 10:25:29 PM
to open-...@googlegroups.com, Heinrich Langos

I also noticed that in 2.6.38 throughput would almost immediately start
at 80-90 MB/s, but with 2.6.39 it takes a while (maybe 10 seconds
sometimes) to ramp up.

Heinrich Langos

Sep 9, 2011, 3:02:52 AM
to Mike Christie, open-...@googlegroups.com
On Thu, Sep 08, 2011 at 09:25:29PM -0500, Mike Christie wrote:
> On 09/08/2011 09:23 PM, Mike Christie wrote:
> > On 09/08/2011 04:36 PM, Mike Christie wrote:
> >> On 09/08/2011 02:06 AM, Heinrich Langos wrote:
> >>> Hi there,
> >>>
> >>> I am using the open-iscsi initiator to access a storage back end for
> >>> my Xen-based virtualization infrastructure.
> >>> Since the current 3.0.x Linux kernel finally has everything that I
> >>> need to run the host system (Dom0) without any additional patches, I
> >>> thought I'd give it a try and see if I can replace the 2.6.32 hosts
> >>> that cause a lot of trouble when mixing Xen + iSCSI + multipath.
> >>>
> >>> This is raw "dd" throughput for reading ~30 GB from an iSCSI storage
> >>> via a dedicated 1 Gbit/s Ethernet link.
> >>>
> >>> 2.6.32 : 102 MB/s
> >>> 2.6.38 : 100 MB/s
> >>> 2.6.39 : 44 MB/s
> >>> 3.0.1 : 43 MB/s
> >>>
> >>
> >> I can replicate this now. For some reason I only see it with 1 gig. I
> >> think my 10 gig setups that I have been testing with are limited by
> >> something else.
> >>
> >> Doing git bisect now to track down the change that caused the problem.
> >>

First of all thank you very much for taking a look at it. I was wondering if
I was doing something strange, since I haven't seen anybody report anything
similar and 2.6.39 has been out for a while now. But I guess iSCSI users tend
to be corporate users and they don't jump on every new kernel version.

> > I did not find anything really major in the iscsi code. But I did notice
> > that if I just disable iptables throughput goes from about 5 MB/s back
> > up to 85 MB/s.
> >
> > If you disable iptables, do you see something similar?
> >
>
> I also noticed that in 2.6.38 throughput would almost immediately start
> at 80-90 MB/s, but with 2.6.39 it takes a while (maybe 10 seconds
> sometimes) to ramp up.

I've noticed the opposite. Throughput starts high and goes down to the
numbers reported after about 10 gigabytes. I've noticed this with all the
kernels I've tested, but that is probably a result of the crude way of
measuring that I use.

I run 'dd if=/dev/disk/by-path/... of=/dev/null bs=1024k &'
and 'while kill -USR1 <dd-pid> ; do sleep 2 ; done'.
Therefore I usually don't get to see the throughput during the first couple
of seconds. I'm open to suggestions to improve this.
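
For reference, the whole measurement as one little script (the device
path is abbreviated here, and bs=1024k is just what I happened to pick):

  # start the read in the background and remember dd's PID
  dd if=/dev/disk/by-path/... of=/dev/null bs=1024k &
  DD_PID=$!
  # SIGUSR1 makes GNU dd print its current statistics; the loop ends
  # once dd has exited and the kill fails
  while kill -USR1 "$DD_PID" 2>/dev/null ; do
      sleep 2
  done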

I took a look at ip_tables with kernel 3.0.1 (I'll check out the other
versions later today.)
If I rmmod iptable_filter, ip_tables and x_tables, the performance starts
out higher (over 80 MB/s instead of over 60 MB/s) and drops down to around
63 MB/s instead of 43 MB/s. Still not back up in the region it was in before.
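
Roughly the sequence I used (modules listed in dependency order; running
"iptables -L -n" later pulls them back in, as shown further down the thread):

  # unload the netfilter modules; this only works while no rules reference them
  rmmod iptable_filter ip_tables x_tables
  lsmod | grep tab          # should now print nothing
  # ... rerun the dd test ...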

cheers
-henrik

Ulrich Windl

Sep 13, 2011, 2:31:32 AM
to open-...@googlegroups.com
Hi!

I wonder whether anybody has tried playing with network tuning parameters related to iSCSI. Candidates might be "net.core.?mem*". I've seen these settings on a database server, but I don't know what the intention behind them actually is:

net.core.rmem_default = 262144
net.core.wmem_default = 262144
net.core.rmem_max = 4194304
net.core.wmem_max = 1048576
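
If somebody wants to experiment, something like this should do for a quick
test (the values above are just what I saw on that server, not a
recommendation):

  # note the current values first so you can revert, then apply (as root)
  sysctl net.core.rmem_default net.core.wmem_default net.core.rmem_max net.core.wmem_max
  sysctl -w net.core.rmem_default=262144 net.core.wmem_default=262144
  sysctl -w net.core.rmem_max=4194304 net.core.wmem_max=1048576
  # to keep the settings across reboots, put the same key=value lines
  # into /etc/sysctl.conf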

Regards,
Ulrich

>>> Heinrich Langos <henrik...@prak.org> wrote on 09.09.2011 at 09:02 in
message <2011090907...@www.viadmin.org>:

Mike Christie

Sep 15, 2011, 4:17:48 PM
to Heinrich Langos, open-...@googlegroups.com
On 09/15/2011 05:26 AM, Heinrich Langos wrote:
> Hi Mike,
> Did you find the reason for the performance drop? Is there anything I can do
> to help? I'll have to get that new box into production soon and since I am

Did you confirm that stopping iptables also solves the problem for you?
I was waiting to hear back from you to make sure we are working on the
same bug.

Mike Christie

Sep 16, 2011, 5:08:35 PM
to Heinrich Langos, open-...@googlegroups.com
On 09/16/2011 09:01 AM, Heinrich Langos wrote:
> What it boils down to is that iptables has no significant effect!
>

OK, maybe a different issue then. Let me do some more digging. I have to
kill some other regression that I have been working on for work (also
seen in that "Re: open-iscsi issue" thread), then I can concentrate more
on this.

The strange thing is that I added the iscsi code from upstream to RHEL
6/5 and it is working fine with or without iptables for me. I get 113
MB/s read or write on 1 gig.

Is your box 32-bit or 64-bit, and if 32-bit, are you using highmem?

If it is 64-bit and you have time and don't mind building kernels, try a
"git bisect" to narrow down which kernel commit is causing the problem.


Heinrich Langos

Sep 16, 2011, 10:01:27 AM
to Mike Christie, open-...@googlegroups.com

Hello Mike,

iptables doesn't seem to be involved... The effect I observed (going
from 43 MB/s to 63 MB/s) was only a short-term thing. The full test
still shows the same low performance as before.

3.0.1:
...
29843+0 records in
29842+0 records out
31291604992 bytes (31 GB) copied, 728.783 s, 42.9 MB/s
29917+0 records in
29916+0 records out
31369199616 bytes (31 GB) copied, 730.761 s, 42.9 MB/s
29993+0 records in
29992+0 records out
31448891392 bytes (31 GB) copied, 732.776 s, 42.9 MB/s
30000+0 records in
30000+0 records out
31457280000 bytes (31 GB) copied, 737.593 s, 42.6 MB/s
30000+0 records in
30000+0 records out
31457280000 bytes (31 GB) copied, 737.593 s, 42.6 MB/s
[1]+ Done dd if=/dev/disk/by-path/ip-172.26.0.100\:3260-iscsi-iqn.2001-05.com.equallogic\:0-8a0906-f11e2e306-4c60013c7064e675-testme-lun-0 of=/dev/null bs=1024k count=30000
-bash: kill: (2159) - No such process
root@janus01:~# lsmod | grep tab
root@janus01:~#

Then I ran "iptables -L -n" to make sure those modules are loaded

root@janus01:~# lsmod | grep tab
iptable_filter 12536 0
ip_tables 21818 1 iptable_filter
x_tables 18886 2 iptable_filter,ip_tables


I repeated the test and got pretty much the same throughput.

Sometimes the beginning seems faster, sometimes it seems slower.
Sometimes the leveling off happens after a couple of seconds,
sometimes it takes up to 10 GB of data to reach a "stable"
throughput level. (Though when measuring 2.6.38 I had a slow slope
going down to a 99 MB/s average for the full 30 GB, even though after
10 GB the average was still around 106 MB/s.)

My guess is that caching has to be watched carefully when running
multiple tests with the same kernel. After all, the machine I am
testing on has 48 GB of RAM and I am not running anything else
on it.
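
To be on the safe side I could drop the page cache between runs,
something like:

  # flush dirty pages, then drop page cache, dentries and inodes (as root)
  sync
  echo 3 > /proc/sys/vm/drop_caches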

What it boils down to is that iptables has no significant effect!

cheers
-henrik

Boaz Harrosh

Sep 21, 2011, 7:16:48 AM
to open-...@googlegroups.com, Mike Christie, Heinrich Langos

I would like to report that with a 3.0 kernel I can do 103 MB/s on a single
osd iscsi-device. osd is very different, but it uses the exact same
iscsi LLD; only the upper layers above SCSI are completely different.

Could you also test with direct IO? If you have time, download the
"sg utils" (search on Google) and use the sg_dd command with direct IO
and big block sizes of 8 MB or so.
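
Something along these lines (just a sketch; check the sg_dd man page of
your sg3_utils version for the exact flags):

  # 512-byte logical blocks, 16384 blocks per transfer = 8 MB per IO,
  # O_DIRECT on the input side; time=1 prints elapsed time and throughput
  sg_dd if=/dev/disk/by-path/... of=/dev/null bs=512 bpt=16384 iflag=direct time=1
  # plain GNU dd with O_DIRECT should give a comparable picture
  dd if=/dev/disk/by-path/... of=/dev/null bs=8M iflag=direct count=3840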

With this we can see if it comes from the low-level driver or if it is an
upper-layer issue, because even when accessing a block device directly,
dd is still going through the page cache. Also, what about an ext4 FS on
top of the iscsi device, and dd into a file? Are the results the same?
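
For the filesystem test something like this (careful: mkfs wipes the LUN,
so only do this on a scratch volume):

  mkfs.ext4 /dev/disk/by-path/...
  mount /dev/disk/by-path/... /mnt
  # write a big file and make dd wait for it to actually hit the disk
  dd if=/dev/zero of=/mnt/testfile bs=1M count=30000 conv=fsync
  # drop the cache so the read-back really comes from the target
  sync; echo 3 > /proc/sys/vm/drop_caches
  dd if=/mnt/testfile of=/dev/null bs=1M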

Thanks
Boaz

Mike Christie

Sep 21, 2011, 4:48:57 PM
to Heinrich Langos, Boaz Harrosh, open-...@googlegroups.com
On 09/21/2011 10:33 AM, Heinrich Langos wrote:
> Since dd / sg_dd performance is already pretty bad I don't think we need
> to get into testing fs performance on top of this, right?
>

Yeah. I would not waste time on it.
