Understanding effect of read-ahead on open-iscsi block device


Pasi Kärkkäinen

Apr 7, 2008, 3:24:56 PM
to open-...@googlegroups.com
Hello list!

I have an LVM volume (from a single SATA disk) exported with IET like this:

Target iqn.2001-04.com.test:vol1
Lun 0 Path=/dev/vg1/test1,Type=fileio
InitialR2T No
ImmediateData Yes

I have not modified read-ahead or other disk settings on the target server.
The target is using the deadline elevator/scheduler.

The open-iscsi initiator (CentOS 5.1, kernel 2.6.18-53.1.14.el5PAE, with the
default open-iscsi that comes with the distro) sees that volume as /dev/sda.

I'm testing performance with different read-ahead settings on the initiator.

Can somebody explain why I see these throughput changes?

# blockdev --setra 256 /dev/sda
# dd if=/dev/sda of=/dev/null bs=1024k count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 45.0601 seconds, 47.7 MB/s

# blockdev --setra 512 /dev/sda
# dd if=/dev/sda of=/dev/null bs=1024k count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 50.7867 seconds, 42.3 MB/s

# blockdev --setra 1024 /dev/sda
# dd if=/dev/sda of=/dev/null bs=1024k count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 36.0101 seconds, 59.6 MB/s

# blockdev --setra 2048 /dev/sda
# dd if=/dev/sda of=/dev/null bs=1024k count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 56.8964 seconds, 37.7 MB/s

# blockdev --setra 4096 /dev/sda
# dd if=/dev/sda of=/dev/null bs=1024k count=2048
2048+0 records in
2048+0 records out
2147483648 bytes (2.1 GB) copied, 147.388 seconds, 14.6 MB/s


I repeated all of the tests above multiple times and the results were
reproducible..

I also tested with 4k and 1024k blocksizes.. the results were about the same
with both.


# blockdev --setra 1024 /dev/sda
# dd if=/dev/sda of=/dev/null bs=4k count=524288
524288+0 records in
524288+0 records out
2147483648 bytes (2.1 GB) copied, 35.3623 seconds, 60.7 MB/s

# blockdev --setra 4096 /dev/sda
# dd if=/dev/sda of=/dev/null bs=4k count=524288
524288+0 records in
524288+0 records out
2147483648 bytes (2.1 GB) copied, 146.793 seconds, 14.6 MB/s


Between test runs I dropped the caches on both the initiator and the target with:
"echo 3 > /proc/sys/vm/drop_caches"

I'm using the noop elevator/scheduler on /dev/sda, but changing it to deadline
or cfq didn't change anything. nr_requests and queue_depth are at their defaults.

The initiator box had about 50% CPU free during all the tests..

One thing I noticed was that when I got better throughput there was less iowait
and more time spent in "si" and "sy" (from top output). When I got worse
throughput there was more time spent in iowait and less in "si" and "sy".


top output with read-ahead of 4096:
Cpu(s): 0.0%us, 1.5%sy, 0.0%ni, 49.5%id, 47.0%wa, 0.0%hi, 2.0%si, 0.0%st

top output with read-ahead of 1024:
Cpu(s): 0.5%us, 3.0%sy, 0.0%ni, 49.8%id, 36.8%wa, 0.0%hi, 10.0%si, 0.0%st
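
(Those summary lines can be captured non-interactively for logging, e.g. with
top in batch mode:)

# top -b -d 1 -n 3 | grep '^Cpu(s)'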


The target box has 2 GB of RAM and the initiator has 4 GB.
Both are CentOS 5.1 32-bit with the latest updates installed.
The target software is IETD v0.4.16.

Comments are very welcome :) I'd like to understand why this happens and
what the limiting factor is.

-- Pasi

Konrad Rzeszutek

Apr 7, 2008, 5:14:19 PM
to open-...@googlegroups.com
On Mon, Apr 07, 2008 at 10:24:56PM +0300, Pasi Kärkkäinen wrote:
>
> Hello list!
>
> I have lvm volume (from a single sata disk) exported with IET like this:
>
> Target iqn.2001-04.com.test:vol1
> Lun 0 Path=/dev/vg1/test1,Type=fileio
> InitialR2T No
> ImmediateData Yes
>
> I have not modified read-ahead or other disk settings on the target server.
> target is using deadline elevator/scheduler.
>
> open-iscsi initiator (CentOS 5.1 2.6.18-53.1.14.el5PAE and the default
> open-iscsi that comes with the distro) sees that volume as /dev/sda.
>
> I'm testing performance with different read-ahead settings on the initiator.
>
> Can somebody explain why I see these throughput changes?

What is the MTU set on your NICs? If it is 1500 (the default), a 4 KB transfer
will have to be divided into four TCP packets and reassembled on the target side.
This increases the amount of work the TCP stack has to perform.

Try setting the frame size to jumbo frames - but be careful, as your switch needs
to support jumbo frames. Or you can use a crossover cable for this test and
eliminate the switch from the picture.
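
(A minimal sketch of that, assuming eth0 is the iSCSI interface on both boxes
and that the NICs and switch actually support 9000-byte frames:)

# ifconfig eth0 mtu 9000
or with iproute2:
# ip link set eth0 mtu 9000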

Pasi Kärkkäinen

Apr 8, 2008, 4:51:50 AM
to open-...@googlegroups.com
On Mon, Apr 07, 2008 at 05:14:19PM -0400, Konrad Rzeszutek wrote:
>
> On Mon, Apr 07, 2008 at 10:24:56PM +0300, Pasi Kärkkäinen wrote:
> >
> > Hello list!
> >
> > I have lvm volume (from a single sata disk) exported with IET like this:
> >
> > Target iqn.2001-04.com.test:vol1
> > Lun 0 Path=/dev/vg1/test1,Type=fileio
> > InitialR2T No
> > ImmediateData Yes
> >
> > I have not modified read-ahead or other disk settings on the target server.
> > target is using deadline elevator/scheduler.
> >
> > open-iscsi initiator (CentOS 5.1 2.6.18-53.1.14.el5PAE and the default
> > open-iscsi that comes with the distro) sees that volume as /dev/sda.
> >
> > I'm testing performance with different read-ahead settings on the initiator.
> >
> > Can somebody explain why I see these throughput changes?
>
> What is the MTU set on your NICs? If it is 1500 (the default) a 4KB transfer
> will have to be divided in four TCP packets and re-assembled on the target side.
> This increases the amount of work that the TCP stack has to perform.
>

Hmm.. I'm using read-ahead values from 128 kB (256) to 2 MB (4096), so all of the
read-ahead values will cause many TCP packets anyway, won't they?
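
(Rough numbers, assuming a ~1448-byte TCP payload per frame at MTU 1500:
128 kB is already ~90 segments and 2 MB is ~1450 segments, so every
read-ahead size tested here spans many frames.)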

> Try setting the frame rate to a jumbo frame - but be careful as your switch needs
> to support jumbo frames. Or you can use a cross-over cable for this test and
> eliminate the switch from the picture.
>

I could try jumbo frames and see if they have any effect..

I guess I should also try dd with different read-ahead settings locally on
the target to see if I can see the same behaviour there..

Thanks for the reply!

-- Pasi

Konrad Rzeszutek

Apr 8, 2008, 9:45:06 AM
to open-...@googlegroups.com
> > What is the MTU set on your NICs? If it is 1500 (the default) a 4KB transfer
> > will have to be divided in four TCP packets and re-assembled on the target side.
> > This increases the amount of work that the TCP stack has to perform.
> >
>
> Hmm.. I'm using read-ahead values from 128 kB (256) to 2 MB (4096).. so all of the
> read-ahead values will cause many TCP packets anyway.. ?

Duh! I thought those values were in bytes, not in MBs.

Pasi Kärkkäinen

Apr 8, 2008, 2:26:19 PM
to open-...@googlegroups.com

Actually, the read-ahead value is in 512-byte sectors..
so a read-ahead of 4096 means 512 bytes * 4096 == 2048 kB == 2 MB.
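
(The same number can be cross-checked in kilobytes through sysfs, assuming the
queue attributes are exposed on this kernel:)

# blockdev --setra 4096 /dev/sda
# cat /sys/block/sda/queue/read_ahead_kb
2048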

-- Pasi

Pasi Kärkkäinen

Apr 8, 2008, 3:09:19 PM
to open-...@googlegroups.com
On Mon, Apr 07, 2008 at 10:24:56PM +0300, Pasi Kärkkäinen wrote:
>
> Hello list!
>
> I have lvm volume (from a single sata disk) exported with IET like this:
>
> Target iqn.2001-04.com.test:vol1
> Lun 0 Path=/dev/vg1/test1,Type=fileio
> InitialR2T No
> ImmediateData Yes
>
> I have not modified read-ahead or other disk settings on the target server.
> target is using deadline elevator/scheduler.
>
> open-iscsi initiator (CentOS 5.1 2.6.18-53.1.14.el5PAE and the default
> open-iscsi that comes with the distro) sees that volume as /dev/sda.
>
> I'm testing performance with different read-ahead settings on the initiator.
>
> Can somebody explain why I see these throughput changes?
>

Do other people see the same behaviour? If not, then I guess the problem (or
a feature?) is in my target..

Or is there anything in open-iscsi that could explain this?

-- Pasi

Pasi Kärkkäinen

Apr 11, 2008, 2:52:45 PM
to open-...@googlegroups.com

I did some more tests..

Testing the same volume locally on the target doesn't seem to show the same
problem..

# blockdev --getra /dev/vg1/test1
256

# dd if=/dev/vg1/test1 of=/dev/null bs=1024k count=8192
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB) copied, 139.841 seconds, 61.4 MB/s

# blockdev --setra 4096 /dev/vg1/test1
# blockdev --getra /dev/vg1/test1
4096

# dd if=/dev/vg1/test1 of=/dev/null bs=1024k count=8192
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB) copied, 139.819 seconds, 61.4 MB/s

# blockdev --setra 1024 /dev/vg1/test1
# blockdev --getra /dev/vg1/test1
1024

# dd if=/dev/vg1/test1 of=/dev/null bs=1024k count=8192
8192+0 records in
8192+0 records out
8589934592 bytes (8.6 GB) copied, 139.89 seconds, 61.4 MB/s


The speed seems to be the same with all read-ahead values; no big changes like
on the initiator.. (I tested with both 4k and 1024k blocksizes on the target.)

Then some more testing on the initiator..

read-ahead of 4096 on the initiator:

iostat on initiator:
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 4.48 45.27 0.00 49.75

iostat on target:
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 9.45 88.06 0.00 2.49

# dd if=/dev/sda of=/dev/null bs=4k count=524288
524288+0 records in
524288+0 records out
2147483648 bytes (2.1 GB) copied, 65.0137 seconds, 33.0 MB/s

read-ahead of 1024 on the initiator:

iostat on initiator:
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 16.50 32.50 0.00 50.50

iostat on target:
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 19.00 42.00 0.00 39.00

# dd if=/dev/sda of=/dev/null bs=4k count=524288
524288+0 records in
524288+0 records out
2147483648 bytes (2.1 GB) copied, 35.2285 seconds, 61.0 MB/s

read-ahead of 256 on the initiator:

iostat on initiator:
avg-cpu: %user %nice %system %iowait %steal %idle
1.00 0.00 14.43 34.83 0.00 49.75

iostat on target:
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 17.00 12.00 0.00 70.50

# dd if=/dev/sda of=/dev/null bs=4k count=524288
524288+0 records in
524288+0 records out
2147483648 bytes (2.1 GB) copied, 35.0674 seconds, 61.2 MB/s


So.. it seems that with a read-ahead of 4096 (2 MB) the CPU on the
target maxes out.. with most of the time spent in iowait.

The question is: why does this happen?

Are there some settings I should tune on the target?
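
(The obvious knobs I can think of on the target would be the backing device's
read-ahead and queue limits; <sata-disk> below is a placeholder for the
underlying disk:)

# blockdev --getra /dev/vg1/test1
# blockdev --getra /dev/<sata-disk>
# cat /sys/block/<sata-disk>/queue/nr_requests
# cat /sys/block/<sata-disk>/queue/max_sectors_kb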

Thanks!

-- Pasi

Pasi Kärkkäinen

Apr 11, 2008, 3:24:40 PM
to open-...@googlegroups.com
On Fri, Apr 11, 2008 at 09:52:45PM +0300, Pasi Kärkkäinen wrote:
>
> So.. it seems with the read-ahead of 4096 (2 MB) the CPU on the
> target maxes out.. with most of the time spent on iowait.
>
> The question is why this happens?
>
> Some settings I should tune on the target?
>

And now I tried with a nullio target..
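
(That is, a LUN exported without any backing store, with something along these
lines in ietd.conf; the target name and size here are made up:)

Target iqn.2001-04.com.test:null1
Lun 0 Sectors=4194304,Type=nullio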

read-ahead of 4096 on the initiator:

iostat on initiator:
avg-cpu: %user %nice %system %iowait %steal %idle
1.52 0.00 17.68 30.81 0.00 50.00

iostat on target:
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 27.00 0.00 0.00 73.00

# dd if=/dev/sdb of=/dev/null bs=4k count=524288
524288+0 records in
524288+0 records out
2147483648 bytes (2.1 GB) copied, 18.5658 seconds, 116 MB/s


read-ahead of 1024 on the initiator:

iostat on initiator:
avg-cpu: %user %nice %system %iowait %steal %idle
1.01 0.00 20.71 29.80 0.00 48.48

iostat on target:
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 14.00 0.00 0.00 86.00

# dd if=/dev/sdb of=/dev/null bs=4k count=524288
524288+0 records in
524288+0 records out
2147483648 bytes (2.1 GB) copied, 18.5787 seconds, 116 MB/s


read-ahead of 256 on the initiator:

iostat on initiator:
avg-cpu: %user %nice %system %iowait %steal %idle
0.50 0.00 22.61 26.63 0.00 50.25

iostat on target:
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 14.50 0.00 0.00 85.50


# dd if=/dev/sdb of=/dev/null bs=4k count=524288
524288+0 records in
524288+0 records out
2147483648 bytes (2.1 GB) copied, 23.1832 seconds, 92.6 MB/s


So.. the effect of a bigger read-ahead on the initiator causing more CPU usage
on the target can be seen here too..


-- Pasi

Pasi Kärkkäinen

Apr 11, 2008, 4:36:27 PM
to open-...@googlegroups.com
On Fri, Apr 11, 2008 at 10:24:40PM +0300, Pasi Kärkkäinen wrote:
>
> On Fri, Apr 11, 2008 at 09:52:45PM +0300, Pasi Kärkkäinen wrote:
> >
> > So.. it seems with the read-ahead of 4096 (2 MB) the CPU on the
> > target maxes out.. with most of the time spent on iowait.
> >
> > The question is why this happens?
> >
> > Some settings I should tune on the target?
> >
>
> And now I tried with nullio target..
>

<snip>

>
>
> So.. the effect of bigger read-ahead on the initiator causing
> more CPU usage on the target can be seen here too..
>

Actually the problem seems to be "fileio" on ietd.. changing it
to "blockio" fixed the problem with CPU usage and iowait.

Using "blockio" I can use big read-ahead values without getting
problems on the target..

The problem with "blockio" is that I don't get read caching like
I get with "fileio".. oh well.

Any idea what's causing the iowait when using "fileio"?

-- Pasi
