iscsi timeouts and connection errors


Taylor

May 21, 2010, 4:43:52 PM
to open-iscsi
We have a SLES 11 server connected to an Equallogic 10 Gig disk array.

Initially everything seemed to work just fine. But when doing IO
testing against the mounted volumes, driving very high IO loads (95 to
100% utilization as reported by iostat), we started seeing the
following messages in /var/log/messages:

May 21 14:26:20 hostA kernel: connection27:0: ping timeout of 5 secs
expired, last rx 4426024362, last ping 4426025612, now 4426026862
May 21 14:26:20 hostA kernel: connection27:0: detected conn error
(1011)
May 21 14:26:21 hostA iscsid: Kernel reported iSCSI connection 27:0
error (1011) state (3)
May 21 14:27:03 hostA kernel: connection30:0: detected conn error
(1011)
May 21 14:27:26 hostA iscsid: Target requests logout within 3 seconds
for connection
May 21 14:27:26 hostA iscsid: Target dropping connection 0, reconnect
min 2 max 0
May 21 14:27:26 hostA iscsid: Kernel reported iSCSI connection 30:0
error (1011) state (4)
May 21 14:27:38 hostA kernel: connection25:0: detected conn error
(1011)
May 21 14:27:39 hostA iscsid: Kernel reported iSCSI connection 21:0
error (1011) state (3)
May 21 14:27:39 hostA iscsid: Kernel reported iSCSI connection 25:0
error (1011) state (3)
May 21 14:28:16 hostA iscsid: connection27:0 is operational after
recovery (3 attempts)
May 21 14:28:20 hostA iscsid: connection21:0 is operational after
recovery (2 attempts)


So far we've turned on flow control on the network switches, tried
adjusting multipath.conf, turned off offload parameters on the iscsi
NICs, and adjusted iscsid.conf timeouts.
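For reference, toggling NIC offloads is typically done with ethtool; a
sketch of what we did (the interface name eth2 is a placeholder, and the
commands are only printed until DRYRUN is cleared):

```shell
# Sketch: disable TSO/GSO/GRO offloads on an iSCSI NIC.
# IFACE is a placeholder for the real 10G interface name.
IFACE=${IFACE:-eth2}
DRYRUN=${DRYRUN:-echo}          # set DRYRUN= (empty) to really apply
for feat in tso gso gro; do
    $DRYRUN ethtool -K "$IFACE" "$feat" off
done
```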

We are running a 2.6.27 kernel with open-iscsi 2.0.870-26.5.

Does anyone have any suggestions on how we can avoid these timeouts
and connection errors?

--
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To post to this group, send email to open-...@googlegroups.com.
To unsubscribe from this group, send email to open-iscsi+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.

Mike Christie

May 21, 2010, 7:23:04 PM
to open-...@googlegroups.com, Taylor
Is there more to the log? I want to see if the target requests a logout
first or if we get a ping timeout first.

>
> So far we've turned on flow control on the network switches, tried
> adjusting multipath.conf, turned off offload parameters on the iscsi
> NICs, and adjusted iscsid.conf timeouts.

For the ping timeouts you can set node.conn[0].timeo.noop_out_interval
and node.conn[0].timeo.noop_out_timeout to 0. If you are using
dm-multipath, though, you probably want them on, just a little longer.
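For example, something like this would apply both settings to every node
record (a sketch; it only prints the commands until DRYRUN is cleared,
and VAL would be larger than 0 in the dm-multipath case):

```shell
# Sketch: set both NOP-Out timers on every node record.
VAL=${VAL:-0}                   # 0 disables the ping timers entirely
DRYRUN=${DRYRUN:-echo}          # clear DRYRUN to really run iscsiadm
for param in 'node.conn[0].timeo.noop_out_interval' \
             'node.conn[0].timeo.noop_out_timeout'; do
    $DRYRUN iscsiadm -m node -o update -n "$param" -v "$VAL"
done
```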

Also, on the target side you can turn off its load balancing, which
would remove the disruptions tied to the target's logout requests. If
you are using dm-multipath you probably do not need the target load
balancing anyway (I am not 100% sure what Equallogic recommends, but it
seems like each side's algorithms could end up working against each
other).


>
> We are running a 2.6.27 kernel with open-iscsi 2.0.870-26.5.
>

Is that a SLES 2.6.27 kernel or kernel.org? If a SLES kernel, make
sure you have the newest one, because SUSE has added fixes for cases
where we thought a nop/ping timed out but it was really stuck behind a
large transfer that was executing fine. If you are using a 2.6.27
kernel.org kernel I would upgrade to the newest upstream one.

Taylor

May 21, 2010, 10:56:21 PM
to open-iscsi
Mike, thanks for the reply. The first line in the log I posted shows
the ping timeout. Or were you wanting to see something else? Let me
know what specifically you are looking for.

Yes, we are using multipath; I can post multipath.conf if that helps.
I tried tweaking various settings in iscsid.conf, including increasing
node.session.cmds_max and node.session.queue_depth, and in
multipath.conf, including setting features "1 queue_if_no_path" and
disabling no_path_retry.

When you say turn off target load balancing, do you mean on the
equallogic side, i.e. a per-volume setting? Understanding you aren't
familiar with the equallogic, would that be how you would do it on a
generic device, or say a Clariion?

It is a stock SLES kernel. We are looking at whether openSUSE has a
fix in it; they run virtually the same open-iscsi versions, with
openSUSE's being slightly newer.

I'll play with the timeout settings. I think they were both set to 5
by default; I tried increasing them, but it didn't seem to have any
impact.

I think I'm close, but would appreciate further help.

Taylor

May 24, 2010, 12:06:34 PM
to open-iscsi
Adjusting node.conn[0].timeo.noop_out_interval and
node.conn[0].timeo.noop_out_timeout to 0
didn't seem to make a difference. But increasing the value of
/sys/block/sd#/device/timeout from 30 to a higher number, like 360,
seems to have improved things...

Is that something you would expect? Do you know of a sensible value
for setting the timeout under /sys/block/sd#/device, and what is
impact of increasing this?

Mike Christie

May 24, 2010, 5:23:09 PM
to open-...@googlegroups.com, Taylor
On 05/24/2010 11:06 AM, Taylor wrote:
> Adjusting node.conn[0].timeo.noop_out_interval and
> node.conn[0].timeo.noop_out_timeout to 0
> didn't seem to make a difference. But increasing the value of /sys/

If it did not make a difference then it might not have got set right. If
you set them to 0, then you should not see the ping timeout messages
anymore (you might still see the target request logout ones, and that is
what we are trying to see with this test). Did you just set it in
iscsid.conf or did you set it in the node db by running

iscsiadm -m node -T ... -o update -n node.conn[0].timeo.noop_out_interval -v 0

Then repeat for the other value. (You can also set it in the node db by
setting it in iscsid.conf and rerunning the iscsiadm discovery command.)


> block/sd#/device/timeout from 30 to a higher number, like 360, seems
> to have improved things...
>
> Is that something you would expect? Do you know of a sensible value


No. It might improve throughput of some IO test, but it would not
remove those log messages you sent before.


> for setting the timeout under /sys/block/sd#/device, and what is
> impact of increasing this?
>

It depends on the storage you are using.


8.1.2.1 Running Commands, the SCSI Error Handler, and replacement_timeout
-------------------------------------------------------------------------
Remember from the Nop-out discussion that if a network problem is detected,
the running commands are failed immediately. There is one exception to this,
and that is when the SCSI layer's error handler is running. To check whether
the SCSI error handler is running, run iscsiadm as:

iscsiadm -m session -P 3

You will then see:

Host Number: X State: Recovery

When the SCSI EH is running, commands will not be failed until
node.session.timeo.replacement_timeout seconds have elapsed.

To modify the timer that starts the SCSI EH, you can either write
directly to the device's sysfs file:

echo X > /sys/block/sdX/device/timeout

where X is in seconds. Alternatively, on most distros you can modify the
udev rule.

To modify the udev rule open /etc/udev/rules.d/50-udev.rules, and find the
following lines:

ACTION=="add", SUBSYSTEM=="scsi" , SYSFS{type}=="0|7|14", \
RUN+="/bin/sh -c 'echo 60 > /sys$$DEVPATH/timeout'"

And change the echo 60 part of the line to the value that you want.

The default timeout for normal file system commands is 30 seconds when udev
is not being used. If udev is used, the default is the value above, which
is normally 60 seconds.
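Applying a larger timeout to every sd device at once could be scripted
roughly like this (a sketch; 120 is an illustrative value, and the
commands are only printed until DRYRUN is cleared):

```shell
# Sketch: raise the SCSI command timeout on every sd* block device.
TIMEOUT=${TIMEOUT:-120}         # illustrative value, in seconds
DRYRUN=${DRYRUN:-echo}          # clear DRYRUN to really write the files
for f in /sys/block/sd*/device/timeout; do
    if [ -e "$f" ]; then
        $DRYRUN sh -c "echo $TIMEOUT > $f"
    fi
done
```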

Taylor

May 24, 2010, 10:40:20 PM
to open-iscsi
Mike and all, okay, noted about the ping timeout messages. I set both
of those values by running:

iscsiadm --mode node -o update -n node.conn[0].timeo.noop_out_interval -v 10
iscsiadm --mode node -o update -n node.conn[0].timeo.noop_out_timeout -v 10

I verified the settings took via iscsiadm --mode node -o show. I'm
not seeing ping timeouts in the logs anymore.

I was wrong about the block/sd# timeout changes; they didn't make a
difference. I still see the
"iscsid: Kernel reported iSCSI connection error #:# (1011) state
(3)" messages.

We are running tests on 3 to 40 filesystems where we loop and do

dd if=/dev/zero of=$FILENAME bs=$((1024 * 1024)) count=8000
dd if=/dev/zero of=$FILENAME bs=$((1024 * 1024)) count=12000

to drive disk utilization into the high 90% range and occasionally
100%. We are getting great throughput, but the errors keep occurring.
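Roughly, one pass of the test looks like this (a sketch; the mount
points and file name are placeholders for our real test filesystems, and
dd is only printed, not run, until DRYRUN is cleared):

```shell
# Sketch of one pass of the load test: large sequential writes to several
# mounted volumes (the real test loops this indefinitely).
MOUNTS=${MOUNTS:-/mnt/vol1 /mnt/vol2 /mnt/vol3 /mnt/vol4}
DRYRUN=${DRYRUN:-echo}          # clear DRYRUN to really run dd
for m in $MOUNTS; do
    for count in 8000 12000; do
        $DRYRUN dd if=/dev/zero of="$m/testfile" bs=$((1024 * 1024)) count="$count"
    done
done
```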

We called Equallogic, and they said the diagnostics on the array show
it isn't receiving a FIN from the initiator side, which they said
could be due to problems with the initiator.

We are running with jumbo frames; I don't know if that means we should
increase MaxRecvDataSegmentLength and/or MaxXmitDataSegmentLength.

So I tried upgrading open-iscsi by downloading 2.0-871 and building
it. The build succeeds for the three main binaries (iscsid, iscsiadm,
iscsistart) but fails on make -C kernel with several errors in
libiscsi.c.

Stuck here at the moment.

Mike Christie

May 24, 2010, 11:17:47 PM
to open-...@googlegroups.com, Taylor
On 05/24/2010 09:40 PM, Taylor wrote:
> Mike and all, okay, noted about the ping timeout messages. I set both
> of
> those values by running:
>
> iscsiadm --mode node -o update -n node.conn[0].timeo.noop_out_interval -v 10
> iscsiadm --mode node -o update -n node.conn[0].timeo.noop_out_timeout -v 10
>
> I verified the settings took via iscsiadm --mode node -o show. I'm
> not seeing ping timeouts in the logs anymore.
>
> I was wrong about the block/sd# timeout changes, it didn't make a
> difference, I still see the:
> "iscsid: Kernel reported iSCSI connection error #:# (1011) state
> (3)" messages.

Do you see those messages together with the "Target requests logout
within 3 seconds for connection" errors, or just the 1011 message?
After that message do you see something about a host or target reset?


>
> We are running tests on 3 to 40 filesystems where we loop and do
> dd if=/dev/zero of=$FILENAME bs=$((1024 *1024)) count=8000
> dd if=/dev/zero of=$FILENAME bs=$((1024 *1024)) count=12000
>
> to drive disk utilization to high 90% and occasionally 100%. We are

Yeah, this is sort of expected: when doing heavy writes you get high
cpu use, because there is an xmit thread that basically does a send()
of the data.

You can try to minimize how many times that xmit thread needs to send
PDUs. It still has to send the same amount of data, but if we can get
it all sent in one PDU then we do not have to wake the xmit thread
extra times and do not have to send extra headers. It will not help
much, though, so do not expect a lot, and do not go crazy trying to
get the target and initiator to take the settings (some targets prefer
specific values that may conflict). So try using a high
node.session.iscsi.FirstBurstLength, node.session.iscsi.ImmediateData =
Yes, node.session.iscsi.InitialR2T = No, and then on the target a high
MaxRecvDataSegmentLength and settings similar to the initiator ones I
mentioned. But like I said, it is not going to help much.
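For example (a sketch; the FirstBurstLength value is illustrative, and
the commands are only printed until DRYRUN is cleared):

```shell
# Sketch: favor one large unsolicited burst per write.
# 262144 is an illustrative FirstBurstLength; the target must also accept it.
DRYRUN=${DRYRUN:-echo}          # clear DRYRUN to really run iscsiadm
$DRYRUN iscsiadm -m node -o update -n node.session.iscsi.ImmediateData -v Yes
$DRYRUN iscsiadm -m node -o update -n node.session.iscsi.InitialR2T -v No
$DRYRUN iscsiadm -m node -o update -n node.session.iscsi.FirstBurstLength -v 262144
```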




> getting great throughput, but the errors keep occuring.
>
> Called equallogic, and they said the diagnostics on the array show
> they aren't receiving a FIN from the initiator side, said could be due
> to problems with initiator.


I am not sure how that would cause problems on their side; I would
need more details. However, if they are saying it could be a symptom
of some other problem, then I can see their point. There are no
changes on the initiator side in that iscsi code path, so there is
probably no reason to try other versions. There could be changes in
the network layer, though; for that you would of course need to try
different kernels.


>
> We are running with jumbo frames, don't know if that means we should
> increase MaxRecvDataSegmentLength and/or MaxXmitDataSegmentLength
>
> So I tried upgrading open-iscsi by downloading 2.0-871 and building,
> and succeeds for the three main files, iscsid, iscsiadm, iscsistart,
> but failes on make -C kernel, gives several errors for libiscsi.c
>

I cannot see your screen so cut and paste the error messages :)

Mike Christie

May 24, 2010, 11:19:22 PM
to open-...@googlegroups.com
Hey,

We actually have a good relationship with EQL/Dell. Could you give me
your support number for the issue and who you worked with? I will
contact some of the support people we work with and get more info directly.


On 05/24/2010 10:17 PM, Mike Christie wrote:
>> Called equallogic, and they said the diagnostics on the array show
>> they aren't receiving a FIN from the initiator side, said could be due
>> to problems with initiator.
>
>
> I am not sure how that would cause problems on their side. I would need
> more details.

Taylor

May 25, 2010, 12:22:30 AM
to open-iscsi
Yes, when I get to the office tomorrow I can attach the screen output
of the make failure, as well as the support info from Equallogic.

I guess to a different point: I understand that if we are just killing
the disk that can cause problems, but how can I set things up so that
IO is slow or queues, and doesn't cause disconnects?

Coming from a Fibre Channel background: you bang away at the disk, and
if you exceed the disk or array IO capacity your application seems to
hang or be sluggish for a bit, but you don't get disconnected.

Is there a way to get similar behavior under iSCSI? I don't know if
this is expected behavior, or a bug, or just misconfiguration, or
what...

Ulrich Windl

May 25, 2010, 3:48:10 AM
to open-...@googlegroups.com
On 24 May 2010 at 9:06, Taylor wrote:

> Adjusting node.conn[0].timeo.noop_out_interval and
> node.conn[0].timeo.noop_out_timeout to 0
> didn't seem to make a difference. But increasing the value of /sys/
> block/sd#/device/timeout from 30 to a higher number, like 360, seems
> to have improved things...
>
> Is that something you would expect? Do you know of a sensible value
> for setting the timeout under /sys/block/sd#/device, and what is
> impact of increasing this?

I think most people are not willing to wait 30s for disk I/O to
complete, let alone 6 minutes. If a local disk ever needs 30 seconds
for a command to complete, it's a clear indication that it needs to be
repaired or replaced. I think the same should be true for iSCSI. By
increasing the timeout you can prevent an I/O error from appearing,
but you won't be much happier, I'm afraid.

Regards,
Ulrich

Taylor

May 25, 2010, 9:12:21 AM
to open-iscsi
Here is the output of the make, showing that the compile completed for
the iscsi binaries but failed for the kernel code...

Compilation complete                Output file
----------------------------------- ----------------
Built iSCSI daemon: usr/iscsid
Built management application: usr/iscsiadm
Built boot tool: usr/iscsistart

Read README file for detailed information.
make -C kernel
make[1]: Entering directory `/opt/download/open-iscsi-2.0-871/kernel'
make -C /usr/src/linux M=`pwd` KBUILD_OUTPUT=/usr/src/linux-obj/x86_64/
default V=0 modules
make[2]: Entering directory `/usr/src/linux-2.6.27.19-5'
CC [M] /opt/download/open-iscsi-2.0-871/kernel/libiscsi.o
/opt/download/open-iscsi-2.0-871/kernel/libiscsi.c: In function
iscsi_target_alloc:
/opt/download/open-iscsi-2.0-871/kernel/libiscsi.c:1554: warning:
unused variable session
/opt/download/open-iscsi-2.0-871/kernel/libiscsi.c: At top level:
/opt/download/open-iscsi-2.0-871/kernel/libiscsi.c:1753: error: return
type is an incomplete type
/opt/download/open-iscsi-2.0-871/kernel/libiscsi.c: In function
iscsi_eh_cmd_timed_out:
/opt/download/open-iscsi-2.0-871/kernel/libiscsi.c:1754: error:
variable rc has initializer but incomplete type
/opt/download/open-iscsi-2.0-871/kernel/libiscsi.c:1754: error:
EH_NOT_HANDLED undeclared (first use in this function)
/opt/download/open-iscsi-2.0-871/kernel/libiscsi.c:1754: error: (Each
undeclared identifier is reported only once
/opt/download/open-iscsi-2.0-871/kernel/libiscsi.c:1754: error: for
each function it appears in.)
/opt/download/open-iscsi-2.0-871/kernel/libiscsi.c:1754: error:
storage size of rc isn't known
/opt/download/open-iscsi-2.0-871/kernel/libiscsi.c:1771: error:
EH_RESET_TIMER undeclared (first use in this function)
/opt/download/open-iscsi-2.0-871/kernel/libiscsi.c:1838: warning:
'return' with a value, in function returning void
/opt/download/open-iscsi-2.0-871/kernel/libiscsi.c:1754: warning:
unused variable rc
/opt/download/open-iscsi-2.0-871/kernel/libiscsi.c: In function
iscsi_host_add:
/opt/download/open-iscsi-2.0-871/kernel/libiscsi.c:2185: warning:
assignment from incompatible pointer type
make[4]: *** [/opt/download/open-iscsi-2.0-871/kernel/libiscsi.o]
Error 1
make[3]: *** [_module_/opt/download/open-iscsi-2.0-871/kernel] Error 2
make[2]: *** [sub-make] Error 2
make[2]: Leaving directory `/usr/src/linux-2.6.27.19-5'
make[1]: *** [all] Error 2
make[1]: Leaving directory `/opt/download/open-iscsi-2.0-871/kernel'

Taylor

May 25, 2010, 9:40:13 AM
to open-iscsi
Yeah, and netstat -in does show a few RX-ERR on the two 10 Gig NICs we
are connecting to the iSCSI SAN with.
So I don't know if that's because of the extremely high IO load;
iostat is reporting 600,000 KB/sec in writes across 4 LUNs during
our heavy testing.
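For reference, the per-NIC error counters can also be read straight from
sysfs; a sketch (eth2/eth3 are placeholders for our two 10G interface
names):

```shell
# Sketch: dump the receive-error counters for the iSCSI NICs.
# eth2/eth3 are placeholders for the actual interface names.
for dev in eth2 eth3; do
    f=/sys/class/net/$dev/statistics/rx_errors
    if [ -r "$f" ]; then
        echo "$dev rx_errors: $(cat "$f")"
    fi
done
```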

Mike Christie

May 25, 2010, 3:47:11 PM
to open-...@googlegroups.com, Taylor
On 05/24/2010 11:22 PM, Taylor wrote:
> Yes, when I get to office tomorrow I can attach screen output of make
> fail, as well as support info with equallogic.
>
> I guess to a different point, I understand if we are just killing the
> disk that can cause problems, but how can I set it up so that IO is
> slow or queues, and doesn't cause disconnects?
>
> Coming from a fiber channel background, you bang away at the disk and
> if you exceed disk or array IO, then your application seems to hang or
> be sluggish for a bit, but you don't get disconnected.
>
> Is there a way to get similar behavior under iSCSI? I don't know if
> this is expected behavior, or a bug, or just misconfiguration, or
> what...
>

On the initiator side the ping timeout feature will sometimes
mis-diagnose the connection as bad when it is really just slow. SUSE
did some fixes, but I am not sure in which SLES version. If you're not
using multipath then you can just turn the pings off by setting those
noop values to 0.

On the target side, Equallogic has target-side load balancing: if it
decides it is best to use another path, it sends that target-requests-logout
request to us; we then log out, kill the TCP/IP connection, and relogin
to the new port. You can disable this on the target. I am not sure of
the command; you will have to ask Equallogic.

Taylor Lewick

May 25, 2010, 4:01:37 PM
to Mike Christie, open-...@googlegroups.com
Ok, will find out about the equallogic setting. So if we are using
dm-multipath, we should not set those noop values to 0, but possibly
increase them for high IO loads?

Mike Christie

May 25, 2010, 4:24:58 PM
to open-...@googlegroups.com, Taylor Lewick
On 05/25/2010 03:01 PM, Taylor Lewick wrote:
> Ok, will find out about the equallogic setting. So if we are using
> dm-multipath, we should not set those noop values to 0, but possibly
> increase them for high IO loads?
>

Yes, but set them lower than your scsi command timeout
(/sys/block/sdX/device/timeout).
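A rough way to sanity-check that relationship (a sketch; the noop values
are placeholders, and treating the sum of the two timers as the ping
deadline is my reading of the rule, not something from the docs):

```shell
# Sketch: check that the iSCSI ping timers trip before the SCSI command timeout.
NOOP_INTERVAL=10        # node.conn[0].timeo.noop_out_interval (placeholder)
NOOP_TIMEOUT=10         # node.conn[0].timeo.noop_out_timeout (placeholder)
# Fall back to the kernel default of 30s if the sysfs file is not readable.
SCSI_TIMEOUT=$(cat /sys/block/sda/device/timeout 2>/dev/null || echo 30)
if [ $((NOOP_INTERVAL + NOOP_TIMEOUT)) -lt "$SCSI_TIMEOUT" ]; then
    echo "ok: ping timers fire before the SCSI command timeout"
else
    echo "warning: raise the SCSI timeout or lower the noop timers"
fi
```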

Taylor

May 25, 2010, 5:15:05 PM
to open-iscsi
So far, blowing away SLES 11 and installing openSUSE has resolved the
problem; however, throughput numbers have dropped by about half. We
were getting nearly 5 Gbit/sec when testing across 5 LUNs; now we are
getting more like 2.6 Gbit/sec, but we've had no disconnect errors.

Now the question is: are the disconnects gone simply because we are no
longer reaching the really high IO rates we were seeing?

I'm going to let disk tests run for a few hours and see if we get any
errors...

Mike Christie

May 25, 2010, 5:38:00 PM
to open-...@googlegroups.com, Taylor
On 05/25/2010 04:15 PM, Taylor wrote:
> So far blowing away SUSE 11 and installing OpenSuse has resolved the

Just so you know, the SUSE developers are good at keeping SLES up to
date and even carry fixes that are not yet upstream, so it is
sometimes best to just get their newest SLES kernels.

> problem, however, throughput numbers have dropped by about half. We
> were getting nearly 5 Gbit /sec when testing across 5 LUNs, now we are
> getting more like 2.6 Gbit/sec, but we've had no disconnect errors.
>
> Now the question is, is that because we aren't getting disconnected
> because we are no longer getting the really high IO rates we were
> seeing?

Normally we would expect that if the disconnect errors go away, you
get higher throughput numbers. However, when the Equallogic load
balancing comes into play, and the disconnect errors come with that
target request logout message, it could be that the EQL target knows a
better path and you actually wanted it to load balance.

Is there a difference between SLES and openSUSE in cpu speed handling
(not sure what the feature is called in SUSE; for iscsi I normally
turn it off to get the best results), in iptables (turning it off can
improve performance), or in the io scheduler (switching between noop
and cfq sometimes gives different results)?
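Switching the elevator is a one-line sysfs write; a sketch (sdb is a
placeholder device, and the command is only printed until DRYRUN is
cleared):

```shell
# Sketch: select the noop elevator for one iSCSI disk.
DEV=${DEV:-sdb}                 # placeholder for the real sd device
DRYRUN=${DRYRUN:-echo}          # clear DRYRUN to really change the scheduler
$DRYRUN sh -c "echo noop > /sys/block/$DEV/queue/scheduler"
```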

Taylor Lewick

May 25, 2010, 6:43:41 PM
to Mike Christie, open-...@googlegroups.com
Thanks for the suggestions. I did change the io scheduler to make sure
it was the same as on the other server we were testing on. We are not
using iptables, and cpu frequency scaling (or whatever it's called) is
not enabled.

To us this looks like an issue of worse network performance in later
kernels. The kernel has become very bloated over time, and we see much
worse per-transaction network latencies in more recent kernel versions
than with, say, the kernel used in SLES 10.2, which is 2.6.16. SLES 11
got slower, and openSUSE seems even slower than SLES 11.

I will do some iperf testing to compare a SLES 11 and an OpenSUSE 11
server to see how the network layer performs.

So far stability is great, and performance is good, just going to see
if we can get performance to be great as well.

Taylor

May 26, 2010, 11:00:15 AM
to open-iscsi
FYI, to disable connection load balancing on Equallogic arrays, you
run the following.

Log in to the array as grpadmin; at the CLI> prompt, type this
command:

CLI> grpparams conn-balancing disable

This command can be run without a reboot or restart of the arrays and
takes effect immediately on all members in the group. Should you
decide to re-enable connection load balancing in the future, type this
command at the group CLI prompt:

CLI> grpparams conn-balancing enable


Taylor

May 26, 2010, 11:08:58 AM
to open-iscsi
Oops, I should have added: in our testing, disabling the
conn-balancing made no difference with respect to throughput/IO
numbers...