connection1:0: ping timeout of 5 secs expired, recv timeout 5 / connection1:0: detected conn error (1011)

p...@fhri.org

Dec 14, 2010, 4:12:06 PM

to open-...@googlegroups.com

Hi all...

I have four CentOS 5.4 (2.6.18-164.11.1.el5) servers with iscsid version 2.0-871. Two are misbehaving despite identical configuration. They all connect to an Enhance Tech RS8-IP4 array the same way, directly NIC-to-NIC without a switch, physically separate from the LAN. I created four targets, one per port, and four separate volumes/LUNs.

Pasted below is the config and error log. About a minute after a successful login, the timeouts/errors begin and keep coming constantly pretty much every minute whenever the session is logged in, regardless of mount state. The problematic units are also often very slow logging in, mounting, even directory listing at times. Also, they sometimes time out and remount the fs read-only in the middle of a large backup run.

The other two servers exhibit no such problems whatsoever.

I'm very new to iSCSI, not sure where to start looking. Would be grateful if someone could point me in the right direction...

**CONFIG**

node.startup = automatic
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 20
node.session.initial_login_retry_max = 8
node.session.cmds_max = 128
node.session.queue_depth = 32
node.session.iscsi.InitialR2T = No
node.session.iscsi.ImmediateData = Yes
node.session.iscsi.FirstBurstLength = 262144
node.session.iscsi.MaxBurstLength = 16776192
node.conn[0].iscsi.MaxRecvDataSegmentLength = 262144
discovery.sendtargets.iscsi.MaxRecvDataSegmentLength = 32768
node.conn[0].iscsi.HeaderDigest = None
node.session.iscsi.FastAbort = Yes

**/var/log/messages**

Dec 13 17:10:57 db4 kernel: Loading iSCSI transport class v2.0-871.
Dec 13 17:10:57 db4 kernel: cxgb3i: tag itt 0x1fff, 13 bits, age 0xf, 4 bits.
Dec 13 17:10:57 db4 kernel: iscsi: registered transport (cxgb3i)
Dec 13 17:10:57 db4 kernel: Broadcom NetXtreme II CNIC Driver cnic v2.0.1 (Oct 01, 2009)
Dec 13 17:10:57 db4 kernel: Broadcom NetXtreme II iSCSI Driver bnx2i v2.0.1e (June 22, 2009)
Dec 13 17:10:57 db4 kernel: iscsi: registered transport (bnx2i)
Dec 13 17:10:58 db4 kernel: iscsi: registered transport (tcp)
Dec 13 17:10:58 db4 kernel: iscsi: registered transport (iser)
Dec 13 17:10:58 db4 iscsid: iSCSI logger with pid=24781 started!
Dec 13 17:10:59 db4 iscsid: transport class version 2.0-871. iscsid version 2.0-871
Dec 13 17:10:59 db4 iscsid: iSCSI daemon with pid=24782 started!
Dec 13 17:11:07 db4 kernel: scsi15 : iSCSI Initiator over TCP/IP
Dec 13 17:11:07 db4 kernel: Vendor: ETIUSA Model: UltraStorRS8IP4 Rev: 1.1.
Dec 13 17:11:07 db4 kernel: Type: Direct-Access ANSI SCSI revision: 04
Dec 13 17:11:07 db4 kernel: SCSI device sdc: 288374784 4096-byte hdwr sectors (1181183 MB)
Dec 13 17:11:07 db4 kernel: sdc: Write Protect is off
Dec 13 17:11:07 db4 kernel: SCSI device sdc: drive cache: write back
Dec 13 17:11:07 db4 kernel: SCSI device sdc: 288374784 4096-byte hdwr sectors (1181183 MB)
Dec 13 17:11:07 db4 kernel: sdc: Write Protect is off
Dec 13 17:11:07 db4 kernel: SCSI device sdc: drive cache: write back
Dec 13 17:11:07 db4 kernel: sdc: sdc1
Dec 13 17:11:07 db4 kernel: sd 15:0:0:0: Attached scsi disk sdc
Dec 13 17:11:07 db4 kernel: sd 15:0:0:0: Attached scsi generic sg2 type 0
Dec 13 17:11:08 db4 iscsid: connection1:0 is operational now
Dec 13 17:11:59 db4 kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 21989742817, last ping 21989747817, now 21989752817
Dec 13 17:11:59 db4 kernel: connection1:0: detected conn error (1011)
Dec 13 17:12:00 db4 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Dec 13 17:12:17 db4 iscsid: connection1:0 is operational after recovery (2 attempts)
Dec 13 17:13:06 db4 kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 21989810632, last ping 21989815632, now 21989820632
Dec 13 17:13:06 db4 kernel: connection1:0: detected conn error (1011)
Dec 13 17:13:07 db4 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Dec 13 17:13:25 db4 iscsid: connection1:0 is operational after recovery (2 attempts)
*(mounting now)

Dec 13 17:13:34 db4 kernel: kjournald starting. Commit interval 5 seconds
Dec 13 17:13:34 db4 kernel: EXT3 FS on sdc1, internal journal
Dec 13 17:13:34 db4 kernel: EXT3-fs: mounted filesystem with ordered data mode.

*(mount successful)
Dec 13 17:14:14 db4 kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 21989877855, last ping 21989882855, now 21989887855
Dec 13 17:14:14 db4 kernel: connection1:0: detected conn error (1011)
Dec 13 17:14:14 db4 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Dec 13 17:14:32 db4 iscsid: connection1:0 is operational after recovery (2 attempts)
Dec 13 17:15:02 db4 kernel: connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 21989925928, last ping 21989930928, now 21989935928
Dec 13 17:15:02 db4 kernel: connection1:0: detected conn error (1011)
Dec 13 17:15:02 db4 iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Dec 13 17:15:20 db4 iscsid: connection1:0 is operational after recovery (2 attempts)
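
The three counters in each "ping timeout" line can be compared directly. Below is a minimal Python sketch (not part of the original post) that extracts them and prints the two gaps. The regular expression is written against the line format shown above, and the function name `ping_gaps` is made up for illustration. Reading the counters as millisecond ticks is an assumption (it would hold for a kernel built with HZ=1000).

```python
import re

# Matches the kernel "ping timeout" line format seen in the log above.
PING_RE = re.compile(
    r"ping timeout of (?P<timeout>\d+) secs expired, "
    r"recv timeout (?P<recv>\d+), "
    r"last rx (?P<rx>\d+), last ping (?P<ping>\d+), now (?P<now>\d+)"
)


def ping_gaps(line):
    """Return (last ping - last rx, now - last ping) in kernel ticks.

    Returns None when the line is not a ping timeout message.
    """
    match = PING_RE.search(line)
    if match is None:
        return None
    rx, ping, now = (int(match.group(key)) for key in ("rx", "ping", "now"))
    return ping - rx, now - ping


sample = (
    "Dec 13 17:11:59 db4 kernel: connection1:0: ping timeout of 5 secs expired, "
    "recv timeout 5, last rx 21989742817, last ping 21989747817, now 21989752817"
)
print(ping_gaps(sample))
```

For the first timeout line both gaps come out as 5000. Under the millisecond assumption that matches the configured noop_out_interval = 5 and noop_out_timeout = 5: nothing was received for 5 seconds, a ping was sent, and no reply arrived during the following 5 seconds.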

Thanks guys,

-Paul


Mike Christie

Dec 14, 2010, 10:18:04 PM

to open-...@googlegroups.com, p...@fhri.org

On 12/14/2010 03:12 PM, p...@fhri.org wrote:
> Hi all...
>
> I have four CentOS 5.4 (2.6.18-164.11.1.el5) servers with iscsid version
> 2.0-871. Two are misbehaving despite identical configuration. They all
> connect to Enhance Tech RS8-IP4 array the same way, directly NIC-to-NIC
> without a switch, physically separate from LAN. I created four targets,
> one per port, and four separate volumes/LUNs.
>
> Pasted below is the config and error log. About a minute after a
> successful login, the timeouts/errors begin and keep coming constantly
> pretty much every minute whenever the session is logged in, regardless
> of mount state. The problematic units are also often very slow logging
> in, mounting, even directory listing at times. Also, they sometimes
> time out and remount the fs read-only in the middle of a large backup
> run.
>

There were some fixes to that code in the RHEL/CentOS 5.5 kernel, but I do
not think that is what you are hitting.

Do you see those ping/nop timeout messages even when you are not doing
any IO intensive workload?

Did you set up your initiator names (/etc/iscsi/initiatorname.iscsi) or
did you let the tools do this? Does each server have a unique initiator
name or do some servers have the same value in that file?
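
One way to answer the uniqueness question is to collect the contents of /etc/iscsi/initiatorname.iscsi from every server and compare the names. The following is a rough Python sketch, assuming the usual `InitiatorName=<iqn>` line in that file. The helper names, the host names other than db4, and the IQN values are all hypothetical.

```python
from collections import defaultdict


def parse_initiator_name(text):
    """Return the value of the first InitiatorName= line, or None."""
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("InitiatorName="):
            return line.split("=", 1)[1].strip()
    return None


def duplicate_names(files_by_host):
    """Map each initiator name used by more than one host to those hosts."""
    hosts_by_name = defaultdict(list)
    for host, text in files_by_host.items():
        name = parse_initiator_name(text)
        if name is not None:
            hosts_by_name[name].append(host)
    return {
        name: sorted(hosts)
        for name, hosts in hosts_by_name.items()
        if len(hosts) > 1
    }


# Hypothetical file contents, one entry per server.
files = {
    "db3": "InitiatorName=iqn.1994-05.com.redhat:aaaa1111\n",
    "db4": "InitiatorName=iqn.1994-05.com.redhat:aaaa1111\n",
    "db5": "InitiatorName=iqn.1994-05.com.redhat:bbbb2222\n",
}
print(duplicate_names(files))
```

An empty result means every server reported a distinct name. Any entry in the result lists the servers that share one.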

Are there any log messages on the target?

If you set

node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0

(either set that in iscsid.conf, then rerun the discovery command and
relogin, or run

iscsiadm -m node -o update -n node.conn[0].timeo.noop_out_interval -v 0
iscsiadm -m node -o update -n node.conn[0].timeo.noop_out_timeout -v 0

then relogin)

this will turn off the iSCSI nops/pings. Then if you run mkfs and do
backups, you should not see the ping timeout messages, but do you still
see low throughput? Do you still see "conn error 1011" messages, just
without the ping timeout messages?
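
The two update commands above differ only in the setting name, so they can be generated rather than typed. This is a small Python sketch that builds the argument lists and prints them for review. It does not run iscsiadm itself, and the function name `noop_update_commands` is made up for illustration.

```python
import shlex

# The two nop-out settings named in the message above.
NOOP_SETTINGS = (
    "node.conn[0].timeo.noop_out_interval",
    "node.conn[0].timeo.noop_out_timeout",
)


def noop_update_commands(value=0):
    """Build one 'iscsiadm -m node -o update' argument list per setting."""
    return [
        ["iscsiadm", "-m", "node", "-o", "update", "-n", name, "-v", str(value)]
        for name in NOOP_SETTINGS
    ]


for argv in noop_update_commands():
    # shlex.join quotes the setting name, which keeps the shell from
    # treating the square brackets as a glob pattern.
    print(shlex.join(argv))
```

Passing `value=5` would produce the commands that restore the values from the config shown earlier in the thread. A relogin is still needed afterwards either way.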

p...@fhri.org

Dec 15, 2010, 6:16:59 PM

to Mike Christie, open-...@googlegroups.com

Hey Mike,
Thank you for taking the time to reply, I certainly appreciate it!

I was writing a reply answering your questions and running the suggested
tests, etc, and came across a config error that was probably the
culprit. I had two hosts configured with the same IP, but the target was
not raising an error about it and the informational messages were being
filtered. Once I enabled those, I saw the mistake.
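
A duplicate address like this can also be caught from the host side by comparing the storage-facing IP of each server. The following Python sketch assumes the addresses have already been collected into a dictionary. The function name, the host names other than db4, and the addresses (taken from the 192.0.2.0/24 documentation range) are hypothetical and do not reflect the actual setup.

```python
import ipaddress
from collections import defaultdict


def shared_addresses(address_by_host):
    """Map each IP address used by more than one host to those hosts."""
    hosts_by_ip = defaultdict(list)
    for host, addr in address_by_host.items():
        # ip_address() validates the string; stray whitespace is stripped
        # first so "192.0.2.12 " and "192.0.2.12" compare equal.
        hosts_by_ip[ipaddress.ip_address(addr.strip())].append(host)
    return {
        str(ip): sorted(hosts)
        for ip, hosts in hosts_by_ip.items()
        if len(hosts) > 1
    }


# Hypothetical storage-side addresses for four servers.
example = {
    "db1": "192.0.2.11",
    "db2": "192.0.2.12",
    "db3": "192.0.2.13",
    "db4": "192.0.2.12 ",
}
print(shared_addresses(example))
```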

We'll see how the backups go tonight, but I'm betting everything's gonna
be fine.

Thanks for pointing me in the right direction!

--
Paul
