Compatibility problem with Equallogic systems and kernel crash

to...@acm.org

Jul 5, 2005, 3:55:57 AM
to open-...@googlegroups.com
Hi,

I ran into two problems with r369.

The first problem is that it does not work with Equallogic iSCSI
systems. The connection is closed immediately after the initiator
logs in. I saw the following kernel messages.

iscsi_tcp: datalen 36 > 0
iscsi2: detected conn error (1006)

I've uploaded the tcpdump log.

http://zaal.org/iscsi/open/r369-equallogic.cap


Note that I've never used open-iscsi with Equallogic before (I've used
sfnet with it so far), so I'm not sure whether this problem happens
only with r369.


The second problem is that the kernel crashes when I try to log out
by executing 'iscsiadm -m node --record [id] --logout' after the first
problem has happened.


------------[ cut here ]------------
kernel BUG at include/asm/spinlock.h:93!
invalid operand: 0000 [#1]
SMP DEBUG_PAGEALLOC
Modules linked in: af_packet iscsi_tcp scsi_transport_iscsi evdev e1000 unix
CPU: 0
EIP: 0060:[<c0338d87>] Not tainted VLI
EFLAGS: 00010202 (2.6.13-rc1)
EIP is at _spin_unlock_irq+0x17/0x30
eax: 00000001 ebx: 00000066 ecx: f56f0bf8 edx: f4f40824
esi: f6244df8 edi: f61a3f9c ebp: f61a2000 esp: f61a3ea4
ds: 007b es: 007b ss: 0068
Process scsi_eh_2 (pid: 2603, threadinfo=f61a2000 task=f5713ae0)
Stack: f89a7f08 c0470960 00000000 00000000 c200f520 c200fee0 f61a3ef0 c0117a5e
c200fee0 00000000 f61a3ee0 c0397dc0 f5713ae0 f4f40a94 f5e16ef8 01bacf60
f7054d90 c042ffc8 c033746c f61a3f68 c03374a0 f61a3f58 00000008 00000002
Call Trace:
[<f89a7f08>] iscsi_eh_abort+0x38/0x640 [iscsi_tcp]
[<c0117a5e>] load_balance_newidle+0x2e/0xb0
[<c033746c>] schedule+0x67c/0xd00
[<c03374a0>] schedule+0x6b0/0xd00
[<c01168c3>] try_to_wake_up+0x2d3/0x320
[<c01158c5>] kernel_map_pages+0x45/0x80
[<c02407ce>] scsi_try_to_abort_cmd+0x2e/0x40
[<c0240947>] scsi_eh_abort_cmds+0x57/0xf0
[<c0241875>] scsi_unjam_host+0xb5/0x200
[<c0336d54>] __down_interruptible+0xf4/0x120
[<c0118290>] default_wake_function+0x0/0x20
[<c0118475>] complete+0x45/0x60
[<c0241ab8>] scsi_error_handler+0xf8/0x1a0
[<c02419c0>] scsi_error_handler+0x0/0x1a0
[<c0101205>] kernel_thread_helper+0x5/0x10
Code: 80 da 34 c0 eb f0 0f 0b 5c 00 80 da 34 c0 eb df 8d 74 26 00 81 78 04 ad 4e ad de 89 c2 75 16 0f b6 02 84 c0 7f 05 c6 02 01 fb c3 <0f> 0b 5d 00 80 da 34 c0 eb f1 0f 0b 5c 00 80 da 34 c0 eb e0 90

Alex Aizman

Jul 5, 2005, 11:58:29 AM
to open-...@googlegroups.com
to...@acm.org wrote:
> Hi,
>
> I met two problems with r369.
>
> The first problem is that it does not work with Equallogic iSCSI systems. The
> connection is closed immediately after the initiator logged in. I saw the
> following kernel message.
>
> iscsi_tcp: datalen 36 > 0 iscsi2: detected conn error (1006)

For some reason the initiator sends MRDSL (MaxRecvDataSegmentLength) = 0 (why?),
and the target responds with 64K. It seems MRDSL remains zero on the initiator
side (a user-space code bug; the kernel code could also use a check for zero).
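
The "datalen 36 > 0" message is consistent with the receive path comparing a 36-byte data segment against a limit of zero. Below is a minimal sketch, in C, of the kind of zero check suggested above. It assumes that an unset (zero) MaxRecvDataSegmentLength should fall back to the RFC 3720 default of 8192 bytes; rejecting the session setup outright would be another option. The helper names are hypothetical and are not taken from the open-iscsi source.

```c
#include <assert.h>
#include <stdint.h>

/* RFC 3720 default for MaxRecvDataSegmentLength when the key is not set. */
#define ISCSI_DEF_MAX_RECV_SEG_LEN 8192u

/* Hypothetical helper: map an unset (zero) limit to the RFC default. */
static uint32_t effective_max_recv_dlength(uint32_t configured)
{
    uint32_t limit = configured ? configured : ISCSI_DEF_MAX_RECV_SEG_LEN;

    assert(limit != 0); /* invariant: the effective limit is never zero */
    return limit;
}

/* Hypothetical helper: return 0 if a PDU's data segment fits, -1 otherwise. */
static int check_datalen(uint32_t datalen, uint32_t configured)
{
    return datalen <= effective_max_recv_dlength(configured) ? 0 : -1;
}
```

With this check, a 36-byte data segment is accepted even when the configured limit is zero, while a segment larger than the effective limit is still rejected.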

Next, Initiator does not like SCSI Inquiry.

Another thing: the Login contains "X-com.cisco.PingTimeout". I wonder why it is
there, along with a couple more Cisco-specific keys.

Mike Christie

Jul 5, 2005, 1:00:46 PM
to open-...@googlegroups.com
Alex Aizman wrote:

> Another thing, Login contains "X-com.cisco.PingTimeout". Wonder why it is
> there, along with a couple more Cisco-specific.
>

It always gets sent because Cisco wrote the code and they could do what
they wanted :) I had sent a patch to move it to vendor-specific plugins,
but did not merge it into sfnet because they couldn't make up their minds
whether they even still wanted it.

to...@acm.org

Jul 5, 2005, 7:07:03 PM
to open-...@googlegroups.com
From: Alex Aizman <itn...@yahoo.com>
Subject: Re: Compatibility problem with Equallogic systems and kernel crash
Date: Tue, 05 Jul 2005 08:58:29 -0700

> > I met two problems with r369.
> >
> > The first problem is that it does not work with Equallogic iSCSI systems. The
> > connection is closed immediately after the initiator logged in. I saw the
> > following kernel message.
> >
> > iscsi_tcp: datalen 36 > 0 iscsi2: detected conn error (1006)
>
> Initiator for some reason sends MRDSL=0 (why?), Target responds with 64K. Seems like
> MRDSL remains zero (user code bug, kernel code could use a check for zero).

I have to change the IP address by hand to use open-iscsi with Equallogic
because open-iscsi does not support a login response with 'Target
moved temporarily'. I didn't realize that the update operation resets
all the other values.

orly:~# iscsiadm -m node --record 0c6863
node.name = iqn.2001-05.com.equallogic:6-8a0900-fa6510301-bfbff02af754100f-acs
node.transport_name = tcp
(snip)
node.conn[0].address = 129.60.163.91
node.conn[0].port = 3260
node.conn[0].startup = manual
node.conn[0].tcp.window_size = 524288
node.conn[0].tcp.type_of_service = 0
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.auth_timeout = 45
node.conn[0].timeo.active_timeout = 5
node.conn[0].timeo.idle_timeout = 60
node.conn[0].timeo.ping_timeout = 5
node.conn[0].iscsi.MaxRecvDataSegmentLength = 131072
node.conn[0].iscsi.HeaderDigest = None,CRC32C
node.conn[0].iscsi.DataDigest = None
node.conn[0].iscsi.IFMarker = No
node.conn[0].iscsi.OFMarker = No

orly:~# iscsiadm -m node --record 0c6863 --op update -n node.conn[0].address -v 129.60.163.90

orly:~# iscsiadm -m node --record 0c6863
node.name = iqn.2001-05.com.equallogic:6-8a0900-fa6510301-bfbff02af754100f-acs
node.transport_name = tcp
(snip)
node.conn[0].address = 129.60.163.90
node.conn[0].port = 0
node.conn[0].startup = manual
node.conn[0].tcp.window_size = 0
node.conn[0].tcp.type_of_service = 0
node.conn[0].timeo.login_timeout = 0
node.conn[0].timeo.auth_timeout = 0
node.conn[0].timeo.active_timeout = 0
node.conn[0].timeo.idle_timeout = 0
node.conn[0].timeo.ping_timeout = 0
node.conn[0].iscsi.MaxRecvDataSegmentLength = 0
node.conn[0].iscsi.HeaderDigest = None
node.conn[0].iscsi.DataDigest = None
node.conn[0].iscsi.IFMarker = No
node.conn[0].iscsi.OFMarker = No


I would prefer the initiator to keep these values.
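
The behaviour asked for here, where updating one field leaves every other stored value alone, can be sketched in C as an in-place update of an already-loaded record. This is only an illustration: the struct and function below are hypothetical, much simpler than a real node record, and say nothing about what actually causes the reset in r369.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a node record's connection settings. */
struct conn_rec {
    char address[64];
    int port;
    int login_timeout;
    int max_recv_dlength;
};

/*
 * Update one named field in place. The record is modified, not rebuilt,
 * so every field that is not named keeps its stored value.
 * Returns 0 on success, -1 for an unknown field name.
 */
static int conn_rec_update(struct conn_rec *rec, const char *name,
                           const char *value)
{
    assert(rec != NULL && name != NULL && value != NULL);

    if (strcmp(name, "address") == 0) {
        strncpy(rec->address, value, sizeof(rec->address) - 1);
        rec->address[sizeof(rec->address) - 1] = '\0';
        return 0;
    }
    if (strcmp(name, "port") == 0) {
        rec->port = atoi(value);
        return 0;
    }
    if (strcmp(name, "login_timeout") == 0) {
        rec->login_timeout = atoi(value);
        return 0;
    }
    if (strcmp(name, "max_recv_dlength") == 0) {
        rec->max_recv_dlength = atoi(value);
        return 0;
    }
    return -1; /* unknown field: leave the record untouched */
}
```

Starting from a record holding address 129.60.163.91, port 3260, login timeout 15 and MaxRecvDataSegmentLength 131072, updating only "address" to 129.60.163.90 leaves the other three values as they were.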


> Next, Initiator does not like SCSI Inquiry.

I restored the values by hand, and then the initiator worked. However,
the kernel crashed immediately after I/O started. It seems that there are
bugs in the error handling code.

Jul 6 16:41:18 orly kernel: ------------[ cut here ]------------
Jul 6 16:41:18 orly kernel: kernel BUG at include/asm/spinlock.h:93!
Jul 6 16:41:18 orly kernel: invalid operand: 0000 [#1]
Jul 6 16:41:18 orly kernel: SMP DEBUG_PAGEALLOC
Jul 6 16:41:18 orly kernel: Modules linked in: af_packet iscsi_tcp scsi_transport_iscsi e1000 evdev unix
Jul 6 16:41:18 orly kernel: CPU: 2
Jul 6 16:41:18 orly kernel: EIP: 0060:[_spin_unlock_irq+23/48] Not tainted VLI
Jul 6 16:41:18 orly kernel: EFLAGS: 00010202 (2.6.13-rc1)
Jul 6 16:41:18 orly kernel: EIP is at _spin_unlock_irq+0x17/0x30
Jul 6 16:41:18 orly kernel: eax: 00000001 ebx: 00000066 ecx: f6242bf8 edx: f4da4824
Jul 6 16:41:18 orly kernel: esi: f354fdf8 edi: f4e61f9c ebp: f4e60000 esp: f4e61ea4
Jul 6 16:41:18 orly kernel: ds: 007b es: 007b ss: 0068
Jul 6 16:41:18 orly kernel: Process scsi_eh_2 (pid: 2656, threadinfo=f4e60000 task=f482eae0)
Jul 6 16:41:18 orly kernel: Stack: f89a7f08 c0470960 00000000 00000000 c200f520 c200fee0 f4e61ef0 c0117a5e
Jul 6 16:41:18 orly kernel: c200fee0 00000000 f4e61ee0 c21a6ca0 f482eae0 f4da4a94 f4878ef8 01bbcf60
Jul 6 16:41:18 orly kernel: f7358d90 c21fffa4 c033746c f4e61f68 c03374a0 f4e61f58 00000008 00000002
Jul 6 16:41:18 orly kernel: Call Trace:
Jul 6 16:41:18 orly kernel: [pg0+944824072/1068909568] iscsi_eh_abort+0x38/0x640 [iscsi_tcp]
Jul 6 16:41:18 orly kernel: [load_balance_newidle+46/176] load_balance_newidle+0x2e/0xb0
Jul 6 16:41:18 orly kernel: [schedule+1660/3328] schedule+0x67c/0xd00
Jul 6 16:41:18 orly kernel: [schedule+1712/3328] schedule+0x6b0/0xd00
Jul 6 16:41:18 orly kernel: [try_to_wake_up+723/800] try_to_wake_up+0x2d3/0x320
Jul 6 16:41:18 orly kernel: [kernel_map_pages+69/128] kernel_map_pages+0x45/0x80
Jul 6 16:41:18 orly kernel: [scsi_try_to_abort_cmd+46/64] scsi_try_to_abort_cmd+0x2e/0x40
Jul 6 16:41:18 orly kernel: [scsi_eh_abort_cmds+87/240] scsi_eh_abort_cmds+0x57/0xf0
Jul 6 16:41:18 orly kernel: [scsi_unjam_host+181/512] scsi_unjam_host+0xb5/0x200
Jul 6 16:41:18 orly kernel: [__down_interruptible+244/288] __down_interruptible+0xf4/0x120
Jul 6 16:41:18 orly kernel: [default_wake_function+0/32] default_wake_function+0x0/0x20
Jul 6 16:41:18 orly kernel: [complete+69/96] complete+0x45/0x60
Jul 6 16:41:18 orly kernel: [scsi_error_handler+248/416] scsi_error_handler+0xf8/0x1a0
Jul 6 16:41:18 orly kernel: [scsi_error_handler+0/416] scsi_error_handler+0x0/0x1a0
Jul 6 16:41:18 orly kernel: [kernel_thread_helper+5/16] kernel_thread_helper+0x5/0x10
Jul 6 16:41:18 orly kernel: Code: 80 da 34 c0 eb f0 0f 0b 5c 00 80 da 34 c0 eb df 8d 74 26 00 81 78 04 ad 4e ad de 89 c2 75 16 0f b6 02 84 c0 7f 05 c6 02 01 fb c3 <0f> 0b 5d 00 80 da 34 c0 eb f1 0f 0b 5c 00 80 da 34 c0 eb e0 90
Jul 6 16:50:48 orly kernel: <6>iscsi2: detected conn error (1011)


> Another thing, Login contains "X-com.cisco.PingTimeout". Wonder why it is
> there, along with a couple more Cisco-specific.

There is still Cisco-specific parameter handling in open-iscsi.