open-iscsi r454 "detected conn error (1011)" / iSCSI PDU timed out / cannot login

Pasi Kärkkäinen

Jan 24, 2006, 12:41:25 PM
to open-...@googlegroups.com
Hi!

I'm using open-iscsi v0.5 r454 with Debian 2.6.15-1-686-smp kernel, using
the default iscsi modules from that kernel.

Discovery works OK to Equallogic target, and I get the volume list. But,
when I try to login, I get these errors:

# iscsiadm -m node --record 6fe140 --login
iscsiadm: iscsid reported error (11 - iSCSI PDU timed out)

from syslog:

kernel: iscsi4: detected conn error (1011)
iscsid: detected iSCSI connection (handle 0xf78cd000) error (1011) state (2)

Is there a solution to this problem?

Same target/volume works with other initiators.

-- Pasi Kärkkäinen

^
. .
Linux
/ - \
Choice.of.the
.Next.Generation.

sc...@leerssen.com

Jan 24, 2006, 12:46:36 PM
to open-iscsi
That's the same error I was getting when the discovery packet didn't
contain the port number (3260). Try an 'iscsiadm -m node' and see what
the target port is, and then make sure that it is the same as the port
that should be used to contact the target. In my case, it was 0 (zero)
and should have been 3260.
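
The port check suggested above can be automated. The following is a minimal Python sketch, assuming the `iscsiadm -m node` listing has the `[record] host:port,tag target-name` layout that appears elsewhere in this thread; the helper name `suspicious_records` and the second sample line (edited to carry port 0) are hypothetical and only serve the illustration.

```python
import re

# Layout assumed from the listings in this thread:
#   [record] host:port,tag target-name
RECORD_RE = re.compile(
    r"^\[(?P<record>[0-9a-f]+)\]\s+(?P<host>\S+):(?P<port>\d+),(?P<tag>\d+)\s+(?P<target>\S+)$"
)


def suspicious_records(listing, expected_port=3260):
    """Return (record id, port) pairs whose port differs from expected_port."""
    bad = []
    for line in listing.splitlines():
        match = RECORD_RE.match(line.strip())
        if match and int(match.group("port")) != expected_port:
            bad.append((match.group("record"), int(match.group("port"))))
    return bad


# First line is taken from the thread; the second is a made-up variant with port 0.
sample = """\
[6fe140] 10.xx.yy.245:3260,0 iqn.equallogic:6-8a0900-2f3432c01-e950000001843d65-vol01
[0143ba] 10.xx.yy.245:0,0 iqn.equallogic:6-8a0900-252432c01-a2d0000001543d65-testvol01
"""

if __name__ == "__main__":
    print(suspicious_records(sample))
```

An empty result means every record already points at the expected port, which is what the listing posted later in the thread shows.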

Pasi Kärkkäinen

Jan 24, 2006, 12:51:00 PM
to open-...@googlegroups.com

# iscsiadm -m node
[6fe140] 10.xx.yy.245:3260,0 iqn.equallogic:6-8a0900-2f3432c01-e950000001843d65-vol01
[0143ba] 10.xx.yy.245:3260,0 iqn.equallogic:6-8a0900-252432c01-a2d0000001543d65-testvol01

That's what I get.. is that "3260,0" correct?

-- Pasi

sc...@leerssen.com

Jan 24, 2006, 1:01:36 PM
to open-iscsi
That looks right. Sorry for the red herring.

Pasi Kärkkäinen

Jan 24, 2006, 1:07:35 PM
to open-...@googlegroups.com
On Tue, Jan 24, 2006 at 10:01:36AM -0800, sc...@leerssen.com wrote:
>
> That looks right. Sorry for the red herring.
>

No problem.

So, you have open-iscsi working with Equallogic target? What version of
open-iscsi? What version of Equallogic firmware?

Also please post your iscsid.conf so I can try to figure out what's the
problem in my setup..

Thanks!

-- Pasi

Pasi Kärkkäinen

Jan 26, 2006, 2:20:08 AM
to open-...@googlegroups.com

Anyone?

Same target/volume works fine with linux-iscsi (cisco) initiator.. and also
with windows initiator.

Recommended steps for debugging this?

-- Pasi

Mike Christie

Jan 26, 2006, 2:32:15 AM
to open-...@googlegroups.com

I actually have one of these at work; I will try it out in the morning.
What model are you using? Maybe an Ethereal trace would help others, or
maybe turn on the iSCSI and SCSI debugging to better see where the error
is coming from.

Pasi Kärkkäinen

Jan 26, 2006, 2:47:37 AM
to open-...@googlegroups.com

I'm using Equallogic PS-200E.

I'll try with the debug options..

Thanks!

-- Pasi

Wang Zhenyu

Jan 26, 2006, 3:53:22 AM
to open-...@googlegroups.com

If that's a login problem, you should send the 'iscsid -d9' log output.

Pasi Kärkkäinen

Jan 26, 2006, 10:00:14 AM
to open-...@googlegroups.com

Done.

http://nrg.joroinen.fi/iscsid-d9.log

Also attached to this mail.

-- Pasi

iscsid-d9.log

Mike Christie

Jan 26, 2006, 7:17:45 PM
to open-...@googlegroups.com

Did you also send the kernel log parts? I ran with a "Vendor: EQLOGIC
Model: 100E-00" and it seems to work OK. It was very slow because I was
in Minneapolis, MN and the target was in Westford, MA, but I could fdisk
and do mkfs. I am wondering where in the kernel the conn error that
produced this

iscsid: detected iSCSI connection (handle 0xd2a40000) error (1011) state (2)

in the daemon came from.

Pasi Kärkkäinen

Jan 27, 2006, 6:18:21 AM
to open-...@googlegroups.com

dmesg:

iscsi: registered transport (tcp)
scsi4 : iSCSI Initiator over TCP/IP, v.0.3
iscsi4: detected conn error (1011)

Not very informative..

-- Pasi

Mike Christie

Jan 27, 2006, 1:15:49 PM
to open-...@googlegroups.com

We should probably come up with a way to do a "make DEBUG", but for now
could you turn on debugging in the driver? To do this, open
kernel/iscsi_tcp.c and uncomment these lines:

/* #define DEBUG_TCP */
/* #define DEBUG_SCSI */

Or you can apply this patch, which uncomments the lines for you. To apply
the patch, go to the open-iscsi root dir and run "patch -p0 -i
path-to-patch/out.debug.patch".

out.debug.patch

Pasi Kärkkäinen

Jan 28, 2006, 7:18:49 AM
to open-...@googlegroups.com

$ iscsiadm -m node --record b66f1a --login
iscsiadm: iscsid reported error (11 - iSCSI PDU timed out)

dmesg with DEBUG_TCP and DEBUG_SCSI turned on:

iscsi: registered transport (tcp)
scsi5 : iSCSI Initiator over TCP/IP, v.0.3
scsi: mgmtpdu [op 0x43 hdr->itt 0xa00 datalen 492]
scsi: mtask deq [cid 0 state 4 itt 0xa00]
tcp: sendhdr 48 bytes, sent 0 res 48
tcp: sendpage: 492 bytes, sent 0 left 492 sent 0 res 492
tcp: iscsi_tcp_state_change: TCP_CLOSE|TCP_CLOSE_WAIT
iscsi5: detected conn error (1011)
tcp: in 80 bytes
tcp: conn 0 Rx suspended!
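
For readers following the debug output above: assuming the `op` value in the `mgmtpdu` line is the raw first byte of the iSCSI header, it can be split into the immediate-delivery bit and the opcode as defined in RFC 3720. The sketch below is plain Python and independent of open-iscsi; the function name `decode_initiator_op` is made up for the illustration.

```python
# Initiator opcodes from RFC 3720; bit 0x40 of the first header byte marks
# the PDU for immediate delivery.
INITIATOR_OPCODES = {
    0x00: "NOP-Out",
    0x01: "SCSI Command",
    0x02: "Task Management Function Request",
    0x03: "Login Request",
    0x04: "Text Request",
    0x05: "SCSI Data-Out",
    0x06: "Logout Request",
    0x10: "SNACK Request",
}
IMMEDIATE_BIT = 0x40
OPCODE_MASK = 0x3F


def decode_initiator_op(op):
    """Split the first header byte into (opcode name, immediate flag)."""
    name = INITIATOR_OPCODES.get(op & OPCODE_MASK, "unknown")
    return name, bool(op & IMMEDIATE_BIT)


if __name__ == "__main__":
    # 0x43 is the value logged in the mgmtpdu line above.
    print(decode_initiator_op(0x43))
```

Read this way, the log shows a Login Request going out (a 48-byte header, which is the size of the iSCSI basic header segment, followed by 492 bytes of login text) and the socket moving to a closing state right afterwards.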

Hope that helps..

-- Pasi

Pasi Kärkkäinen

Jan 30, 2006, 10:51:42 AM
to open-...@googlegroups.com
On Thu, Jan 26, 2006 at 06:17:45PM -0600, Mike Christie wrote:
>
> >>>>>Same target/volume works fine with linux-iscsi (cisco) initiator.. and
> >>>>>also
> >>>>>with windows initiator.
> >>>>>
> >>>>>Recommended steps for debugging this?
> >>>>>
> >>>>
> >>>>I actually have one of these at work I will try it out in the morning.
> >>>>What model are you using? Maybe a ethereal trace would help others or
> >>>>maybe turn on the iscsi and scsi debugging to better see where the
> >>>>error is coming from.
> >>>
> >>>I'm using Equallogic PS-200E.
> >>>
> >>>I'll try with the debug options..
> >>>
> >>
> >>If that's login problem, you should send 'iscsid -d9' log output.
> >>
> >
> >
> >Done.
> >
> >http://nrg.joroinen.fi/iscsid-d9.log
> >
> >Also attached to this mail.
> >
>
> Did you also send the kernel log parts? I ran with a "Vendor: EQLOGIC
> Model: 100E-00" and it seems to work ok. It was very slow because I was
> in minneapolis MN and the target was in westford MA but I could fdisk,
> and do mkfs.
>

Hmm.. Could you also check your open-iscsi + kernel versions? Would be nice
to know which version of open-iscsi works out-of-the-box with EqualLogic
targets.

-- Pasi

Mike Christie

Jan 30, 2006, 2:48:56 PM
to open-...@googlegroups.com

I am not sure if I am reading this correctly, but from your two logs it
looks like the target is closing the connection. Is that the correct
interpretation? I am not sure why. Maybe there is an iSCSI reason, and
for that I guess we could add some code to dump the values in the PDUs,
or could you send an Ethereal trace? Or maybe it is just a target
misconfiguration; that is just a complete guess based on similar problems
with another type of target.

Pasi Kärkkäinen

Jan 30, 2006, 3:27:57 PM
to open-...@googlegroups.com

The target is (should be) OK, because it works with Linux Cisco Initiator,
and also with Microsoft initiator..

I'll capture an Ethereal trace tomorrow and post it.

-- Pasi

Pasi Kärkkäinen

Feb 3, 2006, 7:23:55 AM
to open-...@googlegroups.com
On Mon, Jan 30, 2006 at 10:27:57PM +0200, Pasi Kärkkäinen wrote:
>
> > >>>>>>If that's login problem, you should send 'iscsid -d9' log output.
> > >>>>>>
> > >>>>>Done.
> > >>>>>
> > >>>>>http://nrg.joroinen.fi/iscsid-d9.log
> > >>>>>
> > >>>>

The same problem happens with Linux 2.6.15 + open-iscsi v1.0 r485.

Looking at the ethereal capture, the login process fails when EqualLogic
target sends "iSCSI Login Response (Target moved temporarily)".

That happens because of automatic connection load balancing in EqualLogic target.
Initiators connect to the group (load balancing) IP which then redirects to
the real target interfaces.

Open-iscsi initiator seems to ask ARP for the correct target interface IP
(which it got from the "Target moved temporarily" message), but it doesn't send
*anything* to the real target IP..

So the problem is related to the "Target moved temporarily" handling in
open-iscsi.
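
To make the redirect concrete: in RFC 3720 a Login Response carries Status-Class and Status-Detail in bytes 36 and 37 of the 48-byte header, class 0x01 means redirection, detail 0x01 is "Target moved temporarily", and the new portal arrives as a `TargetAddress=` key in the data segment. The Python sketch below decodes a synthetic PDU built in the example itself. It is not the open-iscsi code path, the helper names are invented, data segment padding is ignored, and the portal group tag `,1` in the sample address is an assumption.

```python
LOGIN_RESPONSE_OPCODE = 0x23
BHS_LEN = 48  # size of the iSCSI basic header segment
STATUS_CLASS_REDIRECTION = 0x01
REDIRECT_DETAILS = {
    0x01: "Target moved temporarily",
    0x02: "Target moved permanently",
}


def parse_login_response(pdu):
    """Return (status class, status detail, text keys) of a Login Response."""
    if len(pdu) < BHS_LEN:
        raise ValueError("PDU shorter than the basic header segment")
    if pdu[0] & 0x3F != LOGIN_RESPONSE_OPCODE:
        raise ValueError("not a Login Response")
    ahs_len = pdu[4] * 4                        # TotalAHSLength, in 4-byte words
    data_len = int.from_bytes(pdu[5:8], "big")  # DataSegmentLength
    start = BHS_LEN + ahs_len
    text = pdu[start:start + data_len].decode("ascii")
    # Login text keys are NUL-separated key=value pairs.
    keys = dict(pair.split("=", 1) for pair in text.split("\0") if "=" in pair)
    return pdu[36], pdu[37], keys


def redirect_target(pdu):
    """Return (reason, TargetAddress) for a redirect response, else None."""
    status_class, detail, keys = parse_login_response(pdu)
    if status_class == STATUS_CLASS_REDIRECTION and detail in REDIRECT_DETAILS:
        return REDIRECT_DETAILS[detail], keys.get("TargetAddress")
    return None


def build_sample_redirect(address):
    """Build a synthetic redirect response for the demo (not a real capture)."""
    data = ("TargetAddress=%s" % address).encode("ascii") + b"\0"
    bhs = bytearray(BHS_LEN)
    bhs[0] = LOGIN_RESPONSE_OPCODE
    bhs[5:8] = len(data).to_bytes(3, "big")
    bhs[36] = STATUS_CLASS_REDIRECTION
    bhs[37] = 0x01
    return bytes(bhs) + data


if __name__ == "__main__":
    print(redirect_target(build_sample_redirect("10.0.0.241:3260,1")))
```

An initiator that handles the redirect is expected to open a new connection to the returned address and repeat the login there, which is the step that never shows up in the capture described above.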

Ideas?

-- Pasi

Mike Christie

Feb 3, 2006, 11:01:27 PM
to open-...@googlegroups.com
Pasi Kärkkäinen wrote:
>
> The same problem happens with Linux 2.6.15 + open-iscsi v1.0 r485.
>
> Looking at the ethereal capture, the login process fails when EqualLogic
> target sends "iSCSI Login Response (Target moved temporarily)".
>
> That happens because of automatic connection load balancing in EqualLogic target.
> Initiators connect to the group (load balancing) IP which then redirects to
> the real target interfaces.
>
> Open-iscsi initiator seems to ask ARP for the correct target interface IP
> (which it got from the "Target moved temporarily" message), but it doesn't send
> *anything* to the real target IP..
>
> So the problem is related to the "Target moved temporarily" handling in
> open-iscsi.

For some reason it looks like the response pdu is not getting to the
daemon. This is not a fix, but could you try this out? It should work
with svn or open-iscsi v1.0 r485. Thanks.

out.state.patch

Mike Christie

Feb 3, 2006, 11:18:53 PM
to open-...@googlegroups.com
> ------------------------------------------------------------------------
>
> Index: kernel/iscsi_tcp.c
> ===================================================================
> --- kernel/iscsi_tcp.c (revision 485)
> +++ kernel/iscsi_tcp.c (working copy)
> @@ -1240,13 +1240,7 @@
>      conn = (struct iscsi_conn*)sk->sk_user_data;
>      session = conn->session;
>
> -    if ((sk->sk_state == TCP_CLOSE_WAIT ||
> -        sk->sk_state == TCP_CLOSE) &&
> -        !atomic_read(&sk->sk_rmem_alloc)) {
> -            debug_tcp("iscsi_tcp_state_change: TCP_CLOSE|TCP_CLOSE_WAIT\n");
> -            iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
> -    }
> -
> +    printk(KERN_ERR "iscsi_tcp_state_change state %d\n", sk->sk_state);
>      old_state_change = conn->old_state_change;
>
>      read_unlock(&sk->sk_callback_lock);

I guess if this works, maybe we want to be taking the sk_callback_lock
with write_lock, because we could be setting the suspend bit if we call
iscsi_conn_failure.

Pasi Kärkkäinen

Feb 4, 2006, 7:04:42 AM
to open-...@googlegroups.com


I tried this.. the patch didn't apply to v1.0, but I did the same changes
manually.

I also defined DEBUG_SCSI and DEBUG_TCP.

The connection still fails.

$ iscsiadm -m node --record b66f1a --login
iscsiadm: iscsid reported error (11 - iSCSI PDU timed out)


dmesg:
scsi13 : iSCSI Initiator over TCP/IP, v.0.3
iscsi13: detected conn error (1011)


iscsid -d9 log available from:
http://nrg.joroinen.fi/iscsid-v1.0-d9.log


-Pasi

Mike Christie

Feb 4, 2006, 5:14:03 PM
to open-...@googlegroups.com

Is this all you have for the kernel messages? Maybe you need to get the
KERN_DEBUG messages.

Robin H. Johnson

Feb 4, 2006, 5:58:58 PM
to open-...@googlegroups.com
On Fri, Feb 03, 2006 at 10:01:27PM -0600, Mike Christie wrote:
> >So the problem is related to the "Target moved temporarily" handling in
> >open-iscsi.
I've got an EqualLogic 100E-00 target, that exhibits the identical
behavior as described by the above poster.

My copy of the regression test is slightly modified. I've got the
fdisk/mkfs/bonnie turned off, as I've got a need for raw disk (also
it's got a patch for parallel runs that I should submit)
Additionally I have a wrapper script that runs the regression test in a
loop, logging to a new sequential file each time.

Your patch does appear to help a lot (I can complete 3 parallel
regression runs 95% of the time now, where it died much earlier in them
before), but things still aren't 100%.

I get a LOT of 'iscsi_tcp_state_change state 8' output, esp. around the
spots that iscsiadm is changing parameters.
For 3 parallel runs, there are 500+ instances of the message over the course
of the hour that regression.sh takes to run.

However if I modify the regression test to just change parameters, and
not actually touch the device at all, I see
'iscsi_tcp_state_change state 7' instead.
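
The numbers printed by the test patch are the kernel's TCP socket states, so 7 is TCP_CLOSE and 8 is TCP_CLOSE_WAIT. A small Python sketch for annotating such log lines follows; the state values are the standard Linux enumeration, while the `annotate` helper is hypothetical.

```python
import re

# Linux TCP socket state numbers.
TCP_STATES = {
    1: "TCP_ESTABLISHED",
    2: "TCP_SYN_SENT",
    3: "TCP_SYN_RECV",
    4: "TCP_FIN_WAIT1",
    5: "TCP_FIN_WAIT2",
    6: "TCP_TIME_WAIT",
    7: "TCP_CLOSE",
    8: "TCP_CLOSE_WAIT",
    9: "TCP_LAST_ACK",
    10: "TCP_LISTEN",
    11: "TCP_CLOSING",
}

STATE_RE = re.compile(r"iscsi_tcp_state_change state (\d+)")


def annotate(line):
    """Append the symbolic state name to a state-change log line."""
    match = STATE_RE.search(line)
    if not match:
        return line
    name = TCP_STATES.get(int(match.group(1)), "unknown")
    return "%s (%s)" % (line, name)


if __name__ == "__main__":
    print(annotate("iscsi_tcp_state_change state 8"))
    print(annotate("iscsi_tcp_state_change state 7"))
```

TCP_CLOSE_WAIT means the peer has sent its FIN while the local side still holds the socket open, which fits a target that closes the connection first.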

Grepping the regression logs for other error output, here are the counts
for the hour.
2 iscsiadm: got read error (0), daemon died?
4 iscsiadm: iscsid reported error (11 - iSCSI PDU timed out)
8 iscsiadm: iscsid reported error (5 - encountered iSCSI login failure)
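
Counts like the ones above can be gathered with a few lines of Python. This is only a sketch of one way to do it; the function name and the sample lines are invented for the example, and the actual format of the regression logs is an assumption.

```python
from collections import Counter


def count_iscsiadm_errors(lines):
    """Count distinct 'iscsiadm: ...' messages found in an iterable of lines."""
    counts = Counter()
    for line in lines:
        idx = line.find("iscsiadm: ")
        if idx != -1:
            counts[line[idx:].strip()] += 1
    return counts


# Hypothetical log excerpt, not taken from the real regression runs.
sample_log = [
    "run 1: iscsiadm: iscsid reported error (11 - iSCSI PDU timed out)",
    "run 1: iscsiadm: iscsid reported error (5 - encountered iSCSI login failure)",
    "run 2: iscsiadm: iscsid reported error (11 - iSCSI PDU timed out)",
    "run 2: mkfs skipped",
]

if __name__ == "__main__":
    for message, count in count_iscsiadm_errors(sample_log).most_common():
        print(count, message)
```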

--
Robin Hugh Johnson
E-Mail : rob...@gentoo.org
GnuPG FP : 11AC BA4F 4778 E3F6 E4ED F38E B27B 944E 3488 4E85

Wang, Zhenyu Z

Feb 5, 2006, 12:50:05 AM
to open-...@googlegroups.com
pa...@iki.fi wrote:
> On Fri, Feb 03, 2006 at 10:01:27PM -0600, Mike Christie wrote:
>> Pasi Kärkkäinen wrote:
>>>
>>> The same problem happens with Linux 2.6.15 + open-iscsi v1.0 r485.
>>>
>>> Looking at the ethereal capture, the login process fails when
>>> EqualLogic target sends "iSCSI Login Response (Target moved
>>> temporarily)".
>>>
>>> That happens because of automatic connection load balancing in
>>> EqualLogic target. Initiators connect to the group (load balancing)
>>> IP which then redirects to the real target interfaces.
>>>
>>> Open-iscsi initiator seems to ask ARP for the correct target
>>> interface IP (which it got from the "Target moved temporarily"
>>> message), but it doesn't send *anything* to the real target IP..
>>>
>>> So the problem is related to the "Target moved temporarily"
>>> handling in open-iscsi.

I remember that when I did the login redirect work, Tomof had an Equallogic
and tested my patch successfully with it. I'm not sure why it doesn't work for you.

But it looks like a race in which the daemon gets the TCP disconnect message
before the login response PDU. Could you post a tcpdump or Ethereal capture? Thanks.

>>
>> For some reason it looks like the response pdu is not getting to the
>> daemon. This is not a fix, but could you try this out? It should work
>> with svn or open-iscsi v1.0 r485. Thanks.
>
>
> I tried this.. the patch didn't apply to v1.0, but I did the same
> changes manually.
>
> I also defined DEBUG_SCSI and DEBUG_TCP.
>
> The connection still fails.
>
> $ iscsiadm -m node --record b66f1a --login

> iscsiadm: iscsid reported error (11 - iSCSI PDU timed out)
>
>

> dmesg:
> scsi13 : iSCSI Initiator over TCP/IP, v.0.3
> iscsi13: detected conn error (1011)
>
>

Pasi Kärkkäinen

Feb 5, 2006, 4:13:57 PM
to open-...@googlegroups.com

Yep, that's all.

I tried again, started with fresh v1.0 tarball, patched iscsi_tcp.c with
your patch and defined DEBUG_SCSI and DEBUG_TCP. Compiled, and loaded the
modules. Did I miss something?

Still nothing in dmesg or syslog.. I wonder why..

- Pasi

Mike Christie

Feb 6, 2006, 12:33:45 PM
to open-...@googlegroups.com
Robin H. Johnson wrote:
> On Fri, Feb 03, 2006 at 10:01:27PM -0600, Mike Christie wrote:
>
>>>So the problem is related to the "Target moved temporarily" handling in
>>>open-iscsi.
>
> I've got an EqualLogic 100E-00 target, that exhibits the identical
> behavior as described by the above poster.
>
> My copy of the regression test is slightly modified. I've got the
> fdisk/mkfs/bonnie turned off, as I've got a need for raw disk (also
> it's got a patch for parallel runs that I should submit)
> Additionally I have a wrapper script that runs the regression test in a
> loop, logging to a new sequential file each time.
>
> Your patch does appear to help a lot (I can complete 3 parallel
> regression runs 95% of the time now, where it died much earlier in them
> before), but things still aren't 100%.

So before my patch, you could not log in at all. Now you can log in but
we have similar problems later on? Am I understanding that right? When
you run the regression tests and you can login initially ok but it later
fails, do you see any async message PDUs being sent from the target to the
initiator (something about the connection or session being dropped or
logged out)?

Robin H. Johnson

Feb 7, 2006, 12:47:19 AM
to open-...@googlegroups.com
On Mon, Feb 06, 2006 at 11:33:45AM -0600, Mike Christie wrote:
> >Your patch does appear to help a lot (I can complete 3 parallel
> >regression runs 95% of the time now, where it died much earlier in them
> >before), but things still aren't 100%.
> So before my patch, you could not log in at all. Now you can log in
> but we have similar problems later on? Am I understanding that right?
I could login before, but I couldn't even complete a single regression
pass reliably (non-parallel) before I ran into [detected conn error
(1011)] errors due to the target moving. I would note that I did not
have dedicated access to the target at any point, so I couldn't test it
without other users (and I suspect their traffic caused the target to
want my connection moved).

Running multiple passes in parallel triggered the error reasonably
quickly, within 10 minutes of starting, but the quantity of traffic
made it very tough to capture iSCSI packets without running out of disk
space.

> When you run the regression tests and you can login initially ok but
> it later fails, do you any async message PDUs being sent from the
> target to the initiator (something about the connection or session
> being dropped or logged out)?

I did some initial debugging of this last November with Zhen, and found
an async logout. Zhen's async patch '[RFC] handler for reject and async
event pdu' and its later editions that added a stub
'iscsi_async_event_rsp' did help a little as well, but left more
questions unanswered than not, and I suspect that the union of them and
your testing patch from this thread may resolve the issue completely.

(Thereafter I didn't have any access to the initiator box and target
box for most of December and January, and I've come back to the fray
now).

Pasi Kärkkäinen

Feb 7, 2006, 2:39:21 AM
to open-...@googlegroups.com
On Mon, Feb 06, 2006 at 09:47:19PM -0800, Robin H. Johnson wrote:
> On Mon, Feb 06, 2006 at 11:33:45AM -0600, Mike Christie wrote:
> > >Your patch does appear to help a lot (I can complete 3 parallel
> > >regression runs 95% of the time now, where it died much earlier in them
> > >before), but things still aren't 100%.
> > So before my patch, you could not log in at all. Now you can log in
> > but we have similar problems later on? Am I understanding that right?
> I could login before, but I couldn't even complete a single regression
> pass reliably (non-parallel) before the I went to errors of [detected
> conn error (1011)] due to the target moving - I would note in this that
> I did not have dedicated access to the target at any point, so I
> couldn't test it without other users (and I suspect their traffic
> caused the target to want my connection moved).
>

So, you have EqualLogic PS-100E.. what firmware?

What version of open-iscsi?

I'm asking because I can't login at all.. I have PS-200E with v2.2.6
firmware, and open-iscsi v1.0.

> I did some initial debugging this some last November with zhen, and found
> an async logout. Zhen's async patch '[RFC] handler for reject and async
> event pdu' and it's later editions that added a stub
> 'iscsi_async_event_rsp' did help a little as well, but left more
> questions unanswered than not, and I suspect that the union of them and
> your testing patch from this thread may resolve the issue completely.
>
> (Thereafter I didn't have any access to the initiator box and target
> box for most of December and January, and I've come back to the fray
> now).
>

Nice to have others solving these problems too.. :)

-- Pasi

Robin H. Johnson

Feb 7, 2006, 3:22:43 AM
to open-...@googlegroups.com
On Tue, Feb 07, 2006 at 09:39:21AM +0200, Pasi Kärkkäinen wrote:
> So, you have EqualLogic PS-100E.. what firmware?
V2.2.3 (R75).

> What version of open-iscsi?
I keep up with SVN HEAD, which currently matches v1.0.

> I'm asking because I can't login at all.. I have PS-200E with v2.2.6
> firmware, and open-iscsi v1.0.

I'm not aware of what changes were made between 2.2.3 and 2.2.6, but I
certainly haven't ever had problems logging in - but I don't use any
authentication in my environment.

Pasi Kärkkäinen

Feb 7, 2006, 5:04:26 AM
to open-...@googlegroups.com
On Sun, Feb 05, 2006 at 01:50:05PM +0800, Wang, Zhenyu Z wrote:
>
> pa...@iki.fi wrote:
> > On Fri, Feb 03, 2006 at 10:01:27PM -0600, Mike Christie wrote:
> >> Pasi Kärkkäinen wrote:
> >>>
> >>> The same problem happens with Linux 2.6.15 + open-iscsi v1.0 r485.
> >>>
> >>> Looking at the ethereal capture, the login process fails when
> >>> EqualLogic target sends "iSCSI Login Response (Target moved
> >>> temporarily)".
> >>>
> >>> That happens because of automatic connection load balancing in
> >>> EqualLogic target. Initiators connect to the group (load balancing)
> >>> IP which then redirects to the real target interfaces.
> >>>
> >>> Open-iscsi initiator seems to ask ARP for the correct target
> >>> interface IP (which it got from the "Target moved temporarily"
> >>> message), but it doesn't send *anything* to the real target IP..
> >>>
> >>> So the problem is related to the "Target moved temporarily"
> >>> handling in open-iscsi.
>
> I did remember that when I did login redirect work, Tomof has a Equallogic
> and he tested my patch fine with it. Not sure about why it doesn't work for you.
>
> But it seems racing that daemon got tcp disconn message before login
> rsp pdu. You'd better post tcpdump or ethereal capture, thanks.
>

$ iscsiadm -m discovery -t sendtargets -p 10.0.0.245:3260
[6f4fb5] 10.0.0.245:3260,0 iqn.equallogic:6-8a0900-c4f432c01-a210000003143e86-testvol01

$ iscsiadm -m node -r 6f4fb5 -l
iscsiadm: iscsid reported error (11 - iSCSI PDU timed out)

dmesg:
scsi5 : iSCSI Initiator over TCP/IP, v.0.3
iscsi5: detected conn error (1011)

http://nrg.joroinen.fi/iscsid-d9-equallogic.txt
http://nrg.joroinen.fi/open-iscsi-v1.0-equallogic-login-error.pcap

There you are.. and I'm running open-iscsi v1.0 with 2.6.15-1-k7 Debian
kernel.

IP-addresses:

initiator: 10.0.0.10
target group IP: 10.0.0.245
target-if1 IP: 10.0.0.241
target-if2 IP: 10.0.0.242
target-if3 IP: 10.0.0.243

Pasi Kärkkäinen

Feb 7, 2006, 5:06:16 AM
to open-...@googlegroups.com
On Tue, Feb 07, 2006 at 12:22:43AM -0800, Robin H. Johnson wrote:
> On Tue, Feb 07, 2006 at 09:39:21AM +0200, Pasi Kärkkäinen wrote:
> > So, you have EqualLogic PS-100E.. what firmware?
> V2.2.3 (R75).
>

OK.



> > What version of open-iscsi?
> I keep up with SVN HEAD, which currently matches v1.0.
>

OK.

> > I'm asking because I can't login at all.. I have PS-200E with v2.2.6
> > firmware, and open-iscsi v1.0.
> I'm not aware of what changes were made between 2.2.3 and 2.2.6, but I
> certainly haven't ever had problems logging in - but I don't use any
> authentication in my environment.
>

Hmm.. I'm not using CHAP authentication either.

I'm only using access restrictions based on initiator IP address
(currently).

Maybe I should try with unrestricted volumes (in the target)..

Currently I cannot login at all.. I just posted an Ethereal dump of the failing
login process; hopefully that will help with debugging the problem.

-- Pasi

Robin H. Johnson

Feb 7, 2006, 7:12:40 PM
to open-...@googlegroups.com
On Tue, Feb 07, 2006 at 12:06:16PM +0200, Pasi Kärkkäinen wrote:
> > I'm not aware of what changes were made between 2.2.3 and 2.2.6, but I
> > certainly haven't ever had problems logging in - but I don't use any
> > authentication in my environment.
> Hmm.. I'm not using CHAP authentication either.
>
> I'm only using access restrictions based on initiator IP address
> (currently).
>
> Maybe I should try with unrestricted volumes (in the target)..
Ah. I did see one related thing in my usage.
My initiator machines all have two GigE interfaces, and when I messed
up the ACL list at one point, it only had an IP for one of the
interfaces in a machine, and that caused random login errors.

We moved to restricting on initiator name instead to prevent future
occurrences.

Pasi Kärkkäinen

Feb 8, 2006, 3:31:27 PM
to open-...@googlegroups.com

If that Ethereal capture is not enough, please tell me.. I'm able to do new
captures quite easily.

-- Pasi

Wang Zhenyu

Feb 15, 2006, 2:33:04 AM
to open-...@googlegroups.com

This still seems to be a problem, right, Pasi?
From your iscsid log, the code that handles the temporary target move isn't
called, but the server has closed the connection, so the initiator doesn't go
any further.

How about this? In the TCP state change callback, we check whether we're in
the normal (logged-in) state; otherwise we just ignore the callback.

---
--- kernel/iscsi_tcp.c.orig 2006-02-15 14:38:43.000000000 +0800
+++ kernel/iscsi_tcp.c 2006-02-15 15:12:26.000000000 +0800
@@ -1242,7 +1242,8 @@ iscsi_tcp_state_change(struct sock *sk)

     if ((sk->sk_state == TCP_CLOSE_WAIT ||
         sk->sk_state == TCP_CLOSE) &&
-        !atomic_read(&sk->sk_rmem_alloc)) {
+        !atomic_read(&sk->sk_rmem_alloc) &&
+        (session->state == ISCSI_STATE_LOGGED_IN)) {
         debug_tcp("iscsi_tcp_state_change: TCP_CLOSE|TCP_CLOSE_WAIT\n");
         iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
     }
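
The effect of the extra condition can be seen in a small user-space model. The Python sketch below mirrors only the boolean test from the diff. It is not kernel code, the `SessionState` members other than `LOGGED_IN` are placeholders, and `rmem_alloc` stands in for the value read from `sk->sk_rmem_alloc`.

```python
from enum import Enum

TCP_CLOSE = 7
TCP_CLOSE_WAIT = 8
TCP_ESTABLISHED = 1


class SessionState(Enum):
    IN_LOGIN = "in_login"    # placeholder for any state before login completes
    LOGGED_IN = "logged_in"  # models ISCSI_STATE_LOGGED_IN


def conn_failure_fires(sk_state, rmem_alloc, session_state, patched):
    """Model of the test that guards iscsi_conn_failure() in the callback."""
    closing = sk_state in (TCP_CLOSE, TCP_CLOSE_WAIT) and rmem_alloc == 0
    if patched:
        # The patch additionally requires a logged-in session.
        return closing and session_state is SessionState.LOGGED_IN
    return closing


if __name__ == "__main__":
    # Target closes the socket while the login (redirect) is still in progress.
    for patched in (False, True):
        fired = conn_failure_fires(TCP_CLOSE_WAIT, 0, SessionState.IN_LOGIN, patched)
        print("patched=%s fires=%s" % (patched, fired))
```

With the patch, a close during login no longer raises the connection error from this callback, so handling of the close is left to the userspace daemon. A close on a logged-in session is still reported as before.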

Pasi Kärkkäinen

Feb 15, 2006, 1:03:29 PM
to open-...@googlegroups.com

Yep, the problem is not yet resolved.

I also sent ethereal packet capture dump last week to this list.

I'm able to test your patch probably tomorrow. I'll let you know if it
helps!

Thanks!

- Pasi

Pasi Kärkkäinen

Feb 17, 2006, 12:32:52 PM
to open-...@googlegroups.com
On Wed, Feb 15, 2006 at 08:03:29PM +0200, Pasi Kärkkäinen wrote:
>
> On Wed, Feb 15, 2006 at 03:33:04PM +0800, Wang Zhenyu wrote:
> >
> > > >
> > > > Did you also send the kernel log parts? I ran with a "Vendor: EQLOGIC
> > > > Model: 100E-00" and it seems to work ok. It was very slow because I was
> > > > in minneapolis MN and the target was in westford MA but I could fdisk,
> > > > and do mkfs.
> > > >
> > >
> > > Hmm.. Could you also check your open-iscsi + kernel versions ? Would be nice
> > > to know which version of open-iscsi works out-of-the-box with EqualLogic
> > > targets.
> > >
> >
> > This seems still a problem right? Pasi?
> > From your iscsid log, code to do target temp move isn't called, but server
> > has closed conn by which initiator won't go further any more.
> >
>
> Yep, the problem is not yet resolved.
>
> I also sent ethereal packet capture dump last week to this list.
>
> I'm able to test your patch probably tomorrow. I'll let you know if it
> helps!
>
> > How about this? In tcp state change, we check if we're in normal state,
> > otherwise just ignore this callback.
> >

With this patch I can now successfully mount a volume from the Equallogic target!!

Thanks!

I'll do some stress testing next week and let you know how it goes..

- Pasi

Mike Christie

Feb 17, 2006, 11:37:59 PM
to open-...@googlegroups.com

This is strange. Zhen's patch makes it so it does not fire only when we
are not yet logged in. I took the big hammer out and removed all that
code so that iscsi_conn_failure would never fire here, which would of
course include Zhen's case. But my patch did not work in handling this?

Mike Christie

Feb 17, 2006, 11:39:58 PM
to open-...@googlegroups.com

rm that "not yet"

Pasi Kärkkäinen

Feb 18, 2006, 7:41:51 AM
to open-...@googlegroups.com

I just figured out I had a backup copy of the original 2.6.15 iscsi modules under
/lib/modules/ (sigh), and that was the reason why I didn't get any debug
output.. the original modules were loaded :( and that's the reason why your
patch didn't help.
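
A stale copy like this is easy to miss. One way to spot it is to list every copy of the module file under the running kernel's module tree before loading. The Python sketch below does that with the standard library only; the function name is made up, and the default file name `iscsi_tcp.ko` is an assumption about how the module is installed.

```python
import os


def find_module_copies(root, filename="iscsi_tcp.ko"):
    """Return every path under root whose file name equals filename."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if filename in filenames:
            hits.append(os.path.join(dirpath, filename))
    return sorted(hits)


if __name__ == "__main__":
    tree = os.path.join("/lib/modules", os.uname().release)
    copies = find_module_copies(tree)
    for path in copies:
        print(path)
    if len(copies) > 1:
        print("warning: more than one copy found; check which one gets loaded")
```

More than one hit means the module loader may pick a different file than the one that was just rebuilt, which would explain missing debug output.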

Should I try your patch also?

-- Pasi

Mike Christie

Feb 19, 2006, 6:07:31 PM
to open-...@googlegroups.com

no, my patch was just a hack to make sure we were racing with the login
code.

thanks for the follow up.

Wang Zhenyu

Feb 20, 2006, 2:32:10 AM
to open-...@googlegroups.com
On 2006.02.19 17:07:31 +0000, Mike Christie wrote:
> >>>>>---
> >>>>>--- kernel/iscsi_tcp.c.orig 2006-02-15 14:38:43.000000000 +0800
> >>>>>+++ kernel/iscsi_tcp.c 2006-02-15 15:12:26.000000000 +0800
> >>>>>@@ -1242,7 +1242,8 @@ iscsi_tcp_state_change(struct sock *sk)
> >>>>>
> >>>>> if ((sk->sk_state == TCP_CLOSE_WAIT ||
> >>>>> sk->sk_state == TCP_CLOSE) &&
> >>>>>- !atomic_read(&sk->sk_rmem_alloc)) {
> >>>>>+ !atomic_read(&sk->sk_rmem_alloc) &&
> >>>>>+ (session->state == ISCSI_STATE_LOGGED_IN)) {
> >>>>> debug_tcp("iscsi_tcp_state_change: TCP_CLOSE|TCP_CLOSE_WAIT\n");
> >>>>> iscsi_conn_failure(conn, ISCSI_ERR_CONN_FAILED);
> >>>>> }
> >>>
> >>>
> >>This is strange. Zhen's patch makes it so it does not fire only when we
> >>are not yet logged in. I took the big hammer out and removed all that
> >>code so that iscsi_conn_failure would never fire here, which would of
> >>course include Zhen's case. But my patch did not work in handling this?
> >>
> >
> >
> > I just figured out I had backup copy of original 2.6.15 iscsi modules under
> > /lib/modules/ (sigh), and that was the reason why I didn't get any debug
> > output.. the original modules were loaded :( and that's the reason why your
> > patch didn't help.
> >
> > Should I try your patch also?
> >
>
> no, my patch was just a hack to make sure we were racing with the login
> code.
>
> thanks for the follow up.
>

Mike, I wonder whether we should do bind_conn that early. In the case of a
login redirect we'll possibly recreate the socket. I'm not quite sure about this...
But the race is definitely there, so my patch ignores the conn failure
when the session is not logged in, and it depends on the userspace daemon to
handle all the cases (redirect, timeout, etc.). Is this OK?

Mike Christie

Feb 20, 2006, 1:57:28 PM
to open-...@googlegroups.com

yeah, agree on the race.

> when session is not logged in, that depends on userspace daemon to handle all
> the cases(redirect, timeout, etc). Is this ok?
>

I am still thinking about it. We can either overhaul the conn fail path
to remove the race or for the short term go your route.

The only problem I hit with the patch is that the test you added makes some
login failures take a lot longer (minutes instead of seconds). In one of the
test setups I use to verify patches, that code was detecting network
problems and would bail us out right away. My case is probably not a big
deal if it comes down to getting this to work for EQL vs. making failures
faster.

Pasi Kärkkäinen

Feb 25, 2006, 6:14:19 AM
to open-...@googlegroups.com

At least I'd hope to get the login stuff working with Equallogic targets :)

Is there no way to get both: login working and failure detection working
without long delays?

-- Pasi

^
. .
Linux
/ - \
Choice.of.the
.Next.Generation.
