Re: iscsi connection errors

Paul_...@dell.com

Oct 5, 2012, 5:29:48 PM
to open-...@googlegroups.com

On Oct 5, 2012, at 3:39 PM, squadra wrote:

> Hi,
>
> from time to time i see connection errors like this to our equallogic 6100xv / 4100e stack.
>
> Oct 5 21:22:20 xxx kernel: connection4:0: detected conn error (1020)
> Oct 5 21:22:21 xxx iscsid: Kernel reported iSCSI connection 4:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
> Oct 5 21:22:23 xxx kernel: connection4:0: detected conn error (1020)
> Oct 5 21:22:24 xxx iscsid: connection4:0 is operational after recovery (1 attempts)
>
> any ideas what this error code means?
>
> cheers,
>
> Juergen

I wonder if that is a connection close due to an async logout request from the array, which is what it does if it wants to move a connection to another port.

If yes, then that's a bad message from the iscsi kernel code: an async logout is not an error and logging it with "error" in the text is incorrect.

paul

Michael Christie

Oct 5, 2012, 11:15:03 PM
to open-...@googlegroups.com
On Oct 5, 2012, at 2:39 PM, squadra <j...@internetx.de> wrote:

> Hi,
>
> from time to time i see connection errors like this to our equallogic 6100xv / 4100e stack.
>
> Oct  5 21:22:20 xxx kernel: connection4:0: detected conn error (1020)


Do you see something before this? Maybe something about a nop/ping timing out, or as Paul mentioned something about the target wanting to logout or dropping the connections?

If not, on the target log, do you see something about the target closing the connection?

> Oct  5 21:22:21 xxx iscsid: Kernel reported iSCSI connection 4:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)

It means the target closed the connection.
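A minimal sketch for pulling the numeric code out of the kernel "detected conn error" lines and giving it a name. The 1020 mapping comes from the iscsid message quoted above. The 1011 entry is an assumption from reading the open-iscsi headers and should be checked against your version. The function name and the reading of the two numbers after "connection" as session and connection IDs are also assumptions made for illustration.

```python
import re

# Assumed subset of the iSCSI error codes.
# 1020 is confirmed by the iscsid log line in this thread; 1011 is an
# assumption taken from the open-iscsi headers and should be verified.
ISCSI_ERRORS = {
    1011: "ISCSI_ERR_CONN_FAILED",
    1020: "ISCSI_ERR_TCP_CONN_CLOSE",
}

CONN_ERROR_RE = re.compile(
    r"connection(\d+):(\d+): detected conn error \((\d+)\)"
)


def decode_conn_error(line):
    """Return (session id, connection id, code, name), or None if no match."""
    match = CONN_ERROR_RE.search(line)
    if match is None:
        return None
    sid, cid, code = (int(group) for group in match.groups())
    return sid, cid, code, ISCSI_ERRORS.get(code, "unknown")


if __name__ == "__main__":
    sample = "Oct  5 21:22:20 xxx kernel: connection4:0: detected conn error (1020)"
    print(decode_conn_error(sample))
```

Codes missing from the table come back as "unknown" rather than being guessed.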

> Oct  5 21:22:23 xxx kernel: connection4:0: detected conn error (1020)
> Oct  5 21:22:24 xxx iscsid: connection4:0 is operational after recovery (1 attempts)
>
> any ideas what this error code means?
>
> cheers,
>
> Juergen

--
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To view this discussion on the web visit https://groups.google.com/d/msg/open-iscsi/-/vggeBH8Nc_MJ.
To post to this group, send email to open-...@googlegroups.com.
To unsubscribe from this group, send email to open-iscsi+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.

Michael Christie

Oct 7, 2012, 7:26:43 PM
to open-...@googlegroups.com

On Oct 6, 2012, at 2:08 AM, rok...@gmail.com wrote:

> Hello,
>
> I have a similar error: when I try to discover another LUN, I lose the connection to the current LUN using multipath. The log says this:


Your error is nothing like what was being discussed in this thread. Changing subject for you :)

What target are you using?

What version of open-iscsi? What kernel? What userspace tools?

When you see this, what is in the target's logs?

How are you discovering another LUN? What command are you running?


>
> Oct 6 07:56:21 robin kernel: connection1:0: pdu (op 0x3000003d itt 0x1) rejected. Reason code 0x4

> Oct 6 07:56:51 robin kernel: connection1:0: pdu (op 0x3000004d itt 0x1) rejected. Reason code 0x4
> Oct 6 07:57:21 robin kernel: connection1:0: pdu (op 0x30000077 itt 0x1) rejected. Reason code 0x4
> Oct 6 07:57:51 robin kernel: connection1:0: pdu (op 0x30000025 itt 0x1) rejected. Reason code 0x4
> Oct 6 07:58:21 robin kernel: connection1:0: pdu (op 0x30000070 itt 0x1) rejected. Reason code 0x4
> Oct 6 07:58:35 robin kernel: connection2:0: pdu (op 0x1000007e itt 0x40) rejected. Reason code 0x7
> Oct 6 07:58:51 robin kernel: connection1:0: pdu (op 0x3000005a itt 0x1) rejected. Reason code 0x4

The target did not like something we did. It is reporting a protocol error.

We have never seen this error before. The initiator just logs the error and does nothing.
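A minimal sketch for decoding the "Reason code" field in these "pdu ... rejected" lines. The reason names are as I recall them from the Reject PDU description in RFC 3720, so treat the table as an assumption and check it against the RFC. The 0x4 entry agrees with the protocol error described above. The function and variable names are hypothetical.

```python
import re

# Reject reason codes as recalled from RFC 3720 (Reject PDU).
# This table is an assumption for illustration; verify it against the RFC.
REJECT_REASONS = {
    0x02: "Data (payload) digest error",
    0x03: "SNACK reject",
    0x04: "Protocol error",
    0x05: "Command not supported",
    0x06: "Immediate command reject",
    0x07: "Task in progress",
    0x08: "Invalid data ACK",
    0x09: "Invalid PDU field",
    0x0A: "Long operation reject",
    0x0B: "Negotiation reset",
    0x0C: "Waiting for logout",
}

REJECT_RE = re.compile(
    r"(connection\d+:\d+): pdu \(op (0x[0-9a-fA-F]+) itt (0x[0-9a-fA-F]+)\) "
    r"rejected\. Reason code (0x[0-9a-fA-F]+)"
)


def decode_reject(line):
    """Return (connection, op, itt, reason code, reason name), or None."""
    match = REJECT_RE.search(line)
    if match is None:
        return None
    conn, op, itt, reason = match.groups()
    code = int(reason, 16)
    return conn, op, itt, code, REJECT_REASONS.get(code, "unknown")


if __name__ == "__main__":
    sample = ("Oct 6 07:56:21 robin kernel: connection1:0: "
              "pdu (op 0x3000003d itt 0x1) rejected. Reason code 0x4")
    print(decode_reject(sample))
```

The op and itt fields are returned as the raw hex strings from the log, since how to interpret them is not established in this thread.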


> Oct 6 07:59:00 robin kernel: connection2:0: ping timeout of 15 secs expired, recv timeout 10, last rx 5477313, last ping 5477313, now 5479813

Target stopped responding to us.
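A minimal sketch for reading the numbers in the ping timeout line. It assumes the last three values (last rx, last ping, now) are kernel tick counters, so only their differences are meaningful. Converting ticks to seconds needs the kernel's tick rate, which is not stated in this thread, so the `hz` parameter is optional and any value passed in is the caller's assumption.

```python
import re

PING_RE = re.compile(
    r"ping timeout of (\d+) secs expired, recv timeout (\d+), "
    r"last rx (\d+), last ping (\d+), now (\d+)"
)


def parse_ping_timeout(line, hz=None):
    """Parse a ping timeout log line into a dict, or return None.

    hz is the assumed kernel tick rate; when given, elapsed times are
    also reported in seconds.
    """
    match = PING_RE.search(line)
    if match is None:
        return None
    ping_tmo, recv_tmo, last_rx, last_ping, now = (
        int(group) for group in match.groups()
    )
    info = {
        "ping_timeout_secs": ping_tmo,
        "recv_timeout_secs": recv_tmo,
        "ticks_since_last_rx": now - last_rx,
        "ticks_since_last_ping": now - last_ping,
    }
    if hz:
        info["secs_since_last_rx"] = (now - last_rx) / hz
        info["secs_since_last_ping"] = (now - last_ping) / hz
    return info


if __name__ == "__main__":
    sample = ("Oct 6 07:59:00 robin kernel: connection2:0: ping timeout of "
              "15 secs expired, recv timeout 10, last rx 5477313, "
              "last ping 5477313, now 5479813")
    print(parse_ping_timeout(sample))
```

For the line above, both gaps come out to 2500 ticks. If the tick rate were 100 per second (an assumption, not something confirmed here), that would be 25 seconds, which equals the recv timeout plus the ping timeout.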

> Oct 6 07:59:00 robin kernel: connection2:0: detected conn error (1011)


We dropped the connection. The scsi eh eventually runs and we cannot recover the device.


> Oct 6 07:59:21 robin kernel: connection1:0: pdu (op 0x30000079 itt 0x1) rejected. Reason code 0x4
> Oct 6 07:59:51 robin kernel: connection1:0: pdu (op 0x30000049 itt 0x1) rejected. Reason code 0x4
> Oct 6 08:00:21 robin kernel: connection1:0: pdu (op 0x30000069 itt 0x1) rejected. Reason code 0x4
> Oct 6 08:00:51 robin kernel: connection1:0: pdu (op 0x3000006d itt 0x1) rejected. Reason code 0x4
> Oct 6 08:01:21 robin kernel: connection1:0: pdu (op 0x3000002f itt 0x1) rejected. Reason code 0x4
> Oct 6 08:01:31 robin kernel: connection1:0: pdu (op 0x30000015 itt 0x40) rejected. Reason code 0x7
> Oct 6 08:01:51 robin kernel: connection1:0: pdu (op 0x30000065 itt 0x1) rejected. Reason code 0x4
> Oct 6 08:02:16 robin kernel: connection1:0: ping timeout of 15 secs expired, recv timeout 10, last rx 5496921, last ping 5494921, now 5499421
> Oct 6 08:02:16 robin kernel: connection1:0: detected conn error (1011)
> Oct 6 08:02:16 robin kernel: sd 13:0:0:0: Device offlined - not ready after error recovery
> Oct 6 08:02:16 robin last message repeated 8 times
> Oct 6 08:02:16 robin kernel: sd 13:0:0:0: [sdb] Unhandled error code
> Oct 6 08:02:16 robin kernel: sd 13:0:0:0: [sdb] Result: hostbyte=DID_ABORT driverbyte=DRIVER_OK
> Oct 6 08:02:16 robin kernel: sd 13:0:0:0: [sdb] CDB: Read(10): 28 00 0f c7 c6 48 00 00 40 00
> Oct 6 08:02:16 robin kernel: end_request: I/O error, dev sdb, sector 264750664
> Oct 6 08:02:16 robin kernel: sd 13:0:0:0: [sdb] Unhandled error code
> Oct 6 08:02:16 robin kernel: sd 13:0:0:0: [sdb] Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
> Oct 6 08:02:16 robin kernel: device-mapper: multipath: Failing path 8:16.
> Oct 6 08:02:16 robin kernel: sd 13:0:0:0: [sdb] CDB: Write(10): 2a 00 24 27 24 10 00 00 08 00
> Oct 6 08:02:16 robin kernel: end_request: I/O error, dev sdb, sector 606544912
>
> After that I can't access the LUN, so I need to reboot the server. Once it is rebooted, the server works and the multipath connections are 2 of 2.
>
> Some ideas?
>
> Best Regards.

squadra

Oct 8, 2012, 2:18:42 AM
to open-...@googlegroups.com, Paul_...@dell.com
Hello Paul,

we suspected something like that, too. that's why we disabled connection load balancing on the eql array, without success so far.

-- juergen

squadra

Oct 8, 2012, 2:21:48 AM
to open-...@googlegroups.com
Hello Mike,


On Saturday, October 6, 2012 at 05:15:11 UTC+2, Mike Christie wrote:
> On Oct 5, 2012, at 2:39 PM, squadra <j...@internetx.de> wrote:
>
>> Hi,
>>
>> from time to time i see connection errors like this to our equallogic 6100xv / 4100e stack.
>>
>> Oct  5 21:22:20 xxx kernel: connection4:0: detected conn error (1020)
>
> Do you see something before this? Maybe something about a nop/ping timing out, or as Paul mentioned something about the target wanting to logout or dropping the connections?

no, those messages are the first thing which pop up on the clients. the only thing i can see is that we ran a full backup at this time.

> If not, on the target log, do you see something about the target closing the connection?


the equallogic tells us this:

iSCSI session to target '192.168.xx.xx:3260, iqn.2001-05.com.equallogic:4-52aed6-...' from initiator '192.168.xxxx:48758, iqn.1994-05.com.redhat:xxx' was closed.
	iSCSI intra-group connection failure.
	Local reset initiated due to network errors.
 
on the switch side (a cisco 3750 stack) we don't see any drops/errors at all...

Donald Williams

Oct 8, 2012, 10:01:33 AM
to open-...@googlegroups.com
Hello, 

When you see these errors, look for an INFO: event from the EQL array of "Load Balancing request" or "Volume membership has changed". If you find one, then as Paul mentioned, these events should not be considered an error.

Re: Connection load balancing (CLB): this should NOT normally be disabled. Disabling it can reduce performance, because very busy sessions on the same physical port will have to share that single port while other ports may be available to better balance out the load.

If you have more than three members in a pool, as blocks are balanced between members, log out requests will still occur and those cannot be disabled.  

Regards, 

 Don 


Jose Joaquin Anton Herrerias

Oct 8, 2012, 3:42:11 AM
to open-...@googlegroups.com
Hello,

We are using XenServer 5.6 with open-iscsi-2.0.871-0.20.3.xs647. We use the XenServer tool to discover and attach the LUN. The kernel version is 2.6.32.12-0.7.1.xs5.6.100.307.170586xen. When I see this error in the log, I also see the connection to sdb being lost.

Oct 7 05:04:34 robin multipathd: sdb: tur checker reports path is down

I noticed that other servers using multipath create sdb, sdc, and dm-0 disks, whereas in this case only the sdb and sdc disks are created. Is that normal?

Thank you for your help.
Best Regards.

José J. Antón Herrerías
Technical Support Manager
jan...@abserver.es



Access Basic Server S.L. Elche Parque Industrial. C/Galileo Galilei, 12. 03203 Elche (Alicante) Tel. +34 96 568 29 04 Fax +34 96 568 35 30
Confidentiality notice: This message is intended solely for the named recipient. It may contain information that is confidential, proprietary to us, or legally protected. If you are not the recipient, please be advised that any access to, disclosure, copying, or distribution of the information, as well as any action or omission based on it, is prohibited and may be unlawful. If you have received this message in error, please forward it back to us and notify us immediately, deleting all copies from your system. Thank you.

(Before printing this message, make sure it is necessary. Protecting the environment is in our hands. Think globally, act locally.)

-----Original message-----
From: open-...@googlegroups.com [mailto:open-...@googlegroups.com] On behalf of Michael Christie
Sent: Monday, October 8, 2012 1:27
To: open-...@googlegroups.com
Subject: rejected pdu when doing discovery

Jose Joaquin Anton Herrerias

Oct 8, 2012, 4:38:06 AM
to open-...@googlegroups.com
Hello Michael,

I'm working on this error today and I reproduced it. This is everything I see in the messages log:

Oct 8 10:16:43 robin multipathd: Path event for 360050768028107669000000000000015, calling mpathcount
Oct 8 10:16:43 robin multipathd: sde: add path (operator)
Oct 8 10:16:43 robin multipathd: sde: spurious uevent, path already in pathvec
Oct 8 10:16:43 robin multipathd: sdd: add path (operator)
Oct 8 10:16:43 robin multipathd: sdd: spurious uevent, path already in pathvec
Oct 8 10:16:44 robin fe: 24553 (/opt/xensource/sm/LVMoISCSISR <methodCall><methodName>sr_attach</methodName><...) exitted with code 0
Oct 8 10:16:54 robin fe: 24945 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:17:24 robin fe: 24956 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:17:54 robin fe: 24974 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:18:24 robin fe: 24982 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:18:55 robin fe: 24994 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:19:25 robin fe: 25002 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:19:55 robin fe: 25012 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:20:25 robin fe: 25034 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:20:33 robin fe: 24549 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:20:33 robin fe: 24875 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:20:33 robin fe: 25043 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:20:55 robin fe: 25051 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:21:25 robin fe: 25061 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:21:55 robin fe: 25071 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:22:25 robin fe: 25081 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:22:55 robin fe: 25091 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:23:25 robin fe: 25108 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:23:55 robin fe: 25118 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:24:11 robin xenguest: Determined the following parameters from xenstore:
Oct 8 10:24:11 robin xenguest: vcpu/number:1 vcpu/affinity:0 vcpu/weight:0 vcpu/cap:0 nx: 1 viridian: 1 apic: 1 acpi: 1 pae: 1 acpi_s4: 0 acpi_s3: 0
Oct 8 10:24:11 robin fe: 25159 (/opt/xensource/libexec/xenguest -controloutfd 6 -controlinfd 7 -debuglog /tmp...) exitted with code 2
Oct 8 10:24:11 robin kernel: connection4:0: pdu (op 0x37 itt 0x1) rejected. Reason code 0x7
Oct 8 10:24:14 robin kernel: connection4:0: pdu (op 0x38 itt 0x1) rejected. Reason code 0x4
Oct 8 10:24:25 robin fe: 25206 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:24:55 robin fe: 25213 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:25:25 robin fe: 25229 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:25:33 robin fe: 25163 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:25:45 robin kernel: connection4:0: pdu (op 0x42 itt 0x1) rejected. Reason code 0x4
Oct 8 10:25:55 robin fe: 25245 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:26:25 robin fe: 25313 (/usr/sbin/stunnel -fd 6) exitted with code 0
Oct 8 10:26:31 robin kernel: INFO: task multipathd:7713 blocked for more than 120 seconds.
Oct 8 10:26:31 robin kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.


As you can see, there is an error before the PDU is rejected: Oct 8 10:20:33 robin fe: 25043 (/usr/sbin/stunnel -fd 6) exitted with code 0

Best Regards.

José J. Antón Herrerías
Technical Support Manager
jan...@abserver.es





-----Original message-----
From: open-...@googlegroups.com [mailto:open-...@googlegroups.com] On behalf of Michael Christie
Sent: Monday, October 8, 2012 1:27
To: open-...@googlegroups.com
Subject: rejected pdu when doing discovery


Michael Christie

Oct 8, 2012, 12:47:48 PM
to open-...@googlegroups.com

On Oct 8, 2012, at 3:38 AM, Jose Joaquin Anton Herrerias <Jan...@abserver.es> wrote:
> As you can see, there is an error before the PDU is rejected: Oct 8 10:20:33 robin fe: 25043 (/usr/sbin/stunnel -fd 6) exitted with code 0

It's not helpful. We want to know why the target is returning this error. What is in the target logs? Could you get a tcpdump trace at this time, so we can see what the target is seeing?
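A minimal sketch of assembling a tcpdump capture command for such a trace. It assumes the iSCSI traffic uses TCP port 3260, and the interface and output file names in the example are placeholders to replace with your own. The helper name is hypothetical. `-i`, `-s`, and `-w` are standard tcpdump options, and `port 3260` is an ordinary capture filter.

```python
import shlex

ISCSI_PORT = 3260  # assumed iSCSI port; adjust if the target listens elsewhere


def build_tcpdump_command(interface, outfile, port=ISCSI_PORT):
    """Build a tcpdump argument list that saves full packets for one port."""
    return [
        "tcpdump",
        "-i", interface,    # interface carrying the iSCSI traffic
        "-s", "0",          # do not truncate packets
        "-w", outfile,      # write raw packets to a capture file
        "port", str(port),  # capture filter: only this port
    ]


if __name__ == "__main__":
    # "xenbr1" and "iscsi-trace.pcap" are placeholder names.
    command = build_tcpdump_command("xenbr1", "iscsi-trace.pcap")
    print(shlex.join(command))
```

The list can be handed to `subprocess.run(command, check=True)` to start the capture. That normally needs root privileges, and the capture keeps running until it is interrupted, so start it before reproducing the rejected PDU and stop it afterwards.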

Michael Christie

Oct 8, 2012, 12:48:27 PM
to open-...@googlegroups.com
What target are you using? What is the vendor and model?


On Oct 8, 2012, at 2:42 AM, Jose Joaquin Anton Herrerias <Jan...@abserver.es> wrote:

> Hello,
>
> We are using XenServer 5.6 with open-iscsi-2.0.871-0.20.3.xs647. We use the XenServer tool to discover and attach the LUN. The kernel version is 2.6.32.12-0.7.1.xs5.6.100.307.170586xen. When I see this error in the log, I also see the connection to sdb being lost.
>
> Oct 7 05:04:34 robin multipathd: sdb: tur checker reports path is down
>
> I noticed that other servers using multipath create sdb, sdc, and dm-0 disks, whereas in this case only the sdb and sdc disks are created. Is that normal?
>

I am not sure what you are asking. Are you just saying dm-0 is not getting set up on one of the servers? Then no, that is not normal if you have multipathd set up and running properly.

Jose Joaquin Anton Herrerias

Oct 11, 2012, 10:43:49 AM
to open-...@googlegroups.com
Hello,

We are using two IBM X3850 x5 servers with a Storwize V7000. I am attaching network.txt, which is the dump produced by the command "tcpdump -i xenbr1 -w network.txt", and the messages file, which is the system log. In it you can see the PDU error and the dev dm-1 error. If you need more information or want to connect to the server, you can reach me by Skype or similar.

This hardware is in the lab and isn’t in production.

Best Regards.

José J. Antón Herrerías
Technical Support Manager
jan...@abserver.es




-----Original message-----
From: open-...@googlegroups.com [mailto:open-...@googlegroups.com] On behalf of Michael Christie
Sent: Monday, October 8, 2012 18:48
To: open-...@googlegroups.com
Subject: Re: rejected pdu when doing discovery

Mike Christie

Oct 11, 2012, 11:43:25 AM
to open-...@googlegroups.com, Jose Joaquin Anton Herrerias
On 10/11/2012 09:43 AM, Jose Joaquin Anton Herrerias wrote:
> Hello,
>
> We are using two IBM X3850 x5 servers with a Storwize V7000. I am attaching network.txt, which is the dump produced by the command "tcpdump -i xenbr1 -w network.txt", and the messages file, which is the system log. In it you can see the PDU error and the dev dm-1 error. If you need more information or want to connect to the server, you can reach me by Skype or similar.
>

The attachment did not come through.

You did not send the target logs.


> This hardware is in the lab and isn’t in production.

What do you mean? Is it a prototype or just not in production because it
is older?