Re: conn error (1011) under IO RH 6.2 and RH 5.6

Mike Christie

Jun 15, 2012, 1:10:48 PM

to open-...@googlegroups.com, marcus_49

On 06/15/2012 09:07 AM, marcus_49 wrote:
> Hello everyone,
>
> I am new to the world of iSCSI, but I hope I can provide some interesting
> problems for the "pros" here ;-)
>
> I get conn error (1011) when I run dd tests on partitions of a LUN;
> sometimes pinging the target channel IP stopped during the conn error
>

Just to make sure I understand that: you do the dd test below, and while
dd is running you run the ping command below (you were not referring to the
iSCSI nop, the "iSCSI ping" referenced in the log snippet, were you)? Then,
while running that, you get the iSCSI conn error 1011, and at that
same time the ping fails too (I'm not sure what exactly you meant by "the ping
stopped")?

>
>
> Jun 15 14:09:41 ivan kernel: connection14:0: detected conn error (1011)
> Jun 15 14:09:42 ivan iscsid: Kernel reported iSCSI connection 14:0 error (1011) state (3)
> Jun 15 14:09:45 ivan iscsid: connection14:0 is operational after recovery (1 attempts)
> Jun 15 14:09:55 ivan kernel: connection14:0: ping timeout of 5 secs expired, recv timeout 5, last rx 7303292903, last ping 7303297903, now 7303302903
>

Is there something before this log snippet? Did you get an iSCSI ping
timeout error before this too, or is the conn error the first error
message?

>
> Setup:
>
> - server ivan:
> kernel 2.6.18-238.el5, iscsi-utils iscsi-initiator-utils-6.2.0.872-6.el5
> device-mapper-multipath-0.4.7-48.el5_8.1, RedHat 5.6
>
> - server eric:
> same hard- and software
>
> - server alon:
> same hardware but newer RedHat 6.2
> kernel 2.6.32-220.el6, iscsi-initiator-utils-6.2.0.872-34.el6.x86_64
> device-mapper-multipath-0.4.9-46.el6.x86_64
> device-mapper-multipath-libs-0.4.9-46.el6.x86_64
>
> # Storage
> - Infortrend DS S16E-R1130
> (jumbo frames enabled, redundant Controller, 4 x 1GBit Channels/Controller)
> - Tests made only with Controller A
>
> I tested different things but the error still occurs;
> it makes no difference whether I configure:
>
>
> - /etc/sysctl.conf
> net.ipv4.tcp_window_scaling=0
> net.ipv4.conf.all.arp_ignore=1
> net.ipv4.conf.all.arp_announce=2
>
> - /etc/iscsi/iscsid.conf
> node.conn[0].timeo.noop_out_interval = 30
> node.conn[0].timeo.noop_out_timeout = 30

When you used these values did the ping timeout error change to indicate
that it took 60 secs? So it would look like:

> Jun 15 14:09:55 ivan kernel: connection14:0: ping timeout of 30 secs expired, recv timeout 30, last rx 7303292903, last ping 7303297903, now 7303302903

?
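A rough way to check which timeout was actually in effect is to compare the three jiffies values in the ping-timeout message. The Python sketch below assumes the kernel ticks at HZ=1000 (the 5-second example implies this, since its values are exactly 5000 ticks apart) and assumes that "ping timeout" corresponds to node.conn[0].timeo.noop_out_timeout and "recv timeout" to node.conn[0].timeo.noop_out_interval; `parse_ping_timeout` is a hypothetical helper, not part of open-iscsi.

```python
import re

# Kernel message format as it appears in the logs in this thread, e.g.
# "connection14:0: ping timeout of 5 secs expired, recv timeout 5,
#  last rx 7303292903, last ping 7303297903, now 7303302903"
PING_TIMEOUT_RE = re.compile(
    r"connection(?P<conn>\d+:\d+): ping timeout of (?P<ping>\d+) secs expired, "
    r"recv timeout (?P<recv>\d+), last rx (?P<rx>\d+), "
    r"last ping (?P<last_ping>\d+), now (?P<now>\d+)"
)

ASSUMED_HZ = 1000  # assumption: jiffies per second on this kernel


def parse_ping_timeout(line, hz=ASSUMED_HZ):
    """Extract the timeouts and convert the jiffies gaps to seconds.

    Returns None if the line is not a ping-timeout message.
    """
    m = PING_TIMEOUT_RE.search(line)
    if m is None:
        return None
    rx = int(m.group("rx"))
    last_ping = int(m.group("last_ping"))
    now = int(m.group("now"))
    return {
        "conn": m.group("conn"),
        "ping_timeout_s": int(m.group("ping")),   # assumed: noop_out_timeout
        "recv_timeout_s": int(m.group("recv")),   # assumed: noop_out_interval
        "rx_to_ping_s": (last_ping - rx) / hz,    # last receive -> nop-out sent
        "ping_to_now_s": (now - last_ping) / hz,  # nop-out left unanswered
    }


line = ("Jun 15 14:09:55 ivan kernel: connection14:0: ping timeout of 5 secs "
        "expired, recv timeout 5, last rx 7303292903, last ping 7303297903, "
        "now 7303302903")
print(parse_ping_timeout(line))
```

Run against the 30-second message logged later in the thread (last rx 7331162523, last ping 7331192523, now 7331222523), both gaps come out as 30.0 s, which is what the new settings should produce.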

When you set a param in iscsid.conf, you have to rediscover your targets,
or you can run

iscsiadm -m node -T yourtarget -o update -n node.conn[0].timeo.noop_out_interval -v 30

iscsiadm -m node -T yourtarget -o update -n node.conn[0].timeo.noop_out_timeout -v 30

Then log out and log back in to the target for the params to take effect.
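If the same change has to be applied to several node records, the two update commands plus the relogin can be generated rather than typed. This is a hedged Python sketch that only builds and prints the command lines (it does not execute anything); the logout/login steps use iscsiadm's standard `-u`/`-l` node options, `yourtarget` is Mike's placeholder, and `noop_update_commands` is a hypothetical helper. It needs Python 3.8+ for `shlex.join`.

```python
import shlex


def noop_update_commands(target, interval=30, timeout=30):
    """Build (but do not run) the iscsiadm commands that update the nop-out
    settings of an existing node record and then log in again."""
    base = ["iscsiadm", "-m", "node", "-T", target]
    settings = {
        "node.conn[0].timeo.noop_out_interval": interval,
        "node.conn[0].timeo.noop_out_timeout": timeout,
    }
    cmds = [base + ["-o", "update", "-n", name, "-v", str(value)]
            for name, value in settings.items()]
    # The updated record is only used on the next login.
    cmds.append(base + ["-u"])  # log out of the target
    cmds.append(base + ["-l"])  # log back in
    return cmds


for cmd in noop_update_commands("yourtarget"):
    # shlex.join quotes the [0] so the shell does not treat it as a glob
    print(shlex.join(cmd))
```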

>
> - multipathd with the failover or "multibus" policy
>
> - map one path to one LUN or three paths to one LUN in the storage device
>
> - use jumbo frames or not
>
> The only difference between the new RH 6.2 and the old one is that the
> new one shows no IO errors.

What do you mean here? RHEL 6.2 does not show the error but RHEL 5.6 did?

Is this really easy to replicate? If so, can you take a tcpdump/wireshark
trace? If you can, check whether any traffic is being sent right before
you see the iSCSI ping timeout and conn error messages, or whether there
is a long period (node.conn[0].timeo.noop_out_interval +
node.conn[0].timeo.noop_out_timeout seconds) where no iSCSI or TCP/IP
traffic is on the wire.
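The check Mike describes can be automated once the capture is printed as text. A hedged Python sketch, assuming the capture is rendered with `tcpdump -tt -r <file>` (so each packet line starts with an epoch timestamp) and that one capture covers the interface of interest; `find_quiet_gaps` is a hypothetical helper and the sample lines are invented purely to show the input shape.

```python
def find_quiet_gaps(lines, threshold_s):
    """Return (start, end, length) for every pause between consecutive
    packets that lasts at least threshold_s seconds.

    Each packet line is expected to start with an epoch timestamp, as printed
    by `tcpdump -tt`; any other line is skipped.
    """
    gaps = []
    prev = None
    for line in lines:
        fields = line.split(None, 1)
        if not fields:
            continue
        try:
            ts = float(fields[0])
        except ValueError:
            continue  # header or other non-packet line
        if prev is not None and ts - prev >= threshold_s:
            gaps.append((prev, ts, ts - prev))
        prev = ts
    return gaps


noop_out_interval, noop_out_timeout = 30, 30  # values discussed in this thread

# Invented sample lines, only to show the expected input shape
sample = [
    "1339790981.100000 IP 192.168.0.10.45678 > 192.168.0.1.3260: tcp 48",
    "1339790981.200000 IP 192.168.0.1.3260 > 192.168.0.10.45678: tcp 48",
    "1339791045.300000 IP 192.168.0.10.45678 > 192.168.0.1.3260: tcp 48",
]
for start, end, length in find_quiet_gaps(sample, noop_out_interval + noop_out_timeout):
    print(f"no packets for {length:.1f} s ({start:.6f} -> {end:.6f})")
```

A gap reported just before a ping-timeout message would point at the wire going silent; no gap would suggest traffic was flowing but the nop-out reply never arrived.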

Also can you run a test kernel?

marcus_49

Jun 15, 2012, 4:32:57 PM

to open-iscsi

Hello Mike

Thanks a lot for reading all these words....

On 15 Jun., 19:10, Mike Christie <micha...@cs.wisc.edu> wrote:
> On 06/15/2012 09:07 AM, marcus_49 wrote:

> > I get conn error (1011) when I run dd tests on partitions of a LUN;
> > sometimes pinging the target channel IP stopped during the conn error
>
> Just to make sure I understand that: you do the dd test below, and while
> dd is running you run the ping command below (you were not referring to the
> iSCSI nop, the "iSCSI ping" referenced in the log snippet, were you)? Then,
> while running that, you get the iSCSI conn error 1011, and at that
> same time the ping fails too (I'm not sure what exactly you meant by "the ping
> stopped")?

Exactly: I run a manual ping on the console while dd is running.
Ping times become slower when I start the dd test. When I get connection
errors in /var/log/messages, the ping command on the console stops
(it hangs; there is no "unreachable" message, nothing happens) or the
ping times become very slow.

> > Jun 15 14:09:41 ivan kernel: connection14:0: detected conn error (1011)
> > Jun 15 14:09:42 ivan iscsid: Kernel reported iSCSI connection 14:0 error (1011) state (3)
> > Jun 15 14:09:45 ivan iscsid: connection14:0 is operational after recovery (1 attempts)
> > Jun 15 14:09:55 ivan kernel: connection14:0: ping timeout of 5 secs expired, recv timeout 5, last rx 7303292903, last ping 7303297903, now 7303302903
>
> Is there something before this log snippet? Did you get an iSCSI ping
> timeout error before this too, or is the conn error the first error
> message?
>

No, this is the first error message. Before it there is only output from
starting iscsid and multipathd in the log file.

> > - /etc/iscsi/iscsid.conf
> > node.conn[0].timeo.noop_out_interval = 30
> > node.conn[0].timeo.noop_out_timeout = 30
>
> When you used these values did the ping timeout error change to indicate
> that it took 60 secs? So it would look like:
>
> > Jun 15 14:09:55 ivan kernel: connection14:0: ping timeout of 30 secs expired, recv timeout 30, last rx 7303292903, last ping 7303297903, now 7303302903
>
> ?

ahh - you are right. I'm blind

> When you set a param in iscsid.conf, you have to rediscover your targets,
> or you can run
>
> iscsiadm -m node -T yourtarget -o update -n node.conn[0].timeo.noop_out_interval -v 30
>
> iscsiadm -m node -T yourtarget -o update -n node.conn[0].timeo.noop_out_timeout -v 30
>
> Then log out and log back in to the target for the params to take effect.

OK, I just changed them and started dd.
I get these messages, without ping timeout messages, but the test is still
running:

Jun 15 21:50:25 ivan kernel: connection17:0: detected conn error (1011)
Jun 15 21:50:26 ivan iscsid: Kernel reported iSCSI connection 17:0 error (1011) state (3)
Jun 15 21:50:29 ivan iscsid: connection17:0 is operational after recovery (1 attempts)
Jun 15 21:50:34 ivan kernel: connection16:0: detected conn error (1011)
Jun 15 21:50:35 ivan iscsid: Kernel reported iSCSI connection 16:0 error (1011) state (3)
Jun 15 21:50:39 ivan iscsid: connection16:0 is operational after recovery (1 attempts)
Jun 15 21:52:22 ivan kernel: connection18:0: detected conn error (1011)
Jun 15 21:52:23 ivan iscsid: Kernel reported iSCSI connection 18:0 error (1011) state (3)
Jun 15 21:52:26 ivan iscsid: connection18:0 is operational after recovery (1 attempts)
Jun 15 21:53:00 ivan kernel: connection18:0: detected conn error (1011)

I don't understand why I get a "connection18:0 is operational after
recovery" message, because I really can't do any IO to the partitions.
I can't touch an empty file on partition 8, but on partition 7 I can
touch a file:
[root@ivan ~]# touch /tmp.alanna/data07/test/bla
[root@ivan ~]#
[root@ivan ~]# touch /tmp.alanna/data08/test/bla
(hangs)

Ping on the console is working and not slow. Hmm?
--- snip ---
64 bytes from 192.168.0.1: icmp_seq=409 ttl=64 time=1.88 ms
64 bytes from 192.168.0.1: icmp_seq=410 ttl=64 time=1.88 ms
64 bytes from 192.168.0.1: icmp_seq=411 ttl=64 time=1.86 ms
64 bytes from 192.168.0.1: icmp_seq=412 ttl=64 time=1.85 ms
64 bytes from 192.168.0.1: icmp_seq=413 ttl=64 time=1.86 ms

--- 192.168.0.1 ping statistics ---
413 packets transmitted, 413 received, 0% packet loss, time 412312ms
rtt min/avg/max/mdev = 1.765/1.931/2.895/0.126 ms
[root@ivan ~]#

The dd test still hangs on partition 8:
[root@ivan ~]# dd_alanna.sh
creating files in /tmp.alanna/dataXY/test
creating: /tmp.alanna/data01/test/file_2GB_1
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 2.97861 seconds, 344 MB/s
/tmp.alanna/data02/test/file_2GB_1
2000000+0 records in
2000000+0 records out
2048000000 bytes (2.0 GB) copied, 9.17361 seconds, 223 MB/s
/tmp.alanna/data03/test/file_2GB_1
2000000+0 records in
2000000+0 records out
2048000000 bytes (2.0 GB) copied, 7.48079 seconds, 274 MB/s
/tmp.alanna/data04/test/file_2GB_1
2000000+0 records in
2000000+0 records out
2048000000 bytes (2.0 GB) copied, 9.76605 seconds, 210 MB/s
/tmp.alanna/data05/test/file_2GB_1
2000000+0 records in
2000000+0 records out
2048000000 bytes (2.0 GB) copied, 9.55907 seconds, 214 MB/s
/tmp.alanna/data06/test/file_1GB_1
1000000+0 records in
1000000+0 records out
1024000000 bytes (1.0 GB) copied, 3.83553 seconds, 267 MB/s
/tmp.alanna/data07/test/file_2GB_1
2000000+0 records in
2000000+0 records out
2048000000 bytes (2.0 GB) copied, 56.0423 seconds, 36.5 MB/s
/tmp.alanna/data08/test/file_2GB_1
(hangs now)

- Now I get new error messages in /var/log/messages (with "ping
timeout of 30 secs", thank you!!):
Jun 15 21:53:35 ivan iscsid: Kernel reported iSCSI connection 18:0 error (1011) state (3)
Jun 15 21:53:39 ivan iscsid: connection18:0 is operational after recovery (1 attempts)
Jun 15 21:54:11 ivan kernel: connection18:0: detected conn error (1011)
Jun 15 21:54:12 ivan iscsid: Kernel reported iSCSI connection 18:0 error (1011) state (3)
Jun 15 21:54:15 ivan iscsid: connection18:0 is operational after recovery (1 attempts)
Jun 15 21:55:15 ivan kernel: connection18:0: ping timeout of 30 secs expired, recv timeout 30, last rx 7331162523, last ping 7331192523, now 7331222523
Jun 15 21:55:15 ivan kernel: connection18:0: detected conn error (1011)
Jun 15 21:55:15 ivan iscsid: Kernel reported iSCSI connection 18:0 error (1011) state (3)
Jun 15 21:55:53 ivan iscsid: connection18:0 is operational after recovery (3 attempts)
Jun 15 21:56:26 ivan kernel: connection18:0: detected conn error (1011)
Jun 15 21:56:28 ivan iscsid: Kernel reported iSCSI connection 18:0 error (1011) state (3)
Jun 15 21:56:31 ivan iscsid: connection18:0 is operational after recovery (1 attempts)
Jun 15 21:57:05 ivan kernel: connection18:0: detected conn error (1011)
Jun 15 21:57:06 ivan iscsid: Kernel reported iSCSI connection 18:0 error (1011) state (3)
Jun 15 21:57:09 ivan iscsid: connection18:0 is operational after recovery (1 attempts)
Jun 15 21:58:51 ivan kernel: connection18:0: detected conn error (1011)
Jun 15 21:58:52 ivan iscsid: Kernel reported iSCSI connection 18:0 error (1011) state (3)
Jun 15 21:58:56 ivan iscsid: connection18:0 is operational after recovery (1 attempts)
Jun 15 21:59:55 ivan kernel: sd 42:0:0:4: timing out command, waited 60s
Jun 15 21:59:55 ivan multipathd: dm-11: add map (uevent)
Jun 15 21:59:55 ivan multipathd: dm-12: add map (uevent)
Jun 15 21:59:55 ivan multipathd: dm-13: add map (uevent)
Jun 15 21:59:55 ivan multipathd: dm-14: add map (uevent)
Jun 15 21:59:55 ivan multipathd: dm-15: add map (uevent)
Jun 15 21:59:55 ivan multipathd: dm-17: add map (uevent)
Jun 15 21:59:55 ivan multipathd: dm-16: add map (uevent)
Jun 15 21:59:55 ivan multipathd: dm-18: add map (uevent)
Jun 15 21:59:55 ivan multipathd: dm-19: add map (uevent)
Jun 15 21:59:55 ivan multipathd: dm-21: add map (uevent)
Jun 15 21:59:55 ivan multipathd: dm-20: add map (uevent)
Jun 15 21:59:55 ivan multipathd: dm-23: add map (uevent)
Jun 15 21:59:55 ivan kernel: device-mapper: multipath: Failing path 8:208.
Jun 15 21:59:55 ivan multipathd: dm-22: add map (uevent)
Jun 15 21:59:55 ivan multipathd: dm-24: add map (uevent)
Jun 15 21:59:55 ivan multipathd: dm-25: add map (uevent)
Jun 15 21:59:55 ivan multipathd: dm-26: add map (uevent)
Jun 15 21:59:55 ivan multipathd: sdn: readsector0 checker reports path is down
Jun 15 21:59:55 ivan multipathd: checker failed path 8:208 in map P4_reorg
Jun 15 21:59:55 ivan multipathd: P4_reorg: remaining active paths: 2
Jun 15 21:59:55 ivan multipathd: dm-4: add map (uevent)
Jun 15 21:59:55 ivan multipathd: dm-4: devmap already registered
Jun 15 22:00:38 ivan kernel: connection18:0: detected conn error (1011)
Jun 15 22:00:39 ivan iscsid: Kernel reported iSCSI connection 18:0 error (1011) state (3)
Jun 15 22:00:41 ivan multipathd: sdn: readsector0 checker reports path is up
Jun 15 22:00:41 ivan multipathd: 8:208: reinstated
Jun 15 22:00:41 ivan multipathd: P4_reorg: remaining active paths: 3

IO and paths recovered for a short moment and then failed again:

Jun 15 22:00:42 ivan iscsid: connection18:0 is operational after recovery (1 attempts)
Jun 15 22:01:16 ivan kernel: connection18:0: detected conn error (1011)
Jun 15 22:01:17 ivan iscsid: Kernel reported iSCSI connection 18:0 error (1011) state (3)
Jun 15 22:01:21 ivan iscsid: connection18:0 is operational after recovery (1 attempts)
Jun 15 22:02:12 ivan kernel: connection18:0: detected conn error (1011)
Jun 15 22:02:13 ivan iscsid: Kernel reported iSCSI connection 18:0 error (1011) state (3)
Jun 15 22:02:17 ivan iscsid: connection18:0 is operational after recovery (1 attempts)
Jun 15 22:02:24 ivan kernel: connection17:0: detected conn error (1011)
Jun 15 22:02:24 ivan iscsid: Kernel reported iSCSI connection 17:0 error (1011) state (3)
Jun 15 22:02:28 ivan iscsid: connection17:0 is operational after recovery (1 attempts)
Jun 15 22:03:00 ivan kernel: connection17:0: detected conn error (1011)
Jun 15 22:03:01 ivan iscsid: Kernel reported iSCSI connection 17:0 error (1011) state (3)
Jun 15 22:03:04 ivan iscsid: connection17:0 is operational after recovery (1 attempts)
Jun 15 22:03:38 ivan kernel: connection17:0: detected conn error (1011)
Jun 15 22:03:39 ivan iscsid: Kernel reported iSCSI connection 17:0 error (1011) state (3)
Jun 15 22:03:42 ivan iscsid: connection17:0 is operational after recovery (1 attempts)
Jun 15 22:04:42 ivan kernel: sd 41:0:0:4: timing out command, waited 60s
Jun 15 22:04:42 ivan multipathd: sdp: readsector0 checker reports path is down
Jun 15 22:04:42 ivan multipathd: checker failed path 8:240 in map P4_reorg
Jun 15 22:04:42 ivan kernel: device-mapper: multipath: Failing path 8:240.
Jun 15 22:04:42 ivan multipathd: P4_reorg: remaining active paths: 2
Jun 15 22:04:42 ivan multipathd: dm-4: add map (uevent)
Jun 15 22:04:42 ivan multipathd: dm-4: devmap already registered
Jun 15 22:05:24 ivan kernel: connection17:0: detected conn error (1011)
Jun 15 22:05:25 ivan iscsid: Kernel reported iSCSI connection 17:0 error (1011) state (3)
Jun 15 22:05:28 ivan iscsid: connection17:0 is operational after recovery (1 attempts)
Jun 15 22:05:28 ivan multipathd: sdp: readsector0 checker reports path is up
Jun 15 22:05:28 ivan multipathd: 8:240: reinstated
Jun 15 22:05:28 ivan multipathd: P4_reorg: remaining active paths: 3
Jun 15 22:05:28 ivan multipathd: dm-4: add map (uevent)
Jun 15 22:05:28 ivan multipathd: dm-4: devmap already registered
Jun 15 22:05:35 ivan kernel: connection16:0: detected conn error (1011)
Jun 15 22:05:35 ivan iscsid: Kernel reported iSCSI connection 16:0 error (1011) state (3)
Jun 15 22:05:39 ivan iscsid: connection16:0 is operational after recovery (1 attempts)
Jun 15 22:05:46 ivan kernel: connection18:0: detected conn error (1011)
Jun 15 22:05:47 ivan iscsid: Kernel reported iSCSI connection 18:0 error (1011) state (3)
Jun 15 22:05:50 ivan iscsid: connection18:0 is operational after recovery (1 attempts)
Jun 15 22:06:22 ivan kernel: connection18:0: detected conn error (1011)
Jun 15 22:06:23 ivan iscsid: Kernel reported iSCSI connection 18:0 error (1011) state (3)
Jun 15 22:06:26 ivan iscsid: connection18:0 is operational after recovery (1 attempts)

.... I don't know how long the dd command needs to finish if the paths
keep going down again and again...
But in this test the ping is OK only while there is no dd IO. It
looks like the ping fails as soon as dd IO starts:
[root@ivan ~]# ping 192.168.0.1
PING 192.168.0.1 (192.168.0.1) 56(84) bytes of data
(hang)

> > The only difference between the new RH 6.2 and the old one is that the
> > new one shows no IO errors.
>
> What do you mean here? RHEL 6.2 does not show the error but RHEL 5.6 did?

Yes: in RHEL 5.6 I saw
Jun 6 14:42:37 eric kernel: end_request: I/O error, dev sds, sector 0
Jun 6 14:42:37 eric kernel: device-mapper: multipath: Failing path 65:32.
Jun 6 14:42:38 eric kernel: EXT3-fs error (device dm-7) in ext3_write_begin: IO failure
Jun 6 14:42:38 eric kernel: __journal_remove_journal_head: freeing b_committed_data
Jun 6 14:42:38 eric last message repeated 45 times
Jun 6 14:42:38 eric kernel: printk: 1642223 messages suppressed.

But actually I can't reproduce that! I also got no IO errors on the old
system.

>
> Is this really easy to replicate? If so, can you take a tcpdump/wireshark
> trace? If you can, check whether any traffic is being sent right before
> you see the iSCSI ping timeout and conn error messages, or whether there
> is a long period (node.conn[0].timeo.noop_out_interval +
> node.conn[0].timeo.noop_out_timeout seconds) where no iSCSI or TCP/IP
> traffic is on the wire.

OK, I can run tcpdump on an interface. I will do this test after the test
described above has "finished".

>
> Also can you run a test kernel?

Hmm, yes, I have a system where I can do that (same hardware). Do you
mean an RPM or sources?
It's been a long time since I built a kernel, and this is not an option for
our production systems, because we would lose support and homogeneity...
Do you think kernel 2.6.18-238.el5 x64 is not that good?

Thanks a lot for your help
Marcus

marcus_49

Jun 15, 2012, 9:06:43 PM

to open-...@googlegroups.com, marcus_49

Hello Mike,

On Friday, June 15, 2012 7:10:48 PM UTC+2, Mike Christie wrote:

> Is this really easy to replicate? If so, can you take a tcpdump/wireshark
> trace? If you can, check whether any traffic is being sent right before
> you see the iSCSI ping timeout and conn error messages, or whether there
> is a long period (node.conn[0].timeo.noop_out_interval +
> node.conn[0].timeo.noop_out_timeout seconds) where no iSCSI or TCP/IP
> traffic is on the wire.

I can't reproduce the IO failure problem.

Anyway, I am sending you a tcpdump of eth0 for a time period where the ping hangs (see the attached file eth0.out4ok). I'm not totally sure that the ping still hangs at the end of the dump; I have to check that again, but I have to wait a long time until the ping hangs, so it's not easy to catch the right moment.

What I don't understand is that right after the message "connection XY:0 is down" there comes a message "connectionXY:0 is operational after recovery".

To me that sounds like everything is fine, but I still can't touch a file in the partition.

----- snip ----

Jun 16 01:35:22 ivan iscsid: Kernel reported iSCSI connection 18:0 error (1011) state (3)
Jun 16 01:35:26 ivan iscsid: connection18:0 is operational after recovery (1 attempts)
Jun 16 01:35:57 ivan kernel: connection18:0: detected conn error (1011)
Jun 16 01:35:58 ivan iscsid: Kernel reported iSCSI connection 18:0 error (1011) state (3)
Jun 16 01:36:02 ivan iscsid: connection18:0 is operational after recovery (1 attempts)
Jun 16 01:36:23 ivan kernel: connection17:0: detected conn error (1011)
Jun 16 01:36:24 ivan iscsid: Kernel reported iSCSI connection 17:0 error (1011) state (3)
Jun 16 01:36:27 ivan iscsid: connection17:0 is operational after recovery (1 attempts)
Jun 16 01:36:33 ivan kernel: connection16:0: detected conn error (1011)
Jun 16 01:36:34 ivan iscsid: Kernel reported iSCSI connection 16:0 error (1011) state (3)
Jun 16 01:36:38 ivan iscsid: connection16:0 is operational after recovery (1 attempts)
Jun 16 01:38:11 ivan kernel: connection16:0: ping timeout of 30 secs expired, recv timeout 30, last rx 7344538150, last ping 7344535144, now 7344598150

----- snip ----

With Best Regards
Marcus


eth0.out4ok

Mike Christie

Jun 16, 2012, 1:05:31 PM

to open-...@googlegroups.com, marcus_49, marcus_49

On 06/15/2012 08:06 PM, marcus_49 wrote:
> What I don't understand is that right after the message "connection XY:0 is
> down" there comes a message "connectionXY:0 is operational after recovery".
>

I'm still looking through all the data. Regarding the above: right after you
see the recovered message, do you see another connection down message? In
other words, does it look like the connection is bouncing between up and down?

Could you send the complete /var/log/messages?
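One way to answer the bouncing question from /var/log/messages is to measure, per connection, how long it stays up between an "operational after recovery" line and the next "detected conn error". A hedged Python sketch, assuming the syslog line format shown in this thread and the year 2012 (syslog lines carry no year); `connection_flaps` is a hypothetical helper, and the sample lines are copied from the 22:00 to 22:02 excerpt earlier in the thread.

```python
import re
from collections import defaultdict
from datetime import datetime

# Message formats as they appear in the logs in this thread
ERR_RE = re.compile(r"connection(\d+:\d+): detected conn error \(\d+\)")
UP_RE = re.compile(r"connection(\d+:\d+) is operational after recovery")


def _timestamp(line, year=2012):
    # syslog omits the year, so it has to be supplied (assumption)
    month, day, clock = line.split()[:3]
    return datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")


def connection_flaps(lines):
    """For each connection, list how many seconds it stayed up after an
    'operational after recovery' message before the next conn error."""
    last_up = {}
    uptimes = defaultdict(list)
    for line in lines:
        up = UP_RE.search(line)
        if up:
            last_up[up.group(1)] = _timestamp(line)
            continue
        err = ERR_RE.search(line)
        if err and err.group(1) in last_up:
            started = last_up.pop(err.group(1))
            uptimes[err.group(1)].append((_timestamp(line) - started).total_seconds())
    return dict(uptimes)


sample = [
    "Jun 15 22:00:42 ivan iscsid: connection18:0 is operational after recovery (1 attempts)",
    "Jun 15 22:01:16 ivan kernel: connection18:0: detected conn error (1011)",
    "Jun 15 22:01:17 ivan iscsid: Kernel reported iSCSI connection 18:0 error (1011) state (3)",
    "Jun 15 22:01:21 ivan iscsid: connection18:0 is operational after recovery (1 attempts)",
    "Jun 15 22:02:12 ivan kernel: connection18:0: detected conn error (1011)",
]
print(connection_flaps(sample))  # {'18:0': [34.0, 51.0]}
```

Short uptimes that repeat for the same connection would match the up/down bouncing Mike is asking about.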

Matthew Dickinson

Jun 16, 2012, 2:55:44 PM

to open-...@googlegroups.com

Hi,

I've been having "fun" with one of these units (S16E-R1130) for a
number of years too.
I never managed to fix the conn errors under RH5.x, but I think I might
finally have gotten them straightened out under RH6.2. Keeping fingers crossed.

By "think I got it fixed" I mean I haven't had any dmesg/syslog messages
about conn errors; once this has been stable, I'll do some more
performance testing.

see
https://groups.google.com/forum/?fromgroups#!topic/open-iscsi/xbPNzjrCLYg

and

https://bugzilla.redhat.com/show_bug.cgi?id=548556

Performance of this unit isn't exactly much to write home about. I'm
using it for D2D backup; I tried using it as an iSCSI VMFS storage volume
but gave up quickly.

Setup-wise, I have a bonded pair on the server and 8 single IP addresses on
the storage unit; I'm using multipath and no jumbo frames.

If the below doesn't get you going, let me know and I'll try to attach
the full files.

devices {
device {
vendor "IFT"
path_grouping_policy failover
getuid_callout "/lib/udev/scsi_id --whitelisted --device=/dev/%n"
path_checker tur
path_selector "service-time 0"
hardware_handler "0"
failback 15
rr_min_io_rq 100
rr_weight uniform
no_path_retry 12
prio alua
}

}

node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.login_timeout = 15
node.conn[0].timeo.logout_timeout = 15
node.conn[0].timeo.noop_out_interval = 30
node.conn[0].timeo.noop_out_timeout = 90
node.session.err_timeo.abort_timeout = 15
node.session.err_timeo.lu_reset_timeout = 30
node.session.err_timeo.tgt_reset_timeout = 30
node.session.initial_login_retry_max = 8
node.session.cmds_max = 128
node.session.queue_depth = 32
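To see what these values mean in practice, the timeouts can be added up. The sketch below is an approximation, not open-iscsi's exact logic: it assumes a dead connection is only noticed after noop_out_interval + noop_out_timeout seconds of silence (the period Mike mentions earlier in the thread) and that I/O is then held for node.session.timeo.replacement_timeout seconds before being failed up to dm-multipath. It ignores login retries and SCSI command timeouts; `worst_case_failover_s` is a hypothetical helper.

```python
def worst_case_failover_s(noop_out_interval, noop_out_timeout, replacement_timeout):
    """Rough upper bounds, in seconds, measured from the last PDU received on
    a connection that has silently died (assumed model, see text):

    - detect: nop-out sent after noop_out_interval of silence, then the
      connection is declared failed after noop_out_timeout without a reply
    - fail_io: additionally wait replacement_timeout for the session to come
      back before queued I/O is failed up to dm-multipath
    """
    detect = noop_out_interval + noop_out_timeout
    fail_io = detect + replacement_timeout
    return detect, fail_io


# Matthew's iscsid.conf values above
print(worst_case_failover_s(30, 90, 120))  # (120, 240)
```

For comparison, 30/30 s nop-out settings with a 120 s replacement timeout would give (60, 180) under the same model.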

Matthew

On Fri, 15 Jun 2012, marcus_49 wrote:

> Hello together,
>
> I am new to the world of iSCSI, but I hope I can provide some interesting problems for the "pros" here ;-)
>

> I get conn error (1011) when I run dd tests on partitions of a LUN;
> sometimes pinging the target channel IP stopped during the conn error
>
>
>

> Jun 15 14:09:41 ivan kernel: connection14:0: detected conn error (1011)

> Jun 15 14:09:42 ivan iscsid: Kernel reported iSCSI connection 14:0 error (1011) state (3)
> Jun 15 14:09:45 ivan iscsid: connection14:0 is operational after recovery (1 attempts)
> Jun 15 14:09:55 ivan kernel: connection14:0: ping timeout of 5 secs expired, recv timeout 5, last rx 7303292903, last ping 7303297903, now 7303302903
>
>

> Setup:
>
> - server ivan:
>   kernel 2.6.18-238.el5, iscsi-utils iscsi-initiator-utils-6.2.0.872-6.el5
>   device-mapper-multipath-0.4.7-48.el5_8.1, RedHat 5.6
>
> - server eric:
> same hard- and software
>
> - server alon:
> same hardware but newer RedHat 6.2
>   kernel 2.6.32-220.el6, iscsi-initiator-utils-6.2.0.872-34.el6.x86_64
>   device-mapper-multipath-0.4.9-46.el6.x86_64 device-mapper-multipath-libs-0.4.9-46.el6.x86_64
>
> # Storage
> - Infortrend DS S16E-R1130
> (jumbo frames enabled, redundant Controller, 4 x 1GBit Channels/Controller)
> - Tests made only with Controller A
>
> I tested different things but the error still occurs;
> it makes no difference whether I configure:
>
>
> - /etc/sysctl.conf
>    net.ipv4.tcp_window_scaling=0
>    net.ipv4.conf.all.arp_ignore=1
>    net.ipv4.conf.all.arp_announce=2
>

> - /etc/iscsi/iscsid.conf
>    node.conn[0].timeo.noop_out_interval = 30
>    node.conn[0].timeo.noop_out_timeout = 30
>

> - multipathd with the failover or "multibus" policy
>
> - map one path to one LUN or three paths to one LUN in the storage device
>
> - use jumbo frames or not
>

> The only difference between the new RH 6.2 and the old one is that the new one shows no IO errors.

> I can only observe connection errors, but the IO is going down too.
>
>
> Here is further information. Attached you will find the iscsid.conf, multipath.conf, and /var/log/messages.
>
> Any suggestions would be greatly appreciated
> Best Marcus
>
>
> # Network
>
> I configured only the three active paths, not the passive path to Ctrl_B, to reduce complexity.
> And I only show you the two of three ifaces
>
>
> [root@ivan ~]# cat /var/lib/iscsi/ifaces/*
> # BEGIN RECORD 2.0-872
> iface.iscsi_ifacename = iface.eth0
> iface.hwaddress = 00:19:99:97:5A:BD
> iface.transport_name = tcp
> # END RECORD
> # BEGIN RECORD 2.0-872
> iface.iscsi_ifacename = iface.eth1
> iface.hwaddress = 00:19:99:97:5A:BC
> iface.transport_name = tcp
> # END RECORD
> # BEGIN RECORD 2.0-872
> iface.iscsi_ifacename = iface.eth2
> iface.hwaddress = 00:19:99:97:5C:63
> iface.transport_name = tcp
> # END RECORD
> # BEGIN RECORD 2.0-872
> iface.iscsi_ifacename = iface.eth3
> iface.hwaddress = 00:19:99:97:5C:62
> iface.transport_name = tcp
> # END RECORD
> [root@ivan ~]#
>
>
>
>
> # nics / routing
>
> (Intel Corporation 82576NS Gigabit Network Connection)
>
> [root@ivan ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth*
> # Intel Corporation 82576NS Gigabit Network Connection
> DEVICE=eth0
> BOOTPROTO=static
> HWADDR=00:19:99:97:5A:BD
> ONBOOT=yes
> IPADDR=192.168.0.10
> NETMASK=255.255.255.0
> NETWORK=192.168.255.0
> BROADCAST=192.168.0.255
> TYPE=Ethernet
> #USERCTL=no
> MTU="9000"
>
>
> # Intel Corporation 82576NS Gigabit Network Connection
> DEVICE=eth1
> BOOTPROTO=static
> HWADDR=00:19:99:97:5A:BC
> ONBOOT=yes
> IPADDR=192.168.1.10
> NETMASK=255.255.255.0
> NETWORK=192.168.1.0
> BROADCAST=192.168.1.255
> TYPE=Ethernet
> #USERCTL=no
> MTU="9000"
>
> # Intel Corporation 82576NS Gigabit Network Connection
> DEVICE=eth2
> BOOTPROTO=static
> HWADDR=00:19:99:97:5C:63
> ONBOOT=yes
> BROADCAST=192.168.2.255
> IPADDR=192.168.2.10
> NETMASK=255.255.255.0
> NETWORK=192.168.2.0
> TYPE=Ethernet
> #USERCTL=no
> MTU="9000"
>
> # Intel Corporation 82576NS Gigabit Network Connection
> DEVICE=eth3
> BOOTPROTO=static
> HWADDR=00:19:99:97:5C:62
> ONBOOT=yes
> IPADDR=192.168.7.10
> NETMASK=255.255.255.0
> NETWORK=192.168.7.0
> BROADCAST=192.168.7.255
> TYPE=Ethernet
> #USERCTL=no
> MTU="9000"
>
>
> # Intel Corporation 82575EB Gigabit Network Connection
> DEVICE=eth4
> BOOTPROTO=static
> BROADCAST=10.31.25.255
> HWADDR=00:19:99:98:03:10
> IPADDR=10.31.25.37
> NETMASK=255.255.255.0
> NETWORK=10.31.25.0
> ONBOOT=yes
> # Intel Corporation 82575EB Gigabit Network Connection
> DEVICE=eth5
> BOOTPROTO=static
> BROADCAST=10.31.25.255
> HWADDR=00:19:99:98:03:11
> IPADDR=10.31.25.38
> NETMASK=255.255.255.0
> NETWORK=10.31.25.0
> ONBOOT=yes
> [root@ivan ~]#
>
>
> [root@ivan ~]# route -n
> Kernel IP routing table
> Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
> 192.168.7.0     0.0.0.0         255.255.255.0   U     0      0        0 eth3
> 10.31.25.0      0.0.0.0         255.255.255.0   U     0      0        0 eth4
> 10.31.25.0      0.0.0.0         255.255.255.0   U     0      0        0 eth5
> 192.168.2.0     0.0.0.0         255.255.255.0   U     0      0        0 eth2
> 192.168.1.0     0.0.0.0         255.255.255.0   U     0      0        0 eth1
> 192.168.0.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
> 169.254.0.0     0.0.0.0         255.255.0.0     U     0      0        0 eth3
> 0.0.0.0         10.31.25.1      0.0.0.0         UG    0      0        0 eth4
>
>
>
>
>
> # sessions/targets
>
> [root@alon ~]# iscsiadm -m session
> tcp: [1] 192.168.2.1:3260,1 iqn.2002-10.com.infortrend:raid.sn7917498.201
> tcp: [2] 192.168.0.1:3260,1 iqn.2002-10.com.infortrend:raid.sn7917498.001
> tcp: [3] 192.168.1.1:3260,1 iqn.2002-10.com.infortrend:raid.sn7917498.101
> [root@alon ~]#
>
>
> [root@ivan ~]# multipath -l
> P4_reorg (3600d02310009ed051285112c40336821) dm-6 IFT,DS S16E-R1130
> [size=150G][features=0][hwhandler=0][rw]
> \_ round-robin 0 [prio=0][active]
>  \_ 38:0:0:4 sdv 65:80  [active][undef]
>  \_ 37:0:0:4 sdx 65:112 [active][undef]
>  \_ 39:0:0:4 sdy 65:128 [active][undef]
> P0_alanna (3600d02310009ed0547f3eb2d7c6b8847) dm-2 IFT,DS S16E-R1130
> [size=551G][features=0][hwhandler=0][rw]
> \_ round-robin 0 [prio=0][enabled]
>  \_ 37:0:0:0 sde 8:64   [active][undef]
>  \_ 38:0:0:0 sdf 8:80   [active][undef]
>  \_ 39:0:0:0 sdg 8:96   [active][undef]
> P3_schule (3600d02310009ed056736241c65250093) dm-5 IFT,DS S16E-R1130
> [size=552G][features=0][hwhandler=0][rw]
> \_ round-robin 0 [prio=0][active]
>  \_ 38:0:0:3 sdl 8:176  [active][undef]
>  \_ 37:0:0:3 sdr 65:16  [active][undef]
>  \_ 39:0:0:3 sdw 65:96  [active][undef]
> [root@ivan ~]#
>
>
>
> [root@ivan ~]# iscsiadm -m session -P3
> iSCSI Transport Class version 2.0-871
> version 2.0-872
> Target: iqn.2002-10.com.infortrend:raid.sn7917498.001
>    Current Portal: 192.168.0.1:3260,1
>    Persistent Portal: 192.168.0.1:3260,1
>    **********
>    Interface:
>    **********
>    Iface Name: iface.eth0
>    Iface Transport: tcp
>    Iface Initiatorname: eric-iqn.voebb.verwalt-berlin.de
>    Iface IPaddress: 192.168.0.10
>    Iface HWaddress: 00:19:99:97:5A:BD
>    Iface Netdev: <empty>
>    SID: 13
>    iSCSI Connection State: LOGGED IN
>    iSCSI Session State: LOGGED_IN
>    Internal iscsid Session State: NO CHANGE
>    ************************
>    Negotiated iSCSI params:
>    ************************
>    HeaderDigest: None
>    DataDigest: None
>    MaxRecvDataSegmentLength: 262144
>    MaxXmitDataSegmentLength: 65536
>    FirstBurstLength: 65536
>    MaxBurstLength: 262144
>    ImmediateData: Yes
>    InitialR2T: No
>    MaxOutstandingR2T: 1
>    ************************
>    Attached SCSI devices:
>    ************************
>    Host Number: 37 State: running
>    scsi37 Channel 00 Id 0 Lun: 0
>    Attached scsi disk sde   State: running
>    scsi37 Channel 00 Id 0 Lun: 1
>    Attached scsi disk sdh   State: running
>    scsi37 Channel 00 Id 0 Lun: 2
>    Attached scsi disk sdk   State: running
>    scsi37 Channel 00 Id 0 Lun: 3
>    Attached scsi disk sdr   State: running
>    scsi37 Channel 00 Id 0 Lun: 4
>    Attached scsi disk sdx   State: running
> Target: iqn.2002-10.com.infortrend:raid.sn7917498.101
>    Current Portal: 192.168.1.1:3260,1
>    Persistent Portal: 192.168.1.1:3260,1
>    **********
>    Interface:
>    **********
>    Iface Name: iface.eth1
>    Iface Transport: tcp
>    Iface Initiatorname: eric-iqn.voebb.verwalt-berlin.de
>    Iface IPaddress: 192.168.1.10
>    Iface HWaddress: 00:19:99:97:5A:BC
>    Iface Netdev: <empty>
>    SID: 14
>    iSCSI Connection State: LOGGED IN
>    iSCSI Session State: LOGGED_IN
>    Internal iscsid Session State: NO CHANGE
>    ************************
>    Negotiated iSCSI params:
>    ************************
>    HeaderDigest: None
>    DataDigest: None
>    MaxRecvDataSegmentLength: 262144
>    MaxXmitDataSegmentLength: 65536
>    FirstBurstLength: 65536
>    MaxBurstLength: 262144
>    ImmediateData: Yes
>    InitialR2T: No
>    MaxOutstandingR2T: 1
>    ************************
>    Attached SCSI devices:
>    ************************
>    Host Number: 38 State: running
>    scsi38 Channel 00 Id 0 Lun: 0
>    Attached scsi disk sdf   State: running
>    scsi38 Channel 00 Id 0 Lun: 1
>    Attached scsi disk sdi   State: running
>    scsi38 Channel 00 Id 0 Lun: 2
>    Attached scsi disk sdj   State: running
>    scsi38 Channel 00 Id 0 Lun: 3
>    Attached scsi disk sdl   State: running
>    scsi38 Channel 00 Id 0 Lun: 4
>    Attached scsi disk sdv   State: running
> Target: iqn.2002-10.com.infortrend:raid.sn7917498.201
>   Current Portal: 192.168.2.1:3260,1
>   Persistent Portal: 192.168.2.1:3260,1
>     **********
>     Interface:
>     **********
>     Iface Name: iface.eth2
>     Iface Transport: tcp
>     Iface Initiatorname: eric-iqn.voebb.verwalt-berlin.de
>     Iface IPaddress: 192.168.2.10
>     Iface HWaddress: 00:19:99:97:5C:63
>     Iface Netdev: <empty>
>     SID: 15
>     iSCSI Connection State: LOGGED IN
>     iSCSI Session State: LOGGED_IN
>     Internal iscsid Session State: NO CHANGE
>     ************************
>     Negotiated iSCSI params:
>     ************************
>     HeaderDigest: None
>     DataDigest: None
>     MaxRecvDataSegmentLength: 262144
>     MaxXmitDataSegmentLength: 65536
>     FirstBurstLength: 65536
>     MaxBurstLength: 262144
>     ImmediateData: Yes
>     InitialR2T: No
>     MaxOutstandingR2T: 1
>     ************************
>     Attached SCSI devices:
>     ************************
>     Host Number: 39  State: running
>     scsi39 Channel 00 Id 0 Lun: 0
>       Attached scsi disk sdg    State: running
>     scsi39 Channel 00 Id 0 Lun: 1
>       Attached scsi disk sdo    State: running
>     scsi39 Channel 00 Id 0 Lun: 2
>       Attached scsi disk sdt    State: running
>     scsi39 Channel 00 Id 0 Lun: 3
>       Attached scsi disk sdw    State: running
>     scsi39 Channel 00 Id 0 Lun: 4
>       Attached scsi disk sdy    State: running
> [root@ivan ~]#
>
> [root@alon iscsi]# df -h
> [root@ivan ~]# df -h
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/mapper/VolGroup01-LogVol00
>                       117G   19G   93G  17% /
> /dev/sda1              99M   14M   81M  15% /boot
> tmpfs                  13G     0   13G   0% /dev/shm
> nfs01.voebb.verwalt-berlin.de:/data_nfs01
>                       5.5T  2.1T  3.4T  39% /data_nfs01_local
> nfs01.voebb.verwalt-berlin.de:/ivan_alanna
>                       5.5T  2.1T  3.4T  39% /alanna
> nfs01.voebb.verwalt-berlin.de:/ivan_star
>                       5.5T  2.1T  3.4T  39% /star
> nfs01.voebb.verwalt-berlin.de:/ivan_schule
>                       5.5T  2.1T  3.4T  39% /schule
> nfs01.voebb.verwalt-berlin.de:/ivan_star2
>                       5.5T  2.1T  3.4T  39% /star2
> nfs01.voebb.verwalt-berlin.de:/ivan_alanna2
>                       5.5T  2.1T  3.4T  39% /alanna2
> /dev/mapper/P3_schulep1
>                       152G  107G   38G  75% /tmp.schule/data01
> /dev/mapper/P3_schulep2
>                        80G   43G   33G  57% /tmp.schule/data02
> /dev/mapper/P3_schulep3
>                        92G   42G   46G  49% /tmp.schule/data03
> /dev/mapper/P3_schulep5
>                        51G  1.8G   47G   4% /tmp.schule/data04
> /dev/mapper/P3_schulep6
>                        21G  173M   19G   1% /tmp.schule/data05
> /dev/mapper/P3_schulep7
>                       2.8G   69M  2.6G   3% /tmp.schule/data06
> /dev/mapper/P3_schulep8
>                       5.6G  140M  5.1G   3% /tmp.schule/data07
> /dev/mapper/P3_schulep9
>                       6.5G  143M  6.0G   3% /tmp.schule/data08
> /dev/mapper/P3_schulep10
>                        42G  177M   39G   1% /tmp.schule/data09
> /dev/mapper/P3_schulep11
>                        31G  6.6G   23G  23% /tmp.schule/data10
> [root@ivan ~]#
>
> [root@ivan ~]# ll /dev/mapper/P3_schule*
> brw-rw---- 1 root disk 253,  5 Jun 15 10:51 /dev/mapper/P3_schule
> brw-rw---- 1 root disk 253,  7 Jun 15 10:51 /dev/mapper/P3_schulep1
> brw-rw---- 1 root disk 253, 15 Jun 15 10:51 /dev/mapper/P3_schulep10
> brw-rw---- 1 root disk 253, 16 Jun 15 10:51 /dev/mapper/P3_schulep11
> brw-rw---- 1 root disk 253, 17 Jun 15 10:51 /dev/mapper/P3_schulep12
> brw-rw---- 1 root disk 253,  8 Jun 15 10:51 /dev/mapper/P3_schulep2
> brw-rw---- 1 root disk 253,  9 Jun 15 10:51 /dev/mapper/P3_schulep3
> brw-rw---- 1 root disk 253, 10 Jun 15 10:51 /dev/mapper/P3_schulep5
> brw-rw---- 1 root disk 253, 11 Jun 15 10:51 /dev/mapper/P3_schulep6
> brw-rw---- 1 root disk 253, 12 Jun 15 10:51 /dev/mapper/P3_schulep7
> brw-rw---- 1 root disk 253, 13 Jun 15 10:51 /dev/mapper/P3_schulep8
> brw-rw---- 1 root disk 253, 14 Jun 15 10:51 /dev/mapper/P3_schulep9
>
> -> The Test
>
> dd if=/dev/zero bs=1024 count=10000000 of=/tmp.schule/data01/test/file_10GB_1
>
> [root@alon ~]# ping 192.168.0.1   (first portal / Storage Device)
>   --- snip ---
>
> 64 bytes from 192.168.0.1: icmp_seq=38 ttl=64 time=1.90 ms
> 64 bytes from 192.168.0.1: icmp_seq=39 ttl=64 time=1.89 ms
> 64 bytes from 192.168.0.1: icmp_seq=40 ttl=64 time=1.88 ms
> 64 bytes from 192.168.0.1: icmp_seq=41 ttl=64 time=1.89 ms
> 64 bytes from 192.168.0.1: icmp_seq=42 ttl=64 time=1.93 ms
> 64 bytes from 192.168.0.1: icmp_seq=43 ttl=64 time=1.92 ms
> 64 bytes from 192.168.0.1: icmp_seq=44 ttl=64 time=1.93 ms
> 64 bytes from 192.168.0.1: icmp_seq=45 ttl=64 time=2.18 ms
> 64 bytes from 192.168.0.1: icmp_seq=46 ttl=64 time=2.30 ms
> 64 bytes from 192.168.0.1: icmp_seq=47 ttl=64 time=2.22 ms
> 64 bytes from 192.168.0.1: icmp_seq=48 ttl=64 time=2.00 ms
> 64 bytes from 192.168.0.1: icmp_seq=49 ttl=64 time=1.87 ms
> 64 bytes from 192.168.0.1: icmp_seq=50 ttl=64 time=2.21 ms
> 64 bytes from 192.168.0.1: icmp_seq=51 ttl=64 time=2.35 ms
> 64 bytes from 192.168.0.1: icmp_seq=52 ttl=64 time=2.58 ms
> 64 bytes from 192.168.0.1: icmp_seq=53 ttl=64 time=15230 ms
> 64 bytes from 192.168.0.1: icmp_seq=54 ttl=64 time=14232 ms
> 64 bytes from 192.168.0.1: icmp_seq=55 ttl=64 time=13234 ms
> 64 bytes from 192.168.0.1: icmp_seq=56 ttl=64 time=12235 ms
> 64 bytes from 192.168.0.1: icmp_seq=57 ttl=64 time=11237 ms
> 64 bytes from 192.168.0.1: icmp_seq=69 ttl=64 time=1.83 ms
> 64 bytes from 192.168.0.1: icmp_seq=70 ttl=64 time=2.87 ms
> 64 bytes from 192.168.0.1: icmp_seq=71 ttl=64 time=32324 ms
> 64 bytes from 192.168.0.1: icmp_seq=72 ttl=64 time=31326 ms
> 64 bytes from 192.168.0.1: icmp_seq=73 ttl=64 time=30328 ms
> 64 bytes from 192.168.0.1: icmp_seq=74 ttl=64 time=29330 ms
> 64 bytes from 192.168.0.1: icmp_seq=75 ttl=64 time=28331 ms
> 64 bytes from 192.168.0.1: icmp_seq=76 ttl=64 time=27333 ms
> 64 bytes from 192.168.0.1: icmp_seq=77 ttl=64 time=26334 ms
> 64 bytes from 192.168.0.1: icmp_seq=78 ttl=64 time=25336 ms
> 64 bytes from 192.168.0.1: icmp_seq=79 ttl=64 time=24338 ms
> 64 bytes from 192.168.0.1: icmp_seq=80 ttl=64 time=23339 ms
> 64 bytes from 192.168.0.1: icmp_seq=81 ttl=64 time=22341 ms
> 64 bytes from 192.168.0.1: icmp_seq=82 ttl=64 time=21343 ms
> 64 bytes from 192.168.0.1: icmp_seq=104 ttl=64 time=2.65 ms
> 64 bytes from 192.168.0.1: icmp_seq=105 ttl=64 time=2.23 ms
> 64 bytes from 192.168.0.1: icmp_seq=106 ttl=64 time=1.91 ms
> 64 bytes from 192.168.0.1: icmp_seq=107 ttl=64 time=1.91 ms
> 64 bytes from 192.168.0.1: icmp_seq=108 ttl=64 time=2.19 ms
> 64 bytes from 192.168.0.1: icmp_seq=109 ttl=64 time=2.33 ms
>   --- snip ---
>
> --
> You received this message because you are subscribed to the Google Groups "open-iscsi" group.
> To view this discussion on the web visit https://groups.google.com/d/msg/open-iscsi/-/G1s1W6IxCI0J.
> To post to this group, send email to open-...@googlegroups.com.
> To unsubscribe from this group, send email to open-iscsi+...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.
>
>
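One way to read the ping capture quoted above is to look for gaps in `icmp_seq` (replies that never arrived; here 58-68 and 83-103) and for runs of multi-second RTTs. The following is a minimal Python sketch, not taken from the thread; the function names are hypothetical and it assumes probes were sent at ping's default 1 s interval. It also estimates each reply's arrival time as send time plus RTT. For `icmp_seq` 53-57 those estimates nearly coincide, which suggests the replies were held up somewhere and then delivered together in one burst, rather than each taking 11-15 s on its own.

```python
import re

# Matches reply lines such as:
#   64 bytes from 192.168.0.1: icmp_seq=53 ttl=64 time=15230 ms
REPLY_RE = re.compile(r"icmp_seq=(\d+) ttl=\d+ time=([\d.]+) ms")


def parse_replies(lines):
    """Return (icmp_seq, rtt_ms) tuples for every reply line."""
    replies = []
    for line in lines:
        m = REPLY_RE.search(line)
        if m:
            replies.append((int(m.group(1)), float(m.group(2))))
    return replies


def missing_seqs(replies):
    """Sequence numbers between consecutive replies that never came back."""
    missing = []
    for (prev, _), (cur, _) in zip(replies, replies[1:]):
        missing.extend(range(prev + 1, cur))
    return missing


def slow_replies(replies, threshold_ms=1000.0):
    """Replies whose RTT is at or above threshold_ms."""
    return [(seq, rtt) for seq, rtt in replies if rtt >= threshold_ms]


def arrival_estimates(replies, interval_s=1.0):
    """Estimated relative arrival time in seconds: send time + RTT.

    Assumes one probe every interval_s seconds, so the send time of
    icmp_seq n is approximated as n * interval_s.
    """
    return [(seq, seq * interval_s + rtt / 1000.0) for seq, rtt in replies]


# A slice of the capture from this thread.
SAMPLE = [
    "64 bytes from 192.168.0.1: icmp_seq=52 ttl=64 time=2.58 ms",
    "64 bytes from 192.168.0.1: icmp_seq=53 ttl=64 time=15230 ms",
    "64 bytes from 192.168.0.1: icmp_seq=54 ttl=64 time=14232 ms",
    "64 bytes from 192.168.0.1: icmp_seq=55 ttl=64 time=13234 ms",
    "64 bytes from 192.168.0.1: icmp_seq=56 ttl=64 time=12235 ms",
    "64 bytes from 192.168.0.1: icmp_seq=57 ttl=64 time=11237 ms",
    "64 bytes from 192.168.0.1: icmp_seq=69 ttl=64 time=1.83 ms",
    "64 bytes from 192.168.0.1: icmp_seq=70 ttl=64 time=2.87 ms",
]

if __name__ == "__main__":
    replies = parse_replies(SAMPLE)
    print("no reply for icmp_seq:", missing_seqs(replies))
    for seq, t in arrival_estimates(slow_replies(replies)):
        print(f"icmp_seq={seq} arrived at ~{t:.3f} s")
```

Feeding it the full `ping.txt` attachment instead of the sample slice would show whether every stall lines up with a `conn error (1011)` timestamp in the messages file.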

marcus_49

Jun 18, 2012, 8:57:57 AM

to open-...@googlegroups.com, marcus_49

Here are the attached files:

messages-20120616.txt
eth0out.2.txt
ping.txt

marcus_49

Jun 18, 2012, 9:26:17 AM

to open-...@googlegroups.com, marcus_49

Hello Mike,

Hmm, my previous post didn't reach Google Groups because I used the wrong login.
Yes, as you can see in the messages file, the connection error and recovery messages are bouncing:

Jun 16 12:16:14 eric iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
Jun 16 12:16:17 eric iscsid: connection2:0 is operational after recovery (1 attempts)
Jun 16 12:16:50 eric kernel: connection2:0: detected conn error (1011)
Jun 16 12:16:51 eric iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)
Jun 16 12:16:54 eric iscsid: connection2:0 is operational after recovery (1 attempts)
Jun 16 12:17:10 eric kernel: connection1:0: detected conn error (1011)
Jun 16 12:17:10 eric iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Jun 16 12:17:14 eric iscsid: connection1:0 is operational after recovery (1 attempts)
Jun 16 12:17:39 eric kernel: connection2:0: detected conn error (1011)
....
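To check how often each connection bounces, the messages file can be scanned per connection. Below is a minimal Python sketch, not from the thread; the helper names are hypothetical and the regexes are written only against the log lines quoted above. It counts the kernel's `detected conn error` lines (iscsid logs a matching `Kernel reported ...` line for each failure, so counting only the kernel line avoids double counting) and iscsid's `operational after recovery` lines, and reports the seconds between successive errors on the same connection. Classic syslog timestamps carry no year, so 2012 is assumed.

```python
import re
from collections import Counter
from datetime import datetime

# Kernel side, e.g. "... eric kernel: connection2:0: detected conn error (1011)"
ERROR_RE = re.compile(r"connection(\d+:\d+): detected conn error \((\d+)\)")
# iscsid side, e.g. "... eric iscsid: connection2:0 is operational after recovery (1 attempts)"
RECOVERY_RE = re.compile(r"connection(\d+:\d+) is operational after recovery")

ASSUMED_YEAR = 2012  # assumption: syslog lines omit the year


def timestamp(line):
    """Parse the leading 'Mon DD HH:MM:SS' syslog stamp."""
    return datetime.strptime(f"{ASSUMED_YEAR} {line[:15]}", "%Y %b %d %H:%M:%S")


def count_events(lines):
    """Per-connection counts of conn errors and successful recoveries."""
    errors, recoveries = Counter(), Counter()
    for line in lines:
        m = ERROR_RE.search(line)
        if m:
            errors[m.group(1)] += 1
            continue
        m = RECOVERY_RE.search(line)
        if m:
            recoveries[m.group(1)] += 1
    return errors, recoveries


def error_gaps(lines):
    """Seconds between consecutive conn errors, keyed by connection id."""
    last, gaps = {}, {}
    for line in lines:
        m = ERROR_RE.search(line)
        if not m:
            continue
        conn, ts = m.group(1), timestamp(line)
        if conn in last:
            gaps.setdefault(conn, []).append((ts - last[conn]).total_seconds())
        last[conn] = ts
    return gaps


SAMPLE = [
    "Jun 16 12:16:14 eric iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)",
    "Jun 16 12:16:17 eric iscsid: connection2:0 is operational after recovery (1 attempts)",
    "Jun 16 12:16:50 eric kernel: connection2:0: detected conn error (1011)",
    "Jun 16 12:16:51 eric iscsid: Kernel reported iSCSI connection 2:0 error (1011) state (3)",
    "Jun 16 12:16:54 eric iscsid: connection2:0 is operational after recovery (1 attempts)",
    "Jun 16 12:17:10 eric kernel: connection1:0: detected conn error (1011)",
    "Jun 16 12:17:10 eric iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)",
    "Jun 16 12:17:14 eric iscsid: connection1:0 is operational after recovery (1 attempts)",
    "Jun 16 12:17:39 eric kernel: connection2:0: detected conn error (1011)",
]

if __name__ == "__main__":
    errors, recoveries = count_events(SAMPLE)
    print("errors:", dict(errors), "recoveries:", dict(recoveries))
    print("gaps (s):", error_gaps(SAMPLE))
```

On the full `messages-20120616.txt` a roughly constant gap would point toward something periodic, while gaps that track the dd runs would point toward load.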

Thanks again for analysing this problem.

Marcus

marcus_49

Jun 18, 2012, 10:28:17 AM

to open-...@googlegroups.com

Hello Matthew,

Great! I just finished a quick dd test and, for the first time after weeks of "fun", I see no more connection errors
(jumbo frames are still enabled and no bonding is active; the system is RHEL 6.2).

Thanks a lot for sharing your knowledge with me!
I will now test again with a load balancing policy and hope this works on the RHEL 5.6 servers too.

Best
Marcus

berlin123

Jun 18, 2012, 8:48:42 AM

to open-iscsi

Hello Mike,
Sorry for answering so late. Yesterday I took a break, and today I
wasted my time in some meetings...

On 16 Jun., 19:05, Mike Christie <micha...@cs.wisc.edu> wrote:
> Still looking through all the data. For that above, right after you see
> the recovered msg, do you see another connection down msg? So does it
> look like the conn is bouncing between up and down?
>
> Could you send all the /var/log/messages?

Yes, the "recovery" and "connection error" messages are bouncing.

Jun 16 12:16:54 eric iscsid: connection2:0 is operational after recovery (1 attempts)
Jun 16 12:17:10 eric kernel: connection1:0: detected conn error (1011)
Jun 16 12:17:10 eric iscsid: Kernel reported iSCSI connection 1:0 error (1011) state (3)
Jun 16 12:17:14 eric iscsid: connection1:0 is operational after recovery (1 attempts)
Jun 16 12:17:39 eric kernel: connection2:0: detected conn error (1011)

I will attach the messages file (messages-20120616.txt).
I am also attaching a tcpdump capture (eth0out.2.txt) and the ping output (ping.txt).

Best Marcus

marcus_49

Jun 21, 2012, 7:00:36 AM

to open-...@googlegroups.com

Hello Matthew,

On Saturday, June 16, 2012 20:55:44 UTC+2, mdaitc wrote:

> Setup-wise, I have a bonded pair on the server and 8 single IP addresses on
> the storage unit. I'm using multipath, and no jumbo frames.

One question:
Do you use grouping or "trunking" on the storage device? Or do you not "want" to bundle the channels on the storage to increase bandwidth?

Best Marcus
