iscsid: Kernel reported iSCSI connection 1:0 error (1020) state (3)

9,123 views
Skip to first unread message

立凡 王

unread,
May 10, 2010, 4:57:10 AM5/10/10
to open-iscsi
Hello,

Befrore I uninstall the Fedora 6, I build a Fedora 12 machine for
testing.
But I still get some problems.

iscsid: Kernel reported iSCSI connection 1:0 error (1020) state (3)

I googled it and found this page 'https://bugzilla.redhat.com/
show_bug.cgi?id=583581'.
Although I didn't run CHAP, I upgrade the kernel to
2.6.32.12-114.fc12.i686.PAE.
(iscsi-initiator-utils-6.2.0.870-10.fc12.1.i686)

It sometimes show
May 10 13:56:40 milk kernel: connection1:0: detected conn error (1020)
May 10 13:56:41 milk iscsid: Kernel reported iSCSI connection 1:0
error (1020) state (3)
May 10 13:56:43 milk iscsid: connection1:0 is operational after
recovery (1 attempts)
May 10 13:57:12 milk kernel: connection1:0: detected conn error (1020)
May 10 13:57:13 milk iscsid: Kernel reported iSCSI connection 1:0
error (1020) state (3)
May 10 13:57:16 milk iscsid: connection1:0 is operational after
recovery (1 attempts)
May 10 14:01:26 milk kernel: connection1:0: detected conn error (1020)
May 10 14:01:27 milk iscsid: Kernel reported iSCSI connection 1:0
error (1020) state (3)
May 10 14:01:28 milk kernel: INFO: task kjournald:2594 blocked for
more than 120 seconds.
May 10 14:01:28 milk kernel: "echo 0 > /proc/sys/kernel/
hung_task_timeout_secs" disables this message.
May 10 14:01:28 milk kernel: kjournald D 00000775 0 2594
2 0x00000080
May 10 14:01:28 milk kernel: f20ade5c 00000046 9a9505fd 00000775
002764c1 00000000 f5f99c2c c06e9c8c
May 10 14:01:28 milk kernel: c0a81354 c0a85e60 f5f99c2c c0a85e60
c0a85e60 f20ade4c c0462bdf 647ebaad
May 10 14:01:28 milk kernel: 00000001 00000000 00000775 f5f99980
c2888e60 00000000 f20adeac f20ade54
May 10 14:01:28 milk kernel: Call Trace:
May 10 14:01:28 milk kernel: [<c06e9c8c>] ? dm_table_unplug_all+0x3e/
0x97
May 10 14:01:28 milk kernel: [<c0462bdf>] ? ktime_get_ts+0x98/0xa2
May 10 14:01:28 milk kernel: [<c07a503f>] io_schedule+0x37/0x4e
May 10 14:01:28 milk kernel: [<c0501f94>] sync_buffer+0x38/0x3c
May 10 14:01:28 milk kernel: [<c07a54fe>] __wait_on_bit+0x39/0x60
May 10 14:01:28 milk kernel: [<c0501f5c>] ? sync_buffer+0x0/0x3c
May 10 14:01:28 milk kernel: [<c0501f5c>] ? sync_buffer+0x0/0x3c
May 10 14:01:28 milk kernel: [<c07a55c5>] out_of_line_wait_on_bit
+0xa0/0xa8
May 10 14:01:28 milk kernel: [<c045b6cd>] ? wake_bit_function+0x0/0x3c
May 10 14:01:28 milk kernel: [<c0501ed1>] __wait_on_buffer+0x1e/0x21
May 10 14:01:28 milk kernel: [<c056aa86>] wait_on_buffer+0x34/0x37
May 10 14:01:28 milk kernel: [<c056af46>] journal_commit_transaction
+0x41d/0xc57
May 10 14:01:28 milk kernel: [<c045b699>] ? autoremove_wake_function
+0x0/0x34
May 10 14:01:28 milk kernel: [<c044e53b>] ? lock_timer_base+0x26/0x45
May 10 14:01:28 milk kernel: [<c044e742>] ? try_to_del_timer_sync+0x5e/
0x66
May 10 14:01:28 milk kernel: [<c056de45>] kjournald+0xb8/0x1cc
May 10 14:01:28 milk kernel: [<c045b699>] ? autoremove_wake_function
+0x0/0x34
May 10 14:01:28 milk kernel: [<c056dd8d>] ? kjournald+0x0/0x1cc
May 10 14:01:28 milk kernel: [<c045b461>] kthread+0x64/0x69
May 10 14:01:28 milk kernel: [<c045b3fd>] ? kthread+0x0/0x69
May 10 14:01:28 milk kernel: [<c0409cc7>] kernel_thread_helper
+0x7/0x10
May 10 14:01:30 milk iscsid: connection1:0 is operational after
recovery (1 attempts)
May 10 14:01:59 milk kernel: connection1:0: detected conn error (1020)
May 10 14:02:00 milk iscsid: Kernel reported iSCSI connection 1:0
error (1020) state (3)
May 10 14:02:03 milk iscsid: connection1:0 is operational after
recovery (1 attempts)
May 10 14:07:28 milk kernel: INFO: task flush-253:2:2647 blocked for
more than 120 seconds.
May 10 14:07:28 milk kernel: "echo 0 > /proc/sys/kernel/
hung_task_timeout_secs" disables this message.
May 10 14:07:28 milk kernel: flush-253:2 D 000007d6 0 2647
2 0x00000080
May 10 14:07:28 milk kernel: f0087cdc 00000046 3012e708 000007d6
00261298 00000000 f68cc26c c06e9c8c
May 10 14:07:28 milk kernel: c0a81354 c0a85e60 f68cc26c c0a85e60
c0a85e60 f0087ccc 007ee65d c2884354
May 10 14:07:28 milk kernel: 00000001 00000000 000007d6 f68cbfc0
c2888e60 00000000 00000000 00000000

and

May 10 16:16:31 milk kernel: lost page write due to I/O error on dm-2
May 10 16:16:31 milk kernel: Buffer I/O error on device dm-2, logical
block 104091560
May 10 16:16:31 milk kernel: lost page write due to I/O error on dm-2
May 10 16:16:31 milk kernel: Buffer I/O error on device dm-2, logical
block 104091561
May 10 16:16:31 milk kernel: lost page write due to I/O error on dm-2
May 10 16:16:31 milk kernel: Buffer I/O error on device dm-2, logical
block 104091562
May 10 16:16:31 milk kernel: lost page write due to I/O error on dm-2
May 10 16:16:31 milk kernel: Buffer I/O error on device dm-2, logical
block 104091563
May 10 16:16:31 milk kernel: lost page write due to I/O error on dm-2
May 10 16:16:31 milk kernel: sd 8:0:0:0: [sdb] Unhandled error code
May 10 16:16:31 milk kernel: sd 8:0:0:0: [sdb] Result:
hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
May 10 16:16:31 milk kernel: sd 8:0:0:0: [sdb] CDB: Write(10): 2a 00
31 a2 7a cf 00 04 00 00
May 10 16:16:31 milk kernel: end_request: I/O error, dev sdb, sector
832731855
May 10 16:16:34 milk iscsid: connection1:0 is operational after
recovery (1 attempts)
May 10 16:17:03 milk kernel: connection1:0: detected conn error (1020)
May 10 16:17:04 milk iscsid: Kernel reported iSCSI connection 1:0
error (1020) state (3)
May 10 16:17:08 milk iscsid: connection1:0 is operational after
recovery (1 attempts)
May 10 16:17:27 milk kernel: JBD: Detected IO errors while flushing
file data on dm-2
May 10 16:17:57 milk kernel: connection1:0: detected conn error (1020)
May 10 16:17:58 milk iscsid: Kernel reported iSCSI connection 1:0
error (1020) state (3)
May 10 16:18:02 milk iscsid: connection1:0 is operational after
recovery (1 attempts)
May 10 16:19:05 milk kernel: connection1:0: detected conn error (1020)

Did I miss something?

Testing machine and iscsi machine are not in the same room.
It has 9 MB/sec for hdparm testing.

--
You received this message because you are subscribed to the Google Groups "open-iscsi" group.
To post to this group, send email to open-...@googlegroups.com.
To unsubscribe from this group, send email to open-iscsi+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/open-iscsi?hl=en.

Mike Christie

unread,
May 10, 2010, 1:55:23 PM5/10/10
to open-...@googlegroups.com, 立凡 王
On 05/10/2010 03:57 AM, 立凡 王 wrote:
> Hello,
>
> Befrore I uninstall the Fedora 6, I build a Fedora 12 machine for
> testing.
> But I still get some problems.
>
> iscsid: Kernel reported iSCSI connection 1:0 error (1020) state (3)
>
> I googled it and found this page 'https://bugzilla.redhat.com/
> show_bug.cgi?id=583581'.
> Although I didn't run CHAP, I upgrade the kernel to
> 2.6.32.12-114.fc12.i686.PAE.
> (iscsi-initiator-utils-6.2.0.870-10.fc12.1.i686)
>
> It sometimes show
> May 10 13:56:40 milk kernel: connection1:0: detected conn error (1020)
> May 10 13:56:41 milk iscsid: Kernel reported iSCSI connection 1:0
> error (1020) state (3)
> May 10 13:56:43 milk iscsid: connection1:0 is operational after
> recovery (1 attempts)

Do you start seeing this right when you log in or is it after doing some
IO or doing some other operation like logging in other sessions? 1020
means that the target disconnected from us (disconnected tcp/ip
connection) for some unknown reason.

Could you send all of the /var/log/messages.
And what target are you using? Is it scsi-target-utils or IET or stgt or
some target from a hardware vendor?

立凡 王

unread,
May 10, 2010, 11:22:21 PM5/10/10
to open-iscsi
Because I need to backup data from Fedora 6, I use rsync to backup
machine A and test iscsi between machine B and iscsi storage.
So the data from A will through B into storage.
The rsync is working with error messages

┏━━━━━┓      ┏━━━━━━┓        ┏━━━━━━━━━┓
┃ fedora6 ┃ rsync ┃  fedora12┃  iscsi   ┃ hp storage    ┃
┃  (A) ┃━━━━━→┃  (B)   ┃━━━━━━━→┃  ms2312i    ┃
┗━━━━━┛      ┗━━━━━━┛        ┗━━━━━━━━━┛

The rsync is working with error messages, but the messages do not
happen all the times.


This is the messages file
http://people.chu.edu.tw/~b8902110/temp/messages

Mike Christie

unread,
May 11, 2010, 12:10:20 PM5/11/10
to open-...@googlegroups.com, 立凡 王
On 05/10/2010 10:22 PM, 立凡 王 wrote:
> Because I need to backup data from Fedora 6, I use rsync to backup
> machine A and test iscsi between machine B and iscsi storage.
> So the data from A will through B into storage.
> The rsync is working with error messages
>
> ┏━━━━━┓      ┏━━━━━━┓        ┏━━━━━━━━━┓
> ┃ fedora6 ┃ rsync ┃  fedora12┃  iscsi   ┃ hp storage    ┃
> ┃  (A) ┃━━━━━→┃  (B)   ┃━━━━━━━→┃  ms2312i    ┃
> ┗━━━━━┛      ┗━━━━━━┛        ┗━━━━━━━━━┛
>
> The rsync is working with error messages, but the messages do not
> happen all the times.
>
>
> This is the messages file
> http://people.chu.edu.tw/~b8902110/temp/messages
>

In your log it looks like you are using a RHEL or Centos kernel and you
are using iser, and I do not see any 1020 errors like below.

For the problem in the log on your people.chu.edu.tw page it looks like
you are sending more IO than the target can handle. I would increase the
scsi command timeout and/or decrease the iscsi/scsi queue settings. For
the latter reduce node.session.cmds_max.


For the 1020 errors do you have multiple machines connected to the iscsi
target at the same time? And if you do do they all have different
initiator names in /etc/iscsi/initiatorname.iscsi?

david elsen

unread,
May 11, 2010, 6:40:34 PM5/11/10
to open-...@googlegroups.com
Can someone tell me what I am missing on my system if I am getting following compilation error? I am using RHEL 5.3 default kernel.


[root@pegasus open-iscsi-2.0-871]# make install
make -C kernel install_kernel
make[1]: Entering directory `/root/open-iscsi-2.0-871/kernel'
echo "Patching source code for linux-2.6.14-23 ..."
Patching source code for linux-2.6.14-23 ...
if [ -e cur_patched ]; then \
                make -C . clean; \
        fi
patch -p1 < 2.6.14-23_compat.patch
patching file iscsi_tcp.c
Hunk #1 succeeded at 460 (offset 4 lines).
Hunk #3 succeeded at 838 (offset 4 lines).
Hunk #5 succeeded at 859 (offset 4 lines).
patching file iscsi_tcp.h
patching file libiscsi.c
Hunk #2 succeeded at 86 (offset 23 lines).
Hunk #4 succeeded at 328 (offset 23 lines).
Hunk #6 succeeded at 401 (offset 23 lines).
Hunk #8 succeeded at 729 (offset 23 lines).
Hunk #10 succeeded at 796 (offset 23 lines).
Hunk #11 succeeded at 1504 (offset 6 lines).
patching file libiscsi.h
patching file libiscsi_tcp.c
Hunk #5 succeeded at 697 with fuzz 2 (offset 1 line).
patching file libiscsi_tcp.h
patching file open_iscsi_compat.h
patching file scsi_transport_iscsi.c
patching file scsi_transport_iscsi.h
cp 2.6.14-23_compat.patch has_14to23_patch
ln -s has_14to23_patch cur_patched
make -C /lib/modules/2.6.18-128.el5PAE/build M=`pwd` KBUILD_OUTPUT=  V=0 modules
make[2]: Entering directory `/usr/src/kernels/2.6.18-128.el5-PAE-i686'
  CC [M]  /root/open-iscsi-2.0-871/kernel/scsi_transport_iscsi.o
/root/open-iscsi-2.0-871/kernel/scsi_transport_iscsi.c: In function ‘__iscsi_unblock_session’:
/root/open-iscsi-2.0-871/kernel/scsi_transport_iscsi.c:535: warning: unused variable ‘ihost’
/root/open-iscsi-2.0-871/kernel/scsi_transport_iscsi.c: In function ‘__iscsi_block_session’:
/root/open-iscsi-2.0-871/kernel/scsi_transport_iscsi.c:579: warning: passing argument 2 of ‘queue_delayed_work’ from incompatible pointer type
/root/open-iscsi-2.0-871/kernel/scsi_transport_iscsi.c: At top level:
/root/open-iscsi-2.0-871/kernel/scsi_transport_iscsi.c:1266: warning: ‘iscsi_if_ep_connect’ defined but not used
  CC [M]  /root/open-iscsi-2.0-871/kernel/libiscsi.o
/root/open-iscsi-2.0-871/kernel/libiscsi.c: In function ‘iscsi_target_alloc’:
/root/open-iscsi-2.0-871/kernel/libiscsi.c:1547: warning: unused variable ‘session’
  CC [M]  /root/open-iscsi-2.0-871/kernel/libiscsi_tcp.o
/root/open-iscsi-2.0-871/kernel/libiscsi_tcp.c: In function ‘iscsi_tcp_dgst_header’:
/root/open-iscsi-2.0-871/kernel/libiscsi_tcp.c:305: warning: passing argument 2 of ‘sg_init_one’ discards qualifiers from pointer target type
  CC [M]  /root/open-iscsi-2.0-871/kernel/iscsi_tcp.o
  Building modules, stage 2.
  MODPOST
  CC      /root/open-iscsi-2.0-871/kernel/iscsi_tcp.mod.o
  LD [M]  /root/open-iscsi-2.0-871/kernel/iscsi_tcp.ko
  CC      /root/open-iscsi-2.0-871/kernel/libiscsi.mod.o
  LD [M]  /root/open-iscsi-2.0-871/kernel/libiscsi.ko
  CC      /root/open-iscsi-2.0-871/kernel/libiscsi_tcp.mod.o
  LD [M]  /root/open-iscsi-2.0-871/kernel/libiscsi_tcp.ko
  CC      /root/open-iscsi-2.0-871/kernel/scsi_transport_iscsi.mod.o
  LD [M]  /root/open-iscsi-2.0-871/kernel/scsi_transport_iscsi.ko
make[2]: Leaving directory `/usr/src/kernels/2.6.18-128.el5-PAE-i686'
make -C /lib/modules/2.6.18-128.el5PAE/build M=`pwd` KBUILD_OUTPUT=  V=0 modules_install INSTALL_MOD_DIR=kernel/drivers/scsi INSTALL_MOD_PATH=
make[2]: Entering directory `/usr/src/kernels/2.6.18-128.el5-PAE-i686'
  INSTALL /root/open-iscsi-2.0-871/kernel/iscsi_tcp.ko
  INSTALL /root/open-iscsi-2.0-871/kernel/libiscsi.ko
  INSTALL /root/open-iscsi-2.0-871/kernel/libiscsi_tcp.ko
  INSTALL /root/open-iscsi-2.0-871/kernel/scsi_transport_iscsi.ko
  DEPMOD  2.6.18-128.el5PAE
make[2]: Leaving directory `/usr/src/kernels/2.6.18-128.el5-PAE-i686'
make[1]: Leaving directory `/root/open-iscsi-2.0-871/kernel'
cc     usr/iscsid.c   -o usr/iscsid
In file included from usr/iscsid.c:34:
usr/mgmt_ipc.h:23:22: error: iscsi_if.h: No such file or directory
In file included from usr/mgmt_ipc.h:24,
                 from usr/iscsid.c:34:
usr/config.h:27:18: error: list.h: No such file or directory
In file included from usr/mgmt_ipc.h:24,
                 from usr/iscsid.c:34:
usr/config.h:189: error: field ‘list’ has incomplete type
usr/config.h:205: error: ‘TARGET_NAME_MAXLEN’ undeclared here (not in a function)
usr/config.h:210: error: field ‘list’ has incomplete type
In file included from usr/iscsid.c:34:
usr/mgmt_ipc.h:119: error: field ‘ev’ has incomplete type
usr/mgmt_ipc.h:120: error: field ‘stats’ has incomplete type
usr/mgmt_ipc.h:121: error: invalid application of ‘sizeof’ to incomplete type ‘struct iscsi_stats_custom’
usr/mgmt_ipc.h:122: error: ‘ISCSI_STATS_CUSTOM_MAX’ undeclared here (not in a function)
usr/mgmt_ipc.h:125: error: ‘VALUE_MAXLEN’ undeclared here (not in a function)
In file included from usr/iscsid.c:36:
usr/iscsi_ipc.h:79: warning: ‘enum iscsi_param’ declared inside parameter list
usr/iscsi_ipc.h:79: warning: its scope is only this definition or declaration, which is probably not what you want
usr/iscsi_ipc.h:83: warning: ‘enum iscsi_host_param’ declared inside parameter list
usr/iscsi_ipc.h:88: warning: ‘enum iscsi_param’ declared inside parameter list
usr/iscsi_ipc.h:107: warning: ‘enum iscsi_uevent_e’ declared inside parameter list
In file included from usr/iscsid.c:37:
usr/log.h:65: error: field ‘semarg’ has incomplete type
In file included from usr/iscsid.c:39:
usr/initiator.h:28:25: error: iscsi_proto.h: No such file or directory
In file included from usr/initiator.h:33,
                 from usr/iscsid.c:39:
usr/actor.h:37: error: field ‘list’ has incomplete type
In file included from usr/iscsid.c:39:
usr/initiator.h:102: error: field ‘pdu’ has incomplete type
usr/initiator.h:124: error: ‘ISCSI_DEF_MAX_RECV_SEG_LEN’ undeclared here (not in a function)
usr/initiator.h:199: error: field ‘list’ has incomplete type
In file included from usr/iscsid.c:40:
usr/transport.h:37: error: field ‘list’ has incomplete type
usr/transport.h:41: error: field ‘sessions’ has incomplete type
In file included from usr/iscsi_sysfs.h:22,
                 from usr/iscsid.c:43:
usr/sysfs.h:33: error: field ‘node’ has incomplete type
In file included from usr/iscsid.c:45:
usr/session_info.h:13: error: field ‘list’ has incomplete type
usr/iscsid.c:46:21: error: sysdeps.h: No such file or directory
usr/iscsid.c: In function ‘sync_session’:
usr/iscsid.c:198: error: ‘CAP_FW_DB’ undeclared (first use in this function)
usr/iscsid.c:198: error: (Each undeclared identifier is reported only once
usr/iscsid.c:198: error: for each function it appears in.)
make: *** [usr/iscsid] Error 1
[root@pegasus open-iscsi-2.0-871]#




The New Busy is not the old busy. Search, chat and e-mail from your inbox. Get started.

Mike Christie

unread,
May 11, 2010, 10:32:24 PM5/11/10
to open-...@googlegroups.com, david elsen
On 05/11/2010 05:40 PM, david elsen wrote:
>
> Can someone tell me what I am missing on my system if I am getting following compilation error? I am using RHEL 5.3 default kernel.
>
>
> [root@pegasus open-iscsi-2.0-871]# make install
>

It should work if you do a "make" then "make install".

david elsen

unread,
May 12, 2010, 8:55:49 PM5/12/10
to open-...@googlegroups.com
thanks. yes it worked. In README, it is written to do "make install" directly otherwise I might have missed it.

 
> Date: Tue, 11 May 2010 21:32:24 -0500
> From: mich...@cs.wisc.edu
> To: open-...@googlegroups.com
> CC: elsen...@hotmail.com
> Subject: Re: compilation error with open-iscsi-2.0-871.tar.gz on RHEL 5.3

The New Busy think 9 to 5 is a cute idea. Combine multiple calendars with Hotmail. Get busy.

立凡 王

unread,
May 13, 2010, 4:46:26 AM5/13/10
to open-iscsi
Hello, Mike
Thank you for your suggestion.
I reduced node.session.cmds_max = 64 and increase
node.session.timeo.replacement_timeout = 240, but the 1020 still
exist.
So I tested the IO speed below.

# hdparm -tT /dev/sda2

/dev/sda2:
Timing cached reads: 7208 MB in 2.00 seconds = 3608.39 MB/sec
Timing buffered disk reads: 224 MB in 3.00 seconds = 74.59 MB/sec
(sda is local disk)

# hdparm -tT /dev/sdb

/dev/sdb:
Timing cached reads: 6894 MB in 2.00 seconds = 3450.44 MB/sec
Timing buffered disk reads: 30 MB in 3.14 seconds = 9.55 MB/sec
(sdb is iscsi disk, the speed(9.55 MB/sec) is limited by network
speed(100 Mb/sec).)

# dd if=/dev/zero of=/test.img bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 11.8259 s, 90.8 MB/s
(on local disk)

# dd if=/dev/zero of=/backup2/test.img bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 1152.39 s, 932 kB/s
(on iscsi disk)

# dd if=/dev/zero of=/backup2/test.img bs=1M count=512
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 480.972 s, 1.1 MB/s

I tried twice, the transfer rate on iscsi disk is so ... amazing.

I tried the initiator upload speed.
# ncftpput -zER -u cte -p 32502 -P 2932 xxx.xxx.11.245 / /test.img
/test.img: 1.00 GB
10.15 MB/s
The iscsi initiator upload speed can be 10 MB/sec.

Is this normal?
Or I did something wrong?

Mike Christie

unread,
May 13, 2010, 7:44:23 PM5/13/10
to open-...@googlegroups.com, 立凡 王
On 05/13/2010 03:46 AM, 立凡 王 wrote:
> Hello, Mike
> Thank you for your suggestion.
> I reduced node.session.cmds_max = 64 and increase
> node.session.timeo.replacement_timeout = 240, but the 1020 still
> exist.

replacacement_timeout is not going to help prevent the initial iscsi
errors you are seeing.

In the logs and emails it seems you have multiple issues.

In the previous mail I was saying that reducing the cmds_max may help
the problem where in the logs you sent we see 1011 errors and the
scsi/iscsi eh running.


For the 1020 errors, I was asking if you had multiple initiators using
the same name in /etc/iscsi/initiatorname.iscsi connected to the same
target. For this it looked like the target would disconect us, we would
reconnect, then the target would disconnect us again and we would
reconnect and we would keep doing that. This happens sometimes when you
copy over your iscsi settings from one box, and also copy the
/etc/iscsi/initiatorname.iscsi file (the value in that needs to be
uniqueue for each initaitor).

立凡 王

unread,
May 14, 2010, 5:24:56 AM5/14/10
to open-iscsi
Sorry, I did not answer the multiple initiators question.
The initiatorname of previos machine is different from present machine
and I set the iscsi service of previos machine down.
I also loged in the hp storage killed records of other hosts.
But the 1020 errors still alive.
I found it seems to be relative with the size of files.
When the backup files like video files or some big files, the errors
will show again.

machine A show below
27118/980330/980330-13.wmv
27118/980330/980330-14.wmv
27118/980330/980330-2.wmv
27118/980330/980330-3.wmv
27118/980330/980330-4.wmv
27118/980330/980330-5.wmv
27118/980330/980330-6.wmv
27118/980330/980330-7.wmv
27118/980330/980330-8.wmv
27118/980330/980330-9.wmv
27118/980413/980413-1.WMV
27118/980413/980413-10.WMV
27118/980413/980413-11.WMV
27118/980413/980413-12.WMV
27118/980413/980413-2.WMV
27118/980413/980413-3.WMV
27118/980413/980413-4.WMV
27118/980413/980413-5.WMV
27118/980413/980413-6.WMV
27118/980413/980413-7.WMV
27118/980413/980413-8.WMV
27118/980413/980413-9.WMV
27118/980420/980420-1.wmv
27118/980420/980420-10.wmv
27118/980420/980420-11.wmv
27118/980420/980420-12.wmv
27118/980420/980420-13.wmv
27118/980420/980420-2.wmv
27118/980420/980420-3.wmv
27118/980420/980420-4.wmv
27118/980420/980420-5.wmv
27118/980420/980420-6.wmv
27118/980420/980420-7.wmv
27118/980420/980420-8.wmv
27118/980420/980420-9.wmv

In the period of time, the log file of machine B
http://people.chu.edu.tw/~b8902110/temp/messages23


And now, I did not see 1011 errors any more.
:)

Mike Christie

unread,
May 14, 2010, 12:44:20 PM5/14/10
to open-...@googlegroups.com, 立凡 王
Could you rerun that test but before you start copying file turn on
iscsi eh debugging by doing:

echo 1 > /sys/module/libiscsi/parameters/debug_libiscsi_eh


I think you might be hitting a problem where you are sending too much
IO, a IO takes too long and the scsi eh runs and is waiting on the
target but the target is waiting on the initiator and then ends up
dropping the session.

**If** with the debug_libicsi_eh on you see messages like

aborting sc 0x1234432

in /var/log/messages while running your test, then to avoid falling in
there you need to:


1. increase scsi cmd timeout.

echo N > /sys/block/sdX/device/timeout

(you can also create a udev rule to increase this. See section 8.1.2.1
in /usr/share/doc/iscsi-initiator-utils-6.2.0.870/README)


2. Reduce node.session.cmds_max and node.session.queue_depth. Do not
just set it to half and report back. Try setting it really low like set
node.session.queue_depth to 1 or 2 or 4 and see if that changes
anything. Let me know what worked and did not too btw.




Could you also tell me what version of the kernel you are using?

david elsen

unread,
May 14, 2010, 8:47:58 PM5/14/10
to open-...@googlegroups.com

Once I have done login to the target using "iscsiadm -" from the initiator, after next reboot Initiator connects to the target without running the command.
 
How can I stop that? I have multiple targets on my set-up and want to connect them one-by-one.
 
Can someone help me with with this please?
 
thanks,
david
 


The New Busy is not the too busy. Combine all your e-mail accounts with Hotmail. Get busy.

Mike Christie

unread,
May 14, 2010, 11:56:07 PM5/14/10
to open-...@googlegroups.com, david elsen
On 05/14/2010 07:47 PM, david elsen wrote:
>
>
> Once I have done login to the target using "iscsiadm -" from the initiator, after next reboot Initiator connects to the target without running the command.
>
>
>
> How can I stop that? I have multiple targets on my set-up and want to connect them one-by-one.
>
>


Either

1. delete the record

iscsiadm -m node -T target -p ip -o delete


or

2. mark node.startup as manual for the record

iscsiadm -m node -T target -p ip -o update -n node.startup -v manual
iscsiadm -m node -T target -p ip -o update -n node.conn[0].startup -v
manual

david elsen

unread,
May 17, 2010, 6:55:40 PM5/17/10
to open-...@googlegroups.com
Mike,
 
Thanks. Your suggestion worked.
 
But I have to do following:
1. Discovery login to get the name.
2. Update the record for manual
3. Normal login.
 
to do the login
 
For deleting, I have to do:
1. Logout
2. Delete.
 
 
I am not sure if I can have any simpler approach.
 
Thanks,
David

 
> Date: Fri, 14 May 2010 22:56:07 -0500
> Subject: Re: removing auto login from Initiator

The New Busy is not the too busy. Combine all your e-mail accounts with Hotmail. Get busy.

--

Mike Christie

unread,
May 17, 2010, 8:41:47 PM5/17/10
to open-...@googlegroups.com, david elsen
On 05/17/2010 05:55 PM, david elsen wrote:
>
> Mike,
>
>
>
> Thanks. Your suggestion worked.
>
>
>
> But I have to do following:
>
> 1. Discovery login to get the name.
>
> 2. Update the record for manual
>
> 3. Normal login.
>
>
>
> to do the login
>
>
>
> For deleting, I have to do:
>
> 1. Logout
>
> 2. Delete.
>
>

I am not sure what you are saying.


Are you deleting and updating the record to manual? If so
you do not have to set the record for manual and delete it. You can just
do one or the other. For deletion then you do not need to logout, if you
just delete the records you do not want, then do the login.


You could also just set them all as manual by default by setting the
node.startup = manual in /etc/iscsi/iscsid.conf. Then just manually
login to the ones you want or set the ones you want to node.startup =
automatic by running iscsiadm.

立凡 王

unread,
May 17, 2010, 8:39:30 PM5/17/10
to open-iscsi
The version of the kernel is '2.6.32.12-114.fc12.i686.PAE'.
The log file after debugging turned on is -> http://people.chu.edu.tw/~b8902110/temp/messages0517

There are lots of below messages in the log.
kernel: session1: iscsi_eh_cmd_timed_out scsi cmd f5eb8480 timedout
kernel: session1: iscsi_eh_cmd_timed_out return timer reset

And I didn't see messages like
> aborting sc 0x1234432

Mike Christie

unread,
May 19, 2010, 7:52:05 AM5/19/10
to open-...@googlegroups.com, 立凡 王
On 05/17/2010 07:39 PM, 立凡 王 wrote:
> The version of the kernel is '2.6.32.12-114.fc12.i686.PAE'.
> The log file after debugging turned on is -> http://people.chu.edu.tw/~b8902110/temp/messages0517
>
> There are lots of below messages in the log.
> kernel: session1: iscsi_eh_cmd_timed_out scsi cmd f5eb8480 timedout
> kernel: session1: iscsi_eh_cmd_timed_out return timer reset
>
> And I didn't see messages like
>> aborting sc 0x1234432
>

Ah, I guess that is good news and bad news. It is good news we are
avoiding starting the scsi eh when we do not have to. However it is bad
news because the iscsi_eh_cmd_timed_out messages means it is taking the
target a long time to process the IO.

What is the value of your scsi cmd timeout?

cat /sys/block/sdX/device/timeout


And this is connected to that HP target, right?

david elsen

unread,
May 20, 2010, 3:37:29 PM5/20/10
to open-...@googlegroups.com
For Logout:
Yes, I was doing logout and then deleting the entry.
 
For Login:
I do discovery to get the iSCSI name and record.
Then update it to manual and doing the login with iSCSI name.
 
I will update /etc/iscsi/iscsid.conf to it easier.
thanks,
david

 
> Date: Mon, 17 May 2010 19:41:47 -0500

> From: mich...@cs.wisc.edu
> To: open-...@googlegroups.com
> CC: elsen...@hotmail.com
> Subject: Re: removing auto login from Initiator
>

Hotmail has tools for the New Busy. Search, chat and e-mail from your inbox. Learn more.

立凡 王

unread,
May 20, 2010, 5:20:37 AM5/20/10
to open-iscsi
Hello, Mike
Thank you for help me a lot.

The timeout is 30.
# cat /sys/block/sdb/device/timeout
30

I'm afraid that problem is the storage self.
So I called the HP customer service for asking the normal performace
of the storage.
The HP customer service show me that.
http://people.chu.edu.tw/~b8902110/temp/Outlook.jpg

The differences are our storage use SATA disk, 100M ethernat and just
single controll module.
(100M is used for testing)
The performace should be 10 to 20 MB/sec but the real performace on
Fedora is terrible as I posted(1 MB/sec).
So I tried it on Win 7 to make sure the storage is fine.
I copied an iso file which is 3 GB to iscsi disk.
The average speed is 11 MB/sec and that is just in the range 10 to 20.
So the storage is fine.

Mike Christie

unread,
May 21, 2010, 2:31:09 PM5/21/10
to open-...@googlegroups.com, 立凡 王
On 05/13/2010 03:46 AM, 立凡 王 wrote:
> # hdparm -tT /dev/sdb
>
> /dev/sdb:
> Timing cached reads: 6894 MB in 2.00 seconds = 3450.44 MB/sec
> Timing buffered disk reads: 30 MB in 3.14 seconds = 9.55 MB/sec
> (sdb is iscsi disk, the speed(9.55 MB/sec) is limited by network
> speed(100 Mb/sec).)
>
> # dd if=/dev/zero of=/test.img bs=1M count=1024
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 11.8259 s, 90.8 MB/s
> (on local disk)
>
> # dd if=/dev/zero of=/backup2/test.img bs=1M count=1024
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 1152.39 s, 932 kB/s
> (on iscsi disk)
>
> # dd if=/dev/zero of=/backup2/test.img bs=1M count=512
> 512+0 records in
> 512+0 records out
> 536870912 bytes (537 MB) copied, 480.972 s, 1.1 MB/s
>

Hey so with Win7 you get around 10 MB/s and when you do reads to the
iscsi disk /dev/sdb directly you got 9.5 MB/s, but when writing to the
disk you get around 1 MB/s, right?

For your dd test could you do

// THIS WILL ERASE ANYTHING ON THAT DISK!!!! SO DO NOT
// DO IT ON A DISK THAT IS USED IN PRODUCTION!!!!!
dd if=/dev/zero of=/dev/sdb bs=1M count=1024

dd if=/dev/sdb of=/dev/null bs=1M count=1024

and

dd if=/dev/sdb of=/dev/null bs=1M count=1024 iflag=direct


And what FS are you using?

立凡 王

unread,
May 23, 2010, 11:47:56 PM5/23/10
to open-iscsi
> Hey so with Win7 you get around 10 MB/s and when you do reads to the
> iscsi disk /dev/sdb directly you got 9.5 MB/s, but when writing to the
> disk you get around 1 MB/s, right?

YES

These are the dd testing outcome.
# dd if=/dev/zero of=/dev/sdc bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 1149.12 s, 934 kB/s

# dd if=/dev/sdc of=/dev/null bs=1M count=1024
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 112.21 s, 9.6 MB/s

# dd if=/dev/sdc of=/dev/null bs=1M count=1024 iflag=direct
1024+0 records in
1024+0 records out
1073741824 bytes (1.1 GB) copied, 107.276 s, 10.0 MB/s

> And what FS are you using?
I've try ext3 and ext4 then there was nothing getting better.

Mike Christie

unread,
May 24, 2010, 5:18:15 PM5/24/10
to open-...@googlegroups.com, 立凡 王
On 05/23/2010 10:47 PM, 立凡 王 wrote:
>> Hey so with Win7 you get around 10 MB/s and when you do reads to the
>> iscsi disk /dev/sdb directly you got 9.5 MB/s, but when writing to the
>> disk you get around 1 MB/s, right?
>
> YES
>
> These are the dd testing outcome.
> # dd if=/dev/zero of=/dev/sdc bs=1M count=1024
> 1024+0 records in
> 1024+0 records out
> 1073741824 bytes (1.1 GB) copied, 1149.12 s, 934 kB/s
>

Could you send the output of

iscsiadm -m session -P 3

when you are logged into the target?

Could you also try setting /sys/block/sdX/device/timeout to 300? Could
you also send the /var/log/messages from the time you login to the time
the write test is done?

立凡 王

unread,
May 25, 2010, 4:40:36 AM5/25/10
to open-iscsi
Hello, Mike

# iscsiadm -m session -P 3
iSCSI Transport Class version 2.0-870
iscsiadm version 2.0-870
Target: iqn.1986-03.com.hp:storage.msa2312i.0919d81d3a
Current Portal: XXX.XXX.7.88:3260,2
Persistent Portal: XXX.XXX.7.88:3260,2
**********
Interface:
**********
Iface Name: default
Iface Transport: tcp
Iface Initiatorname: iqn.1994-05.com.fedora:
82fd14af68f9
Iface IPaddress: XXX.XXX.20.70
Iface HWaddress: default
Iface Netdev: default
SID: 1
iSCSI Connection State: LOGGED IN
iSCSI Session State: LOGGED_IN
Internal iscsid Session State: NO CHANGE
************************
Negotiated iSCSI params:
************************
HeaderDigest: None
DataDigest: None
MaxRecvDataSegmentLength: 131072
MaxXmitDataSegmentLength: 524288
FirstBurstLength: 262144
MaxBurstLength: 2097152
ImmediateData: No
InitialR2T: Yes
MaxOutstandingR2T: 1
************************
Attached SCSI devices:
************************
Host Number: 8 State: running
scsi8 Channel 00 Id 0 Lun: 0
Attached scsi disk sdb State: running
scsi8 Channel 00 Id 0 Lun: 2
Attached scsi disk sdc State: running
scsi8 Channel 00 Id 0 Lun: 3
Attached scsi disk sdd State: running

# iscsiadm -m node -p XXX.XXX.7.88 -u
Logging out of session [sid: 1, target: iqn.
1986-03.com.hp:storage.msa2312i.0919d81d3a, portal: XXX.XXX.7.88,3260]
Logout of [sid: 1, target: iqn.1986-03.com.hp:storage.msa2312i.
0919d81d3a, portal: XXX.XXX.7.88,3260]: successful

# iscsiadm -m node -p XXX.XXX.7.88 -l
Logging in to [iface: default, target: iqn.
1986-03.com.hp:storage.msa2312i.0919d81d3a, portal: XXX.XXX.7.88,3260]
Login to [iface: default, target: iqn.1986-03.com.hp:storage.msa2312i.
0919d81d3a, portal: XXX.XXX.7.88,3260]: successful

# cat /sys/block/sdc/device/timeout
30
# vi /sys/block/sdc/device/timeout
300
(When I writed ,it showed below
"/sys/devices/platform/host9/session1/target9:0:0/9:0:0:2/timeout"
WARNING: The file has been changed since reading it!!!
Do you really want to write to it (y/n)?y
"/sys/devices/platform/host9/session1/target9:0:0/9:0:0:2/timeout"
E667: Fsync failed
Press ENTER or type command to continue)

# cat /sys/block/sdc/device/timeout
30
# echo 300 > /sys/block/sdc/device/timeout
# cat /sys/block/sdc/device/timeout
300


# dd if=/dev/zero of=/dev/sdc bs=1M count=1024
1024+0 records in
1024+0 records out

1073741824 bytes (1.1 GB) copied, 1203.55 s, 892 kB/s

# cat /var/log/messages
May 25 10:51:20 milk nm-dispatcher.action: Script '/etc/NetworkManager/
dispatcher.d/04-iscsi' exited with error status 1.
May 25 10:52:25 milk kernel: sd 8:0:0:0: [sdb] Synchronizing SCSI
cache
May 25 10:52:25 milk kernel: sd 8:0:0:2: [sdc] Synchronizing SCSI
cache
May 25 10:52:25 milk kernel: sd 8:0:0:3: [sdd] Synchronizing SCSI
cache
May 25 10:52:47 milk kernel: scsi9 : iSCSI Initiator over TCP/IP
May 25 10:52:48 milk kernel: scsi 9:0:0:0: Direct-Access HP
MSA2312i M110 PQ: 0 ANSI: 5
May 25 10:52:48 milk kernel: sd 9:0:0:0: Attached scsi generic sg2
type 0
May 25 10:52:48 milk kernel: sd 9:0:0:0: [sdb] 3984374976 512-byte
logical blocks: (2.03 TB/1.85 TiB)
May 25 10:52:48 milk kernel: scsi 9:0:0:2: Direct-Access HP
MSA2312i M110 PQ: 0 ANSI: 5
May 25 10:52:48 milk kernel: sd 9:0:0:2: Attached scsi generic sg3
type 0
May 25 10:52:48 milk kernel: sd 9:0:0:0: [sdb] Write Protect is off
May 25 10:52:48 milk kernel: scsi 9:0:0:3: Direct-Access HP
MSA2312i M110 PQ: 0 ANSI: 5
May 25 10:52:48 milk kernel: sd 9:0:0:3: Attached scsi generic sg4
type 0
May 25 10:52:48 milk kernel: sd 9:0:0:2: [sdc] 97656224 512-byte
logical blocks: (49.9 GB/46.5 GiB)
May 25 10:52:48 milk kernel: sd 9:0:0:0: [sdb] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
May 25 10:52:48 milk kernel: sd 9:0:0:3: [sdd] 29296832 512-byte
logical blocks: (14.9 GB/13.9 GiB)
May 25 10:52:48 milk kernel: sd 9:0:0:2: [sdc] Write Protect is off
May 25 10:52:48 milk kernel: sd 9:0:0:2: [sdc] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
May 25 10:52:48 milk kernel: sd 9:0:0:3: [sdd] Write Protect is off
May 25 10:52:48 milk kernel: sdb:
May 25 10:52:48 milk kernel: sd 9:0:0:3: [sdd] Write cache: enabled,
read cache: enabled, doesn't support DPO or FUA
May 25 10:52:48 milk kernel: sdb1
May 25 10:52:48 milk kernel: sdc:
May 25 10:52:48 milk kernel: sdd: sdd1
May 25 10:52:48 milk kernel: unknown partition table
May 25 10:52:48 milk kernel: sd 9:0:0:0: [sdb] Attached SCSI disk
May 25 10:52:48 milk kernel: sd 9:0:0:3: [sdd] Attached SCSI disk
May 25 10:52:48 milk kernel: sd 9:0:0:2: [sdc] Attached SCSI disk
May 25 10:52:48 milk iscsid: connection2:0 is operational now
May 25 10:53:40 milk ntpd[1603]: synchronized to 59.124.71.8, stratum
3
May 25 10:53:40 milk ntpd[1603]: kernel time sync status change 2001
May 25 11:18:33 milk kernel: connection2:0: ping timeout of 5 secs
expired, recv timeout 5, last rx 1457759, last ping 1462759, now
1467759
May 25 11:18:33 milk kernel: connection2:0: detected conn error (1011)
May 25 11:18:33 milk iscsid: Kernel reported iSCSI connection 2:0
error (1011) state (3)
May 25 11:18:51 milk iscsid: connection2:0 is operational after
recovery (2 attempts)
end

I tried the dd test on CentOS 5.5(2.6.18-194.3.1.el5 x86_64) and it
can be upto 180 MB/sec.(1Gb ethernet)
I guess that I should use CentOS to establish a server will be better.

Mike Christie

unread,
May 25, 2010, 4:20:59 PM5/25/10
to open-...@googlegroups.com, 立凡 王
The settings I see in iscsiadm seem fine.

On 05/25/2010 03:40 AM, 立凡 王 wrote:
> May 25 11:18:33 milk kernel: connection2:0: ping timeout of 5 secs
> expired, recv timeout 5, last rx 1457759, last ping 1462759, now
> 1467759
> May 25 11:18:33 milk kernel: connection2:0: detected conn error (1011)
> May 25 11:18:33 milk iscsid: Kernel reported iSCSI connection 2:0
> error (1011) state (3)
> May 25 11:18:51 milk iscsid: connection2:0 is operational after
> recovery (2 attempts)

That error will hurt performance, but I do not think it should cause the
perf to drop to less than a MB.

Do you see the ping timeout error everytime you run your write tests? If
so set the

node.conn[0].timeo.noop_out_interval = 0
node.conn[0].timeo.noop_out_timeout = 0


in iscsid.conf then rerun the iscsiadm discovery command (make sure you
logout of sessions, do discovery, then relogin).


> end
>
> I tried the dd test on CentOS 5.5(2.6.18-194.3.1.el5 x86_64) and it
> can be upto 180 MB/sec.(1Gb ethernet)
> I guess that I should use CentOS to establish a server will be better.
>


That is strange because the fedora and Centos 5.5 code is almost the
same. At least there are not major changes in the data path that would
improve performance.

If you do

ps -l -u root | grep iscsi_q*

Do you see a major difference in the PRI values for the process between
fedora and Centos?

Reply all
Reply to author
Forward
0 new messages