Error while performing writes using dd or mkfs on iSCSI initiator.


iamlin...@gmail.com

Jan 23, 2019, 4:48:19 PM1/23/19
to open-iscsi
We have a LIO target on RHEL 7.5 with a LUN created using fileio through targetcli. We exported it
to a RHEL initiator on the same box (we also tried a separate box).
When we run mkfs for ext3/ext4 on the LUN, it fails with the following message and the filesystem cannot be mounted.

-------------------------------------------------------------------------------------------------
[root@linux_machine /]# mkfs -t ext4 /dev/sdh
mke2fs 1.42.9 (28-Dec-2013)
/dev/sdh is entire device, not just one partition!
Proceed anyway? (y,n) y
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=1024 blocks
2621440 inodes, 10485760 blocks
524288 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2157969408
320 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information:
Warning, had trouble writing out superblocks.
-----------------------------------------------------------------------------------------------------
While the above task fails, /var/log/messages on the initiator shows the following errors.

-------------------------------------------------------------------------------------------------------------
kernel: connection1:0: detected conn error (1020)
Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
connection1:0 is operational after recovery (1 attempts)
connection1:0: detected conn error (1020)
Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
connection1:0 is operational after recovery (1 attempts)
connection1:0: detected conn error (1020)
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 54 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 5505040
kernel: Buffer I/O error on dev sdf, logical block 688130, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688131, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688132, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688133, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688134, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688135, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688136, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688137, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688138, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688139, lost async page write
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 50 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 5242896
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 4c 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 4980752
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 48 00 10 00 10 00 00
blk_update_request: I/O error, dev sdf, sector 4718608
sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 44 00 10 00 10 00 00
blk_update_request: I/O error, dev sdf, sector 4456464
sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 40 00 10 00 10 00 00
blk_update_request: I/O error, dev sdf, sector 4194320
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 3c 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 3932176
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 38 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 3670032
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 34 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 3407888
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 30 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 3145744
iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
iscsid: connection1:0 is operational after recovery (1 attempts)
------------------------------------------------------------------------------------------------------------------------------------
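As a sanity check, the failing CDBs in the log decode to large sequential WRITE(10) commands. A quick sketch of the decoding for the first CDB above (byte layout per the SCSI WRITE(10) format):

```shell
# First failing CDB from the log: opcode 0x2a is WRITE(10).
cdb="2a 00 00 54 00 10 00 10 00 00"
set -- $cdb
# byte 0: opcode; bytes 2-5: logical block address; bytes 7-8: transfer length
lba=$(( 0x$3 << 24 | 0x$4 << 16 | 0x$5 << 8 | 0x$6 ))
len=$(( 0x$8 << 8 | 0x$9 ))
echo "opcode=0x$1 lba=$lba blocks=$len"
```

The decoded LBA (5505040) matches the sector reported by blk_update_request, and each command writes 4096 blocks (2 MiB with 512-byte sectors), i.e. a heavy stream of buffered write-back, consistent with the TCP-window theory below.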

Upon further debugging, we found that the target's TCP receive window becomes full under the write load that mkfs generates on the initiator side.
We then tried a dd command on the initiator with oflag=direct to perform synchronous writes; in that case we did not hit the issue.
If we run dd without oflag=direct, we see the same error messages in /var/log/messages as with mkfs.
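For comparison, the two dd invocations differ only in the direct-I/O flag (the device name below is from our setup; substitute your LUN). The flags themselves can also be exercised harmlessly against a scratch file:

```shell
# On the exported LUN (example device name):
#   dd if=/dev/zero of=/dev/sdh bs=1M count=1024               # buffered: triggers conn error (1020)
#   dd if=/dev/zero of=/dev/sdh bs=1M count=1024 oflag=direct  # direct I/O: completes without errors
# Safe demonstration of the same command shape against a scratch file:
dd if=/dev/zero of=scratch.img bs=1M count=8 conv=fsync status=none
size=$(stat -c %s scratch.img)
echo "wrote $size bytes"
rm -f scratch.img
```

With oflag=direct each write is issued and completed one at a time, so the initiator never floods the target with a large burst of outstanding write-back data.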


Following are the things we tried:
1) We increased the TCP receive window on the target beyond its existing size, but it did not help.
2) We increased MaxRecvDataSegmentLength, MaxBurstLength, and FirstBurstLength on the
target side. This helped in the sense that it delayed the occurrence of the error,
but the errors were still seen.
3) We also changed node.session.timeo.replacement_timeout,
node.conn[0].timeo.noop_out_interval, node.conn[0].timeo.noop_out_timeout, and node.session.err_timeo.abort_timeout on the initiator side.
They did not solve the problem.
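For reference, the timeout knobs in (3) live in /etc/iscsi/iscsid.conf (they can also be set per node with `iscsiadm -m node -o update -n <name> -v <value>`). The values below are, to the best of our knowledge, the stock RHEL 7 defaults, shown only so others can compare, not a recommendation:

```
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.err_timeo.abort_timeout = 15
```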

Following are our questions:
1) What could be causing this issue? Why is the target daemon so slow?
2) What other tunables could we try to solve the problem?

Environment Details:
OS: Red Hat Enterprise Linux Server release 7.5 (Maipo)
Kernel Version: 3.10.0-862.el7.x86_64

PFA image:
In the attached Wireshark capture, 10.182.110.221 is the target and 10.182.111.167 is the
initiator.

tcp_reset (002).jpg




The Lee-Man

Feb 6, 2019, 12:11:00 PM2/6/19
to open-iscsi
So that shows this has nothing to do with ext3/ext4, but instead has to do with your network.


And you say you get the same TCP congestion when initiator and target are on the same system? If so, can you try using 127.0.0.1?

Your distro packages look quite old. Are they all up to date with current patches/fixes? What version of targetcli-fb do you have?

I'm afraid I know little about networking issues, but if the issue persists using loopback that would seem to eliminate any issues with your switches.
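A loopback session can be brought up with the usual open-iscsi discovery/login steps; a sketch (the IQN below is a placeholder — read the real one from `targetcli ls` — and the commands need root plus a running target, so they are guarded behind an env var here):

```shell
# Loopback iSCSI sanity test (sketch; IQN is an assumed example).
# No-op unless RUN_ISCSI_LOOPBACK=1 is set, since it needs root and a live target.
if [ "${RUN_ISCSI_LOOPBACK:-0}" = 1 ]; then
    iscsiadm -m discovery -t sendtargets -p 127.0.0.1
    iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.localhost:lun0 -p 127.0.0.1 --login \
        && status=logged-in
else
    status=skipped
fi
echo "loopback test: $status"
```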

iamlin...@gmail.com

Feb 11, 2019, 6:11:58 PM2/11/19
to open-iscsi
    > As we do face this issue even with localhost as both target and initiator, could you suggest some tunables to deal with the TCP window and network congestion between initiator and target?
 



And you say you get the same TCP congestion when initiator and target are on the same system? If so, can you try using 127.0.0.1?

Your distro packages look quite old. Are they all up to date with current patches/fixes? What version of targetcli-fb do you have?

I'm afraid I know little about networking issues, but if the issue persists using loopback that would seem to eliminate any issues with your switches.

   > We even checked with 127.0.0.1 and the same issue persists, so this excludes the network too. And what do you mean by distro packages? Could you please elaborate? We are using 'Red Hat Enterprise Linux Server release 7.5' and the targetcli rpm is 'targetcli-2.1.fb46-7.el7.noarch'!

 
 
 
 
 