We have a LIO target on RHEL 7.5 with a LUN created on a fileio backstore through targetcli. We exported it
to a RHEL initiator on the same box (we also tried an initiator on a separate box).
When we run mkfs for ext3/ext4 on the LUN, it fails with the following message, and the filesystem cannot be mounted.
-------------------------------------------------------------------------------------------------
[root@linux_machine /]# mkfs -t ext4 /dev/sdh
mke2fs 1.42.9 (28-Dec-2013)
/dev/sdh is entire device, not just one partition!
Proceed anyway? (y,n) y
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=1024 blocks
2621440 inodes, 10485760 blocks
524288 blocks (5.00%) reserved for the super user
First data block=0
320 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information:
Warning, had trouble writing out superblocks.
-----------------------------------------------------------------------------------------------------
While the above command is failing, /var/log/messages on the initiator shows the following errors.
-------------------------------------------------------------------------------------------------------------
kernel: connection1:0: detected conn error (1020)
Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
connection1:0 is operational after recovery (1 attempts)
connection1:0: detected conn error (1020)
Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
connection1:0 is operational after recovery (1 attempts)
connection1:0: detected conn error (1020)
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 54 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 5505040
Kernel: Buffer I/O error on dev sdf, logical block 688130, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688131, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688132, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688133, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688134, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688135, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688136, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688137, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688138, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688139, lost async page write
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 50 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 5242896
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 4c 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 4980752
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 48 00 10 00 10 00 00
blk_update_request: I/O error, dev sdf, sector 4718608
sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 44 00 10 00 10 00 00
blk_update_request: I/O error, dev sdf, sector 4456464
sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 40 00 10 00 10 00 00
blk_update_request: I/O error, dev sdf, sector 4194320
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 3c 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 3932176
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 38 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 3670032
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 34 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 3407888
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 30 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 3145744
iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
iscsid: connection1:0 is operational after recovery (1 attempts)
------------------------------------------------------------------------------------------------------------------------------------
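The failed Write(10) commands in the log above can be cross-checked against the sector numbers the kernel reports. The following is only a minimal sketch, assuming the standard WRITE(10) layout (opcode 0x2a, a 4-byte big-endian LBA in bytes 2-5, a 2-byte transfer length in bytes 7-8) and a 512-byte logical block size on the LUN, which is an assumption on our part. The function name decode_write10 is made up for illustration.

```python
def decode_write10(cdb_hex, block_size=512):
    """Decode a WRITE(10) CDB given as a hex string such as the ones in the log.

    block_size=512 is an assumption; the log does not state the LUN's
    logical block size.
    """
    cdb = bytes.fromhex(cdb_hex)  # whitespace between bytes is skipped
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # starting logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return lba, blocks, blocks * block_size


# First two failed commands from the log above.
first = decode_write10("2a 00 00 54 00 10 00 10 00 00")
second = decode_write10("2a 00 00 50 00 10 00 10 00 00")

print("LBA:", first[0], "blocks:", first[1], "bytes:", first[2])
print("4 KiB buffer block:", first[0] // 8)
print("distance between the two writes (sectors):", first[0] - second[0])
```

For the first failed command this gives LBA 5505040, which matches the sector in the blk_update_request line, and a transfer length of 4096 blocks (2 MiB under the 512-byte assumption). Dividing the LBA by 8 gives 688130, the first logical block in the Buffer I/O errors. The two commands are 262144 sectors (128 MiB) apart, which is the size of one block group in the mkfs output (32768 blocks of 4096 bytes), so the failing writes appear to be the per-group metadata writes.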
Upon further debugging, we found that the TCP receive window on the target fills up because of the writes that mkfs generates on the initiator side.
We then ran dd on the initiator with oflag=direct, so that the writes bypass the page cache (direct I/O). This time we did not face any issue.
If we run dd without oflag=direct, we see the same error messages in /var/log/messages as with mkfs.
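For anyone who wants to reproduce the two dd cases, the sketch below builds the two command lines side by side. It does not run them. The device path /dev/sdX, the block size and the count are placeholders chosen for illustration, not the values we used, and the helper name dd_write_cmds is hypothetical.

```python
import shlex


def dd_write_cmds(device, bs="1M", count=1024):
    """Return the direct-I/O and buffered dd command lines as argv lists.

    bs and count are placeholder values, not taken from the original test.
    """
    base = ["dd", "if=/dev/zero", f"of={device}", f"bs={bs}", f"count={count}"]
    return {
        "direct": base + ["oflag=direct"],  # bypasses the page cache; worked for us
        "buffered": base,                   # goes through the page cache; fails like mkfs
    }


# /dev/sdX is a placeholder; substitute the iSCSI disk on the initiator.
for label, argv in dd_write_cmds("/dev/sdX").items():
    print(f"{label}: {shlex.join(argv)}")
```

The only difference between the two commands is oflag=direct, which is what makes the buffered page-cache writeback the suspect.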
The following are the things we tried:
1) We increased the TCP receive window on the target beyond its existing
size, but it did not help.
2) We increased MaxRecvDataSegmentLength, MaxBurstLength and FirstBurstLength on
the target side. This helped in the sense that it delayed the occurrence of the error,
but the errors were still seen.
3) We also changed node.session.timeo.replacement_timeout,
node.conn[0].timeo.noop_out_interval, node.conn[0].timeo.noop_out_timeout and node.session.err_timeo.abort_timeout on the initiator side.
These were not effective in solving the problem.
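For reference, the initiator-side settings in item 3 can be changed with iscsiadm's node update operation. The sketch below only builds the command lines and does not run them. The target IQN and the timeout values are placeholders, not the ones we used, and port 3260 on the portal is assumed to be the default. The helper name iscsiadm_update_cmds is hypothetical.

```python
import shlex


def iscsiadm_update_cmds(target_iqn, portal, settings):
    """Build one 'iscsiadm -m node ... -o update' argv list per setting."""
    return [
        ["iscsiadm", "-m", "node", "-T", target_iqn, "-p", portal,
         "-o", "update", "-n", name, "-v", str(value)]
        for name, value in settings.items()
    ]


# Placeholder values for illustration only; these are NOT recommendations.
settings = {
    "node.session.timeo.replacement_timeout": 120,
    "node.conn[0].timeo.noop_out_interval": 5,
    "node.conn[0].timeo.noop_out_timeout": 5,
    "node.session.err_timeo.abort_timeout": 15,
}

# The IQN is hypothetical; the IP is the target address from this post.
cmds = iscsiadm_update_cmds("iqn.2003-01.org.example:target0",
                            "10.182.110.221:3260", settings)
for argv in cmds:
    print(shlex.join(argv))  # shlex.join quotes the [0] so the shell leaves it alone
```

The updated values apply to new sessions, so the session has to be logged out and logged in again before they take effect.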
Our questions are:
1) What could be the causes of this issue? Why is the target daemon so slow?
2) What other tunables could we try to solve the problem?
Environment Details:
OS: Red Hat Enterprise Linux Server release 7.5 (Maipo)
Kernel Version: 3.10.0-862.el7.x86_64
Please find the attached Wireshark image:
In the Wireshark image, 10.182.110.221 is the target and 10.182.111.167 is the
initiator.