Error while performing writes using dd or mkfs on iSCSI initiator.


iamlin...@gmail.com

Jan 23, 2019, 4:48:19 PM1/23/19
to open-iscsi
We have a LIO target on RHEL 7.5 with a LUN created using fileio through targetcli. We exported it
to a RHEL initiator on the same box (we also tried a separate box).
When we run mkfs for ext3/ext4 on the LUN, it fails with the following message and the filesystem cannot be mounted.

-------------------------------------------------------------------------------------------------
[root@linux_machine /]# mkfs -t ext4 /dev/sdh
mke2fs 1.42.9 (28-Dec-2013)
/dev/sdh is entire device, not just one partition!
Proceed anyway? (y,n) y
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=0 blocks, Stripe width=1024 blocks
2621440 inodes, 10485760 blocks
524288 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=2157969408
320 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000, 7962624

Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information:
Warning, had trouble writing out superblocks.
-----------------------------------------------------------------------------------------------------
While the above task fails, /var/log/messages on the initiator shows the following errors.

-------------------------------------------------------------------------------------------------------------
kernel: connection1:0: detected conn error (1020)
Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
connection1:0 is operational after recovery (1 attempts)
connection1:0: detected conn error (1020)
Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
connection1:0 is operational after recovery (1 attempts)
connection1:0: detected conn error (1020)
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 54 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 5505040
kernel: Buffer I/O error on dev sdf, logical block 688130, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688131, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688132, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688133, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688134, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688135, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688136, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688137, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688138, lost async page write
kernel: Buffer I/O error on dev sdf, logical block 688139, lost async page write
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 50 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 5242896
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 4c 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 4980752
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 48 00 10 00 10 00 00
blk_update_request: I/O error, dev sdf, sector 4718608
sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 44 00 10 00 10 00 00
blk_update_request: I/O error, dev sdf, sector 4456464
sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 40 00 10 00 10 00 00
blk_update_request: I/O error, dev sdf, sector 4194320
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 3c 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 3932176
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 38 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 3670032
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 34 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 3407888
kernel: sd 7:0:0:1: [sdf] FAILED Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
kernel: sd 7:0:0:1: [sdf] CDB: Write(10) 2a 00 00 30 00 10 00 10 00 00
kernel: blk_update_request: I/O error, dev sdf, sector 3145744
iscsid: Kernel reported iSCSI connection 1:0 error (1020 - ISCSI_ERR_TCP_CONN_CLOSE: TCP connection closed) state (3)
iscsid: connection1:0 is operational after recovery (1 attempts)
------------------------------------------------------------------------------------------------------------------------------------
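As a sanity check, the failing CDBs in the log decode to large sequential WRITE(10) commands. A quick sketch of the decoding for the first CDB above (byte layout per the SCSI WRITE(10) format):

```shell
# First failing CDB from the log: opcode 0x2a is WRITE(10).
cdb="2a 00 00 54 00 10 00 10 00 00"
set -- $cdb
# byte 0: opcode; bytes 2-5: logical block address; bytes 7-8: transfer length
lba=$(( 0x$3 << 24 | 0x$4 << 16 | 0x$5 << 8 | 0x$6 ))
len=$(( 0x$8 << 8 | 0x$9 ))
echo "opcode=0x$1 lba=$lba blocks=$len"
```

The decoded LBA (5505040) matches the sector reported by blk_update_request, and each command writes 4096 blocks (2 MiB with 512-byte sectors), i.e. a heavy stream of buffered write-back, consistent with the TCP-window theory below.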

Upon further debugging, we found that the target's TCP receive window becomes full under the write load that mkfs generates on the initiator side.
We then tried a dd command on the initiator with oflag=direct to perform synchronous writes; in that case we did not hit the issue.
If we run dd without oflag=direct, we see the same error messages in /var/log/messages as with mkfs.
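For comparison, the two dd invocations differ only in the direct-I/O flag (the device name below is from our setup; substitute your LUN). The flags themselves can also be exercised harmlessly against a scratch file:

```shell
# On the exported LUN (example device name):
#   dd if=/dev/zero of=/dev/sdh bs=1M count=1024               # buffered: triggers conn error (1020)
#   dd if=/dev/zero of=/dev/sdh bs=1M count=1024 oflag=direct  # direct I/O: completes without errors
# Safe demonstration of the same command shape against a scratch file:
dd if=/dev/zero of=scratch.img bs=1M count=8 conv=fsync status=none
size=$(stat -c %s scratch.img)
echo "wrote $size bytes"
rm -f scratch.img
```

With oflag=direct each write is issued and completed one at a time, so the initiator never floods the target with a large burst of outstanding write-back data.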


Following are the things we tried:
1) We increased the TCP receive window on the target beyond its existing size, but it did not help.
2) We increased MaxRecvDataSegmentLength, MaxBurstLength, and FirstBurstLength on the
target side. This helped in the sense that it delayed the occurrence of the error,
but the errors were still seen.
3) We also changed node.session.timeo.replacement_timeout,
node.conn[0].timeo.noop_out_interval, node.conn[0].timeo.noop_out_timeout, and node.session.err_timeo.abort_timeout on the initiator side.
They did not solve the problem.
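For reference, the timeout knobs in (3) live in /etc/iscsi/iscsid.conf (they can also be set per node with `iscsiadm -m node -o update -n <name> -v <value>`). The values below are, to the best of our knowledge, the stock RHEL 7 defaults, shown only so others can compare, not a recommendation:

```
node.session.timeo.replacement_timeout = 120
node.conn[0].timeo.noop_out_interval = 5
node.conn[0].timeo.noop_out_timeout = 5
node.session.err_timeo.abort_timeout = 15
```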

Following are our questions:
1) What could be causing this issue? Why is the target daemon so slow?
2) What other tunables could we try to solve the problem?

Environment Details:
OS: Red Hat Enterprise Linux Server release 7.5 (Maipo)
Kernel Version: 3.10.0-862.el7.x86_64

PFA image:
In the attached Wireshark capture, 10.182.110.221 is the target and 10.182.111.167 is the
initiator.

tcp_reset (002).jpg




The Lee-Man

Feb 6, 2019, 12:11:00 PM2/6/19
to open-iscsi
So that shows this has nothing to do with ext3/ext4, but instead has to do with your network.


And you say you get the same TCP congestion when initiator and target are on the same system? If so, can you try using 127.0.0.1?

Your distro packages look quite old. Are they all up to date with current patches/fixes? What version of targetcli-fb do you have?

I'm afraid I know little about networking issues, but if the issue persists using loopback that would seem to eliminate any issues with your switches.
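A loopback session can be brought up with the usual open-iscsi discovery/login steps; a sketch (the IQN below is a placeholder — read the real one from `targetcli ls` — and the commands need root plus a running target, so they are guarded behind an env var here):

```shell
# Loopback iSCSI sanity test (sketch; IQN is an assumed example).
# No-op unless RUN_ISCSI_LOOPBACK=1 is set, since it needs root and a live target.
if [ "${RUN_ISCSI_LOOPBACK:-0}" = 1 ]; then
    iscsiadm -m discovery -t sendtargets -p 127.0.0.1
    iscsiadm -m node -T iqn.2003-01.org.linux-iscsi.localhost:lun0 -p 127.0.0.1 --login \
        && status=logged-in
else
    status=skipped
fi
echo "loopback test: $status"
```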

iamlin...@gmail.com

Feb 11, 2019, 6:11:58 PM2/11/19
to open-iscsi
    > As we do face this issue even with localhost as both target and initiator, could you suggest some tunables to deal with the TCP window and network congestion between initiator and target?
 



And you say you get the same TCP congestion when initiator and target are on the same system? If so, can you try using 127.0.0.1?

Your distro packages look quite old. Are they all up to date with current patches/fixes? What version of targetcli-fb do you have?

I'm afraid I know little about networking issues, but if the issue persists using loopback that would seem to eliminate any issues with your switches.

   > We even checked with 127.0.0.1 and the same issue persists, so this excludes the network too. And what do you mean by distro packages? Could you please elaborate? We are using 'Red Hat Enterprise Linux Server release 7.5' and the targetcli rpm is 'targetcli-2.1.fb46-7.el7.noarch'!

 
 
 
 
 