open-iscsi Ping timeout error.

Zhengyuan Liu

May 13, 2016, 12:52:46 PM

to open-...@googlegroups.com

Hi everyone:

I created a target using fileio as the backend storage on an ARM64 server. The initiator reported the errors shown below while running an iozone test.

[178444.145679] connection14:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4339462894, last ping 4339464146, now 4339465400

[178444.145706] connection14:0: detected conn error (1011)

[178469.674313] connection14:0: detected conn error (1020)

[178504.420979] connection14:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4339477953, last ping 4339479204, now 4339480456

[178504.421001] connection14:0: detected conn error (1011)

[178532.064262] connection14:0: detected conn error (1020)

[178564.584087] connection14:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4339492980, last ping 4339494232, now 4339495484

..............................
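For readers decoding the log lines above: the "last rx", "last ping" and "now" fields are jiffies counters, so the gaps between them can be converted to seconds. The sketch below parses the first log line and does that arithmetic. It assumes the initiator kernel runs with HZ=250, which is only inferred from the gaps of roughly 1250 jiffies matching the 5-second timeouts; the function name ping_gaps is made up for illustration.

```python
import re

# Assumption: the initiator kernel uses HZ=250 (jiffies per second).
# This is inferred from the ~1250-jiffy gaps matching the 5 s timeouts;
# the log itself does not state it.
ASSUMED_HZ = 250

PING_TIMEOUT_RE = re.compile(
    r"ping timeout of (?P<ping_tmo>\d+) secs expired, "
    r"recv timeout (?P<recv_tmo>\d+), "
    r"last rx (?P<last_rx>\d+), "
    r"last ping (?P<last_ping>\d+), "
    r"now (?P<now>\d+)"
)


def ping_gaps(line, hz=ASSUMED_HZ):
    """Return (idle seconds before the ping, seconds waited for the reply)."""
    m = PING_TIMEOUT_RE.search(line)
    if m is None:
        raise ValueError("not a ping timeout line")
    last_rx = int(m["last_rx"])
    last_ping = int(m["last_ping"])
    now = int(m["now"])
    return (last_ping - last_rx) / hz, (now - last_ping) / hz


line = ("[178444.145679] connection14:0: ping timeout of 5 secs expired, "
        "recv timeout 5, last rx 4339462894, last ping 4339464146, "
        "now 4339465400")
idle_s, wait_s = ping_gaps(line)
print(f"idle before ping: {idle_s:.3f} s, waited for reply: {wait_s:.3f} s")
```

Under that HZ assumption, the connection was idle for about 5 seconds (1252 jiffies) before the ping was sent, and the ping then went unanswered for about 5 more seconds (1254 jiffies) before the timeout fired.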

I traced the function calls in the iSCSI target and found that the target's receive thread was blocked in fd_execute_sync_cache -> vfs_fsync_range. vfs_fsync_range can take more than 10 seconds to return, while the initiator's ping timeout expires after 5 seconds. Every time, vfs_fsync_range was called as vfs_fsync_range(fd_dev->fd_file, 0, LLONG_MAX, 1), which means syncing the entire device cache.
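A whole-range sync like the one described above has to write back every dirty page of the backing file, so its latency grows with the amount of dirty data that has accumulated. The sketch below is only a userspace analogy, not the LIO code path: it writes buffered data to a temporary file and times os.fsync, which also flushes the whole file. The function name time_full_fsync and the sizes are chosen for illustration.

```python
import os
import tempfile
import time


def time_full_fsync(dirty_bytes, chunk=1 << 20):
    """Write dirty_bytes of buffered data, then time a whole-file fsync."""
    with tempfile.NamedTemporaryFile() as f:
        block = b"\0" * chunk
        written = 0
        while written < dirty_bytes:
            n = min(chunk, dirty_bytes - written)
            f.write(block[:n])
            written += n
        f.flush()  # push Python's buffer into the kernel page cache
        start = time.monotonic()
        os.fsync(f.fileno())  # flush every dirty page of this file
        return time.monotonic() - start


for mib in (1, 8, 32):
    elapsed = time_full_fsync(mib << 20)
    print(f"{mib:3d} MiB dirty -> fsync took {elapsed:.4f} s")
```

The numbers depend heavily on the storage behind the temporary directory; on a RAM-backed tmpfs the fsync is nearly free, so point the temporary directory at a real disk to see the effect.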

So, is this a bug?

How does the initiator send the sync_cache SCSI command?

Does it need to sync the entire device cache at once?

Any reply would be appreciated.

The Lee-Man

May 20, 2016, 12:39:26 PM

to open-iscsi, liuzheng...@gmail.com

Hi:

It seems like your backend is getting busy and not replying in time when it gets very busy. You can disable the NOOP, or you can lengthen its interval, I believe.

If there is a bug, it would be in the kernel target subsystem. Have you tried the target-devel @ vger kernel mailing list?
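In open-iscsi, the NOOP interval and timeout mentioned above are the node settings node.conn[0].timeo.noop_out_interval and node.conn[0].timeo.noop_out_timeout; the "ping timeout of 5 secs" and "recv timeout 5" in the log match their usual values of 5. The sketch below only builds and prints the iscsiadm commands that would update them for one node record. The target IQN and portal are hypothetical placeholders, the helper name noop_update_cmds is made up, and the value 30 is an arbitrary example. As far as I know, setting both values to 0 disables the NOOPs, and a changed value takes effect at the next login of the session.

```python
import shlex

# Hypothetical placeholders: substitute your own target IQN and portal.
TARGET_IQN = "iqn.2016-05.com.example:target0"
PORTAL = "192.0.2.10:3260"


def noop_update_cmds(target, portal, interval_s, timeout_s):
    """Build iscsiadm argv lists that update a node's NOP-Out settings."""
    settings = {
        "node.conn[0].timeo.noop_out_interval": interval_s,
        "node.conn[0].timeo.noop_out_timeout": timeout_s,
    }
    return [
        ["iscsiadm", "-m", "node", "-T", target, "-p", portal,
         "-o", "update", "-n", name, "-v", str(value)]
        for name, value in settings.items()
    ]


# Example: lengthen both values to 30 s (an arbitrary illustrative choice).
for argv in noop_update_cmds(TARGET_IQN, PORTAL, 30, 30):
    print(shlex.join(argv))
```

The commands are printed rather than executed so they can be reviewed first; the same two settings can also be changed in iscsid.conf for nodes discovered later.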

Mike Christie

May 20, 2016, 5:14:37 PM

to open-...@googlegroups.com

On 05/20/2016 11:39 AM, The Lee-Man wrote:
> Hi:
>
> It seems like your backend is getting busy and not replying in time when
> it gets very busy. You can disable the NOOP, or you can lengthen its
> interval, I believe.
>
> If there is a bug, it would be in the kernel target subsystem. Have you
> tried the target-devel @ vger kernel mailing list?

We are waiting to hear back from Nick

http://www.spinics.net/lists/linux-scsi/msg96904.html


Mike Christie

May 20, 2016, 5:22:29 PM

to open-...@googlegroups.com

On 05/20/2016 04:14 PM, Mike Christie wrote:
> On 05/20/2016 11:39 AM, The Lee-Man wrote:
>> Hi:
>>
>> It seems like your backend is getting busy and not replying in time when
>> it gets very busy. You can disable the NOOP, or you can lengthen its
>> interval, I believe.
>>
>> If there is a bug, it would be in the kernel target subsystem. Have you
>> tried the target-devel @ vger kernel mailing list?
>
> We are waiting to hear back from Nick
>
> http://www.spinics.net/lists/linux-scsi/msg96904.html
>

Oh yeah, another workaround might be to just modify the write back/flush
settings on the LIO target so there are not so many dirty pages to write
back when a sync is finally sent.
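One possible reading of "write back/flush settings" is the Linux VM writeback sysctls on the target host; that interpretation is an assumption, since the message does not name specific settings. Under that assumption, the sketch below just reads the current values from /proc/sys/vm so they can be inspected before changing anything. The helper name read_writeback_settings is made up for illustration.

```python
from pathlib import Path

# Linux VM writeback sysctls. Whether these are the settings meant in the
# message above is an assumption.
WRITEBACK_KNOBS = (
    "dirty_background_ratio",
    "dirty_background_bytes",
    "dirty_ratio",
    "dirty_bytes",
    "dirty_expire_centisecs",
    "dirty_writeback_centisecs",
)


def read_writeback_settings(root="/proc/sys/vm"):
    """Return {knob: int value, or None if the file cannot be read}."""
    values = {}
    for knob in WRITEBACK_KNOBS:
        try:
            values[knob] = int((Path(root) / knob).read_text().strip())
        except (OSError, ValueError):
            values[knob] = None
    return values


for knob, value in read_writeback_settings().items():
    print(f"vm.{knob} = {value}")
```

Lower dirty thresholds and shorter expiry intervals make the kernel start background writeback earlier, which should leave fewer dirty pages for a later full sync to flush, at the cost of more frequent disk writes.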

Zhengyuan Liu

May 21, 2016, 1:09:07 AM

to The Lee-Man, open-iscsi

Thanks for your tips. I will try disabling the NOOP as you suggested.

I had made an XFS file system on the LUN on the initiator side. In the end, I found that the sync_cache command was actually issued by the XFS log infrastructure. When I replaced XFS with EXT2, the dmesg errors no longer appeared during the iozone test. I think the pings/NOPs got no response because they were still sitting in the TCP stack and had not yet been received by the target rx thread.

Would the ping timeout lead to I/O failures in the upper layers? Or can the connection recover by reconnecting and continue transferring data, so that upper-layer applications never notice the underlying I/O error?
