open-iscsi Ping timeout error.


Zhengyuan Liu

May 13, 2016, 12:52:46 PM
to open-...@googlegroups.com
Hi everyone:
I created a target using fileio as the backend storage on an ARM64 server. The initiator reported the errors shown below while running an iozone test.

[178444.145679]  connection14:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4339462894, last ping 4339464146, now 4339465400
[178444.145706]  connection14:0: detected conn error (1011)
[178469.674313]  connection14:0: detected conn error (1020)
[178504.420979]  connection14:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4339477953, last ping 4339479204, now 4339480456
[178504.421001]  connection14:0: detected conn error (1011)
[178532.064262]  connection14:0: detected conn error (1020)
[178564.584087]  connection14:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4339492980, last ping 4339494232, now 4339495484
..............................
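
The "last rx", "last ping" and "now" fields in these messages are kernel jiffies, so the gaps between them can be converted to seconds and compared against the two 5-second timeouts. The following is a minimal sketch of that arithmetic. The tick rate is an assumption: HZ=250 is only a guess that happens to fit these logs, and the real value should be read from CONFIG_HZ on the initiator. The names `ASSUMED_HZ` and `ping_gaps` are illustrative.

```python
import re

# Assumption: kernel tick rate of the initiator. Verify against CONFIG_HZ.
ASSUMED_HZ = 250

PING_RE = re.compile(
    r"ping timeout of (?P<ping_tmo>\d+) secs expired, "
    r"recv timeout (?P<recv_tmo>\d+), "
    r"last rx (?P<last_rx>\d+), "
    r"last ping (?P<last_ping>\d+), "
    r"now (?P<now>\d+)"
)


def ping_gaps(line, hz=ASSUMED_HZ):
    """Return the rx-to-ping and ping-to-now gaps in seconds, or None."""
    m = PING_RE.search(line)
    if m is None:
        return None
    last_rx = int(m.group("last_rx"))
    last_ping = int(m.group("last_ping"))
    now = int(m.group("now"))
    return {
        # Idle time before the initiator decided to send a ping.
        "rx_to_ping_s": (last_ping - last_rx) / hz,
        # Time the ping stayed unanswered before the error was raised.
        "ping_to_now_s": (now - last_ping) / hz,
    }


if __name__ == "__main__":
    sample = (
        "[178444.145679]  connection14:0: ping timeout of 5 secs expired, "
        "recv timeout 5, last rx 4339462894, last ping 4339464146, "
        "now 4339465400"
    )
    print(ping_gaps(sample))
```

Under the HZ=250 assumption, the first log line works out to about 5.0 seconds of silence before the ping was sent and about 5.0 seconds more before the timeout fired, which matches the "recv timeout 5" and "ping timeout of 5 secs" values in the message.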

I traced the function calls in the iSCSI target and found that the target's receive thread was blocked in fd_execute_sync_cache -> vfs_fsync_range. vfs_fsync_range can take more than 10 seconds to return, while the initiator's ping timeout fires after 5 seconds. Every time, vfs_fsync_range was called as vfs_fsync_range(fd_dev->fd_file, 0, LLONG_MAX, 1), which means syncing the entire device cache.
So, is this a bug?
How does the initiator send the SYNCHRONIZE CACHE SCSI command?
Does it need to sync the entire device cache at once?
Any reply would be appreciated.
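
For context on the (0, LLONG_MAX) arguments: in the SCSI block command set, a SYNCHRONIZE CACHE(10) command carries a starting LBA and a block count, and a block count of zero means "from that LBA to the end of the medium". The sketch below shows how such a CDB could map onto a byte range. It is an illustration only, not the LIO source; `sync_range` and the 512-byte default block size are assumptions. It is also an assumption, worth checking in the kernel tree, that the Linux initiator's flush request carries LBA 0 and block count 0, which would produce the full-range sync seen in the trace.

```python
import struct

LLONG_MAX = 2**63 - 1
SYNCHRONIZE_CACHE_10 = 0x35  # SCSI opcode for SYNCHRONIZE CACHE(10)


def sync_range(cdb, block_size=512):
    """Map a SYNCHRONIZE CACHE(10) CDB to an inclusive (start, end) byte range."""
    if len(cdb) != 10 or cdb[0] != SYNCHRONIZE_CACHE_10:
        raise ValueError("not a SYNCHRONIZE CACHE(10) CDB")
    (lba,) = struct.unpack_from(">I", cdb, 2)      # bytes 2..5, big-endian
    (nblocks,) = struct.unpack_from(">H", cdb, 7)  # bytes 7..8, big-endian
    start = lba * block_size
    if nblocks == 0:
        # Zero blocks: everything from the LBA to the end of the medium.
        return start, LLONG_MAX
    return start, start + nblocks * block_size - 1


if __name__ == "__main__":
    whole_device = bytes([SYNCHRONIZE_CACHE_10] + [0] * 9)
    print(sync_range(whole_device))
    partial = bytes([SYNCHRONIZE_CACHE_10, 0, 0, 0, 0, 8, 0, 0, 16, 0])
    print(sync_range(partial))
```

If the initiator only ever sends the all-zero form, the target has no narrower range to work with, so the cost of each flush depends on how much dirty data has accumulated on the target side.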

The Lee-Man

May 20, 2016, 12:39:26 PM
to open-iscsi, liuzheng...@gmail.com
Hi:

It seems like your backend is getting busy and not replying in time when it gets very busy. You can disable the NOOP, or you can lengthen its interval, I believe.
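
As a sketch of what that could look like on the initiator: open-iscsi keeps the NOP-Out timers in the node record as node.conn[0].timeo.noop_out_interval and node.conn[0].timeo.noop_out_timeout, and they can be changed with iscsiadm's update operation. The helper below only builds the command lines and does not run them. The target IQN and portal are placeholders, `noop_update_cmds` is a hypothetical name, and the value 30 is an arbitrary example rather than a recommendation.

```python
def noop_update_cmds(target, portal, interval_s, timeout_s):
    """Build iscsiadm argument lists that update the NOP-Out timers.

    Setting both values to 0 is documented in iscsid.conf as disabling
    the NOP-Out ping.
    """
    settings = {
        "node.conn[0].timeo.noop_out_interval": interval_s,
        "node.conn[0].timeo.noop_out_timeout": timeout_s,
    }
    return [
        ["iscsiadm", "-m", "node", "-T", target, "-p", portal,
         "-o", "update", "-n", name, "-v", str(value)]
        for name, value in settings.items()
    ]


if __name__ == "__main__":
    # Placeholder target and portal, for illustration only.
    for cmd in noop_update_cmds("iqn.2016-05.example:target0",
                                "192.0.2.10:3260", 30, 30):
        print(" ".join(cmd))
```

Node record changes normally apply at the next login, so the session would likely need to be logged out and back in before new timers take effect.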

If there is a bug, it would be in the kernel target subsystem. Have you tried the target-devel @ vger kernel mailing list?

Mike Christie

May 20, 2016, 5:14:37 PM
to open-...@googlegroups.com
On 05/20/2016 11:39 AM, The Lee-Man wrote:
> Hi:
>
> It seems like your backend is getting busy and not replying in time when
> it gets very busy. You can disable the NOOP, or you can lengthen its
> interval, I believe.
>
> If there is a bug, it would be in the kernel target subsystem. Have you
> tried the target-devel @ vger kernel mailing list?

We are waiting to hear back from Nick

http://www.spinics.net/lists/linux-scsi/msg96904.html

Mike Christie

May 20, 2016, 5:22:29 PM
to open-...@googlegroups.com
On 05/20/2016 04:14 PM, Mike Christie wrote:
> On 05/20/2016 11:39 AM, The Lee-Man wrote:
>> Hi:
>>
>> It seems like your backend is getting busy and not replying in time when
>> it gets very busy. You can disable the NOOP, or you can lengthen its
>> interval, I believe.
>>
>> If there is a bug, it would be in the kernel target subsystem. Have you
>> tried the target-devel @ vger kernel mailing list?
>
> We are waiting to hear back from Nick
>
> http://www.spinics.net/lists/linux-scsi/msg96904.html
>


Oh yeah, another workaround might be to just modify the write back/flush
settings on the LIO target so there are not so many dirty pages to write
back when a sync is finally sent.
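
A rough way to size such a setting: if a full flush has to finish inside the initiator's ping timeout, the amount of dirty data allowed to accumulate should stay below what the backing store can write in that time. The sketch below does only that arithmetic. The throughput figure and the safety factor are assumptions to be replaced with measured values, `dirty_limit_bytes` is a hypothetical helper, and vm.dirty_bytes is a system-wide sysctl on the target host, so it affects all writers there, not just the fileio backstore.

```python
def dirty_limit_bytes(write_bytes_per_s, deadline_s, safety=0.5):
    """Upper bound on dirty data that can be flushed within deadline_s.

    safety < 1 leaves headroom for other I/O competing with the flush.
    """
    if write_bytes_per_s <= 0 or deadline_s <= 0 or not 0 < safety <= 1:
        raise ValueError("throughput and deadline must be positive, 0 < safety <= 1")
    return int(write_bytes_per_s * deadline_s * safety)


if __name__ == "__main__":
    # Assumed numbers: 100 MiB/s sustained writes, 5 s ping timeout.
    limit = dirty_limit_bytes(100 * 1024**2, 5)
    print(f"vm.dirty_bytes = {limit}")
```

This is a first-order estimate only. Actual flush time also depends on the file system, the I/O pattern, and whatever else is writing on the target.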

Zhengyuan Liu

May 21, 2016, 1:09:07 AM
to The Lee-Man, open-iscsi
Thanks for your tips; I will try disabling the NOOP as you suggested.
I had made an XFS file system on the LUN on the initiator side. I eventually found that the sync_cache command was actually issued by the XFS log infrastructure. When I replaced XFS with EXT2, the dmesg errors no longer appeared during the iozone test. I think the pings/NOPs got no response because they were still sitting in the TCP stack and had not yet been received by the target rx thread.
Would the ping timeout lead to I/O failures seen by the upper layers? Or can the session recover by reconnecting and continue transferring data, so that upper-layer applications do not notice the underlying I/O error?