IO request returns prematurely when iscsi connection is down


neutro...@gmail.com

Feb 19, 2021, 11:28:14 PM
to open-iscsi
Hello all,

I have run into a strange issue with open-iscsi. I have a test machine with 500 iSCSI volumes backed by an IP SAN. The test machine performs reads and writes with O_DIRECT on those 500 raw block devices. During the test I trigger a failure on the IP SAN so that some iSCSI connections break. The iSCSI client is able to reconnect and recover; however, immediately after recovery, some iSCSI reads return corrupted data.

This issue happens frequently. After a lot of tracing on the IP SAN server, we are sure that the corrupted read requests were never received by the iSCSI server on the IP SAN.

In the following timeline diagram, the client issues the read around time t1, when the connections are cut. The iSCSI connection recovers at time t2. The time between t1 and t2 is about 15 to 20 seconds. The read returns several seconds after t2.

                     cut iscsi connections             iscsi connection recovered
------------------------- t1 ------------------------------------------- t2 ---------------------------------->


The client machine uses Linux libaio to perform reads and writes, as follows:

- Block devices are opened with O_DIRECT; the I/O buffer is 4K-aligned and the I/O offset is 4K-aligned.
- Call io_submit() to submit requests to the block device.
- Call io_getevents() to wait for completion events.
     * If the status is "N bytes done", assume the I/O was successful.
     * If the status is "-1", assume I/O failure.
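
To make the completion check above concrete, here is a minimal sketch of how the result of one completion event could be classified. It assumes the `res` field of the libaio `struct io_event` is cast to a signed `long`, in which case a failure shows up as a negated errno (for example -EIO) rather than strictly -1, and a value smaller than the requested length means a short or zero-byte transfer. The names `io_outcome` and `classify_completion` are hypothetical and only for illustration; this is not the code used in the test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outcome categories for one completed request. */
enum io_outcome { IO_OK, IO_SHORT, IO_ERROR };

/*
 * Classify the result of a completion event.
 *   res < 0               : negated errno, treat as failure
 *   0 <= res < requested  : short or zero-byte transfer, the buffer
 *                           was not completely filled
 *   otherwise             : the full transfer was reported
 */
static enum io_outcome classify_completion(long res, size_t requested)
{
    assert(requested > 0);          /* a zero-length request is not expected */
    if (res < 0)
        return IO_ERROR;
    if ((size_t)res < requested)
        return IO_SHORT;
    return IO_OK;
}
```

With a check like this, a completion that reports 0 bytes for a 4096-byte read would be flagged as IO_SHORT instead of being counted as a success.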

Is it possible that the iSCSI layer marks a block read/write as completed with 0 bytes done because the connection is not available, so that the upper layer receives a completion with 0 bytes as the result?

Thank you for reading.


-Shawn

neutro...@gmail.com

Feb 19, 2021, 11:34:28 PM
to open-iscsi
More details:

1. After the read returns, we parse the read buffer and find that it contains stale data from a previous read, i.e., it seems the kernel didn't update the buffer at all. That's why I suspect the kernel iSCSI client didn't perform the read and just bounced the request back to the upper layer, marking it completed.

2. The client is Ubuntu 18.04 with the stock open-iscsi.
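
One way to tell "the kernel never wrote to the buffer" apart from "the kernel wrote wrong data" (point 1 above) is to fill the buffer with a sentinel pattern before each read and check afterwards whether the pattern is still intact. The sketch below is only an illustration of that idea; `POISON_BYTE`, `poison_buffer` and `buffer_untouched` are hypothetical names, and the check gives a false positive if the on-disk data happens to consist entirely of the sentinel byte.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define POISON_BYTE 0xA5   /* hypothetical sentinel value */

/* Fill the buffer with the sentinel before submitting a read. */
static void poison_buffer(unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    memset(buf, POISON_BYTE, len);
}

/* Return 1 if every byte still holds the sentinel, 0 otherwise. */
static int buffer_untouched(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != POISON_BYTE)
            return 0;
    }
    return 1;
}
```

If a read is reported as complete and `buffer_untouched()` still returns 1, the buffer was most likely never filled for that request.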


-Shawn

neutro...@gmail.com

Mar 29, 2021, 3:28:11 PM
to open-iscsi
Bumping the thread to hopefully catch some attention.

On Friday, February 19, 2021 at 8:28:14 PM UTC-8 neutro...@gmail.com wrote: