Re: [bug report][scsi blk_mq] data corruption due to a bug in SCSI LLD error handling

Douglas Gilbert

unread,

Apr 3, 2019, 5:06:32 PM4/3/19

to Jaesoo Lee, Jens Axboe, Lee Duncan, Chris Leech, James E.J. Bottomley, Martin K. Petersen, linux...@vger.kernel.org, open-...@googlegroups.com, linux...@vger.kernel.org

On 2019-04-03 4:18 p.m., Jaesoo Lee wrote:
> Hello All,
>
> I encountered this bug while trying to enable dm_blk_mq for our
> iSCSI/FCP targets.
>
> The bug is that the sg_io issued to scsi_blk_mq would succeed even if
> LLD wants to error out those requests.
>
> Let me explain the scenario in more details.
>
> Setup:
> 0. Host kernel configuration
> - 4.19.9, 4.20.16
> - boot parameter: dm_mod.use_blk_mq=Y scsi_mod.use_blk_mq=Y
> scsi_transport_iscsi.debug_session=1 scsi_transport_iscsi.debug_conn=1
>
> Scenario:
> 1. Connect the host to iSCSI target via four paths
> : A dm device is created for those target devices
> 2. Start an application in the host which generates sg_io ioctl for
> XCOPY and WSAME to the dm device with the ratio of around 50%
> (pread/pwrite for the rest).
> 3. Perform system crash (sysrq-trigger) in the iSCSI target
>
> Expected result:
> - Any outstanding IOs should get failed with errors
>
> Actual results:
> - Normal read/write IOs get failed as expected
> - SG_IO ioctls SUCCEEDED!!

Not all ioctl(SG_IO)s are created equal!

If you are using the sg v3 interface (i.e. struct sg_io_hdr) then I would
expect DRIVER_TIMEOUT in sg_io_obj.driver_status or DID_TIME_OUT in
sg_io_obj.host_status to be set on completion. [BTW You will _not_ see
a ETIMEDOUT errno; only errors prior to submission yield errno style
errors.]

If you don't see that with ioctl(SG_IO) on a block device then try again on
a sg device. If neither report that then the mid-level error processing
is broken.

Doug Gilbert

> - log message:
> [Tue Apr 2 11:26:34 2019] session3: session recovery timed out after 11 secs
> [Tue Apr 2 11:26:34 2019] session3: session_recovery_timedout:
> Unblocking SCSI target
> ..
> [Tue Apr 2 11:26:34 2019] sd 8:0:0:8: scsi_prep_state_check:
> rejecting I/O to offline device
> [Tue Apr 2 11:26:34 2019] sd 8:0:0:8: scsi_prep_state_check:
> rejecting I/O to offline device
> [Tue Apr 2 11:26:34 2019] sd 8:0:0:8: scsi_prep_state_check:
> rejecting I/O to offline device
> [Tue Apr 2 11:26:34 2019] print_req_error: I/O error, dev sdi, sector 30677580
> [Tue Apr 2 11:26:34 2019] device-mapper: multipath: Failing path 8:128.
> [Tue Apr 2 11:26:34 2019] SG_IO disk=sdi, result=0x0
>
> - This causes the DATA corruption for the application
>
> Relavant call stacks: (SG_IO issue path)
> [Tue Apr 2 11:26:33 2019] sd 8:0:0:8: [sdi] sd_ioctl: disk=sdi, cmd=0x2285
> [Tue Apr 2 11:26:33 2019] SG_IO disk=sdi, retried 1 cmd 93
> [Tue Apr 2 11:26:33 2019] CPU: 30 PID: 16080 Comm: iostress Not
> tainted 4.19.9-purekernel_dbg.x86_64+ #30
> [Tue Apr 2 11:26:33 2019] Hardware name: /0JP31P, BIOS 2.0.19 08/29/2013
> [Tue Apr 2 11:26:33 2019] Call Trace:
> [Tue Apr 2 11:26:33 2019] dump_stack+0x63/0x85
> [Tue Apr 2 11:26:33 2019] sg_io+0x41e/0x480
> [Tue Apr 2 11:26:33 2019] scsi_cmd_ioctl+0x297/0x420
> [Tue Apr 2 11:26:33 2019] ? sdev_prefix_printk+0xe9/0x120
> [Tue Apr 2 11:26:33 2019] ? _cond_resched+0x1a/0x50
> [Tue Apr 2 11:26:33 2019] scsi_cmd_blk_ioctl+0x51/0x70
> [Tue Apr 2 11:26:33 2019] sd_ioctl+0x95/0x1a0 [sd_mod]
> [Tue Apr 2 11:26:33 2019] __blkdev_driver_ioctl+0x25/0x30
> [Tue Apr 2 11:26:33 2019] dm_blk_ioctl+0x79/0x100 [dm_mod]
> [Tue Apr 2 11:26:33 2019] blkdev_ioctl+0x89a/0x940
> [Tue Apr 2 11:26:33 2019] ? do_nanosleep+0xae/0x180
> [Tue Apr 2 11:26:33 2019] block_ioctl+0x3d/0x50
> [Tue Apr 2 11:26:33 2019] do_vfs_ioctl+0xa6/0x600
> [Tue Apr 2 11:26:33 2019] ? __audit_syscall_entry+0xe9/0x130
> [Tue Apr 2 11:26:33 2019] ksys_ioctl+0x6d/0x80
> ).[Tue Apr 2 11:26:33 2019] __x64_sys_ioctl+0x1a/0x20
> [Tue Apr 2 11:26:33 2019] do_syscall_64+0x60/0x180
> [Tue Apr 2 11:26:33 2019] entry_SYSCALL_64_after_hwframe+0x44/0xa9
>
> According to the analysis, it seems that there is a bug in propagating
> errors for iSCSI session timeout (i.e., session_recovery_timedout).
> Compared to this, legacy SCSI queue handles the errors in
> scsi_request_fn by killing the request as can be seen below.
>
> 1872 static void scsi_request_fn(struct request_queue *q)
> 1873 __releases(q->queue_lock)
> 1874 __acquires(q->queue_lock)
> 1875 {
> ..
> 1886 for (;;) {
> ..
> 1893 req = blk_peek_request(q);
> 1894 if (!req)
> 1895 break;
> 1896
> 1897 if (unlikely(!scsi_device_online(sdev))) {
> 1898 sdev_printk(KERN_ERR, sdev,
> 1899 "scsi_request_fn: rejecting
> I/O to offline device\n");
> 1900 scsi_kill_request(req, q);
> 1901 continue;
> 1902 }
> 1903
>
> I am not sure in which layer we have to fix this, LLD or scsi_queue_rq().
>
> Could someone please take a look? Or, could someone guide me on how to
> fix this bug?
>
> Thanks,
>
> Jaesoo Lee.
>

Jaesoo Lee

unread,

Apr 8, 2019, 4:21:21 PM4/8/19

to Jens Axboe, Lee Duncan, Chris Leech, James E.J. Bottomley, Martin K. Petersen, linux...@vger.kernel.org, open-...@googlegroups.com, linux...@vger.kernel.org

Hello All,

I encountered this bug while trying to enable dm_blk_mq for our
iSCSI/FCP targets.

The bug is that the sg_io issued to scsi_blk_mq would succeed even if
LLD wants to error out those requests.

Let me explain the scenario in more details.

Setup:
0. Host kernel configuration
- 4.19.9, 4.20.16
- boot parameter: dm_mod.use_blk_mq=Y scsi_mod.use_blk_mq=Y
scsi_transport_iscsi.debug_session=1 scsi_transport_iscsi.debug_conn=1

Scenario:
1. Connect the host to iSCSI target via four paths
: A dm device is created for those target devices
2. Start an application in the host which generates sg_io ioctl for
XCOPY and WSAME to the dm device with the ratio of around 50%
(pread/pwrite for the rest).
3. Perform system crash (sysrq-trigger) in the iSCSI target

Expected result:
- Any outstanding IOs should get failed with errors

Actual results:
- Normal read/write IOs get failed as expected
- SG_IO ioctls SUCCEEDED!!

Jaesoo Lee

unread,

Apr 9, 2019, 5:36:42 PM4/9/19

to Douglas Gilbert, Jens Axboe, Lee Duncan, Chris Leech, James E.J. Bottomley, Martin K. Petersen, linux...@vger.kernel.org, open-...@googlegroups.com, linux...@vger.kernel.org

Hi,

Thanks for the comments.

I tried to run sg_write_same over sg device and I am seeing the same problem.

The result is as follows:

0. Kernel configs
Version: 5.1-rc1
Boot parameter: dm_mod.use_blk_mq=Y scsi_mod.use_blk_mq=Y

1. Normal state
: (As expected) The command succeeded

$ sg_write_same --lba=100 --xferlen=512 /dev/sg5
$

2. Immediately after bringing down the iSCSI interface at the target
: (As expected) Failed with DID_TRANSPORT_DISRUPTED after a few seconds

$ sg_write_same --lba=100 --xferlen=512 /dev/sg5
Write same: transport: Host_status=0x0e [DID_TRANSPORT_DISRUPTED]
Driver_status=0x00 [DRIVER_OK, SUGGEST_OK]

Write same(10) command failed

3. Immediately after the DID_TRANSPORT_DISRUPTED error
: (BUG) The command succeeded after a few seconds

$ sg_write_same --lba=100 --xferlen=512 /dev/sg5
$

: Kernel logs
Apr 8 18:28:03 init21-16 kernel: session1: session recovery timed out
after 10 secs
Apr 8 18:28:03 init21-16 kernel: sd 8:0:0:5: rejecting I/O to offline device

4. Issued IO again
: (As expected) The command failed

$ sg_write_same --lba=100 --xferlen=512 /dev/sg5
Write same: pass through os error: No such device or address
Write same(10) command failed

Let me upload my fix for this and please give me some comments on that.

Thanks,

Jaesoo Lee.

---------- Forwarded message ---------
From: Douglas Gilbert <dgil...@interlog.com>
Date: Wed, Apr 3, 2019 at 2:06 PM
Subject: Re: [bug report][scsi blk_mq] data corruption due to a bug in
SCSI LLD error handling
To: Jaesoo Lee <ja...@purestorage.com>, Jens Axboe <ax...@kernel.dk>,
Lee Duncan <ldu...@suse.com>, Chris Leech <cle...@redhat.com>, James
E.J. Bottomley <je...@linux.vnet.ibm.com>, Martin K. Petersen
<martin....@oracle.com>
Cc: <linux...@vger.kernel.org>, <open-...@googlegroups.com>,
<linux...@vger.kernel.org>

The Lee-Man

unread,

May 9, 2019, 10:51:18 AM5/9/19

to open-iscsi

I think the point is that the sg path is not equal to the real IO path. You are (currently) never going to get correct error handling from the sg path, which is considered out of band.

So what are you trying to do here? Are you really developing software that uses the sg path, and you need this to work, or are you testing the error handling and decided to use sg? In other words, is this a real bug, or just something you think should work but does not?

Reply all

Reply to author

Forward