scsi: BUG in scsi_init_io

62 views
Skip to first unread message

Dmitry Vyukov

unread,
Jan 31, 2017, 3:56:13 AM1/31/17
to je...@linux.vnet.ibm.com, Martin K. Petersen, linux-scsi, LKML, Al Viro, syzkaller
Hello,

The following program triggers BUG in scsi_init_io:

kernel BUG at drivers/scsi/scsi_lib.c:1043!
invalid opcode: 0000 [#1] SMP KASAN
Modules linked in:
CPU: 0 PID: 2899 Comm: a.out Not tainted 4.10.0-rc5+ #201
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
task: ffff88006baa4500 task.stack: ffff880069788000
RIP: 0010:scsi_init_io+0x2a3/0x3d0 drivers/scsi/scsi_lib.c:1043
RSP: 0018:ffff88006978e500 EFLAGS: 00010097
RAX: ffff88006baa4500 RBX: ffff8800683f2c00 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffff88006afc79c0 RDI: ffff88006afc7aa0
RBP: ffff88006978e548 R08: 0000000000000001 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800683f2c00
R13: ffff8800683f2d40 R14: ffff880068b335d8 R15: ffff88006afc79c0
FS: 0000000002572880(0000) GS:ffff88006d000000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000020c03000 CR3: 000000006b04f000 CR4: 00000000001406f0
Call Trace:
scsi_setup_blk_pc_cmnd drivers/scsi/scsi_lib.c:1153 [inline]
scsi_setup_cmnd+0x13b/0x5d0 drivers/scsi/scsi_lib.c:1201
scsi_prep_fn+0x375/0x610 drivers/scsi/scsi_lib.c:1313
blk_peek_request+0x686/0xcc0 block/blk-core.c:2382
scsi_request_fn+0x19e/0x1d70 drivers/scsi/scsi_lib.c:1709
__blk_run_queue_uncond block/blk-core.c:325 [inline]
__blk_run_queue+0xc5/0x130 block/blk-core.c:343
blk_execute_rq_nowait+0x304/0x480 block/blk-exec.c:83
sg_common_write.isra.22+0x10b8/0x1b00 drivers/scsi/sg.c:804
sg_new_write.isra.25+0x5e7/0x990 drivers/scsi/sg.c:747
sg_ioctl+0x244b/0x39a0 drivers/scsi/sg.c:855
vfs_ioctl fs/ioctl.c:43 [inline]
do_vfs_ioctl+0x1bf/0x1790 fs/ioctl.c:683
SYSC_ioctl fs/ioctl.c:698 [inline]
SyS_ioctl+0x8f/0xc0 fs/ioctl.c:689
entry_SYSCALL_64_fastpath+0x1f/0xc2
RIP: 0033:0x434da9
RSP: 002b:00007ffd20ad81a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
RAX: ffffffffffffffda RBX: 0000000000401b00 RCX: 0000000000434da9
RDX: 0000000020007000 RSI: 0000000000002285 RDI: 0000000000000003
RBP: 0000000000000000 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000100000004
R13: 0000000000401b00 R14: 0000000000401b90 R15: 0000000000000000
Code: df 49 c7 85 e8 00 00 00 00 00 00 00 e8 37 f9 fd ff 48 8b 7d c8
48 81 c7 38 02 00 00 e8 f7 8a f0 ff e9 70 ff ff ff e8 fd f1 a5 fe <0f>
0b e8 f6 f1 a5 fe 48 8b 3d 8f a8 f1 03 be 20 80 08 01 e8 05
RIP: scsi_init_io+0x2a3/0x3d0 drivers/scsi/scsi_lib.c:1043 RSP: ffff88006978e500
---[ end trace 08eb8aec64134983 ]---


// autogenerated by syzkaller (http://github.com/google/syzkaller)
#define _GNU_SOURCE
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <sys/mount.h>
#include <sys/syscall.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main()
{
syscall(__NR_mmap, 0x20000000ul, 0xfc0000ul, 0x3ul,
0x32ul, 0xfffffffffffffffful, 0x8000000ul, 0,
0, 0);
int fd = syscall(__NR_open, "/dev/sg0", 0x0ul, 0, 0, 0, 0, 0, 0);
(*(uint64_t*)0x2000f000 = (uint64_t)0x20003fc0);
(*(uint64_t*)0x2000f008 = (uint64_t)0x20003fc0);
(*(uint64_t*)0x2000f010 = (uint64_t)0x2000f000);
(*(uint64_t*)0x20003fc0 = (uint64_t)0x7);
(*(uint32_t*)0x20003fc8 = (uint32_t)0x0);
(*(uint32_t*)0x20003fcc = (uint32_t)0x0);
(*(uint16_t*)0x20003fd0 = (uint16_t)0x0);
(*(uint16_t*)0x20003fd2 = (uint16_t)0x0);
(*(uint32_t*)0x20003fd4 = fd);
(*(uint64_t*)0x20003fd8 = (uint64_t)0x2000f000);
(*(uint64_t*)0x20003fe0 = (uint64_t)0x0);
(*(uint64_t*)0x20003fe8 = (uint64_t)0x0);
(*(uint64_t*)0x20003ff0 = (uint64_t)0x20007000);
(*(uint32_t*)0x20003ff8 = (uint32_t)0x1);
(*(uint32_t*)0x20003ffc = fd);
(*(uint64_t*)0x20007000 = (uint64_t)0x0);
(*(uint32_t*)0x20007008 = (uint32_t)0x4000009);
(*(uint32_t*)0x2000700c = (uint32_t)0x1);
(*(uint64_t*)0x20007010 = (uint64_t)0x20c03000);
(*(uint64_t*)0x20007018 = (uint64_t)0x2000f000);
(memcpy((void*)0x2000f000,
"\x83\x3c\x35\x2f\xff\x00\x00\x00\x00\x00\x00\x7f"
"\xff\x00\x00\x00\x00\x82\x7a\x7f\xa3\xcc\x90\xbe"
"\x3d\xf8\x43\x81\xc5\x02",
30));
(*(uint64_t*)0x20003fc0 = (uint64_t)0x649e);
(*(uint32_t*)0x20003fc8 = (uint32_t)0x0);
(*(uint32_t*)0x20003fcc = (uint32_t)0x0);
(*(uint16_t*)0x20003fd0 = (uint16_t)0x0);
(*(uint16_t*)0x20003fd2 = (uint16_t)0x1);
(*(uint32_t*)0x20003fd4 = fd);
(*(uint64_t*)0x20003fd8 = (uint64_t)0x2000f000);
(*(uint64_t*)0x20003fe0 = (uint64_t)0xaf);
(*(uint64_t*)0x20003fe8 = (uint64_t)0xffff);
(*(uint64_t*)0x20003ff0 = (uint64_t)0x20378fb0);
(*(uint32_t*)0x20003ff8 = (uint32_t)0x1);
(*(uint32_t*)0x20003ffc = fd);
(memcpy(
(void*)0x2000f000,
"\x05\x60\x1e\xc6\x2e\x5d\xdf\xc8\xcd\xd1\xd8\x2c\x37\x5f\xa2\x63"
"\xec\x39\x1d\x03\xf8\xfd\x1d\xe8\xf6\xfd\x84\x33\xfb\x7a\xd4\xfb"
"\xaf\x30\x6a\x0a\x2a\x43\xd8\xbb\x41\xcc\x7a\x74\x17\xe8\x66\x62"
"\x40\x17\x4d\x14\x34\x9e\x1b\x3b\x43\x50\x22\x95\x54\x05\x1e\xfd"
"\x8f\x9e\xb6\xe8\x93\x5c\xee\x48\x5f\xf8\x41\xac\x62\x5c\x0e\x80"
"\x2e\x3c\x55\x3f\xb1\xe1\x06\x10\xda\xce\xfd\x0b\x55\x79\x1b\x7c"
"\x10\x21\xfc\x0b\xb7\xee\x49\x1e\x07\x49\xdc\xe1\xb4\x77\xcb\xb3"
"\xbc\x85\xbc\x91\x1d\x1c\x22\xa3\x97\x43\x1a\x85\x0a\x7e\xf9\xc3"
"\x90\x06\x40\xbe\x4e\x9c\x9a\x8d\xe7\x14\xa7\xfc\xbc\x4c\x51\x95"
"\x54\xfb\x84\xd3\x20\x96\x33\xd1\x5e\x12\x65\x63\xb5\x5f\xd7\xc7"
"\x07\x46\x1a\x0e\xa3\x89\x00\x0f\xda\xd4\x3d\x9c\xff\x24\x4f",
175));
(*(uint64_t*)0x20378fb0 = (uint64_t)0x20);
(*(uint32_t*)0x20378fb8 = (uint32_t)0xffffffffffffffff);
(*(uint32_t*)0x20378fbc = (uint32_t)0x0);
(*(uint64_t*)0x20378fc0 = (uint64_t)0x81);
(*(uint64_t*)0x20378fc8 = (uint64_t)0x3ff);
(*(uint64_t*)0x20378fd0 = (uint64_t)0x6);
(*(uint64_t*)0x20378fd8 = (uint64_t)0x4);
(*(uint64_t*)0x20378fe0 = (uint64_t)0x2);
(*(uint64_t*)0x20378fe8 = (uint64_t)0x4);
(*(uint64_t*)0x20378ff0 = (uint64_t)0x100000004);
(*(uint64_t*)0x20378ff8 = (uint64_t)0x3);
(*(uint64_t*)0x2000f000 = (uint64_t)0xffff);
(*(uint32_t*)0x2000f008 = (uint32_t)0x0);
(*(uint32_t*)0x2000f00c = (uint32_t)0x0);
(*(uint16_t*)0x2000f010 = (uint16_t)0x7);
(*(uint16_t*)0x2000f012 = (uint16_t)0x401);
(*(uint32_t*)0x2000f014 = (uint32_t)0xffffffffffffffff);
(*(uint64_t*)0x2000f018 = (uint64_t)0x2000ffed);
(*(uint64_t*)0x2000f020 = (uint64_t)0x13);
(*(uint64_t*)0x2000f028 = (uint64_t)0x0);
(*(uint64_t*)0x2000f030 = (uint64_t)0x2000f000);
(*(uint32_t*)0x2000f038 = (uint32_t)0x1);
(*(uint32_t*)0x2000f03c = fd);
(memcpy((void*)0x2000ffed, "\x4e\x80\xd3\x97\x1f\x50\xaa"
"\xe2\x09\xbc\x10\x45\x72\x24"
"\xc0\xc2\x60\x5c\xa8",
19));
(*(uint64_t*)0x2000f000 = (uint64_t)0x101);
(*(uint32_t*)0x2000f008 = (uint32_t)0x9);
(*(uint32_t*)0x2000f00c = (uint32_t)0x6);
(*(uint32_t*)0x2000f010 = (uint32_t)0x0);
syscall(__NR_io_submit, 0x0ul, 0x3ul, 0x2000f000ul, 0, 0, 0, 0, 0, 0);
(memcpy((void*)0x20007000, "\x53", 1));
syscall(__NR_ioctl, fd, 0x2285ul, 0x20007000ul, 0, 0, 0, 0, 0, 0);
return 0;
}


On commit fd694aaa46c7ed811b72eb47d5eb11ce7ab3f7f1

Johannes Thumshirn

unread,
Jan 31, 2017, 4:20:52 AM1/31/17
to Dmitry Vyukov, je...@linux.vnet.ibm.com, Martin K. Petersen, linux-scsi, LKML, Al Viro, syzkaller, Hannes Reinecke
On Tue, Jan 31, 2017 at 09:55:52AM +0100, Dmitry Vyukov wrote:
> Hello,
>
> The following program triggers BUG in scsi_init_io:

Well crashing a machine just because of an empty dma transfer is a bit harsh,
isn't it?

From 86e6fa5f618fe588b98e923e032f33e075fcd4f4 Mon Sep 17 00:00:00 2001
From: Johannes Thumshirn <jthum...@suse.de>
Date: Tue, 31 Jan 2017 10:16:00 +0100
Subject: [PATCH] scsi: don't BUG_ON() empty DMA transfers

Don't crash the machine just because of an empty transfer. Use WARN_ON()
combined with returning an error.

Signed-off-by: Johannes Thumshirn <jthum...@suse.de>
---
drivers/scsi/scsi_lib.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index e9e1e14..414588a 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1040,7 +1040,8 @@ int scsi_init_io(struct scsi_cmnd *cmd)
bool is_mq = (rq->mq_ctx != NULL);
int error;

- BUG_ON(!blk_rq_nr_phys_segments(rq));
+ if (WARN_ON(!blk_rq_nr_phys_segments(rq)))
+ return -EINVAL;

error = scsi_init_sgtable(rq, &cmd->sdb);
if (error)
--
2.10.2


--
Johannes Thumshirn Storage
jthum...@suse.de +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850

Dmitry Vyukov

unread,
Jan 31, 2017, 4:51:16 AM1/31/17
to Johannes Thumshirn, je...@linux.vnet.ibm.com, Martin K. Petersen, linux-scsi, LKML, Al Viro, syzkaller, Hannes Reinecke
Please-please-please, let's not use WARN for something that is not a
kernel bug and is user-triggerable. This makes it impossible to
automate kernel testing and requires hiring an army of people doing
mechanical job of sorting out WARNING reports into kernel-bugs and
non-kernel-bugs.
If the message is absolutely necessary (while kernel does not
generally explain every EINVAL on console), the following will do:

if (!blk_rq_nr_phys_segments(rq)) {
pr_err("you are doing something wrong\n");

Johannes Thumshirn

unread,
Jan 31, 2017, 4:58:25 AM1/31/17
to Dmitry Vyukov, je...@linux.vnet.ibm.com, Martin K. Petersen, linux-scsi, LKML, Al Viro, syzkaller, Hannes Reinecke
On Tue, Jan 31, 2017 at 10:50:49AM +0100, Dmitry Vyukov wrote:
> On Tue, Jan 31, 2017 at 10:20 AM, Johannes Thumshirn <jthum...@suse.de> wrote:
> > On Tue, Jan 31, 2017 at 09:55:52AM +0100, Dmitry Vyukov wrote:

[...]

> Please-please-please, let's not use WARN for something that is not a
> kernel bug and is user-triggerable. This makes it impossible to
> automate kernel testing and requires hiring an army of people doing
> mechanical job of sorting out WARNING reports into kernel-bugs and
> non-kernel-bugs.
> If the message is absolutely necessary (while kernel does not
> generally explain every EINVAL on console), the following will do:
>
> if (!blk_rq_nr_phys_segments(rq)) {
> pr_err("you are doing something wrong\n");
> return -EINVAL;
> }

Yes I understand that. OTOH having the WARN helps you finding the caller because
of to the stack trace. But arguably that could be accomplished with function
graph tracing as well. I'll re-send a v2 as a proper patch.

Thanks,
Johannes

Dmitry Vyukov

unread,
Jan 31, 2017, 5:06:50 AM1/31/17
to Johannes Thumshirn, je...@linux.vnet.ibm.com, Martin K. Petersen, linux-scsi, LKML, Al Viro, syzkaller, Hannes Reinecke
On Tue, Jan 31, 2017 at 10:58 AM, Johannes Thumshirn <jthum...@suse.de> wrote:
>
> [...]
>
>> Please-please-please, let's not use WARN for something that is not a
>> kernel bug and is user-triggerable. This makes it impossible to
>> automate kernel testing and requires hiring an army of people doing
>> mechanical job of sorting out WARNING reports into kernel-bugs and
>> non-kernel-bugs.
>> If the message is absolutely necessary (while kernel does not
>> generally explain every EINVAL on console), the following will do:
>>
>> if (!blk_rq_nr_phys_segments(rq)) {
>> pr_err("you are doing something wrong\n");
>> return -EINVAL;
>> }
>
> Yes I understand that. OTOH having the WARN helps you finding the caller because
> of to the stack trace. But arguably that could be accomplished with function
> graph tracing as well. I'll re-send a v2 as a proper patch.


Thank you very much.
Stack trace can be done with dump_stack() if necessary, e.g.

pr_err("you are doing something wrong here:\n");
dump_stack();

James Bottomley

unread,
Jan 31, 2017, 10:41:59 AM1/31/17
to Dmitry Vyukov, Johannes Thumshirn, Martin K. Petersen, linux-scsi, LKML, Al Viro, syzkaller, Hannes Reinecke
It is a kernel bug and it should not be user triggerable, so it should
have a warn_on or bug_on. It means something called a data setup
function with no data. There's actually a root cause that patches like
this won't fix, can we find it?

James

Al Viro

unread,
Feb 19, 2017, 2:16:14 AM2/19/17
to James Bottomley, Dmitry Vyukov, Johannes Thumshirn, Martin K. Petersen, linux-scsi, LKML, syzkaller, Hannes Reinecke, Linus Torvalds
On Tue, Jan 31, 2017 at 07:41:51AM -0800, James Bottomley wrote:

> > Please-please-please, let's not use WARN for something that is not a
> > kernel bug and is user-triggerable.
>
> It is a kernel bug and it should not be user triggerable, so it should
> have a warn_on or bug_on. It means something called a data setup
> function with no data. There's actually a root cause that patches like
> this won't fix, can we find it?

The root cause is unfixable without access to TARDIS and dose of
antipsychotics sufficient to prevent /dev/sg API creation.

What happens is that write to /dev/sg is given a request with non-zero
->iovec_count combined with zero ->dxfer_len. Or with ->dxferp pointing
to an array full of empty iovecs.

AFAICS, the minimal fix would be something like this:

YAMissingSanityCheck in /dev/sg

write permission to /dev/sg shouldn't be equivalent to the ability to trigger
BUG_ON() while holding spinlocks...

Cc: sta...@vger.kernel.org
Signed-off-by: Al Viro <vi...@zeniv.linux.org.uk>
---

diff --git a/drivers/scsi/sg.c b/drivers/scsi/sg.c
index dbe5b4b95df0..121de0aaa6ad 100644
--- a/drivers/scsi/sg.c
+++ b/drivers/scsi/sg.c
@@ -1753,6 +1753,10 @@ sg_start_req(Sg_request *srp, unsigned char *cmd)
return res;

iov_iter_truncate(&i, hp->dxfer_len);
+ if (!iov_iter_count(&i)) {
+ kfree(iov);
+ return -EINVAL;
+ }

res = blk_rq_map_user_iov(q, rq, md, &i, GFP_ATOMIC);
kfree(iov);

Christoph Hellwig

unread,
Feb 19, 2017, 12:25:53 PM2/19/17
to Al Viro, James Bottomley, Dmitry Vyukov, Johannes Thumshirn, Martin K. Petersen, linux-scsi, LKML, syzkaller, Hannes Reinecke, Linus Torvalds
On Sun, Feb 19, 2017 at 07:15:27AM +0000, Al Viro wrote:
> The root cause is unfixable without access to TARDIS and dose of
> antipsychotics sufficient to prevent /dev/sg API creation.
>
> What happens is that write to /dev/sg is given a request with non-zero
> ->iovec_count combined with zero ->dxfer_len. Or with ->dxferp pointing
> to an array full of empty iovecs.
>
> AFAICS, the minimal fix would be something like this:
>
> YAMissingSanityCheck in /dev/sg
>
> write permission to /dev/sg shouldn't be equivalent to the ability to trigger
> BUG_ON() while holding spinlocks...

Looks fine to me:

Reviewed-by: Christoph Hellwig <h...@lst.de>

The other thing we really need to consider is to finally merge the
SG_IO implementations for /dev/sg with the common block layer one.

Linus Torvalds

unread,
Feb 19, 2017, 12:38:27 PM2/19/17
to James Bottomley, Dmitry Vyukov, Johannes Thumshirn, Martin K. Petersen, linux-scsi, LKML, Al Viro, syzkaller, Hannes Reinecke
On Tue, Jan 31, 2017 at 7:41 AM, James Bottomley
<je...@linux.vnet.ibm.com> wrote:
>
> It is a kernel bug and it should not be user triggerable, so it should
> have a warn_on or bug_on.

Hell NO.

Christ, James, listen to yourself. What you are basically saying when
you say it should be a BUG_ON() is

"This shouldn't happen, but if it ever does happen, let's just turn
our mistaken assumptions into a dead machine that is really hard to
debug".

Because a BUG_ON() effectively kills the machine if the call chain has
some locks held. In the SCSI layer, that generally means that there
will be no logged oops either, because any locks held likely just
killed your filesystem or disk subsystem, so now that oops is
basically not even likely to be reported by most normal users.

So stop this "should have a bug_on". In fact, since this apparently
_is_ easily user-triggerable, it damn well shouldn't have a warn_on
either. At most, a WARN_ON_ONCE(), so that we might get reports of
_what_ the bad call chain is, but we will never kill the machine and
we will *not* give people the ability to randomly spam the system
logs.

BUG_ON() needs to die. People need to realize that it is a _problem_,
and that it makes any bugs _worse_. Don't do it.

The only valid reason for BUG_ON() is when some very core data
structure is _so_ corrupt that you can't even continue, because you
simply can't even return an error and there's no way for you to just
say "log it once and continue".

And by that I don't mean some random value you have in a request. I
mean literally "this is a really core data structure, and I simply
_cannot_ continue" (where that "cannot" is about an actual physical
impossibility, not a "I could continue but I think this is serious").

Anything else is a "return error, possibly with a WARN_ON() to let
people know that bad things are going on".

Basically, BUG_ON() should be in core kernel code. Not in drivers. And
even in core kernel code, it's likely wrong.

Linus

Linus Torvalds

unread,
Feb 19, 2017, 12:56:01 PM2/19/17
to Al Viro, James Bottomley, Dmitry Vyukov, Johannes Thumshirn, Martin K. Petersen, linux-scsi, LKML, syzkaller, Hannes Reinecke
On Sat, Feb 18, 2017 at 11:15 PM, Al Viro <vi...@zeniv.linux.org.uk> wrote:
>
> AFAICS, the minimal fix would be something like this:

Applied, along with getting rid of the BUG_ON() that made this bug
much worse than it would have been without it.

Linus
Reply all
Reply to author
Forward
0 new messages