Re: [PATCH] bfs: add sanity check at bfs_fill_super().

11 views
Skip to first unread message

Tetsuo Handa

unread,
Jun 13, 2018, 9:33:11 AM6/13/18
to Tigran Aivazian, Andrew Morton, linux-...@vger.kernel.org, syzbot, syzkaller-bugs
On 2018/05/10 8:53, Andrew Morton wrote:
> On Thu, 10 May 2018 08:46:18 +0900 Tetsuo Handa <penguin...@i-love.sakura.ne.jp> wrote:
>
>>> page-allocation-fauilure warning and a nice backtrace, etc. Why
>>> suppress all of that and add our custom warning instead?
>>
>> the intent of this patch is to avoid panic() by panic_on_warn == 1
>> due to hitting
>>
>> struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
>> {
>> unsigned int index;
>>
>> if (unlikely(size > KMALLOC_MAX_SIZE)) {
>> WARN_ON_ONCE(!(flags & __GFP_NOWARN)); /* <= this line */
>> return NULL;
>> }
>>
>> when size to allocate is controlled by the filesystem image.
>
> Well, the same could happen with many many memory-allocation sites.
> What's special about BFS? If someone sets panic_on_warn=1 then
> presumably this panic is the behaviour they wanted in this case.
>

Tigran, this patch is stalling. Do we want to apply this? Or, ignore as invalid?

errors=panic mount option for ext4 case was ignored as invalid.
http://lkml.kernel.org/r/CACT4Y+Z+2YW_VALJzzQr6hLsviA=dXk3iFqwVf+...@mail.gmail.com

But I prefer avoiding crashes if we can fix it.

Tigran Aivazian

unread,
Jun 13, 2018, 9:49:34 AM6/13/18
to Tetsuo Handa, Andrew Morton, linux-...@vger.kernel.org, syzbot, syzkaller-bugs
Having read the discussion carefully, I personally prefer to ignore
the fix as invalid, because mounting a filesystem image is a
privileged operation and if attempting to mount a corrupted image
causes a panic, that is no big deal, imho.

On 13 June 2018 at 14:33, Tetsuo Handa

Dmitry Vyukov

unread,
Jun 13, 2018, 12:00:48 PM6/13/18
to Tigran Aivazian, Tetsuo Handa, Andrew Morton, linux-fsdevel, syzbot, syzkaller-bugs
On Wed, Jun 13, 2018 at 3:49 PM, Tigran Aivazian
<aivazia...@gmail.com> wrote:
> Having read the discussion carefully, I personally prefer to ignore
> the fix as invalid, because mounting a filesystem image is a
> privileged operation and if attempting to mount a corrupted image
> causes a panic, that is no big deal, imho.

Even if not a big deal, but still a bug?

Also note that this is kind a chicken and egg problem. The only reason
why these operations are privileged is that we have bugs and we are
not fixing them.

Also keep this in mind whenever you insert anything into usb and
automount if turned on ;) You are basically giving permission to that
thing to do everything with the machine. Or when you mount any image
not created by you with trusted tools that you build yourself from
trusted sources with a trusted compiler.

Also keep in mind Android and other systems where root is not a trusted entity.



> On 13 June 2018 at 14:33, Tetsuo Handa
> <penguin...@i-love.sakura.ne.jp> wrote:
>> On 2018/05/10 8:53, Andrew Morton wrote:
>>> On Thu, 10 May 2018 08:46:18 +0900 Tetsuo Handa <penguin...@i-love.sakura.ne.jp> wrote:
>>>
>>>>> page-allocation-fauilure warning and a nice backtrace, etc. Why
>>>>> suppress all of that and add our custom warning instead?
>>>>
>>>> the intent of this patch is to avoid panic() by panic_on_warn == 1
>>>> due to hitting
>>>>
>>>> struct kmem_cache *kmalloc_slab(size_t size, gfp_t flags)
>>>> {
>>>> unsigned int index;
>>>>
>>>> if (unlikely(size > KMALLOC_MAX_SIZE)) {
>>>> WARN_ON_ONCE(!(flags & __GFP_NOWARN)); /* <= this line */
>>>> return NULL;
>>>> }
>>>>
>>>> when size to allocate is controlled by the filesystem image.
>>>
>>> Well, the same could happen with many many memory-allocation sites.
>>> What's special about BFS? If someone sets panic_on_warn=1 then
>>> presumably this panic is the behaviour they wanted in this case.
>>>
>>
>> Tigran, this patch is stalling. Do we want to apply this? Or, ignore as invalid?
>>
>> errors=panic mount option for ext4 case was ignored as invalid.
>> http://lkml.kernel.org/r/CACT4Y+Z+2YW_VALJzzQr6hLsviA=dXk3iFqwVf+...@mail.gmail.com
>>
>> But I prefer avoiding crashes if we can fix it.
>
> --
> You received this message because you are subscribed to the Google Groups "syzkaller-bugs" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to syzkaller-bug...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/syzkaller-bugs/CAK%2B_RLko_OCepN4FCmsaQPAKkt9JNGe8pNRK7SO-onhw5zCneA%40mail.gmail.com.
> For more options, visit https://groups.google.com/d/optout.

Tetsuo Handa

unread,
Jun 13, 2018, 6:09:45 PM6/13/18
to Tigran Aivazian, Andrew Morton, linux-...@vger.kernel.org, syzbot, syzkaller-bugs
On 2018/06/13 22:49, Tigran Aivazian wrote:
> Having read the discussion carefully, I personally prefer to ignore
> the fix as invalid, because mounting a filesystem image is a
> privileged operation and if attempting to mount a corrupted image
> causes a panic, that is no big deal, imho.

While this report is triggered by a crafted filesystem image, I don't think that
a legitimate but huge filesystem image crashes the system by hitting
(size > KMALLOC_MAX_SIZE) path is nice. While filesystem should try to avoid
such large allocation, there is no need to crash the system just because
kmalloc() failed.

e.g. http://lkml.kernel.org/r/927f24d4-b0c3-8192...@iogearbox.net

Tigran Aivazian

unread,
Jun 14, 2018, 3:38:30 AM6/14/18
to Tetsuo Handa, Andrew Morton, linux-...@vger.kernel.org, syzbot, syzkaller-bugs
On 13 June 2018 at 23:09, Tetsuo Handa
<penguin...@i-love.sakura.ne.jp> wrote:
> While this report is triggered by a crafted filesystem image, I don't think that
> a legitimate but huge filesystem image crashes the system by hitting
> (size > KMALLOC_MAX_SIZE) path is nice. While filesystem should try to avoid
> such large allocation, there is no need to crash the system just because
> kmalloc() failed.
>
> e.g. http://lkml.kernel.org/r/927f24d4-b0c3-8192...@iogearbox.net

Ok, could you please show me the very final version of the suggested
patch, please?

Tetsuo Handa

unread,
Jun 14, 2018, 6:45:18 AM6/14/18
to Tigran Aivazian, Andrew Morton, linux-...@vger.kernel.org, syzbot, syzkaller-bugs
(Which patch did you mean? Hmm, I paste both patches instead of waiting for your answer.)



commit 0c54bbdb89992eeff50866b64366a1a0cbd0e6fb
Author: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Date: Sat May 26 14:15:41 2018 +1000

bfs: add sanity check at bfs_fill_super()

syzbot is reporting too large memory allocation at bfs_fill_super() [1].
Since file system image is corrupted such that bfs_sb->s_start == 0,
bfs_fill_super() is trying to allocate 8MB of continuous memory. Fix this
by adding a sanity check on bfs_sb->s_start, __GFP_NOWARN and printf().

[1] https://syzkaller.appspot.com/bug?id=16a87c236b951351374a84c8a32f40edbc034e96

Link: http://lkml.kernel.org/r/1525862104-3407-1-git-s...@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin...@I-love.SAKURA.ne.jp>
Reported-by: syzbot <syzbot+71c6b5...@syzkaller.appspotmail.com>
Reviewed-by: Andrew Morton <ak...@linux-foundation.org>
Cc: Tigran Aivazian <aivazia...@gmail.com>
Cc: Matthew Wilcox <wi...@infradead.org>
Signed-off-by: Andrew Morton <ak...@linux-foundation.org>
Signed-off-by: Stephen Rothwell <s...@canb.auug.org.au>

diff --git a/fs/bfs/inode.c b/fs/bfs/inode.c
index 9a69392..d81c148 100644
--- a/fs/bfs/inode.c
+++ b/fs/bfs/inode.c
@@ -350,7 +350,8 @@ static int bfs_fill_super(struct super_block *s, void *data, int silent)

s->s_magic = BFS_MAGIC;

- if (le32_to_cpu(bfs_sb->s_start) > le32_to_cpu(bfs_sb->s_end)) {
+ if (le32_to_cpu(bfs_sb->s_start) > le32_to_cpu(bfs_sb->s_end) ||
+ le32_to_cpu(bfs_sb->s_start) < BFS_BSIZE) {
printf("Superblock is corrupted\n");
goto out1;
}
@@ -359,9 +360,11 @@ static int bfs_fill_super(struct super_block *s, void *data, int silent)
sizeof(struct bfs_inode)
+ BFS_ROOT_INO - 1;
imap_len = (info->si_lasti / 8) + 1;
- info->si_imap = kzalloc(imap_len, GFP_KERNEL);
- if (!info->si_imap)
+ info->si_imap = kzalloc(imap_len, GFP_KERNEL | __GFP_NOWARN);
+ if (!info->si_imap) {
+ printf("Cannot allocate %u bytes\n", imap_len);
goto out1;
+ }
for (i = 0; i < BFS_ROOT_INO; i++)
set_bit(i, info->si_imap);




commit a343993c518ce252b62ec00ac06bccfb1d17129d
Author: Bj旦rn T旦pel <bjorn...@intel.com>
Date: Mon Jun 11 13:57:12 2018 +0200

xsk: silence warning on memory allocation failure

syzkaller reported a warning from xdp_umem_pin_pages():

WARNING: CPU: 1 PID: 4537 at mm/slab_common.c:996 kmalloc_slab+0x56/0x70 mm/slab_common.c:996
...
__do_kmalloc mm/slab.c:3713 [inline]
__kmalloc+0x25/0x760 mm/slab.c:3727
kmalloc_array include/linux/slab.h:634 [inline]
kcalloc include/linux/slab.h:645 [inline]
xdp_umem_pin_pages net/xdp/xdp_umem.c:205 [inline]
xdp_umem_reg net/xdp/xdp_umem.c:318 [inline]
xdp_umem_create+0x5c9/0x10f0 net/xdp/xdp_umem.c:349
xsk_setsockopt+0x443/0x550 net/xdp/xsk.c:531
__sys_setsockopt+0x1bd/0x390 net/socket.c:1935
__do_sys_setsockopt net/socket.c:1946 [inline]
__se_sys_setsockopt net/socket.c:1943 [inline]
__x64_sys_setsockopt+0xbe/0x150 net/socket.c:1943
do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287
entry_SYSCALL_64_after_hwframe+0x49/0xbe

This is a warning about attempting to allocate more than
KMALLOC_MAX_SIZE memory. The request originates from userspace, and if
the request is too big, the kernel is free to deny its allocation. In
this patch, the failed allocation attempt is silenced with
__GFP_NOWARN.

Fixes: c0c77d8fb787 ("xsk: add user memory registration support sockopt")
Reported-by: syzbot+4abadc...@syzkaller.appspotmail.com
Signed-off-by: Bj旦rn T旦pel <bjorn...@intel.com>
Signed-off-by: Daniel Borkmann <dan...@iogearbox.net>

diff --git a/net/xdp/xdp_umem.c b/net/xdp/xdp_umem.c
index b9ef487..f47abb4 100644
--- a/net/xdp/xdp_umem.c
+++ b/net/xdp/xdp_umem.c
@@ -204,7 +204,8 @@ static int xdp_umem_pin_pages(struct xdp_umem *umem)
long npgs;
int err;

- umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs), GFP_KERNEL);
+ umem->pgs = kcalloc(umem->npgs, sizeof(*umem->pgs),
+ GFP_KERNEL | __GFP_NOWARN);
if (!umem->pgs)
return -ENOMEM;

Tigran Aivazian

unread,
Jun 14, 2018, 8:11:55 AM6/14/18
to Tetsuo Handa, Andrew Morton, linux-...@vger.kernel.org, syzbot, syzkaller-bugs
Thank you. Please see comments below:

On 14 June 2018 at 11:45, Tetsuo Handa
<penguin...@i-love.sakura.ne.jp> wrote:
> (Which patch did you mean? Hmm, I paste both patches instead of waiting for your answer.)

I only meant the BFS patch.
Yes, this is acceptable.

> @@ -359,9 +360,11 @@ static int bfs_fill_super(struct super_block *s, void *data, int silent)
> sizeof(struct bfs_inode)
> + BFS_ROOT_INO - 1;
> imap_len = (info->si_lasti / 8) + 1;
> - info->si_imap = kzalloc(imap_len, GFP_KERNEL);
> - if (!info->si_imap)
> + info->si_imap = kzalloc(imap_len, GFP_KERNEL | __GFP_NOWARN);
> + if (!info->si_imap) {
> + printf("Cannot allocate %u bytes\n", imap_len);
> goto out1;
> + }

This is unnecessary. As Andrew Morton pointed out, if people set
panic_on_warn then they do really want a panic on allocation failure
warnings. Please update the patch with just the above sanity check for
s_start < BFS_BSIZE.

Kind regards,
Tigran

Tigran Aivazian

unread,
Jun 14, 2018, 8:23:35 AM6/14/18
to Dmitry Vyukov, Tetsuo Handa, Andrew Morton, linux-fsdevel, syzbot, syzkaller-bugs
Hello Dmitry,

Historically, Unix mount(2) (and mount(8)) are normally privileged
unless the filesystem is "trusted" by sysadmin who recorded this fact
in the corresponding entry in /etc/fstab (see fstab(5)). I agree that
the filesystem mounting code should perform enough validation on the
superblock to avoid crashing on an invalid filesystem image. But I
disagree that some specific filesystem (and least of all BFS!) should
modify the panic_on_warn semantics and replace it with its own private
warning message on kernel memory allocation failures.

Kind regards,
Tigran

Dmitry Vyukov

unread,
Jun 14, 2018, 8:38:52 AM6/14/18
to Tigran Aivazian, Tetsuo Handa, Andrew Morton, linux-fsdevel, syzbot, syzkaller-bugs
On Thu, Jun 14, 2018 at 2:23 PM, Tigran Aivazian
<aivazia...@gmail.com> wrote:
> Hello Dmitry,
>
> Historically, Unix mount(2) (and mount(8)) are normally privileged
> unless the filesystem is "trusted" by sysadmin who recorded this fact
> in the corresponding entry in /etc/fstab (see fstab(5)). I agree that
> the filesystem mounting code should perform enough validation on the
> superblock to avoid crashing on an invalid filesystem image. But I
> disagree that some specific filesystem (and least of all BFS!) should
> modify the panic_on_warn semantics and replace it with its own private
> warning message on kernel memory allocation failures.

This is not a general memory allocation failure. Nothing is printed on
memory allocation failures.
This is a very specific case of memory allocation failure, namely when
kernel code asks for too large block, such large that it can't be
possible allocated regardless of amount of available memory. For
allocations with user-controllable size kernel code either needs to
use __GFP_NOWARN, or (better) impose own limits such that it can
always be allocated with kmalloc.
Consider, currently we can have a bfs image that works fine on one
kernel, but fails to mount on another just because it happens so that
one could allocate 4MB with kmalloc, but another can't (different
allocator/different settings/different kernel revision). This looks
like pretty unfortunate property for persistent data formats in general.

Tigran Aivazian

unread,
Jun 14, 2018, 9:05:09 AM6/14/18
to Dmitry Vyukov, Tetsuo Handa, Andrew Morton, linux-fsdevel, syzbot, syzkaller-bugs
On 14 June 2018 at 13:38, Dmitry Vyukov <dvy...@google.com> wrote:
> Consider, currently we can have a bfs image that works fine on one
> kernel, but fails to mount on another just because it happens so that
> one could allocate 4MB with kmalloc, but another can't (different
> allocator/different settings/different kernel revision).

Yes, but this would only happen _without_ the validation proposed by
Tetsuo Handa. If we check s_start then the invalid enormous allocation
request will not be made and what you describe won't not happen.

Dmitry Vyukov

unread,
Jun 14, 2018, 9:13:15 AM6/14/18
to Tigran Aivazian, Tetsuo Handa, Andrew Morton, linux-fsdevel, syzbot, syzkaller-bugs
Agree. If we do a sanity check first, then __GFP_NOWARN is actually
useful: it will detect the cases where a filesystem can be mounted on
one machine but not on another (and will also double check that our
sanity is really sane).

Tetsuo Handa

unread,
Jun 14, 2018, 9:28:23 AM6/14/18
to Tigran Aivazian, Dmitry Vyukov, Andrew Morton, linux-fsdevel, syzbot, syzkaller-bugs
What is possible largest value for imap_len ?

info->si_lasti = (le32_to_cpu(bfs_sb->s_start) - BFS_BSIZE) / sizeof(struct bfs_inode) + BFS_ROOT_INO - 1;
imap_len = (info->si_lasti / 8) + 1;
info->si_imap = kzalloc(imap_len, GFP_KERNEL);

Since sizeof(struct bfs_inode) is 64 and bfs_sb->s_start is unsigned 32bits integer
(where constraints is BFS_BSIZE <= bfs_sb->s_start <= bfs_sb->s_end), theoretically
it is possible to assign bfs_sb->s_start > 2GB (apart from whether such value makes
sense). Then, isn't it possible that imap_len > 4M and still hit KMALLOC_MAX_SIZE limit?

Tigran Aivazian

unread,
Jun 14, 2018, 11:13:31 AM6/14/18
to Tetsuo Handa, Dmitry Vyukov, Andrew Morton, linux-fsdevel, syzbot, syzkaller-bugs
On 14 June 2018 at 14:28, Tetsuo Handa
<penguin...@i-love.sakura.ne.jp> wrote:
> What is possible largest value for imap_len ?
>
> info->si_lasti = (le32_to_cpu(bfs_sb->s_start) - BFS_BSIZE) / sizeof(struct bfs_inode) + BFS_ROOT_INO - 1;
> imap_len = (info->si_lasti / 8) + 1;
> info->si_imap = kzalloc(imap_len, GFP_KERNEL);
>
> Since sizeof(struct bfs_inode) is 64 and bfs_sb->s_start is unsigned 32bits integer
> (where constraints is BFS_BSIZE <= bfs_sb->s_start <= bfs_sb->s_end), theoretically
> it is possible to assign bfs_sb->s_start > 2GB (apart from whether such value makes
> sense). Then, isn't it possible that imap_len > 4M and still hit KMALLOC_MAX_SIZE limit?

You are correct, but the proper fix should be to restrict imap_len to
whatever the maximum value allowed by BFS filesystem layout and reject
anything beyond it. I will try to remember what it was from the notes
I made when I wrote BFS back in 1999. Please wait (possibly a few
days) and I will let you know what those values are.

Kind regards,
Tigran

Tigran Aivazian

unread,
Jun 14, 2018, 12:15:37 PM6/14/18
to Tetsuo Handa, Dmitry Vyukov, Andrew Morton, linux-fsdevel, syzbot, syzkaller-bugs
Actually, a more accurate sanity check for the value of s_start should
be (patch against 4.16.3):

--- fs/bfs/inode.c.0 2018-06-14 16:50:52.136792126 +0100
+++ fs/bfs/inode.c 2018-06-14 16:51:49.344792119 +0100
@@ -350,7 +350,8 @@

s->s_magic = BFS_MAGIC;

- if (le32_to_cpu(bfs_sb->s_start) > le32_to_cpu(bfs_sb->s_end)) {
+ if (le32_to_cpu(bfs_sb->s_start) > le32_to_cpu(bfs_sb->s_end) ||
+ le32_to_cpu(bfs_sb->s_start) < sizeof(struct bfs_super_block)
+ sizeof(struct bfs_dirent)) {
printf("Superblock is corrupted\n");
goto out1;
}

However, that doesn't address the issue of the _upper_ limit of
s_start, i.e. it can still get (on an invalid image pretending to be
BFS) arbitrarily large and cause the allocation to fail as you
described. I will dig a bit more (in my memories :) and try to come up
with the check which doesn't reject a valid BFS image and at the same
time restricts s_start (or imap_len which ultimately depends on it)
sufficiently to prevent wild kernel memory allocation requests.

Btw, I included in the WikiPedia article "Boot File System" a
reference to the original "BFS kernel support" webpage from those
ancient days: http://ftp.linux.org.uk/pub/linux/iBCS/bfs/

Tigran Aivazian

unread,
Jun 14, 2018, 3:00:24 PM6/14/18
to Tetsuo Handa, Dmitry Vyukov, Andrew Morton, linux-fsdevel, syzbot, syzkaller-bugs
Ah, it turned out easier than I thought! The maximum number of inodes
of a BFS filesystem is 512, so an inode map cannot be longer than 65
bytes. Well, we can be generous and restrict imap_len to 128 and be
done with it :)

Namely, if the calculated imap_len turns out to be greater than 128,
then something is definitely wrong and the filesystem image should be
rejected as corrupted.

Tetsuo Handa

unread,
Jun 14, 2018, 6:18:35 PM6/14/18
to Tigran Aivazian, Dmitry Vyukov, Andrew Morton, linux-fsdevel, syzbot, syzkaller-bugs
On 2018/06/15 4:00, Tigran Aivazian wrote:
>> However, that doesn't address the issue of the _upper_ limit of
>> s_start, i.e. it can still get (on an invalid image pretending to be
>> BFS) arbitrarily large and cause the allocation to fail as you
>> described. I will dig a bit more (in my memories :) and try to come up
>> with the check which doesn't reject a valid BFS image and at the same
>> time restricts s_start (or imap_len which ultimately depends on it)
>> sufficiently to prevent wild kernel memory allocation requests.
>>
>> Btw, I included in the WikiPedia article "Boot File System" a
>> reference to the original "BFS kernel support" webpage from those
>> ancient days: http://ftp.linux.org.uk/pub/linux/iBCS/bfs/
>
> Ah, it turned out easier than I thought! The maximum number of inodes
> of a BFS filesystem is 512, so an inode map cannot be longer than 65
> bytes. Well, we can be generous and restrict imap_len to 128 and be
> done with it :)
>
> Namely, if the calculated imap_len turns out to be greater than 128,
> then something is definitely wrong and the filesystem image should be
> rejected as corrupted.
>
So, the constraint is

if (le32_to_cpu(bfs_sb->s_start) > le32_to_cpu(bfs_sb->s_end) ||
le32_to_cpu(bfs_sb->s_end) > What_is_the_number_here)

you can write the fix yourself...

Tigran Aivazian

unread,
Jun 15, 2018, 6:45:35 AM6/15/18
to Tetsuo Handa, Dmitry Vyukov, Andrew Morton, linux-fsdevel, syzbot, syzkaller-bugs
On 14 June 2018 at 23:18, Tetsuo Handa
No, s_end has nothing to do with the number of inodes, it is to do
with the actual data blocks.

Yes, I am writing the fix myself and will test it under 4.17.1 to
which I switched my Ubuntu desktop just now.

Thanks,
Tigran
Reply all
Reply to author
Forward
0 new messages