Re: [QUESTION] WARNNING after 3d8e2128f26a ("sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs output")

68 views
Skip to first unread message

Greg KH

unread,
Apr 2, 2021, 3:45:18 AM4/2/21
to yangerkun, sta...@vger.kernel.org, vba...@suse.cz, linu...@kvack.org, open-...@googlegroups.com, cle...@redhat.com, zhangyi (F), Kefeng Wang, liuyong...@huawei.com, Zhengyejian (Zetta), Yang Yingliang, chenz...@huawei.com
On Fri, Apr 02, 2021 at 03:16:21PM +0800, yangerkun wrote:
> sysfs_emit(3d8e2128f26a ("sysfs: Add sysfs_emit and sysfs_emit_at to
> format sysfs output")) has a hidden constraint that the buf should be
> alignment with PAGE_SIZE. It's OK since 59bb47985c1d ("mm, sl[aou]b:
> guarantee natural alignment for kmalloc(power-of-two)") help us to solve
> scenes like CONFIG_SLUB_DEBUG or CONFIG_SLOB which will break this.
>
>
> But since lots of stable branch(we reproduce it with 4.19 stable) merge
> 3d8e2128f26a ("sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs
> output") without 59bb47985c1d ("mm, sl[aou]b: guarantee natural
> alignment for kmalloc(power-of-two)"), we will get the follow warning
> with command 'cat /sys/class/iscsi_transport/tcp/handle' once we enable
> CONFIG_SLUB_DEBUG and start kernel with slub_debug=UFPZ!
>
>
> Obviously, we can backport 59bb47985c1d ("mm, sl[aou]b: guarantee
> natural alignment for kmalloc(power-of-two)") to fix it. But this will
> waste some memory to ensure natural alignment which seems unbearable for
> embedded device. So for stable branch like 4.19, can we just remove the
> warning in sysfs_emit since the only user for it is iscsi, and seq_read
> + sysfs_kf_seq_show can ensure that the buf in sysfs_emit must be aware
> of PAGE_SIZE. Or does there some other advise for this problem?

More users of this function will be backported over time as we all know,
so just removing the functions is not good.

Why is the buffer alignment considered a "waste" here? If that change
is in Linus's tree and newer kernels (it showed up in 5.4 which was
released quite a while ago), where are the people complaining about it
there?

I think backporting 59bb47985c1d ("mm, sl[aou]b: guarantee natural
alignment for kmalloc(power-of-two)") seems like the correct thing to do
here to bring things into alignment (pun intended) with newer kernels.

thanks,

greg k-h

yangerkun

unread,
Apr 8, 2021, 4:06:53 PM4/8/21
to Matthew Wilcox, Greg KH, j...@perches.com, sta...@vger.kernel.org, vba...@suse.cz, linu...@kvack.org, open-...@googlegroups.com, cle...@redhat.com, zhangyi (F), Kefeng Wang, liuyong...@huawei.com, Zhengyejian (Zetta), Yang Yingliang, chenz...@huawei.com


在 2021/4/2 22:41, Matthew Wilcox 写道:
> On Fri, Apr 02, 2021 at 09:45:12AM +0200, Greg KH wrote:
>> Why is the buffer alignment considered a "waste" here? If that change
>> is in Linus's tree and newer kernels (it showed up in 5.4 which was
>> released quite a while ago), where are the people complaining about it
>> there?
>>
>> I think backporting 59bb47985c1d ("mm, sl[aou]b: guarantee natural
>> alignment for kmalloc(power-of-two)") seems like the correct thing to do
>> here to bring things into alignment (pun intended) with newer kernels.
>
> It's only a waste for slabs which need things like redzones (eg we could
> get 7 512-byte allocations out of a 4kB page with a 64 byte redzone
> and no alignment ; with alignment we can only get four). Since slub
> can enable/disable redzones on a per-slab basis, and redzones aren't
> terribly interesting now that we have kasan/kfence, nobody really cares.
>
> .
>

Thanks for your explain! The imfluence seems minimal since the "waste"
will happen only when we enable slub_debug.

One more question for Joe Perches. Patch v2[1] does not add the
alignment check for buf and we add it in v3[2]. I don't see the
necessity for this check... Can you help to explain that why we need this?

Thanks,
Kun.


[1].
https://lore.kernel.org/lkml/a9054fb521e65f2809671fa9c18e2...@perches.com/
[2].
https://lore.kernel.org/lkml/743a648dc817cddd2e70462...@perches.com/

yangerkun

unread,
Apr 8, 2021, 4:06:53 PM4/8/21
to sta...@vger.kernel.org, vba...@suse.cz, gre...@linuxfoundation.org, linu...@kvack.org, open-...@googlegroups.com, cle...@redhat.com, zhangyi (F), Kefeng Wang, liuyong...@huawei.com, Zhengyejian (Zetta), yang...@huawei.com, Yang Yingliang, chenz...@huawei.com
sysfs_emit(3d8e2128f26a ("sysfs: Add sysfs_emit and sysfs_emit_at to
format sysfs output")) has a hidden constraint that the buf should be
alignment with PAGE_SIZE. It's OK since 59bb47985c1d ("mm, sl[aou]b:
guarantee natural alignment for kmalloc(power-of-two)") help us to solve
scenes like CONFIG_SLUB_DEBUG or CONFIG_SLOB which will break this.


But since lots of stable branch(we reproduce it with 4.19 stable) merge
3d8e2128f26a ("sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs
output") without 59bb47985c1d ("mm, sl[aou]b: guarantee natural
alignment for kmalloc(power-of-two)"), we will get the follow warning
with command 'cat /sys/class/iscsi_transport/tcp/handle' once we enable
CONFIG_SLUB_DEBUG and start kernel with slub_debug=UFPZ!


Obviously, we can backport 59bb47985c1d ("mm, sl[aou]b: guarantee
natural alignment for kmalloc(power-of-two)") to fix it. But this will
waste some memory to ensure natural alignment which seems unbearable for
embedded device. So for stable branch like 4.19, can we just remove the
warning in sysfs_emit since the only user for it is iscsi, and seq_read
+ sysfs_kf_seq_show can ensure that the buf in sysfs_emit must be aware
of PAGE_SIZE. Or does there some other advise for this problem?


# without 59bb47985c1d + 1G ram
[root@localhost ~]# free
total used free shared buff/cache
available
Mem: 947336 169960 389732 896 387644
624216
Swap: 0 0 0

# merge with 59bb47985c1d + 1G ram
[root@localhost ~]# free
total used free shared buff/cache
available
Mem: 947340 175176 374396 896 397768
618964
Swap: 0 0 0
[root@localhost ~]#


[ 37.683332] ------------[ cut here ]------------
[ 37.692747] invalid sysfs_emit: buf:00000000f75441ab
[ 37.693914] WARNING: CPU: 1 PID: 576 at fs/sysfs/file.c:577
sysfs_emit+0xb9/0xe0
[ 37.694861] Modules linked in:
[ 37.695264] CPU: 1 PID: 576 Comm: cat Not tainted
4.19.183-00023-gdf225d326e8c #7
[ 37.696210] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31
04/01/2014
[ 37.697866] RIP: 0010:sysfs_emit+0xb9/0xe0
[ 37.698387] Code: 47 c9 c3 48 83 05 76 33 b3 04 01 48 89 fe 48 c7 c7
64 08 bb 8a 48 83 05 7c 33 b3 04 01 e8 13 7f be 00 48 83 05 77 33 b3 04
01 <0f> 0b 48 83 05 75 33 b3 04 01 48 83 05 73
[ 37.700713] RSP: 0018:ffffc90000af7cf8 EFLAGS: 00010202
[ 37.701370] RAX: 0000000000000000 RBX: ffff88803e0e4c00 RCX:
0000000000000006
[ 37.702261] RDX: 0000000000000007 RSI: 0000000000000006 RDI:
ffff888039455bf0
[ 37.703171] RBP: ffffc90000af7d48 R08: 00000000000002f8 R09:
0000000000000005
[ 37.704079] R10: 00000000000002f7 R11: ffffffff8bd9534d R12:
ffff88801a013740
[ 37.705001] R13: ffff88803db37a08 R14: ffff88803db37a30 R15:
ffff88803db37a48
[ 37.705918] FS: 00007fcb96411580(0000) GS:ffff888039440000(0000)
knlGS:0000000000000000
[ 37.706956] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 37.707692] CR2: 00007fcb88cf0000 CR3: 000000001a501000 CR4:
00000000000006e0
[ 37.708607] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[ 37.709520] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[ 37.710427] Call Trace:
[ 37.710784] show_transport_handle+0x3e/0x60
[ 37.711338] dev_attr_show+0x22/0x60
[ 37.711808] sysfs_kf_seq_show+0xc6/0x190
[ 37.712332] kernfs_seq_show+0x25/0x30
[ 37.712862] seq_read+0xe1/0x540
[ 37.713292] ? __handle_mm_fault+0xba3/0x1c70
[ 37.713866] kernfs_fop_read+0x36/0x230
[ 37.714371] __vfs_read+0x3c/0x230
[ 37.714819] ? handle_mm_fault+0x1d1/0x340
[ 37.715345] vfs_read+0xb5/0x1b0
[ 37.715774] ksys_read+0x67/0x130
[ 37.716218] __x64_sys_read+0x1e/0x30
[ 37.716701] do_syscall_64+0x95/0x3d0
[ 37.717175] ? do_async_page_fault+0x2e/0x190
[ 37.717747] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[ 37.718406] RIP: 0033:0x7fcb963363f2
[ 37.718881] Code: c0 e9 b2 fe ff ff 50 48 8d 3d 8a 41 0a 00 e8 75 f0
01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f
05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 04
[ 37.721290] RSP: 002b:00007ffea78dff18 EFLAGS: 00000246 ORIG_RAX:
0000000000000000
[ 37.722264] RAX: ffffffffffffffda RBX: 0000000000020000 RCX:
00007fcb963363f2
[ 37.723169] RDX: 0000000000020000 RSI: 00007fcb88cf1000 RDI:
0000000000000003
[ 37.724100] RBP: 00007fcb88cf1000 R08: 00007fcb88cf0010 R09:
0000000000000000
[ 37.725039] R10: 0000000000000022 R11: 0000000000000246 R12:
0000000000020f00
[ 37.725945] R13: 0000000000000003 R14: 0000000000020000 R15:
0000000000020000
[ 37.726857] ---[ end trace fbd5b85cd7d85530 ]---

Matthew Wilcox

unread,
Apr 8, 2021, 4:06:53 PM4/8/21
to Greg KH, yangerkun, sta...@vger.kernel.org, vba...@suse.cz, linu...@kvack.org, open-...@googlegroups.com, cle...@redhat.com, zhangyi (F), Kefeng Wang, liuyong...@huawei.com, Zhengyejian (Zetta), Yang Yingliang, chenz...@huawei.com
On Fri, Apr 02, 2021 at 09:45:12AM +0200, Greg KH wrote:
> Why is the buffer alignment considered a "waste" here? If that change
> is in Linus's tree and newer kernels (it showed up in 5.4 which was
> released quite a while ago), where are the people complaining about it
> there?
>
> I think backporting 59bb47985c1d ("mm, sl[aou]b: guarantee natural
> alignment for kmalloc(power-of-two)") seems like the correct thing to do
> here to bring things into alignment (pun intended) with newer kernels.

yangerkun

unread,
Apr 8, 2021, 4:06:53 PM4/8/21
to sta...@vger.kernel.org, vba...@suse.cz, gre...@linuxfoundation.org, linu...@kvack.org, open-...@googlegroups.com, cle...@redhat.com, zhangyi (F), Kefeng Wang, liuyong...@huawei.com, Zhengyejian (Zetta), Yang Yingliang, chenz...@huawei.com
Emm...

Actually, the problem exist for stable branch like 4.19 after fix for
CVE-2021-27365 which include the follow two patch:
2efc459d06f1 ("sysfs: Add sysfs_emit and sysfs_emit_at to format sysfs
output")
ec98ea7070e9 ("scsi: iscsi: Ensure sysfs attributes are limited to
PAGE_SIZE")
> .

Joe Perches

unread,
Apr 8, 2021, 4:07:18 PM4/8/21
to yangerkun, Matthew Wilcox, Greg KH, sta...@vger.kernel.org, vba...@suse.cz, linu...@kvack.org, open-...@googlegroups.com, cle...@redhat.com, zhangyi (F), Kefeng Wang, liuyong...@huawei.com, Zhengyejian (Zetta), Yang Yingliang, chenz...@huawei.com
On Wed, 2021-04-07 at 20:14 +0800, yangerkun wrote:
>
> 在 2021/4/2 22:41, Matthew Wilcox 写道:
> > On Fri, Apr 02, 2021 at 09:45:12AM +0200, Greg KH wrote:
> > > Why is the buffer alignment considered a "waste" here? If that change
> > > is in Linus's tree and newer kernels (it showed up in 5.4 which was
> > > released quite a while ago), where are the people complaining about it
> > > there?
> > >
> > > I think backporting 59bb47985c1d ("mm, sl[aou]b: guarantee natural
> > > alignment for kmalloc(power-of-two)") seems like the correct thing to do
> > > here to bring things into alignment (pun intended) with newer kernels.
> >
> > It's only a waste for slabs which need things like redzones (eg we could
> > get 7 512-byte allocations out of a 4kB page with a 64 byte redzone
> > and no alignment ; with alignment we can only get four). Since slub
> > can enable/disable redzones on a per-slab basis, and redzones aren't
> > terribly interesting now that we have kasan/kfence, nobody really cares.
> >
> > .
> >
>
> Thanks for your explain! The imfluence seems minimal since the "waste"
> will happen only when we enable slub_debug.
>
> One more question for Joe Perches. Patch v2[1] does not add the
> alignment check for buf and we add it in v3[2]. I don't see the
> necessity for this check... Can you help to explain that why we need this?

It's to make sure it's a PAGE_SIZE aligned buffer.
It's just so it would not be misused/abused in non-sysfs derived cases.

yangerkun

unread,
Apr 8, 2021, 4:07:18 PM4/8/21
to Joe Perches, Matthew Wilcox, Greg KH, sta...@vger.kernel.org, vba...@suse.cz, linu...@kvack.org, open-...@googlegroups.com, cle...@redhat.com, zhangyi (F), Kefeng Wang, liuyong...@huawei.com, Zhengyejian (Zetta), Yang Yingliang, chenz...@huawei.com
> .
>

Thanks! It help me to understand the problem better!
Reply all
Reply to author
Forward
0 new messages