[syzbot] [crypto?] general protection fault in scatterwalk_copychunks (5)


syzbot

Dec 26, 2023, 10:28:21 AM
to ak...@linux-foundation.org, chr...@kernel.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, nph...@gmail.com, syzkall...@googlegroups.com, yosry...@google.com, zhouch...@bytedance.com
Hello,

syzbot found the following issue on:

HEAD commit: 39676dfe5233 Add linux-next specific files for 20231222
git tree: linux-next
console+strace: https://syzkaller.appspot.com/x/log.txt?x=172080a1e80000
kernel config: https://syzkaller.appspot.com/x/.config?x=f3761490b734dc96
dashboard link: https://syzkaller.appspot.com/bug?extid=3eff5e51bf1db122a16e
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
syz repro: https://syzkaller.appspot.com/x/repro.syz?x=178f6e26e80000
C reproducer: https://syzkaller.appspot.com/x/repro.c?x=15c399e9e80000

Downloadable assets:
disk image: https://storage.googleapis.com/syzbot-assets/360542c2ca67/disk-39676dfe.raw.xz
vmlinux: https://storage.googleapis.com/syzbot-assets/900dfb21ca8a/vmlinux-39676dfe.xz
kernel image: https://storage.googleapis.com/syzbot-assets/c94a2a3ea0e0/bzImage-39676dfe.xz

The issue was bisected to:

commit 7bc134496bbbaacb0d4423b819da4eca850a839d
Author: Chengming Zhou <zhouch...@bytedance.com>
Date: Mon Dec 18 11:50:31 2023 +0000

mm/zswap: change dstmem size to one page

bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=15f60c36e80000
final oops: https://syzkaller.appspot.com/x/report.txt?x=17f60c36e80000
console output: https://syzkaller.appspot.com/x/log.txt?x=13f60c36e80000

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+3eff5e...@syzkaller.appspotmail.com
Fixes: 7bc134496bbb ("mm/zswap: change dstmem size to one page")

general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 5065 Comm: syz-executor140 Not tainted 6.7.0-rc6-next-20231222-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline]
RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]
RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50
Code: f0 48 c1 e8 03 80 3c 08 00 0f 85 81 01 00 00 49 8d 44 24 08 4d 89 26 48 bf 00 00 00 00 00 fc ff df 48 89 44 24 10 48 c1 e8 03 <0f> b6 04 38 84 c0 74 08 3c 03 0f 8e 47 01 00 00 48 8b 44 24 08 41
RSP: 0018:ffffc90003a8ecf0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: dffffc0000000000
RDX: ffff88802785d940 RSI: ffffffff8465df74 RDI: dffffc0000000000
RBP: 0000000000001000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000002 R11: 82d8bd1b6060f805 R12: 0000000000000000
R13: 0000000000000014 R14: ffffc90003a8ed88 R15: 0000000000001000
FS: 00005555565c5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000d5e538 CR3: 0000000079f3a000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
scatterwalk_map_and_copy+0x151/0x1d0 crypto/scatterwalk.c:67
scomp_acomp_comp_decomp+0x3a3/0x780 crypto/scompress.c:149
crypto_acomp_compress include/crypto/acompress.h:302 [inline]
zswap_store+0x98b/0x2430 mm/zswap.c:1666
swap_writepage+0x8e/0x220 mm/page_io.c:198
pageout+0x399/0x9e0 mm/vmscan.c:656
shrink_folio_list+0x2f47/0x3ea0 mm/vmscan.c:1319
reclaim_folio_list+0xe4/0x3a0 mm/vmscan.c:2104
reclaim_pages+0x483/0x6a0 mm/vmscan.c:2140
madvise_cold_or_pageout_pte_range+0x129e/0x1f70 mm/madvise.c:526
walk_pmd_range mm/pagewalk.c:143 [inline]
walk_pud_range mm/pagewalk.c:221 [inline]
walk_p4d_range mm/pagewalk.c:256 [inline]
walk_pgd_range+0xa48/0x1870 mm/pagewalk.c:293
__walk_page_range+0x630/0x770 mm/pagewalk.c:395
walk_page_range+0x626/0xa80 mm/pagewalk.c:521
madvise_pageout_page_range mm/madvise.c:585 [inline]
madvise_pageout+0x32c/0x820 mm/madvise.c:612
madvise_vma_behavior+0x1cc/0x1b50 mm/madvise.c:1031
madvise_walk_vmas+0x1cf/0x2c0 mm/madvise.c:1260
do_madvise+0x333/0x660 mm/madvise.c:1440
__do_sys_madvise mm/madvise.c:1453 [inline]
__se_sys_madvise mm/madvise.c:1451 [inline]
__x64_sys_madvise+0xa9/0x110 mm/madvise.c:1451
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x62/0x6a
RIP: 0033:0x7f15a5e14b69
Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffde7b4a5c8 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f15a5e14b69
RDX: 0000000000000015 RSI: 0000000000c00304 RDI: 0000000020000000
RBP: 0000000000000000 R08: 00005555565c6610 R09: 00005555565c6610
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000001
R13: 00007ffde7b4a808 R14: 0000000000000001 R15: 0000000000000001
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline]
RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]
RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50
Code: f0 48 c1 e8 03 80 3c 08 00 0f 85 81 01 00 00 49 8d 44 24 08 4d 89 26 48 bf 00 00 00 00 00 fc ff df 48 89 44 24 10 48 c1 e8 03 <0f> b6 04 38 84 c0 74 08 3c 03 0f 8e 47 01 00 00 48 8b 44 24 08 41
RSP: 0018:ffffc90003a8ecf0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: dffffc0000000000
RDX: ffff88802785d940 RSI: ffffffff8465df74 RDI: dffffc0000000000
RBP: 0000000000001000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000002 R11: 82d8bd1b6060f805 R12: 0000000000000000
R13: 0000000000000014 R14: ffffc90003a8ed88 R15: 0000000000001000
FS: 00005555565c5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000d5e538 CR3: 0000000079f3a000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
0: f0 48 c1 e8 03 lock shr $0x3,%rax
5: 80 3c 08 00 cmpb $0x0,(%rax,%rcx,1)
9: 0f 85 81 01 00 00 jne 0x190
f: 49 8d 44 24 08 lea 0x8(%r12),%rax
14: 4d 89 26 mov %r12,(%r14)
17: 48 bf 00 00 00 00 00 movabs $0xdffffc0000000000,%rdi
1e: fc ff df
21: 48 89 44 24 10 mov %rax,0x10(%rsp)
26: 48 c1 e8 03 shr $0x3,%rax
* 2a: 0f b6 04 38 movzbl (%rax,%rdi,1),%eax <-- trapping instruction
2e: 84 c0 test %al,%al
30: 74 08 je 0x3a
32: 3c 03 cmp $0x3,%al
34: 0f 8e 47 01 00 00 jle 0x181
3a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
3f: 41 rex.B


---
This report is generated by a bot. It may contain errors.
See https://goo.gl/tpsmEJ for more information about syzbot.
syzbot engineers can be reached at syzk...@googlegroups.com.

syzbot will keep track of this issue. See:
https://goo.gl/tpsmEJ#status for how to communicate with syzbot.
For information about bisection process see: https://goo.gl/tpsmEJ#bisection

If the report is already addressed, let syzbot know by replying with:
#syz fix: exact-commit-title

If you want syzbot to run the reproducer, reply with:
#syz test: git://repo/address.git branch-or-commit-hash
If you attach or paste a git patch, syzbot will apply it before testing.

If you want to overwrite report's subsystems, reply with:
#syz set subsystems: new-subsystem
(See the list of subsystem names on the web dashboard)

If the report is a duplicate of another one, reply with:
#syz dup: exact-subject-of-another-report

If you want to undo deduplication, reply with:
#syz undup

Nhat Pham

Dec 26, 2023, 4:08:23 PM
to syzbot, ak...@linux-foundation.org, chr...@kernel.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com, zhouch...@bytedance.com
Looks like it's this line:

scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen, 1);

My guess is dlen here exceeds 1 page - maybe the data is
incompressible, so the output exceeds the original input? Based on the
included kernel config, the algorithm used here is lzo, and it seems
that can happen for this particular compressor:

https://github.com/nemequ/lzo/blob/master/doc/LZO.FAQ#L200C1-L203C1
https://github.com/nemequ/lzo/blob/master/doc/LZO.TXT#L157C2-L157C62

Not 100% sure about linux kernel's implementation though. I'm no
compression expert :)

If this is the case though, perhaps we can't reduce the output buffer
size to 1 page after all, unless we contractually obligate the output
size to be <= input size (i.e., a new function in the compression API).

Nhat Pham

Dec 26, 2023, 6:11:26 PM
to Chris Li, syzbot, ak...@linux-foundation.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com, zhouch...@bytedance.com
On Tue, Dec 26, 2023 at 2:32 PM Chris Li <chr...@kernel.org> wrote:
>
> Hi Nhat,
>
> Thanks for the first stab on this bug.
> > > RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline] <========= This is the offending line.
> > > RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]
> > > RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
> > > RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50
>
> This is the call stack with inline functions. Very nice annotations
> of the inline function expansions in the call stack.
>
> > > Code: f0 48 c1 e8 03 80 3c 08 00 0f 85 81 01 00 00 49 8d 44 24 08 4d 89 26 48 bf 00 00 00 00 00 fc ff df 48 89 44 24 10 48 c1 e8 03 <0f> b6 04 38 84 c0 74 08 3c 03 0f 8e 47 01 00 00 48 8b 44 24 08 41
> > > RSP: 0018:ffffc90003a8ecf0 EFLAGS: 00010202
> > > RAX: 0000000000000001 RBX: 0000000000000000 RCX: dffffc0000000000
> > > RDX: ffff88802785d940 RSI: ffffffff8465df74 RDI: dffffc0000000000
> > > RBP: 0000000000001000 R08: 0000000000000005 R09: 0000000000000000
> > > R10: 0000000000000002 R11: 82d8bd1b6060f805 R12: 0000000000000000
> > > R13: 0000000000000014 R14: ffffc90003a8ed88 R15: 0000000000001000
> > > FS: 00005555565c5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> > > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > CR2: 0000000000d5e538 CR3: 0000000079f3a000 CR4: 00000000003506f0
> > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > Call Trace:
> > > <TASK>
> > > scatterwalk_map_and_copy+0x151/0x1d0 crypto/scatterwalk.c:67
> > > scomp_acomp_comp_decomp+0x3a3/0x780 crypto/scompress.c:149
> >
> > Looks like it's this line:
> >
> > scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,1);
>
> Yes indeed.
>
> The offending instruction is actually this one:
>
> The inlined part of the call stack:
> RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline]
> <========= This is the offending line.
> static inline void scatterwalk_start(struct scatter_walk *walk,
> struct scatterlist *sg)
> {
> walk->sg = sg;
> walk->offset = sg->offset; <============== RIP pointer
> }
>
> RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]
>
> static inline void scatterwalk_pagedone(struct scatter_walk *walk, int out,
> unsigned int more)
> {
> if (out) {
> struct page *page;
>
> page = sg_page(walk->sg) + ((walk->offset - 1) >> PAGE_SHIFT);
> flush_dcache_page(page);
> }
>
> if (more && walk->offset >= walk->sg->offset + walk->sg->length)
> scatterwalk_start(walk, sg_next(walk->sg)); <==========================
> }
>
> RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
> RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50
>
> void scatterwalk_copychunks(void *buf, struct scatter_walk *walk,
> size_t nbytes, int out)
> {
> for (;;) {
> unsigned int len_this_page = scatterwalk_pagelen(walk);
> u8 *vaddr;
>
> if (len_this_page > nbytes)
> len_this_page = nbytes;
>
> if (out != 2) {
> vaddr = scatterwalk_map(walk);
> memcpy_dir(buf, vaddr, len_this_page, out);
> scatterwalk_unmap(vaddr);
> }
>
> scatterwalk_advance(walk, len_this_page);
>
> if (nbytes == len_this_page)
> break;
>
> buf += len_this_page;
> nbytes -= len_this_page;
>
> scatterwalk_pagedone(walk, out & 1, 1); <=========================
> }
> }
>
> then the non-inlined portion of the call stack:
> scatterwalk_map_and_copy+0x151/0x1d0 crypto/scatterwalk.c:67
>
> void scatterwalk_map_and_copy(void *buf, struct scatterlist *sg,
> unsigned int start, unsigned int nbytes, int out)
> {
> struct scatter_walk walk;
> struct scatterlist tmp[2];
>
> if (!nbytes)
> return;
>
> sg = scatterwalk_ffwd(tmp, sg, start);
>
> scatterwalk_start(&walk, sg);
> scatterwalk_copychunks(buf, &walk, nbytes, out); <===========================
> scatterwalk_done(&walk, out, 0);
> }
>
> scomp_acomp_comp_decomp+0x3a3/0x780 crypto/scompress.c:149
> if (dir)
> ret = crypto_scomp_compress(scomp, scratch->src, req->slen,
> scratch->dst, &req->dlen, *ctx);
> else
> ret = crypto_scomp_decompress(scomp, scratch->src, req->slen,
> scratch->dst, &req->dlen, *ctx);
> if (!ret) {
> if (!req->dst) {
> req->dst = sgl_alloc(req->dlen, GFP_ATOMIC, NULL);
> if (!req->dst) {
> ret = -ENOMEM;
> goto out;
> }
> }
> scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
> <=======================
> 1);
> }
>
> crypto_acomp_compress include/crypto/acompress.h:302 [inline]
> zswap_store+0x98b/0x2430 mm/zswap.c:1666
> swap_writepage+0x8e/0x220 mm/page_io.c:198
> pageout+0x399/0x9e0 mm/vmscan.c:656
> shrink_folio_list+0x2f47/0x3ea0 mm/vmscan.c:1319
> reclaim_folio_list+0xe4/0x3a0 mm/vmscan.c:2104
> reclaim_pages+0x483/0x6a0 mm/vmscan.c:2140
> madvise_cold_or_pageout_pte_range+0x129e/0x1f70 mm/madvise.c:526
> walk_pmd_range mm/pagewalk.c:143 [inline]
> >
> > My guess is dlen here exceeds 1 page - maybe the data is
> > incompressible, so the output exceeds the original input? Based on the
> > included kernel config, the algorithm used here is lzo, and it seems
> > that can happen for this particular compressor:
> >
> > https://github.com/nemequ/lzo/blob/master/doc/LZO.FAQ#L200C1-L203C1
> > https://github.com/nemequ/lzo/blob/master/doc/LZO.TXT#L157C2-L157C62
>
> The decompressed output can be bigger than the input. However, here we
> specify the output size limit to be one page. The decompressor should
> not output more than the one-page dst buffer.
>
> I did check the lzo1x_decompress_safe() function.

I think you meant lzo1x_1_do_compress(), right? This error happens on
the zswap_store() path, so it is a compression bug.

> It is supposed to use the NEED_OP(x) check before it stores the output buffer.

I can't seem to find any such check in the compression code. But that
function is 300 LoC, with no comments :)

> However I do find one place that seems to be missing that check; at
> least it is not obvious to me how that check is effective. I will
> paste it here and let other experts take a look as well:
> Line 228:
>
> if (op - m_pos >= 8) {
> unsigned char *oe = op + t;
> if (likely(HAVE_OP(t + 15))) {
> do {
> COPY8(op, m_pos);
> op += 8;
> m_pos += 8;
> COPY8(op, m_pos);
> op += 8;
> m_pos += 8;
> } while (op < oe);
> op = oe;
> if (HAVE_IP(6)) {
> state = next;
> COPY4(op, ip); <========================= This COPY4 does not have an
> obvious NEED_OP() check. As far as I can tell, this output is not
> covered by the above HAVE_OP(t + 15) check, which seems meant to protect
> the unaligned two 8-byte stores inside the "do while" loop.
> op += next;
> ip += next;
> continue;
> }
> } else {
> NEED_OP(t);
> do {
> *op++ = *m_pos++;
> } while (op < oe);
> }
> } else
>
>
> >
> > Not 100% sure about linux kernel's implementation though. I'm no
> > compression expert :)
>
> Me neither. Anyway, if it is a decompression issue, it would be good
> for this bug to have some debug assert checking that, after
> decompression, *dlen is not bigger than the original output buffer
> length. If it does blow past the decompression buffer, it is a bug
> that needs to be fixed on the decompression function side, not in the
> zswap code.

But regardless, I agree. We should enforce the condition that the
output must not exceed the given buffer's size, and gracefully fail
if it does (i.e., return an interpretable error code as opposed to just
walking off the cliff like this).

I imagine that many compression use cases are best-effort
optimization, i.e., it's fine to fail. zswap is exactly this: do your
best to compress the data, but if it's too incompressible, failure is
acceptable (we have plenty of fallback options: swapping, keeping the
page in memory, etc.).

Furthermore, this is a bit of a leaky abstraction: currently there's
no contract on how much bigger the "compressed" output can be with
respect to the input. It's compression-algorithm-dependent, which
defeats the point of the abstraction. In zswap's case, 2 * PAGE_SIZE is
just a rule of thumb. Now imagine we use a compression algorithm that
behaves very well in most cases, but outputs 3 * PAGE_SIZE in some edge
cases; that would still break the code. Better to give the caller the
ability to bound the output size, and gracefully fail if that bound
cannot be respected for a particular input.

>
> Chris

Nhat Pham

Dec 26, 2023, 7:24:05 PM
to Chris Li, syzbot, ak...@linux-foundation.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com, zhouch...@bytedance.com
On Tue, Dec 26, 2023 at 3:30 PM Chris Li <chr...@kernel.org> wrote:
>
> Again, sorry, I was looking at the decompression side rather than the
> compression side. The compression side does not even offer a safe
> version of the compression function.
> That seems dangerous. It seems for now we should make zswap roll back
> to the 2-page buffer until we have a safe way to do compression
> without overwriting the output buffers.

Unfortunately, I think this is the way, at least until we rework the
crypto/compression API (if that's even possible?).
I still think the 2-page buffer is dumb, but it is what it is :(

Edward Adam Davis

Dec 26, 2023, 9:27:30 PM
to syzbot+3eff5e...@syzkaller.appspotmail.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
please test WARNING in perf_event_open

#syz test https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git 39676dfe5233

diff --git a/mm/madvise.c b/mm/madvise.c
index 912155a94ed5..8fd3e00af243 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -1405,6 +1405,9 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
if (!madvise_behavior_valid(behavior))
return -EINVAL;

+ if (!start)
+ return -EINVAL;
+
if (!PAGE_ALIGNED(start))
return -EINVAL;
len = PAGE_ALIGN(len_in);

syzbot

Dec 26, 2023, 9:42:04 PM
to ead...@qq.com, linux-...@vger.kernel.org, syzkall...@googlegroups.com
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an issue:
general protection fault in scatterwalk_copychunks

general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN
KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f]
CPU: 0 PID: 5477 Comm: syz-executor.0 Not tainted 6.7.0-rc6-next-20231222-syzkaller-dirty #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 11/17/2023
RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline]
RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]
RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50
Code: f0 48 c1 e8 03 80 3c 08 00 0f 85 81 01 00 00 49 8d 44 24 08 4d 89 26 48 bf 00 00 00 00 00 fc ff df 48 89 44 24 10 48 c1 e8 03 <0f> b6 04 38 84 c0 74 08 3c 03 0f 8e 47 01 00 00 48 8b 44 24 08 41
RSP: 0018:ffffc9000557ecf0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: dffffc0000000000
RDX: ffff88807fc6d940 RSI: ffffffff8465df94 RDI: dffffc0000000000
RBP: 0000000000001000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000002 R11: 82d8bd1b6060f805 R12: 0000000000000000
R13: 0000000000000014 R14: ffffc9000557ed88 R15: 0000000000001000
FS: 00007ff79da616c0(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff79cd98000 CR3: 0000000021251000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Call Trace:
<TASK>
scatterwalk_map_and_copy+0x151/0x1d0 crypto/scatterwalk.c:67
scomp_acomp_comp_decomp+0x3a3/0x780 crypto/scompress.c:149
crypto_acomp_compress include/crypto/acompress.h:302 [inline]
zswap_store+0x98b/0x2430 mm/zswap.c:1666
swap_writepage+0x8e/0x220 mm/page_io.c:198
pageout+0x399/0x9e0 mm/vmscan.c:656
shrink_folio_list+0x2f47/0x3ea0 mm/vmscan.c:1319
reclaim_folio_list+0xe4/0x3a0 mm/vmscan.c:2104
reclaim_pages+0x483/0x6a0 mm/vmscan.c:2140
madvise_cold_or_pageout_pte_range+0x129e/0x1f70 mm/madvise.c:526
walk_pmd_range mm/pagewalk.c:143 [inline]
walk_pud_range mm/pagewalk.c:221 [inline]
walk_p4d_range mm/pagewalk.c:256 [inline]
walk_pgd_range+0xa48/0x1870 mm/pagewalk.c:293
__walk_page_range+0x630/0x770 mm/pagewalk.c:395
walk_page_range+0x626/0xa80 mm/pagewalk.c:521
madvise_pageout_page_range mm/madvise.c:585 [inline]
madvise_pageout+0x32c/0x820 mm/madvise.c:612
madvise_vma_behavior+0x1cc/0x1b50 mm/madvise.c:1031
madvise_walk_vmas+0x1cf/0x2c0 mm/madvise.c:1260
do_madvise+0x349/0x670 mm/madvise.c:1443
__do_sys_madvise mm/madvise.c:1456 [inline]
__se_sys_madvise mm/madvise.c:1454 [inline]
__x64_sys_madvise+0xa9/0x110 mm/madvise.c:1454
do_syscall_x64 arch/x86/entry/common.c:52 [inline]
do_syscall_64+0x40/0x110 arch/x86/entry/common.c:83
entry_SYSCALL_64_after_hwframe+0x62/0x6a
RIP: 0033:0x7ff79cc7cce9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 e1 20 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b0 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ff79da610c8 EFLAGS: 00000246 ORIG_RAX: 000000000000001c
RAX: ffffffffffffffda RBX: 00007ff79cd9bf80 RCX: 00007ff79cc7cce9
RDX: 0000000000000015 RSI: 0000000000c00304 RDI: 0000000020000000
RBP: 00007ff79ccc947a R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000000000b R14: 00007ff79cd9bf80 R15: 00007ffc0bca2f58
</TASK>
Modules linked in:
---[ end trace 0000000000000000 ]---
RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline]
RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]
RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50
Code: f0 48 c1 e8 03 80 3c 08 00 0f 85 81 01 00 00 49 8d 44 24 08 4d 89 26 48 bf 00 00 00 00 00 fc ff df 48 89 44 24 10 48 c1 e8 03 <0f> b6 04 38 84 c0 74 08 3c 03 0f 8e 47 01 00 00 48 8b 44 24 08 41
RSP: 0018:ffffc9000557ecf0 EFLAGS: 00010202
RAX: 0000000000000001 RBX: 0000000000000000 RCX: dffffc0000000000
RDX: ffff88807fc6d940 RSI: ffffffff8465df94 RDI: dffffc0000000000
RBP: 0000000000001000 R08: 0000000000000005 R09: 0000000000000000
R10: 0000000000000002 R11: 82d8bd1b6060f805 R12: 0000000000000000
R13: 0000000000000014 R14: ffffc9000557ed88 R15: 0000000000001000
FS: 00007ff79da616c0(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007ff79cd98000 CR3: 0000000021251000 CR4: 00000000003506f0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
----------------
Code disassembly (best guess):
0: f0 48 c1 e8 03 lock shr $0x3,%rax
5: 80 3c 08 00 cmpb $0x0,(%rax,%rcx,1)
9: 0f 85 81 01 00 00 jne 0x190
f: 49 8d 44 24 08 lea 0x8(%r12),%rax
14: 4d 89 26 mov %r12,(%r14)
17: 48 bf 00 00 00 00 00 movabs $0xdffffc0000000000,%rdi
1e: fc ff df
21: 48 89 44 24 10 mov %rax,0x10(%rsp)
26: 48 c1 e8 03 shr $0x3,%rax
* 2a: 0f b6 04 38 movzbl (%rax,%rdi,1),%eax <-- trapping instruction
2e: 84 c0 test %al,%al
30: 74 08 je 0x3a
32: 3c 03 cmp $0x3,%al
34: 0f 8e 47 01 00 00 jle 0x181
3a: 48 8b 44 24 08 mov 0x8(%rsp),%rax
3f: 41 rex.B


Tested on:

commit: 39676dfe Add linux-next specific files for 20231222
git tree: https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
console output: https://syzkaller.appspot.com/x/log.txt?x=11c37595e80000
kernel config: https://syzkaller.appspot.com/x/.config?x=f3761490b734dc96
dashboard link: https://syzkaller.appspot.com/bug?extid=3eff5e51bf1db122a16e
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=12fb5776e80000

Chengming Zhou

Dec 26, 2023, 10:51:39 PM
to Nhat Pham, Chris Li, 21c...@gmail.com, syzbot, ak...@linux-foundation.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com
Hi,

I think it's a bug in `scomp_acomp_comp_decomp()`, which doesn't use
the caller-passed "src" and "dst" scatterlists directly. Instead, it uses
its own per-CPU "scomp_scratch", which has 128KB src and dst buffers.

When compression is done, it uses the output req->dlen to copy scomp_scratch->dst
to our dstmem, which now has only one page, so this problem happened.

I still don't know why alg->compress(src, slen, dst, &dlen) doesn't
check dlen. It seems an obvious bug, right?

As for this problem in `scomp_acomp_comp_decomp()`, this patch below
should fix it. I will set up a few tests to check later.

Thanks!

diff --git a/crypto/scompress.c b/crypto/scompress.c
index 442a82c9de7d..e654a120ae5a 100644
--- a/crypto/scompress.c
+++ b/crypto/scompress.c
@@ -117,6 +117,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
struct crypto_scomp *scomp = *tfm_ctx;
void **ctx = acomp_request_ctx(req);
struct scomp_scratch *scratch;
+ unsigned int dlen;
int ret;

if (!req->src || !req->slen || req->slen > SCOMP_SCRATCH_SIZE)
@@ -128,6 +129,8 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
if (!req->dlen || req->dlen > SCOMP_SCRATCH_SIZE)
req->dlen = SCOMP_SCRATCH_SIZE;

+ dlen = req->dlen;
+
scratch = raw_cpu_ptr(&scomp_scratch);
spin_lock(&scratch->lock);

@@ -145,6 +148,9 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
ret = -ENOMEM;
goto out;
}
+ } else if (req->dlen > dlen) {
+ ret = -ENOMEM;
+ goto out;
}
scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
1);

Chengming Zhou

Dec 27, 2023, 1:38:32 AM
to Barry Song, Nhat Pham, Chris Li, syzbot, ak...@linux-foundation.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com
On 2023/12/27 14:25, Barry Song wrote:
> This can't fix the problem as crypto_scomp_compress() has written overflow data.

No, crypto_scomp_compress() writes to its own scomp_scratch->dst memory, then copies
to our dstmem.

>
> BTW, in many cases, hardware accelerator drivers in drivers/crypto can
> do compression and decompression by off-loading the CPU; we won't have
> a chance to let the hardware check the dst buffer size. So giving the
> hardware's DMA engine a dst buffer with enough length is the right way.
> I mean, we shouldn't change dst from 2 pages to 1 page.

But how do we know 2 pages is enough for any compression algorithm?

Thanks.

>
>> scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
>> 1);
>
>
> Thanks
> Barry

chengmi...@linux.dev

Dec 27, 2023, 1:51:38 AM
to ak...@linux-foundation.org, chr...@kernel.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, nph...@gmail.com, syzkall...@googlegroups.com, yosry...@google.com, 21c...@gmail.com, zhouch...@bytedance.com, syzbot+3eff5e...@syzkaller.appspotmail.com
From: Chengming Zhou <zhouch...@bytedance.com>

The req->dst buffer size should be checked before copying from the
scomp_scratch->dst to avoid req->dst buffer overflow problem.

Fixes: 1ab53a77b772 ("crypto: acomp - add driver-side scomp interface")
Reported-by: syzbot+3eff5e...@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000000b...@google.com/
Signed-off-by: Chengming Zhou <zhouch...@bytedance.com>
---
crypto/scompress.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/crypto/scompress.c b/crypto/scompress.c
index 442a82c9de7d..e654a120ae5a 100644
--- a/crypto/scompress.c
+++ b/crypto/scompress.c
@@ -117,6 +117,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
struct crypto_scomp *scomp = *tfm_ctx;
void **ctx = acomp_request_ctx(req);
struct scomp_scratch *scratch;
+ unsigned int dlen;
int ret;

if (!req->src || !req->slen || req->slen > SCOMP_SCRATCH_SIZE)
@@ -128,6 +129,8 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
if (!req->dlen || req->dlen > SCOMP_SCRATCH_SIZE)
req->dlen = SCOMP_SCRATCH_SIZE;

+ dlen = req->dlen;
+
scratch = raw_cpu_ptr(&scomp_scratch);
spin_lock(&scratch->lock);

@@ -145,6 +148,9 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
ret = -ENOMEM;
goto out;
}
+ } else if (req->dlen > dlen) {
+ ret = -ENOMEM;
+ goto out;
}
scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
1);
--
2.40.1

Chengming Zhou

Dec 27, 2023, 4:16:01 AM
to Barry Song, Nhat Pham, Chris Li, syzbot, ak...@linux-foundation.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com
On 2023/12/27 15:01, Barry Song wrote:
> On Wed, Dec 27, 2023 at 7:38 PM Chengming Zhou
> There is no guarantee 2 pages is enough.
>
> We could invent our own compression algorithm that adds a lot of
> metadata, for example more than 4KB when the source data is
> incompressible; then dst could be larger than 2 * PAGE_SIZE. But this
> is not the case :-) That kind of algorithm may never succeed.
>
> For those in-tree algorithms, we have a WORST_COMPR_FACTOR. in
> ubifs, zram and zswap, we all use "2" as the worst comp factor.

Thanks for your explanation! Maybe it's best for us to return to 2 pages
if there are no comments from other people. And this really needs more
documentation :-) since there is no comment or check at all in the acomp
compress interface.

/*
* @src: Source Data
* @dst: Destination data
* @slen: Size of the input buffer
* @dlen: Size of the output buffer and number of bytes produced
* @flags: Internal flags
* @__ctx: Start of private context data
*/
struct acomp_req {
struct crypto_async_request base;
struct scatterlist *src;
struct scatterlist *dst;
unsigned int slen;
unsigned int dlen;
u32 flags;
void *__ctx[] CRYPTO_MINALIGN_ATTR;
};

>
> /*
> * Some compressors, like LZO, may end up with more data then the input buffer.
> * So UBIFS always allocates larger output buffer, to be sure the compressor
> * will not corrupt memory in case of worst case compression.
> */
> #define WORST_COMPR_FACTOR 2
>
> #ifdef CONFIG_FS_ENCRYPTION
> #define UBIFS_CIPHER_BLOCK_SIZE FSCRYPT_CONTENTS_ALIGNMENT
> #else
> #define UBIFS_CIPHER_BLOCK_SIZE 0
> #endif
>
> /*
> * How much memory is needed for a buffer where we compress a data node.
> */
> #define COMPRESSED_DATA_NODE_BUF_SZ \
> (UBIFS_DATA_NODE_SZ + UBIFS_BLOCK_SIZE * WORST_COMPR_FACTOR)
>
>
> For years, we have never seen 2 pages be a problem, but 1 page is
> definitely not enough; I remember I once saw many cases where
> accelerators' dmaengines can write more than 1 page.

Chris Li

Dec 27, 2023, 4:16:14 AM
to Nhat Pham, syzbot, ak...@linux-foundation.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com, zhouch...@bytedance.com
Hi Nhat,

Thanks for the first stab on this bug.

On Tue, Dec 26, 2023 at 1:08 PM Nhat Pham <nph...@gmail.com> wrote:
>
> > RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline] <========= This is the offending line.
> > RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]
> > RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
> > RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50

This is the call stack with inline functions. Very nice annotations
of the inline function expansions in the call stack.

> > Code: f0 48 c1 e8 03 80 3c 08 00 0f 85 81 01 00 00 49 8d 44 24 08 4d 89 26 48 bf 00 00 00 00 00 fc ff df 48 89 44 24 10 48 c1 e8 03 <0f> b6 04 38 84 c0 74 08 3c 03 0f 8e 47 01 00 00 48 8b 44 24 08 41
> > RSP: 0018:ffffc90003a8ecf0 EFLAGS: 00010202
> > RAX: 0000000000000001 RBX: 0000000000000000 RCX: dffffc0000000000
> > RDX: ffff88802785d940 RSI: ffffffff8465df74 RDI: dffffc0000000000
> > RBP: 0000000000001000 R08: 0000000000000005 R09: 0000000000000000
> > R10: 0000000000000002 R11: 82d8bd1b6060f805 R12: 0000000000000000
> > R13: 0000000000000014 R14: ffffc90003a8ed88 R15: 0000000000001000
> > FS: 00005555565c5380(0000) GS:ffff8880b9800000(0000) knlGS:0000000000000000
> > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > CR2: 0000000000d5e538 CR3: 0000000079f3a000 CR4: 00000000003506f0
> > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > Call Trace:
> > <TASK>
> > scatterwalk_map_and_copy+0x151/0x1d0 crypto/scatterwalk.c:67
> > scomp_acomp_comp_decomp+0x3a3/0x780 crypto/scompress.c:149
>
> Looks like it's this line:
>
> scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,1);

Yes indeed.

The offending instruction is actually this one:

The inlined part of the call stack:
RIP: 0010:scatterwalk_start include/crypto/scatterwalk.h:63 [inline]
<========= This is the offending line.
static inline void scatterwalk_start(struct scatter_walk *walk,
				     struct scatterlist *sg)
{
	walk->sg = sg;
	walk->offset = sg->offset;   <============== RIP pointer
}

RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:83 [inline]

static inline void scatterwalk_pagedone(struct scatter_walk *walk, int out,
					unsigned int more)
{
	if (out) {
		struct page *page;

		page = sg_page(walk->sg) + ((walk->offset - 1) >> PAGE_SHIFT);
		flush_dcache_page(page);
	}

	if (more && walk->offset >= walk->sg->offset + walk->sg->length)
		scatterwalk_start(walk, sg_next(walk->sg)); <==========================
}

RIP: 0010:scatterwalk_pagedone include/crypto/scatterwalk.h:72 [inline]
RIP: 0010:scatterwalk_copychunks+0x3e0/0x560 crypto/scatterwalk.c:50

				ret = -ENOMEM;
				goto out;
			}
		}
		scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,   <=======================
					 1);
	}

crypto_acomp_compress include/crypto/acompress.h:302 [inline]
zswap_store+0x98b/0x2430 mm/zswap.c:1666
swap_writepage+0x8e/0x220 mm/page_io.c:198
pageout+0x399/0x9e0 mm/vmscan.c:656
shrink_folio_list+0x2f47/0x3ea0 mm/vmscan.c:1319
reclaim_folio_list+0xe4/0x3a0 mm/vmscan.c:2104
reclaim_pages+0x483/0x6a0 mm/vmscan.c:2140
madvise_cold_or_pageout_pte_range+0x129e/0x1f70 mm/madvise.c:526
walk_pmd_range mm/pagewalk.c:143 [inline]
>
> My guess is dlen here exceeds 1 page - maybe the data is
> incompressible, so the output exceeds the original input? Based on the
> included kernel config, the algorithm used here is lzo, and it seems
> that can happen for this particular compressor:
>
> https://github.com/nemequ/lzo/blob/master/doc/LZO.FAQ#L200C1-L203C1
> https://github.com/nemequ/lzo/blob/master/doc/LZO.TXT#L157C2-L157C62

The decompressed output can be bigger than input. However, here we
specify the output size limit to be one page. The decompressor should
not output more than the 1-page dst buffer.

I did check the lzo1x_decompress_safe() function.
It is supposed to use the NEED_OP(x) check before it stores the output buffer.
> Not 100% sure about linux kernel's implementation though. I'm no
> compression expert :)

Me neither. Anyway, if it is a decompression issue, then for this bug
it would be good to have a debug assert that checks, after
decompression, that *dlen is not bigger than the original output
length. If it does blow past the decompression buffer, it is a bug
that needs to be fixed on the decompression function side, not in the
zswap code.

Chris

Barry Song

Dec 27, 2023, 4:16:36 AM
to Chengming Zhou, Nhat Pham, Chris Li, syzbot, ak...@linux-foundation.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com
On Wed, Dec 27, 2023 at 7:38 PM Chengming Zhou
There is no guarantee that 2 pages is enough.

We could invent our own compression algorithm that adds a lot of
metadata, for example ending up longer than 4KB when the source data
is incompressible; then dst could be larger than 2 * PAGE_SIZE. But this
is not the case :-) Such an algorithm may never succeed anyway.

For the in-tree algorithms, we have WORST_COMPR_FACTOR; in
ubifs, zram and zswap, we all use "2" as the worst compression factor.

/*
* Some compressors, like LZO, may end up with more data then the input buffer.
* So UBIFS always allocates larger output buffer, to be sure the compressor
* will not corrupt memory in case of worst case compression.
*/
#define WORST_COMPR_FACTOR 2

#ifdef CONFIG_FS_ENCRYPTION
#define UBIFS_CIPHER_BLOCK_SIZE FSCRYPT_CONTENTS_ALIGNMENT
#else
#define UBIFS_CIPHER_BLOCK_SIZE 0
#endif

/*
* How much memory is needed for a buffer where we compress a data node.
*/
#define COMPRESSED_DATA_NODE_BUF_SZ \
(UBIFS_DATA_NODE_SZ + UBIFS_BLOCK_SIZE * WORST_COMPR_FACTOR)


For years, we have never seen 2 pages be a problem, but 1 page is
definitely not enough; I remember I once saw many cases where
accelerators' dmaengines can write more than 1 page.

Barry Song

Dec 27, 2023, 4:16:36 AM
to Chengming Zhou, Nhat Pham, Chris Li, syzbot, ak...@linux-foundation.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com
On Wed, Dec 27, 2023 at 4:51 PM Chengming Zhou
<zhouch...@bytedance.com> wrote:
>
This can't fix the problem, as crypto_scomp_compress() has already
written the overflowing data.

BTW, in many cases hardware accelerators (drivers/crypto) can do
compression and decompression by off-loading the CPU; we won't have a
chance to let the hardware check the dst buffer size. So giving a dst
buffer of sufficient length to the hardware's DMA engine is the right
way. I mean, we shouldn't change dst from 2 pages to 1 page.

> scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
> 1);


Thanks
Barry

Chris Li

Dec 27, 2023, 4:16:36 AM
to Nhat Pham, syzbot, ak...@linux-foundation.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com, zhouch...@bytedance.com
Hi Nhat,

On Tue, Dec 26, 2023 at 3:11 PM Nhat Pham <nph...@gmail.com> wrote:

> > The decompressed output can be bigger than input. However, here we
> > specify the output size limit to be one page. The decompressor should
> > not output more than the 1-page dst buffer.
> >
> > I did check the lzo1x_decompress_safe() function.
>
> I think you meant lzo1x_1_do_compress() right? This error happens on
> the zswap_store() path, so it is a compression bug.

Ah, my bad. I don't know why I was looking at the decompression side
rather than the compression side.
Thanks for catching that.

>
> > It is supposed to use the NEED_OP(x) check before it stores the output buffer.
>
> I can't seem to find any check in compression code. But that function
> is 300 LoC, with no comment :)

It seems the compression side does not have a safe version of the
function which respects the output buffer size. I agree with you there
seems to be no check on the output buffer size before writing to the
output buffer. The "size_t *out_len" seems to work as an output-only
pointer. It does not respect the output size limit.

The only place that uses out_len is lzo1x_compress.c:298:

	*out_len = op - out;

So that confirms it is output-only :-(
Again, sorry I was looking at the decompression side rather than the
compression side. The compression side does not even offer a safe
version of the compression function. That seems dangerous. It seems
for now we should make zswap roll back to the 2-page buffer until we
have a safe way to do compression without overrunning the output
buffers.

>
> I imagine that many compression use-cases are best-effort
> optimization, i.e it's fine to fail. zswap is exactly this - do your
> best to compress the data, but if it's too incompressible, failure is
> acceptable (we have plenty of fallback options - swapping, keeping the
> page in memory, etc.).
>
> Furthermore, this is a bit of a leaky abstraction - currently there's
> no contract on how much bigger can the "compressed" output be with
> respect to the input. It's compression algorithm-dependent, which
> defeats the point of the abstraction. In zswap case, 2 * PAGE_SIZE is
> just a rule of thumb - now imagine we use a compression algorithm that
> behaves super well in most of the cases, but would output 3 *
> PAGE_SIZE in some edge cases. This will still break the code. Better
> to give the caller the ability to bound the output size, and
> gracefully fail if that bound cannot be respected for a particular
> input.

Agree.

Chris

Barry Song

Dec 27, 2023, 4:16:36 AM
to Chengming Zhou, Nhat Pham, Chris Li, syzbot, ak...@linux-foundation.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com
On Wed, Dec 27, 2023 at 7:38 PM Chengming Zhou
Thanks, I got your point, as you were using scomp. I used to depend
on acomp, so that wasn't the case for me.

Herbert Xu

Dec 27, 2023, 4:26:20 AM
to chengmi...@linux.dev, ak...@linux-foundation.org, chr...@kernel.org, da...@davemloft.net, linux-...@vger.kernel.org, linux-...@vger.kernel.org, nph...@gmail.com, syzkall...@googlegroups.com, yosry...@google.com, 21c...@gmail.com, zhouch...@bytedance.com, syzbot+3eff5e...@syzkaller.appspotmail.com
I think ENOMEM is ambiguous, perhaps ENOSPC?

Thanks,
--
Email: Herbert Xu <her...@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

Chengming Zhou

Dec 27, 2023, 4:28:44 AM
to Herbert Xu, ak...@linux-foundation.org, chr...@kernel.org, da...@davemloft.net, linux-...@vger.kernel.org, linux-...@vger.kernel.org, nph...@gmail.com, syzkall...@googlegroups.com, yosry...@google.com, 21c...@gmail.com, zhouch...@bytedance.com, syzbot+3eff5e...@syzkaller.appspotmail.com
Right, ENOSPC is better. Should I send a v2?

Thanks.

Herbert Xu

Dec 27, 2023, 4:30:58 AM
to Chengming Zhou, ak...@linux-foundation.org, chr...@kernel.org, da...@davemloft.net, linux-...@vger.kernel.org, linux-...@vger.kernel.org, nph...@gmail.com, syzkall...@googlegroups.com, yosry...@google.com, 21c...@gmail.com, zhouch...@bytedance.com, syzbot+3eff5e...@syzkaller.appspotmail.com
On Wed, Dec 27, 2023 at 05:28:35PM +0800, Chengming Zhou wrote:
>
> Right, ENOSPC is better. Should I send a v2?

Yes please.

chengmi...@linux.dev

Dec 27, 2023, 4:35:42 AM
to ak...@linux-foundation.org, chr...@kernel.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, nph...@gmail.com, syzkall...@googlegroups.com, yosry...@google.com, 21c...@gmail.com, zhouch...@bytedance.com, syzbot+3eff5e...@syzkaller.appspotmail.com
From: Chengming Zhou <zhouch...@bytedance.com>

The req->dst buffer size should be checked before copying from
scomp_scratch->dst to avoid a req->dst buffer overflow.

Fixes: 1ab53a77b772 ("crypto: acomp - add driver-side scomp interface")
Reported-by: syzbot+3eff5e...@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/0000000000000b...@google.com/
Signed-off-by: Chengming Zhou <zhouch...@bytedance.com>
---
v2:
- change error code to ENOSPC.
---
crypto/scompress.c | 6 ++++++
1 file changed, 6 insertions(+)

diff --git a/crypto/scompress.c b/crypto/scompress.c
index 442a82c9de7d..b108a30a7600 100644
--- a/crypto/scompress.c
+++ b/crypto/scompress.c
@@ -117,6 +117,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
 	struct crypto_scomp *scomp = *tfm_ctx;
 	void **ctx = acomp_request_ctx(req);
 	struct scomp_scratch *scratch;
+	unsigned int dlen;
 	int ret;
 
 	if (!req->src || !req->slen || req->slen > SCOMP_SCRATCH_SIZE)
@@ -128,6 +129,8 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
 	if (!req->dlen || req->dlen > SCOMP_SCRATCH_SIZE)
 		req->dlen = SCOMP_SCRATCH_SIZE;
 
+	dlen = req->dlen;
+
 	scratch = raw_cpu_ptr(&scomp_scratch);
 	spin_lock(&scratch->lock);
 
@@ -145,6 +148,9 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
 				ret = -ENOMEM;
 				goto out;
 			}
+		} else if (req->dlen > dlen) {
+			ret = -ENOSPC;
+			goto out;
 		}
 		scatterwalk_map_and_copy(scratch->dst, req->dst, 0, req->dlen,
 					 1);
--
2.40.1

Barry Song

Dec 27, 2023, 6:06:12 AM
to chengmi...@linux.dev, ak...@linux-foundation.org, chr...@kernel.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, nph...@gmail.com, syzkall...@googlegroups.com, yosry...@google.com, zhouch...@bytedance.com, syzbot+3eff5e...@syzkaller.appspotmail.com
On Wed, Dec 27, 2023 at 5:35 PM <chengmi...@linux.dev> wrote:
>
> From: Chengming Zhou <zhouch...@bytedance.com>
>
> The req->dst buffer size should be checked before copying from
> scomp_scratch->dst to avoid a req->dst buffer overflow.
>
> Fixes: 1ab53a77b772 ("crypto: acomp - add driver-side scomp interface")
> Reported-by: syzbot+3eff5e...@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/0000000000000b...@google.com/
> Signed-off-by: Chengming Zhou <zhouch...@bytedance.com>
> ---
> v2:
> - change error code to ENOSPC.

Reviewed-by: Barry Song <v-song...@oppo.com>

Barry Song

Dec 27, 2023, 6:10:53 AM
to Chengming Zhou, Nhat Pham, Chris Li, syzbot, ak...@linux-foundation.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com
On Wed, Dec 27, 2023 at 5:16 PM Chengming Zhou
I agree, we need some docs.

Besides, I actually think we can skip the zswap frontend if
over-compression is really happening.

--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -1318,7 +1318,7 @@ bool zswap_store(struct folio *folio)
 	if (zpool_malloc_support_movable(zpool))
 		gfp |= __GFP_HIGHMEM | __GFP_MOVABLE;
 	ret = zpool_malloc(zpool, dlen, gfp, &handle);
-	if (ret == -ENOSPC) {
+	if (ret == -ENOSPC || dlen > PAGE_SIZE) {
 		zswap_reject_compress_poor++;
 		goto put_dstmem;
 	}

since we are not saving but wasting space in this corner case?

Nhat Pham

Dec 27, 2023, 6:26:46 PM
to Barry Song, Chengming Zhou, Chris Li, syzbot, ak...@linux-foundation.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com
On Wed, Dec 27, 2023 at 3:10 AM Barry Song <21c...@gmail.com> wrote:
>
> On Wed, Dec 27, 2023 at 5:16 PM Chengming Zhou
> <zhouch...@bytedance.com> wrote:
> >
> > Thanks for your explanation! Maybe it's best for us to return to 2 pages
> > if no other people's comments. And this really need more documentation :-)

Fine by me. Hmm we're basically wasting one extra page per CPU (since
these buffers are per-CPU), correct? That's not ideal, but not *too*
bad for now I suppose...

>
> I agree, we need some docs.
>
> Besides, I actually think we can skip the zswap frontend if
> over-compression is really happening.

IIUC, zsmalloc already checked that - and most people are (or should
be) using zsmalloc for zswap anyway. I wouldn't be opposed to adding
an added layer of protection on the zswap side, but not super high
priority I'd say.

Barry Song

Dec 27, 2023, 8:43:32 PM
to Nhat Pham, Chengming Zhou, Chris Li, syzbot, ak...@linux-foundation.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com
Thanks for this info. I guess you mean the below?
unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
{
	...

	if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE))
		return (unsigned long)ERR_PTR(-EINVAL);

}

I find zbud also has similar code:
static int zbud_alloc(struct zbud_pool *pool, size_t size, gfp_t gfp,
		      unsigned long *handle)
{
	int chunks, i, freechunks;
	struct zbud_header *zhdr = NULL;
	enum buddy bud;
	struct page *page;

	if (!size || (gfp & __GFP_HIGHMEM))
		return -EINVAL;
	if (size > PAGE_SIZE - ZHDR_SIZE_ALIGNED - CHUNK_SIZE)
		return -ENOSPC;

and z3fold,

static int z3fold_alloc(struct z3fold_pool *pool, size_t size, gfp_t gfp,
			unsigned long *handle)
{
	int chunks = size_to_chunks(size);
	struct z3fold_header *zhdr = NULL;
	struct page *page = NULL;
	enum buddy bud;
	bool can_sleep = gfpflags_allow_blocking(gfp);

	if (!size || (gfp & __GFP_HIGHMEM))
		return -EINVAL;

	if (size > PAGE_SIZE)
		return -ENOSPC;


Thus, I agree that another layer to check size in zswap isn't necessary now.


Thanks
Barry

Barry Song

Dec 27, 2023, 8:47:56 PM
to Nhat Pham, Chengming Zhou, Chris Li, syzbot, ak...@linux-foundation.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com
On Thu, Dec 28, 2023 at 9:43 AM Barry Song <21c...@gmail.com> wrote:
>
> On Thu, Dec 28, 2023 at 7:26 AM Nhat Pham <nph...@gmail.com> wrote:
> >
> > On Wed, Dec 27, 2023 at 3:10 AM Barry Song <21c...@gmail.com> wrote:
> > >
> > > On Wed, Dec 27, 2023 at 5:16 PM Chengming Zhou
> > > <zhouch...@bytedance.com> wrote:
> > > >
> > > > Thanks for your explanation! Maybe it's best for us to return to 2 pages
> > > > if no other people's comments. And this really need more documentation :-)
> >
> > Fine by me. Hmm we're basically wasting one extra page per CPU (since
> > these buffers are per-CPU), correct? That's not ideal, but not *too*
> > bad for now I suppose...
> >
> > >
> > > I agree, we need some docs.
> > >
> > > Besides, I actually think we can skip the zswap frontend if
> > > over-compression is really happening.
> >
> > IIUC, zsmalloc already checked that - and most people are (or should
> > be) using zsmalloc for zswap anyway. I wouldn't be opposed to adding
> > an added layer of protection on the zswap side, but not super high
> > priority I'd say.
>
> Thanks for this info. I guess you mean the below ?
> unsigned long zs_malloc(struct zs_pool *pool, size_t size, gfp_t gfp)
> {
> ...
>
> if (unlikely(!size || size > ZS_MAX_ALLOC_SIZE))
> return (unsigned long)ERR_PTR(-EINVAL);

BTW, do you think zsmalloc is worth a patch to change the ret from
EINVAL to ENOSPC? That seems more sensible and matches the behaviour
of zswap, zbud and z3fold.

	ret = zpool_malloc(zpool, dlen, gfp, &handle);
	if (ret == -ENOSPC) {
		zswap_reject_compress_poor++;
		goto put_dstmem;
	}
	if (ret) {
		zswap_reject_alloc_fail++;
		goto put_dstmem;
	}
	buf = z

Chengming Zhou

Dec 27, 2023, 9:29:00 PM
to syzbot, ak...@linux-foundation.org, chr...@kernel.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, nph...@gmail.com, syzkall...@googlegroups.com, yosry...@google.com
#syz test

diff --git a/crypto/scompress.c b/crypto/scompress.c
index 442a82c9de7d..b108a30a7600 100644
--- a/crypto/scompress.c
+++ b/crypto/scompress.c
@@ -117,6 +117,7 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
 	struct crypto_scomp *scomp = *tfm_ctx;
 	void **ctx = acomp_request_ctx(req);
 	struct scomp_scratch *scratch;
+	unsigned int dlen;
 	int ret;
 
 	if (!req->src || !req->slen || req->slen > SCOMP_SCRATCH_SIZE)
@@ -128,6 +129,8 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
 	if (!req->dlen || req->dlen > SCOMP_SCRATCH_SIZE)
 		req->dlen = SCOMP_SCRATCH_SIZE;
 
+	dlen = req->dlen;
+
 	scratch = raw_cpu_ptr(&scomp_scratch);
 	spin_lock(&scratch->lock);
 
@@ -145,6 +148,9 @@ static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
 			ret = -ENOMEM;
 			goto out;
 		}
+		} else if (req->dlen > dlen) {
+			ret = -ENOSPC;
+			goto out;

syzbot

Dec 27, 2023, 10:58:05 PM
to ak...@linux-foundation.org, chr...@kernel.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, nph...@gmail.com, syzkall...@googlegroups.com, yosry...@google.com, zhouch...@bytedance.com
Hello,

syzbot has tested the proposed patch and the reproducer did not trigger any issue:

Reported-and-tested-by: syzbot+3eff5e...@syzkaller.appspotmail.com

Tested on:

commit: 39676dfe Add linux-next specific files for 20231222
git tree: linux-next
console output: https://syzkaller.appspot.com/x/log.txt?x=104d06d9e80000
kernel config: https://syzkaller.appspot.com/x/.config?x=f3761490b734dc96
dashboard link: https://syzkaller.appspot.com/bug?extid=3eff5e51bf1db122a16e
compiler: gcc (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40
patch: https://syzkaller.appspot.com/x/patch.diff?x=13756fe9e80000

Note: testing is done by a robot and is best-effort only.

Nhat Pham

Dec 28, 2023, 2:18:26 PM
to Barry Song, Chengming Zhou, Chris Li, syzbot, ak...@linux-foundation.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com
I haven't looked at the code surrounding it too closely, but IMHO this
would be nice. Poor compressibility of the workload's memory is
something we should monitor more carefully. This has come up a couple
times in discussion:

https://lore.kernel.org/linux-mm/CAJD7tkZXS-UJVAFfvxJ0nNgT...@mail.gmail.com/
https://lore.kernel.org/all/20231024234509.2...@gmail.com/T/#u

Herbert Xu

Dec 28, 2023, 10:30:46 PM
to chengmi...@linux.dev, ak...@linux-foundation.org, chr...@kernel.org, da...@davemloft.net, linux-...@vger.kernel.org, linux-...@vger.kernel.org, nph...@gmail.com, syzkall...@googlegroups.com, yosry...@google.com, 21c...@gmail.com, zhouch...@bytedance.com, syzbot+3eff5e...@syzkaller.appspotmail.com
On Wed, Dec 27, 2023 at 09:35:23AM +0000, chengmi...@linux.dev wrote:
> From: Chengming Zhou <zhouch...@bytedance.com>
>
> The req->dst buffer size should be checked before copying from
> scomp_scratch->dst to avoid a req->dst buffer overflow.
>
> Fixes: 1ab53a77b772 ("crypto: acomp - add driver-side scomp interface")
> Reported-by: syzbot+3eff5e...@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/0000000000000b...@google.com/
> Signed-off-by: Chengming Zhou <zhouch...@bytedance.com>
> ---
> v2:
> - change error code to ENOSPC.
> ---
> crypto/scompress.c | 6 ++++++
> 1 file changed, 6 insertions(+)

Patch applied. Thanks.

Barry Song

Jan 3, 2024, 1:53:38 AM
to Chengming Zhou, Nhat Pham, Chris Li, syzbot, ak...@linux-foundation.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com
On Wed, Dec 27, 2023 at 7:38 PM Chengming Zhou
Hi Chengming,
I still feel these two memcpys are big and unnecessary, so I sent
an RFC[1] to remove them, as well as another one removing a memcpy in
zswap[2]. But unfortunately I don't have real hardware to run on and
collect data, so I wonder if you would be interested in testing and
collecting data, as you are actively contributing to zswap.

[1] https://lore.kernel.org/linux-mm/20240103053134....@gmail.com/
[2] https://lore.kernel.org/linux-mm/20240103025759....@gmail.com/

Thanks
Barry

Chengming Zhou

Jan 3, 2024, 3:42:05 AM
to Barry Song, Nhat Pham, Chris Li, syzbot, ak...@linux-foundation.org, da...@davemloft.net, her...@gondor.apana.org.au, linux-...@vger.kernel.org, linux-...@vger.kernel.org, syzkall...@googlegroups.com, yosry...@google.com
Ok, I just tested these three patches on my server and found an improvement
in the kernel build testcase on tmpfs with zswap (lz4 + zsmalloc) enabled.

        mm-stable 501a06fe8e4c    patched
real    1m38.028s                 1m32.317s
user    19m11.482s                18m39.439s
sys     19m26.445s                17m5.646s

The improvement looks good! So feel free to add:

Tested-by: Chengming Zhou <zhouch...@bytedance.com>

Thanks.