[PATCH -fixes 0/4] Fixes KASAN and other along the way

Alexandre Ghiti

Feb 18, 2022, 8:35:49 AM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Alexandre Ghiti, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
As reported by Aleksandr, syzbot on riscv has been broken since commit
54c5639d8f50 ("riscv: Fix asan-stack clang build"). That commit actually
breaks KASAN_INLINE, which this series does not fix; a fix will follow
once the cause is found.

Nevertheless, this series fixes several small issues that made the syzbot
configuration + KASAN_OUTLINE fail to boot.

Note that even though the config at [1] boots fine with this series, I
was not able to boot the small config at [2], which fails because
kasan_poison receives a really weird address, 0x4075706301000000 (maybe a
KASAN person could provide some hint about what happens below in
do_ctors -> __asan_register_globals):

Thread 2 hit Breakpoint 1, kasan_poison (addr=<optimized out>, size=<optimized out>, value=<optimized out>, init=<optimized out>) at /home/alex/work/linux/mm/kasan/shadow.c:90
90 if (WARN_ON((unsigned long)addr & KASAN_GRANULE_MASK))
1: x/i $pc
=> 0xffffffff80261712 <kasan_poison>: andi a4,a0,7
5: /x $a0 = 0x4075706301000000

Thread 2 hit Breakpoint 2, handle_exception () at /home/alex/work/linux/arch/riscv/kernel/entry.S:27
27 csrrw tp, CSR_SCRATCH, tp
1: x/i $pc
=> 0xffffffff80004098 <handle_exception>: csrrw tp,sscratch,tp
5: /x $a0 = 0xe80eae0b60200000
(gdb) bt
#0 handle_exception () at /home/alex/work/linux/arch/riscv/kernel/entry.S:27
#1 0xffffffff80261746 in kasan_poison (addr=<optimized out>, size=<optimized out>, value=<optimized out>, init=<optimized out>)
at /home/alex/work/linux/mm/kasan/shadow.c:98
#2 0xffffffff802618b4 in kasan_unpoison (addr=<optimized out>, size=<optimized out>, init=<optimized out>)
at /home/alex/work/linux/mm/kasan/shadow.c:138
#3 0xffffffff80260876 in register_global (global=<optimized out>) at /home/alex/work/linux/mm/kasan/generic.c:214
#4 __asan_register_globals (globals=<optimized out>, size=<optimized out>) at /home/alex/work/linux/mm/kasan/generic.c:226
#5 0xffffffff8125efac in _sub_I_65535_1 ()
#6 0xffffffff81201b32 in do_ctors () at /home/alex/work/linux/init/main.c:1156
#7 do_basic_setup () at /home/alex/work/linux/init/main.c:1407
#8 kernel_init_freeable () at /home/alex/work/linux/init/main.c:1613
#9 0xffffffff81153ddc in kernel_init (unused=<optimized out>) at /home/alex/work/linux/init/main.c:1502
#10 0xffffffff800041c0 in handle_exception () at /home/alex/work/linux/arch/riscv/kernel/entry.S:231


Thanks again to Aleksandr for narrowing down the issues fixed here.


[1] https://gist.github.com/a-nogikh/279c85c2d24f47efcc3e865c08844138
[2] https://gist.github.com/AlexGhiti/a5a0cab0227e2bf38f9d12232591c0e4

Alexandre Ghiti (4):
riscv: Fix is_linear_mapping with recent move of KASAN region
riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP
riscv: Fix DEBUG_VIRTUAL false warnings
riscv: Fix config KASAN && DEBUG_VIRTUAL

arch/riscv/include/asm/page.h | 2 +-
arch/riscv/mm/Makefile | 3 +++
arch/riscv/mm/kasan_init.c | 3 +--
arch/riscv/mm/physaddr.c | 4 +---
4 files changed, 6 insertions(+), 6 deletions(-)

--
2.32.0

Alexandre Ghiti

Feb 18, 2022, 8:36:47 AM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Alexandre Ghiti, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
The KASAN region was recently moved between the linear mapping and the
kernel mapping. is_linear_mapping used to check the validity of an
address against the start of the kernel mapping, which is now wrong.

Fix this by bounding the check with the maximum size of the physical
memory instead (PAGE_OFFSET + KERN_VIRT_SIZE).

Fixes: f7ae02333d13 ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexand...@canonical.com>
---
arch/riscv/include/asm/page.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 160e3a1e8f8b..004372f8da54 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -119,7 +119,7 @@ extern phys_addr_t phys_ram_base;
((x) >= kernel_map.virt_addr && (x) < (kernel_map.virt_addr + kernel_map.size))

#define is_linear_mapping(x) \
- ((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < kernel_map.virt_addr))
+ ((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < PAGE_OFFSET + KERN_VIRT_SIZE))

#define linear_mapping_pa_to_va(x) ((void *)((unsigned long)(x) + kernel_map.va_pa_offset))
#define kernel_mapping_pa_to_va(y) ({ \
--
2.32.0
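
The effect of this change can be sketched in user space. This is only a model, not kernel code: the three constants are illustrative assumptions for an sv39-style layout (they do not come from the patch), and the helper names is_linear_mapping_old and is_linear_mapping_new are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sv39-style layout; real values depend on the kernel config. */
#define PAGE_OFFSET      UINT64_C(0xffffffd800000000) /* start of the linear mapping */
#define KERN_VIRT_SIZE   UINT64_C(0x0000002000000000) /* assumed max physical memory (128 GiB) */
#define KERNEL_VIRT_ADDR UINT64_C(0xffffffff80000000) /* stands in for kernel_map.virt_addr */

/* Old 64-bit check: anything from PAGE_OFFSET up to the kernel mapping. */
static bool is_linear_mapping_old(uint64_t x)
{
    return x >= PAGE_OFFSET && x < KERNEL_VIRT_ADDR;
}

/* New 64-bit check: only PAGE_OFFSET .. PAGE_OFFSET + KERN_VIRT_SIZE. */
static bool is_linear_mapping_new(uint64_t x)
{
    return x >= PAGE_OFFSET && x < PAGE_OFFSET + KERN_VIRT_SIZE;
}
```

With these assumed values, an address such as 0xfffffff900000000, which lies between the end of the linear mapping and the start of the kernel mapping (where the KASAN region now sits), is accepted by the old check and rejected by the new one.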

Alexandre Ghiti

Feb 18, 2022, 8:37:45 AM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Alexandre Ghiti, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
Converting between a pfn and a struct page * when sparsemem is enabled
without vmemmap requires the mem_section structures to be initialized,
which happens in sparse_init.

But kasan_early_init calls pfn_to_page long before sparse_init runs, and
so ends up dereferencing a null mem_section pointer.

Fix this by no longer using pfn_to_page in kasan_early_init.

Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
Signed-off-by: Alexandre Ghiti <alexand...@canonical.com>
---
arch/riscv/mm/kasan_init.c | 3 +--
1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index f61f7ca6fe0f..85e849318389 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -202,8 +202,7 @@ asmlinkage void __init kasan_early_init(void)

for (i = 0; i < PTRS_PER_PTE; ++i)
set_pte(kasan_early_shadow_pte + i,
- mk_pte(virt_to_page(kasan_early_shadow_page),
- PAGE_KERNEL));
+ pfn_pte(virt_to_pfn(kasan_early_shadow_page), PAGE_KERNEL));

for (i = 0; i < PTRS_PER_PMD; ++i)
set_pmd(kasan_early_shadow_pmd + i,
--
2.32.0
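
To illustrate why the replacement is safe this early in boot: a PTE only needs the page frame number and the protection bits, so it can be built with plain arithmetic and without any struct page lookup. The sketch below is a user-space model, not kernel code. PFN_SHIFT_IN_PTE reflects the RISC-V PTE layout (PFN starting at bit 10); model_virt_to_pfn, model_pfn_pte and the va_pa_offset value are hypothetical stand-ins, and the model only covers linear-mapping addresses.

```c
#include <stdint.h>

#define PAGE_SHIFT       12
#define PFN_SHIFT_IN_PTE 10 /* RISC-V PTEs store the PFN starting at bit 10 */

/* Hypothetical linear-map offset: virtual address minus physical address. */
static const uint64_t va_pa_offset = UINT64_C(0xffffffd780000000);

/* Pure arithmetic: no mem_section lookup, so nothing depends on sparse_init. */
static uint64_t model_virt_to_pfn(uint64_t va)
{
    return (va - va_pa_offset) >> PAGE_SHIFT;
}

/* Build a PTE value directly from a pfn and protection bits. */
static uint64_t model_pfn_pte(uint64_t pfn, uint64_t prot)
{
    return (pfn << PFN_SHIFT_IN_PTE) | prot;
}
```

By contrast, mk_pte(virt_to_page(x), prot) goes from the pfn to a struct page and back again, and it is that round trip through pfn_to_page that needs the mem_section data.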

Alexandre Ghiti

Feb 18, 2022, 8:38:47 AM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Alexandre Ghiti, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
KERN_VIRT_SIZE used to encompass the kernel mapping, but it was redefined
to match only the maximum amount of physical memory when the KASAN mapping
was moved next to the kernel mapping.

As a result, kernel mapping addresses that go through __virt_to_phys are
now flagged as wrong, even though __virt_to_phys is valid for such
addresses.

Fix this by redefining the condition that matches wrong addresses.

Fixes: f7ae02333d13 ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexand...@canonical.com>
---
arch/riscv/mm/physaddr.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/arch/riscv/mm/physaddr.c b/arch/riscv/mm/physaddr.c
index e7fd0c253c7b..19cf25a74ee2 100644
--- a/arch/riscv/mm/physaddr.c
+++ b/arch/riscv/mm/physaddr.c
@@ -8,12 +8,10 @@

phys_addr_t __virt_to_phys(unsigned long x)
{
- phys_addr_t y = x - PAGE_OFFSET;
-
/*
* Boundary checking aginst the kernel linear mapping space.
*/
- WARN(y >= KERN_VIRT_SIZE,
+ WARN(!is_linear_mapping(x) && !is_kernel_mapping(x),
"virt_to_phys used for non-linear address: %pK (%pS)\n",
(void *)x, (void *)x);

--
2.32.0
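
A small user-space model of the two warning conditions may make the difference concrete. The layout constants, including the 32 MiB KERNEL_SIZE, are illustrative assumptions, and warns_old and warns_new are hypothetical names for the old and new WARN conditions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative sv39-style layout; real values depend on the kernel config. */
#define PAGE_OFFSET      UINT64_C(0xffffffd800000000)
#define KERN_VIRT_SIZE   UINT64_C(0x0000002000000000) /* assumed max physical memory */
#define KERNEL_VIRT_ADDR UINT64_C(0xffffffff80000000) /* stands in for kernel_map.virt_addr */
#define KERNEL_SIZE      UINT64_C(0x0000000002000000) /* assumed 32 MiB kernel mapping */

static bool is_linear_mapping(uint64_t x)
{
    return x >= PAGE_OFFSET && x < PAGE_OFFSET + KERN_VIRT_SIZE;
}

static bool is_kernel_mapping(uint64_t x)
{
    return x >= KERNEL_VIRT_ADDR && x < KERNEL_VIRT_ADDR + KERNEL_SIZE;
}

/* Old condition: warn when the offset from PAGE_OFFSET reaches KERN_VIRT_SIZE. */
static bool warns_old(uint64_t x)
{
    return x - PAGE_OFFSET >= KERN_VIRT_SIZE;
}

/* New condition: warn only if the address is in neither mapping. */
static bool warns_new(uint64_t x)
{
    return !is_linear_mapping(x) && !is_kernel_mapping(x);
}
```

Under these assumptions a kernel text address such as 0xffffffff80261712 triggers the old warning but not the new one, while an address in the gap between the two mappings still warns.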

Alexandre Ghiti

Feb 18, 2022, 8:39:47 AM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Alexandre Ghiti, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
The __virt_to_phys function is called very early in the boot process
(i.e. from kasan_early_init), so it must not be instrumented by KASAN;
otherwise the kernel fails to boot.

Fix this by excluding physaddr.c from KASAN instrumentation.

Signed-off-by: Alexandre Ghiti <alexand...@canonical.com>
---
arch/riscv/mm/Makefile | 3 +++
1 file changed, 3 insertions(+)

diff --git a/arch/riscv/mm/Makefile b/arch/riscv/mm/Makefile
index 7ebaef10ea1b..ac7a25298a04 100644
--- a/arch/riscv/mm/Makefile
+++ b/arch/riscv/mm/Makefile
@@ -24,6 +24,9 @@ obj-$(CONFIG_KASAN) += kasan_init.o
ifdef CONFIG_KASAN
KASAN_SANITIZE_kasan_init.o := n
KASAN_SANITIZE_init.o := n
+ifdef CONFIG_DEBUG_VIRTUAL
+KASAN_SANITIZE_physaddr.o := n
+endif
endif

obj-$(CONFIG_DEBUG_VIRTUAL) += physaddr.o
--
2.32.0

kernel test robot

Feb 20, 2022, 12:55:52 PM
to Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com, ll...@lists.linux.dev, kbuil...@lists.01.org
Hi Alexandre,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linus/master]
[also build test ERROR on v5.17-rc4 next-20220217]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url: https://github.com/0day-ci/linux/commits/Alexandre-Ghiti/Fixes-KASAN-and-other-along-the-way/20220220-181628
base: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 4f12b742eb2b3a850ac8be7dc4ed52976fc6cb0b
config: riscv-nommu_virt_defconfig (https://download.01.org/0day-ci/archive/20220221/202202210123...@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project d271fc04d5b97b12e6b797c6067d3c96a8d7470e)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install riscv cross compiling tool for clang build
# apt-get install binutils-riscv64-linux-gnu
# https://github.com/0day-ci/linux/commit/de8a909a9eabf9066802a3396b7009cbf4fa4369
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Alexandre-Ghiti/Fixes-KASAN-and-other-along-the-way/20220220-181628
git checkout de8a909a9eabf9066802a3396b7009cbf4fa4369
# save the config file to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=riscv prepare

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <l...@intel.com>

All errors (new ones prefixed by >>):

In file included from arch/riscv/kernel/asm-offsets.c:10:
>> include/linux/mm.h:837:22: error: use of undeclared identifier 'KERN_VIRT_SIZE'; did you mean 'KERN_VERSION'?
struct page *page = virt_to_page(x);
^
arch/riscv/include/asm/page.h:165:42: note: expanded from macro 'virt_to_page'
#define virt_to_page(vaddr) (pfn_to_page(virt_to_pfn(vaddr)))
^
arch/riscv/include/asm/page.h:162:41: note: expanded from macro 'virt_to_pfn'
#define virt_to_pfn(vaddr) (phys_to_pfn(__pa(vaddr)))
^
arch/riscv/include/asm/page.h:156:18: note: expanded from macro '__pa'
#define __pa(x) __virt_to_phys((unsigned long)(x))
^
arch/riscv/include/asm/page.h:151:27: note: expanded from macro '__virt_to_phys'
#define __virt_to_phys(x) __va_to_pa_nodebug(x)
^
arch/riscv/include/asm/page.h:143:2: note: expanded from macro '__va_to_pa_nodebug'
is_linear_mapping(_x) ? \
^
arch/riscv/include/asm/page.h:122:75: note: expanded from macro 'is_linear_mapping'
((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < PAGE_OFFSET + KERN_VIRT_SIZE))
^
include/uapi/linux/sysctl.h:88:2: note: 'KERN_VERSION' declared here
KERN_VERSION=4, /* string: compile time info */
^
In file included from arch/riscv/kernel/asm-offsets.c:10:
include/linux/mm.h:844:22: error: use of undeclared identifier 'KERN_VIRT_SIZE'; did you mean 'KERN_VERSION'?
struct page *page = virt_to_page(x);
^
arch/riscv/include/asm/page.h:165:42: note: expanded from macro 'virt_to_page'
#define virt_to_page(vaddr) (pfn_to_page(virt_to_pfn(vaddr)))
^
arch/riscv/include/asm/page.h:162:41: note: expanded from macro 'virt_to_pfn'
#define virt_to_pfn(vaddr) (phys_to_pfn(__pa(vaddr)))
^
arch/riscv/include/asm/page.h:156:18: note: expanded from macro '__pa'
#define __pa(x) __virt_to_phys((unsigned long)(x))
^
arch/riscv/include/asm/page.h:151:27: note: expanded from macro '__virt_to_phys'
#define __virt_to_phys(x) __va_to_pa_nodebug(x)
^
arch/riscv/include/asm/page.h:143:2: note: expanded from macro '__va_to_pa_nodebug'
is_linear_mapping(_x) ? \
^
arch/riscv/include/asm/page.h:122:75: note: expanded from macro 'is_linear_mapping'
((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < PAGE_OFFSET + KERN_VIRT_SIZE))
^
include/uapi/linux/sysctl.h:88:2: note: 'KERN_VERSION' declared here
KERN_VERSION=4, /* string: compile time info */
^
2 errors generated.
make[2]: *** [scripts/Makefile.build:121: arch/riscv/kernel/asm-offsets.s] Error 1
make[2]: Target '__build' not remade because of errors.
make[1]: *** [Makefile:1191: prepare0] Error 2
make[1]: Target 'prepare' not remade because of errors.
make: *** [Makefile:219: __sub-make] Error 2
make: Target 'prepare' not remade because of errors.


vim +837 include/linux/mm.h

70b50f94f1644e Andrea Arcangeli 2011-11-02 834
b49af68ff9fc5d Christoph Lameter 2007-05-06 835 static inline struct page *virt_to_head_page(const void *x)
b49af68ff9fc5d Christoph Lameter 2007-05-06 836 {
b49af68ff9fc5d Christoph Lameter 2007-05-06 @837 struct page *page = virt_to_page(x);
ccaafd7fd039ae Joonsoo Kim 2015-02-10 838
1d798ca3f16437 Kirill A. Shutemov 2015-11-06 839 return compound_head(page);
b49af68ff9fc5d Christoph Lameter 2007-05-06 840 }
b49af68ff9fc5d Christoph Lameter 2007-05-06 841

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuil...@lists.01.org

Alexandre Ghiti

Feb 21, 2022, 11:13:09 AM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Alexandre Ghiti, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
Changes in v2:
- Fix kernel test robot failure regarding KERN_VIRT_SIZE that is
undefined for nommu config

Alexandre Ghiti (4):
riscv: Fix is_linear_mapping with recent move of KASAN region
riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP
riscv: Fix DEBUG_VIRTUAL false warnings
riscv: Fix config KASAN && DEBUG_VIRTUAL

arch/riscv/include/asm/page.h | 2 +-
arch/riscv/include/asm/pgtable.h | 1 +
arch/riscv/mm/Makefile | 3 +++
arch/riscv/mm/kasan_init.c | 3 +--
arch/riscv/mm/physaddr.c | 4 +---
5 files changed, 7 insertions(+), 6 deletions(-)

--
2.32.0

Alexandre Ghiti

Feb 21, 2022, 11:14:14 AM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Alexandre Ghiti, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
The KASAN region was recently moved between the linear mapping and the
kernel mapping. is_linear_mapping used to check the validity of an
address against the start of the kernel mapping, which is now wrong.

Fix this by bounding the check with the maximum size of the physical
memory instead (PAGE_OFFSET + KERN_VIRT_SIZE).

Fixes: f7ae02333d13 ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexand...@canonical.com>
---
arch/riscv/include/asm/page.h | 2 +-
arch/riscv/include/asm/pgtable.h | 1 +
2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 160e3a1e8f8b..004372f8da54 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -119,7 +119,7 @@ extern phys_addr_t phys_ram_base;
((x) >= kernel_map.virt_addr && (x) < (kernel_map.virt_addr + kernel_map.size))

#define is_linear_mapping(x) \
- ((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < kernel_map.virt_addr))
+ ((x) >= PAGE_OFFSET && (!IS_ENABLED(CONFIG_64BIT) || (x) < PAGE_OFFSET + KERN_VIRT_SIZE))

#define linear_mapping_pa_to_va(x) ((void *)((unsigned long)(x) + kernel_map.va_pa_offset))
#define kernel_mapping_pa_to_va(y) ({ \
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 7e949f25c933..e3549e50de95 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -13,6 +13,7 @@

#ifndef CONFIG_MMU
#define KERNEL_LINK_ADDR PAGE_OFFSET
+#define KERN_VIRT_SIZE (UL(-1))
#else

#define ADDRESS_SPACE_END (UL(-1))
--
2.32.0

Alexandre Ghiti

Feb 21, 2022, 11:15:11 AM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Alexandre Ghiti, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
Converting between a pfn and a struct page * when sparsemem is enabled
without vmemmap requires the mem_section structures to be initialized,
which happens in sparse_init.

But kasan_early_init calls pfn_to_page long before sparse_init runs, and
so ends up dereferencing a null mem_section pointer.

Fix this by no longer using pfn_to_page in kasan_early_init.

Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
Signed-off-by: Alexandre Ghiti <alexand...@canonical.com>
---

Alexandre Ghiti

Feb 21, 2022, 11:16:12 AM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Alexandre Ghiti, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
KERN_VIRT_SIZE used to encompass the kernel mapping, but it was redefined
to match only the maximum amount of physical memory when the KASAN mapping
was moved next to the kernel mapping.

As a result, kernel mapping addresses that go through __virt_to_phys are
now flagged as wrong, even though __virt_to_phys is valid for such
addresses.

Fix this by redefining the condition that matches wrong addresses.

Fixes: f7ae02333d13 ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexand...@canonical.com>
---

Alexandre Ghiti

Feb 21, 2022, 11:17:14 AM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Alexandre Ghiti, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
The __virt_to_phys function is called very early in the boot process
(i.e. from kasan_early_init), so it must not be instrumented by KASAN;
otherwise the kernel fails to boot.

Fix this by excluding physaddr.c from KASAN instrumentation.

Signed-off-by: Alexandre Ghiti <alexand...@canonical.com>
---

Aleksandr Nogikh

Feb 22, 2022, 5:28:45 AM
to Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Nick Hu, linux...@lists.infradead.org, LKML, kasan-dev
Hi Alexandre,

Thanks for the series!

However, I still haven't managed to boot the kernel. What I did:
1) Checked out the riscv/fixes branch (this is the one we're using on
syzbot). The latest commit was
6df2a016c0c8a3d0933ef33dd192ea6606b115e3.
2) Applied all 4 patches.
3) Used the config from the cover letter:
https://gist.github.com/a-nogikh/279c85c2d24f47efcc3e865c08844138
4) Built with `make -j32 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-`
5) Ran with `qemu-system-riscv64 -m 2048 -smp 1 -nographic -no-reboot
-device virtio-rng-pci -machine virt -device
virtio-net-pci,netdev=net0 -netdev
user,id=net0,restrict=on,hostfwd=tcp:127.0.0.1:12529-:22 -device
virtio-blk-device,drive=hd0 -drive
file=~/kernel-image/riscv64,if=none,format=raw,id=hd0 -snapshot
-kernel ~/linux-riscv/arch/riscv/boot/Image -append "root=/dev/vda
console=ttyS0 earlyprintk=serial"` (this is similar to how syzkaller
runs qemu).

Can you please hint at what I'm doing differently?

A simple config with KASAN, KASAN_OUTLINE and DEBUG_VIRTUAL now indeed
leads to a booting kernel, which was not the case before.
make defconfig ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-
./scripts/config -e KASAN -e KASAN_OUTLINE -e DEBUG_VIRTUAL
make olddefconfig ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-

--
Best Regards,
Aleksandr

Alexandre Ghiti

Feb 23, 2022, 8:11:18 AM
to Aleksandr Nogikh, Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Nick Hu, linux...@lists.infradead.org, LKML, kasan-dev
Hi Aleksandr,

On Tue, Feb 22, 2022 at 11:28 AM Aleksandr Nogikh <nog...@google.com> wrote:
>
> Hi Alexandre,
>
> Thanks for the series!
>
> However, I still haven't managed to boot the kernel. What I did:
> 1) Checked out the riscv/fixes branch (this is the one we're using on
> syzbot). The latest commit was
> 6df2a016c0c8a3d0933ef33dd192ea6606b115e3.
> 2) Applied all 4 patches.
> 3) Used the config from the cover letter:
> https://gist.github.com/a-nogikh/279c85c2d24f47efcc3e865c08844138
> 4) Built with `make -j32 ARCH=riscv CROSS_COMPILE=riscv64-linux-gnu-`
> 5) Ran with `qemu-system-riscv64 -m 2048 -smp 1 -nographic -no-reboot
> -device virtio-rng-pci -machine virt -device
> virtio-net-pci,netdev=net0 -netdev
> user,id=net0,restrict=on,hostfwd=tcp:127.0.0.1:12529-:22 -device
> virtio-blk-device,drive=hd0 -drive
> file=~/kernel-image/riscv64,if=none,format=raw,id=hd0 -snapshot
> -kernel ~/linux-riscv/arch/riscv/boot/Image -append "root=/dev/vda
> console=ttyS0 earlyprintk=serial"` (this is similar to how syzkaller
> runs qemu).
>
> Can you please hint at what I'm doing differently?

A short summary of what I found to keep you updated:

I compared your command line with mine. The differences are that I use
"-smp 4" and add "earlycon" to the kernel command line. When added to
your command line, that allows it to boot. I understand why it helps
but I can't explain what's wrong... Anyway, I fixed a warning that I
had missed, and with that fix "-smp 4" and "earlycon" are no longer
needed.

But this is not over yet... Your command line still does not reach
userspace; it fails with the following stack trace:

[ 11.537817][ T1] Unable to handle kernel paging request at virtual address fffff5eeffffc800
[ 11.539450][ T1] Oops [#1]
[ 11.539909][ T1] Modules linked in:
[ 11.540451][ T1] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.17.0-rc1-00007-ga68b89289e26-dirty #28
[ 11.541364][ T1] Hardware name: riscv-virtio,qemu (DT)
[ 11.542032][ T1] epc : kasan_check_range+0x96/0x13e
[ 11.542654][ T1] ra : memset+0x1e/0x4c
[ 11.543388][ T1] epc : ffffffff8046c312 ra : ffffffff8046ca16 sp : ffffaf8007337b70
[ 11.544037][ T1] gp : ffffffff85866c80 tp : ffffaf80073d8000 t0 : 0000000000046000
[ 11.544637][ T1] t1 : fffff5eeffffc9ff t2 : 0000000000000000 s0 : ffffaf8007337ba0
[ 11.545409][ T1] s1 : 0000000000001000 a0 : fffff5eeffffca00 a1 : 0000000000001000
[ 11.546072][ T1] a2 : 0000000000000001 a3 : ffffffff8039ef24 a4 : ffffaf7ffffe4000
[ 11.546707][ T1] a5 : fffff5eeffffc800 a6 : 0000004000000000 a7 : ffffaf7ffffe4fff
[ 11.547541][ T1] s2 : ffffaf7ffffe4000 s3 : 0000000000000000 s4 : ffffffff8467faa8
[ 11.548277][ T1] s5 : 0000000000000000 s6 : ffffffff85869840 s7 : 0000000000000000
[ 11.548950][ T1] s8 : 0000000000001000 s9 : ffffaf805a54a048 s10: ffffffff8588d420
[ 11.549705][ T1] s11: ffffaf7ffffe4000 t3 : 0000000000000000 t4 : 0000000000000040
[ 11.550465][ T1] t5 : fffff5eeffffca00 t6 : 0000000000000002
[ 11.551131][ T1] status: 0000000000000120 badaddr: fffff5eeffffc800 cause: 000000000000000d
[ 11.551961][ T1] [<ffffffff8039ef24>] pcpu_alloc+0x84a/0x125c
[ 11.552928][ T1] [<ffffffff8039f994>] __alloc_percpu+0x28/0x34
[ 11.553555][ T1] [<ffffffff83286954>] ip_rt_init+0x15a/0x35c
[ 11.554128][ T1] [<ffffffff83286d24>] ip_init+0x18/0x30
[ 11.554642][ T1] [<ffffffff8328844a>] inet_init+0x2a6/0x550
[ 11.555428][ T1] [<ffffffff80003220>] do_one_initcall+0x132/0x7e4
[ 11.556049][ T1] [<ffffffff83201f7a>] kernel_init_freeable+0x510/0x5b4
[ 11.556771][ T1] [<ffffffff831424e4>] kernel_init+0x28/0x21c
[ 11.557344][ T1] [<ffffffff800056a0>] ret_from_exception+0x0/0x14
[ 11.585469][ T1] ---[ end trace 0000000000000000 ]---

0xfffff5eeffffc800 is a KASAN shadow address that corresponds to the
very end of the vmalloc address range, which is weird since
KASAN_VMALLOC is not enabled.
Moreover, my command line does not trigger the above bug, and I'm
trying to understand why:

/home/alex/work/qemu/build/riscv64-softmmu/qemu-system-riscv64 -M virt
-bios /home/alex/work/opensbi/build/platform/generic/firmware/fw_dynamic.bin
-kernel /home/alex/work/kernel-build/riscv_rv64_kernel/arch/riscv/boot/Image
-netdev user,id=net0 -device virtio-net-device,netdev=net0 -drive
file=/home/alex/work/kernel-build/rootfs.ext2,format=raw,id=hd0
-device virtio-blk-device,drive=hd0 -nographic -smp 4 -m 16G -s
-append "rootwait earlycon root=/dev/vda ro earlyprintk=serial"

I'm looking into all of this and will get back with a v3 soon :)

Thanks,

Alex

Alexandre Ghiti

Feb 23, 2022, 12:17:31 PM
to Aleksandr Nogikh, Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, linux...@lists.infradead.org, LKML, kasan-dev
Rereading this email, I noticed that I was not using the same qemu
version: my locally built qemu disables sv48, and that is the one that
works, so the problem comes from the sv48 support.

In a nutshell, the KASAN inner regions are not aligned on PGDIR_SIZE
when sv48 (a 4-level page table) is enabled. As a result, populating the
KASAN linear mapping region clears the KASAN vmalloc region, which lives
in the same PGD. The fix is to copy the existing content of that PGD
before initializing the linear mapping entries. This issue only happens
when KASAN_VMALLOC is disabled. I had already fixed this for
kasan_shallow_populate_pud, but missed kasan_populate_pud.

Tomorrow I'll push the v3. It still does not fix the issue I describe
in the cover letter though, so still more work to do. At least, I was
able to reach userspace with your *exact* qemu command :)

Alex
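
The shared-PGD situation described above can be checked with a little arithmetic. The following is a hedged user-space sketch, not kernel code: the shadow offset is an assumption chosen to be consistent with the oops earlier in the thread (register a4 maps to badaddr), the sv48 linear mapping start is assumed from the stack pointer in that oops, and shadow_of and pgd_index_sv48 are hypothetical helper names.

```c
#include <stdint.h>

#define KASAN_SHADOW_SCALE_SHIFT 3                            /* generic KASAN: 1 shadow byte per 8 bytes */
#define KASAN_SHADOW_OFFSET      UINT64_C(0xdfffffff00000000) /* assumed; consistent with the oops */
#define PGDIR_SHIFT_SV48         39                           /* 12 + 9 + 9 + 9 bits below the PGD level */
#define PTRS_PER_PGD             512

#define VMALLOC_TAIL_ADDR UINT64_C(0xffffaf7ffffe4000) /* a4 in the oops: end of vmalloc */
#define LINEAR_START_SV48 UINT64_C(0xffffaf8000000000) /* assumed start of the linear mapping */

/* Generic KASAN shadow address computation. */
static uint64_t shadow_of(uint64_t addr)
{
    return (addr >> KASAN_SHADOW_SCALE_SHIFT) + KASAN_SHADOW_OFFSET;
}

/* Index of the top-level (PGD) entry covering addr under sv48. */
static unsigned int pgd_index_sv48(uint64_t addr)
{
    return (unsigned int)((addr >> PGDIR_SHIFT_SV48) & (PTRS_PER_PGD - 1));
}
```

Under these assumptions, shadow_of(VMALLOC_TAIL_ADDR) gives 0xfffff5eeffffc800, the faulting address in the oops, and shadow_of(LINEAR_START_SV48) gives 0xfffff5ef00000000. Both fall under the same PGD index, and the second value is not a multiple of PGDIR_SIZE, which matches the explanation that the two shadow regions share one PGD entry.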

Palmer Dabbelt

Feb 24, 2022, 10:57:47 PM
to alexand...@canonical.com, nog...@google.com, Paul Walmsley, a...@eecs.berkeley.edu, ryabin...@gmail.com, gli...@google.com, andre...@gmail.com, dvy...@google.com, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
I can't find a v3.

Alexandre Ghiti

Feb 25, 2022, 7:39:59 AM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Alexandre Ghiti, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
Changes in v3:
- Add PATCH 5/6 and PATCH 6/6

Changes in v2:
- Fix kernel test robot failure regarding KERN_VIRT_SIZE that is
undefined for nommu config

Alexandre Ghiti (6):
riscv: Fix is_linear_mapping with recent move of KASAN region
riscv: Fix config KASAN && SPARSEMEM && !SPARSE_VMEMMAP
riscv: Fix DEBUG_VIRTUAL false warnings
riscv: Fix config KASAN && DEBUG_VIRTUAL
riscv: Move high_memory initialization to setup_bootmem
riscv: Fix kasan pud population

arch/riscv/include/asm/page.h | 2 +-
arch/riscv/include/asm/pgtable.h | 1 +
arch/riscv/mm/Makefile | 3 +++
arch/riscv/mm/init.c | 2 +-
arch/riscv/mm/kasan_init.c | 8 +++++---
arch/riscv/mm/physaddr.c | 4 +---
6 files changed, 12 insertions(+), 8 deletions(-)

--
2.32.0

Alexandre Ghiti

Feb 25, 2022, 7:40:59 AM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Alexandre Ghiti, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
The KASAN region was recently moved between the linear mapping and the
kernel mapping. is_linear_mapping used to check the validity of an
address against the start of the kernel mapping, which is now wrong.

Fix this by bounding the check with the maximum size of the physical
memory instead (PAGE_OFFSET + KERN_VIRT_SIZE).

Fixes: f7ae02333d13 ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexand...@canonical.com>
---
arch/riscv/include/asm/page.h | 2 +-
arch/riscv/include/asm/pgtable.h | 1 +

Alexandre Ghiti

Feb 25, 2022, 7:42:01 AM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Alexandre Ghiti, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
Converting between a pfn and a struct page * when sparsemem is enabled
without vmemmap requires the mem_section structures to be initialized,
which happens in sparse_init.

But kasan_early_init calls pfn_to_page long before sparse_init runs, and
so ends up dereferencing a null mem_section pointer.

Fix this by no longer using pfn_to_page in kasan_early_init.

Fixes: 8ad8b72721d0 ("riscv: Add KASAN support")
Signed-off-by: Alexandre Ghiti <alexand...@canonical.com>
---

Alexandre Ghiti

Feb 25, 2022, 7:43:02 AM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Alexandre Ghiti, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
KERN_VIRT_SIZE used to encompass the kernel mapping, but it was redefined
to match only the maximum amount of physical memory when the KASAN mapping
was moved next to the kernel mapping.

As a result, kernel mapping addresses that go through __virt_to_phys are
now flagged as wrong, even though __virt_to_phys is valid for such
addresses.

Fix this by redefining the condition that matches wrong addresses.

Fixes: f7ae02333d13 ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexand...@canonical.com>
---

Alexandre Ghiti

Feb 25, 2022, 7:44:03 AM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Alexandre Ghiti, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
The __virt_to_phys function is called very early in the boot process
(i.e. from kasan_early_init), so it must not be instrumented by KASAN;
otherwise the kernel fails to boot.

Fix this by excluding physaddr.c from KASAN instrumentation.

Signed-off-by: Alexandre Ghiti <alexand...@canonical.com>
---

Alexandre Ghiti

Feb 25, 2022, 7:45:05 AM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Alexandre Ghiti, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
high_memory used to be initialized in mem_init, well after setup_bootmem.
But the call to dma_contiguous_reserve in setup_bootmem triggers the
warning below, because high_memory is still equal to 0 when it is used at
the very beginning of cma_declare_contiguous_nid.

This went unnoticed until the move of the KASAN region redefined
KERN_VIRT_SIZE so that it no longer encompasses -1.

Fix this by initializing high_memory in setup_bootmem.

------------[ cut here ]------------
virt_to_phys used for non-linear address: ffffffffffffffff (0xffffffffffffffff)
WARNING: CPU: 0 PID: 0 at arch/riscv/mm/physaddr.c:14 __virt_to_phys+0xac/0x1b8
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-rc1-00007-ga68b89289e26 #27
Hardware name: riscv-virtio,qemu (DT)
epc : __virt_to_phys+0xac/0x1b8
ra : __virt_to_phys+0xac/0x1b8
epc : ffffffff80014922 ra : ffffffff80014922 sp : ffffffff84a03c30
gp : ffffffff85866c80 tp : ffffffff84a3f180 t0 : ffffffff86bce657
t1 : fffffffef09406e8 t2 : 0000000000000000 s0 : ffffffff84a03c70
s1 : ffffffffffffffff a0 : 000000000000004f a1 : 00000000000f0000
a2 : 0000000000000002 a3 : ffffffff8011f408 a4 : 0000000000000000
a5 : 0000000000000000 a6 : 0000000000f00000 a7 : ffffffff84a03747
s2 : ffffffd800000000 s3 : ffffffff86ef4000 s4 : ffffffff8467f828
s5 : fffffff800000000 s6 : 8000000000006800 s7 : 0000000000000000
s8 : 0000000480000000 s9 : 0000000080038ea0 s10: 0000000000000000
s11: ffffffffffffffff t3 : ffffffff84a035c0 t4 : fffffffef09406e8
t5 : fffffffef09406e9 t6 : ffffffff84a03758
status: 0000000000000100 badaddr: 0000000000000000 cause: 0000000000000003
[<ffffffff8322ef4c>] cma_declare_contiguous_nid+0xf2/0x64a
[<ffffffff83212a58>] dma_contiguous_reserve_area+0x46/0xb4
[<ffffffff83212c3a>] dma_contiguous_reserve+0x174/0x18e
[<ffffffff83208fc2>] paging_init+0x12c/0x35e
[<ffffffff83206bd2>] setup_arch+0x120/0x74e
[<ffffffff83201416>] start_kernel+0xce/0x68c
irq event stamp: 0
hardirqs last enabled at (0): [<0000000000000000>] 0x0
hardirqs last disabled at (0): [<0000000000000000>] 0x0
softirqs last enabled at (0): [<0000000000000000>] 0x0
softirqs last disabled at (0): [<0000000000000000>] 0x0
---[ end trace 0000000000000000 ]---

Fixes: f7ae02333d13 ("riscv: Move KASAN mapping next to the kernel mapping")
Signed-off-by: Alexandre Ghiti <alexand...@canonical.com>
---
arch/riscv/mm/init.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index c27294128e18..0d588032d6e6 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -125,7 +125,6 @@ void __init mem_init(void)
 	else
 		swiotlb_force = SWIOTLB_NO_FORCE;
 #endif
-	high_memory = (void *)(__va(PFN_PHYS(max_low_pfn)));
 	memblock_free_all();

 	print_vm_layout();
@@ -195,6 +194,7 @@ static void __init setup_bootmem(void)

 	min_low_pfn = PFN_UP(phys_ram_base);
 	max_low_pfn = max_pfn = PFN_DOWN(phys_ram_end);
+	high_memory = (void *)(__va(PFN_PHYS(max_low_pfn)));

 	dma32_phys_limit = min(4UL * SZ_1G, (unsigned long)PFN_PHYS(max_low_pfn));
 	set_max_mapnr(max_low_pfn - ARCH_PFN_OFFSET);
--
2.32.0
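
For readers following the high_memory patch above: the warning reports virt_to_phys being handed 0xffffffffffffffff, which is what "high_memory - 1" evaluates to while high_memory is still 0. The userspace sketch below only mimics that arithmetic with plain integers. It assumes the CMA code looks at the last byte below high_memory; LINEAR_START, LINEAR_SIZE, is_linear_addr() and last_lowmem_byte() are made-up stand-ins for illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical linear-mapping window, chosen only for illustration. */
#define LINEAR_START 0xffffffd800000000ULL
#define LINEAR_SIZE  0x0000000080000000ULL /* pretend there are 2 GiB of RAM */

/* Stand-in for the range check behind the virt_to_phys warning. */
static bool is_linear_addr(uint64_t va)
{
	return va >= LINEAR_START && va - LINEAR_START < LINEAR_SIZE;
}

/* Last byte below high_memory; wraps to all ones when high_memory is 0. */
static uint64_t last_lowmem_byte(uint64_t high_memory)
{
	return high_memory - 1;
}

/* Compare an uninitialized high_memory with an initialized one. */
static void check_wraparound(void)
{
	/* high_memory not set yet: the address wraps and fails the check. */
	assert(last_lowmem_byte(0) == UINT64_MAX);
	assert(!is_linear_addr(last_lowmem_byte(0)));

	/* high_memory set to the end of RAM: the address is in range. */
	assert(is_linear_addr(last_lowmem_byte(LINEAR_START + LINEAR_SIZE)));
}
```

Calling check_wraparound() from a test program exercises both cases. Setting high_memory before dma_contiguous_reserve runs corresponds to the second case.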

Alexandre Ghiti

Feb 25, 2022, 7:46:05 AM
to Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Alexandre Ghiti, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
In sv48, the kasan inner regions are not aligned on PGDIR_SIZE, so when we
populate the kasan linear mapping region we clear the kasan vmalloc region,
which lies in the same PGD.

Fix this by copying the content of the kasan early pud after allocating a
new PUD table for a PGD entry for the first time.

Fixes: e8a62cc26ddf ("riscv: Implement sv48 support")
Signed-off-by: Alexandre Ghiti <alexand...@canonical.com>
---
arch/riscv/mm/kasan_init.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/riscv/mm/kasan_init.c b/arch/riscv/mm/kasan_init.c
index 85e849318389..cd1a145257b7 100644
--- a/arch/riscv/mm/kasan_init.c
+++ b/arch/riscv/mm/kasan_init.c
@@ -113,8 +113,11 @@ static void __init kasan_populate_pud(pgd_t *pgd,
 		base_pud = pt_ops.get_pud_virt(pfn_to_phys(_pgd_pfn(*pgd)));
 	} else {
 		base_pud = (pud_t *)pgd_page_vaddr(*pgd);
-		if (base_pud == lm_alias(kasan_early_shadow_pud))
+		if (base_pud == lm_alias(kasan_early_shadow_pud)) {
 			base_pud = memblock_alloc(PTRS_PER_PUD * sizeof(pud_t), PAGE_SIZE);
+			memcpy(base_pud, (void *)kasan_early_shadow_pud,
+			       sizeof(pud_t) * PTRS_PER_PUD);
+		}
 	}

 	pudp = base_pud + pud_index(vaddr);
--
2.32.0
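
The effect of the memcpy in the patch above can be shown with a toy userspace model. This is only a sketch under stated assumptions: early_table, alloc_private_table() and the 512-entry table size are illustrative names and values, and calloc stands in for a zeroing allocator. It is not the kernel's page-table code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define PTRS_PER_TABLE 512 /* entries in one table level (assumed) */

typedef uint64_t entry_t;

/* Shared early table that a top-level slot initially points at. */
static entry_t early_table[PTRS_PER_TABLE];

/*
 * Return a private, zero-initialized table to replace the shared one.
 * With copy_early == 0 this models the old behaviour: entries that a
 * neighbouring region relied on through the early table are lost.
 * With copy_early != 0 it models the fix: they are carried over.
 */
static entry_t *alloc_private_table(int copy_early)
{
	entry_t *table = calloc(PTRS_PER_TABLE, sizeof(*table));

	assert(table != NULL);
	if (copy_early)
		memcpy(table, early_table, sizeof(early_table));
	return table;
}
```

In this model, an entry set in early_table before the switch is visible afterwards only through a table allocated with copy_early set, which mirrors how the vmalloc shadow entries survive once the early pud is copied.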

Marco Elver

Feb 25, 2022, 8:06:20 AM
to Alexandre Ghiti, Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
On Fri, 25 Feb 2022 at 13:40, Alexandre Ghiti
<alexand...@canonical.com> wrote:
>
> As reported by Aleksandr, syzbot riscv is broken since commit
> 54c5639d8f50 ("riscv: Fix asan-stack clang build"). This commit actually
> breaks KASAN_INLINE which is not fixed in this series, that will come later
> when found.
>
> Nevertheless, this series fixes small things that made the syzbot
> configuration + KASAN_OUTLINE fail to boot.
>
> Note that even though the config at [1] boots fine with this series, I
> was not able to boot the small config at [2] which fails because
> kasan_poison receives a really weird address 0x4075706301000000 (maybe a
> kasan person could provide some hint about what happens below in
> do_ctors -> __asan_register_globals):

__asan_register_globals is responsible for poisoning redzones around
globals. As hinted by 'do_ctors', it calls constructors, and in this
case a compiler-generated constructor that calls
__asan_register_globals with metadata generated by the compiler. That
metadata contains information about global variables. Note, these
constructors are called on initial boot, but also every time a kernel
module (that has globals) is loaded.

It may also be a toolchain issue, but it's hard to say. If you're
using GCC to test, try Clang (11 or later), and vice-versa.
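
As a rough userspace analogy of the mechanism described above, the sketch below uses the GCC/Clang constructor attribute to run a registration function before main. Everything in it is hypothetical: struct global_meta, register_globals() and the 32-byte redzone are invented for illustration and do not reproduce the kernel's metadata layout or what the compiler really emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-global metadata, not the kernel's real structure. */
struct global_meta {
	const void *beg;          /* start address of the global */
	size_t size;              /* size of the object itself */
	size_t size_with_redzone; /* object plus its trailing redzone */
};

static int some_global[4];
static size_t registered_globals;
static size_t poisoned_bytes;

/* Stand-in for the registration hook: accounts for each redzone. */
static void register_globals(const struct global_meta *g, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		assert(g[i].size_with_redzone >= g[i].size);
		poisoned_bytes += g[i].size_with_redzone - g[i].size;
		registered_globals++;
	}
}

/* Roughly what a per-unit, compiler-generated constructor would do. */
__attribute__((constructor))
static void register_this_unit(void)
{
	static const struct global_meta meta[] = {
		{ some_global, sizeof(some_global), sizeof(some_global) + 32 },
	};

	register_globals(meta, sizeof(meta) / sizeof(meta[0]));
}
```

If the metadata handed to such a hook is corrupted, the poisoning step receives a bogus address, which is the kind of symptom reported at the top of the thread.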

Alexandre Ghiti

Feb 25, 2022, 9:04:32 AM
to Marco Elver, Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Alexander Potapenko, Andrey Konovalov, Dmitry Vyukov, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
I tried 3 different gcc toolchains already, but that did not fix the
issue. The only thing that worked was setting asan-globals=0 in
scripts/Makefile.kasan, but ok, that's not a fix.
I tried to bisect this issue but our kasan implementation has been
broken quite a few times, so it failed.

I'll keep digging!

Thanks for the tips,

Alex

Alexander Potapenko

Feb 25, 2022, 9:10:47 AM
to Alexandre Ghiti, Marco Elver, Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Andrey Konovalov, Dmitry Vyukov, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, LKML, kasan-dev
The problem does not reproduce for me with GCC 11.2.0: kernels built with both [1] and [2] are bootable.
FWIW here is how I run them:

qemu-system-riscv64 -m 2048 -smp 1 -nographic -no-reboot \
  -device virtio-rng-pci -machine virt -device \
  virtio-net-pci,netdev=net0 -netdev \
  user,id=net0,restrict=on,hostfwd=tcp:127.0.0.1:12529-:22 -device \
  virtio-blk-device,drive=hd0 -drive \
  file=${IMAGE},if=none,format=raw,id=hd0 -snapshot \
  -kernel ${KERNEL_SRC_DIR}/arch/riscv/boot/Image \
  -append "root=/dev/vda console=ttyS0 earlyprintk=serial"

 
--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Managing directors: Paul Manicle, Liana Sebastian
Registration court and number: Hamburg, HRB 86891
Registered office: Hamburg

This e-mail is confidential. If you received this communication by mistake, please don't forward it to anyone else, please erase all copies and attachments, and please let me know that it has gone to the wrong person.

Alexandre Ghiti

Feb 25, 2022, 9:15:43 AM
to Alexander Potapenko, Marco Elver, Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Andrey Konovalov, Dmitry Vyukov, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, LKML, kasan-dev
Do you mean you reach userspace? Because my image boots too, and fails
at some point:

[ 0.000150] sched_clock: 64 bits at 10MHz, resolution 100ns, wraps every 4398046511100ns
[ 0.015847] Console: colour dummy device 80x25
[ 0.016899] printk: console [tty0] enabled
[ 0.020326] printk: bootconsole [ns16550a0] disabled

It traps here.

Alexander Potapenko

Feb 25, 2022, 9:31:20 AM
to Alexandre Ghiti, Marco Elver, Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Andrey Konovalov, Dmitry Vyukov, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, LKML, kasan-dev
In my case, QEMU successfully boots to the login prompt.
I am running QEMU 6.2.0 (Debian 1:6.2+dfsg-2) and an image Aleksandr shared with me (guess it was built according to this instruction: https://github.com/google/syzkaller/blob/master/docs/linux/setup_linux-host_qemu-vm_riscv64-kernel.md)
 

Alexandre Ghiti

Feb 25, 2022, 9:46:59 AM
to Alexander Potapenko, Marco Elver, Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Andrey Konovalov, Dmitry Vyukov, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, LKML, kasan-dev
Nice, thanks guys! I always use the latest opensbi and not the one that
is embedded in qemu, which is the only difference between your command
line (which works) and mine (which does not work). So the issue is
probably there, I really need to investigate that now.

That means I only need to fix KASAN_INLINE and we're good.

I imagine Palmer can add your Tested-by on the series then?

Thanks again!

Alex

Alexander Potapenko

Feb 25, 2022, 10:01:01 AM
to Alexandre Ghiti, Marco Elver, Paul Walmsley, Palmer Dabbelt, Albert Ou, Andrey Ryabinin, Andrey Konovalov, Dmitry Vyukov, Aleksandr Nogikh, Nick Hu, linux...@lists.infradead.org, LKML, kasan-dev
Great to hear that!

> That means I only need to fix KASAN_INLINE and we're good.
>
> I imagine Palmer can add your Tested-by on the series then?

Sure :)

Palmer Dabbelt

Mar 1, 2022, 12:39:56 PM
to gli...@google.com, alexand...@canonical.com, el...@google.com, Paul Walmsley, a...@eecs.berkeley.edu, ryabin...@gmail.com, andre...@gmail.com, dvy...@google.com, nog...@google.com, nic...@andestech.com, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
Do you mind actually posting that (i.e., the Tested-by tag)? It's less
likely to get lost that way. I intend on taking this into fixes ASAP, but
my builds have blown up for some reason (I got bounced between machines,
so I'm blaming that), so I need to fix that first.

Palmer Dabbelt

Mar 3, 2022, 11:12:22 PM
to gli...@google.com, alexand...@canonical.com, el...@google.com, Paul Walmsley, a...@eecs.berkeley.edu, ryabin...@gmail.com, andre...@gmail.com, dvy...@google.com, nog...@google.com, nic...@andestech.com, linux...@lists.infradead.org, linux-...@vger.kernel.org, kasa...@googlegroups.com
This is on fixes (with a "Tested-by: Alexander Potapenko
<gli...@google.com>"), along with some trivial commit message fixes.

Thanks!

Aleksandr Nogikh

Mar 9, 2022, 5:45:47 AM
to Palmer Dabbelt, Alexander Potapenko, Alexandre Ghiti, Marco Elver, Paul Walmsley, Albert Ou, Andrey Ryabinin, Andrey Konovalov, Dmitry Vyukov, Nick Hu, linux...@lists.infradead.org, LKML, kasan-dev
I switched the riscv syzbot instance to KASAN_OUTLINE and now it is
finally being fuzzed again!

Thank you very much for the series!

--
Best Regards,
Aleksandr

Dmitry Vyukov

Mar 9, 2022, 5:52:37 AM
to Aleksandr Nogikh, Palmer Dabbelt, Alexander Potapenko, Alexandre Ghiti, Marco Elver, Paul Walmsley, Albert Ou, Andrey Ryabinin, Andrey Konovalov, Nick Hu, linux...@lists.infradead.org, LKML, kasan-dev
On Wed, 9 Mar 2022 at 11:45, Aleksandr Nogikh <nog...@google.com> wrote:
>
> I switched the riscv syzbot instance to KASAN_OUTLINE and now it is
> finally being fuzzed again!
>
> Thank you very much for the series!


But all riscv crashes are still classified as "corrupted" and thrown
away (not reported):
https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452

The problem is that riscv oopses don't contain "Call Trace:" at the
beginning of stack traces, so it's hard to make sense of them.
arch/riscv seems to print "Call Trace:" in the wrong function, not where
all other arches print it.
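
To illustrate why the position of the header matters, here is a toy check in C. It is explicitly not syzkaller's parser: trace_has_header() and its rule (the header must come before the first "[<" frame line) are assumptions made up for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Toy rule: a report is considered parseable only if "Call Trace:"
 * appears, and appears before the first "[<" frame line (if any).
 */
static bool trace_has_header(const char *report)
{
	const char *header;
	const char *frame;

	assert(report != NULL);
	header = strstr(report, "Call Trace:");
	frame = strstr(report, "[<");

	if (frame == NULL)
		return header != NULL;
	return header != NULL && header < frame;
}
```

Under this rule, a report whose frames are printed without a preceding header is rejected, which is the kind of outcome described above for the riscv oopses.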

Alexandre Ghiti

Mar 10, 2022, 3:42:14 AM
to Dmitry Vyukov, Aleksandr Nogikh, Palmer Dabbelt, Alexander Potapenko, Marco Elver, Paul Walmsley, Albert Ou, Andrey Ryabinin, Andrey Konovalov, Nick Hu, linux...@lists.infradead.org, LKML, kasan-dev
Hi,

On Wed, Mar 9, 2022 at 11:52 AM Dmitry Vyukov <dvy...@google.com> wrote:
>
> On Wed, 9 Mar 2022 at 11:45, Aleksandr Nogikh <nog...@google.com> wrote:
> >
> > I switched the riscv syzbot instance to KASAN_OUTLINE and now it is
> > finally being fuzzed again!
> >
> > Thank you very much for the series!
>
>
> But all riscv crashes are still classified as "corrupted" and thrown
> away (not reported):
> https://syzkaller.appspot.com/bug?id=d5bc3e0c66d200d72216ab343a67c4327e4a3452
>
> The problem is that riscv oopses don't contain "Call Trace:" at the
> beginning of stack traces, so it's hard to make sense of them.
> arch/riscv seems to print "Call Trace:" in the wrong function, not where
> all other arches print it.
>

Does the following diff fix this issue?

diff --git a/arch/riscv/kernel/stacktrace.c b/arch/riscv/kernel/stacktrace.c
index 201ee206fb57..348ca19ccbf8 100644
--- a/arch/riscv/kernel/stacktrace.c
+++ b/arch/riscv/kernel/stacktrace.c
@@ -109,12 +109,12 @@ static bool print_trace_address(void *arg, unsigned long pc)
 noinline void dump_backtrace(struct pt_regs *regs, struct task_struct *task,
 		    const char *loglvl)
 {
+	pr_cont("%sCall Trace:\n", loglvl);
 	walk_stackframe(task, regs, print_trace_address, (void *)loglvl);
 }

 void show_stack(struct task_struct *task, unsigned long *sp, const char *loglvl)
 {
-	pr_cont("%sCall Trace:\n", loglvl);
 	dump_backtrace(NULL, task, loglvl);
 }

Thanks,

Alex

Aleksandr Nogikh

Mar 24, 2022, 12:53:42 PM
to Alexandre Ghiti, Dmitry Vyukov, Palmer Dabbelt, Alexander Potapenko, Marco Elver, Paul Walmsley, Albert Ou, Andrey Ryabinin, Andrey Konovalov, Nick Hu, linux...@lists.infradead.org, LKML, kasan-dev
I wouldn't say that all riscv crashes are ending up in the "corrupted
report" bucket, but for some classes of errors there are definitely
differences from other architectures and they prevent syzkaller from
making sense out of those reports. At the moment everything seems to
be working fine at least with "WARNING:", "KASAN:" and "kernel
panic:".

I've run syzkaller with and without the small patch. From what I
observed, it definitely helps with the "BUG: soft lockup in" class of
reports. Previously they were declared corrupted, now syzkaller parses
them normally.

There's still a problem with "INFO: rcu_preempt detected stalls on
CPUs/tasks", which might be a bit more complicated than just the Call
Trace printing location.

Here's an example of such a report from x86: https://pastebin.com/KMEE5YRf
It starts with a header titled "rcu: INFO: rcu_preempt detected stalls
on CPUs/tasks:"
(https://elixir.bootlin.com/linux/v5.17/source/kernel/rcu/tree_stall.h#L520),
followed by a backtrace for one CPU
(https://elixir.bootlin.com/linux/v5.17/source/kernel/rcu/tree_stall.h#L331),
then another error message about a starving kthread
(https://elixir.bootlin.com/linux/v5.17/source/kernel/rcu/tree_stall.h#L442),
and finally two kthread-related traces.

And here's a report from riscv: https://pastebin.com/pN4rUjSi
There's de facto no backtrace between "rcu: INFO: rcu_preempt detected
stalls on CPUs/tasks:" and "rcu: RCU grace-period kthread stack
dump:".