Trying to understand sepc register when program segfaults

34 views
Skip to first unread message

Richard W.M. Jones

unread,
Dec 27, 2017, 8:07:47 AM12/27/17
to sw-...@groups.riscv.org
I'm still trying to get to the bottom of the ebreak problem ...
Unfortunately I've been completely unable to force the process
to write a core dump, and I don't have gdb in the stage 3 environment.

However I can see the registers printed by the kernel:

Oops - illegal instruction [#4]
CPU: 0 PID: 72 Comm: rpmbuild Tainted: G D W 4.15.0-rc3-01064-g418457520fbb #8
sepc: ffffffe07d714eb8 ra : ffffffe07d714eb8 sp : ffffffe07cf39ac0
gp : ffffffe0002bf538 tp : ffffffe07dc7e580 t0 : ffffffe0000192f4
t1 : ffffffe000213508 t2 : ffffffe000213588 s0 : 0000000000000001
s1 : 00000000003e8405 a0 : ffffffe07cf39b98 a1 : 00000000003e8405
a2 : 0000000000001000 a3 : 0000000000000008 a4 : ffffffe07d624360
a5 : ffffffffffffffff a6 : 0000000000000fa0 a7 : 0000000000000000
s2 : 0000000000000010 s3 : 0000000000000010 s4 : ffffffe07e1fcfa0
s5 : 0000000000000055 s6 : 0000000000200000 s7 : 0000000000000040
s8 : 0000000000000008 s9 : 0000000000000010 s10: ffffffe07e1fcfa0
s11: 0000000000000055 t3 : 00000020000d3910 t4 : 0000000000000002
t5 : 0000000000000037 t6 : 0000000000000000
sstatus: 0000000000000100 sbadaddr: 00000000000081a4 scause: 0000000000000002
---[ end trace d80e35992f1aebc7 ]---

and the /proc/PID/maps of the corresponding process just before it
segfaults (see attached).

Unfortunately sepc makes no sense. The priv spec says that on a trap
into S-mode the virtual address should be written to sepc, but
sepc = ffffffe07d714eb8 doesn't map to any address in the map. In
fact it makes no sense at all even if I mask off upper bits.

I suppose this might be a qemu bug? Or is there something obvious
that I'm missing?

Unfortunately spike can no longer emulate disks
("--disk/+disk unsupported (use a ramdisk)") which makes it very
difficult to try to reproduce this in spike.

Rich.

00010000-00015000 r-xp 00000000 fe:00 1014765 /usr/bin/rpmbuild
00015000-00016000 r--p 00004000 fe:00 1014765 /usr/bin/rpmbuild
00016000-00017000 rw-p 00005000 fe:00 1014765 /usr/bin/rpmbuild
00017000-00059000 rw-p 00000000 00:00 0 [heap]
2000000000-2000017000 r-xp 00000000 fe:00 900943 /usr/lib/ld-2.26.so
2000018000-2000019000 r--p 00017000 fe:00 900943 /usr/lib/ld-2.26.so
2000019000-200001b000 rw-p 00018000 fe:00 900943 /usr/lib/ld-2.26.so
200001b000-200001d000 r-xp 00000000 00:00 0 [vdso]
200001d000-200001f000 rw-p 00000000 00:00 0
2000022000-2000049000 r-xp 00000000 fe:00 883454 /usr/lib64/librpmbuild.so.7.0.0
2000049000-200004b000 r--p 00026000 fe:00 883454 /usr/lib64/librpmbuild.so.7.0.0
200004b000-200004c000 rw-p 00028000 fe:00 883454 /usr/lib64/librpmbuild.so.7.0.0
200004c000-200004d000 rw-p 00000000 00:00 0
200004d000-20001fc000 r-xp 00000000 fe:00 885665 /usr/lib64/librpm.so.7.0.0
20001fc000-20001fd000 ---p 001af000 fe:00 885665 /usr/lib64/librpm.so.7.0.0
20001fd000-2000201000 r--p 001af000 fe:00 885665 /usr/lib64/librpm.so.7.0.0
2000201000-2000208000 rw-p 001b3000 fe:00 885665 /usr/lib64/librpm.so.7.0.0
2000208000-2000209000 rw-p 00000000 00:00 0
2000209000-2000221000 r-xp 00000000 fe:00 883921 /usr/lib64/libmagic.so.1.0.0
2000221000-2000222000 ---p 00018000 fe:00 883921 /usr/lib64/libmagic.so.1.0.0
2000222000-2000223000 r--p 00018000 fe:00 883921 /usr/lib64/libmagic.so.1.0.0
2000223000-2000224000 rw-p 00019000 fe:00 883921 /usr/lib64/libmagic.so.1.0.0
2000224000-200025e000 r-xp 00000000 fe:00 883052 /usr/lib64/libdw-0.170.so
200025e000-200025f000 ---p 0003a000 fe:00 883052 /usr/lib64/libdw-0.170.so
200025f000-2000261000 r--p 0003a000 fe:00 883052 /usr/lib64/libdw-0.170.so
2000261000-2000263000 rw-p 0003c000 fe:00 883052 /usr/lib64/libdw-0.170.so
2000263000-20002b9000 r-xp 00000000 fe:00 883901 /usr/lib64/librpmio.so.7.0.0
20002b9000-20002bc000 r--p 00055000 fe:00 883901 /usr/lib64/librpmio.so.7.0.0
20002bc000-20002bf000 rw-p 00058000 fe:00 883901 /usr/lib64/librpmio.so.7.0.0
20002bf000-20002c3000 rw-p 00000000 00:00 0
20002c3000-20002d5000 r-xp 00000000 fe:00 883804 /usr/lib64/libelf-0.170.so
20002d5000-20002d6000 r--p 00011000 fe:00 883804 /usr/lib64/libelf-0.170.so
20002d6000-20002d7000 rw-p 00012000 fe:00 883804 /usr/lib64/libelf-0.170.so
20002d7000-20002f3000 r-xp 00000000 fe:00 885699 /usr/lib64/liblzma.so.5.2.2
20002f3000-20002f4000 r--p 0001b000 fe:00 885699 /usr/lib64/liblzma.so.5.2.2
20002f4000-20002f5000 rw-p 0001c000 fe:00 885699 /usr/lib64/liblzma.so.5.2.2
20002f5000-2000389000 r-xp 00000000 fe:00 901582 /usr/lib/libm-2.26.so
2000389000-200038a000 r--p 00093000 fe:00 901582 /usr/lib/libm-2.26.so
200038a000-200038b000 rw-p 00094000 fe:00 901582 /usr/lib/libm-2.26.so
200038b000-20003ba000 r-xp 00000000 fe:00 883443 /usr/lib64/libbeecrypt.so.7.0.0
20003ba000-20003bb000 ---p 0002f000 fe:00 883443 /usr/lib64/libbeecrypt.so.7.0.0
20003bb000-20003bc000 r--p 0002f000 fe:00 883443 /usr/lib64/libbeecrypt.so.7.0.0
20003bc000-20003bf000 rw-p 00030000 fe:00 883443 /usr/lib64/libbeecrypt.so.7.0.0
20003bf000-20003c7000 r-xp 00000000 fe:00 885676 /usr/lib64/libpopt.so.0.0.0
20003c7000-20003c8000 r--p 00007000 fe:00 885676 /usr/lib64/libpopt.so.0.0.0
20003c8000-20003c9000 rw-p 00008000 fe:00 885676 /usr/lib64/libpopt.so.0.0.0
20003c9000-20003e4000 r-xp 00000000 fe:00 885608 /usr/lib64/libz.so.1.2.11
20003e4000-20003e5000 ---p 0001b000 fe:00 885608 /usr/lib64/libz.so.1.2.11
20003e5000-20003e6000 r--p 0001b000 fe:00 885608 /usr/lib64/libz.so.1.2.11
20003e6000-20003e7000 rw-p 0001c000 fe:00 885608 /usr/lib64/libz.so.1.2.11
20003e7000-20003e9000 rw-p 00000000 00:00 0
20003e9000-20003eb000 r-xp 00000000 fe:00 901546 /usr/lib/libdl-2.26.so
20003eb000-20003ec000 r--p 00001000 fe:00 901546 /usr/lib/libdl-2.26.so
20003ec000-20003ed000 rw-p 00002000 fe:00 901546 /usr/lib/libdl-2.26.so
20003ed000-2000400000 r-xp 00000000 fe:00 901592 /usr/lib/libpthread-2.26.so
2000400000-2000401000 r--p 00012000 fe:00 901592 /usr/lib/libpthread-2.26.so
2000401000-2000402000 rw-p 00013000 fe:00 901592 /usr/lib/libpthread-2.26.so
2000402000-2000406000 rw-p 00000000 00:00 0
2000406000-2000502000 r-xp 00000000 fe:00 901574 /usr/lib/libc-2.26.so
2000502000-2000506000 r--p 000fb000 fe:00 901574 /usr/lib/libc-2.26.so
2000506000-2000508000 rw-p 000ff000 fe:00 901574 /usr/lib/libc-2.26.so
2000508000-200050c000 rw-p 00000000 00:00 0
200050c000-2000511000 r-xp 00000000 fe:00 900942 /usr/lib/librt-2.26.so
2000511000-2000512000 r--p 00004000 fe:00 900942 /usr/lib/librt-2.26.so
2000512000-2000513000 rw-p 00005000 fe:00 900942 /usr/lib/librt-2.26.so
2000513000-2000517000 rw-p 00000000 00:00 0
2000517000-200051e000 r-xp 00000000 fe:00 901583 /usr/lib/libnss_files-2.26.so
200051e000-200051f000 ---p 00007000 fe:00 901583 /usr/lib/libnss_files-2.26.so
200051f000-2000520000 r--p 00007000 fe:00 901583 /usr/lib/libnss_files-2.26.so
2000520000-2000521000 rw-p 00008000 fe:00 901583 /usr/lib/libnss_files-2.26.so
2000521000-2000527000 rw-p 00000000 00:00 0
3fffac8000-3fffae9000 rw-p 00000000 00:00 0 [stack]


--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-builder quickly builds VMs from scratch
http://libguestfs.org/virt-builder.1.html

Cesar Eduardo Barros

unread,
Dec 27, 2017, 5:30:43 PM12/27/17
to Richard W.M. Jones, sw-...@groups.riscv.org
Sorry if I'm being dense, but aren't negative addresses in "kernel space"?

I first noticed the sp register, which is negative. To me, that
indicates a stack within the kernel, which would only make sense if the
code was running within the kernel. The gp and tp are also negative.
Several other registers are negative, probably all pointing to data
structures and/or code within the kernel.

Finally, sepc and ra are identical, so my guess is that this happened
immediately after a return from a subroutine. Perhaps the kernel
returned from a "noreturn" function?

Finally, the sstatus. Unless I'm misreading the bit fields, it has only
SPP=1, which if I understood correctly means "trap from S-mode".

Taking it all together, my guess is that this time it was the kernel, so
you should be looking at the kernel map, not at the process map. If you
have the kernel boot log saved somewhere, you might be able to at least
check the memory map to see if my guess makes sense (kernel code being
mapped somewhere around ffffffe07...)

--
Cesar Eduardo Barros
ces...@cesarb.eti.br

Richard W.M. Jones

unread,
Dec 27, 2017, 6:42:45 PM12/27/17
to Cesar Eduardo Barros, sw-...@groups.riscv.org
Thanks, this is news to me. Is there some documentation about how
Linux physical and virtual address space is used on RISC-V?

> I first noticed the sp register, which is negative. To me, that
> indicates a stack within the kernel, which would only make sense if
> the code was running within the kernel. The gp and tp are also
> negative. Several other registers are negative, probably all
> pointing to data structures and/or code within the kernel.
>
> Finally, sepc and ra are identical, so my guess is that this
> happened immediately after a return from a subroutine. Perhaps the
> kernel returned from a "noreturn" function?
>
> Finally, the sstatus. Unless I'm misreading the bit fields, it has
> only SPP=1, which if I understood correctly means "trap from
> S-mode".
>
> Taking it all together, my guess is that this time it was the
> kernel, so you should be looking at the kernel map, not at the
> process map. If you have the kernel boot log saved somewhere, you
> might be able to at least check the memory map to see if my guess
> makes sense (kernel code being mapped somewhere around ffffffe07...)

So since I posted this morning I've spent a good day looking at
this problem.

(As you essentially say above) what's happening is that something runs
EBREAK which enters qemu, then qemu returns to the guest in S-mode,
and the kernel doesn't know how to handle the exception so it prints
the above message. By the time the message above is printed all the
useful information about the original EBREAK has been lost.

My first approach was to modify qemu to dump the registers when it
hits the breakpoint, and from that I was able to get the actual PC
where the breakpoint occurs.

After searching every bit of code in the image, nothing contained
an EBREAK insn at the right offset.

But now I know that negative addresses are kernel addresses I'm going
to look in the kernel instead of userspace ...

Thanks,

Rich.

Richard W.M. Jones

unread,
Dec 27, 2017, 6:48:55 PM12/27/17
to Cesar Eduardo Barros, sw-...@groups.riscv.org
On Wed, Dec 27, 2017 at 11:42:41PM +0000, Richard W.M. Jones wrote:
> But now I know that negative addresses are kernel addresses I'm going
> to look in the kernel instead of userspace ...

So it turns out to be this BUG_ON in fs/buffer.c:

static inline void check_irqs_on(void)
{
#ifdef irqs_disabled
BUG_ON(irqs_disabled());
#endif
}

called from __find_get_block.

I'll need to look at this some more tomorrow.

Thanks again,

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
libguestfs lets you edit virtual machines. Supports shell scripting,
bindings from many languages. http://libguestfs.org

Cesar Eduardo Barros

unread,
Dec 27, 2017, 7:47:17 PM12/27/17
to Richard W.M. Jones, sw-...@groups.riscv.org
None that I know of, sorry. The best general documentation I could find
with a quick search was Documentation/vm/highmem.txt
(https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/vm/highmem.txt)
and chapter 15 of LDD (https://lwn.net/Kernel/LDD3/).

For 32-bit ARM, there's some documentation of the memory mappings at
Documentation/arm/memory.txt; for 64-bit ARM, at
Documentation/arm64/memory.txt; and for x86-64, at
Documentation/x86/x86_64/mm.txt.

It seems to be a common approach for operating systems to have the user
space mapped at the bottom of the virtual address space, and the kernel
space mapped at the top of the virtual address space. For 32-bit, either
2G:2G or 3G:1G splits are common; for 64-bit, the usual split is at the
middle, so when interpreted as signed numbers, the kernel is at negative
virtual addresses. There are also "splits" where the kernel and user
space are not mapped at the same time, either to have a larger virtual
address space (4G:4G split on 32-bit) or for extra security.

On Linux, the physical address space, or part of it, is permanently
mapped somewhere in the kernel part of the virtual address space.
There's also a "window" somewhere in the kernel part of the virtual
address space where the rest of the physical address space can be
temporarily mapped.

Richard W.M. Jones

unread,
Dec 28, 2017, 7:45:42 AM12/28/17
to Cesar Eduardo Barros, sw-...@groups.riscv.org
On Wed, Dec 27, 2017 at 11:48:52PM +0000, Richard W.M. Jones wrote:
> On Wed, Dec 27, 2017 at 11:42:41PM +0000, Richard W.M. Jones wrote:
> > But now I know that negative addresses are kernel addresses I'm going
> > to look in the kernel instead of userspace ...
>
> So it turns out to be this BUG_ON in fs/buffer.c:
>
> static inline void check_irqs_on(void)
> {
> #ifdef irqs_disabled
> BUG_ON(irqs_disabled());
> #endif
> }
>
> called from __find_get_block.
>
> I'll need to look at this some more tomorrow.

I changed BUG_ON -> WARN_ON to turn this into a warning and the
stack trace is:

WARNING: CPU: 0 PID: 46 at fs/buffer.c:1239 __find_get_block+0x17e/0x200
CPU: 0 PID: 46 Comm: rpm Tainted: G W 4.15.0-rc3-01065-g8872029562dc #9
Call Trace:
[<00000000ea551812>] walk_stackframe+0x0/0xa2
[<00000000414d6864>] show_stack+0x26/0x34
[<00000000403e7850>] dump_stack+0x20/0x2c
[<00000000955f824e>] __warn+0xc8/0xde
[<0000000010d2b4eb>] warn_slowpath_null+0x2c/0x3e
[<00000000a983dd7f>] __find_get_block+0x17a/0x200
[<0000000055329e12>] __getblk_gfp+0x26/0x1f6
[<000000000a1e8d7e>] __ext4_get_inode_loc+0xb8/0x2fc
[<00000000c6098694>] ext4_reserve_inode_write+0x38/0x98
[<00000000d5f8d023>] ext4_mark_inode_dirty+0x22/0xdc
[<00000000749686df>] ext4_dirty_inode+0x32/0x58
[<0000000009658672>] __mark_inode_dirty+0x24/0x176
[<00000000d7aaba94>] generic_update_time+0x56/0x76
[<0000000035468406>] file_update_time+0x7e/0xa8
[<0000000013fac338>] ext4_page_mkwrite+0x48/0x3cc
[<0000000036347b09>] do_page_mkwrite+0x22/0x84
[<0000000092e57baf>] do_wp_page+0x16e/0x342
[<000000000fb54aaf>] __handle_mm_fault+0x126/0x76c
[<00000000b2681717>] handle_mm_fault+0x24/0x30
[<000000008f939674>] do_page_fault+0xc8/0x2b8
[<00000000350ce1bc>] ret_from_syscall+0xa/0xe

Reading around this it seems as if it can be caused by *any* other
part of the kernel not restoring the interrupt state properly. For
example here's a bug report where FPU emulation code (on x86) was
disabling and not enabling interrupts which caused the same stack
trace.

https://lkml.org/lkml/2014/1/19/112

Oh well. The kernel works fine (with lots of warnings) now ...

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming and virtualization blog: http://rwmj.wordpress.com
virt-df lists disk usage of guests without needing to install any
software inside the virtual machine. Supports Linux and Windows.
http://people.redhat.com/~rjones/virt-df/

Cesar Eduardo Barros

unread,
Dec 28, 2017, 8:41:17 AM12/28/17
to Richard W.M. Jones, sw-...@groups.riscv.org
You might think it's working, but this BUG_ON exists because the code
first disables and then enables interrupts unconditionally (on SMP), so
now it's enabling interrupts when the caller had left them disabled.

My educated guess: since most of the code in this backtrace is
arch-indepentent, the first place to look would be the arch-dependent
do_page_fault function. It has the following code near the beginning, if
I'm looking at the right copy of the riscv linux code:

/* Enable interrupts if they were enabled in the parent context. */
if (likely(regs->sstatus & SR_PIE))
local_irq_enable();

So either riscv's do_page_fault has a bug, or something tried to write
to a copy-on-write page while interrupts were disabled.
Reply all
Reply to author
Forward
0 new messages