Debug GPF for x86

176 views
Skip to first unread message

Brennan Ashton

unread,
Oct 25, 2018, 3:56:32 PM10/25/18
to nu...@googlegroups.com
I'm trying to debug some issues with the QEMU-i486 port.  I get:

up_assert: Assertion failed at file:irq/irq_unexpectedisr.c line: 65 task: Idle Task

with irq 13 which is the General Protection Fault.   On ARM the PC is stored in the general registers, but my understanding is that this is not true for x86.  Before I dig in too far, does anyone have any hints on determining the access the caused the fault.

up_registerdump:  ds:00fe0010 irq:0000000d err:00000064
up_registerdump: edi:00000064 esi:0011f160 ebp:0011f160 esp:0011d5b0
up_registerdump: ebx:00000064 edx:00000000 ecx:00000000 eax:00fea68a
up_registerdump: eip:00000460  cs:00000008 flg:00000082  sp:00000064 ss:00000064

--Brennan

Gregory Nutt

unread,
Oct 25, 2018, 4:02:12 PM10/25/18
to nu...@googlegroups.com
No stack dump?  The registers are helpful, but you can usually analyze
the stack to figure out what happened.

Do you have CONFIG_ARCH_STACKDUMP=y?

Tips for analyzing the ARMv7-M stack are here:
http://www.nuttx.org/doku.php?id=wiki:howtos:cortexm-hardfault . x86
would, of course, be a little different but the general approach should
still apply.

Brennan Ashton

unread,
Oct 25, 2018, 4:06:53 PM10/25/18
to nu...@googlegroups.com
Sure, I have done that many times when working on ARM, but I am unclear how to extract the PC from the IA32 registers. 

Regardless this is the full dump:
Booting from ROM..
NuttShell (NSH)
nsh> irq_unexpected_isr: ERROR irq: 13

up_assert: Assertion failed at file:irq/irq_unexpectedisr.c line: 65 task: Idle Task
up_dumpstate: sp:         001204f8
up_dumpstate: stack base: 00120e18
up_dumpstate: stack size: 00005000
up_stackdump: 001204e0: 001204f4 00120510 00000000 00000000 001208a4 00120510 00000001
up_stackdump: 00120500: 001208a4 00000212 001208a4 00000212 001208a4 00000212 001208a2
up_stackdump: 00120520: 00000064 00115a9a 0011dbc8 00000041 0011dbe0 0000000d 00000000
up_stackdump: 00120540: 00000000 00115a4a 0000000d 0012058c 00000000 00000000 00000000
up_stackdump: 00120560: 00000000 00105b43 0000000d 0012058c 00000000 00000000 00000000
up_stackdump: 00120580: 00000000 00105ab2 0012058c 00fe0010 00000064 00000212 00122160
up_stackdump: 001205a0: 001208a4 00000000 00000000 00fea68a 0000000d 000008a4 00000468
up_stackdump: 001205c0: 00000082 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4
up_stackdump: 001205e0: 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4
up_stackdump: 00120600: 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4
up_stackdump: 00120620: 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4
up_stackdump: 00120640: 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4
up_stackdump: 00120660: 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4
up_stackdump: 00120680: 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4
up_stackdump: 001206a0: 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4
up_stackdump: 001206c0: 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4
up_stackdump: 001206e0: 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4
up_stackdump: 00120700: 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4
up_stackdump: 00120720: 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4
up_stackdump: 00120740: 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4
up_stackdump: 00120760: 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4
up_stackdump: 00120780: 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4 001208a4
up_stackdump: 001207a0: 001208a4 001208a4 001208a4 001208a4 00000000 00100981 00122160
up_stackdump: 001207c0: 00000064 0011c4c8 00000000 00000064 00000001 001004c0 00122168
up_stackdump: 001207e0: 00000000 001208a4 00000003 0011f7e8 00120814 0011c4c8 00000002
up_stackdump: 00120800: 0011c551 00009500 00000000 00130000 00000000 0010036b 00005000
up_stackdump: 00120820: 00000000 00105cb0 00000000 00000002 000001a4 00104cfb 00000000
up_stackdump: 00120840: 00009500 001002d2 00000002 00000006 00000000 00100265 00000000
up_stackdump: 00120860: 00120e1c 000df1e4 00000000 00000000 00000000 0010001d 00009502
up_stackdump: 00120880: 00000000 00000000 00122160 00122160 0011f100 0011f100 00000000
up_stackdump: 001208a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000
up_stackdump: 001208c0: 0011f100 00000000 00122160 00000001 00000000 0000ffff 0000000f
up_stackdump: 001208e0: 00000000 0000ffff 00000000 0000ffff 00000000 0000ffff 0000000f
up_stackdump: 00120900: 00000000 0000ffff 00000000 0000ffff 00000000 0000ffff 0000000f
up_stackdump: 00120920: 00000000 0000ffff 00000000 0000ffff 00000000 0000ffff 0000000f
up_stackdump: 00120940: 00000000 0000ffff 00000000 0000ffff 00000000 0000ffff 0000000f
up_stackdump: 00120960: 00000000 0000ffff 00000000 0000ffff 00000000 0000ffff 0000000f
up_stackdump: 00120980: 00000000 0000ffff 00000000 0000ffff 00000000 0000ffff 0000000f
up_stackdump: 001209a0: 00000000 0000ffff 00000000 0000ffff 00000000 0000ffff 0000000f
up_stackdump: 001209c0: 00000000 0000ffff 00000000 0000ffff 00000000 0000ffff 0000000f
up_stackdump: 001209e0: 00000000 0000ffff 00000000 0000ffff 00000000 0000ffff 0000000f
up_stackdump: 00120a00: 00000000 0000ffff 00000000 0000ffff 00000000 0000ffff 0000000f
up_stackdump: 00120a20: 00000000 0000ffff 00000000 0000ffff 00000000 0000ffff 0000000f
up_stackdump: 00120a40: 00000000 0000ffff 00000000 0000ffff 00000000 0000ffff 0000000f
up_stackdump: 00120a60: 00000000 0000ffff 00000000 0000ffff 00000000 0000ffff 0000000f
up_stackdump: 00120a80: 00000000 0000ffff 00000000 0000ffff 00000000 0000ffff 0000000f
up_stackdump: 00120aa0: 00000000 0000ffff 00000000 0000ffff 00000000 0000ffff 0000000f
up_stackdump: 00120ac0: 00000000 00000000 00000001 00000000 00000000 00121090 001211b0
up_stackdump: 00120ae0: 00120f98 00120fc0 00121068 001212f0 00121360 001211e0 001212d0
up_stackdump: 00120b00: 0011f5c0 00000020 00000000 00000000 00000000 00000000 00000000
up_stackdump: 00120b20: ffff0001 00000000 000df1e0 00120e20 001ffff8 00000000 00000004
up_stackdump: 00120b40: 00000000 00000000 00000000 00120b54 00120b34 00000000 00000004
up_stackdump: 00120b60: 00120b44 00000000 00000000 00120b74 00120b54 00000000 00000004
up_stackdump: 00120b80: 00120b64 00000000 00000000 00120b94 00120b74 00000000 00000004
up_stackdump: 00120ba0: 00120b84 00000000 00000000 00120bb4 00120b94 00000000 00000004
up_stackdump: 00120bc0: 00120ba4 00000000 00000000 00120bd4 00120bb4 00000000 00000004
up_stackdump: 00120be0: 00120bc4 00000000 00000000 00120bf4 00120bd4 00000000 00000004
up_stackdump: 00120c00: 00120be4 00000000 00000000 00120c14 00120bf4 00000000 00000004
up_stackdump: 00120c20: 00120c04 00000000 00000000 00127a78 00120c14 00000000 00000004
up_stackdump: 00120c40: 00127a78 00000000 00000000 00120c54 00120c34 00000000 00000000
up_stackdump: 00120c60: 00120c44 0012058c 00000001 4d6eda00 00000000 00000000 00000000
up_stackdump: 00120c80: 00115a70 00000000 00115a70 00000000 00115a70 00000000 00115a70
up_stackdump: 00120ca0: 00102710 0011f000 00115a70 00000000 00115a70 00000000 00115a70
up_stackdump: 00120cc0: 00115a70 00000000 00115a70 00000000 00115a70 00000000 00115a70
up_stackdump: 00120ce0: 00115a70 00000000 00115a70 00000000 00115a70 00000000 00115a70
up_stackdump: 00120d00: 00115a70 00000000 00115a70 00000000 00115a70 00000000 00115a70
up_stackdump: 00120d20: 00115a70 00000000 00115a70 00000000 00115a70 00000000 00115a70
up_stackdump: 00120d40: 00115a70 00000000 00115a70 00000000 00115a70 00000000 00115a70
up_stackdump: 00120d60: 00115a70 00000000 00115a70 00000000 00115a70 00000000 00115a70
up_stackdump: 00120d80: 00105be0 00000000 00115a70 00000000 00115a70 00000000 00115a70
up_stackdump: 00120da0: 00115a70 00000000 00115a70 00000000 00115a70 00000000 00115a70
up_stackdump: 00120dc0: 00115a70 00000000 00115a70 00000000 00115a70 00000000 00115a70
up_stackdump: 00120de0: 00115a70 00000000 00115a70 00000000 00115a70 00000000 00115a70
up_stackdump: 00120e00: 00121380 00121858 00121890 001219a8 001219e4 00121af8 00000004
up_registerdump:  ds:00fe0010 irq:0000000d err:000008a4
up_registerdump: edi:00000064 esi:00000212 ebp:00122160 esp:001205b0
up_registerdump: ebx:001208a4 edx:00000000 ecx:00000000 eax:00fea68a
up_registerdump: eip:00000460  cs:00000008 flg:00000082  sp:001208a4 ss:001208a4


--
You received this message because you are subscribed to the Google Groups "NuttX" group.
To unsubscribe from this group and stop receiving emails from it, send an email to nuttx+un...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Gregory Nutt

unread,
Oct 25, 2018, 4:12:19 PM10/25/18
to nu...@googlegroups.com

> Sure, I have done that many times when working on ARM, but I am
> unclear how to extract the PC from the IA32 registers.
> up_registerdump:  ds:00fe0010 irq:0000000d err:000008a4
> up_registerdump: edi:00000064 esi:00000212 ebp:00122160 esp:001205b0
> up_registerdump: ebx:001208a4 edx:00000000 ecx:00000000 eax:00fea68a
> up_registerdump: eip:00000460  cs:00000008 flg:00000082 sp:001208a4
> ss:001208a4

Oh, okay.  I x86 it is not called a PC, but an instruction pointer. You
want EIP.

Brennan Ashton

unread,
Oct 25, 2018, 4:40:19 PM10/25/18
to nu...@googlegroups.com
I looked at that register, but that address does not make sense, it is too low, I would expect everything to be above 0x00100000.   I guess that would explain the fault though because it would try to execute out of invalid memory.  The stack should be plenty large.

When I disable compiler optimizations I don't get the exception.

Brennan Ashton

unread,
Oct 25, 2018, 6:59:25 PM10/25/18
to nu...@googlegroups.com
Still trying to sort this one out, but I did find a cool feature in QEMU that has made this a lot easier.  If you turn on the in_asm debug flag on QEMU you can get a very verbose dump of the assembly that is being run with symbol context.  That lets you look for the call to up_assert and walk back from there.  In this case things had gone off the rails for a while, so the instruction at 0x00000460 was really not very helpful, but with this trace it is very easy to see that it was the function up_block_task that caused the processor to start executing garbage at 0x00000212


----------------
IN: up_block_task
0x00113920:  83 ec 0c                 subl     $0xc, %esp
0x00113923:  57                       pushl    %edi
0x00113924:  e8 a8 05 ff ff           calll    0x103ed1

----------------
IN: up_block_task
0x00113929:  83 c4 10                 addl     $0x10, %esp
0x0011392c:  85 c0                    testl    %eax, %eax
0x0011392e:  75 e1                    jne      0x113911

----------------
IN: up_block_task
0x00113930:  a1 b0 cb 11 00           movl     0x11cbb0, %eax
0x00113935:  83 ec 0c                 subl     $0xc, %esp
0x00113938:  83 c0 78                 addl     $0x78, %eax
0x0011393b:  50                       pushl    %eax
0x0011393c:  e8 c4 05 ff ff           calll    0x103f05

----------------
IN:
0x00000212:  00 f0                    addb     %dh, %al
0x00000214:  53                       pushl    %ebx
0x00000215:  ff 00                    incl     0(%eax)
0x00000217:  f0                       .byte    0xf0
0x00000218:  53                       pushl    %ebx
0x00000219:  ff 00                    incl     0(%eax)
0x0000021b:  f0                       .byte    0xf0
0x0000021c:  53                       pushl    %ebx
0x0000021d:  ff 00                    incl     0(%eax)
0x0000021f:  f0                       .byte    0xf0
0x00000220:  53                       pushl    %ebx
0x00000221:  ff 00                    incl     0(%eax)

<<<<Lots more garbage>>>>>

----------------
IN:
0x00000441:  00 00                    addb     %al, 0(%eax)
0x00000443:  00 00                    addb     %al, 0(%eax)
0x00000445:  00 00                    addb     %al, 0(%eax)
0x00000447:  00 00                    addb     %al, 0(%eax)
0x00000449:  03 50 00                 addl     0(%eax), %edx
0x0000044c:  00 10                    addb     %dl, 0(%eax)
0x0000044e:  00 00                    addb     %al, 0(%eax)
0x00000450:  00 08                    addb     %cl, 0(%eax)
0x00000452:  00 00                    addb     %al, 0(%eax)
0x00000454:  00 00                    addb     %al, 0(%eax)
0x00000456:  00 00                    addb     %al, 0(%eax)
0x00000458:  00 00                    addb     %al, 0(%eax)
0x0000045a:  00 00                    addb     %al, 0(%eax)
0x0000045c:  00 00                    addb     %al, 0(%eax)
0x0000045e:  00 00                    addb     %al, 0(%eax)
0x00000460:  07                       popl     %es

----------------
IN:
0x00103fc4:  fa                       cli     
0x00103fc5:  6a 0d                    pushl    $0xd
0x00103fc7:  e9 1f 01 00 00           jmp      0x1040eb

----------------
IN: isr_common
0x001040eb:  60                       pushal  
0x001040ec:  66 8c d8                 movw     %ds, %ax
0x001040ef:  50                       pushl    %eax
0x001040f0:  66 b8 10 00              movw     $0x10, %ax
0x001040f4:  8e d8                    movl     %eax, %ds

----------------
IN: isr_common
0x001040f6:  8e c0                    movl     %eax, %es

----------------
IN: isr_common
0x001040f8:  8e e0                    movl     %eax, %fs
0x001040fa:  8e e8                    movl     %eax, %gs
0x001040fc:  89 e0                    movl     %esp, %eax
0x001040fe:  50                       pushl    %eax
0x001040ff:  e8 6c 00 00 00           calll    0x104170

----------------
IN: isr_handler
0x00104170:  83 ec 14                 subl     $0x14, %esp
0x00104173:  8b 44 24 18              movl     0x18(%esp), %eax
0x00104177:  a3 84 cf 11 00           movl     %eax, 0x11cf84
0x0010417c:  50                       pushl    %eax
0x0010417d:  8b 40 24                 movl     0x24(%eax), %eax
0x00104180:  50                       pushl    %eax
0x00104181:  e8 da 85 00 00           calll    0x10c760

----------------
IN: irq_unexpected_isr
0x0010c7b0:  83 ec 14                 subl     $0x14, %esp
0x0010c7b3:  9c                       pushfl  
0x0010c7b4:  58                       popl     %eax
0x0010c7b5:  fa                       cli     
0x0010c7b6:  6a 41                    pushl    $0x41
0x0010c7b8:  68 4a a8 11 00           pushl    $0x11a84a
0x0010c7bd:  e8 8e 6f 00 00           calll    0x113750

----------------
IN: up_assert
0x00113750:  57                       pushl    %edi
0x00113751:  56                       pushl    %esi
0x00113752:  53                       pushl    %ebx
0x00113753:  8b 1d b0 cb 11 00        movl     0x11cbb0, %ebx
0x00113759:  e8 12 11 00 00           calll    0x114870

Gregory Nutt

unread,
Oct 25, 2018, 7:05:09 PM10/25/18
to nu...@googlegroups.com

> Still trying to sort this one out, but I did find a cool feature in
> QEMU that has made this a lot easier.  If you turn on the in_asm debug
> flag on QEMU you can get a very verbose dump of the assembly that is
> being run with symbol context. That lets you look for the call to
> up_assert and walk back from there.  In this case things had gone off
> the rails for a while, so the instruction at 0x00000460 was really not
> very helpful, but with this trace it is very easy to see that it was
> the function up_block_task that caused the processor to start
> executing garbage at 0x00000212

This is an error in context switching:  Task A switching to Task B.
Normally this is cause by a memory corruption problem.  The root cause
is probably someplace further back in time.  Something corrupted the
stack or saved context of Task B.  So Task A runs fine until the context
switch occurs, then when it tries to start Task B, it crashes.

That is not very much help.  Sorry. Memory corruption problems are not
fun to debug.  You could, of course, try the usual things like
increasing stack sizes.

Greg

Brennan Ashton

unread,
Oct 25, 2018, 7:12:35 PM10/25/18
to nu...@googlegroups.com
I made the stacks quite large already, so that seems unlikely, unless something else is going on.  I'll keep digging.

--Brennan 

Gregory Nutt

unread,
Oct 25, 2018, 7:21:55 PM10/25/18
to nu...@googlegroups.com
This looks like the same issue:  https://nuttx.yahoogroups.narkive.com/fERVe0Wl/run-qemu-x86-system-assert-1-attachment (no replies)
And maybe this:  https://nuttx.yahoogroups.narkive.com/NO6MUOAP/qemu-i486-nsh-configuration (with fixes)

The fixes were related to toolchain issues, specifically, doing a x86 cross-compile even on an x86 host.

Older, backup information: https://nuttx.yahoogroups.narkive.com/DUF4nLFA/qemu-port-of-nuttx


Brennan Ashton

unread,
Oct 25, 2018, 7:33:50 PM10/25/18
to nu...@googlegroups.com
Yeah, same issue as the first one,  someone also reported it here:

I'm adding in support for stack colorizing/checking to the i486 arch, maybe that will get me somewhere.

Gregory Nutt

unread,
Oct 25, 2018, 8:05:02 PM10/25/18
to nu...@googlegroups.com

On Thu, Oct 25, 2018 at 4:21 PM Gregory Nutt <spud...@gmail.com> wrote:
This looks like the same issue:  https://nuttx.yahoogroups.narkive.com/fERVe0Wl/run-qemu-x86-system-assert-1-attachment (no replies)
And maybe this:  https://nuttx.yahoogroups.narkive.com/NO6MUOAP/qemu-i486-nsh-configuration (with fixes)

The fixes were related to toolchain issues, specifically, doing a x86 cross-compile even on an x86 host.

Older, backup information: https://nuttx.yahoogroups.narkive.com/DUF4nLFA/qemu-port-of-nuttx

Yeah, same issue as the first one,  someone also reported it here:

Based on the comments in the second one, I still suspect the toolchain.  People have reported problems trying to build the i486 code with the host Linux x86 compiler.   Linux compilers, in general, don't work well with NuttX builds.  That second thread recommends using a standard ELF OS=none toolchain.  (You can build with with the NuttX buildroot).


Brennan Ashton

unread,
Oct 26, 2018, 3:32:44 PM10/26/18
to nu...@googlegroups.com
I think most of those issues are when they are not properly targeting a 32bit build.  I have built bare-metal 32bit code with this toolchain without much issue in the past.

I added the stack color code including for the idle thread, and I noticed something odd and I want to make sure I understand what the expected behavior is.

We have idle_stack which should be the low address for the idle stack, CONFIG_IDLETHREAD_STACKSIZE which should define the size, and g_idle_topstack which should store the high address.

For some reason g_idle_topstack does not seem to be in the correct place.

when the stack size is 2048 I see this:

(gdb) p /x &idle_stack
$8 = 0x11d3a0
(gdb) p /x g_idle_topstack
$9 = 0x11e150

I would expect g_idle_top_stack to be  0x11dba0 not 0x11e150

there are a bunch of other  things defined in that region:

0x11db90 <idle_stack+2032>:    0xdeadbeef    0x00100034    0x0011db9c    0x0011db9c
0x11dba0 <g_waitingformqnotempty>:    0x00000000    0x00000000    0x00000000    0x00000000
0x11dbb0 <g_readytorun>:    0x0011c040    0x0011c040    0x00000005    0x00000000
0x11dbc0 <g_waitingformqnotfull+4>:    0x00000000    0x00000000    0x00000000    0x00120320
0x11dbd0 <g_waitingforsignal+4>:    0x00120320    0x00000000    0x00000000    0x00000000
0x11dbe0 <g_pidhash>:    0x0011c040    0x00000000    0x00000000    0x0000ffff
0x11dbf0 <g_pidhash+16>:    0x00120320    0x00000002    0x00000000    0x0000ffff
0x11dc00 <g_pidhash+32>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dc10 <g_pidhash+48>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dc20 <g_pidhash+64>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dc30 <g_pidhash+80>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dc40 <g_pidhash+96>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dc50 <g_pidhash+112>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dc60 <g_pidhash+128>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dc70 <g_pidhash+144>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dc80 <g_pidhash+160>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dc90 <g_pidhash+176>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dca0 <g_pidhash+192>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dcb0 <g_pidhash+208>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dcc0 <g_pidhash+224>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dcd0 <g_pidhash+240>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dce0 <g_pidhash+256>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dcf0 <g_pidhash+272>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dd00 <g_pidhash+288>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dd10 <g_pidhash+304>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dd20 <g_pidhash+320>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dd30 <g_pidhash+336>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dd40 <g_pidhash+352>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dd50 <g_pidhash+368>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dd60 <g_pidhash+384>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dd70 <g_pidhash+400>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dd80 <g_pidhash+416>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dd90 <g_pidhash+432>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dda0 <g_pidhash+448>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11ddb0 <g_pidhash+464>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11ddc0 <g_pidhash+480>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11ddd0 <g_pidhash+496>:    0x00000000    0x0000ffff    0x00000000    0x0000ffff
0x11dde0 <g_pendingtasks>:    0x00000000    0x00000000    0x00000002    0x0011f9d0
0x11ddf0 <g_delayed_kufree+4>:    0x0011f460    0x0011e3c0    0x0011e4ec    0x0011e160
0x11de00 <g_sigpendingaction+4>:    0x0011e2c8    0x0011e2f0    0x0011e398    0x0011e620
0x11de10 <g_sigpendingirqsignal+4>:    0x0011e690    0x0011e510    0x0011e600    0x0011c140
0x11de20 <g_wdfreelist+4>:    0x0011c500    0x0000001f    0x0011c120    0x0011c120
0x11de30:    0x00000000    0x00000000    0x00000000    0x00000000
0x11de40 <g_mmheap>:    0xffff0001    0x00000000    0x000e1eb0    0x0011e150
0x11de50 <g_mmheap+16>:    0x001ffff8    0x00000000    0x00000000    0x0011de64
0x11de60 <g_mmheap+32>:    0x00000000    0x00000000    0x00000000    0x0011de74
0x11de70 <g_mmheap+48>:    0x0011de54    0x00000000    0x00000000    0x0011de84
0x11de80 <g_mmheap+64>:    0x0011de64    0x00000000    0x00000000    0x0011de94
0x11de90 <g_mmheap+80>:    0x0011de74    0x00000000    0x00000000    0x0011dea4
0x11dea0 <g_mmheap+96>:    0x0011de84    0x00000000    0x00000000    0x0011deb4
0x11deb0 <g_mmheap+112>:    0x0011de94    0x00000000    0x00000000    0x0011dec4
0x11dec0 <g_mmheap+128>:    0x0011dea4    0x00000000    0x00000000    0x0011ded4
0x11ded0 <g_mmheap+144>:    0x0011deb4    0x00000000    0x00000000    0x0011dee4
0x11dee0 <g_mmheap+160>:    0x0011dec4    0x00000000    0x00000000    0x0011def4
0x11def0 <g_mmheap+176>:    0x0011ded4    0x00000000    0x00000000    0x0011d
f04
0x11df00 <g_mmheap+192>:    0x0011dee4    0x00000000    0x00000000    0x0011df14
0x11df10 <g_mmheap+208>:    0x0011def4    0x00000000    0x00000000    0x0011df24
0x11df20 <g_mmheap+224>:    0x0011df04    0x00000000    0x00000000    0x0011df34
0x11df30 <g_mmheap+240>:    0x0011df14    0x00000000    0x00000000    0x0011df44
0x11df40 <g_mmheap+256>:    0x0011df24    0x00000000    0x00000000    0x0012a9d8
0x11df50 <g_mmheap+272>:    0x0011df34    0x00000000    0x00000000    0x0011df64
0x11df60 <g_mmheap+288>:    0x0012a9d8    0x00000000    0x00000000    0x0011df74
0x11df70 <g_mmheap+304>:    0x0011df54    0x00000000    0x00000000    0x00000000
0x11df80 <g_mmheap+320>:    0x0011df64    0x0011d8ac    0x0011e6b0    0x0011eb88
0x11df90 <g_msgfreeirq>:    0x0011ebc0    0x0011ecd8    0x0011ed14    0x0011ee28
0x11dfa0 <g_system_timer>:    0x00000002    0x4d6eda00    0x00000000    0x00000000
0x11dfb0:    0x00000000    0x00000000    0x00000000    0x00000000
0x11dfc0 <g_irqvector>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11dfd0 <g_irqvector+16>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11dfe0 <g_irqvector+32>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11dff0 <g_irqvector+48>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e000 <g_irqvector+64>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e010 <g_irqvector+80>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e020 <g_irqvector+96>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e030 <g_irqvector+112>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e040 <g_irqvector+128>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e050 <g_irqvector+144>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e060 <g_irqvector+160>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e070 <g_irqvector+176>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e080 <g_irqvector+192>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e090 <g_irqvector+208>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e0a0 <g_irqvector+224>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e0b0 <g_irqvector+240>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e0c0 <g_irqvector+256>:    0x00104680    0x00000000    0x0010cc90    0x00000000
0x11e0d0 <g_irqvector+272>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e0e0 <g_irqvector+288>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e0f0 <g_irqvector+304>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e100 <g_irqvector+320>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e110 <g_irqvector+336>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e120 <g_irqvector+352>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e130 <g_irqvector+368>:    0x0010cc90    0x00000000    0x0010cc90    0x00000000
0x11e140 <g_alloctimers>:    0x00000000    0x00000000    0x0011d2c0    0x0011d384
0x11e150:    0x00000008    0x80000000    0x00000190    0x80000008






David Sidrane

unread,
Oct 26, 2018, 4:07:31 PM10/26/18
to nu...@googlegroups.com

Hi Brennan,

 

I think you need to de-reference the pointer and look at the contest of the variable named g_idle_topstack.

 

 

/* HEAP BASE: _sbss is the start of the BSS region (see ld.script) _ebss is

* the end of the BSS region (see ld.script). The heap continues from there

* until the end of memory.

*/

 

            .type    g_idle_topstack, @object

g_idle_topstack: <- variable name

            .long    _ebss <- variable value

            .size     g_idle_topstack, . - g_idle_topstack

            .end

 

 

David

--

Brennan Ashton

unread,
Oct 26, 2018, 4:19:36 PM10/26/18
to nu...@googlegroups.com
On Fri, Oct 26, 2018 at 1:07 PM David Sidrane <david....@gmail.com> wrote:

Hi Brennan,

 

I think you need to de-reference the pointer and look at the contest of the variable named g_idle_topstack.

 

 

/* HEAP BASE: _sbss is the start of the BSS region (see ld.script) _ebss is

* the end of the BSS region (see ld.script). The heap continues from there

* until the end of memory.

*/

 

            .type    g_idle_topstack, @object

g_idle_topstack: <- variable name

            .long    _ebss <- variable value

            .size     g_idle_topstack, . - g_idle_topstack

            .end

 

 

David


 
David,
Unfortunately not, it is `const uint32_t g_idle_topstack`  so it holds the address by value. 

--Brennan

Gregory Nutt

unread,
Oct 26, 2018, 4:40:03 PM10/26/18
to nu...@googlegroups.com
> Unfortunately not, it is `const uint32_t g_idle_topstack`  so it holds the address by value.

In Other architectures, the IDLE stack begins at _ebss and is size CONFIG_IDLETHREAD_STACKSIZE (perhaps aligned down as required).  g_idle_topstack is then a variable that holds the top of the stack (_ebss +   CONFIG_IDLETHREAD_STACKSIZE, perhaps aligned).  But I don't understand what is going on in arch/x86/src/qemu/qemu_head.S.  It looks wrong.

I see:

136 /* The stack for the IDLE task thread is declared in .bss.  NuttX boots and
137  * initializes on the IDLE thread, then at the completion of OS startup, this
138  * thread becomes the thread that executes when there is nothing else to
139  * do in the system (see up_idle()).
140  */
141
142         .type   idle_stack, @object
143         .comm   idle_stack, CONFIG_IDLETHREAD_STACKSIZE, 32  <<<-- Should not lie within .bss
144         .size   idle_stack, CONFIG_IDLETHREAD_STACKSIZE


That puts the IDLE stack withiin .bss (which is not desirable because then the .bss initialization logic has to initialize it to zero).  .bss is delineated by _sbss (start of .bss) and _ebss (end of bss).   So that means that:  _sbss <=  &idle_stack <= _sbss + CONFIG_IDLETHREAD_STACKSIZE .

But then g_idletopstack cannot be correct., g_idle_topstack is set to an addess the IDLE stack at _ebss.

158         .type   g_idle_topstack, @object
159 g_idle_topstack:
160         .long   _ebss <<<-- Not the top of othe stack.
161         .size   g_idle_topstack, . - g_idle_topstack

Then this value is used correctly to allocate the heap in arch/x86/src/common/up_allocateheap.c:

 86   *heap_start = (FAR void*)g_idle_topstack;  <<--- Needs to hold _ebss
 87   *heap_size = CONFIG_RAM_END - g_idle_topstack;

That is correct since g_idle_topstack = _ebss.

But the usage in up_assert.c is not correct.  g_idle_topstack is not the top of the IDLE stack:

141   if (rtcb->pid == 0)
142     {
143       ustackbase = g_idle_topstack - 4;  <<<-- Needs to be &idle_stck + CONFIG_IDLETHREAD_STACKSIZE
144       ustacksize = CONFIG_IDLETHREAD_STACKSIZE;
145     }

So the usage is not consistent.  Right now the only ill effects are chaos and inconsistently and the stack dumped in up_assert() will not be correctly positioned.









--Brennan

Gregory Nutt

unread,
Oct 26, 2018, 4:45:45 PM10/26/18
to nu...@googlegroups.com
Some typo fixes

That puts the IDLE stack withiin .bss (which is not desirable because then the .bss initialization logic has to initialize it to zero).  .bss is delineated by _sbss (start of .bss) and _ebss (end of bss).   So that means that:  _sbss <=  &idle_stack <= _ebss - CONFIG_IDLETHREAD_STACKSIZE .


In other architectures, &idle_stack == _ebss (aligned)


Then this value is used correctly to allocate the heap in arch/x86/src/common/up_allocateheap.c:

 86   *heap_start = (FAR void*)g_idle_topstack;  <<--- Needs to hold _ebss
 87   *heap_size = CONFIG_RAM_END - g_idle_topstack;

In other architectures g_idle_topstack = _ebss + CONFIG_IDLETHREAD_STACKSIZE (aligned)

But the usage in up_assert.c is not correct.  g_idle_topstack is not the top of the IDLE stack:

141   if (rtcb->pid == 0)
142     {
143       ustackbase = g_idle_topstack - 4;  <<<-- Needs to be &idle_stck + CONFIG_IDLETHREAD_STACKSIZE
144       ustacksize = CONFIG_IDLETHREAD_STACKSIZE;
145     }

Normally

  _sdata = start of .data
  _edata = end of .data
  _sbss  = start of .bss
  _ebss  = end of .bss
  &idle_stack = _ebss
  g_idle_topstack = _bss + CONFIG_IDLETHREAD_STACKSIZE
  start-of-heap = g_idle_stopstack
  end-of-heap = CONFIG_RAM_END

That is what we (eventually) need

Greg

Gregory Nutt

unread,
Oct 26, 2018, 5:16:40 PM10/26/18
to nu...@googlegroups.com
Can you try that attached change.  I don't want to committed it untested.

Greg

qemu-topstack.patch

Brennan Ashton

unread,
Oct 26, 2018, 5:43:03 PM10/26/18
to nu...@googlegroups.com
On Fri, Oct 26, 2018 at 2:16 PM Gregory Nutt <spud...@gmail.com> wrote:
Can you try that attached change.  I don't want to committed it untested.

Greg

Tested it and the locations seem to be correct now.  Looks like the idle stack is not getting thrashed, so more digging....

I'll send you the patches for colorizing the stack once I have this working.

--Brennan

Brennan Ashton

unread,
Oct 26, 2018, 7:17:14 PM10/26/18
to nu...@googlegroups.com
I'm starting to think there might be a bug in that context restore.

The user_main calls usleep(500000);

#0  up_block_task (tcb=0x120320, task_state=TSTATE_WAIT_SIG)
    at common/up_blocktask.c:77
#1  0x00115087 in nxsig_timedwait (set=0x12a8a4, info=0x0, timeout=0x12a914)
    at signal/sig_timedwait.c:364
#2  0x00114d32 in nxsig_nanosleep (rqtp=0x12a914, rmtp=0x0)
    at signal/sig_nanosleep.c:138
#3  0x00114e62 in clock_nanosleep (clockid=0 '\000', flags=0, rqtp=0x12a914, rmtp=0x0)
    at signal/sig_nanosleep.c:324
#4  0x001137a7 in usleep (usec=500000) at unistd/lib_usleep.c:124
#5  0x001047c5 in user_main (argc=5, argv=0x12a99c) at ostest_main.c:226
#6  0x001009a6 in task_start () at task/task_start.c:145


And the the context restore gets called for the idle task where nothing looks too out of the ordinary.

171              up_fullcontextrestore(rtcb->xcp.regs);
(gdb) p *rtcb
$7 = {flink = 0x0, blink = 0x0, group = 0x11eed0, pid = 0,
  start = 0x100040 <os_start>, entry = {pthread = 0x100040 <os_start>,
    main = 0x100040 <os_start>}, sched_priority = 0 '\000', init_priority = 0 '\000',
  task_state = 3 '\003', flags = 6, lockcount = 0, waitdog = 0x0, adj_stack_size = 0,
  stack_alloc_ptr = 0x0, adj_stack_ptr = 0x0, waitsem = 0x0, sigprocmask = 0,
  sigwaitmask = 0, sigpendactionq = {head = 0x0, tail = 0x0}, sigpostedq = {
    head = 0x0, tail = 0x0}, sigunbinfo = {si_signo = 0 '\000', si_code = 0 '\000',
    si_errno = 0 '\000', si_value = {sival_int = 0, sival_ptr = 0x0}},
  msgwaitq = 0x0, pthread_data = {0x0, 0x0, 0x0, 0x0}, pterrno = 0, xcp = {
    sigdeliver = 0x0, saved_eip = 0, saved_eflags = 0, regs = {16, 100, 1163448,
      1176672, 0, 1176672, 0, 0, 1, 0, 0, 1065257, 8, 22, 1171552, 16}},
  name = "Idle Task", '\000' <repeats 22 times>}

The stored EIP is from up_unblock_task right after the call to up_saveusercontext

Soon as the iret instruction is issued the GPF is triggered.

I am way out of my depth on x86 at this point, but it seems like this could be cause by an issue with the Segment Descriptor?

--Brennan

Gregory Nutt

unread,
Oct 26, 2018, 7:26:42 PM10/26/18
to nu...@googlegroups.com
> And the the context restore gets called for the idle task where
nothing looks too out of the ordinary.
>
> 171              up_fullcontextrestore(rtcb->xcp.regs);
> (gdb) p *rtcb

This is returning to the IDLE task:

> $7 = {flink = 0x0, blink = 0x0, group = 0x11eed0, pid = 0,
>   start = 0x100040 <os_start>, entry = {pthread = 0x100040 <os_start>,
>     main = 0x100040 <os_start>}, sched_priority = 0 '\000',
init_priority = 0 '\000',
>   task_state = 3 '\003', flags = 6, lockcount = 0, waitdog = 0x0,
adj_stack_size = 0,
>   stack_alloc_ptr = 0x0, adj_stack_ptr = 0x0, waitsem = 0x0,
sigprocmask = 0,
>   sigwaitmask = 0, sigpendactionq = {head = 0x0, tail = 0x0},
sigpostedq = {
>     head = 0x0, tail = 0x0}, sigunbinfo = {si_signo = 0 '\000',
si_code = 0 '\000',
>     si_errno = 0 '\000', si_value = {sival_int = 0, sival_ptr = 0x0}},
>   msgwaitq = 0x0, pthread_data = {0x0, 0x0, 0x0, 0x0}, pterrno = 0,
xcp = {
>     sigdeliver = 0x0, saved_eip = 0, saved_eflags = 0, regs = {16,
100, 1163448,
>       1176672, 0, 1176672, 0, 0, 1, 0, 0, 1065257, 8, 22, 1171552, 16}},
>   name = "Idle Task", '\000' <repeats 22 times>}
>
> The stored EIP is from up_unblock_task right after the call to
up_saveusercontext
>
> Soon as the iret instruction is issued the GPF is triggered.
>
> I am way out of my depth on x86 at this point, but it seems like this
could be cause by an issue with the Segment Descriptor?

I don't have any x86 expertise either.  I The EIP is rtcbg->xcp.regs[11]
= 1065257 = 0x0010:4129

But without taking an x86 crash course, I could not say much more.

patacongo

unread,
Oct 26, 2018, 7:33:30 PM10/26/18
to NuttX
> And the the context restore gets called for the idle task where
nothing looks too out of the ordinary.
 > Soon as the iret instruction is issued the GPF is triggered.


I would probably debug this using GDB like this:

1. 'objdump -d nuttx | vim -' will give me an assembly languate reference.  Using this I would
2. Single step through up_fullcontextrestore() on instruction at a time using the 'gdb) si' instructure.  That will print the address and you can refer to the objdump output to see the instruction.  You can print the register contents.

Certainly you will find the exact instruction that causes the GPF and the exact register contents and the sequence leading up to the crash.

Brennan Ashton

unread,
Oct 26, 2018, 7:46:34 PM10/26/18
to nu...@googlegroups.com
You just listed the exact process that I have been working with.  The crash happens soon as the iret instruction is executed.  I'll dig into the ia32 manual some more and see if I can understand what exactly is causing it.  There are about 20 different cases that can cause this fault some of which a related to the execution of iret.


up_fullcontextrestore () at chip/qemu_fullcontextrestore.S:156
156        movl    (4*REG_ESI)(%eax), %esi
(gdb)
157        movl    (4*REG_EDI)(%eax), %edi
(gdb)
158        movl    (4*REG_EBP)(%eax), %ebp
(gdb)
159        movl    (4*REG_EBX)(%eax), %ebx
(gdb)
160        movl    (4*REG_EDX)(%eax), %edx
(gdb)
161        movl    (4*REG_ECX)(%eax), %ecx
(gdb)
170        mov        (4*REG_DS)(%eax), %ds
(gdb)
174        popl    %eax
(gdb)
up_fullcontextrestore () at chip/qemu_fullcontextrestore.S:175
175        iret
(gdb) i r
eax            0x1    1
ecx            0x0    0
edx            0x0    0
ebx            0x11f460    1176672
esp            0x11de78    0x11de78
ebp            0x11f460    0x11f460
esi            0x11c0b8    1163448
edi            0x64    100
eip            0x1041b6    0x1041b6 <up_fullcontextrestore+65>
eflags         0x46    [ PF ZF ]
cs             0x8    8
ss             0x10    16
ds             0x10    16
es             0x10    16
fs             0x10    16
gs             0x10    16

(gdb) x/16xw (0x11de78-16*4)
0x11de38:    0x0000000a    0x00000037    0x0011f7a4    0x001034a8
0x11de48:    0x0000000a    0x00000037    0x0011f7a4    0x001103f4
0x11de58:    0x0011c120    0x0011d628    0x00120320    0x0011093a
0x11de68:    0x00120320    0x0011d3cc    0x00000000    0x00000001

Brennan Ashton

unread,
Oct 26, 2018, 9:36:57 PM10/26/18
to nu...@googlegroups.com
Looks like I made some more progress,  gdb was not single instruction stepping over the iret instruction properly so I was missing some more instructions before the fault.  Once I added a a breakpoint on the address I expected it to return to based on the ESP register I was able to get a lot more context.

the tdlr; is that something is up with the context save or restore and the stack pointer is not being restored correctly.

I'm going to try and dig more in this weekend.
Here are some notes:

Things are about to break when we are coming back from up_fullcontextrestore

From the QEMU in_asm output.
IN: up_fullcontextrestore
0x001041b5:  58                       popl     %eax
0x001041b6:  cf                       iretl    

----------------
IN: up_unblock_task
0x00104129:  83                       .byte    0x83

We see the jump to: 0x00104129

Lets look at what the code is doing here in Radare2 (helps me add more context to the assembly)

[0x001040d0 1% 220 nuttx.elf]> pd $r @ sym.up_unblock_task                             
/ (fcn) sym.up_unblock_task 111                                                        
|   sym.up_unblock_task (int arg_1ch);                                                 
|           ; var int local_4h @ esp+0x4                                               
|           ; arg int arg_1ch @ esp+0x1c                                               
|           ; XREFS: CALL 0x0010094c  CALL 0x001011f9  CALL 0x001012a7  CALL 0x0010170e
|           ; XREFS: CALL 0x001018ec  CALL 0x00104124  CALL 0x0010d0af  CALL 0x0010d55e
|           ; XREFS: CALL 0x0010f7f6  CODE 0x00114fc5                                  
|           0x001040d0 b    56             push esi                                    
|           0x001040d1      53             push ebx                                    
|           0x001040d2      83ec10         sub esp, 0x10                               
|           0x001040d5      8b5c241c       mov ebx, dword [arg_1ch]       ; [0x1c:4]=-1
|           0x001040d9      8b35b0d31100   mov esi, dword [obj.g_readytorun]       ; [0
|           0x001040df      53             push ebx                                    
|           0x001040e0      e86bb00000     call sym.sched_removeblocked ;[1]           
|           0x001040e5      891c24         mov dword [esp], ebx                        
|           0x001040e8      e8a3ae0000     call sym.sched_addreadytorun ;[2]           
|           0x001040ed      83c410         add esp, 0x10                               
|           0x001040f0      84c0           test al, al                                 
|       ,=< 0x001040f2      7426           je 0x10411a                 ;[3]            
|       |   0x001040f4      83c678         add esi, 0x78               ; 'x'           
|       |   0x001040f7      8b1584d71100   mov edx, dword [obj.g_current_regs]       ;
|       |   0x001040fd      85d2           test edx, edx                               
|      ,==< 0x001040ff      741f           je 0x104120                 ;[4]            
|      ||   0x00104101      83ec0c         sub esp, 0xc                                
|      ||   0x00104104 b    56             push esi                                    
|      ||   0x00104105      e846feffff     call sym.up_savestate       ;[5]            
|      ||   0x0010410a      a1b0d31100     mov eax, dword [obj.g_readytorun]       ; [0
|      ||   0x0010410f      83c078         add eax, 0x78               ; 'x'           
|      ||   0x00104112      a384d71100     mov dword [obj.g_current_regs], eax       ;
|      ||   0x00104117      83c410         add esp, 0x10                               
|      ||   ; CODE XREFS from sym.up_unblock_task (0x1040f2, 0x10412e)                 
|     .-`-> 0x0010411a      58             pop eax                                     
|     :|    0x0010411b      5b             pop ebx                                     
|     :|    0x0010411c      5e             pop esi                                     
|     :|    0x0010411d      c3             ret                                         
      :|    0x0010411e      6690           nop                                         
|     :`--> 0x00104120      83ec0c         sub esp, 0xc                                
|     :     0x00104123      56             push esi                                    
|     :     0x00104124      e818000000     call sym.up_saveusercontext ;[6]            
|     :     0x00104129      83c410         add esp, 0x10                               
|     :     0x0010412c      85c0           test eax, eax                               
|     `===< 0x0010412e      75ea           jne 0x10411a                ;[3]            
|           0x00104130      a1b0d31100     mov eax, dword [obj.g_readytorun]       ; [0


In our case we take the conditional jump:
|     .-`-> 0x0010411a      58             pop eax                                     
|     :|    0x0010411b      5b             pop ebx                                     
|     :|    0x0010411c      5e             pop esi                                     
|     :|    0x0010411d      c3             ret

Looking at the register ESP we determine where we are returning to

(gdb) i r esp
esp            0x11dea0    0x11dea0


That is very unfortunate because that is some place in the stack that we are now going to execute from.

We had already determined that this is the idle task that we are returning to, so we can place a breakpoint in gdb
on up_unblock_task and look for the backtrace for the context that we should be returning to.    Restarting the execution:


Breakpoint 6, up_unblock_task (tcb=0x11f460) at common/up_unblocktask.c:72
72    {
(gdb) bt
#0  up_unblock_task (tcb=0x11f460) at common/up_unblocktask.c:72
#1  0x00100951 in task_activate (tcb=0x11f460) at task/task_activate.c:92
#2  0x0010048c in thread_create (name=0x1152c2 "init", ttype=ttype@entry=0 '\000',
    priority=100, stack_size=2048, entry=0x104b10 <ostest_main>, argv=0x0)
    at task/task_create.c:169
#3  0x00100519 in nxtask_create (name=<optimized out>, priority=<optimized out>,
    stack_size=<optimized out>, entry=0x104b10 <ostest_main>, argv=0x0)
    at task/task_create.c:233
#4  0x00100351 in os_do_appstart () at init/os_bringup.c:266
#5  os_start_application () at init/os_bringup.c:379
#6  os_bringup () at init/os_bringup.c:453
#7  0x001002d2 in os_start () at init/os_start.c:827
#8  0x00100034 in __start () at chip/qemu_head.S:134


So my expectation is that we would be that we would be jumping to something near 0x00100951

going back to our state right before we jump to the random address in the stack, we can see our backtrace is garbage

(gdb) bt
#0  0x0010411d in up_unblock_task (tcb=0x115087 <nxsig_timedwait+183>)
    at common/up_unblocktask.c:122
#1  0x0011df10 in ?? ()
#2  0x00000000 in ?? ()



Lets dump a portion of the idle stack and see whats up:
(gdb) x/512wx &_ebss
<snip>
0x11de00:    0x0011fa23    0x001034a8    0x0000000a    0x00000010
0x11de10:    0x00000031    0x001034a8    0x0000000a    0x0011de30
0x11de20:    0x0011fa23    0x00110e0e    0x0000000a    0x00000000
0x11de30:    0x00000020    0x00110e0e    0x0000000a    0x00000037
0x11de40:    0x0011f7a4    0x001034a8    0x0000000a    0x00000037
0x11de50:    0x0011f7a4    0x001103f4    0x0011c120    0x0011d628
0x11de60:    0x00120320    0x0011093a    0x00120320    0x0011d3cc
0x11de70:    0x00000000    0x00000001    0x00104129    0x00000008
0x11de80:    0x00000016    0x00114241    0x0011c0b8    0x00000006
0x11de90:    0x1dcd6500    0x0011507c    0x00120320    0x00000000
0x11dea0:    0x0011df10    0x00115087    0x00120320    0x00000006
0x11deb0:    0x00120320    0x00115023    0x00000012    0x0011c120
0x11dec0:    0x00000000    0x00000000    0x0000000a    0x0011f7a4
0x11ded0:    0x0011fa20    0x00000000    0x0000000a    0x00000000
0x11dee0:    0x0011dfbc    0x0011df80    0x0011df10    0x00000293
0x11def0:    0x0000000e    0x00114d32    0x0011df10    0x00000000
0x11df00:    0x0011df80    0x00000293    0x0011dfbc    0x00000030
0x11df10:    0x00000000    0x00102645    0x0000000a    0x00000000
0x11df20:    0x0011546e    0x00000000    0x00000000    0x00000000
0x11df30:    0x0012a99c    0x00114e62    0x0011df80    0x00000000
0x11df40:    0x0011546f    0x00112817    0x0011dfbc    0x0000000a
0x11df50:    0x00000005    0x00000000    0x00000000    0x00000000
0x11df60:    0x00000000    0x001137a7    0x00000000    0x00000000
0x11df70:    0x0011df80    0x00000000    0x0011d29c    0x00112e70
0x11df80:    0x00000000    0x1dcd6500    0x00000020    0x0011dfcc
0x11df90:    0x00000000    0x001047c5    0x0007a120    0x00000000
0x11dfa0:    0x0011f460    0x0010443a    0x00000020    0x0011dfcc
0x11dfb0:    0x0011e018    0x0010442a    0x0011fac0    0x0011f460
0x11dfc0:    0x00000000    0x0010438c    0x0011dfcc    0x00000010
0x11dfd0:    0x00000000    0x00000000    0x00000000    0x00000000
0x11dfe0:    0x00000000    0x001009a6    0x00000005    0x0012a99c
0x11dff0:    0x00000020    0x00000000    0x00100970    0x00000008
0x11e000:    0x00000202    0x00103612    0x00120398    0x0011545c
0x11e010:    0x00115470    0x0011f460    0x0011f7a8    0x00000000
0x11e020:    0x00000000    0x001009e0    0x00000000    0x00000000
0x11e030:    0x00000000    0x00000000    0x00000000    0x0011e050
0x11e040:    0x00000000    0x001009ae    0x00000000    0x001202b0
0x11e050:    0x00000020    0x00000000    0x00100970    0x00000008
0x11e060:    0x00000202    0x00104141    0x0011f4d8    0x0011f518
0x11e070:    0x0000001f    0x00000064    0x0011f460    0x00000212
0x11e080:    0x00000000    0x00100951    0x0011f460    0x00000000
0x11e090:    0x00000064    0x00000000    0x00000000    0x00000064
0x11e0a0:    0x00000001    0x0010048c    0x0011f460    0x001152c2
0x11e0b0:    0x00000000    0x00000000    0x00000003    0x0011c528
0x11e0c0:    0x0011e14c    0x001152c2    0x00000006    0x001152a8
0x11e0d0:    0x0011e0e4    0x0011e14c    0x00000000    0x00129000
0x11e0e0:    0x00000000    0x00100351    0x00000800    0x00104b10
0x11e0f0:    0x00000000    0x00104b10    0x00000000    0x001152a8
0x11e100:    0x001152c8    0x0010364b    0x00000001    0x0011e14c
0x11e110:    0x0011e14c    0x001002d2    0x00000002    0x00000006
0x11e120:    0x00000000    0x0010027e    0xdeadbeef    0xdeadbeef
0x11e130:    0x0011e150    0x000e1eb0    0xdeadbeef    0xdeadbeef
0x11e140:    0xdeadbeef    0x00100034    0x0011e14c    0x0011e14c


At 0x11dea0  we see 0x0011df10 which is not correct.  But we expect to be returning to
0x00100951 so we can look that up an it is much earlier in the stack at 0x11e084

With GDB we can assign it.

(gdb) set $sp = 0x11e084

And then we can check the back trace to see if the frames look correct.

(gdb) bt
#0  0x0010411d in up_unblock_task (tcb=0x11f460) at common/up_unblocktask.c:122
#1  0x00100951 in task_activate (tcb=0x11f460) at task/task_activate.c:92
#2  0x0010048c in thread_create (name=0x1152c2 "init", ttype=ttype@entry=0 '\000',
    priority=100, stack_size=2048, entry=0x104b10 <ostest_main>, argv=0x0)
    at task/task_create.c:169
#3  0x00100519 in nxtask_create (name=<optimized out>, priority=<optimized out>,
    stack_size=<optimized out>, entry=0x104b10 <ostest_main>, argv=0x0)
    at task/task_create.c:233
#4  0x00100351 in os_do_appstart () at init/os_bringup.c:266
#5  os_start_application () at init/os_bringup.c:379
#6  os_bringup () at init/os_bringup.c:453
#7  0x001002d2 in os_start () at init/os_start.c:827
#8  0x00100034 in __start () at chip/qemu_head.S:134



Code then starts to execute, and fail some time later which is not that unexpected.
Something is clearly correct in the context save/restore code.

--Brennan

Yang Chung-Fan

unread,
Oct 27, 2018, 4:29:06 AM10/27/18
to NuttX


2018年10月26日金曜日 4時56分32秒 UTC+9 Brennan Ashton:
According to your log, seems like the stack had gone futher than expected in the kernel code.
Are you using the i486 port in the repository? Or did you modified it to suooprt other modes?
If so which one is it? Real-mode, IA-32, IA-32e, Intel64, these mode's exception stack are very different.
If I didn't misunderstood the path taken in up_unblock_task, it was in a IRQ handler traying to do a context switch.

Make sure up_fullcontextrestore, up_saveusercontext, and the IRQ vectors matched up when using the save area.

By the way, I see you are using E-prefix on the regs, which implies you are doing a 32-bit X86 port.
I will totally recommand to work on x64_64 instead of 32bit, due to the simplicity of exception handling and working without segmentation.

Yang.

Gregory Nutt

unread,
Oct 27, 2018, 1:02:54 PM10/27/18
to nu...@googlegroups.com
There are really two different kinds of context switches.  I refer to
them as synchronous and asynchronous context switches (but there might
be better names):

1. An asynchronous context switch occurs when the system is interrupted
and, because of on actions within the interrupt handler, a context
switch is generated.  In this case, the state of the interrupted task is
saved when the interrupt handler is entered but a different task state
is restored when the interrupt handler returns.

2. A synchronous context switch in my terminology occurs when a task
explicitly suspends itself by calling some OS interface, such as
usleep() in your case.  Now there are two ways to implement a
synchronous context switch:

2a. You can implemented the moral equivalent of setjmp()  and longjmp()
on steriods.  up_savecontext() is the moral equivalent of setjmp(); it
saves the current state of the task (and like setjmp returns 0 or 1 to
indicate if the context is being restored. up_fullcontextrestore() is
like longjmp(); it restores the context saved by either the interrupt
handler in (1) or by up_savecontext().

The naming differences save vs. fullrestore is because when
up_savecontext() is called, it does not need to save all of registers. 
Only a subset needs to be saved because the processor ABI provides that
some registers a volatile or caller saved when up_savecontext() is called.

The are a couple of downsides to the approach of (2a):  First they are
tricky to write.  You are probably experiencing some problems in the the
implementation of those functions now (although I don't know why several
other people have reported that the i486 port is working with no
problem).  And second, they have limited usage.  They can be used only
in the FLAT build mode where all tasks are running with the same
privileges.  If you were to try to do up_fullcontextrestore() to get
from an unprivileged task to a privileged task, you would get and
excpetion (could that be what you are seeing?)

2b. In order to the limitations of up_savecontext() and
up_fullcontextrestore(), you have to do something a little different. 
One way is to used a system call (a trap in x86 or an SVCALL in ARM
land).  This generates a software interrupt.  The software interrupt
saves the context on entry (replacing the functionality of
up_savecontext()) and restores the new context on return (replacing the
functionality of up_fullcontextrestore()). The tiny software interrupt
just sets up the context switch.

The ARMv7-M does synchronous contest switches in this way.  You can see
the software interrupt handler here:
https://bitbucket.org/nuttx/nuttx/src/511c90d05013f8027dab4ffde3b3e7eda34c2603/arch/arm/src/armv7-m/up_svcall.c#lines-226

The advantages of approach (2b)  are:  (1) it is trivially easy to
implement.  If interrupt level "asynchronous" context switches work,
then so will these synchronous context switches.  And (2) you an switch
between privileged and unprivileged tasks.  That happens for free when
the interrupt returns.  The downsides are only that it is significantly
slower.  It adds the overhead of interrupt processing and interrupt IRQ
dispatching to each context swith.

I think the 2b is the correct way to go despite its worse performance. 
And I think it might be a simple way to avoid the complexity of the
problem you are dealing with now.  Several modifications would be
required:  (1) Creating of the software handlers for the context
swtich.  Some logic is in place, but this would have to be expanding,
and (2) converting all of the back-to-back calls to up_savecontext() and
up_restorefullcontext() to a single call to up_switchcontext().

My interest in working with qemu and i486 are pretty low (can you even
get i486 chips anymore?).  But I have been tossing around the idea of
porting to a quark for the past few weeks.  Quarks are Pentiums and
would require a little different environment.  I don't know the status
of Quarks, however.  It seems that they are dying out.  I do see the
Quark D2000 which is down at only 32MHz.  Does anyone have any opinion
about that?  Would there be any value to a Quark port?

Greg

Brennan Ashton

unread,
Oct 27, 2018, 1:46:18 PM10/27/18
to nu...@googlegroups.com
My interest in working with qemu and i486 are pretty low (can you even
get i486 chips anymore?).  But I have been tossing around the idea of
porting to a quark for the past few weeks.  Quarks are Pentiums and
would require a little different environment.  I don't know the status
of Quarks, however.  It seems that they are dying out.  I do see the
Quark D2000 which is down at only 32MHz.  Does anyone have any opinion
about that?  Would there be any value to a Quark port?


Greg,
My interest in supporting embedded targets for x86 are fairly low.  My real interest here is to be able to really lower the bar for developing support for USB devices and implementing some network features that I have wanted to work on for a long time.

Last year Intel killed off most of the group around the Quark, including the Curie (used on the Arduino 101), so I don't expect them to be doing much outside of chips targeting phones which will certainly be 64bit.

As for the reports of it working, it sort-of works when optimization is turned off and the stack is large enough, but it looks to me like that is really by chance since it is still executing some garbage.   Some of those reports of it working came with caveats of stuff not working like serial input.

Thanks for the additional information on the context switches, I'll dig back into that on Monday, I think I am close, I just need to look more into the state of things right before leaving the idle thread.

--Brennan

Brennan Ashton

unread,
Oct 27, 2018, 1:54:29 PM10/27/18
to nu...@googlegroups.com
Yang,
For now I figured I would just go back to using the i486 port because it had worked OK at some point and I did not want to step on what you were doing too much.  Also QEMU does not support using multiboot to boot a 64-bit elf even if you make sure to do all the work to move to long mode.  You can remove three lines in QEMU and it will work, and people have asked for this to be removed multiple times, but upstream does not want to leave that exposure, instead saying that a 32bit bootloader like GRUB should be used to boot the 64-bit kernel.

As for the location of the stack, yeah there was an issue with that, which has been resolved.   I did notice that you has a commit at one point referring to a bug with up_fullcontextrestore and you had commented out part of the logic, but it was not clear what the bug was.

I'm happy to move my efforts to 64-bit if you think that would be better, and I can work around the QEMU issues.  I mostly did not want to step on areas that you were trying to work on.

--Brennan

Gregory Nutt

unread,
Oct 27, 2018, 3:13:59 PM10/27/18
to nu...@googlegroups.com

For now I figured I would just go back to using the i486 port because it had worked OK at some point and I did not want to step on what you were doing too much.  ...

Another option is RGMP.  The RGMP project has, I think, been abandoned.  See http://rgmp.sourceforge.net/wiki/index.php/Running_NuttX_on_RGMP

RGMP permitted you to run NuttX on one CPU of a multi-core x86_64 and Linux on another core.

I have removed RGMP from the NuttX main repository, but there is a copy in the Obsoleted repository as described here:  https://bitbucket.org/nuttx/obsoleted/src/345bd40c5fcbd82309e557cc995c8bb233d82876/ChangeLog#lines-171


Gregory Nutt

unread,
Oct 27, 2018, 3:23:12 PM10/27/18
to nu...@googlegroups.com

Last year Intel killed off most of the group around the Quark, including the Curie (used on the Arduino 101), so I don't expect them to be doing much outside of chips targeting phones which will certainly be 64bit.

The D2000 is a very different creature, clearly for a different market.  32MHz vs GHz.  32-bit x86.  I know that the higher end Quarks are basically gone, but these low-end versions seem to be hanging in:  https://bitbucket.org/nuttx/obsoleted/src/345bd40c5fcbd82309e557cc995c8bb233d82876/ChangeLog#lines-171

The dev boards are cheap but I hesitating primarily because the debug environment was not clear to me.


Yang Chung-Fan

unread,
Oct 28, 2018, 3:57:49 AM10/28/18
to NuttX


2018年10月28日日曜日 2時54分29秒 UTC+9 Brennan Ashton:
 Brennan,

I hit the exactly same problem during early stage of development. Jailhouse had a poor debug interface, but Qemu don't simply take any 64bit kernel images.
I continued with Jailhouse anyway, which is my primary concern.

About the commit, I can try to recall the problem if you gave the commit hash.
The primary problem on the save area is that IRQ handler align the stack, but it sometimes use a error code to destry it, causing fxsave, fxrstor and some xmm instructions requiring a aligned stack to GP.
Also, the CS, SS, DS must be cananical upon iret. SS and DS can be zero in long mode for most of the instructions, but will instanly crash upon some special ones, i.e. iretq.

I will be try to convince my advisor on opening the x86_64 part without jailhouse, joining a comunity could benifit everyone.
I can really have someone work with the APIC and IOAPIC part of x86, these things are crazily complex, causing I lost focus from my research too often.

Yang


Yang Chung-Fan

unread,
Oct 28, 2018, 3:59:26 AM10/28/18
to NuttX
Greg,

I am instrested in the RGMP project, any details on it?

It is very similiar to my approach on achieveing a hybird system of GPOS and RTOS.

Yang

2018年10月28日日曜日 4時13分59秒 UTC+9 patacongo:

Gregory Nutt

unread,
Oct 28, 2018, 8:27:25 AM10/28/18
to nu...@googlegroups.com

> I am instrested in the RGMP project, any details on it?
>
> It is very similiar to my approach on achieveing a hybird system of
> GPOS and RTOS.

I am not an expert on RGMP.  I helped with some of the NuttX-side
support but otherwise, I have never even used it.  I would suggest you
contact the author. Qiang Yu (I can send a very old email if you cannot
contact him).  He was a frequent contributor to NuttX when he was a
student, but I have not heard from him in a long time.

There is a sourceforge web page at
http://rgmp.sourceforge.net/wiki/index.php/Main_Page
The sourceforge page is at  https://sourceforge.net/projects/rgmp/ and
the code is there under the Git tab.

Through Qiang Yu, I was also invited to Beijing and I worked with a
related project at Capital Normal University using NuttX with Linux in a
TrustZone architecture on a quad-core i.MX6 (ARM Cortex-A9).

Greg


Reply all
Reply to author
Forward
0 new messages