Looks like I made some more progress. gdb was not single-stepping over the iret instruction properly, so I was missing some of the instructions executed before the fault. Once I added a breakpoint on the address I expected it to return to, based on the ESP register, I was able to get a lot more context.
The tl;dr is that something is up with the context save or restore, and the stack pointer is not being restored correctly.
I'm going to try to dig into this more this weekend.
Here are some notes:
Things are about to break when we are coming back from up_fullcontextrestore
From the QEMU in_asm output.
IN: up_fullcontextrestore
0x001041b5: 58 popl %eax
0x001041b6: cf iretl
----------------
IN: up_unblock_task
0x00104129: 83 .byte 0x83
We see the jump to: 0x00104129
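As a rough illustration of how the breakpoint address can be read off the stack: a same-privilege 32-bit iret pops EIP, CS and EFLAGS, in that order, starting at ESP. The sketch below decodes those three words. The values are copied from the idle stack dump further down in this post; treating 0x11de78 as the ESP value at the moment iretl executes is my assumption, and decode_iret_frame is just a helper name made up for this example.

```python
# Minimal sketch: decode the three dwords a same-privilege 32-bit iret pops.
# Assumption: ESP was 0x11de78 when iretl executed (inferred, not measured).

def decode_iret_frame(memory, esp):
    """Return (eip, cs, eflags) in the order iret pops them from the stack."""
    eip = memory[esp]         # first dword popped: return address
    cs = memory[esp + 4]      # second dword popped: code segment selector
    eflags = memory[esp + 8]  # third dword popped: flags register
    return eip, cs, eflags

# Words taken from the stack dump in this post (address -> 32-bit value).
stack = {
    0x11de78: 0x00104129,
    0x11de7c: 0x00000008,
    0x11de80: 0x00000016,
}

eip, cs, eflags = decode_iret_frame(stack, 0x11de78)
# Print the gdb command that would stop right after the iret.
print(f"break *{eip:#010x}")
```

Setting the breakpoint on the decoded EIP is what makes the instructions after the iret visible when stepping over it does not work.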
Let's look at what the code is doing here in Radare2 (it helps me add more context to the assembly):
[0x001040d0 1% 220 nuttx.elf]> pd $r @ sym.up_unblock_task
/ (fcn) sym.up_unblock_task 111
| sym.up_unblock_task (int arg_1ch);
| ; var int local_4h @ esp+0x4
| ; arg int arg_1ch @ esp+0x1c
| ; XREFS: CALL 0x0010094c CALL 0x001011f9 CALL 0x001012a7 CALL 0x0010170e
| ; XREFS: CALL 0x001018ec CALL 0x00104124 CALL 0x0010d0af CALL 0x0010d55e
| ; XREFS: CALL 0x0010f7f6 CODE 0x00114fc5
| 0x001040d0 b 56 push esi
| 0x001040d1 53 push ebx
| 0x001040d2 83ec10 sub esp, 0x10
| 0x001040d5 8b5c241c mov ebx, dword [arg_1ch] ; [0x1c:4]=-1
| 0x001040d9 8b35b0d31100 mov esi, dword [obj.g_readytorun] ; [0
| 0x001040df 53 push ebx
| 0x001040e0 e86bb00000 call sym.sched_removeblocked ;[1]
| 0x001040e5 891c24 mov dword [esp], ebx
| 0x001040e8 e8a3ae0000 call sym.sched_addreadytorun ;[2]
| 0x001040ed 83c410 add esp, 0x10
| 0x001040f0 84c0 test al, al
| ,=< 0x001040f2 7426 je 0x10411a ;[3]
| | 0x001040f4 83c678 add esi, 0x78 ; 'x'
| | 0x001040f7 8b1584d71100 mov edx, dword [obj.g_current_regs] ;
| | 0x001040fd 85d2 test edx, edx
| ,==< 0x001040ff 741f je 0x104120 ;[4]
| || 0x00104101 83ec0c sub esp, 0xc
| || 0x00104104 b 56 push esi
| || 0x00104105 e846feffff call sym.up_savestate ;[5]
| || 0x0010410a a1b0d31100 mov eax, dword [obj.g_readytorun] ; [0
| || 0x0010410f 83c078 add eax, 0x78 ; 'x'
| || 0x00104112 a384d71100 mov dword [obj.g_current_regs], eax ;
| || 0x00104117 83c410 add esp, 0x10
| || ; CODE XREFS from sym.up_unblock_task (0x1040f2, 0x10412e)
| .-`-> 0x0010411a 58 pop eax
| :| 0x0010411b 5b pop ebx
| :| 0x0010411c 5e pop esi
| :| 0x0010411d c3 ret
:| 0x0010411e 6690 nop
| :`--> 0x00104120 83ec0c sub esp, 0xc
| : 0x00104123 56 push esi
| : 0x00104124 e818000000 call sym.up_saveusercontext ;[6]
| : 0x00104129 83c410 add esp, 0x10
| : 0x0010412c 85c0 test eax, eax
| `===< 0x0010412e 75ea jne 0x10411a ;[3]
| 0x00104130 a1b0d31100 mov eax, dword [obj.g_readytorun] ; [0
In our case we take the conditional jump:
| .-`-> 0x0010411a 58 pop eax
| :| 0x0010411b 5b pop ebx
| :| 0x0010411c 5e pop esi
| :| 0x0010411d c3 ret
Looking at the ESP register, we can determine where we are returning to:
(gdb) i r esp
esp 0x11dea0 0x11dea0
That is very unfortunate, because the return address stored there points to some place in the stack that we are now going to execute from.
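To make the problem concrete, here is a small sketch of what ret does with that ESP value: it loads the dword at ESP into EIP. The word at 0x11dea0 is taken from the stack dump later in this post. The address range used to decide whether a target looks like stack is simply the range covered by that dump, which is an assumption on my part and not the real stack bounds from the linker script.

```python
# Minimal sketch: where does `ret` jump, and does the target look like stack?
# Assumption: the dumped range 0x11de00..0x11e14f stands in for the stack.

STACK_DUMP_START = 0x11de00  # first address shown in the dump
STACK_DUMP_END = 0x11e150    # one past the last dumped word

def ret_target(memory, esp):
    """`ret` pops the dword at ESP into EIP; return that dword."""
    return memory[esp]

def looks_like_stack(addr):
    """True if addr falls inside the dumped stack region."""
    return STACK_DUMP_START <= addr < STACK_DUMP_END

memory = {0x11dea0: 0x0011df10}  # word at ESP, copied from the dump

target = ret_target(memory, 0x11dea0)
print(f"ret would jump to {target:#010x}, in stack: {looks_like_stack(target)}")
```

A return address inside task_activate, such as 0x00100951, would fail the same check, which is what a healthy return should look like.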
We had already determined that this is the idle task that we are returning to, so we can place a breakpoint in gdb
on up_unblock_task and look for the backtrace for the context that we should be returning to. Restarting the execution:
Breakpoint 6, up_unblock_task (tcb=0x11f460) at common/up_unblocktask.c:72
72 {
(gdb) bt
#0 up_unblock_task (tcb=0x11f460) at common/up_unblocktask.c:72
#1 0x00100951 in task_activate (tcb=0x11f460) at task/task_activate.c:92
#2 0x0010048c in thread_create (name=0x1152c2 "init", ttype=ttype@entry=0 '\000',
priority=100, stack_size=2048, entry=0x104b10 <ostest_main>, argv=0x0)
at task/task_create.c:169
#3 0x00100519 in nxtask_create (name=<optimized out>, priority=<optimized out>,
stack_size=<optimized out>, entry=0x104b10 <ostest_main>, argv=0x0)
at task/task_create.c:233
#4 0x00100351 in os_do_appstart () at init/os_bringup.c:266
#5 os_start_application () at init/os_bringup.c:379
#6 os_bringup () at init/os_bringup.c:453
#7 0x001002d2 in os_start () at init/os_start.c:827
#8 0x00100034 in __start () at chip/qemu_head.S:134
So my expectation is that we would be jumping to something near 0x00100951.
Going back to our state right before we jump to the random address in the stack, we can see our backtrace is garbage:
(gdb) bt
#0 0x0010411d in up_unblock_task (tcb=0x115087 <nxsig_timedwait+183>)
at common/up_unblocktask.c:122
#1 0x0011df10 in ?? ()
#2 0x00000000 in ?? ()
Let's dump a portion of the idle stack and see what's up:
(gdb) x/512wx &_ebss
<snip>
0x11de00: 0x0011fa23 0x001034a8 0x0000000a 0x00000010
0x11de10: 0x00000031 0x001034a8 0x0000000a 0x0011de30
0x11de20: 0x0011fa23 0x00110e0e 0x0000000a 0x00000000
0x11de30: 0x00000020 0x00110e0e 0x0000000a 0x00000037
0x11de40: 0x0011f7a4 0x001034a8 0x0000000a 0x00000037
0x11de50: 0x0011f7a4 0x001103f4 0x0011c120 0x0011d628
0x11de60: 0x00120320 0x0011093a 0x00120320 0x0011d3cc
0x11de70: 0x00000000 0x00000001 0x00104129 0x00000008
0x11de80: 0x00000016 0x00114241 0x0011c0b8 0x00000006
0x11de90: 0x1dcd6500 0x0011507c 0x00120320 0x00000000
0x11dea0: 0x0011df10 0x00115087 0x00120320 0x00000006
0x11deb0: 0x00120320 0x00115023 0x00000012 0x0011c120
0x11dec0: 0x00000000 0x00000000 0x0000000a 0x0011f7a4
0x11ded0: 0x0011fa20 0x00000000 0x0000000a 0x00000000
0x11dee0: 0x0011dfbc 0x0011df80 0x0011df10 0x00000293
0x11def0: 0x0000000e 0x00114d32 0x0011df10 0x00000000
0x11df00: 0x0011df80 0x00000293 0x0011dfbc 0x00000030
0x11df10: 0x00000000 0x00102645 0x0000000a 0x00000000
0x11df20: 0x0011546e 0x00000000 0x00000000 0x00000000
0x11df30: 0x0012a99c 0x00114e62 0x0011df80 0x00000000
0x11df40: 0x0011546f 0x00112817 0x0011dfbc 0x0000000a
0x11df50: 0x00000005 0x00000000 0x00000000 0x00000000
0x11df60: 0x00000000 0x001137a7 0x00000000 0x00000000
0x11df70: 0x0011df80 0x00000000 0x0011d29c 0x00112e70
0x11df80: 0x00000000 0x1dcd6500 0x00000020 0x0011dfcc
0x11df90: 0x00000000 0x001047c5 0x0007a120 0x00000000
0x11dfa0: 0x0011f460 0x0010443a 0x00000020 0x0011dfcc
0x11dfb0: 0x0011e018 0x0010442a 0x0011fac0 0x0011f460
0x11dfc0: 0x00000000 0x0010438c 0x0011dfcc 0x00000010
0x11dfd0: 0x00000000 0x00000000 0x00000000 0x00000000
0x11dfe0: 0x00000000 0x001009a6 0x00000005 0x0012a99c
0x11dff0: 0x00000020 0x00000000 0x00100970 0x00000008
0x11e000: 0x00000202 0x00103612 0x00120398 0x0011545c
0x11e010: 0x00115470 0x0011f460 0x0011f7a8 0x00000000
0x11e020: 0x00000000 0x001009e0 0x00000000 0x00000000
0x11e030: 0x00000000 0x00000000 0x00000000 0x0011e050
0x11e040: 0x00000000 0x001009ae 0x00000000 0x001202b0
0x11e050: 0x00000020 0x00000000 0x00100970 0x00000008
0x11e060: 0x00000202 0x00104141 0x0011f4d8 0x0011f518
0x11e070: 0x0000001f 0x00000064 0x0011f460 0x00000212
0x11e080: 0x00000000 0x00100951 0x0011f460 0x00000000
0x11e090: 0x00000064 0x00000000 0x00000000 0x00000064
0x11e0a0: 0x00000001 0x0010048c 0x0011f460 0x001152c2
0x11e0b0: 0x00000000 0x00000000 0x00000003 0x0011c528
0x11e0c0: 0x0011e14c 0x001152c2 0x00000006 0x001152a8
0x11e0d0: 0x0011e0e4 0x0011e14c 0x00000000 0x00129000
0x11e0e0: 0x00000000 0x00100351 0x00000800 0x00104b10
0x11e0f0: 0x00000000 0x00104b10 0x00000000 0x001152a8
0x11e100: 0x001152c8 0x0010364b 0x00000001 0x0011e14c
0x11e110: 0x0011e14c 0x001002d2 0x00000002 0x00000006
0x11e120: 0x00000000 0x0010027e 0xdeadbeef 0xdeadbeef
0x11e130: 0x0011e150 0x000e1eb0 0xdeadbeef 0xdeadbeef
0x11e140: 0xdeadbeef 0x00100034 0x0011e14c 0x0011e14c
At 0x11dea0 we see 0x0011df10, which is not correct. We expect to be returning to
0x00100951, so we can look that up, and it is much earlier in the stack (at a higher address), at 0x11e084.
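The lookup can also be done mechanically instead of by eye. This is a sketch that parses lines in the plain "address: word word word word" form shown in the dump and reports where a given value is stored. parse_gdb_words and find_word are hypothetical helper names, and the parser assumes the dump lines carry no symbol annotations. Only two lines of the dump are reproduced here to keep it short.

```python
# Minimal sketch: search a gdb `x/wx` style dump for a 32-bit value.
# Assumption: every data line is "addr: w0 w1 w2 w3" with no <symbol> labels.

def parse_gdb_words(dump_text):
    """Map each dumped address to the 32-bit word stored there."""
    memory = {}
    for line in dump_text.splitlines():
        if ":" not in line:
            continue  # skip lines such as "<snip>"
        addr_text, words_text = line.split(":", 1)
        base = int(addr_text.strip(), 16)
        for index, word in enumerate(words_text.split()):
            memory[base + 4 * index] = int(word, 16)
    return memory

def find_word(memory, value):
    """Return the sorted addresses whose stored word equals value."""
    return sorted(addr for addr, word in memory.items() if word == value)

DUMP = """\
0x11dea0: 0x0011df10 0x00115087 0x00120320 0x00000006
0x11e080: 0x00000000 0x00100951 0x0011f460 0x00000000
"""

memory = parse_gdb_words(DUMP)
for addr in find_word(memory, 0x00100951):
    print(f"set $sp = {addr:#x}")
```

If the value shows up more than once in a full dump, each hit still has to be checked against the expected backtrace by hand.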
With GDB we can point the stack pointer at it:
(gdb) set $sp = 0x11e084
And then we can check the backtrace to see if the frames look correct:
(gdb) bt
#0 0x0010411d in up_unblock_task (tcb=0x11f460) at common/up_unblocktask.c:122
#1 0x00100951 in task_activate (tcb=0x11f460) at task/task_activate.c:92
#2 0x0010048c in thread_create (name=0x1152c2 "init", ttype=ttype@entry=0 '\000',
priority=100, stack_size=2048, entry=0x104b10 <ostest_main>, argv=0x0)
at task/task_create.c:169
#3 0x00100519 in nxtask_create (name=<optimized out>, priority=<optimized out>,
stack_size=<optimized out>, entry=0x104b10 <ostest_main>, argv=0x0)
at task/task_create.c:233
#4 0x00100351 in os_do_appstart () at init/os_bringup.c:266
#5 os_start_application () at init/os_bringup.c:379
#6 os_bringup () at init/os_bringup.c:453
#7 0x001002d2 in os_start () at init/os_start.c:827
#8 0x00100034 in __start () at chip/qemu_head.S:134
The code then starts to execute and fails some time later, which is not that unexpected.
Something is clearly incorrect in the context save/restore code.
--Brennan