Tool Internal Crash claims NULL address when address is correct


Mohammad Ewais

May 18, 2022, 5:43:42 PM
to DynamoRIO Users
Hello,

This is a little bit weird. I am encountering a segmentation fault in a replaced function call in my client.

First of all, I instrument all basic blocks (BBs). This is the last BB before the crash:
Meta - 0x0000000000000000: mov    qword ptr [gs:0x08], rsp
Meta - 0x0000000000000000: mov    qword ptr [gs:0x18], 0x00000000
Meta - 0x0000000000000000: mov    r11, 0x00000fffc01b0330
Meta - 0x0000000000000000: jmp    r11
App  - 0x00007F45023400F0: nop


The address of that BB matches the address of the mmap function in the application, which I override using dr_wrap_replace_native. The function never finishes, and I encounter the crash inside it.

In the BB above, I insert a call to print RSP right before the jmp instruction. It has the value:
0x0000110000002E77

I then have this info about the crash in the logs:
computing memory target for 0x00000fffc018b4da causing SIGSEGV, kernel claims it is 0x0000000000000000
opnd_compute_address for: oword ptr [rsp]
        base => 0x0000110000002b9f
        index,scale => 0x0000110000002b9f
        disp => 0x0000110000002b9f

For SIGSEGV at cache pc 0x00000fffc018b4da, computed target write 0x0000000000000000
        faulting instr: movaps oword ptr [rsp], xmm0
add_process_lock: 0 lock 0x00007f46a9437a40: name=report_buf_lock(mutex)@/home/travis/build/DynamoRIO/dynamorio/core/utils.c:2026
rank=88 owner=167333 owning_dc=0x00007f4628e5cd40 contended_event=0xffffffff prev=0x0000000000000000
lock count_times_acquired=       1                              0                               0                              0                               0+2 report_buf_lock(mutex)@/home/travis/build/DynamoRIO/dynamorio/core/utils.c:2026
SYSLOG_CRITICAL: Application /home/mewais/DCSim/Debug/ReplacementMain (167333).  Tool internal crash at PC 0x00000fffc018b4da.  Please report this at your tool's issue tracker.  Program aborted.
Received SIGSEGV at client library pc 0x00000fffc018b4da in thread 167333
Base: 0x00007f46a8e1f000
Registers:eax=0x00000fffc02bdc10 ebx=0x0000000000020022 ecx=0x0000000000000004 edx=0x000000000000002f
        esi=0x00000fffc025a960 edi=0x0000110000002b9f esp=0x0000110000002b9f ebp=0x0000000000000000
        r8 =0x0000110000002e0f r9 =0x0000000000000000 r10=0x000011000000227f r11=0x0000000000000246
        r12=0x0000110000002e1f r13=0x0000110000002bbf r14=0x00000fffc02bdc00 r15=0x0000110000002e0f
        eflags=0x0000000000010206

I find this hard to understand. RSP clearly holds a reasonable value, 0x0000110000002b9f, yet the instruction movaps oword ptr [rsp], xmm0 segfaults and the computed target address is 0. How is this possible?

I was able to track down this instruction in the objdump of my client, here's the snippet:
    7218b4a0:   f3 0f 1e fa             endbr64
    7218b4a4:   41 56                   push   %r14
    7218b4a6:   45 31 c9                xor    %r9d,%r9d
    7218b4a9:   41 55                   push   %r13
    7218b4ab:   41 54                   push   %r12
    7218b4ad:   49 89 fc                mov    %rdi,%r12
    7218b4b0:   55                      push   %rbp
    7218b4b1:   53                      push   %rbx
    7218b4b2:   48 81 ec 20 02 00 00    sub    $0x220,%rsp
    7218b4b9:   4c 8b 35 d8 5c 13 00    mov    0x135cd8(%rip),%r14        # 722c1198 <_ZTVN3fmt2v819basic_memory_bufferIcLm500ESaIcEEE@@Base+0x3598>
    7218b4c0:   4c 8d 6c 24 20          lea    0x20(%rsp),%r13
    7218b4c5:   48 89 e7                mov    %rsp,%rdi
    7218b4c8:   49 8d 46 10             lea    0x10(%r14),%rax
    7218b4cc:   66 49 0f 6e cd          movq   %r13,%xmm1
    7218b4d1:   66 48 0f 6e c0          movq   %rax,%xmm0
    7218b4d6:   66 0f 6c c1             punpcklqdq %xmm1,%xmm0
    7218b4da:   0f 29 04 24             movaps %xmm0,(%rsp)

RSP was clearly being used before the offending instruction with no issues regarding its value or permissions. So why does this specific instruction compute the address as 0 when RSP clearly holds a valid value?

EXTRAS:
This stack is actually created by my client using dr_raw_mem_alloc because I want it to be specifically at the range 0x110000001000-0x110000002FFF. I also made sure to give it DR_MEMPROT_READ | DR_MEMPROT_WRITE permissions. I don't think this is the cause of the issue since the application and the client were both working fine for a while before that mmap call. It also shows that the RSP at the point of failure is still very much within the stack boundaries.

Derek Bruening

May 18, 2022, 6:57:11 PM
to Mohammad Ewais, DynamoRIO Users
movaps requires a 16-byte-aligned address, which 0x0000110000002b9f is not, so that looks like the source of the fault.
Probably the kernel does not set si_addr for an alignment fault or something like that: I don't remember hitting that before but such kernel limitations on signal info are not uncommon.
With the NULL si_addr, DR's first opnd check fails; it goes on to look for an unwritable page but that also fails; so it gives up and sets it to NULL.
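The alignment point can be checked by hand. What follows is only a minimal sketch in plain C, not DynamoRIO code: the helper names `is_aligned` and `misalignment` are made up for illustration, and the only inputs taken from the thread are the RSP value in the crash report and the 16-byte requirement of movaps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* movaps with a memory operand needs a 16-byte-aligned address. */
#define MOVAPS_ALIGNMENT 16u

/* Hypothetical helper: true if addr is a multiple of alignment.
 * alignment must be a non-zero power of two. */
static bool is_aligned(uint64_t addr, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

/* Hypothetical helper: number of bytes addr lies past the previous
 * aligned boundary (0 means addr is aligned). */
static uint64_t misalignment(uint64_t addr, uint64_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return addr & (alignment - 1);
}
```

For the RSP in the crash report, `is_aligned(0x0000110000002b9fULL, MOVAPS_ALIGNMENT)` is false and `misalignment(...)` is 15, because the low hex digit is 0xf. The address is mapped and writable, but it is one byte short of the next 16-byte boundary.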

Mohammad Ewais

May 19, 2022, 2:09:01 PM
to DynamoRIO Users
I see, thanks a lot for the explanation.

This made me realize that I was wrongly supplying the SP to the application as 0x110000001000 + 8191 rather than + 8192. Issue solved.
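For anyone hitting the same thing, here is a rough sketch of the arithmetic in plain C. The constants come from the messages above; the helper names are hypothetical, and the function-entry RSP used in the usage note is back-computed from the crash RSP rather than read from the logs.

```c
#include <assert.h>
#include <stdint.h>

#define STACK_BASE 0x110000001000ULL /* start of the dr_raw_mem_alloc'd range */
#define STACK_SIZE 8192ULL           /* 0x2000 bytes, i.e. two 4 KiB pages */

/* What I was doing: pointing SP at the last byte of the stack region. */
static uint64_t initial_sp_wrong(void)
{
    return STACK_BASE + STACK_SIZE - 1; /* base + 8191, an odd address */
}

/* The fix: point SP one past the end of the region, which is 16-byte
 * aligned whenever base and size both are. */
static uint64_t initial_sp_fixed(uint64_t base, uint64_t size)
{
    assert(base % 16 == 0 && size % 16 == 0);
    return base + size; /* base + 8192 */
}

/* RSP at the faulting movaps, given RSP at function entry: the prologue
 * in the objdump does five 8-byte pushes and then sub $0x220,%rsp. */
static uint64_t rsp_after_prologue(uint64_t entry_rsp)
{
    return entry_rsp - 5 * 8 - 0x220;
}
```

`initial_sp_wrong()` gives 0x110000002FFF and `initial_sp_fixed(STACK_BASE, STACK_SIZE)` gives 0x110000003000. Since pushes and stack adjustments move RSP by multiples of 8, an SP that starts odd can never become 16-byte aligned: `rsp_after_prologue(0x110000002de7ULL)` yields 0x110000002b9f, the value in the crash report.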
