Internal Error


Mohammad Ewais

Dec 16, 2021, 1:29:58 AM
to DynamoRIO Users
Hi,

I am encountering an error whose meaning I can't work out.

<Application /path/to/app (15997).  Internal Error: DynamoRIO debug check failure: /home/travis/build/DynamoRIO/dynamorio/core/unix/memcache.c:248 ALIGNED(start, PAGE_SIZE)
(Error occurred @3381 frags)
version 8.0.0, build 1
-no_dynamic_options -disasm_mask 1 -loglevel 4 -client_lib '/path/to/libSNEClient.so;0;"--node-name" "Node/TheTestNode-0" "--process-name" "ProgramName" "--process-index" "0" "--process-host-ranks" "1" "--process-host-names" "mpi-slave1" "--process-host-threads" "8" "--process-rank" "1" "--process-master
0x00007fff33fc4770 0x00000000710dc095
0x00007fff33fc49c0 0x00000000712f28bb
0x00007fff33fc4a50 0x00000000712f2eca
0x00007fff33fc4a80 0x00000000712c7508
0x00007fff33fc4d50 0x00000000710d10e3
0x00007fff33fc4e50 0x00000000710c8805
0x00007fff33fc4f10 0x00000000710c3f5f
0x00007fff33fc4ff0 0x00007fffb41ccf0d
0x00007fffffffeb00 0x0000000000000000
/path/to/libSNEClient.so=0x00007fffb3f38000
/lib/x86_64-linux-gnu/libstdc++.so.6=0x00007ffff773d000
/lib/x86_64-linux-gnu/libgcc_s.so.1=0x00007ffff7f70000
/lib/x86_64-linux-gnu/libm.so.6=0x00007ffff7def000
/path/to/DynamoRIO/ext/lib64/debug/libdrsyms.so=0x00007fffb4c4b000
/lib/x86_64-linux-gnu/libc.so.6=0x00007ffff7b6c000
/usr/lib64/ld-linux-x86-64.so.2=0x00007ffff7f8c000
/path/to/DynamoRIO/ext/lib64/debug/libdrx.so=0x00>
<rank order violation shared_cache_lock(mutex)@/home/travis/build/DynamoRIO/dynamorio/core/fcache.c:1590 acquired after all_memory_areas(readwrite)@/home/travis/build/DynamoRIO/dynamorio/core/unix/memcache.c:101 in tid:3e7d>
<rank order violation shared_cache_lock(mutex)@/home/travis/build/DynamoRIO/dynamorio/core/fcache.c:1590 acquired after all_memory_areas(readwrite)@/home/travis/build/DynamoRIO/dynamorio/core/unix/memcache.c:101 in tid:3e7d>


Judging by the last two lines, I suspected an issue with my mutexes, but I've made sure that everything that gets locked is also unlocked.

As for the first line, I don't really have any clue what it means.

Also, I instrument BB and heavily modify memory operations, but the BB where this happens has no memory operations and so is never touched by me. Here is the BB:
0x00007FFFF7A921D0 : nop    edx
0x00007FFFF7A921D4 : mov    eax, 0x0000000a
0x00007FFFF7A921D9 : syscall


Any useful tips or directions for me? What does this error pertain to? What aspect of my client could this be related to?

Thanks.

assad.hashm...@gmail.com

Dec 16, 2021, 9:44:08 AM
to DynamoRIO Users
> Also, I instrument BB and heavily modify memory operations, but the BB where this happens has no memory operations and so is never touched by me. Here is the BB:
> 0x00007FFFF7A921D0 : nop    edx
> 0x00007FFFF7A921D4 : mov    eax, 0x0000000a
> 0x00007FFFF7A921D9 : syscall

The BB you highlight looks like it is calling mprotect() (eax = 10, which is the mprotect syscall number on x86-64 Linux).
The assert in memcache.c, ASSERT(ALIGNED(start, PAGE_SIZE)), is in the memcache_update() function, which is called by several functions, including set_protection(), when DynamoRIO handles such syscalls on behalf of the application and client.
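
For reference, the reading of `eax = 10` follows from the x86-64 Linux syscall table. Here is a minimal Python sketch with a hand-copied subset of that table. The dictionary and the helper name `describe_syscall` are illustrative only and are not part of DynamoRIO; other architectures use different numbers.

```python
# Subset of the x86-64 Linux syscall table (memory-related entries only).
# Hand-copied for illustration; other architectures use different numbers.
X86_64_MEMORY_SYSCALLS = {
    9: "mmap",
    10: "mprotect",
    11: "munmap",
    12: "brk",
    25: "mremap",
}


def describe_syscall(eax: int) -> str:
    """Return the syscall name for the value loaded into eax, if known."""
    return X86_64_MEMORY_SYSCALLS.get(eax, f"unknown ({eax})")


if __name__ == "__main__":
    # The basic block in question executes `mov eax, 0x0000000a; syscall`.
    print(describe_syscall(0x0000000A))  # prints "mprotect"
```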

When you say "I instrument BB and heavily modify memory operations", can you give us more details of what you are modifying? Especially anything which may end up causing Linux to alter memory sizes and/or permissions.

It may be worth narrowing down the cause of the ASSERT by finding out which part of the client triggers it, e.g. inserting dr_printf()s around suspect code.

Does the ASSERT appear if you just execute with drrun, i.e. no client? If so, it's the application which is doing something unusual or something DynamoRIO does not expect, rather than anything in the client. In that case, it's worth finding out which part of the application triggers the ASSERT.

Mohammad Ewais

Dec 16, 2021, 10:35:51 AM
to DynamoRIO Users
Hi Assad,

Thanks a lot for your help and response. I know the following:
1. This error does not come up with other clients that do not alter the application. So this is definitely my doing; I was just trying to find out which part :)
2. This error and that BB both happen inside a `pthread_create` call. I know because I wrapped it and printed before and after it.
3. I heavily modify the application's memory in the following way(s), for the purpose of "splitting" one application across multiple machines:
    i. Using drwrap_replace, I override all calls to `malloc`, `mmap`, and any other memory allocation/freeing function. I use my own fake allocators which give out fake addresses.
    ii. Using drwrap_wrap, I wrap all calls to `pthread_create` and the like, so I can decide whether I'd send them to another machine or let them through.
    iii. Using BB instrumentation, I modify every memory access so I can translate the fake addresses from before to my real memory addresses. So far it has been successful, but it breaks inside pthread_create as I mentioned earlier.
4. Inside `pthread_create` I get one of these calls to `mmap` for memory allocation, which I also override.

Here's my theory, based on all the above plus your input:
Inside the `pthread_create` call, I got a call to `mmap`, which I have of course overridden. If the syscall (or DR, after taking over the syscall) is trying to change the memory protection for the fake memory address returned by my mmap, it will definitely fail. If that's the case, then I also need to drwrap_replace whichever function issues this syscall (mprotect?).

assad.hashm...@gmail.com

Dec 16, 2021, 11:20:34 AM
to DynamoRIO Users
> 3. I heavily modify the application's memory in the following way(s), ...
. . .

> If the syscall (or DR, after taking over the syscall) is trying to change the memory protection for the fake memory address returned by my mmap, it will definitely fail. If that's the case, then I also need to drwrap_replace whichever function issues this syscall (mprotect?).

Yes, it sounds like your client is essentially another layer in terms of memory handling!
So you should handle ALL memory syscalls consistently, including mprotect().
Does the ASSERT happen if you run single threaded? That may be another clue.
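
To illustrate what handling mprotect() consistently could involve, here is a minimal Python model of translating a fake range to a real, page-aligned range before the protection change is issued. Everything in it is hypothetical: the `FakeRegion` table, `translate_range`, and the addresses are invented for illustration, the 4096-byte page size is an assumption, and a real client would do this in C/C++ inside its replacement function.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size


@dataclass(frozen=True)
class FakeRegion:
    fake_base: int  # address handed out by the (hypothetical) fake allocator
    real_base: int  # backing address in the real address space
    size: int       # length of the region in bytes


def translate_range(regions, fake_start, length):
    """Map [fake_start, fake_start + length) to a page-aligned real range.

    Returns (real_start, real_length), rounded outwards to page boundaries.
    Raises KeyError if no single region covers the whole fake range.
    """
    for r in regions:
        covers_start = r.fake_base <= fake_start
        covers_end = fake_start + length <= r.fake_base + r.size
        if covers_start and covers_end:
            real_start = r.real_base + (fake_start - r.fake_base)
            real_end = real_start + length
            # Round the start down and the end up to page boundaries.
            aligned_start = real_start & ~(PAGE_SIZE - 1)
            aligned_end = (real_end + PAGE_SIZE - 1) & ~(PAGE_SIZE - 1)
            return aligned_start, aligned_end - aligned_start
    raise KeyError(hex(fake_start))


if __name__ == "__main__":
    table = [FakeRegion(0x800000001000, 0x7F0000002C00, 0x4000)]
    start, size = translate_range(table, 0x800000001000, 0x1000)
    print(hex(start), hex(size))
```

Rounding outwards also changes the protection of any neighbouring data that shares those pages, so handing out page-aligned fake regions for mmap-style allocations may be the simpler way to avoid the issue.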

Mohammad Ewais

Dec 16, 2021, 11:40:52 AM
to DynamoRIO Users
No, this is the first `pthread_create` call I encounter; before it I don't have any errors.

I just tried drwrap_replacing the mprotect function from the libstdc module, and as far as I can tell it was replaced successfully (the first line of the following output is my print). I now get the following error, though, which I also can't explain (I can't follow the inner workings of these source files):

[12/16/21 04:31:27 PM] [SNE-Node/TheTestNode-0-mpi-slave1-ProgramName] [debug   ] [REPLACEMENT] Replacement mprotect called on address 0x8000fe012c00:0x8000fe812c00 with 3
<Application /home/mewais/DCSim/Debug/Test2 (16905).  Internal Error: DynamoRIO debug check failure: /home/travis/build/DynamoRIO/dynamorio/core/unix/signal.c:5287 syscall_signal || safe_is_in_fcache(dcontext, pc, (byte *)sc->SC_XSP)

(Error occurred @3381 frags)
version 8.0.0, build 1
-no_dynamic_options -disasm_mask 1 -loglevel 4 -client_lib '/home/mewais/DCSim/Debug//libSNEClient.so;0;"--node-name" "Node/TheTestNode-0" "--process-name" "ProgramName" "--process-index" "0" "--process-host-ranks" "1" "--process-host-names" "mpi-slave1" "--process-host-threads" "8" "--process-rank" "1" "--process-master
0x00007fff33fe0690 0x00000000710dc095
0x00007fff33fe08e0 0x00000000712dfa98
0x00007fff33fe0ab0 0x000000007129f1d3
0x00007fffffffeb00 0x0000000000000000
/home/mewais/DCSim/Debug//libSNEClient.so=0x00007fffb3f38000
/lib/x86_64-linux-gnu/libstdc++.so.6=0x00007ffff773d000
/lib/x86_64-linux-gnu/libgcc_s.so.1=0x00007ffff7f70000
/lib/x86_64-linux-gnu/libm.so.6=0x00007ffff7def000
/home/mewais/DynamoRIO/ext/lib64/debug/libdrsyms.so=0x00007fffb4c4a000
/lib/x86_64-linux-gnu/libc.so.6=0x00007ffff7b6c000
/usr/lib64/ld-linux-x86-64.so.2=0x00007ffff7f8c000
/home/mewais/DynamoRIO/ext/lib64/debug/libdrx.so=0x00>
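
One detail in the output above can be checked mechanically: whether the address passed to the replacement mprotect is page-aligned, which is the kind of condition the earlier ALIGNED(start, PAGE_SIZE) assert tests. Here is a minimal Python sketch, assuming a 4096-byte page size and assuming the two printed values are the start and end of the range. The helper `is_aligned` is illustrative and is not DynamoRIO's macro.

```python
PAGE_SIZE = 4096  # assumed; the log does not state the page size


def is_aligned(addr: int, alignment: int = PAGE_SIZE) -> bool:
    """True if addr is a multiple of alignment (which must be a power of two)."""
    return addr & (alignment - 1) == 0


# Values copied from the "Replacement mprotect called on address" line.
start, end, prot = 0x8000FE012C00, 0x8000FE812C00, 3

if __name__ == "__main__":
    print(is_aligned(start))     # False: the low 12 bits are 0xc00
    print(hex(end - start))      # 0x800000, i.e. 8 MiB
    print(prot == (0x1 | 0x2))   # True: PROT_READ | PROT_WRITE on Linux
```

If that reading is right, the fake address is not page-aligned. Linux's mprotect requires a page-aligned start address, so a fake mmap that returns unaligned addresses is one candidate explanation worth ruling out.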


Mohammad Ewais

Dec 16, 2021, 1:30:29 PM
to DynamoRIO Users
I just realized this is coming from a signal handler, and there's probably a SIGSEGV somewhere. I will debug more.

Thanks for the help.

Derek Bruening

Dec 16, 2021, 1:48:54 PM
to dynamor...@googlegroups.com
I would suggest getting a symbolized callstack for any assert (or crash), which helps us understand what is happening. For example, the symbolized callstack for the original memcache.c ALIGNED assert would help.

To view this discussion on the web visit https://groups.google.com/d/msgid/dynamorio-users/b4687387-2009-419e-b559-6b0037048652n%40googlegroups.com.

Mohammad Ewais

Dec 16, 2021, 4:29:52 PM
to DynamoRIO Users
Sorry, can you just point me to how to do that? This stack trace is dumped by DR automatically. The last time I tried printing the stack trace myself, it didn't work.

Derek Bruening

Dec 16, 2021, 5:07:24 PM
to dynamor...@googlegroups.com
The easiest thing is to attach gdb and load the symbols: https://dynamorio.org/page_debugging.html (You could also pass the auto-printed addresses to addr2line or something similar. We used to have a script to automate that, but I don't think it is still maintained, and it was focused on Windows.)
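
As a rough illustration of the addr2line route, the Python sketch below turns an auto-printed dump into (module, offset) pairs and the matching addr2line command lines. It is not the old DynamoRIO script, and it rests on several assumptions: that the second column of each address pair is the return address, that each `path=base` line gives a module's load base, that the owning module is the one with the closest base below the address, and that the library's file addresses start at zero so that the offset is simply address minus base. The names `resolve_frames` and `addr2line_commands` are hypothetical.

```python
import re

FRAME_RE = re.compile(r"^(0x[0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+)$")
MODULE_RE = re.compile(r"^(/\S+)=(0x[0-9a-fA-F]+)$")

# A few lines copied from the first error dump in this thread.
SAMPLE = """\
0x00007fff33fc4f10 0x00000000710c3f5f
0x00007fff33fc4ff0 0x00007fffb41ccf0d
0x00007fffffffeb00 0x0000000000000000
/path/to/libSNEClient.so=0x00007fffb3f38000
/lib/x86_64-linux-gnu/libc.so.6=0x00007ffff7b6c000
/path/to/DynamoRIO/ext/lib64/debug/libdrx.so=0x00>
"""


def resolve_frames(text):
    """Return a list of (module_path or None, offset or absolute address)."""
    frames, modules = [], {}
    for line in text.splitlines():
        line = line.strip()
        frame = FRAME_RE.match(line)
        module = MODULE_RE.match(line)
        if frame:
            ret = int(frame.group(2), 16)
            if ret:  # skip the terminating all-zero entry
                frames.append(ret)
        elif module:  # truncated entries such as "...=0x00>" do not match
            modules[module.group(1)] = int(module.group(2), 16)
    resolved = []
    for ret in frames:
        # Heuristic: pick the listed module whose base is closest below ret.
        below = [(base, path) for path, base in modules.items() if base <= ret]
        if below:
            base, path = max(below)
            resolved.append((path, ret - base))
        else:
            resolved.append((None, ret))  # not in any listed module
    return resolved


def addr2line_commands(text):
    """Build addr2line argument lists for frames with a known module."""
    return [["addr2line", "-f", "-C", "-e", path, hex(off)]
            for path, off in resolve_frames(text) if path is not None]


if __name__ == "__main__":
    for cmd in addr2line_commands(SAMPLE):
        print(" ".join(cmd))
```

Addresses with no listed module below them (such as the 0x71... frames here, which may well be inside DynamoRIO's own library) are left unresolved and need to be looked up by hand. The nearest-base heuristic can also misattribute a frame when the module list is truncated, as it is in these dumps.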

Mohammad Ewais

Dec 16, 2021, 5:11:37 PM
to DynamoRIO Users
I'll try the second option, or a variant of it. Unfortunately, my application runs over MPI, which makes GDB really tedious to use.