There have been previous questions about using DR with MPI, mostly focusing on instrumenting an MPI application. I have a different question.
Rather than the target app, I would like my DR client itself to be MPI capable: I want to be able to send and receive MPI messages inside the client. I tried the naive approach of simply calling MPI_Init inside my dr_client_main. Obviously it did not work, or I would not be asking here :) This is the output I get:
<Initial options = -no_dynamic_options -client_lib '/home/mewais/SNE/Build/libSNEClient.so;0;"--so-many-args"' -code_api -stack_size 56K -signal_stack_size 32K -max_elide_jmp 0 -max_elide_call 0 -early_inject -emulate_brk -no_inline_ignored_syscalls -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<CURIOSITY : privload_recurse_cnt < 20 in file /home/travis/build/DynamoRIO/dynamorio/core/loader_shared.c line 652
version 8.0.0, build 1
0x00007ffee900fd90 0x00000000711d861a
0x00007ffee900fe00 0x00000000712ecbc9
0x00007ffee9010030 0x00000000712ec5b3
0x00007ffee9010090 0x00000000711d8fe8
0x00007ffee90100b0 0x00000000711d889f
0x00007ffee9010110 0x00000000712ecbc9
0x00007ffee9010340 0x00000000712ec5b3
0x00007ffee90103a0 0x00000000711d8fe8
0x00007ffee90103c0 0x00000000711d7169
0x00007ffee90105f0 0x00000000711d7346
0x00007ffee9010830 0x0000000071052844
0x00007ffee9011060 0x00000000712efc4d
0x00007ffee9012110 0x000000007129f1bd
/home/mewais/SNE/Build/libSNEClient.so=0x00007f5d4ee2b000
/home/mewais/DynamoRIO/ext/lib64/debug/libdrwrap.so=0x00007f5d4f273000
/lib/x86_64-linux-gnu/libmpi_cxx.so.40=0x00007f5d93087000
/lib/x86_64-linux-gnu/libstdc++.so.6=0x00007f5d924dd000
/lib/x86_64-linux-gnu/libgcc_s.so.1=0x00007f5d928f0000
/opt/lib/libmpi.so.40=0x00007f5d92f5d000
/opt/lib/libopen-rte.so.40=0x00007f5d92df2000
/lib/x86_64-linux-gnu/libz.so.1=0x00007f5d92ef50>
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/home/mewais/SNE/Build/libSNEClient.so' 0x00007f5d4f0b7b50
add-symbol-file '/home/mewais/DynamoRIO/lib64/debug/libdynamorio.so' 0x0000000071040fe0
add-symbol-file '/lib/x86_64-linux-gnu/libmpi_cxx.so.40' 0x00007f5d93097660
add-symbol-file '/opt/lib/libmpi.so.40' 0x00007f5d92f893f0
add-symbol-file '/opt/lib/libopen-rte.so.40' 0x00007f5d92e0d6e0
add-symbol-file '/opt/lib/libopen-pal.so.40' 0x00007f5d92d5db50
add-symbol-file '/lib/x86_64-linux-gnu/libdl.so.2' 0x00007f5d92f57220
add-symbol-file '/lib/x86_64-linux-gnu/libc.so.6' 0x00007f5d9298e650
add-symbol-file '/usr/lib64/ld-linux-x86-64.so.2' 0x00007f5d92f1f090
add-symbol-file '/lib/x86_64-linux-gnu/libutil.so.1' 0x00007f5d92f193e0
add-symbol-file '/lib/x86_64-linux-gnu/libhwloc.so.15' 0x00007f5d92ce6d10
add-symbol-file '/lib/x86_64-linux-gnu/libm.so.6' 0x00007f5d92ba03c0
add-symbol-file '/lib/x86_64-linux-gnu/libudev.so.1' 0x00007f5d92b6be50
add-symbol-file '/lib/x86_64-linux-gnu/libpthread.so.0' 0x00007f5d9294ca60
add-symbol-file '/lib/x86_64-linux-gnu/libevent_core-2.1.so.7' 0x00007f5d92914b00
add-symbol-file '/lib/x86_64-linux-gnu/libevent_pthreads-2.1.so.7' 0x00007f5d92f132e0
add-symbol-file '/lib/x86_64-linux-gnu/libz.so.1' 0x00007f5d92ef7280
add-symbol-file '/lib/x86_64-linux-gnu/libstdc++.so.6' 0x00007f5d9257e2e0
add-symbol-file '/lib/x86_64-linux-gnu/libgcc_s.so.1' 0x00007f5d928f35e0
add-symbol-file '/home/mewais/DynamoRIO/ext/lib64/debug/libdrwrap.so' 0x00007f5d4f276150
add-symbol-file '/home/mewais/DynamoRIO/ext/lib64/debug/libdrmgr.so' 0x00007f5d4f4855f0
>
<(1+x) Handling our fault in a TRY at 0x000000007129f6b2>
<Application ./Test1 (29572). Tool internal crash at PC 0x00007f5d92daeda0. Please report this at your tool's issue tracker. Program aborted.
Received SIGSEGV at pc 0x00007f5d92daeda0 in thread 29572
Base: 0x0000000071000000
Registers:eax=0x00007f5c00000000 ebx=0x00007f5d92dcac20 ecx=0x0000000000002000 edx=0x0000000000000000
esi=0x0000000000000001 edi=0x00007f5d92decde0 esp=0x00007ffee9010020 ebp=0x00007f5d92dcaf00
r8 =0x00007f5d92decdf8 r9 =0x00007f5ccef31ef0 r10=0x0000000000000000 r11=0x00007f5ccef2dee0
r12=0x0000000000000000 r13=0x00007f5d92dcafc0 r14=0x00007f5d92dcadd0 r15=0x00007f5d92dcad00
eflags=0x0000000000010206
version 8.0.0, build 1
-no_dynamic_options -client_lib '/home/mewais/SNE/Build/libSNEClient.so;0;"--so-many-args"' -code_api -stack_size 56K -signal_stack_size 32K -
0x00007f5d92dcaf00 0x0003000200010001>
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
I am still new to DR, so I have not been able to debug this in gdb yet. However, everything works if I comment out MPI_Init, so I am pretty sure that call is the cause (I have NOT yet added any other MPI calls).
Is there a solution to this? I can think of some workarounds, such as having a separate process that is solely responsible for MPI and making the client talk to it via sockets or pipes. Would that be possible? Or is there a way to get MPI and DR to work together directly, without going through complex and potentially slow workarounds?
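To make the pipe workaround concrete, here is a minimal sketch in C of the framing such a relay might use. This is not DR or MPI API: relay_send, relay_recv, and the 4-byte length prefix are hypothetical names and choices made up for illustration. It assumes both ends run on the same machine (same byte order) and that the client already holds the write end of a pipe or socket whose read end belongs to a separate, ordinary MPI process. It has not been tested inside a DR client.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>

/* Write exactly len bytes to fd, retrying on short writes and EINTR. */
static int write_all(int fd, const void *buf, size_t len)
{
    const char *p = (const char *)buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes from fd; an early EOF counts as an error. */
static int read_all(int fd, void *buf, size_t len)
{
    char *p = (char *)buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return -1;
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Hypothetical client side: send one message as [4-byte length][payload]. */
int relay_send(int fd, const void *payload, uint32_t len)
{
    if (write_all(fd, &len, sizeof len) != 0)
        return -1;
    return write_all(fd, payload, len);
}

/* Hypothetical helper side: receive one message into buf (capacity cap).
 * Returns the payload length, or -1 on error or if the message is too big. */
int relay_recv(int fd, void *buf, uint32_t cap)
{
    uint32_t len;
    if (read_all(fd, &len, sizeof len) != 0)
        return -1;
    if (len > cap)
        return -1;
    if (read_all(fd, buf, len) != 0)
        return -1;
    return (int)len;
}
```

In this arrangement the helper process would be launched normally under mpirun, call MPI_Init itself, and loop on relay_recv, forwarding each payload with MPI_Send. The client would only ever call relay_send, so no MPI library gets loaded into the client. Whether plain libc write is appropriate inside a client, or whether DR's own file routines such as dr_write_file should be used instead, is something I have not verified.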