I have successfully built and run my client on a local machine with no issues. I am now trying to run it on a SLURM cluster. The cluster provides all sorts of software through a module system, which includes multiple versions of libc. I load the module for gcc 11.3.0, which also loads libc 2.30 (I am actively avoiding any libc after 2.34).
All the binaries provided under these modules are compiled with RPATH so that they use the correct libraries (at least according to the cluster documentation).
With this setup, my CMake build runs fine. The client links with a couple of static libraries that are built under the same configuration, plus of course a dynamic libc. After the build, I run ldd Release/libDCSClient.so and get the following output, which seems to be fine:
linux-vdso.so.1 (0x00007f14aa40e000)
libdrwrap.so => not found
libdrutil.so => not found
libdrx.so => not found
libdrreg.so => not found
libdrsyms.so => not found
libdrmgr.so => not found
libdynamorio.so => not found
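For reference, the unresolved entries can be pulled out of ldd output with a small filter. This is only a sketch: missing_libs is a hypothetical helper name, and it assumes the usual "name => not found" line format that ldd prints above.

```shell
# Hypothetical helper: read ldd-style output on stdin and print the
# names of the libraries that ldd reported as "not found".
missing_libs() {
    awk '$2 == "=>" && $3 == "not" && $4 == "found" { print $1 }'
}
```

It would be used as ldd Release/libDCSClient.so | missing_libs. For the lines shown above it lists only the DynamoRIO libraries, which the -debug output later shows being loaded from the DynamoRIO install directory.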
Trial 1
The problem is that when I run, DR seems to load the wrong libc. I ran with the -debug flag and got the following as part of the output. It shows multiple symbol lookup errors, the path to the wrong libc, and a segmentation fault at the first use of libc in my client:
<WARNING! symbol lookup error: libDCSClient.so undefined symbol _ZNKSt10filesystem7__cxx114path5_List13_Impl_deleterclEPNS2_5_ImplE>
<WARNING! symbol lookup error: libDCSClient.so undefined symbol _ZNSt10filesystem7__cxx114path5_ListC1Ev>
<WARNING! symbol lookup error: libDCSClient.so undefined symbol _ZNSt10filesystem6statusERKNS_7__cxx114pathE>
<WARNING! symbol lookup error: libDCSClient.so undefined symbol _ZNSt7__cxx1118basic_stringstreamIcSt11char_traitsIcESaIcEEC1Ev>
<WARNING! symbol lookup error: libDCSClient.so undefined symbol _ZSt28__throw_bad_array_new_lengthv>
<WARNING! symbol lookup error: libDCSClient.so undefined symbol _ZNSt10filesystem18create_directoriesERKNS_7__cxx114pathE>
<WARNING! symbol lookup error: libDCSClient.so undefined symbol _ZNSt10filesystem12current_pathERKNS_7__cxx114pathE>
<WARNING! symbol lookup error: libDCSClient.so undefined symbol _ZNSt10filesystem7__cxx114path14_M_split_cmptsEv>
<WARNING! symbol lookup error: libDCSClient.so undefined symbol _ZNKSt10filesystem7__cxx114path11parent_pathEv>
<WARNING! symbol lookup error: libDCSClient.so undefined symbol _ZNSt10filesystem12current_pathB5cxx11Ev>
<WARNING! symbol lookup error: libDCSClient.so undefined symbol _ZNKRSt7__cxx1115basic_stringbufIcSt11char_traitsIcESaIcEE3strEv>
<WARNING! symbol lookup error: libDCSClient.so undefined symbol _ZNKSt10filesystem7__cxx114path7compareERKS1_>
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/lustre06/project/6005345/mewais/DCSim/Workspace/SingleNode_Local128GB/GBBS/../../../Release/libDCSClient.so' 0x000000007257cb50
add-symbol-file '/lustre06/project/6005345/mewais/DynamoRIO-Linux-8.0.0-1/lib64/debug/libdynamorio.so' 0x00007fc1b2558fe0
add-symbol-file '/lustre06/project/6005345/mewais/DynamoRIO-Linux-8.0.0-1/ext/lib64/debug/libdrwrap.so' 0x0000000074003150
add-symbol-file '/lustre06/project/6005345/mewais/DynamoRIO-Linux-8.0.0-1/ext/lib64/debug/libdrmgr.so' 0x000000003f4a85f0
add-symbol-file '/lustre06/project/6005345/mewais/DynamoRIO-Linux-8.0.0-1/ext/lib64/debug/libdrutil.so' 0x00000000750015c0
add-symbol-file '/lustre06/project/6005345/mewais/DynamoRIO-Linux-8.0.0-1/ext/lib64/debug/libdrx.so' 0x0000000077003d20
add-symbol-file '/lustre06/project/6005345/mewais/DynamoRIO-Linux-8.0.0-1/ext/lib64/debug/libdrreg.so' 0x00000000780028d0
add-symbol-file '/lustre06/project/6005345/mewais/DynamoRIO-Linux-8.0.0-1/ext/lib64/debug/libdrsyms.so' 0x0000000076009b10
add-symbol-file '/usr/lib64/libc.so.6' 0x00007fc131f24c40
add-symbol-file '/usr/lib64/ld-linux-x86-64.so.2' 0x00007fc131cd1080
add-symbol-file '/usr/lib64/libstdc++.so.6' 0x00007fc1319c9b90
add-symbol-file '/usr/lib64/libm.so.6' 0x00007fc1315c3520
add-symbol-file '/usr/lib64/libgcc_s.so.1' 0x00007fc1313a0e00
>
<Application /lustre06/project/6005345/mewais/DCSim/Bench/gbbs/build/execroot/__main__/bazel-out/k8-opt/bin/benchmarks/BFS/NonDeterministicBFS/BFS_main (2547091). Tool internal crash at PC 0x0000000072895ae7. Please report this at your tool's issue tracker. Program aborted.
Received SIGSEGV at client library pc 0x0000000072895ae7 in thread 2547091
Base: 0x00007fc1b2518000
Registers:eax=0x0000000000000000 ebx=0x000000003f49fb28 ecx=0x00007fc1325ce8e0 edx=0x0000000000000000
esi=0x0000000000000001 edi=0x000000007305e9f0 esp=0x00007ffd6ab30090 ebp=0x00007ffd6ab30130
r8 =0x00007fc1325ce940 r9 =0x00007ffd6ab2fe60 r10=0x00007ffd6ab2fe91 r11=0x0000000000000246
r12=0x00007ffd6ab30110 r13=0x00007fc132520540 r14=0x00007ffd6ab30120 r15=0x000000007305ea00
eflags=0x0000000000010246
version 8.0.0, build 1
-no_dynamic_options -disasm_mask 1 -loglevel 4 -client_lib '/lustre06/project/6005345/mewais/DCSim/Workspace/SingleNode_Local128GB/GBBS/../../../Release/libDCSClient.so;0;"--config" "/lustre06/project/6005345/mewais/DCSim/Workspace/SingleNode_Local128GB/GBBS/BFS.cfg" "--node-name" "Node/NV_1C_64C_S_128GB_Node-0" "--proce
0x00007ffd6ab30130 0x0000000000000012
0x000000003f49fb58 0x0000000000000000
/lustre06/project/6005345/mewais/DCSim/Workspace/SingleNode_Local128GB/GBBS/../../../Release/libDCSClient.so=0x0000000072000000
/usr/lib64/libstdc++.so.6=0x00007fc13193a000
/usr/lib64/libgcc_s.so.1=0x00007fc13139e000
/usr/lib64/libm.so.6=0x00007fc1315b7000
/lustre06/project/6005345/mewais/DynamoRIO-Linux-8.0.0-1/ext/lib64/debug/libdrsyms.so=0x0000000076000000
/usr/lib64/libc.so.6=0x00007fc131f03000
/usr/lib64/ld-linux-x86-64.so.2=0x00007fc131cd0>
Trial 2
I also exec my client through another process, so I tried adding this LD_LIBRARY_PATH to the exec: LD_LIBRARY_PATH=/cvmfs/soft.computecanada.ca/gentoo/2020/lib64/:/cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/. These are exactly the same paths reported by ldd above. This time, DR doesn't even get the chance to print any debugging output; instead it dies with just this:
Inconsistency detected by ld.so: dl-call-libc-early-init.c: 37: _dl_call_libc_early_init: Assertion `sym != NULL' failed!
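To double check which directory in such an LD_LIBRARY_PATH would supply a given library, a plain left-to-right search can be sketched as below. first_match is a hypothetical helper name, and the sketch only mimics a simple directory scan: it ignores RPATH, the ld.so cache, and whatever extra rules DR's private loader applies.

```shell
# Hypothetical helper: print the first "dir/name" that exists when
# scanning a colon-separated search path from left to right.
# Usage: first_match libc.so.6 "$LD_LIBRARY_PATH"
first_match() {
    name=$1
    old_ifs=$IFS
    IFS=:
    for dir in $2; do
        if [ -e "$dir/$name" ]; then
            IFS=$old_ifs
            # Strip one trailing slash so the result has no "//".
            printf '%s\n' "${dir%/}/$name"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

Running it for libc.so.6, libm.so.6 and libstdc++.so.6 against the value above shows whether each one is actually present in one of the two directories.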
Trial 3
Since LD_LIBRARY_PATH didn't work, I tried setting -Wl,-rpath directly during the build. I added this to my build:
target_link_options(DCSClient PUBLIC -Wl,-rpath /cvmfs/soft.computecanada.ca/gentoo/2020/usr/lib/gcc/x86_64-pc-linux-gnu/11.3.0/ -Wl,-rpath=/cvmfs/soft.computecanada.ca/gentoo/2020/lib64/)
Again, these are the same two paths reported by ldd above. Now, when I run DR with the -debug flag, I get the following:
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/lustre06/project/6005345/mewais/DCSim/Workspace/SingleNode_Local128GB/GBBS/../../../Release/libDCSClient.so' 0x000000007257cb50
add-symbol-file '/lustre06/project/6005345/mewais/DynamoRIO-Linux-8.0.0-1/lib64/debug/libdynamorio.so' 0x00007fae95f4afe0
add-symbol-file '/lustre06/project/6005345/mewais/DynamoRIO-Linux-8.0.0-1/ext/lib64/debug/libdrwrap.so' 0x0000000074003150
add-symbol-file '/lustre06/project/6005345/mewais/DynamoRIO-Linux-8.0.0-1/ext/lib64/debug/libdrmgr.so' 0x00000000486015f0
add-symbol-file '/lustre06/project/6005345/mewais/DynamoRIO-Linux-8.0.0-1/ext/lib64/debug/libdrutil.so' 0x00000000750015c0
add-symbol-file '/lustre06/project/6005345/mewais/DynamoRIO-Linux-8.0.0-1/ext/lib64/debug/libdrx.so' 0x0000000077003d20
add-symbol-file '/lustre06/project/6005345/mewais/DynamoRIO-Linux-8.0.0-1/ext/lib64/debug/libdrreg.so' 0x00000000780028d0
add-symbol-file '/lustre06/project/6005345/mewais/DynamoRIO-Linux-8.0.0-1/ext/lib64/debug/libdrsyms.so' 0x0000000076009b10
add-symbol-file '/usr/lib64/libc.so.6' 0x00007fae15916c40
add-symbol-file '/usr/lib64/ld-linux-x86-64.so.2' 0x00007fae156c3080
>
Received SIGSEGV at client library pc 0x0000000072895ae7 in thread 3288107
Base: 0x00007fae95f0a000
Registers:eax=0x0000000000000000 ebx=0x00000000485f8b28 ecx=0x00007fae15fc1020 edx=0x0000000000000000
esi=0x0000000000000001 edi=0x000000007305e9f0 esp=0x00007ffe888495d0 ebp=0x00007ffe88849670
r8 =0x000000007305ea48 r9 =0x00007ffe888493a0 r10=0x00007ffe888493d1 r11=0x0000000000000246
r12=0x00007ffe88849650 r13=0x00007fae15f12540 r14=0x00007ffe88849660 r15=0x000000007305ea00
eflags=0x0000000000010246
version 8.0.0, build 1
-no_dynamic_options -disasm_mask 1 -loglevel 4 -client_lib '/lustre06/project/6005345/mewais/DCSim/Workspace/SingleNode_Local128GB/GBBS/../../../Release/libDCSClient.so;0;"--config" "/lustre06/project/6005345/mewais/DCSim/Workspace/SingleNode_Local128GB/GBBS/BFS.cfg" "--node-name" "Node/NV_1C_64C_S_128GB_Node-0" "--proce
0x00007ffe88849670 0x0000000000000012
0x00000000485f8b58 0x0000000000000000
/lustre06/project/6005345/mewais/DCSim/Workspace/SingleNode_Local128GB/GBBS/../../../Release/libDCSClient.so=0x0000000072000000
/lustre06/proje>
You can see there are no more symbol lookup errors. You can also see that libgcc, libstdc++ and libm now get loaded from the correct path. But libc does not (even though it exists at the same path as libm!). The same goes for ld-linux-x86-64, though I don't know whether that carries any significance, given that DR uses its private loader.
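One way to see what the -Wl,-rpath options actually recorded in the library is to look at its dynamic section with readelf -d. The filter below is a minimal sketch (dyn_entries is a hypothetical helper name) that keeps only the NEEDED, RPATH and RUNPATH entries. Depending on linker defaults, -rpath may be stored as RUNPATH rather than RPATH; whether DR's private loader treats the two the same way is an open question here, not something this sketch answers.

```shell
# Hypothetical helper: read `readelf -d` output on stdin and keep only
# the NEEDED, RPATH and RUNPATH entries of the dynamic section.
dyn_entries() {
    grep -E '\((NEEDED|RPATH|RUNPATH)\)'
}
```

It would be used as readelf -d Release/libDCSClient.so | dyn_entries, which shows both the libraries the client asks for and the search directories embedded at link time.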
Trial 4
Because using -rpath didn't work either, I tried explicitly linking to all the libc libraries with their full paths by doing the following in CMake:
but it seems that the DR loader does not respect these, as the output looks exactly like that of Trial 1.
So, it seems Trial 3 is the closest to solving the issue, but it is still incomplete. How do I get DR to load the correct libc?