DynamoRIO assert fail: rank order violation

francesc.ma...@gmail.com

Feb 16, 2021, 5:35:37 AM
to DynamoRIO Users
Hi,

I have been trying to run a tool of mine on a new machine (A64FX CPU, Armv8.2-A + SVE), and I have run into some issues.

When I run DynamoRIO without any tool on an ls command it works, but when I try to run a more complex binary (the HPCG benchmark, MPI+OpenMP), it fails due to a rank order violation. Running in debug mode I get:
<log dir=/fefs/scratch/bsc18/bsc18292/romol/musa_fuji/dynamorio/bin64/../logs/xhpcg.157.00000004>
<Starting application /fefs/scratch/bsc18/bsc18292/romol/gem5-apps/gem5-apps/HPCG/build_fuji/bin/xhpcg (157)>
<Initial options = -no_dynamic_options -loglevel 5 -code_api -stack_size 64K -signal_stack_size 64K -max_elide_jmp 0 -max_elide_call 0 -vmm_block_size 64K -initial_heap_unit_size 64K -initial_heap_nonpers_size 64K -initial_global_heap_unit_size 512K -max_heap_unit_size 4M -heap_commit_increment 64K -cache_commit_increment 64K -cache_bb_unit_init 64K -cache_bb_unit_max 64K -cache_bb_unit_quadruple 64K -cache_trace_unit_init 64K -cache_trace_unit_max 64K -cache_trace_unit_quadruple 64K -cache_shared_bb_unit_init 512K -cache_shared_bb_unit_max 512K -cache_shared_bb_unit_quadruple 512K -cache_shared_trace_unit_init 512K -cache_shared_trace_unit_max 512K -cache_shared_trace_unit_quadruple 512K -cache_bb_unit_upgrade 64K -cache_trace_unit_upgrade 64K -cache_shared_bb_unit_upgrade 512K -cache_shared_trace_unit_upgrade 512K -early_inject -emulate_brk -no_inline_ignored_syscalls -no_per_thread_guard_pages -native_exec_default_list '' -no_native_exec_managed_code -no_indcall2direct >
<Paste into GDB to debug DynamoRIO clients:
set confirm off
add-symbol-file '/fefs/scratch/bsc18/bsc18292/romol/dynamoRIO/build_fuji_debug/lib64/debug/libdynamorio.so' 0x0000ffffb6eb4160
>
<rank order violation module_data_lock(readwrite)@/fefs/scratch/bsc18/bsc18292/romol/dynamoRIO/dynamorio/core/module_list.c:59 acquired after fcache_unit_areas(readwrite)@/fefs/scratch/bsc18/bsc18292/romol/dynamoRIO/dynamorio/core/fcache.c:880 in tid:9d>
<Application /fefs/scratch/bsc18/bsc18292/romol/gem5-apps/gem5-apps/HPCG/build_fuji/bin/xhpcg (157).  Internal Error: DynamoRIO debug check failure: /fefs/scratch/bsc18/bsc18292/romol/dynamoRIO/dynamorio/core/utils.c:626 (dcontext->thread_owned_locks->last_lock->rank < lock->rank IF_CLIENT_INTERFACE( || first_client || both_client)) && "rank order violation"
(Error occurred @603 frags)
version 8.0.18670, custom build
-no_dynamic_options -loglevel 5 -code_api -stack_size 64K -signal_stack_size 64K -max_elide_jmp 0 -max_elide_call 0 -vmm_block_size 64K -initial_heap_unit_size 64K -initial_heap_nonpers_size 64K -initial_global_heap_unit_size 512K -max_heap_unit_size 4M -heap_commit_increment 64K -cache_commit_increment 64K -cache_bb_uni
0x0000fffdb706ed50 0x0000ffffb6f5b634
0x0000fffdb706eef0 0x0000ffffb6f5dc58
0x0000fffdb706f130 0x0000ffffb6f5f028
0x0000fffdb706f180 0x0000ffffb7042d44
0x0000fffdb706f1b0 0x0000ffffb71ca524
0x0000fffdb706f1c0 0x0000ffffb701136c
0x0000fffdb706f1f0 0x0000ffffb7011688
0x0000fffdb706f250 0x0000ffffb7014424
0x0000fffdb706f2c0 0x0000ffffb70179d0
0x0000fffdb706f3f0 0x0000ffffb6f094d8
0x0000fffdb706f430 0x0000ffffb6f0b674
0x0000fffdb706f620 0x0000ffffb6f27b2c
0x0000fffdb706f6c0 0x0000ffffb6f27c60
0x0000fffdb706f710 0x0000ffffb6ee5558
0x0000fffdb706f750 0x0000ffffb6f57488
/fefs/scratch/bsc18/bsc18292/romol/dynamoRIO/build_fuji_debug/lib64/debug/libdynamorio.so=0x0000ffffb6ea0000>
<rank order violation shared_cache_lock(mutex)@/fefs/scratch/bsc18/bsc18292/romol/dynamoRIO/dynamorio/core/fcache.c:1598 acquired after fcache_unit_areas(readwrite)@/fefs/scratch/bsc18/bsc18292/romol/dynamoRIO/dynamorio/core/fcache.c:880 in tid:9d>
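
For reference, the runs are launched roughly like this (a sketch; the paths are shortened and the mpirun rank count is illustrative):

  # plain DR on a simple command works fine:
  $ bin64/drrun -- ls
  # the debug run of the benchmark that produced the log above:
  $ mpirun -np 4 bin64/drrun -debug -loglevel 5 -- ./bin/xhpcg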


I am using a recent commit (0732d709d, Feb 12th 2021).
I attach the two log files (loglevel 5).

I get this error with this specific binary (the HPCG benchmark, which uses MPI+OpenMP). The problem may be related to the MPI implementation, as the machine uses a custom MPI library based on OpenMPI. Regretfully, I cannot switch to stock OpenMPI, as it does not properly support the network.

Do you know what could be causing this and what steps I could follow to solve it?

Kind Regards,
Francesc.

hpcg_logs.tar.gz

Abhinav Sharma

Feb 16, 2021, 2:43:13 PM
to dynamor...@googlegroups.com
Hi Francesc,
Do the assert failure and rank order violation also occur in a DR opt (release) build, or in a debug build with a lower loglevel or no loglevel at all? Unfortunately, we do not have comprehensive testing for higher loglevels in our regular test suite, so there could be an issue that affects only higher loglevels (these are lower priority, unless they are blocking collection of some important debug information).
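
Concretely, something along these lines would help narrow it down (a sketch assuming the standard drrun launcher; paths illustrative):

  # opt (release) build, default logging:
  $ bin64/drrun -- ./bin/xhpcg
  # debug build, but with a low loglevel:
  $ bin64/drrun -debug -loglevel 1 -- ./bin/xhpcg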

Abhinav


francesc.ma...@gmail.com

Feb 17, 2021, 4:23:16 AM
to DynamoRIO Users
Hi Abhinav,

Yes, I get the same error regardless of the build/run options. I just thought the log might contain information about why the rank order violation happens.

Kind Regards,
Francesc.

francesc.ma...@gmail.com

Feb 17, 2021, 6:17:29 AM
to DynamoRIO Users
Hi,

I am sorry, I have a small correction: when I run with a lower loglevel or without the debug build, I get a segmentation fault, and I cannot get meaningful information from gdb:
<Application tried to execute from unreadable memory 0x0000000000000000.
This may be a result of an unsuccessful attack or a potential application vulnerability.>
[arms1b0-14c:00099] *** Process received signal ***
[arms1b0-14c:00099] Signal: Segmentation fault (11)
[arms1b0-14c:00099] Signal code:  (128)
[arms1b0-14c:00099] Failing at address: (nil)
[arms1b0-14c:00099] [ 0] [0xffffb719817c]
[arms1b0-14c:00099] *** End of error message ***

Thread 1 "test" received signal SIGSEGV, Segmentation fault.
0x000000004e25260c in ?? ()
#0  0x000000004e25260c in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
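
For completeness, this is roughly what I tried in gdb, pasting the add-symbol-file line that debug DR prints at startup (the address varies per run, so it has to be taken from that message):

  $ gdb -p <pid of the target process>
  (gdb) set confirm off
  (gdb) add-symbol-file '/fefs/scratch/bsc18/bsc18292/romol/dynamoRIO/build_fuji_debug/lib64/debug/libdynamorio.so' 0x0000ffffb6eb4160
  (gdb) bt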


Any idea how to find the root cause of the problem?

Kind Regards,
Francesc.

francesc.ma...@gmail.com

Feb 17, 2021, 6:31:11 AM
to DynamoRIO Users
Hi,

Sorry again; I seem to have found the cause of the segfault. The OpenMPI library I was linking against appears to contain SVE instructions, since running everything through the SVE emulation tool from ArmIE solves the issue. I would have expected an illegal-instruction signal or something similar if that were the problem, though.
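
For what it is worth, this is roughly how I spotted the SVE code in the library, by disassembling it and grepping for SVE z-register operands (the library path is just an example, and this needs a binutils recent enough to decode SVE):

  $ objdump -d /path/to/libmpi.so | grep -E 'z[0-9]+\.[bhsd]' | head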

Kind Regards,
Francesc.

Derek Bruening

Feb 17, 2021, 1:08:55 PM
to dynamor...@googlegroups.com
DR does not yet have full support for decoding all SVE instructions, but a crash would not be expected for plain DR, since none of those instructions should affect the core operation of the system. Does it work with plain DR (i.e., no tool/client at all), with the crash happening only under your tool?

If it crashes with plain DR, or works under plain DR but crashes with one of the sample clients shipped with DR, filing an issue in the tracker would be appreciated.
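
I.e., something along these lines (the sample client path depends on your build; these are illustrative):

  # plain DR, no client:
  $ bin64/drrun -- ./bin/xhpcg
  # a shipped sample client, e.g. bbcount:
  $ bin64/drrun -c path/to/libbbcount.so -- ./bin/xhpcg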

On Wed, Feb 17, 2021 at 1:02 PM 'hashmi...@googlemail.com' via DynamoRIO Users <dynamor...@googlegroups.com> wrote:
Hi Francesc,

> the OpenMPI library I was linking against seems to have SVE instructions
Does the library have SVE2 instructions?

The failure happened on an A64FX, which supports SVE but not SVE2.
This may explain why it worked with ArmIE, which emulates SVE2 as well as SVE.
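
A quick way to check what the CPU itself supports is the kernel's feature flags, e.g. (a sketch):

  $ grep -m1 Features /proc/cpuinfo
  # on an A64FX this lists "sve" but not "sve2"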

Regards,
Assad