Hello,
I am trying to use the attach/detach/reattach feature on Ubuntu 20.04 AArch64 with a single-threaded application, and it currently crashes. DynamoRIO is built in debug mode with full logging enabled:
./bin64/drrun -attach 4042305 -ops "-logdir /home/abudanko/dynamorio_fixes/build/log_dir -logmask 0x0fffffff -loglevel 4" -c ./clients/lib64/debug/libdrmemtrace.so -offline
Below is part of the execution log showing the signal handler processing the nudge detach signal:
...
86800:d_r_dispatch:
d_r_dispatch: target = 0x0000ffffab42755c
86802:check_thread_vm_area: check_thread_vm_area: pc = 0x0000ffffab42755c
86804:prepend_entry_to_fraglist: prepend_entry_to_fraglist: putting fragment @0x0000ffffab42755c (shared) on vmarea 0x0000ffffab427000-0
86806:check_thread_vm_area: check_thread_vm_area: check_stop = 0x0000ffffab428000
86808:build_bb_ilist:
interp: 86810:build_bb_ilist: start_pc = 0x0000ffffab42755c
0x0000ffffab42755c b98007e0 ldrsw +0x04(%sp)[4byte] -> %x0
0x0000ffffab427560 f94007e1 ldr +0x08(%sp)[8byte] -> %x1
0x0000ffffab427564 8b000020 add %x1 %x0 lsl $0x00 -> %x0
0x0000ffffab427568 f90007e0 str %x0 -> +0x08(%sp)[8byte]
0x0000ffffab42756c b94007e0 ldr +0x04(%sp)[4byte] -> %w0
0x0000ffffab427570 11000400 add %w0 $0x0001 lsl $0x00 -> %w0
0x0000ffffab427574 b90007e0 str %w0 -> +0x04(%sp)[4byte]
0x0000ffffab427578 b94007e1 ldr +0x04(%sp)[4byte] -> %w1
0x0000ffffab42757c 5290d3c0 movz $0x869e lsl $0x00 -> %w0
0x0000ffffab427580 72a00020 movk %w0 $0x0001 lsl $0x10 -> %w0
0x0000ffffab427584 6b00003f subs %w1 %w0 lsl $0x00 -> %wzr
0x0000ffffab427588 54fffead b.le $0x0000ffffab42755c
86812:build_bb_ilist: end_pc = 0x0000ffffab42758c
87048:dispatch_enter_fcache_stats: Entry into F2(0x0000ffffab42755c).0x0000aaaa929fb170
...
87086:dispatch_exit_fcache_stats: Exit from F2(0x0000ffffab42755c).0x0000aaaa929fb36c
...
87612:main_signal_handler_C:
main_signal_handler: thread=4042305, sig=32, xsp=0x0000fffda61c3da0, retaddr=0x1000000400000020
87614:main_signal_handler_C: siginfo: sig = 32, pid = 0, status = 0, errno = 268435460, si_code = -1
87616:dump_sigcontext: Signal context:
87618:dump_sigcontext: x0 = 0x000000000000a248
87620:dump_sigcontext: x1 = 0x0000ffffab425560
...
87682:dump_sigcontext: pc = 0x0000aaaa929fb208
...
87756:dump_sigcontext:
87758:is_signal_for_us: is_signal_for_us T4042305: sig=32 code=-1 errno=268435460
87760:is_signal_for_us: is_signal_for_us T4042305: sig=32 code=-1 errno=268435460
87762:handle_nudge_signal: received nudge version=1 flags=0x0 mask=0x4 id=0x00000000 arg=0x0000000000000000
87766:d_r_notify: SYSLOG_INFORMATION: received nudge mask=0x4 id=0x00000000 arg=0x0000000000000000
87768:unlink_fragment_for_signal: F2, f->tag = 0x0000ffffab42755c, f->start_pc = 0x0000aaaa929fb16c, pc = 0x0000aaaa929fb208
87770:unlink_fragment_for_signal: unlinking outgoing for interrupted F2
87772:unlink_fragment_outgoing: unlinking outgoing from frag F2(0x0000ffffab42755c)
87774:unlink_branch: unlinking branch F2.0x0000aaaa929fb368
87776:unlink_branch: unlinking branch F2.0x0000aaaa929fb36c
87778:unlink_fragment_for_signal: dcontext->nudge_pending = 0x0000000000000000
87780:enter_nolinking: entered function
87782:enter_nolinking: with CODE_CACHE
87784:flush_fragments_synch_unlink_priv: FLUSH STAGE 1: synch_unlink_priv(thread 4042305 flushtime 0): 0x0000000000000000-0xffffffffffff
87786:make_writable: make_writable: pc 0x0000aaaa929f5000 -> 0x0000aaaa929f5000-0x0000aaaa929f7000 0
87808:make_unwritable: make_unwritable: pc 0x0000aaaa929f5000 -> 0x0000aaaa929f5000-0x0000aaaa929f7000
87824:flush_fragments_synch_priv: considering thread #1/1 = 4042305
87826:flush_fragments_synch_priv: thread 4042305 synch not required
87828:flush_fragments_thread_unlink: thread 4042305 has no fragments in region to flush
I added code at the end of the unlink_fragment_for_signal() function in signal.c that unlinks the fragment the nudge detach signal hit:

/* Added here to get out to the dispatcher in a forced order */
flush_fragments_synch_unlink_priv(dcontext, UNIVERSAL_REGION_BASE, UNIVERSAL_REGION_SIZE, false, false, false);
flush_fragments_unlink_shared(dcontext, UNIVERSAL_REGION_BASE, UNIVERSAL_REGION_SIZE, NULL _IF_DGCDIAG(NULL));
flush_fragments_end_synch(dcontext, false);

87830:flush_fragments_unlink_shared: FLUSH STAGE 2: unlink_shared(thread 4042305): flusher is 4042305
87832:flush_fragments_unlink_shared: flushing shared fragments
87836:add_process_lock: add_process_lock: 0 lock 0x00000000715f3e58: name=shared_delete_lock(mutex)@/home/abudanko/dynamorio_fixes/core/
rank=14 owner=4042305 owning_dc=0x0000fffda6188680 contended_event=0x7ffffffff prev=0x0000000000000000
lock count_times_acquired= 1 0 0 0
87838:vm_area_unlink_fragments: vm_area_unlink_fragments 0x0000000000000000..0xffffffffffffffff
87840:vm_area_unlink_fragments: marking region 0x0000ffffab427000..0x0000ffffab428000 for deletion & unlinking all its frags
87860:hashtable_fragment_add: hashtable_fragment_add(0x0000ffffab427578) mask=0x00000000000003ff offset=0 trying 0x000000000000032d:
87862:hashtable_fragment_add: hashtable_fragment_add: added 0x0000ffffab427578 to shared_future at table[813]
87864:hashtable_fragment_remove_helper_open_address: hashtable_fragment_remove_helper_open_address(table=0x0000fffda6180248, hindex=813)
87880:hashtable_fragment_add: hashtable_fragment_add(0x0000ffffab427554) mask=0x00000000000003ff offset=0 trying 0x000000000000022d:
87882:hashtable_fragment_add: hashtable_fragment_add: added 0x0000ffffab427554 to shared_future at table[557]
87884:hashtable_fragment_remove_helper_open_address: hashtable_fragment_remove_helper_open_address(table=0x0000fffda6180248, hindex=557)
87902:hashtable_fragment_add: hashtable_fragment_add(0x0000ffffab42758c) mask=0x00000000000003ff offset=0 trying 0x000000000000009e:
87904:hashtable_fragment_add: hashtable_fragment_add: added 0x0000ffffab42758c to shared_future at table[158]
87906:hashtable_fragment_remove_helper_open_address: hashtable_fragment_remove_helper_open_address(table=0x0000fffda6180248, hindex=158)
87922:hashtable_fragment_add: hashtable_fragment_add(0x0000ffffab42755c) mask=0x00000000000003ff offset=0 trying 0x00000000000001f4:
87924:hashtable_fragment_add: hashtable_fragment_add: added 0x0000ffffab42755c to shared_future at table[500]
87926:hashtable_fragment_remove_helper_open_address: hashtable_fragment_remove_helper_open_address(table=0x0000fffda6180248, hindex=500)
87942:hashtable_fragment_add: hashtable_fragment_add(0x0000ffffab42757c) mask=0x00000000000003ff offset=0 trying 0x0000000000000110:
87944:hashtable_fragment_add: hashtable_fragment_add: added 0x0000ffffab42757c to shared_future at table[272]
87946:hashtable_fragment_remove_helper_open_address: hashtable_fragment_remove_helper_open_address(table=0x0000fffda6180248, hindex=272)
87954:vm_area_unlink_fragments: Before removing vm area:
0x0000ffffab427000-0x0000ffffab428000 ---- ELF SO test_lib.so
87956:vm_area_unlink_fragments: Removing shared vm area 0x0000ffffab427000-0x0000ffffab428000
87962:vm_area_unlink_fragments: After removing vm area:
87964:vm_area_unlink_fragments: Unlinked 5 frags
87966:make_writable: make_writable: pc 0x0000aaaa929f5000 -> 0x0000aaaa929f5000-0x0000aaaa929f7000 0
87982:make_unwritable: make_unwritable: pc 0x0000aaaa929f5000 -> 0x0000aaaa929f5000-0x0000aaaa929f7000
87998:flush_fragments_unlink_shared: Flushed 5 fragments from 0x0000000000000000-0xffffffffffffffff
88000:flush_fragments_end_synch: FLUSH STAGE 3: end_synch(thread 4042305): flusher is 4042305
88002:main_signal_handler_C: main_signal_handler 32 returning now to 0x0000aaaa929fb208
...
88014:vm_area_check_shared_pending: thread 4042305 (flushtime 0) walking pending deletion list (was_I_flushed==F2)
88016:vm_area_check_shared_pending: Considering #0: 0x0000ffffab427000..0x0000ffffab428000 flushtime 1
88018:vm_area_check_shared_pending: dec => ref_count is now 0, flushtime diff is 0
88020:add_process_lock: add_process_lock: 0 lock 0x00000000715f3ed0: name=lazy_delete_lock(mutex)@/home/abudanko/dynamorio_fixes/core/vm
rank=15 owner=4042305 owning_dc=0x0000fffda6188680 contended_event=0xfffdffffffff prev=0x0000000000000000
lock count_times_acquired= 1 0 0 0
88022:check_lazy_deletion_list: checking lazy list @ timestamp 1
88024:vm_area_check_shared_pending: deleting all fragments in region 0x0000ffffab427000..0x0000ffffab428000 flushtime 1
88026:fragment_delete: fragment_delete: *0x0000fffda6151740 F5(0x0000ffffab427578).0x0000aaaa929fb6dc bb 0x47
88028:fcache_remove_fragment: fcache_remove_fragment: F5 from Basic block (shared) unit
88030:fragment_delete: fragment_delete: *0x0000fffda61516d0 F4(0x0000ffffab427554).0x0000aaaa929fb598 bb 0x47
88032:fcache_remove_fragment: fcache_remove_fragment: F4 from Basic block (shared) unit
88034:fragment_delete: fragment_delete: *0x0000fffda6151650 F3(0x0000ffffab42758c).0x0000aaaa929fb3c8 bb 0x47
88036:fcache_remove_fragment: fcache_remove_fragment: F3 from Basic block (shared) unit
88038:fragment_delete: fragment_delete: *0x0000fffda61515d0 F2(0x0000ffffab42755c).0x0000aaaa929fb16c bb 0x47
88040:fcache_remove_fragment: fcache_remove_fragment: F2 from Basic block (shared) unit
88042:fragment_delete: fragment_delete: *0x0000fffda6151520 F1(0x0000ffffab42757c).0x0000aaaa929fb008 bb 0x47
88044:fcache_remove_fragment: fcache_remove_fragment: F1 from Basic block (shared) unit
88046:add_process_lock: add_process_lock: 0 lock 0x00000000715f36a8: name=unit_flush_lock(mutex)@/home/abudanko/dynamorio_fixes/core/fca
rank=23 owner=4042305 owning_dc=0x0000fffda6188680 contended_event=0xfffdffffffff prev=0x0000000000000000
lock count_times_acquired= 1 0 0 0
88048:vm_area_check_shared_pending: thread 4042305 done walking pending list @flushtime 1
88050:vm_area_check_shared_pending: Flushed 5 frags
88052:dispatch_enter_dynamorio: Just flushed last_fragment
After returning from the signal handler, the thread gets back to the DynamoRIO dispatcher and starts instrumenting the code and rebuilding the code cache again:
88054:d_r_dispatch:
d_r_dispatch: target = 0x0000ffffab42755c
88056:check_thread_vm_area: check_thread_vm_area: pc = 0x0000ffffab42755c
new shared vm area: 0x0000ffffab427000-0x0000ffffab428000 ---- ELF SO test_lib.so
88062:check_thread_vm_area: checking thread vmareas against executable_areas
88064:prepend_entry_to_fraglist: prepend_entry_to_fraglist: putting fragment @0x0000ffffab42755c (shared) on vmarea 0x0000ffffab427000-0
88066:check_thread_vm_area: check_thread_vm_area: check_stop = 0x0000ffffab428000
88068:build_bb_ilist:
interp: 88070:build_bb_ilist: start_pc = 0x0000ffffab42755c
0x0000ffffab42755c b98007e0 ldrsw +0x04(%sp)[4byte] -> %x0
0x0000ffffab427560 f94007e1 ldr +0x08(%sp)[8byte] -> %x1
0x0000ffffab427564 8b000020 add %x1 %x0 lsl $0x00 -> %x0
0x0000ffffab427568 f90007e0 str %x0 -> +0x08(%sp)[8byte]
0x0000ffffab42756c b94007e0 ldr +0x04(%sp)[4byte] -> %w0
0x0000ffffab427570 11000400 add %w0 $0x0001 lsl $0x00 -> %w0
0x0000ffffab427574 b90007e0 str %w0 -> +0x04(%sp)[4byte]
0x0000ffffab427578 b94007e1 ldr +0x04(%sp)[4byte] -> %w1
0x0000ffffab42757c 5290d3c0 movz $0x869e lsl $0x00 -> %w0
0x0000ffffab427580 72a00020 movk %w0 $0x0001 lsl $0x10 -> %w0
0x0000ffffab427584 6b00003f subs %w1 %w0 lsl $0x00 -> %wzr
0x0000ffffab427588 54fffead b.le $0x0000ffffab42755c
88072:build_bb_ilist: end_pc = 0x0000ffffab42758c
88074:instrument_basic_block:
instrument_basic_block ******************
88076:instrument_basic_block:
before instrumentation:
TAG 0x0000ffffab42755c
+0 L3 @0x0000fffda61a9448 b98007e0 ldrsw +0x04(%sp)[4byte] -> %x0
+4 L3 @0x0000fffda623b898 f94007e1 ldr +0x08(%sp)[8byte] -> %x1
+8 L3 @0x0000fffda61a8190 8b000020 add %x1 %x0 lsl $0x00 -> %x0
...
Should the detach code, i.e.
handle_nudge() -> call_switch_stack(dcontext, (byte *)((ptr_uint_t)d_r_detachstack + DYNAMORIO_STACK_SIZE), (void (*)(void *))detach_externally_on_new_stack, NULL, true);
execute prior to getting back to the DynamoRIO dispatcher, so that application code instrumentation would not happen until the next attach?
Thanks,
Alexei
Hi Alexei,

I assume this is related to the "-attach <pid>" feature. I'm trying to understand the unexpected behavior you're observing; see below.
> Should the detach code, i.e.
> handle_nudge() -> call_switch_stack(dcontext, (byte *)((ptr_uint_t)d_r_detachstack + DYNAMORIO_STACK_SIZE), (void (*)(void *))detach_externally_on_new_stack, NULL, true);
> execute prior to getting back to the DynamoRIO dispatcher, so that application code instrumentation would not happen until the next attach?

I see that handle_nudge_signal -> nudge_add_pending adds the nudge signal to dcontext->nudge_pending, which should invoke enter_nolinking -> handle_nudge -> call_switch_stack.
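Roughly, the control flow I have in mind is sketched below. This is simplified, self-contained pseudocode, not the actual DynamoRIO source: only handle_nudge_signal, nudge_add_pending, dcontext->nudge_pending, enter_nolinking, handle_nudge, call_switch_stack and detach_externally_on_new_stack are names taken from the code paths discussed above; the struct, stub bodies and printf calls are purely illustrative.

/* Illustrative model of the delayed-nudge flow: the signal handler only
 * queues the nudge; the dispatcher acts on it at the next safe point. */
#include <stdio.h>

typedef struct {
    unsigned long nudge_pending; /* models dcontext->nudge_pending */
} dcontext_t;

/* Signal-handler path: the nudge is recorded, not acted on. */
static void handle_nudge_signal(dcontext_t *dc, unsigned long nudge_mask)
{
    dc->nudge_pending = nudge_mask; /* models nudge_add_pending() */
    printf("signal handler: nudge queued, returning to interrupted code\n");
}

/* Detach runs on a separate stack; models detach_externally_on_new_stack(). */
static void detach_externally_on_new_stack(void *arg)
{
    (void)arg;
    printf("detach: tearing down instrumentation\n");
}

/* Models call_switch_stack(): here we simply call the target. */
static void call_switch_stack(dcontext_t *dc, void (*func)(void *), void *arg)
{
    (void)dc;
    func(arg);
}

/* Dispatcher path: the queued nudge is consumed at a nolinking point. */
static void enter_nolinking(dcontext_t *dc)
{
    if (dc->nudge_pending != 0) {
        dc->nudge_pending = 0;
        /* models handle_nudge() -> call_switch_stack(..., detach, ...) */
        call_switch_stack(dc, detach_externally_on_new_stack, NULL);
    }
}

int main(void)
{
    dcontext_t dc = { 0 };
    handle_nudge_signal(&dc, 0x4 /* detach mask, as in the log above */);
    /* ...fragment exit, dispatcher regains control... */
    enter_nolinking(&dc);
    return 0;
}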
> I added code at the end of the unlink_fragment_for_signal() function in signal.c that unlinks the fragment the nudge detach signal hit:
> /* Added here to get out to the dispatcher in a forced order */
> flush_fragments_synch_unlink_priv(dcontext, UNIVERSAL_REGION_BASE, UNIVERSAL_REGION_SIZE, false, false, false);
> flush_fragments_unlink_shared(dcontext, UNIVERSAL_REGION_BASE, UNIVERSAL_REGION_SIZE, NULL _IF_DGCDIAG(NULL));
> flush_fragments_end_synch(dcontext, false);

Can you clarify the goal behind "Added here to get out to the dispatcher in a forced order"?
> After returning from the signal handler, the thread gets back to the DynamoRIO dispatcher and starts instrumenting the code and rebuilding the code cache again:

Can you clarify: do you see the above unexpected behavior with your unlink_fragment_for_signal code changes, or also without them? Do you mean DynamoRIO does not detach successfully on the nudge signal? If this is a bug in the current implementation, please also file it separately at https://github.com/DynamoRIO/dynamorio/issues/.
If you could describe how you are executing this: you are running an application in one shell, and while it's running, in another shell you run "drrun -attach <pid>", followed by "drconfig -detach <pid>", followed by "drrun -attach <pid>"? Is there no time gap between the detach and the 2nd attach? Does it all work fine if there is a time gap?
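For concreteness, the sequence I have in mind is something like the following. This is only a sketch: <pid> is a placeholder, the sleep just models the time gap being asked about, the drconfig path is assumed to sit next to drrun in bin64, and the drrun/drconfig options are only the ones already mentioned in this thread.

./bin64/drrun -attach <pid> -c ./clients/lib64/debug/libdrmemtrace.so -offline   # 1st attach
./bin64/drconfig -detach <pid>                                                   # nudge detach
sleep 5                                                                          # time gap, or none?
./bin64/drrun -attach <pid> -c ./clients/lib64/debug/libdrmemtrace.so -offline   # 2nd attach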
Does the translation code not handle a prefix instruction on a synchall==detach translation? There are prefixes in x86 ibl targets, but they are rarer than on aarch64, so I wonder whether it's possible that such a failure has always been the case... are all your failure instances with a prefix translation?
You mean the nudge detach calls thread_get_nudged_mcontext() for the nudged thread, while the nudge signal is always delayed and it should use the mcontext stored in the dcontext?
Was the nudge signal for detach not delayed in the past or something, or is there some other invocation point where the signal just arrived?
It would be worth looking into the history of that code. How do the automated tests of detach nudges pass then?
The nudge point is close enough to the next fragment exit point that it doesn't matter?
Note that we use internally-triggered detach (start/stop API) extensively, and detach is pretty stable on large x86 and aarch64 applications there: so problems observed are much more likely to be related to the signal trigger, and thus the detaching thread itself, and not the other threads.
At this point I would suggest filing an issue and putting further details there.