I've tested this on AArch64 with several examples that do CPU-intensive work, such as an infinite loop with a simple arithmetic operation. The detach usually succeeds only after 3-5 attempts. If I insert a 0.1 s sleep in the loop, the detach reliably triggers on the first attempt.
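For reference, a minimal sketch of the kind of test workload described above. The function name `spin` and its parameters are hypothetical and not taken from the actual test app; the bounded `iterations` argument is an assumption added only so the loop can be exercised in a test, with 0 meaning "loop forever" as in the real runs.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical test workload: a CPU-bound loop with one arithmetic operation.
 * iterations == 0 loops forever (the case described in the message);
 * sleep_ms > 0 pauses on every iteration (100 for the 0.1 s variant). */
uint64_t
spin(uint64_t iterations, long sleep_ms)
{
    assert(sleep_ms >= 0);
    volatile uint64_t acc = 0;
    struct timespec pause = { sleep_ms / 1000, (sleep_ms % 1000) * 1000000L };
    for (uint64_t i = 0; iterations == 0 || i < iterations; i++) {
        acc += i; /* the "simple arithmetic operation" */
        if (sleep_ms > 0)
            nanosleep(&pause, NULL); /* thread leaves the hot loop here */
    }
    return acc;
}
```

Calling `spin(0, 0)` gives the busy variant and `spin(0, 100)` the variant with the 0.1 s sleep.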
Debug log for a simple app (an infinite loop with a simple arithmetic operation):
dispatch.c:374 dispatch_enter_fcache()
fragment.c:5686 enter_nolinking() 0 tag -1307965664
---send signal first attempt--- (bin64/drconfig -detach `pidof simple`)
Start signal handler:
signal.c:6098 main_signal_handler_C() sig 4 call handle_suspend_signal
signal.c:8507 handle_suspend_signal()
signal.c:8672 handle_nudge_signal()
safe_is_in_fcache() check on fcache_fragment_pclookup (not found pc in the table)
signal.c:8748 call nudge_add_pending()
nudge.c:474 nudge_add_pending pending -1220444792, version=1 flags=0x0 mask=0x4 id=0x00000000
nudge.c:489 nudge_add_pending change pending -1220444792
signal.c:6110 main_signal_handler_C() sig 4 after call handle_suspend_signal
---send signal second attempt--- (bin64/drconfig -detach `pidof simple`)
Start signal handler:
signal.c:6098 main_signal_handler_C() sig 4 call handle_suspend_signal
signal.c:8507 handle_suspend_signal()
signal.c:8672 handle_nudge_signal()
safe_is_in_fcache() check on fcache_fragment_pclookup (not found pc in the table)
signal.c:8748 call nudge_add_pending()
nudge.c:474 nudge_add_pending pending -1220425635, version=1 flags=0x0 mask=0x4 id=0x00000000
signal.c:6110 main_signal_handler_C() sig 4 after call handle_suspend_signal
---send signal 3rd attempt (successful)--- (bin64/drconfig -detach `pidof simple`)
Start signal handler:
signal.c:6098 main_signal_handler_C() sig 4 call handle_suspend_signal
signal.c:8507 handle_suspend_signal()
signal.c:8672 handle_nudge_signal()
safe_is_in_fcache() check on fcache_fragment_pclookup (found pc in the table)
signal.c:4480 unlink_fragment_for_signal() (this does not happen in previous attempts)
signal.c:8748 call nudge_add_pending
nudge.c:474 nudge_add_pending pending -1220457864, version=1 flags=0x0 mask=0x4 id=0x00000000
signal.c:6110 main_signal_handler_C() sig 4 after call handle_suspend_signal
d_r_dispatch()
dispatch.c:374 dispatch_enter_fcache()
fragment.c:5686 enter_nolinking -1220444792 tag -1307965664
fragment.c:5693 dcontext->interrupted_for_nudge != NULL -1220457864
fragment.c:5706 call handle_nudge()
nudge.c:291 handle_nudge()
synch.c:2304 detach_externally_on_new_stack()
fcache_fragment_pclookup
Should I add anything else to the debug log?
BR,
Artem
--
You received this message because you are subscribed to the Google Groups "DynamoRIO Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dynamorio-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dynamorio-users/47d2042b-c65c-4312-b335-2fd4c744a93bn%40googlegroups.com.
The application is in the code cache (it is a simple infinite loop with an arithmetic operation), so the dispatcher is not called. Now we need to understand where that thread was when the detach signal successfully interrupted it.
On Tuesday, March 26, 2024 at 19:01:49 UTC+3, Derek Bruening wrote:
Yes, you were right: the detach succeeds when the check finds that we are executing from the code cache, and when the detach fails we are in client code or in dynamo_dll.
This check in safe_is_in_fcache() fires when the interrupted pc is in the drcachesim client:
is_in_client_lib(pc)
This check in safe_is_in_fcache() fires while the trace buffer is being written to a file:
is_in_dynamo_dll(pc)
Cache entry at this location: entry 0x0000aaaa944f50c0 pc 0x0000aaaa944fb008
Not success detach. in “dynamo_dll” pc 0x0000000071203484
Not success detach. in “dynamo_dll” pc 0x000000007120349c
Not success detach. in “dynamo_dll” pc 0x000000007120349c
Success detach. in “fcache” pc 0x0000aaaa944fb08c
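To make the pattern in the log explicit, here is a toy model of the classification, assuming each region is one contiguous address range. It is not DynamoRIO code: `layout_t`, `classify_pc` and `detach_can_proceed` are hypothetical names, and the real safe_is_in_fcache() performs lookups rather than simple range checks.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { uint64_t start, end; } range_t; /* half-open: [start, end) */

typedef enum { PC_IN_FCACHE, PC_IN_DR_LIB, PC_IN_CLIENT_LIB, PC_ELSEWHERE } pc_where_t;

/* Hypothetical description of where each region is mapped. */
typedef struct { range_t fcache, dr_lib, client_lib; } layout_t;

static bool
in_range(range_t r, uint64_t pc)
{
    return pc >= r.start && pc < r.end;
}

/* Decide which region the interrupted pc belongs to. */
pc_where_t
classify_pc(const layout_t *l, uint64_t pc)
{
    assert(l != 0);
    if (in_range(l->fcache, pc))
        return PC_IN_FCACHE;
    if (in_range(l->dr_lib, pc))
        return PC_IN_DR_LIB;
    if (in_range(l->client_lib, pc))
        return PC_IN_CLIENT_LIB;
    return PC_ELSEWHERE;
}

/* Mirrors the observed behaviour: only a pc inside the code cache leads to
 * the fragment being unlinked and the detach going through. */
bool
detach_can_proceed(const layout_t *l, uint64_t pc)
{
    return classify_pc(l, pc) == PC_IN_FCACHE;
}
```

With ranges chosen to contain the addresses from the log, 0x0000aaaa944fb08c is classified as code cache and 0x0000000071203484 as the DynamoRIO library.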
And if I run drrun without any client on the same app (infinite simple loop), then the detach triggers every time.
So, as I understand it, nudge handling needs to be added for the case when the application is not in the code cache.
Could you help determine the appropriate place to add this?
Are regular signals also handled in main_signal_handler_C()?
On Wednesday, March 27, 2024 at 22:50:18 UTC+3, Derek Bruening wrote:
I added function names from libdynamorio.so to the log below.
What if "drconfig -detach" waited for a response after sending the nudge signal? If the detach did not go through, for example because the handler found that we were not in the code cache, the target would reply that it was not detached, and the signal would be sent again until the detach succeeds.
dispatch.c:550 enter_fcache() entry 0x0000aaaa888850c0 pc 0x0000aaaa8888b008
…..
Not success detach. pc 0x00000000712036d0 /data/disk5/artemshc/dynamorio_mica/build/lib64/release/libdynamorio.so: memcpy + 8
Not success detach. pc 0x000000007120157c /data/disk5/artemshc/dynamorio_mica/build/lib64/release/libdynamorio.so: dynamorio_syscall + 44
Not success detach. pc 0x00000000712036d4 /data/disk5/artemshc/dynamorio_mica/build/lib64/release/libdynamorio.so: memcpy + 12
Not success detach. pc 0x00000000712036f4 /data/disk5/artemshc/dynamorio_mica/build/lib64/release/libdynamorio.so: memset + 16
Not success detach. pc 0x00000000712036d4 /data/disk5/artemshc/dynamorio_mica/build/lib64/release/libdynamorio.so: memcpy + 12
Not success detach. pc 0x0000fffd81275458 /usr/lib/aarch64-linux-gnu/liblz4.so.1.9.2
Not success detach. pc 0x0000fffd8127544c /usr/lib/aarch64-linux-gnu/liblz4.so.1.9.2
Success detach. pc 0x0000aaaa8888b038 in code cache.
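The resend-until-success idea proposed above could be sketched roughly as follows. This is only an illustration: nothing in this thread says drconfig has an acknowledgement mechanism, so the `attempt` callback (send one nudge, report whether the target acknowledged the detach) and all names here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical callback: performs one detach attempt and returns true if the
 * target reported that it actually detached. */
typedef bool (*detach_attempt_fn)(void *ctx);

/* Repeat the attempt until it succeeds or max_attempts is reached.
 * Returns the number of attempts used, or -1 if every attempt failed. */
int
detach_with_retry(detach_attempt_fn attempt, void *ctx, int max_attempts)
{
    assert(attempt != 0);
    for (int n = 1; n <= max_attempts; n++) {
        if (attempt(ctx))
            return n;
    }
    return -1;
}

/* Stand-in target for experiments: "detaches" on the succeed_on-th call,
 * like a thread that is only sometimes caught inside the code cache. */
typedef struct { int calls; int succeed_on; } fake_target_t;

bool
fake_attempt(void *ctx)
{
    fake_target_t *t = ctx;
    t->calls++;
    return t->calls >= t->succeed_on;
}
```

A bound on the number of attempts seems worth keeping, since a thread blocked outside the code cache might never reach a point where the detach can succeed.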
Artem
The call stacks (see below) show that memcpy and memset were called from the client. dynamorio_syscall was not caught under the debugger.
So I think we can try fixing the nudge part of bug #569 and see what happens with detach.
As I understand from the bug description, we need to record the address of the current fragment each time before calling a clean call, and reset that value on return from the clean call. If the nudge signal arrives while we are not in the code cache, we then unlink the recorded fragment. Am I getting this right?
#0 memset () at dynamorio/core/arch/aarch64/memfuncs.asm:71
#1 0x0000ffffb3fb99e4 in dynamorio::drmemtrace::online_instru_t::append_thread_header (this=0x10218b3b409e140,
buf_ptr=0xfffdb4097000 "Hp\t\264\375\377", tid=65533, file_type=3019990656)
at dynamorio/clients/drcachesim/tracer/instru_online.cpp:188
#2 0x0000ffffb3fa7f84 in dynamorio::drmemtrace::event_post_syscall (drcontext=0x714847d0 <get_dr_tls_base_addr+16>,
sysnum=0) at dynamorio/clients/drcachesim/tracer/tracer.cpp:1624
#0 memcpy () at dynamorio/core/arch/aarch64/memfuncs.asm:55
#1 0x000000007120316c in d_r_memmove (dst=0xfffdb40b1050, src=0xfffdb40f1050, n=65536)
at dynamorio/core/string.c:179
#2 0x0000fffff7d734a4 in encode_opndsgen_6594a000_00001fff (pc=0xfffff7aac610 <extend_unit_end+1420> " \a",
instr=0xfffdb40988b0, enc=65535, di=0xfffdb402eee0)
at dynamorio/build_debug/opnd_encode_funcs.h:16398
#3 0x0000fffff7d771a8 in encode_opndsgen_8540c000_003f1fff (pc=0x0, instr=0xfffdb402ef80, enc=1, di=0x6610)
at dynamorio/build_debug/opnd_encode_funcs.h:16839
#4 0x0000ffffb3fb7058 in dynamorio::drmemtrace::offline_instru_t::get_modoffs (this=0xfffdb401f708,
drcontext=0xfffff3f8b610, pc=0x0, modidx=0xfffff3f8b730)
at dynamorio/clients/drcachesim/tracer/instru_offline.cpp:491
#5 0x0000ffffb3fb7f70 in dynamorio::drmemtrace::offline_instru_t::instr_has_multiple_different_memrefs (this=0x0, instr=0x0)
at dynamorio/clients/drcachesim/tracer/instru_offline.cpp:731
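My reading of the record-and-unlink scheme described above, written as a toy model. None of this is DynamoRIO code: `toy_fragment_t`, `toy_thread_t` and the three functions are hypothetical, and unlinking is reduced to clearing a flag, on the assumption that an unlinked fragment exits to the dispatcher instead of jumping straight to the next fragment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy fragment: linked means its exits jump directly to other fragments. */
typedef struct { bool linked; } toy_fragment_t;

/* Toy per-thread state: which fragment issued the clean call in progress. */
typedef struct { toy_fragment_t *clean_call_from; } toy_thread_t;

/* Record the calling fragment just before the clean call starts. */
void
clean_call_enter(toy_thread_t *t, toy_fragment_t *f)
{
    assert(t != NULL && f != NULL);
    t->clean_call_from = f;
}

/* Reset the record when the clean call returns to the code cache. */
void
clean_call_exit(toy_thread_t *t)
{
    assert(t != NULL);
    t->clean_call_from = NULL;
}

/* Nudge arrives. interrupted is the fragment containing the interrupted pc,
 * or NULL if the pc is outside the code cache. Returns true if a fragment
 * was unlinked, so the thread should reach the dispatcher soon. */
bool
on_nudge(toy_thread_t *t, toy_fragment_t *interrupted)
{
    toy_fragment_t *target = interrupted != NULL ? interrupted : t->clean_call_from;
    if (target == NULL)
        return false; /* neither in the cache nor in a recorded clean call */
    target->linked = false;
    return true;
}
```

In this model a nudge that lands in client code during a clean call still unlinks the recorded fragment, while a nudge that lands elsewhere outside the cache stays pending as it does today.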
Artem
As I understand it, the return address can be found on the stack with something like this (taken from the find_next_fragment_from_gencode function):
    cache_pc retaddr = NULL;
    byte *ra_slot =
        dcontext->dstack - get_clean_call_switch_stack_size() - sizeof(retaddr);
    if (in_clean_call_save(dcontext, dcontext->interrupted_pc)) {
        ra_slot -= get_clean_call_temp_stack_size();
    }
    if (d_r_safe_read(ra_slot, sizeof(retaddr), &retaddr)) {
        dr_printf("RETADDR %p\n", retaddr);
    }
At the same time, it is not clear how to use it. As I understand it, the unlink should be done while we are in the code cache, but we cannot do a delayed signal check and unlink from inside the code cache, because that is where the application's own code runs. I also tried unlinking before exiting the clean call, and unlinking in the signal handler itself by passing the last fragment to the unlink_fragment_for_signal function. In both cases the detach works, but the application crashes with SIGSEGV.
Where should we use the return address and do the unlink?
BTW, what happens when we unlink? The code shows that the unlink is done for each branch in the code cache, but it is not clear what unlinking means for a branch.
Thank you for reminding us of the unlink concept. Now it's clear why we should call unlink.