Re: fork-related error


Derek Bruening

May 16, 2022, 1:58:26 PM
to Prasun Ratn, DynamoRIO Users
The malloc check is there to ensure dr$sim can operate when linked statically into the app. If you're not using it that way, you can disable the check as a workaround. Given that, when linked statically, it really can't handle a fork+exec, maybe the warning should be disabled there, though perhaps what you hit is a fork without a subsequent exec?
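To make the fork-without-exec case concrete, here is a minimal POSIX sketch (not DynamoRIO code). After a plain `fork()` the child keeps running the parent's image with all of its state, so an "initialization is over" flag is still set when a child-side fork callback runs. `pthread_atfork` stands in, as an assumption, for DR's fork-init event (the `fork_init` frame in the callstack below), and `g_init_done` is a hypothetical flag standing in for the point after which a DR_DISALLOW_UNSAFE_STATIC-style check rejects malloc.

```cpp
#include <cassert>
#include <pthread.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical flag: "initialization is over", i.e. the point after which
// a DR_DISALLOW_UNSAFE_STATIC-style check would reject malloc.
static bool g_init_done = false;
// Set only by the child-side fork callback.
static bool g_child_handler_ran = false;

// Child-side callback; pthread_atfork stands in for DR's fork-init event.
static void on_fork_child() { g_child_handler_ran = true; }

// Forks WITHOUT exec and reports what the child observed.
// Returns 0 if the child saw the inherited init flag and its callback ran,
// 1 if not, and -1 on a fork/wait error.
int fork_without_exec_demo() {
    g_init_done = true; // the parent finished its init phase before forking
    pthread_atfork(nullptr, nullptr, on_fork_child);
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        // Child: no exec, so it runs the parent's image with the parent's
        // state, and the "init is over" restriction is still in force.
        _exit(g_init_done && g_child_handler_ran ? 0 : 1);
    }
    int status = 0;
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

By contrast, an exec replaces the child's image, so none of this inherited state carries over into the new program.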

I also hit a problem with fork, but not this one: a failure to open a file, which might be a regression from the recent multi-window-trace-samples feature (https://github.com/DynamoRIO/dynamorio/issues/5495). Maybe your tree is older and would hit my symptom earlier once updated, and maybe mine would hit yours once that is fixed.

On Fri, May 13, 2022 at 2:47 PM Prasun Ratn <prasu...@gmail.com> wrote:
I am seeing this assert while running drcachesim -offline (debug build).

<Usage error: malloc invoked mid-run when disallowed by DR_DISALLOW_UNSAFE_STATIC (/home/prasun/dr/dynamorio/core/loader_shared.c, line 1083)

It looks like the `drx_open_unique_appid_file(op_outdir.get_value().c_str(),` call in `init_offline_dir` is causing an unexpected malloc. If I hardcode a path there, the assert is not seen.

```
#0  os_terminate (dcontext=0x0, flags=TERMINATE_PROCESS) at /home/amd/prasun/workloads/tracing/DynamoRioToS64/dynamorio/core/unix/os.c:1399
#1  0x00007fb792bd4923 in soft_terminate () at /home/amd/prasun/workloads/tracing/DynamoRioToS64/dynamorio/core/utils.c:110
#2  0x00007fb792bd4ce5 in external_error (file=0x7fb792e66748 "/home/amd/prasun/workloads/tracing/DynamoRioToS64/dynamorio/core/loader_shared.c", line=1083,
    msg=0x7fb792e66b90 "malloc invoked mid-run when disallowed by DR_DISALLOW_UNSAFE_STATIC") at /home/amd/prasun/workloads/tracing/DynamoRioToS64/dynamorio/core/utils.c:210
#3  0x00007fb792cd59ae in redirect_malloc_initonly (size=73) at /home/amd/prasun/workloads/tracing/DynamoRioToS64/dynamorio/core/loader_shared.c:1082
#4  0x00007fb7923e4298 in operator new(unsigned long) ()
#5  0x00007fb79247813d in void std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >::_M_construct<char*>(char*, char*, std::forward_iterator_tag) ()
#6  0x00007fb74e790122 in droption_t<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >::get_value (this=0x7fb74e9bb020)
    at /home/amd/prasun/workloads/tracing/DynamoRioToS64/build-DDEBUGON/ext/include/droption.h:432
#7  0x00007fb74e78d348 in init_offline_dir () at /home/amd/prasun/workloads/tracing/DynamoRioToS64/dynamorio/clients/drcachesim/tracer/tracer.cpp:2431
#8  0x00007fb74e78d625 in fork_init (drcontext=0x7fb54e7f3080) at /home/amd/prasun/workloads/tracing/DynamoRioToS64/dynamorio/clients/drcachesim/tracer/tracer.cpp:2494
#9  0x00007fb792cdf82f in instrument_fork_init (dcontext=0x7fb54e7f3080) at /home/amd/prasun/workloads/tracing/DynamoRioToS64/dynamorio/core/lib/instrument.c:1432
#10 0x00007fb792b4c0e7 in dynamorio_fork_init (dcontext=0x7fb54e7f3080) at /home/amd/prasun/workloads/tracing/DynamoRioToS64/dynamorio/core/dynamo.c:881
#11 0x00007fb792dcb446 in post_system_call (dcontext=0x7fb54e7f3080) at /home/amd/prasun/workloads/tracing/DynamoRioToS64/dynamorio/core/unix/os.c:8326
#12 0x00007fb792bca98a in handle_post_system_call (dcontext=0x7fb54e7f3080) at /home/amd/prasun/workloads/tracing/DynamoRioToS64/dynamorio/core/dispatch.c:2196
#13 0x00007fb792bc0f6b in dispatch_enter_dynamorio (dcontext=0x7fb54e7f3080) at /home/amd/prasun/workloads/tracing/DynamoRioToS64/dynamorio/core/dispatch.c:894
#14 0x00007fb792bbc5ee in d_r_dispatch (dcontext=0x7fb54e7f3080) at /home/amd/prasun/workloads/tracing/DynamoRioToS64/dynamorio/core/dispatch.c:160
```
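The callstack shows `droption_t<std::string>::get_value` constructing a `std::string` (frame #5, `_M_construct`, calling `operator new` in frame #4), which suggests the getter returns the string by value, so each call heap-allocates once the path is longer than the small-string buffer. The self-contained sketch below is not droption's code: `string_option_t` is a hypothetical stand-in with a by-value getter, and `cache_outdir` / `g_outdir_cache` show one hypothetical workaround pattern, copying the path into a fixed buffer while malloc is still allowed and reading only that buffer later (e.g., from a fork handler). A replaced global `operator new` counts allocations so the difference is observable.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <new>
#include <string>
#include <utility>

// Count calls to the global operator new so hidden allocations are visible.
static std::size_t g_new_calls = 0;

void *operator new(std::size_t size) {
    ++g_new_calls;
    if (void *p = std::malloc(size == 0 ? 1 : size))
        return p;
    throw std::bad_alloc();
}
void operator delete(void *p) noexcept { std::free(p); }
void operator delete(void *p, std::size_t) noexcept { std::free(p); }

// Hypothetical stand-in for droption_t<std::string>: the getter returns a
// copy, and copying a long string heap-allocates.
class string_option_t {
public:
    explicit string_option_t(std::string value) : value_(std::move(value)) {}
    std::string get_value() const { return value_; }

private:
    std::string value_;
};

// Hypothetical workaround: snapshot the value into a fixed buffer during init,
// while malloc is still permitted; later code reads only the buffer.
static char g_outdir_cache[512];

bool cache_outdir(const string_option_t &opt) {
    std::string value = opt.get_value(); // allocates, but only once, at init
    if (value.size() >= sizeof(g_outdir_cache))
        return false; // would not fit together with its terminating NUL
    std::memcpy(g_outdir_cache, value.c_str(), value.size() + 1);
    return true;
}
```

This only illustrates the pattern of avoiding allocation after init; it is not a fix to droption itself.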


Prasun Ratn

May 18, 2022, 11:24:07 AM
to Derek Bruening, DynamoRIO Users
I was not able to reproduce this issue on the main branch or with a directed test on my branch. I fixed it locally and re-ran, but then I hit another assert:

```
<CURIOSITY : (0) && "crashed while walking dynamic header" in file /home/prasun/dynamorio/core/unix/module_elf.c line 326
<CURIOSITY : out_data->alignment == alignment in file /home/prasun/dynamorio/core/unix/module.c line 483
<Application <dir>/python3.7 (59312) DynamoRIO usage error : meta-instr faulted?  must set translation field and handle fault!>
<Usage error: meta-instr faulted?  must set translation field and handle fault! (/home/prasun/dynamorio/core/translate.c, line 1016)
```

This assert occurs on the master branch as well. I first saw it in a slightly older revision (a314825, Oct 27), from which my code was forked.

It is also seen in later revisions but stops appearing at 4bcc907 (Mar 10). However, from that point the benchmark never completes. It normally takes under 4 minutes to run, but when I ran it with the basic_counts analyzer it did not finish overnight (this is a 64-thread run, but I don't think basic_counts should have much overhead?). With just drrun it runs fine (about 10 s extra). The processes use no CPU and appear to be sitting in wait/pipe_wait.

With '-offline' it also keeps running for hours, but I didn't see anything unusual in gdb or in the logs (-loglevel 4): it seemed to be executing app code with tracing instrumentation. This is seen in the most recent build (adb1bd4, May 17). This is a TensorFlow benchmark (BERT) that contains JIT code, so that may be playing a part, but we did not see the error with another TensorFlow run.

In case it is useful, I see this assert in the commit just before 4bcc907 (Mar 10), i.e., 1cd0ba5 (Mar 8):

```
ASSERT FAILURE: /home/prasun/dynamorio/clients/drcachesim/tracer/tracer.cpp:1692: tracing_disabled.load(std::memory_order_acquire) == BBDUP_MODE_COUNT ()
```

Derek Bruening

May 24, 2022, 6:16:30 PM
to Prasun Ratn, DynamoRIO Users
I reproduced the malloc issue in my PR #5500, in the 32-bit GA CI test. As in your callstack, it comes from droption, whose C++ class allocations have come up before; that is filed as https://github.com/DynamoRIO/dynamorio/issues/4660. Unfortunately, it's not trivial to solve.