Running JVM processes under DR on Windows (many access violations)


Daniel Elsner

May 5, 2022, 7:58:10 AM
to DynamoRIO Users
I'm trying to use a DR client I wrote to instrument Java applications (e.g., simply Maven) on Windows 10, and I see considerable overhead (when using -attach) or even DR crashes (when starting the application via DR).

I registered a simple exception handler in a DR client and observed that the JVM process throws many 0xc0000005 access violation exceptions (which presumably is normal for JVM processes: https://stackoverflow.com/a/36258856).
Now, I've searched through all GH issues and posts in this Google group related to DR and instrumenting JVM processes, but could not get it to work. 
I've tried a handful of JDK versions (OpenJDK 8, 11, 14, 17) and a variety of DR runtime options (-no_hw_cache_consistency, -ignore_assert_list '*', -s 60, -vm_size 1G, -no_enable_reset, -disable_traces, -no_sandbox_writes) on DynamoRIO 9.0.1.

To reproduce it, use a JDK of your choice on Windows and run, for example, one of the following in a simple Maven project, which already takes way too long:
drrun.exe -- cmd /c "mvn -version"
drrun.exe -- cmd /c "mvn test"
(Maven uses java.exe internally.)

Do you have any idea how I could switch off handling any 0xC0000005 exceptions from the DR client, to reduce the overhead?

I've tried the following already:

static bool event_exception(void* drcontext, dr_exception_t* excpt) {
    DWORD exception_code = excpt->record->ExceptionCode;
    if (exception_code == EXCEPTION_ACCESS_VIOLATION) {
        return false; // tried "true" and "dr_redirect_execution()" as well...
    }
    return true;
}

// ...

drmgr_register_exception_event(event_exception);


Thanks in advance!

Derek Bruening

May 5, 2022, 9:25:48 AM
to Daniel Elsner, DynamoRIO Users
The access violation exceptions are probably coming from DR's cache consistency mechanism trying to handle the JIT (presumably most of them are not there with -no_hw_cache_consistency). Unfortunately, in recent years Java has not worked well under DR: see https://github.com/DynamoRIO/dynamorio/issues/3733. Help is needed to improve the situation. Aside from diagnosing and fixing the individual bugs, whatever they turn out to be, and adding targeted regression tests, integrating the experimental JIT-optimizing branch (https://dynamorio.org/page_jitopt.html) could improve performance.


Daniel Elsner

May 5, 2022, 9:42:54 AM
to DynamoRIO Users
Derek, thank you for your response and the references. 
I'll take a closer look at the experimental branch and will try to understand what's going on.

For my use case, perhaps there is a simpler solution:
I want to run a simple analysis pass (similar to drcov) that records which basic blocks (BBs) have been covered in the DLLs that the JVM process loads via System.load("libexternal").
I'm thus not interested in analyzing any of the JVM binaries, but only in externally loaded DLLs.
Let me be clear: I don't want to transform any of the binaries, but merely keep track of which BBs have been covered from the external DLLs (the application could even run natively, if that's somehow possible).
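
A minimal sketch of that filtering idea in plain C, with no DR API calls: a block is recorded only if its start address falls inside the address range of one external DLL, and everything else (JVM binaries, JIT output) is ignored. The names module_cov_t, cov_record_block, and cov_count are hypothetical; in a real client the module bounds would presumably come from DR's module-load event, which is not shown here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record for one module of interest. */
typedef struct {
    uintptr_t start; /* first byte of the module image */
    uintptr_t end;   /* one past the last byte of the image */
    uint8_t *hit;    /* one flag per byte offset; set when a block starts there */
} module_cov_t;

/* Records a block start address. Returns true if the block lies inside the
 * module and was recorded, false if it lies outside and was ignored. */
static bool cov_record_block(module_cov_t *m, uintptr_t block_start)
{
    if (block_start < m->start || block_start >= m->end)
        return false; /* not in the external DLL: skip */
    m->hit[block_start - m->start] = 1;
    return true;
}

/* Counts the distinct block start offsets recorded so far. */
static size_t cov_count(const module_cov_t *m)
{
    size_t n = 0;
    for (uintptr_t i = 0; i < m->end - m->start; i++)
        n += m->hit[i];
    return n;
}
```

A flag per byte offset is wasteful but keeps the sketch short; a hash set keyed by offset would be the obvious replacement for large modules.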

Do you know of any further runtime option, in addition to -no_hw_cache_consistency, that should minimize DR interference?
Thank you.

Derek Bruening

May 5, 2022, 12:04:00 PM
to Daniel Elsner, DynamoRIO Users
You could try -native_exec to run Java libraries natively; the default list is here: https://github.com/DynamoRIO/dynamorio/blob/master/core/optionsx.h#L2612
However, it has not been used in a long time, so it might have some bitrot, and different Java implementations might not interoperate well with it.

Daniel Elsner

May 5, 2022, 12:08:45 PM
to DynamoRIO Users
Thank you, I'll see if that is an approach that works.
I think the hint on the JIT is also useful. I'll try running the JVM process with the JIT disabled for analysis purposes, i.e., with -Djava.compiler=NONE -Xint (initial experiments show that this at least prevents DR from crashing for me).

Derek Bruening

May 5, 2022, 1:02:05 PM
to Daniel Elsner, DynamoRIO Users
Yes, I would expect all problems to come from DR trying to keep its code caches consistent in the face of JIT changes, though it's possible there are other factors.  DR has functionality that tries to handle any kind of code modification from an app but it seems there are some bugs in there.
 

Daniel Elsner

Jul 10, 2022, 3:48:13 PM
to DynamoRIO Users
Hi Derek, 

I'm still tinkering with the JVM problems due to JIT changes. 
Disabling the JIT works, but the slowdown is unbearable: right now, running the Java program with the JIT disabled and `-attach JVM_PID -t drcov -disable_traces` takes roughly 12x the original runtime.
I also tried `-native_exec -native_exec_list jvm.dll -native_exec_retakeover -takeover_attempts 16` (with the JIT enabled and disabled), which gives no crashes and good performance (with the JIT enabled), but DR is not able to take over again after any interaction with jvm.dll.
Furthermore, I experimented with `-no_hw_cache_consistency`, `-no_sandbox_writes`, `-no_enable_reset` (saw them in the GitHub issue you referenced), without achieving any good results.

Am I missing any other possible way that you could think of to get this problem under control? 
One idea I've been thinking about is patching each exported JNI function in native modules that I call from Java to give back control to DR manually.
Essentially, I would wrap each exported JNI function with `if (!dr_app_running_under_dynamorio()) dr_app_take_over()` and run DR with `-native_exec -native_exec_list jvm.dll`.
What do you think, is this a valid approach?
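
To make the idea concrete, here is a rough sketch of the guard. dr_app_running_under_dynamorio() and dr_app_take_over() are the two calls named above, declared in DR's dr_app.h; whether taking over this way actually works after native execution of jvm.dll is exactly the open question, so this only shows the shape of the wrapper. The USE_DYNAMORIO macro, the stub_* variables, ensure_under_dr(), and native_add() are hypothetical; the stubs only exist so the sketch compiles without the DynamoRIO SDK.

```c
#ifdef USE_DYNAMORIO /* hypothetical build switch */
#include "dr_app.h"  /* declares dr_app_running_under_dynamorio(), dr_app_take_over() */
#else
#include <stdbool.h>
/* Stand-in stubs, NOT the real DR functions: they only track state so the
 * guard logic can be exercised without DynamoRIO. */
static bool stub_under_dr = false;
static int stub_takeovers = 0;
static bool dr_app_running_under_dynamorio(void) { return stub_under_dr; }
static void dr_app_take_over(void) { stub_under_dr = true; stub_takeovers++; }
#endif

/* Hypothetical helper: ask DR to take over only if it is not already in control. */
static void ensure_under_dr(void)
{
    if (!dr_app_running_under_dynamorio())
        dr_app_take_over();
}

/* Hypothetical native function standing in for an exported JNI entry point;
 * a real one would have the Java_<class>_<method> signature from jni.h. */
static int native_add(int a, int b)
{
    ensure_under_dr(); /* guard runs before any code that should be covered */
    return a + b;
}
```

Each exported JNI function in the external DLL would start with the same ensure_under_dr() call.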

If all of this doesn't work, I'll consider taking a deeper look into the JIT experimental branch, but after skimming a bit I'm convinced it would take me quite a while to get into the code.

Thanks again,
Daniel


Daniel Elsner

Jul 11, 2022, 8:42:11 AM
to DynamoRIO Users
Adding to my last question: If I get a JVM crash (access violation exception) at a certain address inside the DR code cache, how can I get the original address and print it for debugging purposes?
Thanks.

Derek Bruening

Jul 12, 2022, 9:42:58 PM
to Daniel Elsner, DynamoRIO Users
12x overhead for an app with code reuse and without the JIT sounds pathological, unless it is a short app that does not reuse any code and just runs new code the entire time. I would profile it to see what is going on.

For translating the code cache PC: ideally there would be a debugger integration package to auto-magically do that; there was a "drdbg" project but it was never finished.  The dr_app_pc_from_cache_pc() function does the translation.  Debug logs will also provide the information, or in the debugger locate the fragment_t data structure to get the tag of the containing block (find the linkstub_t from the exit stub(s) to do this: they are laid out after the fragment_t).
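
As a purely conceptual illustration of the "tag of the containing block" lookup, and not DR's actual data structures or the implementation of dr_app_pc_from_cache_pc(), the toy below maps a code cache address to the application address of the block it was built from. toy_fragment_t and toy_tag_from_cache_pc are hypothetical names, and the result is block-level only, without the per-instruction precision a real translation gives.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for one code cache fragment: the cache range it occupies and
 * the application address (tag) of the block it was built from. */
typedef struct {
    uintptr_t cache_start;
    uintptr_t cache_end; /* one past the last byte */
    uintptr_t tag;
} toy_fragment_t;

/* Returns the tag of the fragment that contains cache_pc, or 0 if no
 * fragment in the table covers that address. */
static uintptr_t toy_tag_from_cache_pc(const toy_fragment_t *frags, size_t n,
                                       uintptr_t cache_pc)
{
    for (size_t i = 0; i < n; i++) {
        if (cache_pc >= frags[i].cache_start && cache_pc < frags[i].cache_end)
            return frags[i].tag;
    }
    return 0;
}
```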

Daniel Elsner

Jul 13, 2022, 3:45:13 AM
to DynamoRIO Users
Agreed, though in this case the app does precisely that: it runs new code the entire time. Also, another (native, non-JVM) app running against the same code base, which does reuse some code, has an overhead of only 2-3x. Therefore, the overhead is most likely coming from the JVM context.

Thanks for the hints on translating the code cache PC, I'll check it out.
