Massive slowdown of drcov in recent versions

42 views
Skip to first unread message

Philippe Lieser

unread,
Jan 26, 2022, 6:46:19 AM1/26/22
to DynamoR...@googlegroups.com

Hello,

 

Currently we use an old Version of DynamoRIO, which we wanted to update to the new 9.0.0 Version.

We run our test with “drrun -t drcov -- ...”. Which currently takes around 30 minutes.

Using the new 9.0.0 version, the same test now need over 2 hours (the timeout of the runner, tests still not completed).

The cause seems to be a change between cronbuild-8.0.18740 (tests take about 30 min) and cronbuild-8.0.18747 (timeout after 2h).

 

Has someone an idea which of the changes between this two version could be the cause? And if there is a configuration we could change to prevent this massive slowdown?

 

Thanks

Philippe

 

Derek Bruening

unread,
Jan 26, 2022, 11:01:46 AM1/26/22
to dynamor...@googlegroups.com
Hmm, a regression is not good.
Looking at the differences between the two: https://github.com/DynamoRIO/dynamorio/compare/cronbuild-8.0.18740...cronbuild-8.0.18747
Nothing jumps out as really affecting drcov, since drcov doesn't use drreg and doesn't use the string loop expansion.  The core changes are mostly around signals.
Questions:
  • What platform is this: x86_64?  Is it Linux?  If Linux, does the app use signals?
  • If you run without drcov (i.e., just "drrun -- <your-app>") is the slowdown still there?
  • Is this a hang, rather than a slowdown?  I.e., it gets stuck somewhere (deadlock, e.g.) and just sits there?  Would it be possible to attach and get a callstack somewhere after the 30 minute mark where it is likely hung?
  • Would it be possible to binary search the commits in between?  That is more work though as you would have to build DR.

--
You received this message because you are subscribed to the Google Groups "DynamoRIO Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dynamorio-use...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/dynamorio-users/0f36f29e1bd14e028bfe860e66144c94%40rohde-schwarz.com.

Philippe Lieser

unread,
Jan 27, 2022, 1:20:48 PM1/27/22
to dynamor...@googlegroups.com

> What platform is this: x86_64?  Is it Linux?

 

It is a x86 Executable, running in a 64-bit Windows (Docker container).

 

> If Linux, does the app use signals?

 

The platform we run the test on is Windows.

 

> If you run without drcov (i.e., just "drrun -- <your-app>") is the slowdown still there?

 

Yes, the slowdown is still there without drcov.

 

> Is this a hang, rather than a slowdown?

 

I don’t think it is. Currently running the test locally (still running as I write this, but already more than 2h), and there I can see it is getting further (by the occasional output).

 

Unfortunately attaching the Visual Studio debugger to the running test provides strange behavior.

During the first run there I attached it after quite a while, I did see there the main thread was waiting for a result (and std::future), but did not see any other thread with a useful stack trace, i.e. there the result should have been generated. Was thinking maybe was indeed a crash of the worker thread and killed it.

 

During the current run, there I attached it very early to set breakpoints, I did not see any usable stack trace for the main thread. Currently I do see again something useful for the main thread (waiting for generated RSA key), but the stack trace view for the worker thread is again broken.

 

What we do a quite a few times in the tests in generating RSA keys with the Botan library. This already takes a long time in a normal debug build under Windows, without running with drrun. And is I think the main cause why the test take long even without using drrun (around 22 min).

My current impression is that this is slowed down now even more. But currently still unsure if:

1. By how much this RSA key generation really is slowed down by the new drrun version

2. If it is the only thing responsible for the massive slowdown.

 

> Would it be possible to binary search the commits in between?

 

I can try doing that, but may take a few days until I find time for it.

Derek Bruening

unread,
Jan 27, 2022, 1:32:52 PM1/27/22
to dynamor...@googlegroups.com
On Thu, Jan 27, 2022 at 1:20 PM Philippe Lieser <Philipp...@rohde-schwarz.com> wrote:

> What platform is this: x86_64?  Is it Linux?

 

It is a x86 Executable, running in a 64-bit Windows (Docker container).


If it's WIndows, the changes in that range seem even less relevant.  Not a single one seems likely to have any effect on plain DR or drcov on Windows:
  • Not relevant (UNIX only): c6474fda6 i#4890: Add EEXIST to expected mmap failures (#4891)
  • Not relevant (drcov does not use drreg): 238bb25de i#3823 drreg: Avoid GPR spill slot conflicts with earlier phases (#4887)
  • Not relevant (UNIX only): 10aed67a4 i#4425: save signal stack space with deeper call stack (#4888)
  • Not relevant (drcov does not expand string loops): 931913257 i#4866: Mark string loop expansion as emulation (#4877)
  • Not relevant (drcov does not call this): 7d2d52258 i#3962 label cb: Add instr_clear_label_callback() (#4889)
  • Not relevant (you're not using drcachesim): b39b85d19 i#4881: Fix LRU counter bug in drcachesim (#4883)
  • Not relevant (UNIX only): 638fb106d i#4884: Check signal exit before syscall, etc. reasons (#4886)
  • Not relevant (drcov does not call this): 88a8a6cad i#1312: Adds instr_zeroes_zmmh() (#4875)
  • Not relevant (UNIX only): 403122fa9 i#4256: prevent triggered assertion for duplicated elf_headers on AArch64 (#4876)
  • Not relevant (UNIX only): 7a2f10feb i#4878 signals_pending: Clean up bool-vs-int usage (#4879)
  • Not relevant (AArch64 only): 1640ffe36 i#2626 AArch64 v8.0 Decode: Order dis-a64.txt alphabetically (#4882)
  • Not relevant (UNIX only): 1e0802560 i#4878: Fix whereami overflow corruption of signal field (#4880)

 

> If Linux, does the app use signals?

 

The platform we run the test on is Windows.

 

> If you run without drcov (i.e., just "drrun -- <your-app>") is the slowdown still there?

 

Yes, the slowdown is still there without drcov.

 

> Is this a hang, rather than a slowdown?

 

I don’t think it is. Currently running the test locally (still running as I write this, but already more than 2h), and there I can see it is getting further (by the occasional output).

 

Unfortunately attaching the Visual Studio debugger to the running test provides strange behavior.

During the first run there I attached it after quite a while, I did see there the main thread was waiting for a result (and std::future), but did not see any other thread with a useful stack trace, i.e. there the result should have been generated. Was thinking maybe was indeed a crash of the worker thread and killed it.

 

During the current run, there I attached it very early to set breakpoints, I did not see any usable stack trace for the main thread. Currently I do see again something useful for the main thread (waiting for generated RSA key), but the stack trace view for the worker thread is again broken.


For callstacks, note that DR's symbols are not found by windbg, so if a thread is in DR code you will not get a useful callstack by default.  We have a script to load symbols: https://dynamorio.org/page_debugging.html#autotoc_md154.
 

Philippe Lieser

unread,
Jan 28, 2022, 9:30:14 AM1/28/22
to dynamor...@googlegroups.com

Did a few test runs just generating an RSA key using Botan (version 2.14.0) (code at the bottom):

 

key length | no drrun | 8.0.18740 | 8.0.18747

---------------------------------------------

1024       | 1.795s   | 3.282s    | 30.307s

2048       | 7.29s    | 5.221s    | 7m16s

2048       | 4.297s   | 11.939s   | 2m36s

4096       | 55.728s  | 43.128s   | 2h17m

4096       | 19.509s  | 23.371s   | 28m57s

4096       | 3m15s    | 1m38s     |

4096       | 27.336s  | 38.346s   |

 

Although times are fluctuating a lot, one can still see that there is a big slowdown then using the 8.0.18747 Version.

So seems like the newer DynamoRIO versions have some performance problems with the Botan code.

 

Is that enough Information for you to take a closer look at that is going wrong? Or should I still try to compile DynamoRIO myself and binary search the commits in between?

 

Test code:

 

#include <botan/auto_rng.h>

#include <botan/rsa.h>

Botan::AutoSeeded_RNG rng;

Botan::RSA_PrivateKey key( rng, 1024 );

Derek Bruening

unread,
Feb 5, 2022, 11:09:35 PM2/5/22
to dynamor...@googlegroups.com
I downloaded Botan-2.14.0.tar.xz, compiled it with VS2017 64-bit, and built a program based on the code snippet you provided.
But I don't see any performance difference running on 64-bit Win10 21H1.
Am I missing something?

% cat botan.cpp
#include <botan/auto_rng.h>
#include <botan/rsa.h>
#include <iostream>
int main()
{
    Botan::AutoSeeded_RNG rng;
    Botan::RSA_PrivateKey key( rng, 1024 );
    std::cout << "Success.\n";
    return 0;
}

% cl /Zi /EHsc /Febotan.exe botan.cpp /Ic:/Botan/include/botan-2 c:/Botan/lib/botan.lib
Microsoft (R) C/C++ Optimizing Compiler Version 19.16.27045 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.
botan.cpp
Microsoft (R) Incremental Linker Version 14.16.27045.0
Copyright (C) Microsoft Corporation.  All rights reserved
/debug
/out:botan.exe
botan.obj
c:/Botan/lib/botan.lib

% /usr/bin/time ~/dr/releases/DynamoRIO-Windows-8.0.18740/bin64/drrun -- ./botan.exe
Success.
0.00user 0.01system 0:00.55elapsed 2%CPU (0avgtext+0avgdata 5108maxresident)k
0inputs+0outputs (1326major+0minor)pagefaults 0swaps

% /usr/bin/time ~/dr/releases/DynamoRIO-Windows-8.0.18747/bin64/drrun -- ./botan.exe
Success.
0.01user 0.00system 0:00.53elapsed 2%CPU (0avgtext+0avgdata 5108maxresident)k
0inputs+0outputs (1325major+0minor)pagefaults 0swaps




Philippe Lieser

unread,
Feb 7, 2022, 9:25:39 AM2/7/22
to dynamor...@googlegroups.com

I did a few more tests with a fresh Botan build, and noticed that the slowdown only happens with the debug build of Botan. Both with 32-bit and 64-bit (OS Win10 1909).

 

The commands I used to configure Botan:

 

64-bit release (no slowdown; 0.473s -> 0.459s):

py .\configure.py --minimized-build --build-targets=shared --enable-modules=auto_rng,system_rng,rsa

 

64-bit debug (slowdown; 1.231s -> 10.133s):

py .\configure.py --minimized-build --build-targets=shared --enable-modules=auto_rng,system_rng,rsa --debug-mode

 

32-bit release (no slowdown; 0.828s -> 0.878s):

py .\configure.py --minimized-build --build-targets=shared --enable-modules=auto_rng,system_rng,rsa --cpu=x86_32

 

32-bit debug (slowdown; 2.257s -> 18.670s):

py .\configure.py --minimized-build --build-targets=shared --enable-modules=auto_rng,system_rng,rsa --cpu=x86_32 --debug-mode

Derek Bruening

unread,
Feb 7, 2022, 10:33:50 AM2/7/22
to dynamor...@googlegroups.com
Sanity checks on other platforms also show no performance difference between the two releases on programs like SPECCPU (below).

$ /usr/bin/time ~/dr/releases/DynamoRIO-Linux-8.0.18740/bin64/drrun -- ~/spec2006/bzip2-test/bzip2_base.gcc-64bit ~/spec2006/bzip2-test/dryer.jpg 2
2.16user 0.02system 0:02.21elapsed 99%CPU (0avgtext+0avgdata 21276maxresident)k
0inputs+8outputs (0major+6401minor)pagefaults 0swaps

$ /usr/bin/time ~/dr/releases/DynamoRIO-Linux-8.0.18747/bin64/drrun -- ~/spec2006/bzip2-test/bzip2_base.gcc-64bit ~/spec2006/bzip2-test/dryer.jpg 2
2.15user 0.03system 0:02.22elapsed 98%CPU (0avgtext+0avgdata 22856maxresident)k
0inputs+8outputs (0major+6601minor)pagefaults 0swaps

Derek Bruening

unread,
Feb 13, 2022, 12:36:54 PM2/13/22
to dynamor...@googlegroups.com
I can reproduce the problem.  It is a regression from commit 638fb10.  This is likely to cause large slowdowns on many Windows programs.  I filed https://github.com/DynamoRIO/dynamorio/issues/5352 on this.

Derek Bruening

unread,
Feb 14, 2022, 3:01:53 PM2/14/22
to dynamor...@googlegroups.com
There is a 9.0.1 release with the fix: https://github.com/DynamoRIO/dynamorio/releases/tag/release_9.0.1.  Please use it instead of 9.0.0.
Reply all
Reply to author
Forward
0 new messages