How to unscramble the stats and optimize the client?

Songyan Tang CN

Sep 24, 2024, 3:56:09 AM
to DynamoRIO Users
My program takes about twice as long to run under DynamoRIO, so I want to find out whether I can optimize my client, but I don't know how to interpret the stats report.
Below is the last section of the stats report. What information can I get from it, and what should I do to optimize?
(Begin) All statistics @540088 (3:50.042):
Current threads under DynamoRIO control : 1
Peak threads under DynamoRIO control : 1
Threads ever created : 1
System calls, pre : 5372
System calls, post : 4695
System calls, pre, ignorable : 2347
System calls, post, ignorable : 2347
Ignorable system calls : 51
Non-ignorable system calls : 36
Application mmaps : 989
Application munmaps : 30
Application modules with code : 82
Application code seen (bytes) : 24438803
Interpreted calls, direct and indirect : 305218
Interpreted indirect calls : 8006
Interpreted indirect jmps : 3485
Interpreted rets : 45340
Dynamic option synchronizations : 2
Dynamic option synchronizations, no change : 2
Code origin addresses checked : 518237
Code origin addresses in last area : 512201
Cache consistency non-code nop flushes : 30
Shared deletion regions unlinked : 169
Shared deletion region walks : 169
Shared deletion ref count decrements : 169
Shared deletion max pending : 1
Shared deletion region removals: ref 0 : 169
Fragments added to lazy deletion list : 21889
Lazy list instances moved to pending list : 169
Lazy list fragments moved to pending list : 21801
Generated code protection changes : 3
Protection change calls : 39
Protection change pages : 1661
Fragments generated, bb and trace : 540088
Basic block fragments generated : 518198
Trace fragments generated : 21890
Trace building reset: no trace head : 21890
Number of bbs in all emitted traces : 54758
Maximum number of bbs in a trace : 42
Trace wannabes prevented from being traces : 585885
Shadowed trace head deleted : 21889
Trace heads re-marked : 3461
Future fragments generated : 288861
Shared fragments generated : 485292
Shared bbs generated : 463402
Shared traces generated : 21890
Private fragments generated : 54796
Private bbs generated : 54796
Shared future fragments generated : 206861
Unique fragments generated : 540085
Maximum fragment requested size in bytes : 9495
Maximum fragment size in bytes : 9492
Maximum instrs in a bb : 257
BBs truncated due to instr limits : 866
Direct exit stubs created : 659239
Indirect exit stubs created : 63020
Separate stubs created : 459648
Rip-relative instrs seen : 765737
Rip-relative unreachable leas : 377623
Rip-relative unreachable non-leas : 372131
Rip-relative unreachable spills avoided : 57389
BBs with one indirect exit : 56831
BBs with one direct exit : 320599
BBs with two direct exits : 140768
BBs with an also_vmarea : 970
BB direct exits >SHRT_MAX from fragment tag : 326331
BB direct exits <=SHRT_MAX from fragment tag : 332908
BB cbr fall-through >SHRT_MAX from fragment tag : 1234
BB cbr fall-through <=SHRT_MAX from fragment tag : 147587
BBs using post-linkstub fragment offset : 363661
BBs that write OF but no other arithmetic flags : 330020
BBs that read a flag before writing any : 3
BBs that write no arithmetic flags : 188175
BBs that write no arithmetic flags, end in ib : 14251
Cbrs sharing a single exit stub : 150933
Fragments requiring post_linkstub offs : 385551
Fragments smaller than minimum fcache slot size : 43526
Fragments final size < minimum fcache slot size : 41897
Fragments deleted for any reason : 76564
Fragments deleted due to capacity conflicts : 4
Trace heads marked : 292956
Fragments deleted and replaced with traces : 1
Fragments deleted for munmap or RO consistency : 21801
Trace fragments targeted by IBL : 7089
IBT resizes : 15
Exits due to IBL cold misses : 1017718
Extra exits due to trace building : 1447
Fragments regenerated or duplicated : 2
Trace fragments extended : 54758
Trace building private copies created : 54758
Trace building private copies deleted : 54758
Trace building private copies futures deleted : 81906
Trace building private copies futures avoided : 54745
Trace inline-ib comparisons : 9070
Trace inline-ib no eflag restore needed : 7014
Trace fragments extended, ibl exits updated : 6189
Branches linked, direct : 442329
Branches linked, indirect : 69209
Fcache exits, total : 2111074
Fcache exits, system call executions : 4695
Fcache exits, from traces : 944994
Fcache exits, from BBs : 1161385
Fcache exits, total indirect branches : 1026254
Fcache exits, non-trace indirect branches : 363969
Fcache exits, ind target not in cache : 306441
Fcache exits, ind target extending a trace, BAD : 5937
Fcache exits, ind target in cache but not table : 713876
Fcache exits, from BB, ind target ... : 287037
Fcache exits, BB->BB, ind target ... : 286633
Fcache exits, BB->BB trace head, ind target ... : 13794
Fcache exits, BB->trace, ind target ... : 404
Fcache exits, from trace, ind target ... : 426839
Fcache exits, trace->trace, ind target ... : 6676
Fcache exits, trace->BB not trace head, ind tgt : 3587
Fcache exits, trace->BB trace head, ind target : 416576
Fcache exits, dir target not in cache : 156975
Fcache exits, link not allowed : 917778
Fcache exits, target trace head : 867844
Fcache exits, extending a trace : 48300
Fcache exits, non-ignorable system call : 5372
Fcache exits, no link shared <-> private : 1634
Fcache exits needing cbr disambiguation : 635172
Fragments with OF restore prefix : 4325
Fcache bb capacity (bytes) : 8192
Fcache bb peak capacity (bytes) : 8192
Fcache bb space claimed (bytes) : 4412
Fcache bb space used (bytes) : 3092
Fcache bb peak used (bytes) : 4412
Fcache bb headers (bytes) : 264
Fcache bb fragment bodies (bytes) : -88643829173994
Fcache bb direct exit stubs (bytes) : -845388
Fcache bb align space (bytes) : 39572
Fcache bb empty space (bytes) : 1320
Fcache shared bb capacity (bytes) : 41369600
Fcache shared bb peak capacity (bytes) : 41369600
Fcache shared bb space claimed (bytes) : 41368652
Fcache shared bb space used (bytes) : 39146568
Fcache shared bb peak used (bytes) : 39146568
Fcache shared bb headers (bytes) : 3532808
Fcache shared bb fragment bodies (bytes) : 36603060
Fcache shared bb align space (bytes) : 761290
Fcache shared bb empty space (bytes) : 6220
Fcache shared bb free coalesce prev : 2005
Fcache shared bb free coalesce next : 5084
Fcache shared bb return last : 24
Fcache shared bb free use larger bucket : 10902
Fcache shared bb free split : 7968
Fcache shared trace capacity (bytes) : 3170304
Fcache shared trace peak capacity (bytes) : 3170304
Fcache shared trace space claimed (bytes) : 3168136
Fcache shared trace space used (bytes) : 3162320
Fcache shared trace peak used (bytes) : 3162328
Fcache shared trace headers (bytes) : 175120
Fcache shared trace fragment bodies (bytes) : 2491605
Fcache shared trace fragment prefixes (bytes) : 385771
Fcache shared trace align space (bytes) : 119521
Fcache shared trace free coalesce next : 7
Fcache shared trace free use larger bucket : 25
Fcache shared trace free split : 22
Fcache combined claimed (bytes) : 44541200
Current fcache combined capacity (bytes) : 44548096
Peak fcache combined capacity (bytes) : 44548096
Current fcache units on live list : 779
Peak fcache units on live list : 779
Fcache unit lookups : 773392
Separate shared trace direct exit stubs (bytes) : 1078424
Separate shared bb direct exit stubs (bytes) : 9045509
Special heap units : 46
Peak special heap units : 46
Special heap align space (bytes) : 8
Current special heap capacity (bytes) : 10137600
Peak special heap capacity (bytes) : 10137600
Current heap units on live list : 298
Peak heap units on live list : 298
Heap headers (bytes) : 1480
Heap align space (bytes) : 1496053
Peak heap align space (bytes) : 1496060
Heap bucket pad space (bytes) : 7676744
Peak heap bucket pad space (bytes) : 7677168
Heap allocs in buckets : 26346994
Heap allocs variable-sized : 237
Total reserved memory : 150503424
Peak total reserved memory : 150503424
Guard pages, reserved virtual pages : 2264
Peak guard pages, reserved virtual pages : 2264
Current stack capacity (bytes) : 204800
Peak stack capacity (bytes) : 204800
Mmap capacity (bytes) : 44584960
Peak mmap capacity (bytes) : 44584960
Mmap reserved but not committed (bytes) : 131072
Peak mmap reserved but not committed (bytes) : 167936
Heap claimed (bytes) : 94796953
Peak heap claimed (bytes) : 94798829
Current heap capacity (bytes) : 95744000
Peak heap capacity (bytes) : 95744000
Heap reserved but not committed (bytes) : 565248
Peak heap reserved but not committed (bytes) : 8978432
File map capacity (bytes) : 2768896
Peak file map capacity (bytes) : 2768896
Current total memory from OS (bytes) : 155955200
Peak total memory from OS (bytes) : 155955200
Current vmm blocks for unreachable heap : 21632
Peak vmm blocks for unreachable heap : 21632
Current vmm blocks for stack : 58
Peak vmm blocks for stack : 58
Current vmm blocks for unreachable special heap : 5
Peak vmm blocks for unreachable special heap : 5
Current vmm blocks for unreachable special mmap : 14
Peak vmm blocks for unreachable special mmap : 14
Current vmm blocks for reachable heap : 50
Peak vmm blocks for reachable heap : 50
Current vmm blocks for cache : 12464
Peak vmm blocks for cache : 12464
Current vmm blocks for reachable special heap : 2586
Peak vmm blocks for reachable special heap : 2586
Current vmm blocks for reachable special mmap : 74
Peak vmm blocks for reachable special mmap : 95
Our virtual memory blocks in use : 36883
Peak our virtual memory blocks in use : 36883
Allocations using multiple vmm blocks : 1167
Blocks used for multi-block allocs : 40244
Current vmm virtual memory in use (bytes) : 151072768
Peak vmm virtual memory in use (bytes) : 151072768
Number of safe reads : 522727
Number of vmarea vector resize reallocations : 12
Peak vmarea vector length : 779
Peak dynamo areas vector length : 9
Peak executable areas vector length : 90
-pad_jmps fragments size overestimated : 285316
-pad_jmps excess instances coalesced w/ nxt free : 3932
-pad_jmps excess instances failed to be returned : 14711
-pad_jmps excess bytes failed to be returned : 264120
-pad_jmps body bytes shared bb : 1390206
-pad_jmps excess bytes shared bb : 1328597
Bytes shared frags ever : 38002989
-pad_jmps start_pcs shifted shared bb : 26967
-pad_jmps start_pcs shifted bytes shared bb : 61609
-pad_jmps excess bytes released shared bb : 914204
-pad_jmps no pad exits shared bb : 540330
-pad_jmps body bytes shtrace : 90279
-pad_jmps excess bytes shtrace : 84346
Bytes shared frags ever : 2879128
-pad_jmps start_pcs shifted shtrace : 2069
-pad_jmps start_pcs shifted bytes shtrace : 4181
-pad_jmps excess bytes released shtrace : 94160
-pad_jmps inserted nops shtrace : 871
-pad_jmps inserted nop bytes shtrace : 1752
-pad_jmps no pad exits shtrace : 60353
-pad_jmps body bytes temp : 164274
-pad_jmps excess bytes temp : 159805
Bytes temp frags ever : 3448349
-pad_jmps start_pcs shifted temp : 2007
-pad_jmps start_pcs shifted bytes temp : 4469
-pad_jmps excess bytes released temp : 199292
-pad_jmps no shift stubs temp : 85404
-pad_jmps no pad exits temp : 89586
-pad_jmps body bytes bb : 114
-pad_jmps excess bytes bb : 112
Bytes bb frags ever : 3156
-pad_jmps start_pcs shifted bb : 2
-pad_jmps start_pcs shifted bytes bb : 2
-pad_jmps excess bytes released bb : 132
-pad_jmps no shift stubs bb : 76
-pad_jmps no pad exits bb : 74
Trace fragment ending with an IBL : 1654
Trace fragment ending with an IBL, return : 808
Trace fragment ending with an IBL, ind call : 568
Trace fragment ending with an IBL, ind jump : 278
App reference with FS/GS seg being mangled : 5299
(End) All statistics
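
A report this long is easier to read once it is turned into numbers that can be compared. The following is a minimal Python sketch, not part of DynamoRIO, that parses `name : value` lines like the ones above into a dictionary and computes what share of one counter another represents. The helper names `parse_stats` and `share` are made up for this illustration, and the sample reuses a few lines copied from the report. Reading "Fcache exits" as exits from the code cache back into DynamoRIO is an assumption here, so the ratios only suggest where to look, not what the cause is.

```python
import re

# Matches "<name> : <integer>". The greedy name group lets a name contain
# colons itself, e.g. "Shared deletion region removals: ref 0".
STAT_LINE = re.compile(r"^\s*(.*\S)\s*:\s*(-?\d+)\s*$")


def parse_stats(text):
    """Return {stat name: int value} for every 'name : value' line in text.

    If a name appears more than once, the last occurrence wins.
    """
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line)
        if m:
            stats[m.group(1)] = int(m.group(2))
    return stats


def share(stats, part, whole):
    """Return stats[part] / stats[whole], or None if a stat is missing or whole is 0."""
    if part not in stats or not stats.get(whole):
        return None
    return stats[part] / stats[whole]


# A few lines copied from the report above.
SAMPLE = """\
(Begin) All statistics @540088 (3:50.042):
Basic block fragments generated : 518198
Trace fragments generated : 21890
Shared deletion region removals: ref 0 : 169
Fcache exits, total : 2111074
Fcache exits, total indirect branches : 1026254
Fcache exits, target trace head : 867844
(End) All statistics
"""

if __name__ == "__main__":
    stats = parse_stats(SAMPLE)
    total = "Fcache exits, total"
    for name in ("Fcache exits, total indirect branches",
                 "Fcache exits, target trace head"):
        print(f"{name}: {share(stats, name, total):.1%} of all exits")
```

To use it on a full report, read the log file into a string and pass it to `parse_stats`; the begin and end marker lines are skipped because they do not end in an integer value.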

Derek Bruening

Sep 24, 2024, 10:41:34 AM
to Songyan Tang CN, DynamoRIO Users
As the tutorial slides discuss, you generally want to move frequently executed instrumentation out of clean calls and into inlined, tuned instrumentation (or possibly "lean calls" to custom-generated shared code). This takes effort, though, so I would first run a general profiler to see where the time is spent. If your instrumentation does something on every basic block, you definitely do not want a clean call that frequently.

Songyan Tang CN

Sep 25, 2024, 4:00:25 AM
to DynamoRIO Users
Thanks for the reply.
However, even when I run an empty client, the running time is still about twice the native time. I recorded a profile with perf and got the result shown in the attached image. This is a PyTorch program, so I'm not sure whether the high overhead is caused by PyTorch itself or by DynamoRIO.
[Attached image: image (1).png]
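
One way to narrow down where the slowdown comes from is to time the same workload three ways: natively, under DynamoRIO with no client, and under DynamoRIO with the client. The following is a minimal Python sketch of that comparison, not something posted in the thread. The helper names `time_command` and `slowdowns` are made up for this illustration, and `train.py` and `libmyclient.so` in the comments are hypothetical placeholders for the real workload and client.

```python
import subprocess
import sys
import time


def time_command(cmd, runs=3):
    """Run cmd `runs` times and return the fastest wall-clock time in seconds."""
    best = float("inf")
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        best = min(best, time.perf_counter() - start)
    return best


def slowdowns(native_s, no_client_s, with_client_s):
    """Express each configuration as a multiple of a faster baseline."""
    return {
        "dynamorio_no_client": no_client_s / native_s,
        "dynamorio_with_client": with_client_s / native_s,
        "client_over_no_client": with_client_s / no_client_s,
    }


if __name__ == "__main__":
    # Placeholder workload so the sketch runs anywhere. Substitute the real
    # commands (hypothetical file names), for example:
    #   native:      ["python", "train.py"]
    #   no client:   ["drrun", "--", "python", "train.py"]
    #   with client: ["drrun", "-c", "libmyclient.so", "--", "python", "train.py"]
    workload = [sys.executable, "-c", "pass"]
    native = no_client = with_client = time_command(workload, runs=1)
    print(slowdowns(native, no_client, with_client))
```

If `client_over_no_client` comes out close to 1 while `dynamorio_no_client` is close to 2, the time is going to running the application under DynamoRIO itself rather than to the client's instrumentation.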