DynamoRIO overhead analysis

Ganesh Paramasivan

Apr 16, 2013, 4:43:19 AM
to dynamor...@googlegroups.com
Hi,

Version used : DynamoRIO-Linux-3.2.0-3
Ubuntu 12.04 (Kernel Linux 3.6.0-030600rc7-generic)

1. To begin with, whenever a new basic block is created and I loop through the instrlist, there are about 83 instructions before the first and after the last instruction of the application basic block. For example, if there are about 5 instructions in the application basic block (with valid app_pc), I can see (83+83) 166 additional instructions in the instrlist with 0 app_pc.

I believe these are additional DynamoRIO instructions used for instrumentation, etc. Is there a way to see which instructions make up this overhead?

2. After the basic block is added to the trace, I assume the above 166 instructions will no longer be part of the execution. Can you please let me know how I can confirm this?

3. I have instrumented the basic blocks to run PAPI and read the instruction count after they are added to the trace. For example, for a basic block with 5 instructions, the total instruction count reported by PAPI is about 521. Of this, the PAPI overhead is about 220, the basic block itself is 5 instructions, and my instrumentation code is about 100 instructions. I am not sure where the remaining ~200 instructions come from. Can you please let me know what operations DynamoRIO may still perform even after the basic block has been added to the trace? Relating this to question 2, could those 166 instructions (with 0 app_pc) be responsible?
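
For reference, the accounting in question 3 can be written out explicitly. The sketch below only illustrates that arithmetic; the function name unaccounted_instructions and its parameter names are made up for this note, and the inputs are the approximate figures quoted in the question.

```c
#include <assert.h>

/* Instructions left unexplained after subtracting the known contributors
 * from the total reported by the counter.  All names are hypothetical. */
long unaccounted_instructions(long measured_total, long papi_overhead,
                              long app_instrs, long client_instrs)
{
    long known = papi_overhead + app_instrs + client_instrs;
    /* A negative remainder would mean one of the estimates is too high. */
    assert(measured_total >= known);
    return measured_total - known;
}
```

With the figures above, unaccounted_instructions(521, 220, 5, 100) returns 196, which is the "~200" remainder the question asks about.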

Thanks,
Ganesh

Qin Zhao

Apr 16, 2013, 10:28:20 AM
to dynamor...@googlegroups.com
On Tue, Apr 16, 2013 at 4:43 AM, Ganesh Paramasivan <gan...@gmail.com> wrote:
> 1. To begin with, whenever a new basic block is created and I loop through the instrlist, there are about 83 instructions before the first and after the last instruction of the application basic block. For example, if there are about 5 instructions in the application basic block (with valid app_pc), I can see (83+83) 166 additional instructions in the instrlist with 0 app_pc.

Base DR does not insert any instructions in the normal case.
Running with -loglevel 3 dumps the instructions before and after instrumentation, which shows what will actually be executed.
It looks like you inserted a clean call or something similar that causes the 83 additional instructions.

> I believe these are additional DynamoRIO instructions used for instrumentation, etc. Is there a way to see which instructions make up this overhead?

I am not sure these were inserted by DR.

> 2. After the basic block is added to the trace, I assume the above 166 instructions will no longer be part of the execution. Can you please let me know how I can confirm this?

Use the -debug -loglevel 3 options to produce the log file and check the result.

--
You received this message because you are subscribed to the Google Groups "DynamoRIO Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dynamorio-use...@googlegroups.com.
To post to this group, send email to dynamor...@googlegroups.com.
Visit this group at http://groups.google.com/group/dynamorio-users?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.
 
 




Ganesh Paramasivan

Apr 17, 2013, 4:07:54 AM
to dynamor...@googlegroups.com
Thanks Qin,

Yes, I am using dr_insert_clean_call to insert some code into each basic block (at the beginning and end of the basic block). I have observed that these 83 instructions are constant irrespective of my instrumentation code, so I assume they come from the dr_insert_clean_call function. My question, in this case, is whether these instructions remain in the basic block's instrlist until the end of the run (even after the basic block is added to the trace), or are there only during basic block creation.

With respect to the debug run, I followed the steps below, and it has been running for some hours now with no output (my app is a simple matrix multiplication program with 1x1 matrices). I am not sure whether it is stuck or whether this is expected.

cmake -DDEBUG=ON <client code location>
make
drrun -debug -client lib.so -loglevel 3 -logdir dr_log/ <executable>

Please let me know if I am missing something. 

Thanks,
Ganesh

Ganesh Paramasivan

Apr 17, 2013, 5:06:51 AM
to dynamor...@googlegroups.com
Hi Qin,

Sorry, my analysis of the 83 instructions was wrong. Those do indeed seem to be my instrumentation code. I am looking into that now. Sorry about that.

Can you please let me know if I am doing the right thing for the debug run?

Thanks,
Ganesh

Derek Bruening

Apr 17, 2013, 9:48:10 AM
to dynamor...@googlegroups.com
On Wed, Apr 17, 2013 at 4:07 AM, Ganesh Paramasivan <gan...@gmail.com> wrote:
> drrun -debug -client lib.so -loglevel 3 -logdir dr_log/ <executable>

drrun --help

 usage: drrun [options] <app and args to run>
   or: drrun [options] [DR options] -- <app and args to run>
   or: drrun [options] [DR options] -c <client> [client options] -- <app and args to run>
   ...
       -client <path> <ID> "<options>"

Ganesh Paramasivan

Apr 26, 2013, 5:16:26 AM
to dynamor...@googlegroups.com
Thanks Derek and Qin. 

I am able to run the debug build and get the logs of the instrumentation. 

As I mentioned before, I have been using dr_insert_clean_call to insert code at the beginning of each basic block. When I analyse the debug logs, I can see that exactly 83 instructions are added to each basic block, and one of those 83 is the call to the actual code to be inserted into the basic block.

I have also observed that for some simple instrumentation (simple calculations without any function calls), the new code is inlined at the beginning of the basic block with a minimal overhead of 14 instructions.

So I want to know if I can somehow force the instrumentation to always be inlined instead of being called. If this cannot be done with dr_insert_clean_call, is there any other way of doing this without the overhead of 83 instructions per basic block?

Thanks,
Ganesh

Qin Zhao

Apr 26, 2013, 10:58:20 AM
to dynamor...@googlegroups.com
> So I want to know if I can somehow force the instrumentation to always be inlined instead of being called. If this cannot be done with dr_insert_clean_call, is there any other way of doing this without the overhead of 83 instructions per basic block?

Most of those 83 instructions are context switch instructions that make sure the clean call does not disturb the application context.
DR tries to analyze the clean call and inline it if possible, which is why in some cases you see only a handful of instructions inserted instead.
However, if the clean call is too complex (e.g., it contains function calls) and the analysis fails, DR has to insert the full context switch.
There is no automatic way to eliminate those instructions, but there are a few things you can do:
1. Simplify the clean call as much as possible, so DR has a better chance of inlining it.
2. Manually insert instructions to perform the context switch, and use
   dr_insert_call (http://dynamorio.org/docs/dr__ir__utils_8h.html#af2a3575059c29dae25ab02c9eb1d0ce9) to insert the call without DR's full context switch.
   That way you control which context is saved, at the risk of breaking transparency if you fail to save necessary context.
3. Directly insert instructions that perform the task you want done in the clean call, which would give you the best performance.
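
To illustrate point 1, here are two versions of the same callee written as plain C. This is a sketch under assumptions: the names on_bb_simple, on_bb_verbose and total_app_instrs are hypothetical, and whether DR actually inlines the first version depends on the DR version, its clean call optimization settings, and the code the compiler generates. The first version is a leaf function with no calls. The second makes library calls, which is the kind of complexity described above as forcing the full context switch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical counter kept by the client. */
unsigned long total_app_instrs;

/* Leaf callee: a single memory update and no calls.  This is the shape
 * that gives the inlining analysis the best chance of succeeding. */
void on_bb_simple(unsigned int num_instrs)
{
    total_app_instrs += num_instrs;
}

/* Same bookkeeping, but no longer a leaf: assert() can expand to a
 * conditional call in non-NDEBUG builds, and fprintf() is a library
 * call.  A callee like this is expected to need the full context switch. */
void on_bb_verbose(unsigned int num_instrs)
{
    assert(num_instrs > 0);
    total_app_instrs += num_instrs;
    fprintf(stderr, "block of %u instrs, running total %lu\n",
            num_instrs, total_app_instrs);
}
```

Either function would be passed as the callee argument of dr_insert_clean_call. Checking the -loglevel 3 output for the instrumented block is the way to confirm which form DR actually emitted.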

By the way, we converted our inline context switch to an out-of-line context switch in our latest DR, so there you will see 2 call instructions to the context save/restore routines instead.

Qin

 


Derek Bruening

Apr 26, 2013, 11:13:05 AM
to dynamor...@googlegroups.com
For more information on auto-inlining and how to observe or control its aggressiveness see our documentation: http://dynamorio.org/docs/using.html#op_cleancall


Jan Newger

Apr 26, 2013, 3:11:36 PM
to dynamor...@googlegroups.com


On Fri, Apr 26, 2013 at 10:58 AM, Qin Zhao <qin....@gmail.com> wrote:
> By the way, we converted our inline context switch to an out-of-line context switch in our latest DR, so there you will see 2 call instructions to the context save/restore routines instead.

Just out of interest, can you elaborate on why you decided to do this?

Qin Zhao

Apr 26, 2013, 3:20:04 PM
to dynamor...@googlegroups.com

> Just out of interest, can you elaborate on why you decided to do this?

The context switch code for a clean call (especially on x64) has too many instructions. An out-of-line context switch saves a lot of code cache space in cases such as very large applications or clients that insert many clean calls per basic block.
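
The space argument can be put into rough numbers. The model below is only a back-of-the-envelope sketch and not how DR accounts for its code cache: it counts instructions rather than bytes, ignores the return instructions of the shared routines, and uses hypothetical function names. It assumes each inline clean call site carries its own copy of the save/restore sequence, while the out-of-line scheme keeps one shared copy and adds two call instructions per site.

```c
#include <assert.h>

/* Inline scheme: every clean call site has its own save+restore copy. */
long inline_cost(long sites, long switch_instrs)
{
    assert(sites >= 0 && switch_instrs >= 0);
    return sites * switch_instrs;
}

/* Out-of-line scheme: one shared save+restore copy, plus two call
 * instructions (one to save, one to restore) at every site. */
long out_of_line_cost(long sites, long switch_instrs)
{
    assert(sites >= 0 && switch_instrs >= 0);
    return switch_instrs + 2 * sites;
}
```

Taking about 82 context switch instructions per site (the 83 reported earlier in this thread minus the call itself) and a hypothetical 1000 clean call sites, inline_cost(1000, 82) is 82000 instructions while out_of_line_cost(1000, 82) is 2082.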

Jan Newger

Apr 26, 2013, 4:11:46 PM
to dynamor...@googlegroups.com
Thanks for the clarification, that makes sense.

Ganesh Paramasivan

May 7, 2013, 10:48:29 PM
to dynamor...@googlegroups.com
Thanks Qin and all.

I am now trying to insert the instructions directly into the basic block. I am using instrlist_meta_preinsert, instrlist_meta_append, etc. Is there a way I can insert a block of instructions into an instrlist? In other words, can I create an instrlist and append/prepend it to an existing instrlist?

Thanks,
Ganesh

Qin Zhao

May 7, 2013, 11:14:48 PM
to dynamor...@googlegroups.com
On Tue, May 7, 2013 at 10:48 PM, Ganesh Paramasivan <gan...@gmail.com> wrote:
> I am now trying to insert the instructions directly into the basic block. I am using instrlist_meta_preinsert, instrlist_meta_append, etc. Is there a way I can insert a block of instructions into an instrlist? In other words, can I create an instrlist and append/prepend it to an existing instrlist?

I do not think we have an API for concatenating two instrlists.
It is not difficult to implement a utility function to do so.
I usually just insert the instrs into the existing instrlist instead of creating a second one and then merging them.
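
As an illustration of what such a utility function could look like, here is a self-contained model of the splice logic. It deliberately uses stand-in types (node_t, list_t) and hypothetical helper names so that it compiles without DynamoRIO; it is not DR code. In a real client the same loop would presumably walk the source list with instrlist_first and instr_get_next, unlink each instr with instrlist_remove, and add it to the destination with instrlist_meta_preinsert.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in types for illustration only; NOT DynamoRIO's instr_t/instrlist_t. */
typedef struct node {
    struct node *prev, *next;
    int id;
} node_t;

typedef struct {
    node_t *first, *last;
} list_t;

/* Unlink n from l (the role instrlist_remove would play in a client). */
void list_remove(list_t *l, node_t *n)
{
    if (n->prev != NULL) n->prev->next = n->next; else l->first = n->next;
    if (n->next != NULL) n->next->prev = n->prev; else l->last = n->prev;
    n->prev = n->next = NULL;
}

/* Insert n before where; in this model, where == NULL appends at the end. */
void list_preinsert(list_t *l, node_t *where, node_t *n)
{
    n->next = where;
    n->prev = (where != NULL) ? where->prev : l->last;
    if (n->prev != NULL) n->prev->next = n; else l->first = n;
    if (where != NULL) where->prev = n; else l->last = n;
}

/* Move every node of src into dst before where, preserving order.
 * src is left empty afterwards. */
void list_splice_before(list_t *dst, node_t *where, list_t *src)
{
    assert(dst != src);
    node_t *n = src->first;
    while (n != NULL) {
        node_t *next = n->next; /* remember the successor before unlinking */
        list_remove(src, n);
        list_preinsert(dst, where, n);
        n = next;
    }
}
```

The one detail that is easy to get wrong is saving the next pointer before the node is unlinked. The same concern would apply when moving instrs between two instrlists.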



Ganesh Paramasivan

May 7, 2013, 11:22:26 PM
to dynamor...@googlegroups.com
Thanks Qin. As I was trying to insert about 10 instructions, I just wanted to look for any utils like that. 

Thanks,
Ganesh