Understanding instrlist_meta_postinsert vs instrlist_meta_preinsert


Igor R

Jun 1, 2014, 5:25:40 AM
to dynamor...@googlegroups.com
Hello,
 
Is instrlist_meta_postinsert(), called for some instr, always equivalent to instrlist_meta_preinsert() called for instr_get_next(instr)?
 
Thanks.

Byron Hawkins

Jun 1, 2014, 11:37:25 AM
to dynamor...@googlegroups.com
Yes, they give an identical result (assuming instr_get_next(instr) is not null).
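
The equivalence can be illustrated with a toy list model. This is a minimal sketch, not DynamoRIO code: `ToyList`, `toy_preinsert`, `toy_postinsert`, and `insert_after_index` are hypothetical names, instructions are plain strings, and treating a past-the-end position as "null `where` means append" is an assumption of the model.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Toy stand-in for an instruction list: each entry is just a name.
// This models list-position semantics only; it is not DynamoRIO code.
using ToyList = std::list<std::string>;
using Where = ToyList::iterator;

// Insert `name` immediately before `where`. Passing list.end() plays the
// role of a null `where` and appends (an assumption of this model).
void toy_preinsert(ToyList &list, Where where, const std::string &name) {
    list.insert(where, name);
}

// Insert `name` immediately after `where` (which must be a real element).
void toy_postinsert(ToyList &list, Where where, const std::string &name) {
    list.insert(std::next(where), name);
}

// Build {xor, mov, jmp}, insert "meta" after element `index` using either
// postinsert(instr) or preinsert(next(instr)), and return the final order.
std::vector<std::string> insert_after_index(int index, bool use_postinsert) {
    ToyList list = {"xor", "mov", "jmp"};
    Where instr = std::next(list.begin(), index);
    if (use_postinsert)
        toy_postinsert(list, instr, "meta");
    else
        toy_preinsert(list, std::next(instr), "meta");
    return std::vector<std::string>(list.begin(), list.end());
}
```

In this model both calls produce the same ordering for any position, including the last element, where the "next" position is past the end and the pre-insert degenerates to an append.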

Igor R

Jun 2, 2014, 3:14:21 AM
to dynamor...@googlegroups.com
> Yes, they give an identical result (assuming instr_get_next(instr) is not null).
 
 
What if it is null? instrlist_meta_preinsert() would just append the metainstruction to the end of the instruction list, wouldn't it? If so, it's still identical to instrlist_meta_postinsert, right?
But what I'm trying to understand is whether postinserted metainstructions are associated with the previous app instruction or with the next one. Consider the following:
 
xor %rax, %rax
(meta1)
l1:
mov %rax, %rbx
(meta2)
jmp l1
 
If meta1 and meta2 were post-inserted, will the jmp transfer control to meta1 or to the mov? What if they were pre-inserted?

Byron Hawkins

Jun 2, 2014, 4:42:07 AM
to dynamor...@googlegroups.com


On Monday, June 2, 2014 12:14:21 AM UTC-7, Igor R wrote:
> What if it is null? instrlist_meta_preinsert() would just append the metainstruction to the end of the instruction list, wouldn't it? If so, it's still identical to instrlist_meta_postinsert, right?

Yes, my mistake there.
 
> But what I'm trying to understand is whether postinserted metainstructions are associated with the previous app instruction or with the next one. Consider the following:
>
> xor %rax, %rax
> (meta1)
> l1:
> mov %rax, %rbx
> (meta2)
> jmp l1
 

What's l1, a label? If so, the jmp target will be the instruction after l1. But keep in mind that the label is a "first class" member of the list just like the executable instructions, so insertion will not accidentally go on the wrong side of the label. To be especially safe in this regard, you can always insert relative to the label explicitly.

Igor R

Jun 2, 2014, 4:53:54 AM
to dynamor...@googlegroups.com
> What's l1, a label? If so, the jmp target will be the instruction after l1. But keep in mind that the label is a "first class" member of the list just like the executable instructions, so insertion will not accidentally go on the wrong side of the label. To be especially safe in this regard, you can always insert relative to the label explicitly.
 
 
It's not a "meta" label; I meant an application "label", i.e. just an immediate address.
So, when a CTI occurs in the application code, which meta-instructions are considered to be associated with the target instruction? Those preinserted for that instruction *and* those postinserted for the previous one (i.e. the statically preceding one)? Or the former only?
 

Byron Hawkins

Jun 2, 2014, 5:10:56 AM
to dynamor...@googlegroups.com
Actually neither. If you want meta instructions to execute at the jmp target, you'll have to retarget to the first meta instruction. The "meta" qualifier is only kept by DR for making decisions about instances of instr_t. Once an instruction goes into the code cache, it's an x86 instruction like any other--meta or otherwise. So when your jmp executes, the next instruction will be the physical target of the jmp, and from there execution goes in sequence until the next cti. 
 

Igor R

Jun 2, 2014, 6:05:24 AM
to dynamor...@googlegroups.com

 
> Actually neither. If you want meta instructions to execute at the jmp target, you'll have to retarget to the first meta instruction. The "meta" qualifier is only kept by DR for making decisions about instances of instr_t. Once an instruction goes into the code cache, it's an x86 instruction like any other--meta or otherwise. So when your jmp executes, the next instruction will be the physical target of the jmp, and from there execution goes in sequence until the next cti.
 
 
Then, to account for the additional instructions, DR recalculates the jmp target address anyway, doesn't it? If so, the question is still valid: does the recalculation take into account meta-instructions preinserted for the target instruction? If yes, this effectively means, in terms of my question, that "pre-inserted meta-instructions are considered to be associated with the target instruction". Or am I missing something?
I believe it's documented somewhere, so I'd appreciate the relevant link.
 

Byron Hawkins

Jun 2, 2014, 6:23:12 AM
to dynamor...@googlegroups.com
I don't quite follow. The jmp target is an instruction, so it will jump to whichever instruction you specify. The only calculation is to determine the address of that target. So it's entirely up to you to decide which instruction will be the target. The "meta" qualifier has no role in the matter, and neither does pre- or post-inserting.

Igor R

Jun 2, 2014, 6:51:34 AM
to dynamor...@googlegroups.com
The jmp target in my example is the "mov %rax, %rbx" instruction. Let's assume we call instrlist_meta_preinsert(bb, instr, new_instr), where instr is the aforementioned "mov". Now, when the jmp executes, where does it transfer control? To the original "mov" or to the newly inserted new_instr? If I understand you correctly, it is the former. But then I don't quite understand how some of the DR examples work, e.g. memtrace. Do they assume the code is linear and there are no CTIs?
 

Byron Hawkins

Jun 2, 2014, 7:03:50 AM
to dynamor...@googlegroups.com
Yes, DR does not change a jmp target. I'm not sure what assumptions may be made by the examples, though in hand-coded assembly it's quite common to have a large number of strict dependencies (i.e., there will be errors if any tiny thing changes, no matter how logical it might seem).
 

Igor R

Sep 26, 2014, 11:24:11 AM
to dynamor...@googlegroups.com
This is a bit of an old thread, but I'd like to try and discuss it again, as the conclusions here might be somewhat misleading...
Consider the following instruction sequence:
     jmp label1
     xchg %rax, %rax
     label1: mov $123, %rbx
being instrumented as follows:
 
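instr_t *next = nullptr;
for (auto instr = instrlist_first(bb); instr; instr = next)
{
  // Fetch the successor before editing the list around instr.
  next = instr_get_next(instr);
  // PRE and int3 are shorthand helpers defined elsewhere in the client (not shown).
  if (instr_is_mov(instr) && instr_uses_reg(instr, DR_REG_XBX))
    PRE(instr, int3(drcontext));
}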

In other words, "mov" is the original jump target and it's preceded by an int3 meta instruction. One can see in GDB that the jump *is* retargeted to the int3 meta instruction automatically. The same effect is observed when postinserting after instr_get_prev(instr).
So, I'd like to understand better what DR does to CTIs. Does it always retarget them to the very first meta instruction preceding the original target?
 
Thanks.
 
 
 

Derek Bruening

Sep 26, 2014, 12:09:57 PM
to dynamor...@googlegroups.com
On Fri, Sep 26, 2014 at 11:24 AM, Igor R <boost...@gmail.com> wrote: 
> This is a bit of an old thread, but I'd like to try and discuss it again, as the conclusions here might be somewhat misleading...
> Consider the following instruction sequence:
>      jmp label1
>      xchg %rax, %rax
>      label1: mov $123, %rbx

This is not an application instruction sequence a client will ever see.  I have to assume you're talking about an already-instrumented sequence, in which case the jmp is probably meta and the label is a separate label instruction.  It's probably best to use DR's output which clearly labels meta vs app instructions and shows label instructions as first-class.

Igor R

Sep 26, 2014, 12:20:50 PM
to dynamor...@googlegroups.com
It's an application sequence I wrote by hand in assembly...
 
 

Derek Bruening

Sep 26, 2014, 12:31:50 PM
to dynamor...@googlegroups.com
And your client did not see that sequence in one instrlist.  DR is a dynamic tool.  Clients see dynamic basic blocks.  See also http://dynamorio.org/docs/API_BT.html: "Both basic blocks and traces present a linear view of control flow. In other words, instruction sequences have a single entrance and one or more exits".
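
The "linear view" can be sketched with a toy block builder. This is a simplified model, not DR's actual block-building logic: it assumes every control transfer ends a block, uses the vector index as a stand-in for the application address, and adds a trailing `ret` so the second block has an exit. `ToyInstr`, `kProgram`, and `build_block` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy application instruction: text plus a flag marking control transfers.
struct ToyInstr {
    std::string text;
    bool is_cti;
};

// Hypothetical static layout of the test program; the vector index stands
// in for the application address. The trailing "ret" is added here so the
// second block has an exit.
const std::vector<ToyInstr> kProgram = {
    {"jmp label1", true},       // address 0
    {"xchg %rax, %rax", false}, // address 1
    {"mov $123, %rbx", false},  // address 2 (label1)
    {"ret", true},              // address 3
};

// Collect instructions from `entry` up to and including the first control
// transfer: one entrance at the top, exit at the bottom.
std::vector<std::string> build_block(std::size_t entry) {
    std::vector<std::string> block;
    for (std::size_t pc = entry; pc < kProgram.size(); ++pc) {
        block.push_back(kProgram[pc].text);
        if (kProgram[pc].is_cti)
            break;
    }
    return block;
}
```

Under these assumptions the block entered at address 0 contains only the jmp, and the mov appears in a separate block entered at address 2, so the two never share one instruction list.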


Igor R

Sep 26, 2014, 1:01:20 PM
to dynamor...@googlegroups.com
 
 
> And your client did not see that sequence in one instrlist.  DR is a dynamic tool.  Clients see dynamic basic blocks.  See also http://dynamorio.org/docs/API_BT.html: "Both basic blocks and traces present a linear view of control flow. In other words, instruction sequences have a single entrance and one or more exits".
 
 
Sure, I understand that...
So, I believe the explanation of the above observation is as follows: the jump was targeting some basic block, and it keeps targeting the same basic block after the instrumentation, but the bb itself has changed: it now begins with the preinserted "int 3". That's why we get the effect of a "retargeted" jump.
Actually, I was concerned about a more complicated case, where another jump targets the middle of a previously instrumented bb. But according to the DR papers, DR treats such a "partial bb" as a completely new one, so it will just be re-instrumented and placed somewhere else in the code cache, right?
(I believe all this is obvious to DR experts, but I just wanted to articulate it to be sure I don't fall into a pitfall here.)
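
The explanation proposed above can be sketched as a toy code cache keyed by the application address at which a block starts. This models only the reasoning in this message, not DR's implementation: `ToyInstr`, `kProgram`, `g_cache`, `build_fragment`, and `lookup` are hypothetical names, every control transfer is assumed to end a block, and the "instrumentation" simply places a `(meta)` marker before each mov.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct ToyInstr {
    std::string text;
    bool is_cti;
};

// The first example from this thread; the vector index stands in for the
// application address, so l1 is address 1.
const std::vector<ToyInstr> kProgram = {
    {"xor %rax, %rax", false}, // address 0
    {"mov %rax, %rbx", false}, // address 1 (l1)
    {"jmp l1", true},          // address 2
};

using Fragment = std::vector<std::string>;

// Toy code cache keyed by the application address where a block starts.
std::map<std::size_t, Fragment> g_cache;

// Build and "instrument" the block starting at `tag`: copy instructions up
// to the first control transfer, placing a meta marker before each mov.
Fragment build_fragment(std::size_t tag) {
    Fragment frag;
    for (std::size_t pc = tag; pc < kProgram.size(); ++pc) {
        if (kProgram[pc].text.rfind("mov", 0) == 0)
            frag.push_back("(meta)");
        frag.push_back(kProgram[pc].text);
        if (kProgram[pc].is_cti)
            break;
    }
    return frag;
}

// Look up the fragment for `tag`, building it on first use. A tag in the
// middle of an existing fragment still gets its own fresh fragment.
const Fragment &lookup(std::size_t tag) {
    auto it = g_cache.find(tag);
    if (it == g_cache.end())
        it = g_cache.emplace(tag, build_fragment(tag)).first;
    return it->second;
}
```

In this model the jump to l1 is resolved by looking up tag 1, which yields a separately built fragment that begins with the pre-inserted marker. The jump appears "retargeted" only because the block it enters starts with that marker.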
 
 