Hello Derek,
The logs told me nothing more than I already knew, the BB was not executing even though it was logged being instrumented. It is worth noting that this was the first BB after a cache flush. Not sure if that causes any issues.
Anyway, I refactored my code a bit, and managed to get the BB to actually work, with a different problem now:
The original BB Start PC was: 0x00007FFFF7A97D4D
The inserted BB translation PC is: 0x00007FFFF7A97D4B (which is the same for all instructions, including the syscall). This PC + the 2 bytes of the syscall should yield the original PC above, so fall through should be fine.
What I get, however, is the following, the syscall gets invoked many times in a loop, then the loop suddenly breaks and the original BB starts doing its business.
Here's a representative part of the log:
Instrumentation
before instrumentation:
TAG 0x00007ffff7a97d4d
+0 L3 @0x00007fff345c0d98 31 ed xor ebp, ebp
+2 L3 @0x00007fff345c0e60 58 pop rax
+3 L3 @0x00007fff345c0f58 5f pop rdi
+4 L3 @0x00007fff345c1020 ff d0 call rax
END 0x00007ffff7a97d4d
after instrumentation:
TAG 0x00007ffff7a97d4d
+0 L4 @0x00007fff345c1150 48 a3 18 9b 1f b5 ff mov qword ptr [0x00007fffb51f9b18], rax
7f 00 00
+10 L4 @0x00007fff345c11d0 48 89 3d 19 12 d4 80 mov qword ptr [0x00007fffb51f9b20], rdi
+17 L4 @0x00007fff345c1250 48 89 35 21 12 d4 80 mov qword ptr [0x00007fffb51f9b28], rsi
+24 L4 @0x00007fff345c12d0 48 b8 9e 00 00 00 00 mov rax, 0x000000000000009e
00 00 00
+34 L4 @0x00007fff345c1350 48 bf 03 10 00 00 00 mov rdi, 0x0000000000001003
00 00 00
+44 L4 @0x00007fff345c13d0 48 be 78 9c 1f b5 ff mov rsi, 0x00007fffb51f9c78
7f 00 00
+54 L3 @0x00007fff345c10e8 0f 05 syscall
END 0x00007ffff7a97d4d
Fragment 3792, tag 0x00007ffff7a97d4d, flags 0x1801018, shared, tracehead, size 59, must end trace:
0x00007fffb51ec04c 48 a3 18 9b 1f b5 ff mov qword ptr [0x00007fffb51f9b18], rax
7f 00 00
0x00007fffb51ec056 48 89 3d c3 da 00 00 mov <rel> qword ptr [0x00007fffb51f9b20], rdi
0x00007fffb51ec05d 48 89 35 c4 da 00 00 mov <rel> qword ptr [0x00007fffb51f9b28], rsi
0x00007fffb51ec064 48 b8 9e 00 00 00 00 mov rax, 0x000000000000009e
00 00 00
0x00007fffb51ec06e 48 bf 03 10 00 00 00 mov rdi, 0x0000000000001003
00 00 00
0x00007fffb51ec078 48 be 78 9c 1f b5 ff mov rsi, 0x00007fffb51f9c78
7f 00 00
0x00007fffb51ec082 e9 bc 5b d3 ff jmp 0x00007fffb4f21c43
-------- exit stub 0: -------- <target: 0x00007ffff7a97d4d> type: fall-through/speculated/IAT
0x00007fffb4f21c43 67 65 48 a3 00 00 00 mov qword ptr [gs:0x00], rax
00
0x00007fffb4f21c4b 48 b8 00 80 26 34 ff mov rax, 0x00007fff34268000
7f 00 00
0x00007fffb4f21c55 e9 a6 e1 2d ff jmp $0x00007fffb41ffe00 <fcache_return>
which so far looks OK, just the original BB giving place for the inserted one. It even shows the correct fall through address
Loop
Entry into F3792(0x00007ffff7a97d4d).0x00007fffb51ec04c (trace head)(shared)
fcache_enter = 0x00007fffb41ffd00, target = 0x00007fffb51ec04c
Exit from F3792(0x00007ffff7a97d4d).0x00007fffb51ec082 (shared)
(block ends with syscall)
Entry into do_syscall to execute a non-ignorable system call
system call 158
fcache_enter = 0x00007fffb41ffd00, target = 0x00007fffb4200c40
Exit from system call
post syscall: sysnum=0x000000000000009e, result=0x0000000000000000 (0)
thread 49300 segment change => app lib tls base: 0x000000abbba01640, alt tls base: 0x0000000000000000
finished handling system call
d_r_dispatch: target = 0x00007ffff7a97d4d
priv_mcontext_t @0x00007fff348d8bc0
SKIPPED for size
Entry into F3792(0x00007ffff7a97d4d).0x00007fffb51ec04c (trace head)(shared)
This is the bad part, it keeps going on for a while. And every time it tries to fall through to 0x00007FFFF7A97D4D it takes the inserted BB (which starts at 0x00007FFFF7A97D4B) rather than invoke my BB instrumentation for the original.
Break
Entry into F3792(0x00007ffff7a97d4d).0x00007fffb51ec04c (trace head)(shared)
fcache_enter = 0x00007fffb41ffd00, target = 0x00007fffb51ec04c
Exit from F3792(0x00007ffff7a97d4d).0x00007fffb51ec082 (shared)
(block ends with syscall)
Entry into do_syscall to execute a non-ignorable system call
system call 158
fcache_enter = 0x00007fffb41ffd00, target = 0x00007fffb4200c40
Exit from system call
post syscall: sysnum=0x000000000000009e, result=0x0000000000000000 (0)
thread 49300 segment change => app lib tls base: 0x000000abbba01640, alt tls base: 0x0000000000000000
finished handling system call
d_r_dispatch: target = 0x00007ffff7a97d4d
Going to start trace with F3792 (tag 0x00007ffff7a97d4d)
Creating private copy of F3792 (0x00007ffff7a97d4d) for trace creation
check_thread_vm_area: pc = 0x00007ffff7a97d4d
new vm area for thread: 0x00007ffff79a6000-0x00007ffff7b11000 ---- libc.so.6
checking thread vmareas against executable_areas
prepend_entry_to_fraglist: putting fragment @0x00007ffff7a97d4d (private) on vmarea 0x00007ffff79a6000-0x00007ffff7b11000
check_thread_vm_area: check_stop = 0x00007ffff7b11000
I think the line in red is what triggers it to break the loop. Of course, I could be wrong.
Solution
So, based on this. Is there a way to insert the new BB with a different tag? One that starts at 0x00007FFFF7A97D4B instead of 0x00007FFFF7A97D4D? Is there any other way to resolve this conflict?
Technically, this does not affect or break the application, but it does create may be a 100 system calls instead of 1, which is obviously not good would affect performance.