GetPreviousInstructionPc question

14 views
Skip to first unread message

evgeny777

unread,
Apr 20, 2017, 4:44:33 AM4/20/17
to address-sanitizer
I noticed that GetPreviousInstructionPc() function returns 'pc - 1' for both arm32 and arm64.
This causes odd addresses to appear in stack traces, which is nonsense, as both arm32/64 instructions 
have 4 byte size and alignment.

The x86 and x86_64 cases are even more confusing, because instruction length is not constant. What exactly this 'pc - 1' is expected to return?

But even if one is able to get previous instruction address correctly he may still get confusing results. In case some instruction triggers
hardware exception, its address will go to ASAN stack trace (via SlowUnwindStackWithContext). Returning address of previous instruction
in such case can be extremely confusing.

Is there any point in using this function?

Dmitry Vyukov

unread,
Apr 20, 2017, 5:01:25 AM4/20/17
to address-sanitizer
Hi,

Yes, there is a very bold point in using this function.
Typically top frame PC is obtained with __builtin_return_address,
which means that it points to the next instruction after the call. And
we need to obtain debug info associated with the call instruction. To
achieve that we subtract 1 from PC. All symbolization code that we've
seen is fine with PC pointing into a middle of an instruction.

Now, if we print pc-1 in reports (do we?), then it's a bug. We need to
print unaltered PC in reports.

Re hardware exceptions. This needs to be fixed. A trivial change would
be to add 1 to PCs pointing to faulting instruction. Then
GetPreviousInstructionPc will offset this and we get correct debug
info. However, then we will print incorrect PC in report. So a proper
fix would be to augment all stack traces with a flag saying if top PC
needs to be adjusted during symbolization or not.

Yuri Gribov

unread,
Apr 20, 2017, 5:11:38 AM4/20/17
to address-...@googlegroups.com
On Thu, Apr 20, 2017 at 9:44 AM, evgeny777 <evgeny....@gmail.com> wrote:
> I noticed that GetPreviousInstructionPc() function returns 'pc - 1' for both
> arm32 and arm64.
> This causes odd addresses to appear in stack traces, which is nonsense, as
> both arm32/64 instructions
> have 4 byte size and alignment.

Thumb instructions may have 2 byte size. Parsing instruction to
determine size may not justify complexity.

-I

evgeny777

unread,
Apr 20, 2017, 5:11:48 AM4/20/17
to address-sanitizer
Thanks for clarifying it, Dmitry.

Here is piece of report I get:

==18244==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x60200000001a at pc 0x0000005a9cad bp 0x7ffc10528760 sp 0x7ffc10528740
WRITE of size 1 at 0x60200000001a thread T0
    #0 0x5a9cac  (/home/evgeny/work/linker_scripts/asan/asan+0x5a9cac)
    #1 0x7f310488082f  (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #2 0x419498  (/home/evgeny/work/linker_scripts/asan/asan+0x419498)

....

Below is the piece of disassembly of main :

    ..... 
    0x5a9ca8 <+136>: callq  0x56d9d0                  ; ::__asan_report_store1(__sanitizer::uptr) at asan_rtl.cc:136
    0x5a9cad <+141>: xorl   %eax, %eax
    .....

As you may noticed 0x5a9cac == (0x5a9cad - 1)

Dmitry Vyukov

unread,
Apr 20, 2017, 5:20:35 AM4/20/17
to address-sanitizer
On Thu, Apr 20, 2017 at 11:11 AM, evgeny777 <evgeny....@gmail.com> wrote:
> Thanks for clarifying it, Dmitry.
>
> Here is piece of report I get:
>
> ==18244==ERROR: AddressSanitizer: heap-buffer-overflow on address
> 0x60200000001a at pc 0x0000005a9cad bp 0x7ffc10528760 sp 0x7ffc10528740
> WRITE of size 1 at 0x60200000001a thread T0
> #0 0x5a9cac (/home/evgeny/work/linker_scripts/asan/asan+0x5a9cac)
> #1 0x7f310488082f (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
> #2 0x419498 (/home/evgeny/work/linker_scripts/asan/asan+0x419498)
>
> ....
>
> Below is the piece of disassembly of main :
>
> .....
> 0x5a9ca8 <+136>: callq 0x56d9d0 ;
> ::__asan_report_store1(__sanitizer::uptr) at asan_rtl.cc:136
> 0x5a9cad <+141>: xorl %eax, %eax
> .....
>
> As you may noticed 0x5a9cac == (0x5a9cad - 1)


I think tsan prints unmodified PC and we should do the same in asan.
This also reliefs us from figuring out correct instruction length on
ARM/thumb/etc as nobody sees the modified PC.
> --
> You received this message because you are subscribed to the Google Groups
> "address-sanitizer" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to address-saniti...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Yuri Gribov

unread,
Apr 20, 2017, 5:51:00 AM4/20/17
to address-...@googlegroups.com
On Thu, Apr 20, 2017 at 10:20 AM, 'Dmitry Vyukov' via
address-sanitizer <address-...@googlegroups.com> wrote:
> On Thu, Apr 20, 2017 at 11:11 AM, evgeny777 <evgeny....@gmail.com> wrote:
>> Thanks for clarifying it, Dmitry.
>>
>> Here is piece of report I get:
>>
>> ==18244==ERROR: AddressSanitizer: heap-buffer-overflow on address
>> 0x60200000001a at pc 0x0000005a9cad bp 0x7ffc10528760 sp 0x7ffc10528740
>> WRITE of size 1 at 0x60200000001a thread T0
>> #0 0x5a9cac (/home/evgeny/work/linker_scripts/asan/asan+0x5a9cac)
>> #1 0x7f310488082f (/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
>> #2 0x419498 (/home/evgeny/work/linker_scripts/asan/asan+0x419498)
>>
>> ....
>>
>> Below is the piece of disassembly of main :
>>
>> .....
>> 0x5a9ca8 <+136>: callq 0x56d9d0 ;
>> ::__asan_report_store1(__sanitizer::uptr) at asan_rtl.cc:136
>> 0x5a9cad <+141>: xorl %eax, %eax
>> .....
>>
>> As you may noticed 0x5a9cac == (0x5a9cad - 1)
>
>
> I think tsan prints unmodified PC and we should do the same in asan.
> This also reliefs us from figuring out correct instruction length on
> ARM/thumb/etc as nobody sees the modified PC.

Hm, the unmodified PC will make symbolized stacktraces less readable.
What's the problem with "-1"? Addr2line and other bintools work fine
with it.

Dmitry Vyukov

unread,
Apr 20, 2017, 5:54:20 AM4/20/17
to address-sanitizer
I literally mean "print unmodified PC" (as a hex value). I am not
proposing to change how symbolization works.

Yuri Gribov

unread,
Apr 20, 2017, 6:01:00 AM4/20/17
to address-...@googlegroups.com
On Thu, Apr 20, 2017 at 10:53 AM, 'Dmitry Vyukov' via
My understanding is that symbolization code symbolizes "pc-1" as well.
If we keep symbolization code unchanged and only change printed hex
value, this will probly cause offline symbolizer to behave differently
from internal symbolizer.

Actually the situation is even more interesting: on ARM we do pc &
(~1) (not pc-1) to cancel out Thumb bit.

Eugene Leviant

unread,
Apr 20, 2017, 6:37:26 AM4/20/17
to address-...@googlegroups.com
Yuri, I think on ARM we also subtract 1 from pc:

#if defined(__arm__)
// Cancel Thumb bit.
pc = pc & (~1);
#endif
#if defined(__powerpc__) || defined(__powerpc64__)
// ...
#elif defined(__sparc__) || defined(__mips__)
// ...
#else
return pc - 1; // Called for ARM as well as for Thumb and aarch64
#endif
> You received this message because you are subscribed to a topic in the Google Groups "address-sanitizer" group.
> To unsubscribe from this topic, visit https://groups.google.com/d/topic/address-sanitizer/n-w--JmHIyU/unsubscribe.
> To unsubscribe from this group and all its topics, send an email to address-saniti...@googlegroups.com.

Dmitry Vyukov

unread,
Apr 20, 2017, 7:02:57 AM4/20/17
to address-sanitizer
Okay, I don't think this is worth any more of our time.
If there is a spot improvement to GetPreviousInstructionPc for arm,
that will be fine to do.
Reply all
Reply to author
Forward
0 new messages