AddressSanitizer's allocator

Jonas Wagner

Sep 25, 2014, 10:45:10 AM

to address-...@googlegroups.com

Dear AddressSanitizer developers,

I'm thinking about ways to optimize the performance of ASan's allocator. There are a few benchmarks where a large fraction of the overhead comes from the allocator and the quarantine queue, rather than the checks themselves (e.g., gcc from SPEC2006).

When I looked at the allocator, I was surprised that it is implemented inside ASan's runtime library (or rather, in sanitizer_common). This is unlike other intercepted functions such as strcpy, which forward to the implementation from libc. What is the reason for this?

Would it be possible to implement asan_malloc as a decorator on top of libc malloc? Or on top of an existing implementation such as tcmalloc? This seems desirable to me because these are highly tuned. It might also simplify the sanitizer codebase.
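Here is a minimal C++ sketch of what such a decorator over libc malloc could look like. Everything in it is hypothetical: `WrappedMalloc`, `WrappedFree`, `RedzonesIntact`, the 16-byte redzone and the 0xfa marker byte. Real ASan records redzones in shadow memory rather than writing a pattern into them. The sketch also assumes malloc's alignment is at most 16 bytes, so that offsetting by the redzone keeps user pointers aligned. What it shows is that a wrapper cannot see its neighbours, so every chunk has to carry both a left and a right redzone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>

// Hypothetical parameters: 16-byte redzones filled with a marker byte.
// Real ASan records redzones in shadow memory instead.
constexpr std::size_t kRedzone = 16;
constexpr unsigned char kRedzoneByte = 0xfa;

// "Decorator" over libc malloc. The wrapper cannot control which chunk the
// underlying allocator places next to this one, so each chunk carries its
// own left AND right redzone.
void* WrappedMalloc(std::size_t size) {
  if (size > SIZE_MAX - 2 * kRedzone) return nullptr;  // size overflow
  auto* raw = static_cast<unsigned char*>(std::malloc(size + 2 * kRedzone));
  if (raw == nullptr) return nullptr;
  std::memset(raw, kRedzoneByte, kRedzone);                    // left redzone
  std::memset(raw + kRedzone + size, kRedzoneByte, kRedzone);  // right redzone
  return raw + kRedzone;  // user pointer
}

void WrappedFree(void* p) {
  if (p != nullptr) std::free(static_cast<unsigned char*>(p) - kRedzone);
}

// Returns false if either redzone around a `size`-byte chunk was overwritten.
// No header is stored, so the caller has to pass the size back in.
bool RedzonesIntact(const void* p, std::size_t size) {
  assert(p != nullptr);
  const auto* user = static_cast<const unsigned char*>(p);
  const unsigned char* left = user - kRedzone;
  const unsigned char* right = user + size;
  for (std::size_t i = 0; i < kRedzone; ++i) {
    if (left[i] != kRedzoneByte || right[i] != kRedzoneByte) return false;
  }
  return true;
}
```

In this scheme each chunk pays for two redzones. That per-chunk cost is what the later replies in this thread weigh against ASan's own allocator.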

I'm sure this case has been considered. What are the reasons for the current design?

Besides this question, I wonder if there are other ways of optimizing the allocator or the quarantine mechanism. If you can think of any (relatively) low-hanging fruit, I'd be motivated to give it a try.

Best,
Jonas

Dmitry Vyukov

Sep 25, 2014, 11:49:25 AM

to address-sanitizer

Hi Jonas,

When I measured performance of sanitizer allocator against tcmalloc on
a large real application that calls malloc a lot, sanitizer allocator
was 10% faster. You are free to do your own measurements. Maybe you
will discover some inefficiency in the allocator that you can fix.

One historical reason for using our own allocator is that we needed the
meta information that is not covered by shadow, so it can't be a
simple prefix/postfix of the memory block itself. We don't use the
meta info anymore.

Even if we switch to tcmalloc, we still need quarantine on top of it.
And quarantine kills cache locality. You can't remove that overhead simply
by switching to a different allocator.
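A minimal sketch of a byte-bounded FIFO quarantine shows where that cost comes from. The class name and interface are hypothetical. The sketch deliberately leaves out what ASan's real quarantine also does: poisoning the freed chunk, per-thread caches, and recycling in batches. It only models one rule: hold freed chunks until a byte budget is exceeded, then release the oldest.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <deque>
#include <utility>

// Hypothetical byte-bounded FIFO quarantine. Freed chunks are parked here
// instead of going straight back to the underlying allocator (plain free()
// in this sketch), so memory behind a dangling pointer is not reused soon.
class Quarantine {
 public:
  explicit Quarantine(std::size_t max_bytes) : max_bytes_(max_bytes) {}
  Quarantine(const Quarantine&) = delete;
  Quarantine& operator=(const Quarantine&) = delete;
  ~Quarantine() {
    for (auto& entry : queue_) std::free(entry.first);
  }

  // Called instead of free(). Returns how many old chunks were released.
  std::size_t Put(void* p, std::size_t size) {
    assert(p != nullptr);
    queue_.emplace_back(p, size);
    bytes_ += size;
    std::size_t released = 0;
    // Evict the oldest chunks until the total is back within the budget.
    while (bytes_ > max_bytes_ && !queue_.empty()) {
      auto [old_p, old_size] = queue_.front();
      queue_.pop_front();
      bytes_ -= old_size;
      std::free(old_p);
      ++released;
    }
    return released;
  }

  std::size_t bytes() const { return bytes_; }
  std::size_t chunks() const { return queue_.size(); }

 private:
  std::size_t max_bytes_;
  std::size_t bytes_ = 0;
  std::deque<std::pair<void*, std::size_t>> queue_;
};
```

A freed chunk only goes back to the underlying allocator after roughly `max_bytes` of later frees. As a result, new allocations tend to get memory that has long since left the cache, whatever allocator sits underneath. A budget of 0 behaves like `quarantine_size=0`.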

One of the low hanging fruits is disabling allocator stats for asan.
The stats are required for tsan, but asan keeps its own stats. So it's
possible to implement a NoOpStats class for asan, and parametrize the
allocator with that class. I don't know of any other low hanging
fruits.
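The NoOpStats idea is the usual policy-class pattern. The sketch below is hypothetical throughout (`Stat`, `CountingStats`, `NoOpStats`, `ToyAllocator`) and does not mirror sanitizer_common's actual stats interface. It also forwards to malloc just to stay short. It shows how a template parameter lets one tool pay for counters while another compiles them away.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical stat kinds (not sanitizer_common's real enum).
enum Stat { kStatAllocated, kStatFreed, kStatCount };

// Policy that really counts, for a tool that relies on allocator stats.
struct CountingStats {
  std::array<std::size_t, kStatCount> v{};
  void Add(Stat s, std::size_t n) { v[s] += n; }
  std::size_t Get(Stat s) const { return v[s]; }
};

// Policy for a tool that keeps its own stats: empty inline bodies, so the
// compiler can drop the bookkeeping from the hot path entirely.
struct NoOpStats {
  void Add(Stat, std::size_t) {}
  std::size_t Get(Stat) const { return 0; }
};

// Toy allocator parameterized by its stats policy. It forwards to malloc
// only to keep the sketch short.
template <class Stats>
class ToyAllocator {
 public:
  void* Allocate(std::size_t n) {
    void* p = std::malloc(n);
    if (p != nullptr) stats_.Add(kStatAllocated, n);
    return p;
  }
  void Deallocate(void* p, std::size_t n) {
    assert(p != nullptr);
    stats_.Add(kStatFreed, n);
    std::free(p);
  }
  const Stats& stats() const { return stats_; }

 private:
  Stats stats_;
};
```

`ToyAllocator<CountingStats>` would serve a tool like tsan. `ToyAllocator<NoOpStats>` has the same code path with the counter updates compiled out.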

Also note that the allocator must scale well for high core counts.

Konstantin Serebryany

Sep 25, 2014, 7:57:34 PM

to address-...@googlegroups.com

We never used out-of-chunk metadata in asan.
We did use it in tsan -- not any more.
We still use it in msan.

Another major feature of the allocator compared to wrapping the system malloc
is that asan's allocator ensures that the left redzone of a chunk is
always the right redzone of the preceding chunk.
You cannot achieve that with an external malloc without adding too many redzones.
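A small layout computation illustrates the shared-redzone property. This is a simplified, hypothetical model; `SlabLayout` and its fields are made up. Slots of one size class are packed back to back, each laid out as [left redzone | user data]. The real allocator also keeps a chunk header inside the left redzone and rounds sizes up, which this sketch ignores.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical slab layout for one size class: every slot is
// [left redzone | user data], and slots are packed back to back.
struct SlabLayout {
  std::size_t redzone;    // left-redzone bytes per slot
  std::size_t user_size;  // usable bytes per slot (the size class)

  std::size_t SlotSize() const { return redzone + user_size; }

  std::uintptr_t SlotBegin(std::uintptr_t base, std::size_t i) const {
    return base + i * SlotSize();
  }
  std::uintptr_t UserBegin(std::uintptr_t base, std::size_t i) const {
    return SlotBegin(base, i) + redzone;
  }
  std::uintptr_t UserEnd(std::uintptr_t base, std::size_t i) const {
    return UserBegin(base, i) + user_size;
  }
  // The bytes right after slot i's user data are exactly slot i+1's left
  // redzone, so each chunk needs one redzone instead of two.
  std::size_t RightRedzone(std::uintptr_t base, std::size_t i) const {
    return UserBegin(base, i + 1) - UserEnd(base, i);
  }
};
```

With this packing, n chunks need about n redzones, compared with the 2n that a malloc wrapper would need.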

>
> Even if we switch to tcmalloc, we still need quarantine on top of it.
> And quarantine kills cache locality. You can't remove that overhead simply
> by switching to a different allocator.
>
> One of the low hanging fruits is disabling allocator stats for asan.
> The stats are required for tsan, but asan keeps its own stats. So it's
> possible to implement a NoOpStats class for asan, and parametrize the
> allocator with that class. I don't know of any other low hanging
> fruits.

I would love to see good malloc benchmarks, multi-threaded and not, in
our code base... This will really help.

--kcc

>
> Also note that the allocator must scale well for high core counts.
>

Jonas Wagner

Sep 26, 2014, 4:30:32 AM

to address-...@googlegroups.com

Thanks a lot for all these replies! They really help me understand the design.

> When I measured performance of sanitizer allocator against tcmalloc on
> a large real application that calls malloc a lot, sanitizer allocator
> was 10% faster.

This is interesting! It probably means there are no large gains to get here…

My motivation comes from the gcc SPEC benchmark; here are a few numbers:

Raw ASan slowdown: 1.91x
removing checks (-mllvm -asan-instrument-*=0) reduces overhead by 39.7%
removing stack poisoning (-mllvm -asan-stack=0) increases overhead by 5.58% (?)
removing interception (replace_str=0:replace_intrin=0) reduces overhead by 1.03%
removing quarantine (quarantine_size=0) reduces overhead by 12.17%
removing heap poisoning (poison_heap=0) reduces overhead by 22.12%

Compared to other SPEC benchmarks, the fraction of overhead due to checks is very low for gcc. Much of the overhead seems to come from heap poisoning and from the quarantine queue, but about 30% of the overhead is hiding elsewhere… I suspect that the allocator might be to blame, because perf showed a high number of samples in the allocator.

(Note that these numbers might contain measurement inaccuracies, and that I have only looked at single factors, not combinations of them. All of these were measured with malloc_context_size=0.)

> Even if we switch to tcmalloc, we still need quarantine on top of it.
> And quarantine kills cache locality.

This makes sense :(

> Another major feature of the allocator compared to wrapping the system malloc
> is that asan's allocator ensures that the left redzone of a chunk is
> always the right redzone of the preceding chunk.

This is smart, and seems indeed impossible to do when wrapping the system malloc. How does this interact with adaptive redzone sizes? Could it happen that a large object is located right after a small one, and thus the left redzone of the large object is smaller than desired?

> One of the low hanging fruits is disabling allocator stats for asan.

That sounds possible.

Also, when looking at the code, I saw that the allocator initializes the newly allocated memory. Setting max_malloc_fill_size=0 in ASAN_OPTIONS saved 2% or so for the gcc SPEC benchmark. I guess the initialization is done to erase pointers and detect more cases of use-after-free?
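Fill-on-allocate can be sketched roughly as follows. The option names mirror ASan's `max_malloc_fill_size` and `malloc_fill_byte` flags. The default values (0x1000 bytes, byte 0xbe) are assumed to match ASan's defaults and should be checked against asan_flags.inc. `FillOptions` and `AllocateAndFill` are hypothetical, and the sketch allocates with plain malloc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Options modeled on ASan's max_malloc_fill_size / malloc_fill_byte flags;
// the default values here are assumptions.
struct FillOptions {
  std::size_t max_malloc_fill_size = 0x1000;
  unsigned char malloc_fill_byte = 0xbe;
};

// Hypothetical allocate-then-fill helper. Only the first
// max_malloc_fill_size bytes are written, which bounds the cost for large
// allocations; a limit of 0 skips the fill entirely.
void* AllocateAndFill(std::size_t size, const FillOptions& opts) {
  void* p = std::malloc(size);
  if (p == nullptr) return nullptr;
  std::size_t n = std::min(size, opts.max_malloc_fill_size);
  std::memset(p, opts.malloc_fill_byte, n);
  return p;
}
```

Setting the limit to 0 turns the memset into a no-op, which is consistent with the ~2% saving reported above. A value like 0xbebebebe showing up in a crash or a debugger hints that uninitialized heap memory was read.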

I’ll look a bit more into this, and try to come up with other optimization ideas. Additional suggestions are welcome!

Best,
Jonas

Alexander Potapenko

Sep 26, 2014, 4:43:49 AM

to address-...@googlegroups.com

On Fri, Sep 26, 2014 at 12:29 PM, Jonas Wagner <jonas....@epfl.ch> wrote:
> Thanks a lot for all these replies! They really help me understand the
> design.
>>
>> When I measured performance of sanitizer allocator against tcmalloc on
>> a large real application that calls malloc a lot, sanitizer allocator
>> was 10% faster.
>
> This is interesting! It probably means there are no large gains to get here…
>
> My motivation comes from the gcc SPEC benchmark; here are a few numbers:
>
> Raw ASan slowdown: 1.91x
> removing checks (-mllvm -asan-instrument-*=0) reduces overhead by 39.7%
> removing stack poisoning (-mllvm -asan-stack=0) increases overhead by 5.58% (?)
> removing interception (replace_str=0:replace_intrin=0) reduces overhead by 1.03%
> removing quarantine (quarantine_size=0) reduces overhead by 12.17%
> removing heap poisoning (poison_heap=0) reduces overhead by 22.12%
>
> Compared to other SPEC benchmarks, the fraction of overhead due to checks is
> very low for gcc. Much of the overhead seems to come from heap poisoning and
> from the quarantine queue, but about 30% of the overhead is hiding
> elsewhere… I suspect that the allocator might be to blame, because perf
> showed a high number of samples in the allocator.

Have you compared ASan perf stats to those with glibc allocator or tcmalloc?

> (Note that these numbers might contain measurement inaccuracies, and that I
> have only looked at single factors, not combinations of them. All of these
> were measured with malloc_context_size=0.)
>>
>> Even if we switch to tcmalloc, we still need quarantine on top of it.
>> And quarantine kills cache locality.
>
> This makes sense :(
>>
>> Another major feature of the allocator compared to wrapping the system malloc
>> is that asan's allocator ensures that the left redzone of a chunk is
>> always the right redzone of the preceding chunk.
>
> This is smart, and seems indeed impossible to do when wrapping the system
> malloc. How does this interact with adaptive redzone sizes? Could it happen
> that a large object is located right after a small one, and thus the left
> redzone of the large object is smaller than desired?

In the ASan allocator, objects belonging to different size classes cannot
be adjacent in memory.
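One way to see why neighbouring objects always share a size class is a region-per-class layout, where the class of any chunk is a function of its address alone. The constants and helpers below are hypothetical and assume a 64-bit address space: `kSpaceBeg`, `kRegionSize`, `ClassForAddr`, the `16 << c` class sizes and the redzone rule. The real sanitizer_common primary allocator uses a far more elaborate size-class map.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(std::uintptr_t) == 8, "sketch assumes a 64-bit address space");

// Hypothetical region-per-size-class layout.
constexpr std::uintptr_t kSpaceBeg = 0x600000000000ULL;
constexpr std::uintptr_t kRegionSize = std::uintptr_t{1} << 32;  // 4 GiB per class
constexpr std::size_t kNumClasses = 8;

// Toy size-class map: class c holds chunks of (16 << c) bytes.
constexpr std::size_t ClassSize(std::size_t c) { return std::size_t{16} << c; }

std::size_t ClassForSize(std::size_t size) {
  std::size_t c = 0;
  while (c < kNumClasses && ClassSize(c) < size) ++c;
  assert(c < kNumClasses && "size too large for this toy map");
  return c;
}

// Address of the index-th chunk of class c.
std::uintptr_t ChunkAddr(std::size_t c, std::size_t index) {
  return kSpaceBeg + c * kRegionSize + index * ClassSize(c);
}

// The class is recoverable from the address alone.
std::size_t ClassForAddr(std::uintptr_t addr) {
  return static_cast<std::size_t>((addr - kSpaceBeg) / kRegionSize);
}

// Toy adaptive-redzone rule: bigger classes get bigger redzones.
std::size_t RedzoneForClass(std::size_t c) {
  return ClassSize(c) <= 64 ? 16 : 32;
}
```

Two adjacent chunks lie in the same region, so they have the same class and therefore the same redzone size. A large object can never follow a small one with a too-small shared redzone.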

>>
>> One of the low hanging fruits is disabling allocator stats for asan.
>
> That sounds possible.
>
> Also, when looking at the code, I saw that the allocator initializes
> the newly allocated memory. Setting max_malloc_fill_size=0 in ASAN_OPTIONS
> saved 2% or so for the gcc SPEC benchmark. I guess the initialization is
> done to erase pointers and detect more cases of use-after-free?
>
> I’ll look a bit more into this, and try to come up with other optimization
> ideas. Additional suggestions are welcome!

A slightly unrelated thing that we're interested in is heap
randomization under ASan.
Right now a sequence of memory allocations with known sizes will
always get the same addresses under ASan.
A better idea is to randomize those addresses to make the heap layout
less deterministic.
We don't know exactly how the randomization should behave, but it
should probably be no weaker than that of glibc malloc.
> Best,
> Jonas

--
Alexander Potapenko
Software Engineer
Google Moscow

Jonas Wagner

Sep 26, 2014, 8:08:57 AM

to address-...@googlegroups.com

Hi,

> Compared to other SPEC benchmarks, the fraction of overhead due to checks is
> very low for gcc. Much of the overhead seems to come from heap poisoning and
> from the quarantine queue, but about 30% of the overhead is hiding
> elsewhere… I suspect that the allocator might be to blame, because perf
> showed a high number of samples in the allocator.

> Have you compared ASan perf stats to those with glibc allocator or tcmalloc?

That comparison was against glibc's allocator. I haven't tried tcmalloc. Also note that gcc is a single-threaded benchmark AFAIK.

I'd like to try other benchmarks and other allocators... do you have a good idea of what else I should try?

> In the ASan allocator, objects belonging to different size classes cannot
> be adjacent in memory.

OK, that makes sense. In this case the redzone optimizations should indeed be sound.

> A slightly unrelated thing that we're interested in is heap
> randomization under ASan.

I see. Not directly related to what I'm trying to achieve, but an interesting idea in general! I don't have many thoughts regarding that right now, except that deterministic replay is very important. Users should be able to set a seed manually, so they can reproduce an issue once it has occurred.
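Heap randomization and the seed requirement can be combined in a sketch like the one below. Each size class's free slots are shuffled once with an explicitly seeded PRNG. Allocation order then differs between seeds but is reproducible for a given seed. `RandomizedFreeList` is hypothetical and is not how ASan's allocator is structured.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical free list for one size class whose slot order is shuffled
// with an explicitly seeded PRNG: unpredictable across seeds, replayable
// for a fixed seed.
class RandomizedFreeList {
 public:
  RandomizedFreeList(std::size_t num_slots, std::uint64_t seed)
      : rng_(seed), slots_(num_slots) {
    std::iota(slots_.begin(), slots_.end(), std::size_t{0});
    std::shuffle(slots_.begin(), slots_.end(), rng_);
  }

  // Returns the index of the next free slot to hand out.
  std::size_t Pop() {
    assert(!slots_.empty());
    std::size_t s = slots_.back();
    slots_.pop_back();
    return s;
  }
  bool empty() const { return slots_.empty(); }

 private:
  std::mt19937_64 rng_;  // declared before slots_, so it is seeded first
  std::vector<std::size_t> slots_;
};
```

The C++ standard does not specify which permutation `std::shuffle` produces. The same seed therefore reproduces a layout only with the same standard library. A tool that needs cross-platform replay would have to implement the shuffle itself, for example Fisher-Yates over its own PRNG.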

Also, this might be very interesting for the stack, too. I think even a slightly randomized stack layout (through random redzone sizes or variable reordering) could make it harder to write exploits. I know ASan is a debugging tool more than a security hardening tool, but I'd still find this interesting.

Cheers,
Jonas

Konstantin Serebryany

Sep 26, 2014, 4:26:13 PM

to address-...@googlegroups.com

> I guess the initialization is done to erase pointers and detect more cases
> of use-after-free?

This is a poor man's detector of uninitialized data. Not strictly necessary.

>
> I’ll look a bit more into this, and try to come up with other optimization
> ideas. Additional suggestions are welcome!
>
> Best,
> Jonas
>

Yuri Gribov

Sep 26, 2014, 4:54:43 PM

to address-...@googlegroups.com

On Fri, Sep 26, 2014 at 4:08 PM, Jonas Wagner <jonas....@epfl.ch> wrote:
> Also, this might be very interesting for the stack, too. I think even a
> slightly randomized stack layout (through random redzone sizes or variable
> reordering) could make it harder to write exploits. I know ASan is a
> debugging tool more than a security hardening tool, but I'd still find this
> interesting.

AFAIK the KASan people experimented with something like this to get cheap
probabilistic UAR (use-after-return) detection.

-Y
