Thanks a lot for all these replies! They really help me understand the design.
> When I measured performance of sanitizer allocator against tcmalloc on
> a large real application that calls malloc a lot, sanitizer allocator
> was 10% faster.
This is interesting! It probably means there are no large gains to get here…
My motivation comes from the gcc SPEC benchmark; here are a few numbers:
Raw ASan slowdown: 1.91x
removing checks (-mllvm -asan-instrument-*=0) reduces overhead by 39.7%
removing stack poisoning (-mllvm -asan-stack=0) increases overhead by 5.58% (?)
removing interception (replace_str=0:replace_intrin=0) reduces overhead by 1.03%
removing quarantine (quarantine_size=0) reduces overhead by 12.17%
removing heap poisoning (poison_heap=0) reduces overhead by 22.12%
Compared to other SPEC benchmarks, the fraction of overhead due to checks is very low for gcc. Much of the overhead seems to come from heap poisoning and from the quarantine queue, but some 30% of the overhead hides elsewhere… I suspect that the allocator might be to blame. This suspicion arose because perf showed a high number of samples in the allocator.
(Note that these numbers might contain measurement inaccuracies, and that I have only looked at single factors, not combinations of them. All of these were measured with malloc_context_size=0.)
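The "some 30%" figure can be reproduced from the numbers above. A minimal sketch, assuming each percentage is a share of the total ASan overhead and that the single-factor measurements simply add up (which, as noted, was not verified); the names `kOverheadShare` and `unexplained_overhead_percent` are made up for this illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Share of the total ASan overhead (in percent) attributed to each
 * single-factor experiment. Positive: disabling the feature reduced the
 * overhead. Negative: disabling it increased the overhead (stack case). */
static const double kOverheadShare[] = {
    39.70, /* checks */
    -5.58, /* stack poisoning */
    1.03,  /* interception */
    12.17, /* quarantine */
    22.12, /* heap poisoning */
};

/* Returns the share of the overhead (in percent) that no measured factor
 * explains, ASSUMING the individual shares are additive. */
double unexplained_overhead_percent(void) {
    size_t n = sizeof(kOverheadShare) / sizeof(kOverheadShare[0]);
    double explained = 0.0;
    assert(n > 0);
    for (size_t i = 0; i < n; ++i) {
        explained += kOverheadShare[i];
    }
    return 100.0 - explained;
}
```

Under these assumptions the measured factors explain 69.44% of the overhead, which leaves 30.56% unaccounted for.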
> Even if we switch to tcmalloc, we still need quarantine on top of it.
> And quarantine kills cache locality.
This makes sense :(
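To make the locality argument concrete, here is a toy byte-bounded FIFO quarantine. It is only a sketch of the general idea, not the sanitizer runtime's code; `ToyQuarantine`, `QUARANTINE_SLOTS`, and the function names are hypothetical. Freed chunks become reusable only once newer frees push the queue over its byte budget, so the allocator keeps handing out colder memory instead of the chunk that was just freed.

```c
#include <assert.h>
#include <stddef.h>

#define QUARANTINE_SLOTS 64 /* hypothetical fixed capacity for this sketch */

typedef struct {
    void  *ptr[QUARANTINE_SLOTS];  /* quarantined chunks, oldest at head */
    size_t size[QUARANTINE_SLOTS]; /* size of each quarantined chunk */
    size_t head, count;            /* ring buffer state */
    size_t bytes, max_bytes;       /* current and allowed quarantined bytes */
} ToyQuarantine;

void toy_quarantine_init(ToyQuarantine *q, size_t max_bytes) {
    q->head = q->count = q->bytes = 0;
    q->max_bytes = max_bytes;
}

/* Records a freed chunk instead of releasing it. Returns 0 if the ring is
 * full (the caller should evict first), 1 on success. */
int toy_quarantine_put(ToyQuarantine *q, void *p, size_t size) {
    if (q->count == QUARANTINE_SLOTS) return 0;
    size_t tail = (q->head + q->count) % QUARANTINE_SLOTS;
    q->ptr[tail] = p;
    q->size[tail] = size;
    q->count++;
    q->bytes += size;
    return 1;
}

/* If the quarantine is over budget, removes and returns the OLDEST chunk,
 * which may now really be reused. Returns NULL when within budget.
 * Callers loop until NULL to drain back under the budget. */
void *toy_quarantine_evict(ToyQuarantine *q) {
    if (q->count == 0 || q->bytes <= q->max_bytes) return NULL;
    void *p = q->ptr[q->head];
    q->bytes -= q->size[q->head];
    q->head = (q->head + 1) % QUARANTINE_SLOTS;
    q->count--;
    return p;
}
```

With a budget of 0 (the analogue of quarantine_size=0), every chunk is evicted right after it is put in, so memory can be reused while it is still warm in the cache.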
> Another major feature of the allocator compared to wrapping the system malloc
> is that asan's allocator ensures that the left redzone of a chunk is
> always the right redzone of the preceding chunk.
This is smart, and seems indeed impossible to do when wrapping the system malloc. How does this interact with adaptive redzone sizes? Could it happen that a large object is located right after a small one, and thus the left redzone of the large object is smaller than desired?
> One of the low-hanging fruits is disabling allocator stats for asan.
That sounds possible.
Also, when looking at the code, I saw that the allocator initializes newly allocated memory. Setting max_malloc_fill_size=0 in ASAN_OPTIONS saved about 2% on the gcc SPEC benchmark. I guess the initialization is done to erase pointers and detect more cases of use-after-free?
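For repeated runs, the runtime flags can also be baked into the binary instead of being passed through the environment. A minimal sketch, assuming the `__asan_default_options` hook of the ASan runtime and using only flag names mentioned in this thread; which flags to combine is just an example, since the measurements above varied one factor at a time:

```c
/* ASan calls this hook at startup (if the program defines it) to obtain
 * default runtime flags. As far as I know, values given in the ASAN_OPTIONS
 * environment variable still take precedence over these defaults. */
const char *__asan_default_options(void) {
    /* Example combination only: no stack traces stored for malloc/free,
     * and no fill pattern written to newly allocated memory. */
    return "malloc_context_size=0:max_malloc_fill_size=0";
}
```

The function is an ordinary C function, so it is harmless when the program is built without -fsanitize=address; it is simply never called.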
I’ll look a bit more into this, and try to come up with other optimization ideas. Additional suggestions are welcome!
Best,
Jonas
> Compared to other SPEC benchmarks, the fraction of overhead due to checks is
> very low for gcc. Much of the overhead seems to come from heap poisoning and
> from the quarantine queue, but some 30% of the overhead hides
> elsewhere… I suspect that the allocator might be to blame. This suspicion
> arose because perf showed a high number of samples in the allocator.
Have you compared ASan perf stats to those with the glibc allocator or tcmalloc?
In the ASan allocator, objects belonging to different size classes cannot
be adjacent in memory.
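This is why a large object never ends up right behind a small one with a too-small redzone in between. A toy model of the idea follows; it is not the real allocator code, and `ToySizeClass`, its fields, and the function names are hypothetical. Each size class owns its own region, chunks sit at a fixed stride, and the user memory starts one redzone into the chunk, so the gap after an object doubles as the left redzone of its neighbour:

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    size_t region_base; /* start offset of this size class's own region */
    size_t chunk_size;  /* stride between chunks: redzone + max user size */
    size_t redzone;     /* redzone bytes at the start of every chunk */
} ToySizeClass;

/* Offset at which the user memory of chunk `index` begins. */
size_t toy_user_begin(const ToySizeClass *c, size_t index) {
    return c->region_base + index * c->chunk_size + c->redzone;
}

/* Bytes between the end of object `index` (user_size bytes long) and the
 * start of object `index + 1`. This gap is the right redzone of the first
 * object and, at the same time, the left redzone of the second one. */
size_t toy_gap_after(const ToySizeClass *c, size_t index, size_t user_size) {
    assert(user_size + c->redzone <= c->chunk_size);
    size_t user_end = toy_user_begin(c, index) + user_size;
    return toy_user_begin(c, index + 1) - user_end;
}
```

In this model the gap is never smaller than the redzone of the class, and because neighbours always share a size class, both sides of the gap want the same redzone size.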
A slightly unrelated thing that we're interested in is heap
randomization under ASan.