PartitionAlloc Moar Of The Things!

Chris Palmer

Nov 2, 2016, 4:20:13 PM

to platform-architecture-dev, projec...@chromium.org, Kentaro Hara, Justin Schuh

Hi all,

jschuh tasked me with seeing if we can use PartitionAlloc (PA) in more places throughout Chromium, not just in Blink. Additionally, we'd like to (try to) harden it further, repair some hardening regressions without regressing performance, and maybe even use PA everywhere instead of tcmalloc.

I wrote up a draft of a plan, which you can see and comment on here:

https://docs.google.com/document/d/16FROhiOc0eAg58Kn30mERT96I2eYOvTF8VDi6cwkvyc/edit#

I appreciate your feedback and clues!

Kentaro Hara

Nov 2, 2016, 10:32:58 PM

to Chris Palmer, platform-architecture-dev, Project TRIM, Justin Schuh

PartitionAlloc everywhere is my dream (even though we cannot replace memory allocators used by some Android vendors).

The plan looks good to me :)

--
You received this message because you are subscribed to the Google Groups "Project TRIM" group.
To unsubscribe from this group and stop receiving emails from it, send an email to project-trim+unsubscribe@chromium.org.
To post to this group, send email to projec...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/project-trim/CAOuvq205iEQ6oUD9PSCbCBask6rWubsW5jX2PTi1FRttMt_Q2g%40mail.gmail.com.

--

Kentaro Hara, Tokyo, Japan

Primiano Tucci

Nov 3, 2016, 2:16:55 PM

to Kentaro Hara, Chris Palmer, platform-architecture-dev, Project TRIM, Justin Schuh

I am generally excited by the proposal; I remember a lot of interesting results coming out of PartitionAlloc in Blink.

Here are some suggestions from my side, mostly to speed things up and avoid bummers:

- Definitely happy about the whole plan for PAlloc in /base (we have a base/allocator folder, hint hint).

- I think the right model is NOT to replace the default allocator, but to make PA something you can opt into on a per-class basis, similar to what happens in Blink with the WTF_MAKE_FAST_ALLOCATED macro. The reasons are:

1) The current state is: we replace the default allocator with tcmalloc only on Linux desktop, but we intercept allocations with the shim on all platforms (except iOS). Replacing the allocator has very subtle consequences, and today we have coverage for that only on Linux.

In the shim, where we intercept allocations, we do security checks like suicide-on-OOM. You might be tempted to use the shim to enable PA everywhere, the way we do today (with tcmalloc) only on Linux. I'd advise against that. The key thing here is that "intercepting" allocations is safer than "replacing" allocations. Replacing is dangerous.

I am sure that on Android (and I think the same happens on Mac) a few allocations still bypass the shim, for instance when something inside libc calls strdup(), or when a GL driver does allocations in its .so. Today we just accept that; in practice, we don't enforce the security checks in those few places. If you now replace the default allocator there, you start running into situations where you have two heaps: a 99% heap coming from your allocator and a 1% heap for the cases I mentioned above. Bad things will happen when a free() crosses heaps.

In other words, replacing the default allocator means committing yourself to a long tail of hard-to-debug bugs like this. I wouldn't go there :) Also, it would honestly be a huge stability risk on Android, where OEMs are known to tinker extensively with libc (so we cannot make any assumptions about what happens there; what happens in libc stays in libc).

2) It has been a long time since I looked into PA, but IIRC it is extremely optimized for single-threaded scenarios and just takes a spinlock in multi-threaded (MT) ones. It works great for Blink, but I am not sure it will work great *everywhere* in the browser process.

I think an opt-in model allows a more gradual migration that can be tested incrementally against the perf waterfall. If you just replace the default allocator, I am pretty sure some perf benchmarks will regress (I'm thinking of cc thread pools) while others will improve. And you don't want to be in a state where either everybody is in or everybody is out.

For this reason I'd be a bit cautious about that "I'd simply aim at replacing tcmalloc with PA."
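A toy sketch of the cross-heap hazard from point 1: two heaps that each remember only the blocks they handed out. `ToyHeap` is hypothetical, not Chromium code, and its ownership check exists only so the example can *detect* the mismatch; a real allocator trusts whatever pointer it is given and corrupts its own metadata instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <unordered_set>

// Hypothetical heap that tracks ownership so a cross-heap free is visible.
class ToyHeap {
 public:
  ToyHeap() = default;
  ToyHeap(const ToyHeap&) = delete;
  ToyHeap& operator=(const ToyHeap&) = delete;

  ~ToyHeap() {
    for (void* p : owned_) std::free(p);  // Release anything still live.
  }

  void* Alloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p != nullptr) owned_.insert(p);
    return p;
  }

  // Returns false instead of freeing when |p| belongs to another heap.
  // A real allocator has no such check: it would treat |p| as its own.
  bool Free(void* p) {
    auto it = owned_.find(p);
    if (it == owned_.end()) return false;
    owned_.erase(it);
    std::free(p);
    return true;
  }

  bool Owns(void* p) const { return owned_.count(p) != 0; }

 private:
  std::unordered_set<void*> owned_;
};
```

For example, if a block comes from the "1%" heap (say, strdup() inside libc) and is later handed to the "99%" heap's free(), `main_heap.Free(p)` returns `false` in this model; in a real process the equivalent call would silently corrupt one heap or the other.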

Chris Palmer

Nov 3, 2016, 8:32:46 PM

to Primiano Tucci, Kentaro Hara, platform-architecture-dev, Project TRIM, Justin Schuh

On Thu, Nov 3, 2016 at 11:16 AM, Primiano Tucci <prim...@chromium.org> wrote:

> - I think the right model is to NOT replace the default allocator but make PA something you can opt-in on a per-class basis similarly to what happens in blink with the WTF_MAKE_FAST_ALLOCATED macro. The reason of this is:

I agree. I don't think there really can be a one-size-fits-all allocator, especially given the necessary trade-offs related to concurrency.

> 2) It has been a long time since I looked into PA, but IIRC it is extremely optimized for single threaded scenarios, but just takes a spinlock on MT. It works great for blink but I am not sure will work great *everywhere* in the browser process.

Correct, and I agree.

But, if people have ideas where they'd like to use PA for either performance or safety reasons, please comment in the document. Thanks!
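To illustrate point 2 (one lock serializing allocations from many threads), here is a minimal sketch assuming a test-and-set spinlock built on `std::atomic<bool>`. `SpinLock`, `LockedPartition`, and `RunWorkers` are hypothetical and are not PartitionAlloc's implementation; the point is only that every allocating thread must acquire the same flag, so the others spin while one holds it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical test-and-set spinlock in the spirit of "just takes a
// spinlock on MT". Not PartitionAlloc's actual lock.
class SpinLock {
 public:
  void lock() {
    // Spin (burning CPU) until we flip |locked_| from false to true.
    while (locked_.exchange(true, std::memory_order_acquire)) {
    }
  }
  void unlock() { locked_.store(false, std::memory_order_release); }

 private:
  std::atomic<bool> locked_{false};
};

// Hypothetical partition whose shared state is guarded by one lock that
// every allocating thread must take.
class LockedPartition {
 public:
  void RecordAlloc() {
    lock_.lock();
    ++allocations_;  // Stands in for free-list manipulation.
    lock_.unlock();
  }
  std::size_t allocations() const { return allocations_; }

 private:
  SpinLock lock_;
  std::size_t allocations_ = 0;
};

// Starts |threads| workers that each perform |per_thread| allocations
// against the same partition, then returns the total count.
std::size_t RunWorkers(int threads, int per_thread) {
  LockedPartition partition;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&partition, per_thread] {
      for (int i = 0; i < per_thread; ++i) partition.RecordAlloc();
    });
  }
  for (std::thread& w : workers) w.join();
  return partition.allocations();
}
```

`RunWorkers(4, 10000)` returns 40000: the lock keeps the count correct, but the four threads take turns on a single flag, which is the kind of contention that could hurt heavily threaded code such as cc's thread pools.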

Kentaro Hara

Nov 4, 2016, 1:30:18 AM

to Chris Palmer, Primiano Tucci, platform-architecture-dev, Project TRIM, Justin Schuh

Though I want to aim at PartitionAlloc everywhere in the end :)

Even if PartitionAlloc does not give us a clear performance/memory win compared to tcmalloc, it's still a big win for the following reasons:

- PartitionAlloc is more secure.

- Removing tcmalloc reduces # of allocators in the system.

- PartitionAlloc provides detailed profiling info, including object types. Replacing tcmalloc with PartitionAlloc improves our profiling tool chain.

However, I do agree that we should proceed with the replacement incrementally by adding USING_FAST_MALLOC() macros to a certain set of classes. To get good performance from PartitionAlloc, it is very important that objects are classified into partitions properly. If you simply put all objects in one partition, you won't get good performance. For example, we confirmed that mixing content/ objects and Blink objects in one partition regresses performance. So I'd suggest introducing USING_FAST_MALLOC() macros to Skia, cc, V8, etc., and moving them to dedicated partitions incrementally.
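A rough sketch of how a per-class opt-in macro can route one class's `new`/`delete` to a dedicated partition, so that (say) Skia objects and cc objects never share slots. Everything here is hypothetical: `ToyPartition` just wraps `std::malloc` and counts live objects, and `USING_TOY_PARTITION` only mimics the general shape of macros like `USING_FAST_MALLOC()`; the real macros and PartitionAlloc's API differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical partition: wraps std::malloc and counts live allocations.
class ToyPartition {
 public:
  explicit ToyPartition(const char* name) : name_(name) {}

  void* Alloc(std::size_t size) {
    void* p = std::malloc(size);
    if (p == nullptr) throw std::bad_alloc();
    ++live_;
    return p;
  }

  void Free(void* p) {
    if (p == nullptr) return;
    --live_;
    std::free(p);
  }

  const char* name() const { return name_; }
  std::size_t live() const { return live_; }

 private:
  const char* name_;
  std::size_t live_ = 0;
};

// Hypothetical opt-in macro: class-specific operator new/delete that send
// this class's heap allocations to |partition| instead of global malloc.
#define USING_TOY_PARTITION(partition)                          \
 public:                                                        \
  static void* operator new(std::size_t size) {                 \
    return (partition).Alloc(size);                             \
  }                                                             \
  static void operator delete(void* p) { (partition).Free(p); } \
                                                                \
 private:

ToyPartition g_skia_partition("skia");  // Hypothetical partition names.
ToyPartition g_cc_partition("cc");

class ToySkiaPath {
  USING_TOY_PARTITION(g_skia_partition)

 public:
  int point_count = 0;
};

class ToyCcLayer {
  USING_TOY_PARTITION(g_cc_partition)

 public:
  int layer_id = 0;
};
```

Because opting in is a one-line change per class, it can be rolled out component by component and measured as it goes, which is the incremental path described in this thread.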

Elliott Sprehn

Nov 4, 2016, 2:35:23 AM

to Kentaro Hara, Primiano Tucci, Chris Palmer, Project TRIM, platform-architecture-dev, Justin Schuh

This sounds like a great plan, looking forward to having PA available in more places.

--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architecture-dev+unsub...@chromium.org.
To post to this group, send email to platform-architecture-dev@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/platform-architecture-dev/CABg10jwUqSE7eAfe0-7-F5%3DYFrZ2cczuGwXKFuqtD4VHDezGoQ%40mail.gmail.com.

Primiano Tucci

Nov 4, 2016, 9:38:03 AM

to Kentaro Hara, Chris Palmer, platform-architecture-dev, Project TRIM, Justin Schuh

On Fri, Nov 4, 2016 at 5:30 AM Kentaro Hara <har...@chromium.org> wrote:

> Though I want to aim at PartitionAlloc everywhere in the end :)

I will be very happy if we get to that state. I am just saying that I think the right way to get there is NOT to just replace the default malloc() with PA.

That is dangerous stability-wise, and it would force us to put everything into one partition, which is unlikely to "just work" perf-wise, as discussed above.

My feeling is that the right long-term solution will be to identify the right set of partitions, as happened in Blink, and to opt the right code into the right partition.

I know that this is more work than just replacing malloc. On the plus side, this approach is: 1) safer; 2) incremental, allowing us to gather data and reason on a case-by-case basis; and 3) distributable across teams. Once PA is in base and people can easily opt in just by adding a macro, driving the rest of the migration can become other people's responsibility.

> - Removing tcmalloc reduces # of allocators in the system.

I think that an acceptable solution could be getting to a state where things are either PAlloced or glibc-malloc-ed (without tcmalloc).

After the work on the shim, checks like suicide-on-OOM are no longer allocator-specific; they happen at the shim level.

Here's the question for the security experts: are there properties other than suicide-on-OOM and int32 wrapping (no >4GB allocations) that make the current tcmalloc secure?

In other words, if we get to a state where things are either PA-allocated or glibc-allocated, plus the two checks mentioned above, would that be a security regression? Or are the remaining tcmalloc benefits purely related to performance?

> Even if PartitionAlloc does not give us a clear performance/memory win compared to tcmalloc, it's still a big win for the following reasons:
>
> - PartitionAlloc is more secure.
>
> - PartitionAlloc provides a detailed profiling info including object types. Replacing tcmalloc with PartitionAlloc improves our profiling tool chain.
>
> However, I do agree that we should proceed with the replacement incrementally by adding USING_FAST_MALLOC() macros to a certain set of classes. To get good performance from PartitionAlloc, it is very important that objects are classified into partitions properly. if you simply put all objects in one partition, you won't be able to get good performance. For example, we confirmed that mixing content/ objects and Blink objects in one partition regresses performance. So I'd suggest introducing USING_FAST_MALLOC() macros to Skia, CC, V8 etc and move them to dedicated partitions incrementally.

Yup, these are all great arguments, which I fully agree with.

Kentaro Hara

Nov 4, 2016, 11:19:32 AM

to Primiano Tucci, Chris Palmer, platform-architecture-dev, Project TRIM, Justin Schuh

On Fri, Nov 4, 2016 at 10:30 PM, Primiano Tucci <prim...@chromium.org> wrote:

> I think that an acceptable solution could be getting to a state where things are either PAlloced or glibc-malloc-ed (without tcmalloc).

I'm a bit behind. Could you summarize why you prefer glibc malloc over tcmalloc? (I agree with your conclusion, but I want to correctly understand why the shim layer prefers glibc malloc.)

Bruce Dawson

Nov 4, 2016, 1:49:29 PM

to Kentaro Hara, Primiano Tucci, Chris Palmer, platform-architecture-dev, Project TRIM, Justin Schuh

Since PartitionAlloc has different buckets for different object types, my assumption is that it is less memory-efficient: the memory released by deleting a CFoo object can't be used for allocating a CBar object, even if the sizes match. Is this assumption correct?

If so, then we need to be aware of the memory/performance costs of moving to heavier usage of PartitionAlloc, costs that will be highly page-dependent.
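Bruce's concern can be shown with a toy model in which each partition keeps a private free list of same-size slots. `SlotPartition` is hypothetical and far simpler than PartitionAlloc (no pages, no buckets, slots identified by index); it only shows that a slot freed in one partition cannot satisfy an allocation in another, even when the sizes match.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical partition holding slots of a single size, with a free list
// that only this partition can see.
class SlotPartition {
 public:
  explicit SlotPartition(std::size_t slot_size) : slot_size_(slot_size) {}

  // Reuses a slot freed in *this* partition if there is one; otherwise
  // carves a new slot, i.e. reserves more memory.
  std::size_t Alloc() {
    if (!free_list_.empty()) {
      std::size_t slot = free_list_.back();
      free_list_.pop_back();
      return slot;
    }
    return slots_carved_++;
  }

  void Free(std::size_t slot) { free_list_.push_back(slot); }

  // Total bytes this partition has ever carved out.
  std::size_t bytes_reserved() const { return slots_carved_ * slot_size_; }

 private:
  std::size_t slot_size_;
  std::size_t slots_carved_ = 0;
  std::vector<std::size_t> free_list_;
};
```

With one shared 64-byte pool, freeing a CFoo and then allocating a CBar reserves 64 bytes in total; with separate CFoo and CBar partitions, the same sequence reserves 128 bytes.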

Chris Palmer

Nov 4, 2016, 6:21:22 PM

to Primiano Tucci, Kentaro Hara, platform-architecture-dev, Project TRIM, Justin Schuh

On Fri, Nov 4, 2016 at 6:30 AM, Primiano Tucci <prim...@chromium.org> wrote:

> Here's the question for the security experts: are there other properties other than suicide-on-OOM and int32 wrapping (no >4GB allocations) that make the current tcmalloc secure?
> In other words, if we get to a state where things are either PA-allocted or glibc-alloced + the two aforementioned checks above would that be a security regression? Or are the remaining tcmalloc benefit purely related to performance?

No, the security guarantees are more than just suicide-on-OOM and no-silly-sizes. (The size limit is (signed) INT_MAX, not 2**32 - 1, but whatever.) Those features are great, but are kind of the bare minimum from a safety perspective.
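The two baseline checks can be pictured as a thin interception layer in front of whatever allocator sits underneath (here plain `std::malloc`). `ShimMalloc`, `ShimFree`, and `kMaxAllocSize` are hypothetical names, not the API in base/allocator; the limit uses (signed) `INT_MAX` as described above, and returning `nullptr` for oversized requests while aborting on a real OOM is an assumption made for illustration.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical limit mirroring the "(signed) INT_MAX" rule.
constexpr std::size_t kMaxAllocSize = static_cast<std::size_t>(INT_MAX);

// Hypothetical shim entry point. It does not replace the allocator; it
// intercepts the call, applies checks, and forwards to std::malloc.
void* ShimMalloc(std::size_t size) {
  if (size > kMaxAllocSize) {
    // "No silly sizes": refuse before the request reaches the allocator.
    return nullptr;
  }
  void* p = std::malloc(size);
  if (p == nullptr && size != 0) {
    // Suicide-on-OOM: terminate rather than hand callers a null pointer
    // they might forget to check.
    std::fputs("Out of memory\n", stderr);
    std::abort();
  }
  return p;
}

void ShimFree(void* p) { std::free(p); }
```

Both checks are allocator-agnostic, which is why they can live in the shim; the mitigations listed next depend on how the allocator itself lays out memory.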

PA also employs other important security mechanisms:

1. Guard pages at the beginning and end of each "super page" (multiple OS pages, containing "slot spans", which contain "partition pages", which contain allocated objects).

(If you don't already know, a guard page is a page marked PROT_NONE via mprotect on POSIX, or PAGE_NOACCESS via VirtualProtectEx on Windows. When the attacker's buffer overflow exploit (or other attack; see below) crosses into a guard page, the kernel kills the process. Don't worry; it's an honorable death, and that renderer will feast in Valhalla.)

2. Guard pages between the heap metadata and the objects. Just as with buffer overflows into neighboring objects, this makes it harder for the attacker to corrupt heap metadata (which can lead to memory write primitives, double-frees, and other nice exploitation tools).

3. Objects of different types are likely to be of different sizes, and hence in different partitions. Consider an attacker trying to exploit a use-after-free (UAF) bug. They basically do this:

* Find a UAF bug in the HTML <foo> element's implementation

* Create a document containing a <foo>, and which does the things that cause the renderer to incorrectly free the <foo>

* In the document, create an ArrayBuffer in JS; with non-PA, it will likely/possibly get allocated where the <foo> 'was'

* Fill the ArrayBuffer with bytes that maybe look like a <foo> object, except with a terrible, evil vtable

* Cause the renderer to invoke a method on the <foo> that it thinks is still live

* Do a dance, because the renderer just followed an evil vtable entry to execute the attacker's evil code/ROP gadgets/etc.

With PA, the attack is more likely to fail, because ArrayBuffers (for example) are more likely to be allocated in a different partition than <foo> element objects. The result will probably be a crash that is outside the attacker's control, or maybe the UAF bug will remain hidden from us for a bit longer, since by luck the <foo> is still 'valid' as an image in memory. For now...

Now, even with the ArrayBuffer being in a different partition, the attacker might still try to allocate an ArrayBuffer, hoping that it gets put in a partition *near enough* to the partition containing the <foo>, and with a silly size, causing it to extend into the partition containing <foo>. (Or, equivalently, take advantage of a missing bounds check in the ArrayBuffer implementation.) But, the guard pages might still stop this attack from working, unless the attacker can do a write to the ArrayBuffer such that the write hops over the guard pages. Depending on the specifics, the attacker might or might not be able to craft a working exploit. Without PA, the attacker has a much easier time of things.

(The above is representative of real attacks we have seen, and which motivated the creation of PA. Now, obviously, we would much rather have true memory safety and true type safety — never forget that! — but in this performance-sensitive application we have to resort to tricks. PA is a pretty good trick, though. :) )
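Point 1 can be illustrated with plain POSIX calls. This is a minimal sketch assuming Linux or another POSIX system with `mmap`/`mprotect`; `GuardedRegion`, `MapWithGuardPages`, and `UnmapGuardedRegion` are hypothetical helpers, not PartitionAlloc's API. It maps a region, marks the first and last page `PROT_NONE`, and hands out only the middle.

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>

// A mapping with one inaccessible page on each side of the usable range.
struct GuardedRegion {
  char* base = nullptr;         // Start of the mapping (leading guard page).
  std::size_t total = 0;        // Whole mapping, guard pages included.
  char* usable = nullptr;       // First byte callers may touch.
  std::size_t usable_size = 0;  // Bytes between the two guard pages.
};

// Hypothetical helper: maps |pages| usable pages plus a PROT_NONE guard
// page before and after them. Returns false on failure.
bool MapWithGuardPages(std::size_t pages, GuardedRegion* out) {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  const std::size_t total = (pages + 2) * page;
  void* mem = mmap(nullptr, total, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return false;
  char* base = static_cast<char*>(mem);
  // Any read or write in these two pages now faults (SIGSEGV).
  if (mprotect(base, page, PROT_NONE) != 0 ||
      mprotect(base + total - page, page, PROT_NONE) != 0) {
    munmap(mem, total);
    return false;
  }
  out->base = base;
  out->total = total;
  out->usable = base + page;
  out->usable_size = pages * page;
  return true;
}

void UnmapGuardedRegion(const GuardedRegion& region) {
  munmap(region.base, region.total);
}
```

Writing `region.usable[region.usable_size]`, one byte past the end, would land on the trailing guard page and crash the process instead of silently overwriting whatever comes next.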

Chris Palmer

Nov 4, 2016, 6:35:53 PM

to Bruce Dawson, Kentaro Hara, Primiano Tucci, platform-architecture-dev, Project TRIM, Justin Schuh

On Fri, Nov 4, 2016 at 10:48 AM, Bruce Dawson <bruce...@chromium.org> wrote:

> Since PartitionAlloc has different buckets for different object types my assumption is that it is less memory efficient. The memory released by deleting a CFoo object can't be used for allocating a CBar object even if the sizes match. Is this assumption correct?

It depends. :) This is why people are saying we should incrementally, and with testing, make different classes use or not use PA. It's true that in the worst case, we allocate 1 whole super page for objects whose size = sizeof(CFoo) and then only ever use 1 such object. That would indicate that CFoo is not a good class to use PA for. But in Blink, we often allocate many (say) <div>s, or <svg>s, etc. So far PA has been both time- and space-efficient in Blink. (The lack of locking, and the locality provided by super pages holding many of your <div>s contiguously, improved latency.)

It might well turn out that (say) Skia or PDFium have similar properties to Blink, but that (say) Cronet does not.
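A back-of-the-envelope version of the worst case above. The 2 MiB super page size and the 64-byte object size are assumptions chosen for illustration, not measurements; `ObjectsPerSuperPage` and `Utilization` are hypothetical helpers that ignore metadata and guard pages.

```cpp
#include <cassert>
#include <cstddef>

// Assumed reservation size for one super page (illustrative parameter).
constexpr std::size_t kAssumedSuperPageBytes = 2 * 1024 * 1024;

// How many objects of |object_size| fit in one super page, ignoring
// metadata and guard pages.
constexpr std::size_t ObjectsPerSuperPage(std::size_t object_size) {
  return kAssumedSuperPageBytes / object_size;
}

// Fraction of one reserved super page occupied by |live_objects| objects.
double Utilization(std::size_t object_size, std::size_t live_objects) {
  return static_cast<double>(object_size * live_objects) /
         static_cast<double>(kAssumedSuperPageBytes);
}
```

Under these assumptions, a single live 64-byte CFoo occupies about 0.003% of the reservation, while a Blink-like class with tens of thousands of live instances fills it. That gap is why the decision has to be measured per class.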

Of course, there is a trade-off: If many classes are all of the same size (or the same size as rounded up to the nearest partition slot size), then we lose some of the exploit mitigation benefit I described in the previous email. If we found that CFoo, CBaz, and CQuux were all of the same partition slot size, and that CFoo's implementation was full of UAFs, and that CBaz is (like ArrayBuffer) very flexible and useful from an attacker's perspective, then we might decide that we should tune CBaz for security instead of for performance. We might do something to ensure it ends up in a different partition than the UAF-prone CFoo. Of course, the real fix is to fix all the UAFs in CFoo, and once we have done that, maybe we'd feel safe putting CBaz and CFoo in the same partition again.

If it happened that putting CBaz in its own partition was too space-inefficient because it is allocated rarely, we might also de-tune CQuux to put it in with CBaz, if and only if CQuux is (unlike CFoo) resilient against UAF. Then we'd still be wasting space, but less so.

That is all very measurable on a per-class basis, which PA enables, and I expect we'll be doing a whole lot of it. :)
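Whether CFoo and CBaz end up sharing slots depends on how object sizes are rounded to slot sizes within a partition. As a hypothetical model only (PartitionAlloc's real bucket table is more elaborate), assume sizes round up to a multiple of 16 bytes; the class sizes in the usage below are invented.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical bucketing rule: round up to a multiple of 16 bytes.
constexpr std::size_t kAssumedGranularity = 16;

constexpr std::size_t BucketSize(std::size_t object_size) {
  return (object_size + kAssumedGranularity - 1) / kAssumedGranularity *
         kAssumedGranularity;
}

// Two types in the same partition "collide" when a freed slot of one can
// be reused for the other, which is what a UAF exploit relies on.
constexpr bool SameBucket(std::size_t a, std::size_t b) {
  return BucketSize(a) == BucketSize(b);
}
```

With these made-up sizes, a 40-byte CFoo and a 48-byte CBaz both map to the 48-byte slot size, so a freed CFoo slot in that partition could be refilled with a CBaz; a 72-byte CQuux rounds to 80 and would not.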

Primiano Tucci

Nov 4, 2016, 7:02:39 PM

to Kentaro Hara, Chris Palmer, platform-architecture-dev, Project TRIM, Justin Schuh

Trying to reply in batch, snipping here and there.

On Fri, Nov 4, 2016 at 3:19 PM Kentaro Hara <har...@chromium.org> wrote:

> I'm a bit behind. Would you summarize why you prefer glibc-malloc than tcmalloc? (I agree with your conclusion but want to correctly understand why the shim layer prefers glibc-malloc.)

Simply maintenance cost and code size. In other words, my question was: if we get to a state where, say, 60-70% of the browser is switched to PA, do we feel we still have to maintain tcmalloc for the remaining 30% on Linux? Or can we just route that to the default allocator without major risks?

On Fri, Nov 4, 2016 at 5:49 PM Bruce Dawson <bruce...@chromium.org> wrote:

> Since PartitionAlloc has different buckets for different object types my assumption is that it is less memory efficient. The memory released by deleting a CFoo object can't be used for allocating a CBar object even if the sizes match. Is this assumption correct?

Correct. On the other hand, though, there is another aspect of tcmalloc: it makes quite extensive use of thread caches, which tend to be a non-negligible cost in a process that has 30+ threads.

The problem with tcmalloc is that once you free something into a thread-local cache, that memory won't be usable to serve an allocation from a different thread. So it really depends heavily on the allocation pattern.

I am not saying I expect PA to be better, but I don't necessarily expect it to be worse than tcmalloc either. However, your point becomes more of a concern on platforms like Android, where we use the system's default allocator, which has been quite tuned over the years to be memory-efficient.
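A toy model of the thread-cache effect described above. `ThreadCachedHeap` is hypothetical, tracks block counts rather than real memory, and deliberately omits the cache size limits and scavenging that real tcmalloc uses to move memory back to its central lists, so it exaggerates the effect: a block freed on thread 0 stays in thread 0's cache, and thread 1 must take a fresh block of the same size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model of per-thread caches for a single size class.
class ThreadCachedHeap {
 public:
  explicit ThreadCachedHeap(std::size_t thread_count)
      : caches_(thread_count, 0) {}

  // Serves from the calling thread's cache if possible; otherwise takes a
  // fresh block from the central heap.
  void Alloc(std::size_t thread) {
    if (caches_[thread] > 0) {
      --caches_[thread];
    } else {
      ++fresh_blocks_;
    }
  }

  // The freed block stays in the freeing thread's cache (no rebalancing).
  void Free(std::size_t thread) { ++caches_[thread]; }

  std::size_t fresh_blocks() const { return fresh_blocks_; }

  std::size_t cached_blocks() const {
    std::size_t total = 0;
    for (std::size_t n : caches_) total += n;
    return total;
  }

 private:
  std::vector<std::size_t> caches_;  // Free blocks held per thread.
  std::size_t fresh_blocks_ = 0;     // Blocks taken from the central heap.
};
```

After thread 0 allocates and frees one block and thread 1 then allocates one, the model has taken two fresh blocks while one sits idle in thread 0's cache; multiplied across 30+ threads, that stranded memory is the cost in question.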

On Fri, Nov 4, 2016 at 10:21 PM Chris Palmer <pal...@chromium.org> wrote:

> PA also employs other important security mechanisms:

Chris, thanks for the super thorough summary; I was partly aware of some of the PA security benefits. My question was about tcmalloc, though :)

Let me reword the question: say that 3 months from now we are in the state where 60% or more of the browser process has been switched to PA. What should we do with the remaining 40% on Linux?

A) do we feel we have to keep tcmalloc as well because it gives additional security properties compared to glibc (does tcmalloc have any of these features you mentioned about PA?)

B) do we feel keeping the default allocator + the aforementioned trivial checks is safe enough?

Essentially, I am *not* questioning whether there is a security benefit in switching to PA. I am questioning whether there is a security benefit in the current tcmalloc, and whether we should preserve it for the non-PA cases that will be left around.

Kentaro Hara

Nov 6, 2016, 9:36:23 PM

to Primiano Tucci, Chris Palmer, platform-architecture-dev, Project TRIM, Justin Schuh

> On Fri, Nov 4, 2016 at 5:49 PM Bruce Dawson <bruce...@chromium.org> wrote:
> Since PartitionAlloc has different buckets for different object types my assumption is that it is less memory efficient. The memory released by deleting a CFoo object can't be used for allocating a CBar object even if the sizes match. Is this assumption correct?
>
> Correct. On the other side, though, there is another aspect about tc-malloc: It makes quite extensive uses of thread caches, which tend to be a non-negligible cost in a process which has 30+ threads.
>
> The problem of tcmalloc is that once you free something that was hosted in a thread-local cache that memory won't be usable to serve an allocation from a different thread. So it really heavily depends on the allocation pattern.
>
> I am not saying I expect PA to be better. But I am not saying I necessarily expect to be worse than tcmalloc. However your becomes more an argument for platforms like Android, where we use the default allocator from the system, which has been quite tuned over the years to be memory efficient.

This would not necessarily be correct. There are cases where PA performs better than tcmalloc (in terms of memory consumption):

- tcmalloc adds a header to each object. PA doesn't have the memory overhead.

- tcmalloc uses a thread cache, which sometimes wastes memory.

- PA has a mechanism to decommit as many unused system pages as possible.

When tasak@ tried to replace tcmalloc with PA locally, the memory result was actually a mixed bag -- PA is better in some benchmarks but tcmalloc is better in other benchmarks.
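Decommitting gives the physical memory behind unused pages back to the OS while keeping the address range reserved. A minimal Linux-oriented sketch, assuming `mmap` and `madvise(MADV_DONTNEED)`; `DecommitPages` and `DemoDecommit` are hypothetical helpers, not PartitionAlloc's code (on Windows the analogous primitive is `VirtualFree` with `MEM_DECOMMIT`).

```cpp
#include <sys/mman.h>
#include <unistd.h>

#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper: drops the physical pages behind [addr, addr + len)
// but leaves the virtual range mapped. |addr| must be page-aligned.
bool DecommitPages(void* addr, std::size_t len) {
  return madvise(addr, len, MADV_DONTNEED) == 0;
}

// Maps 4 pages, dirties them, decommits the middle two, and checks that
// (on Linux) a decommitted private anonymous page reads back as zero while
// an untouched page keeps its contents.
bool DemoDecommit() {
  const std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  const std::size_t len = 4 * page;
  void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (mem == MAP_FAILED) return false;
  unsigned char* bytes = static_cast<unsigned char*>(mem);
  std::memset(bytes, 0xAB, len);  // Commit all four pages.
  bool ok = DecommitPages(bytes + page, 2 * page);
  ok = ok && bytes[page] == 0 && bytes[0] == 0xAB;
  munmap(mem, len);
  return ok;
}
```

The zero-fill check relies on Linux semantics for private anonymous mappings; other systems may treat `MADV_DONTNEED` as a hint and keep the old contents.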

Kentaro Hara

Nov 6, 2016, 9:44:03 PM

to Primiano Tucci, Chris Palmer, platform-architecture-dev, Project TRIM, Justin Schuh

Overall, I think that:

- Move PA to base/

- Use PA for more things in browser and renderer

- Deprecate tcmalloc

is a great direction.

Primiano Tucci

Nov 7, 2016, 11:33:57 AM

to Kentaro Hara, Chris Palmer, platform-architecture-dev, Project TRIM, Justin Schuh

On Mon, Nov 7, 2016 at 3:44 AM Kentaro Hara <har...@chromium.org> wrote:

> Overall, I think that:
>
> - Move PA to base/
> - Use PA for more things in browser and renderer
> - Deprecate tcmalloc
>
> is a great direction.

SGTM.

Modulo the open question of whether "deprecate tcmalloc" raises any security concerns (for the places that will not be covered by PA at the time tcmalloc is deprecated).

On the plus side, there are people on this thread who know about security. As long as they are fine with it, I'm happy.

Chris Palmer

Nov 7, 2016, 3:16:31 PM11/7/16

to Primiano Tucci, Kentaro Hara, platform-architecture-dev, Project TRIM, Justin Schuh

On Fri, Nov 4, 2016 at 4:02 PM, Primiano Tucci <prim...@chromium.org> wrote:

Chris, thanks for the super thorough summary; I was partly aware of some of the PA security benefits. My question was about TCMalloc, though :)

haha! I'd invoke the |FacePalm| method on |this|, but I am no longer certain the pointer is valid. :)

Let me reword the question: say that 3 months from now we are in the state where 60% or more of the browser process has been switched to PA. What should we do with the remaining 40% on Linux?
A) Do we feel we have to keep tcmalloc as well because it provides additional security properties compared to glibc? (Does tcmalloc have any of the features you mentioned for PA?)

It might. Historically, glibc (and other system libcs on open source OSs) have not paid much attention to exploit mitigation; that may have changed. The same may be true of parallelism: it's hard to beat TC, although jemalloc does claim good concurrency (see jemalloc.net). glibc's allocator is still derived from dlmalloc (via ptmalloc), and dlmalloc doesn't make any claims about latency under high concurrency (http://gee.cs.oswego.edu/dl/html/malloc.html).

I hear (in other threads) that TC is gaining new hardening features, so we might want to keep TC, or move to the latest upstream version of it, and use it as well as possible.

Those are all empirical questions that we can look into as a separate (fun) project when the time comes.

B) Do we feel that keeping the default allocator plus the aforementioned trivial checks is safe enough?

Essentially, I am *not* questioning whether there is a security benefit in the switch to PA. I am questioning whether there is a security benefit in the current tcmalloc, and whether we should preserve it for the non-PA cases that will remain.

I'm not sure yet. It's certainly the case that we have more control over TC (or at least it's a known quantity, for both security and performance) than over whatever malloc the platform happens to provide.

Primiano Tucci

Nov 7, 2016, 3:29:08 PM11/7/16

to Chris Palmer, Kentaro Hara, platform-architecture-dev, Project TRIM, Justin Schuh

I see, thanks a lot for the detailed and honest answers.

It seems there is no reason not to move PA into base and start using it in various places. Or, to put it in less doubly negative words, it sounds like a great plan. Thanks for sharing it :)

We'll see what happens with tcmalloc in the future, then. From a practical viewpoint, PA is already in Chrome, so I can't see how promoting it up to base could make anything worse (famous last words before the apocalypse :) )
