Hi

palmer@ moved PartitionAlloc to base/ in preparation for using PartitionAlloc for more objects in Chromium. Specifically, we want to use PartitionAlloc for V8 (*), Skia, CC, PDFium etc.

However (as some people have shown concern in the past), the current PartitionAlloc is optimized for single-threaded allocations using a spin lock. To move more objects in Chromium to PartitionAlloc, we'll need to improve the performance of multi-threaded allocations.

On this document (see the first comment), palmer@ is saying that he has an idea to replace the spin lock with futex, but Hannes is saying that we'll need a bit more work than introducing futex.

What is our plan here?

Also, what benchmarks should we use to confirm that PartitionAlloc is robust for multi-threaded cases? I think blink_perf.*, Speedometer etc. in the telemetry benchmarks are not useful since they are single-threaded.

(*) V8 means Zone allocators and other V8 objects currently allocated on malloc. It does not mean V8 objects allocated on GC-managed heaps.

--
Kentaro Hara, Tokyo, Japan
--
You received this message because you are subscribed to the Google Groups "Project TRIM" group.
To unsubscribe from this group and stop receiving emails from it, send an email to project-trim...@chromium.org.
To post to this group, send email to projec...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/project-trim/CABg10jy5F%3D3r73jWaRTiWyF4AgXUnseyRuvNhOS2vz%3DhDBUy8Q%40mail.gmail.com.
On Wed, Jan 11, 2017 at 5:12 AM Kentaro Hara <har...@chromium.org> wrote:
> On this document (see the first comment), palmer@ is saying that he has an idea to replace the spin lock with futex, but Hannes is saying that we'll need a bit more work than introducing futex.

IMHO the key here is not really futex vs. spinlock. My memory is that PA was not designed to be used under contention (hence the spinlock: once you assume low contention, you go for the simplest and lowest-latency option). What really makes multi-threaded scenarios go fast is either thread caching (but that comes with cost and complexity, see tcmalloc) or isolation of contention domains (read: ideally each thread should have its own partition).

A futex will just make it so that, in the case of contention, threads are descheduled, which avoids burning CPU at the cost of increased latency.

My fear here is about introducing contention in the first place. Futex vs. spinlock doesn't solve the problem that, in case of contention, some threads will be blocked on others.

While developing the heap profiler, I recall seeing rates of 400 K allocs (and frees) per second during page load and scrolling. I don't know/remember on which threads the contention originates, but if it is more than one thread we might want to be careful. (Tip: with some tweaks the heap profiler might be a good way to figure out which areas are subject to major contention; happy to start a separate thread on this.)
So, to summarize my comment above, I'm not against replacing the spinlock with a futex. I'm just not convinced that that's sufficient to be good.
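To make the trade-off above concrete, here is a minimal C++ sketch; it is not PartitionAlloc's actual code. `SpinLock` and `HammerSharedCounter` are hypothetical names invented for illustration, and `std::mutex` stands in for a futex-style blocking lock (on Linux with glibc it is typically built on futexes). The shared counter stands in for a per-partition critical section.

```cpp
#include <atomic>
#include <cstdint>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical minimal spin lock: a waiter keeps retrying on the CPU and is
// never put to sleep by the kernel while waiting for the lock.
class SpinLock {
 public:
  void lock() {
    while (flag_.test_and_set(std::memory_order_acquire)) {
      // Let other runnable threads go first, then retry immediately.
      std::this_thread::yield();
    }
  }
  void unlock() { flag_.clear(std::memory_order_release); }

 private:
  std::atomic_flag flag_ = ATOMIC_FLAG_INIT;
};

// Starts `threads` workers that each take the lock `iters` times to bump a
// shared counter. Works with any type that provides lock() and unlock().
template <typename Lock>
std::uint64_t HammerSharedCounter(int threads, int iters) {
  Lock lock;
  std::uint64_t counter = 0;
  std::vector<std::thread> workers;
  for (int t = 0; t < threads; ++t) {
    workers.emplace_back([&] {
      for (int i = 0; i < iters; ++i) {
        std::lock_guard<Lock> guard(lock);
        ++counter;  // serialized: only one thread is ever in here
      }
    });
  }
  for (auto& w : workers) w.join();
  return counter;
}
```

Calling `HammerSharedCounter<SpinLock>(4, 10000)` and `HammerSharedCounter<std::mutex>(4, 10000)` gives the same count either way. The lock type only changes how a waiting thread spends its time (spinning vs. sleeping), not the fact that it has to wait.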
> What is our plan here? Also what benchmarks should we use to confirm that PartitionAlloc is robust for multi-threaded cases? I think blink_perf.*, Speedometer etc in telemetry benchmarks are not useful since they are single-threaded.

I'd definitely use benchmarks that simulate page load and scrolling, for the reasons mentioned above. TTFMP and TTFI in system_health.common_desktop might be good choices. Likewise the smoothness/silk benchmarks, provided they are still up and running.
--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architecture-dev+unsub...@chromium.org.
To post to this group, send email to platform-architecture-dev@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/platform-architecture-dev/CA%2ByH71fK%2ByFLkJH6DWa9Pi8S%3Dc%2Bo9o4XHmD7TwhMEFoFr4kpSQ%40mail.gmail.com.
On Thu, Jan 12, 2017 at 8:38 PM, Primiano Tucci <prim...@chromium.org> wrote:
> My fear here is about introducing contention in the first place. futex vs spinlock doesn't solve the problem that, in case of contention, some threads will be blocked on others.
> (tip: with some tweaks the heap profiler might be a good way to figure out which are the areas subjects to major contention, happy to start a separate thread on this)

Ah, this is a very good point.

Another concern would be CPU cache performance. If we share one partition among multiple threads, it might pollute their caches.

Would it be crazy (i.e., waste too much memory) to allocate one partition per thread? Then we could eliminate the lock completely. Oilpan is doing that.
> I'd definitely use benchmarks that simulate page load and scrolling for the reasons mentioned above. TTFMP and TTFI in system_health.common_desktop might be good choices. Likewise the smoothness/silk benchmarks, provided they are still up and running.

Thanks for the benchmarks. Yeah, it looks better to collect some numbers before speculating a lot.
--
Kentaro Hara, Tokyo, Japan
Hmm, I am missing something. The gInitializedLock you mentioned should not be a problem anyway, as it is used only to initialize partitions, which is quite a rare event (or am I misreading here?).

When we were talking about contention, I was thinking about the per-partition lock in PartitionRootGeneric (here).
Yeah, I agree with primiano's analysis: one partition per thread would not be a good idea in a browser process, where the task-to-thread mapping is dynamic.

On Fri, Jan 13, 2017 at 6:26 AM, Chris Palmer <pal...@chromium.org> wrote:
> On Thu, Jan 12, 2017 at 11:25 AM Primiano Tucci <prim...@chromium.org> wrote:
>> When we were talking about contention I was thinking about the per-partition lock in PartitionRootGeneric (here).
>
> Oh yeah, you're right there. Well, I guess what remains is getting a better lock than SpinLock, and measuring thread contention vs. partition overhead.