Allocation guideline for Blink


Michael Lippautz

Aug 25, 2022, 4:43:01 AM
to platform-architecture-dev
[bcc a few folks to not put anybody on the spot]

Hey p-a-d@,

Over the last few weeks the topic of where memory in Blink should be allocated came up a few times. I'd like to broaden this discussion. My goal is to avoid going in circles and the resulting technical and social friction as much as possible.

We do have a guideline, though it's a bit outdated and could probably use some love. Should we allow more or less freedom in our suggestions? I guess we could also explicitly mention things like concurrency (see below).

While some cases seem obvious, e.g. a ScriptWrappable that is directly connected to a JS object and vice versa, others are maybe not so clear: e.g. objects with well-defined lifetimes that could live on Oilpan but might equally well live outside the managed heap.

There are areas where Oilpan is maybe not a great fit, such as sharing memory across threads. There are plans to improve that, but nothing that is usable immediately.

The general trade-off space in this area is unfortunately complex. I can try to enumerate the trade-offs somewhere; while some things may hold generally (freedom from use-after-frees in a GC), others are really implementation specifics (most recently: pointer compression to save memory).

Any thoughts?

-Michael

Kentaro Hara

Aug 25, 2022, 4:57:41 AM
to Michael Lippautz, platform-architecture-dev
Thanks Michael for starting the thread!

We do have a guideline, though it's a bit outdated and could probably use some love.

It's a bit outdated, but overall I think the guideline is still valid. It says:

- Use Oilpan for all script-exposed objects (i.e., anything that derives from ScriptWrappable).
- Use Oilpan if managing the object's lifetime is usually simpler with Oilpan. But see the next bullet.
- If the allocation rate of the object is very high, that may put unnecessary strain on Oilpan's GC infrastructure as a whole. If so, and the object can be allocated outside Oilpan without creating cyclic references or complex lifetime handling, use PartitionAlloc instead. For example, we allocate Strings on PartitionAlloc.

I'm curious to understand the pain points caused by this guideline. Does anyone have concrete examples? :)

There are areas where Oilpan is maybe not a great fit, such as sharing memory across threads. There are plans to improve that, but nothing that is usable immediately.

Regarding sharing memory across threads: this is not an encouraged programming paradigm, because shared-memory programming in Blink has historically caused many security and stability issues (use-after-frees, threading races, and locks scattered here and there). It should be limited to well-designed and encapsulated components, regardless of whether we use Oilpan or not.


--
You received this message because you are subscribed to the Google Groups "platform-architecture-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to platform-architect...@chromium.org.
To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/platform-architecture-dev/CAH%2BmL5D6ds9qJpOmJcS%3D8dRBxfgrHfRY6MnVH%2BmTkcN_sjKmHg%40mail.gmail.com.


--
Kentaro Hara, Tokyo

Michael Lippautz

Aug 25, 2022, 6:20:23 AM
to Kentaro Hara, Michael Lippautz, platform-architecture-dev
On Thu, Aug 25, 2022 at 10:57 AM Kentaro Hara <har...@chromium.org> wrote:
Thanks Michael for starting the thread!

We do have a guideline, though it's a bit outdated and could probably use some love.

It's a bit outdated, but overall I think the guideline is still valid. It says:

- Use Oilpan for all script-exposed objects (i.e., anything that derives from ScriptWrappable).
- Use Oilpan if managing the object's lifetime is usually simpler with Oilpan. But see the next bullet.
- If the allocation rate of the object is very high, that may put unnecessary strain on Oilpan's GC infrastructure as a whole. If so, and the object can be allocated outside Oilpan without creating cyclic references or complex lifetime handling, use PartitionAlloc instead. For example, we allocate Strings on PartitionAlloc.

I'm curious to understand the pain points caused by this guideline. Does anyone have concrete examples? :)

I think the pain points are probably around "what's simpler with Oilpan" (concrete example: unique_ptr vs single Member) and what's "high allocation rate" that would affect the GC. (Leaving aside that we ignore security and debuggability completely here.)

The pain point gets worse because there are areas where Oilpan is actually superior to malloc with respect to performance: backing stores can be compacted, avoiding fragmentation entirely, _and_ they can still be freed immediately as needed. On top of that there's pointer compression now, which also saves memory.


There are areas where Oilpan is maybe not a great fit, such as sharing memory across threads. There are plans to improve that, but nothing that is usable immediately.

Regarding sharing memory across threads: this is not an encouraged programming paradigm, because shared-memory programming in Blink has historically caused many security and stability issues (use-after-frees, threading races, and locks scattered here and there). It should be limited to well-designed and encapsulated components, regardless of whether we use Oilpan or not.

I see that there's a need for concurrency and sometimes shared memory is unavoidable for performance reasons. Discouraging shared memory does not prevent new uses from showing up though :) (This is the situation we are in now.)

Maybe we should acknowledge the need for it and rather work on guidelines on where to contain it?

Daniel Cheng

Aug 25, 2022, 11:08:15 AM
to Michael Lippautz, Kentaro Hara, platform-architecture-dev
On Thu, 25 Aug 2022 at 18:20, Michael Lippautz <mlip...@chromium.org> wrote:

I think the pain points are probably around "what's simpler with Oilpan" (concrete example: unique_ptr vs single Member) and what's "high allocation rate" that would affect the GC. (Leaving aside that we ignore security and debuggability completely here.)

In general, I think it's better to bias towards Oilpan in Blink unless there's performance data that indicates otherwise, right? Even using something like std::unique_ptr is likely not better if it means you need to use WeakPersistent or Persistents. In general, I feel like there's a fair amount of danger when Oilpan needs to interact with non-Oilpan, so striving to minimize that surface seems safest to me.
The pain point gets worse because there are areas where Oilpan is actually superior to malloc with respect to performance: backing stores can be compacted, avoiding fragmentation entirely, _and_ they can still be freed immediately as needed. On top of that there's pointer compression now, which also saves memory.


There are areas where Oilpan is maybe not a great fit, such as sharing memory across threads. There are plans to improve that, but nothing that is usable immediately.

Regarding sharing memory across threads: this is not an encouraged programming paradigm, because shared-memory programming in Blink has historically caused many security and stability issues (use-after-frees, threading races, and locks scattered here and there). It should be limited to well-designed and encapsulated components, regardless of whether we use Oilpan or not.

I see that there's a need for concurrency and sometimes shared memory is unavoidable for performance reasons. Discouraging shared memory does not prevent new uses from showing up though :) (This is the situation we are in now.)

Maybe we should acknowledge the need for it and rather work on guidelines on where to contain it?

Out of curiosity, where is this happening now? IIRC, Oilpan doesn't have the nicest primitives for cross-thread access.

Daniel

Scott Violet

Aug 25, 2022, 11:51:48 AM
to Michael Lippautz, platform-architecture-dev, Clark Duvall, Stefan Zager
Thanks for bcc'ing me on this Michael. I've added a couple of folks whom I've spoken with recently.

The single biggest constraint oilpan imposes is the difficulty (if not impossibility) of working across threads. This severely limits the ability to do interesting and complex work off the main thread.

Second to that, I'm interested in understanding the cost associated with moving objects to oilpan. The doc you link to suggests potential problem areas:

- It is not a good idea to unnecessarily increase the number of objects managed by Oilpan.
- If the allocation rate of the object is very high, that may put unnecessary strain on the Oilpan's GC infrastructure as a whole.

These guidelines are pretty vague.

From my experience, I've encountered a number of places where objects with simple and well understood lifetimes have to be garbage collected purely because they are part of a bigger graph of objects where one references a DOM node.

  -Scott

Stefan Zager

Aug 25, 2022, 12:30:06 PM
to platform-architecture-dev, Daniel Cheng, Kentaro, platform-architecture-dev, Michael Lippautz
On Thursday, August 25, 2022 at 8:08:15 AM UTC-7 Daniel Cheng wrote:

Out of curiosity, where is this happening now? IIRC, Oilpan doesn't have the nicest primitives for cross-thread access.

I am currently working on a project to change the main/compositor thread boundary, which involves concurrent access to non-oilpan objects. That is a prerequisite for another project that will move core rendering work off the main thread, involving concurrent access to additional (non-oilpain) objects.

If the trend continues, it's inevitable that we'll have to resolve the issue of how to make oilpan and threads play nicely.

Stefan Zager

Aug 25, 2022, 12:31:36 PM
to platform-architecture-dev, Stefan Zager, Daniel Cheng, Kentaro, platform-architecture-dev, Michael Lippautz
On Thursday, August 25, 2022 at 9:30:06 AM UTC-7 Stefan Zager wrote:
... concurrent access to additional (non-oilpain) objects.
                                                                            ^^^^^^^

Unintentional typo, but I am chuckling :)

Xianzhu Wang

Aug 25, 2022, 12:51:15 PM
to Stefan Zager, platform-architecture-dev, Daniel Cheng, Kentaro, Michael Lippautz
Since default malloc is discouraged, can we require every class/struct in Blink to specify (or inherit) a memory allocation method? Are there cases in Blink where malloc is more appropriate or has to be used?


Michael Lippautz

Aug 25, 2022, 4:04:17 PM
to Daniel Cheng, Michael Lippautz, Kentaro Hara, platform-architecture-dev
On Thu, Aug 25, 2022 at 5:08 PM Daniel Cheng <dch...@chromium.org> wrote:
On Thu, 25 Aug 2022 at 18:20, Michael Lippautz <mlip...@chromium.org> wrote:

I think the pain points are probably around "what's simpler with Oilpan" (concrete example: unique_ptr vs single Member) and what's "high allocation rate" that would affect the GC. (Leaving aside that we ignore security and debuggability completely here.)

In general, I think it's better to bias towards Oilpan in Blink unless there's performance data that indicates otherwise, right? Even using something like std::unique_ptr is likely not better if it means you need to use WeakPersistent or Persistents. In general, I feel like there's a fair amount of danger when Oilpan needs to interact with non-Oilpan, so striving to minimize that surface seems safest to me.

+1

I see that there's a need for concurrency and sometimes shared memory is unavoidable for performance reasons. Discouraging shared memory does not prevent new uses from showing up though :) (This is the situation we are in now.)

Maybe we should acknowledge the need for it and rather work on guidelines on where to contain it?

Out of curiosity, where is this happening now? IIRC, Oilpan doesn't have the nicest primitives for cross-thread access.

It has CrossThreadPersistent. IIUC, it was designed to allow copy/move/dtor off-thread, while access should stay on the main thread.

It does also allow read-only access to immutable refs, though, which is what some components use it for.

Michael Lippautz

Aug 25, 2022, 4:22:45 PM
to Scott Violet, Michael Lippautz, platform-architecture-dev, Clark Duvall, Stefan Zager
On Thu, Aug 25, 2022 at 5:51 PM Scott Violet <s...@chromium.org> wrote:
Thanks for bcc'ing me on this Michael. I've added a couple of folks whom I've spoken with recently.

The single biggest constraint oilpan imposes is the difficulty (if not impossibility) of working across threads. This severely limits the ability to do interesting and complex work off the main thread.

I see this as a recurring point in this thread, and I have to agree. I have actually suggested malloc'ed memory numerous times to fix issues where we just didn't have the right primitives.

We are currently designing a global heap for V8 first, but want to move to a global heap more generally. In such a world, state could easily be shared and even allocated from any thread. Isolates (or, in Blink, main-thread/worker/worklet scopes) would still be local, with a truly global multi-threaded heap underneath. This is further away, though, and definitely not something we can offer soon.
 

Second to that, I'm interested in understanding the cost associated with moving objects to oilpan. The doc you link to suggests potential problem areas:

- It is not a good idea to unnecessarily increase the number of objects managed by Oilpan.
- If the allocation rate of the object is very high, that may put unnecessary strain on the Oilpan's GC infrastructure as a whole.


We should clarify that, which was one of the goals of this thread.
 
These guidelines are pretty vague.

From my experience, I've encountered a number of places where objects with simple and well understood lifetimes have to be garbage collected purely because they are part of a bigger graph of objects where one references a DOM node.


A penny for every UAF that resulted from "well understood" liveness :)

Are these objects purely referring to DOM objects, or are there also back references?

What is the concern with having such memory on the GCed heap? Is it performance? Memory?

There are plenty of upsides to using Oilpan these days. The biggest one is probably security, as most issues with Oilpan actually happen at the boundary between malloc and Oilpan memory (leaks or UAF, pick your poison).

Michael Lippautz

Aug 25, 2022, 4:49:46 PM
to Michael Lippautz, Scott Violet, platform-architecture-dev, Clark Duvall, Stefan Zager
The thread branched a bit, but let me still provide some more context on cross-threading, since it was mentioned quite a lot in this thread and I have the feeling that Oilpan's support there is not really well understood:
- CrossThreadPersistent (CTP) was initially designed to allow retaining an object from a different thread (thread-safe copy/move/dtor). The idea, though, was to use that reference on the thread where it was created (e.g., by posting back a callback).
- CTP allows reading immutable refs. This may be a mis-design (it was in public APIs in the very early versions) but is currently one of the major use cases.
- Any concurrent write to it (except copy and move) is a data race, as it would require external synchronization that Oilpan doesn't know about. (It would also require a raw pointer from the original thread, which is hard to get safely.)

Kentaro mentioned above that shared-memory concurrency is generally not encouraged. If we still wanted to address some pain points, there's a way forward though:
- We could move towards supporting safe read-only access through a more explicit mechanism that avoids the known caveats of CTP.
- Off-thread allocation could also be supported, but that is further away. We have an explicit abstraction called LocalHeap in V8 that allows exactly that and works well.
- The truly concurrent global heap mentioned above is something we are designing for 2023 (for V8, but the same principles would apply to Blink).

-Michael

Stefan Zager

Aug 25, 2022, 4:51:12 PM
to Michael Lippautz, Scott Violet, platform-architecture-dev, Clark Duvall
On Thu, Aug 25, 2022 at 1:22 PM Michael Lippautz <mlip...@chromium.org> wrote:
On Thu, Aug 25, 2022 at 5:51 PM Scott Violet <s...@chromium.org> wrote:
 
From my experience, I've encountered a number of places where objects with simple and well understood lifetimes have to be garbage collected purely because they are part of a bigger graph of objects where one references a DOM node.


A penny for every UAF that resulted from "well understood" liveness :)

Are these objects purely referring to DOM objects, or are there also back references?

What is the concern with having such memory on the GCed heap? Is it performance? Memory?

There are plenty of upsides to using Oilpan these days. The biggest one is probably security, as most issues with Oilpan actually happen at the boundary between malloc and Oilpan memory (leaks or UAF, pick your poison).

Just to make sure I'm clear about the scope of this issue, I'm mostly thinking about two kinds of object:

- ScriptWrappable objects
- Non-ScriptWrappable objects that participate in rendering. Typically these are allocated during LocalFrameView::UpdateLifecyclePhases and/or LocalFrameView::UpdateStyleAndLayout.

What other types of objects are we talking about?

As for rendering data: I have always thought unique_ptr<> was a more natural fit for these types. Jokes aside, they do have clear ownership semantics and a well-understood lifetime. I think the major obstacle -- maybe the only obstacle -- is the back pointer from LayoutObject to Node. I hope everyone will agree that LayoutObject::node_ is a wart that we would love to be rid of. And I think if we did that, then the conversation about which objects to oilpan becomes less fraught with security considerations. There is a viable way to do this, details upon request.

Kentaro Hara

Aug 26, 2022, 1:12:00 AM
to Stefan Zager, Keishi Hattori, Koji Ishii, Michael Lippautz, Scott Violet, platform-architecture-dev, Clark Duvall
Thanks all for the thoughts!

> Second to that, I'm interested in understanding the cost associated with moving objects to oilpan. The doc you link to suggests potential problem areas:
> - It is not a good idea to unnecessarily increase the number of objects managed by Oilpan.
> - If the allocation rate of the object is very high, that may put unnecessary strain on the Oilpan's GC infrastructure as a whole.
> These guidelines are pretty vague.
 
We should clarify that, which was one of the goals of this thread.

I love the framing Daniel proposed: "In general, I think it's better to bias towards Oilpan in Blink unless there's performance data that indicates otherwise, right? Even using something like std::unique_ptr is likely not better if it means you need to use WeakPersistent or Persistents. In general, I feel like there's a fair amount of danger when Oilpan needs to interact with non-Oilpan, so striving to minimize that surface seems safest to me."


As for rendering data: I have always thought unique_ptr<> was a more natural fit for these types. Jokes aside, they do have clear ownership semantics and a well-understood lifetime. I think the major obstacle -- maybe the only obstacle -- is the back pointer from LayoutObject to Node. I hope everyone will agree that LayoutObject::node_ is a wart that we would love to be rid of. And I think if we did that, then the conversation about which objects to oilpan becomes less fraught with security considerations. There is a viable way to do this, details upon request.

@Keishi Hattori can probably add more details, but this is not true. There were a lot of raw pointers to Layout objects, and there was even manual reference counting. I don't have the link at hand, but I remember that @Koji Ishii fixed a P0 zero-day use-after-free bug last year that wouldn't have happened if Layout objects were on Oilpan.

Regarding multi-threading: I want to be really careful about introducing a shared-memory paradigm to Blink, because historically it was a major source of security and stability bugs. Initially, people thought it would be an efficient solution for parallel processing, but it got out of control very easily and became unmaintainable, leaving significant technical debt. I can talk about the history of WebAudio if necessary. I think that a shared-memory paradigm can be used for a well-designed, contained component (e.g., concurrent GC) but should be avoided for components that are touched by many developers.

Regarding off-thread compositing, would it be hard to design the threading such that the main thread clones data and passes it to the parallel thread, so that they don't need to share memory? For example, could the main thread generate a snapshot of PhysicalFragments (i.e., generational PhysicalFragments) and feed the snapshot to the parallel thread?




Stefan Zager

Aug 26, 2022, 3:02:19 AM
to Kentaro Hara, Keishi Hattori, Koji Ishii, Michael Lippautz, Scott Violet, platform-architecture-dev, Clark Duvall
On Thu, Aug 25, 2022 at 10:12 PM Kentaro Hara <har...@chromium.org> wrote:

As for rendering data: I have always thought unique_ptr<> was a more natural fit for these types. Jokes aside, they do have clear ownership semantics and a well-understood lifetime. I think the major obstacle -- maybe the only obstacle -- is the back pointer from LayoutObject to Node. I hope everyone will agree that LayoutObject::node_ is a wart that we would love to be rid of. And I think if we did that, then the conversation about which objects to oilpan becomes less fraught with security considerations. There is a viable way to do this, details upon request.

@Keishi Hattori can probably add more details, but this is not true. There were a lot of raw pointers to Layout objects, and there was even manual reference counting. I don't have the link at hand, but I remember that @Koji Ishii fixed a P0 zero-day use-after-free bug last year that wouldn't have happened if Layout objects were on Oilpan.

Regarding multi-threading: I want to be really careful about introducing a shared-memory paradigm to Blink, because historically it was a major source of security and stability bugs. Initially, people thought it would be an efficient solution for parallel processing, but it got out of control very easily and became unmaintainable, leaving significant technical debt. I can talk about the history of WebAudio if necessary. I think that a shared-memory paradigm can be used for a well-designed, contained component (e.g., concurrent GC) but should be avoided for components that are touched by many developers.

I respect what you're saying, and I still hope to change your mind, but not in this thread :)
 
Regarding off-thread compositing, would it be hard to design the threading such that the main thread clones data and passes it to the parallel thread, so that they don't need to share memory? For example, could the main thread generate a snapshot of PhysicalFragments (i.e., generational PhysicalFragments) and feed the snapshot to the parallel thread?

Currently, the interface between the blink main and compositor threads is mostly -- but not entirely -- based on copying data in this way, colloquially known as "message passing". This works reasonably well based on the current thread boundary, because the compositing information copied between the two threads is relatively small. However, this strategy is unlikely to scale to the much larger data involved in style, layout, and paint; the overhead of copying those is likely to be a significant tax on the performance gains of any multi-threaded-rendering project.

Kentaro Hara

Aug 26, 2022, 3:21:20 AM
to Stefan Zager, Keishi Hattori, Koji Ishii, Michael Lippautz, Scott Violet, platform-architecture-dev, Clark Duvall
Currently, the interface between the blink main and compositor threads is mostly -- but not entirely -- based on copying data in this way, colloquially known as "message passing". This works reasonably well based on the current thread boundary, because the compositing information copied between the two threads is relatively small. However, this strategy is unlikely to scale to the much larger data involved in style, layout, and paint; the overhead of copying those is likely to be a significant tax on the performance gains of any multi-threaded-rendering project.

Thanks for the clarification! Yeah, "message passing" is the recommended, less error-prone pattern :)

If you need to go with shared memory, I really want to make sure that it is very well designed. For example, if 1) the concurrent access is limited to the fragment tree, 2) the fragment tree is immutable, and 3) the fragment tree has no outgoing pointers, I'm less worried about it. In this case, I think you can put the fragment tree off Oilpan's heap because it has no outgoing pointers.

Feel free to start a separate email thread if you want to dig into the design :)

--
Kentaro Hara, Tokyo

Stefan Zager

Aug 26, 2022, 3:33:30 AM
to Kentaro Hara, Keishi Hattori, Koji Ishii, Michael Lippautz, Scott Violet, platform-architecture-dev, Clark Duvall
On Fri, Aug 26, 2022 at 12:21 AM Kentaro Hara <har...@chromium.org> wrote:
Currently, the interface between the blink main and compositor threads is mostly -- but not entirely -- based on copying data in this way, colloquially known as "message passing". This works reasonably well based on the current thread boundary, because the compositing information copied between the two threads is relatively small. However, this strategy is unlikely to scale to the much larger data involved in style, layout, and paint; the overhead of copying those is likely to be a significant tax on the performance gains of any multi-threaded-rendering project.

Thanks for the clarification! Yeah, "message passing" is the recommended, less error-prone pattern :)

If you need to go with shared memory, I really want to make sure that it is very well designed. For example, if 1) the concurrent access is limited to the fragment tree, 2) the fragment tree is immutable, and 3) the fragment tree has no outgoing pointers, I'm less worried about it. In this case, I think you can put the fragment tree off Oilpan's heap because it has no outgoing pointers.

I strongly agree on all points :) This should be sufficient to move all rendering stages from paint onward to run on another thread, which would be an enormously ambitious project. I do think there's room to explore concurrent access to oilpan-managed objects, but we can postpone that argument for a long time.

Feel free to start a separate email thread if you want to dig into the design :)

Thanks, I'm in the process of collecting my thoughts into a doc; I'll be sure to share it.

Scott Violet

unread,
Aug 26, 2022, 12:13:57 PM8/26/22
to Kentaro Hara, Stefan Zager, Keishi Hattori, Koji Ishii, Michael Lippautz, platform-architecture-dev, Clark Duvall
On Thu, Aug 25, 2022 at 10:12 PM Kentaro Hara <har...@chromium.org> wrote:
Thanks all for the thoughts!

> Second to that, I'm interested in understanding the cost associated with moving objects to oilpan. The doc you link to suggests potential problem areas:
> . It is not a good idea to unnecessarily increase the number of objects managed by Oilpan.
> . If the allocation rate of the object is very high, that may put unnecessary strain on the Oilpan's GC infrastructure as a whole.  
> These guidelines are pretty vague.
 
We should clarify that, which was one of the goals of this thread.

I love the framing Daniel proposed: "In general, I think it's better to bias towards Oilpan in Blink unless there's performance data that indicates otherwise, right? Even using something like std::unique_ptr is likely not better if it means you need to use WeakPersistent or Persistents. In general, I feel like there's a fair amount of danger when Oilpan needs to interact with non-Oilpan, so striving to minimize that surface seems safest to me."

As mentioned earlier in the thread, I would definitely love to see a more thorough analysis of the cost associated with oilpan. If we are to offer more specific guidance around what should be in oilpan it seems as though we need this analysis.

  -Scott

Scott Violet

unread,
Aug 26, 2022, 12:17:59 PM8/26/22
to Kentaro Hara, Stefan Zager, Keishi Hattori, Koji Ishii, Michael Lippautz, platform-architecture-dev, Clark Duvall
On Fri, Aug 26, 2022 at 9:13 AM Scott Violet <s...@chromium.org> wrote:


On Thu, Aug 25, 2022 at 10:12 PM Kentaro Hara <har...@chromium.org> wrote:
Thanks all for the thoughts!

> Second to that, I'm interested in understanding the cost associated with moving objects to oilpan. The doc you link to suggests potential problem areas:
> . It is not a good idea to unnecessarily increase the number of objects managed by Oilpan.
> . If the allocation rate of the object is very high, that may put unnecessary strain on the Oilpan's GC infrastructure as a whole.  
> These guidelines are pretty vague.
 
We should clarify that, which was one of the goals of this thread.

I love the framing Daniel proposed: "In general, I think it's better to bias towards Oilpan in Blink unless there's performance data that indicates otherwise, right? Even using something like std::unique_ptr is likely not better if it means you need to use WeakPersistent or Persistents. In general, I feel like there's a fair amount of danger when Oilpan needs to interact with non-Oilpan, so striving to minimize that surface seems safest to me."

As mentioned earlier in the thread, I would definitely love to see a more thorough analysis of the cost associated with oilpan. If we are to offer more specific guidance around what should be in oilpan it seems as though we need this analysis.

I have done very little on the V8 side. But a question for those more familiar with it: does V8 internally use Oilpan? If not, why?

Kentaro Hara

unread,
Aug 28, 2022, 9:27:11 PM8/28/22
to Scott Violet, Yuki Yamada, Stefan Zager, Keishi Hattori, Koji Ishii, Michael Lippautz, platform-architecture-dev, Clark Duvall
As mentioned earlier in the thread, I would definitely love to see a more thorough analysis of the cost associated with oilpan. If we are to offer more specific guidance around what should be in oilpan it seems as though we need this analysis.

What cost analysis do you want to see? For non-critical objects, it doesn't matter for performance & memory whether we put them in Oilpan or non-Oilpan -- putting them in Oilpan is the default choice for the reason Daniel mentioned. For critical objects, we need a full analysis to understand the performance cost on a case-by-case basis. In general, we have adopted a zero regression policy to say go (though we have accepted some well-understood regressions). For example, @Yuki Yamada  spent 1 year to analyze the performance cost of moving Layout objects to Oilpan and implement a bunch of optimizations working with layout folks before doing the migration. We reverted the CLs four times due to an observed regression in the wild and kept updating the giant CLs for one year. This was not a project like "let's just move it to Oilpan" :)





--
Kentaro Hara, Tokyo

Michael Lippautz

unread,
Aug 29, 2022, 8:58:40 AM8/29/22
to Kentaro Hara, Scott Violet, Yuki Yamada, Stefan Zager, Keishi Hattori, Koji Ishii, Michael Lippautz, platform-architecture-dev, Clark Duvall
On Mon, Aug 29, 2022 at 3:27 AM Kentaro Hara <har...@chromium.org> wrote:
As mentioned earlier in the thread, I would definitely love to see a more thorough analysis of the cost associated with oilpan. If we are to offer more specific guidance around what should be in oilpan it seems as though we need this analysis. 

What cost analysis do you want to see? For non-critical objects, it doesn't matter for performance & memory whether we put them in Oilpan or non-Oilpan -- putting them in Oilpan is the default choice for the reason Daniel mentioned. For critical objects, we need a full analysis to understand the performance cost on a case-by-case basis. In general, we have adopted a zero regression policy to say go (though we have accepted some well-understood regressions). For example, @Yuki Yamada  spent 1 year to analyze the performance cost of moving Layout objects to Oilpan and implement a bunch of optimizations working with layout folks before doing the migration. We reverted the CLs four times due to an observed regression in the wild and kept updating the giant CLs for one year. This was not a project like "let's just move it to Oilpan" :)

At this point we operate on a case-by-case basis for larger areas of code. Owners are in charge of weighing how much they care about their micro benchmarks. Something like Speedometer2 is not allowed to regress substantially (0.1-0.2% is probably arguable with a cost/benefit analysis).

Here's a list of properties that result in trade-offs. Their impact differs depending on workload.
- GC scheduling with a mutator utilization of ~97%, meaning we allow spending ~3% of time in the GC. This excludes barriers due to measurement overhead.
- Pointer compression with read and write barriers (*)
- Write barriers for incremental and concurrent marking
- We currently optimize for main-thread performance and consider concurrent threads to be mostly free (we do use the Jobs API to avoid overloading the system)
- We allocate using LABs, which can result in fragmentation for regular objects
- Only basic cross-thread handling

Here's a few things you currently get:
+ No UAFs
+ LAB-based allocation often results in great locality
+ We can use compaction for backing stores (HeapVector and friends). We can even explicitly free backings (HeapVector::Clear() has GC support)
+ Destructors run on the originating thread but all the free-list handling (and discarding of pages) runs concurrently
+ Pointer compression results in 2-4% PMF improvements per renderer process.
+ We mark most objects concurrently, which means the price for computing liveness on the main thread is very small.

If you want to create a workload that's slower with Oilpan compared to malloc() I'd suggest a write-heavy workload where locality isn't an issue with malloc() itself. E.g., a std::sort() on HeapVector based on full pointer comparison without actually using the result should be bad. As of today you can also repeatedly allocate short-lived objects and it will be worse than with malloc() because our young generation is still a prototype that's waiting to be productionized.

Is any of this relevant? Likely depends on the workload....

(*) With PGO there's little to no overhead on e.g. Speedometer2 and MotionMark. 

 




On Sat, Aug 27, 2022 at 1:18 AM Scott Violet <s...@chromium.org> wrote:


On Fri, Aug 26, 2022 at 9:13 AM Scott Violet <s...@chromium.org> wrote:


On Thu, Aug 25, 2022 at 10:12 PM Kentaro Hara <har...@chromium.org> wrote:
Thanks all for the thoughts!

> Second to that, I'm interested in understanding the cost associated with moving objects to oilpan. The doc you link to suggests potential problem areas:
> . It is not a good idea to unnecessarily increase the number of objects managed by Oilpan.
> . If the allocation rate of the object is very high, that may put unnecessary strain on the Oilpan's GC infrastructure as a whole.  
> These guidelines are pretty vague.
 
We should clarify that, which was one of the goals of this thread.

I love the framing Daniel proposed: "In general, I think it's better to bias towards Oilpan in Blink unless there's performance data that indicates otherwise, right? Even using something like std::unique_ptr is likely not better if it means you need to use WeakPersistent or Persistents. In general, I feel like there's a fair amount of danger when Oilpan needs to interact with non-Oilpan, so striving to minimize that surface seems safest to me."

As mentioned earlier in the thread, I would definitely love to see a more thorough analysis of the cost associated with oilpan. If we are to offer more specific guidance around what should be in oilpan it seems as though we need this analysis.

I have done very little on the V8 side. But a question for those more familiar with it: does V8 internally use Oilpan? If not, why?


It doesn't at this point. We'd like to, but there's still a gap in the library: collections (e.g. HeapVector) are implemented in Blink. Bringing them over to Oilpan's core has been on our roadmap a few times but was always pushed back because of higher-priority items.

While using it internally in V8 would be nice, we've been concretely asked about using it on the API boundary where we mix JS and regular C++ lifetimes. It would feel more natural to have that all just on the GCed heap.

Scott Violet

unread,
Aug 29, 2022, 5:06:55 PM8/29/22
to Kentaro Hara, Yuki Yamada, Stefan Zager, Keishi Hattori, Koji Ishii, Michael Lippautz, platform-architecture-dev, Clark Duvall
On Sun, Aug 28, 2022 at 6:27 PM Kentaro Hara <har...@chromium.org> wrote:
As mentioned earlier in the thread, I would definitely love to see a more thorough analysis of the cost associated with oilpan. If we are to offer more specific guidance around what should be in oilpan it seems as though we need this analysis.

What cost analysis do you want to see? For non-critical objects, it doesn't matter for performance & memory whether we put them in Oilpan or non-Oilpan -- putting them in Oilpan is the default choice for the reason Daniel mentioned. For critical objects, we need a full analysis to understand the performance cost on a case-by-case basis. In general, we have adopted a zero regression policy to say go (though we have accepted some well-understood regressions). For example, @Yuki Yamada  spent 1 year to analyze the performance cost of moving Layout objects to Oilpan and implement a bunch of optimizations working with layout folks before doing the migration. We reverted the CLs four times due to an observed regression in the wild and kept updating the giant CLs for one year. This was not a project like "let's just move it to Oilpan" :)

I would love to see a cost analysis of some portion of Blink that is on the critical path. Layout is likely a good case to study. One of the swarm team takeaways from working to improve FCP/LCP is that Pinpoint is not representative of real-world loading. It looks like the layout project did not roll out using Finch. It may be that moving layout to Oilpan did not impact FCP/LCP, but without running a Finch experiment we don't definitively know. Because of how objects are converted to use Oilpan, I most definitely understand why it is challenging to conduct a Finch experiment for moving parts of Blink to Oilpan.

  -Scott

Kentaro Hara

unread,
Aug 29, 2022, 7:53:50 PM8/29/22
to Scott Violet, Yuki Yamada, Stefan Zager, Keishi Hattori, Koji Ishii, Michael Lippautz, platform-architecture-dev, Clark Duvall
Because of how objects are converted to use Oilpan, I most definitely understand why it is challenging to conduct a Finch experiment for moving parts of Blink to Oilpan.

Right, the Finch flag check is too expensive to switch between Oilpan and non-Oilpan.

Another option (which we did in the past to move a large component to Oilpan) was to use WillBe types (e.g., RawPtrWillBeMember<T>), switch types using a GN flag, and use a binary experiment to compare performance numbers. We discussed the option but people were not happy with polluting the active codebase with WillBe types for many months. So we decided to just land the change and compare numbers from before and after, while understanding this is not fully an apples-to-apples comparison.
--
Kentaro Hara, Tokyo

Scott Violet

unread,
Aug 30, 2022, 9:34:40 AM8/30/22
to Michael Lippautz, Kentaro Hara, Yuki Yamada, Stefan Zager, Keishi Hattori, Koji Ishii, platform-architecture-dev, Clark Duvall
On Mon, Aug 29, 2022 at 5:58 AM Michael Lippautz <mlip...@chromium.org> wrote:
On Mon, Aug 29, 2022 at 3:27 AM Kentaro Hara <har...@chromium.org> wrote:
As mentioned earlier in the thread, I would definitely love to see a more thorough analysis of the cost associated with oilpan. If we are to offer more specific guidance around what should be in oilpan it seems as though we need this analysis. 

What cost analysis do you want to see? For non-critical objects, it doesn't matter for performance & memory whether we put them in Oilpan or non-Oilpan -- putting them in Oilpan is the default choice for the reason Daniel mentioned. For critical objects, we need a full analysis to understand the performance cost on a case-by-case basis. In general, we have adopted a zero regression policy to say go (though we have accepted some well-understood regressions). For example, @Yuki Yamada  spent 1 year to analyze the performance cost of moving Layout objects to Oilpan and implement a bunch of optimizations working with layout folks before doing the migration. We reverted the CLs four times due to an observed regression in the wild and kept updating the giant CLs for one year. This was not a project like "let's just move it to Oilpan" :)

At this point we operate on a case-by-case basis for larger areas of code. Owners are in charge of weighing how much they care about their micro benchmarks. Something like Speedometer2 is not allowed to regress substantially (0.1-0.2% is probably arguable with a cost/benefit analysis).

Here's a list of properties that result in trade-offs. Their impact differs depending on workload.
- GC scheduling with a mutator utilization of ~97%, meaning we allow spending ~3% of time in the GC. This excludes barriers due to measurement overhead.
- Pointer compression with read and write barriers (*)
- Write barriers for incremental and concurrent marking
- We currently optimize for main-thread performance and consider concurrent threads to be mostly free (we do use the Jobs API to avoid overloading the system)
- We allocate using LABs, which can result in fragmentation for regular objects
- Only basic cross-thread handling

Here's a few things you currently get:
+ No UAFs
+ LAB-based allocation often results in great locality
+ We can use compaction for backing stores (HeapVector and friends). We can even explicitly free backings (HeapVector::Clear() has GC support)
+ Destructors run on the originating thread but all the free-list handling (and discarding of pages) runs concurrently
+ Pointer compression results in 2-4% PMF improvements per renderer process.
+ We mark most objects concurrently, which means the price for computing liveness on the main thread is very small.

If you want to create a workload that's slower with Oilpan compared to malloc() I'd suggest a write-heavy workload where locality isn't an issue with malloc() itself. E.g., a std::sort() on HeapVector based on full pointer comparison without actually using the result should be bad. As of today you can also repeatedly allocate short-lived objects and it will be worse than with malloc() because our young generation is still a prototype that's waiting to be productionized.

Is any of this relevant? Likely depends on the workload....

(*) With PGO there's little to no overhead on e.g. Speedometer2 and MotionMark. 

Thanks for all the details Michael.

To your point about no UAFs. While UAFs in blink code may be more unlikely, there is still the possibility of memory errors, they just end up in a different part of the code base: https://bugs.chromium.org/p/chromium/issues/detail?id=1354535 .

  -Scott

Michael Lippautz

unread,
Aug 30, 2022, 11:37:57 AM8/30/22
to Scott Violet, Daniel Cheng, Michael Lippautz, Kentaro Hara, Yuki Yamada, Stefan Zager, Keishi Hattori, Koji Ishii, platform-architecture-dev, Clark Duvall
Yes, there's definitely room for UAFs in Blink :/

Historically, this was an argument for Oilpan though, as within the GCed environment there are no UAFs (modulo the occasional GC bug, which definitely does exist). As @Daniel Cheng mentioned above, it's also the boundary between malloc and the GC that results in problems.

Stefan Zager

unread,
Aug 30, 2022, 12:35:58 PM8/30/22
to Michael Lippautz, Scott Violet, Daniel Cheng, Kentaro Hara, Yuki Yamada, Keishi Hattori, Koji Ishii, platform-architecture-dev, Clark Duvall
On Tue, Aug 30, 2022 at 8:37 AM Michael Lippautz <mlip...@chromium.org> wrote:

Historically, this was an argument for Oilpan though, as within the GCed environment there are no UAFs (modulo the occasional GC bug, which definitely does exist). As @Daniel Cheng mentioned above, it's also the boundary between malloc and the GC that results in problems.

I'd be interested to know, with the benefit of your experience, if there are common patterns to how UAFs tend to show up on that boundary. I assume that pointers from malloc'ed objects to GCed objects are the biggest problem, but are there other frequent smoking guns?

Daniel Cheng

unread,
Aug 31, 2022, 8:05:10 PM8/31/22
to Stefan Zager, Michael Lippautz, Scott Violet, Kentaro Hara, Yuki Yamada, Keishi Hattori, Koji Ishii, platform-architecture-dev, Clark Duvall
Often, these bugs have to do with the fact that finalization is deferred. So if you forget to use something like blink::HeapMojoReceiver and use mojo::Receiver directly instead, incoming IPCs might be dispatched to an object that is already pending finalization. This is dangerous because even though `this` is not finalized yet, `this` may reference other Oilpan objects which have already been finalized.

Other instances that were buggy in the past involved things like timers (the timer code used heuristics to infer whether the Timer was embedded in a GCed object, which broke when the owner didn't inherit from GarbageCollected directly but was embedded in *another* class that was garbage collected), et cetera.

There's a non-exhaustive (and very, very rough) discussion of some of the issues previously seen in https://docs.google.com/document/d/1EdNzrox5XsTkjAG0MLWeMhpCm_SrjOVCTqvlu9HEv1o/edit; I never had time to go back and make another pass through the document.

Daniel

Daniel Cheng

unread,
Aug 31, 2022, 8:33:36 PM8/31/22
to Stefan Zager, Michael Lippautz, Scott Violet, Kentaro Hara, Yuki Yamada, Keishi Hattori, Koji Ishii, platform-architecture-dev, Clark Duvall
Sorry, please ignore the previous link; use https://docs.google.com/document/d/1EdNzrox5XsTkjAG0MLWeMhpCm_SrjOVCTqvlu9HEv1o/edit?resourcekey=0-oAx_fS7rOnKaMbbR-B2HNg instead. Apparently it's critical to include the resource key now?

(Note: this is a google.com doc)

Daniel