Recent Improvements to Functions like getBoundingClientRect

Clark Gaebel

Aug 28, 2014, 8:45:19 PM

to dev-...@lists.mozilla.org

Hi servo-dev!

Servo exists to validate the idea that parallel browser architectures work. Going parallel isn't always a good thing, and can sometimes be worse if there's too much communication overhead. For example, in the current Servo design, JavaScript is run in a different task than layout. This is great, but it means that JavaScript calls that require communication between the tasks can incur a lot of overhead.

Consider this HTML/JS:

<html>
<script>

var ready = function() {
  var ident = document.getElementById("some_div");
  var left_sum = 0;
  var top_sum = 0;
  var right_sum = 0;
  var bottom_sum = 0;
  var t0 = +new Date();
  for (var i = 0; i < 10000000; i++) {
    var rect = ident.getBoundingClientRect();
    left_sum += rect.left;
    top_sum += rect.top;
    right_sum += rect.right;
    bottom_sum += rect.bottom;
  }
  var t1 = +new Date();

  ident.appendChild(document.createTextNode("sums: (" + left_sum + ", " + top_sum + ", " + right_sum + ", " + bottom_sum + ") "));
  ident.appendChild(document.createTextNode("dt: " + (t1 - t0) + " ms"));
};

//document.addEventListener("DOMContentLoaded", ready, false)
window.onload = ready;
</script>
<body>
  <div id="some_div">Working...</div>
</body>
</html>

Running this on Firefox takes 500 ns/iteration. Chrome takes 700 ns/iteration.

Before [1] landed, Servo took 8100 ns/iteration! That's paying a lot (some would say too much) for a parallel architecture, when simple queries experience a slowdown of more than 10x.

However, thanks to [1], Servo is down to 950 ns/iteration. This is very competitive with Firefox and Chrome, especially when considering the mutex involved. I'm sure with some micro-optimization work we can get closer.
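
For reference, the per-iteration figures follow from the elapsed time the benchmark prints and its loop count. Below is a small sketch of that arithmetic. The helper names `nsPerIteration` and `slowdown` are made up for illustration, and the 5000 ms input is simply the elapsed time that would correspond to 500 ns/iteration, not a separately measured value.

```javascript
// Convert a measured wall-clock time into nanoseconds per loop iteration.
// `elapsedMs` is what the benchmark prints as "dt"; `iterations` is the loop bound.
function nsPerIteration(elapsedMs, iterations) {
  return (elapsedMs * 1e6) / iterations;
}

// How many times slower `ns` is than a baseline `baselineNs`.
function slowdown(ns, baselineNs) {
  return ns / baselineNs;
}

var ITERATIONS = 10000000; // same loop bound as the benchmark above

// An elapsed time of 5000 ms corresponds to 500 ns/iteration.
console.log(nsPerIteration(5000, ITERATIONS));

// Ratios for the figures quoted in this message.
console.log(slowdown(8100, 500).toFixed(1)); // Servo before [1] vs. Firefox
console.log(slowdown(8100, 700).toFixed(1)); // Servo before [1] vs. Chrome
console.log(slowdown(950, 500).toFixed(1));  // Servo after [1] vs. Firefox
```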

Because of these results, I believe that communication overhead between script and layout can be (and has been) reduced to a competitive amount, while still maintaining the benefits of parallelization.

Regards,
- Clark

[1] https://github.com/servo/servo/pull/3164

Cameron Zwarich

Aug 28, 2014, 8:56:28 PM

to Clark Gaebel, dev-...@lists.mozilla.org

It’s nice that it’s so close to the competition. It would be interesting to see numbers on ARM as well, since the relative cost of the atomic instructions might be higher, even in the uncontended case.

Is it strictly enforced that the script task never sees inconsistent views of layout? This came up in the other thread about threading, but what prevents this incorrect scenario?

1) The script task takes the mutex to access one property of layout.
2) The script task releases the mutex.
3) Layout changes the property that was accessed.
4) The script task takes the mutex again to access the same property, in the same turn of the event loop without modifying layout in any intervening work since the last attempt.
5) The script task reads a different value from before.
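
The five steps can be modelled in a few lines of single-threaded JavaScript. This is only an illustrative sketch: `SharedLayout`, `readLeft` and `relayout` are hypothetical names, not Servo code, and the "mutex" is a plain flag that shows where the lock is taken and released.

```javascript
// Hypothetical stand-in for layout state shared between two tasks.
function SharedLayout(left) {
  this.left = left;
  this.locked = false;
}

// Take the mutex, read one property, release the mutex (steps 1-2 and 4).
SharedLayout.prototype.readLeft = function () {
  if (this.locked) throw new Error("mutex already held");
  this.locked = true;
  var value = this.left;
  this.locked = false;
  return value;
};

// Layout updates the property while nobody holds the mutex (step 3).
SharedLayout.prototype.relayout = function (newLeft) {
  if (this.locked) throw new Error("layout must wait for the mutex");
  this.left = newLeft;
};

var layout = new SharedLayout(10);
var first = layout.readLeft();  // steps 1-2
layout.relayout(25);            // step 3, triggered from outside script
var second = layout.readLeft(); // step 4
console.log(first === second);  // step 5: false, script saw two different values
```

Each read is individually protected, yet the two reads in the same turn disagree, because nothing stops layout from running in between.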

Cameron

> _______________________________________________
> dev-servo mailing list
> dev-...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-servo

Patrick Walton

Aug 28, 2014, 9:00:42 PM

to dev-...@lists.mozilla.org

On 8/28/14 5:56 PM, Cameron Zwarich wrote:
> It's nice that it's so close to the competition. It would be interesting to see numbers on ARM as well, since the relative cost of the atomic instructions might be higher, even in the uncontended case.

>
> Is it strictly enforced that the script task never sees inconsistent views of layout? This came up in the other thread about threading, but what prevents this incorrect scenario?
>
> 1) The script task takes the mutex to access one property of layout.
> 2) The script task releases the mutex.
> 3) Layout changes the property that was accessed.
> 4) The script task takes the mutex again to access the same property, in the same turn of the event loop without modifying layout in any intervening work since the last attempt.
> 5) The script task reads a different value from before.

Doh. I wonder if we should just keep the mutex held until the next turn
of the event loop (though not take it at the outset, only at the moment
script starts reading back from layout).

This is actually even better for Clark's benchmark, as it reduces the
number of atomic operations in the tight loop to O(1) from O(n).
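
A rough sketch of the difference, assuming a hypothetical `CountingLock` whose acquisition count stands in for the atomic operations (none of these names come from Servo):

```javascript
// Hypothetical lock that counts acquisitions, standing in for atomic operations.
function CountingLock() {
  this.acquisitions = 0;
  this.held = false;
}
CountingLock.prototype.acquire = function () {
  this.acquisitions++;
  this.held = true;
};
CountingLock.prototype.release = function () {
  this.held = false;
};

// Strategy A: take and release the lock around every read.
function readPerCall(lock, n) {
  for (var i = 0; i < n; i++) {
    lock.acquire();
    // ... read from layout ...
    lock.release();
  }
}

// Strategy B: take the lock lazily on the first read and keep it
// until the current turn of the event loop ends.
function readHoldingForTurn(lock, n) {
  for (var i = 0; i < n; i++) {
    if (!lock.held) lock.acquire();
    // ... read from layout ...
  }
  lock.release(); // would run when the event loop turn finishes
}

var a = new CountingLock();
var b = new CountingLock();
readPerCall(a, 1000);
readHoldingForTurn(b, 1000);
console.log(a.acquisitions, b.acquisitions); // 1000 1
```

The check on `lock.held` is an ordinary branch rather than an atomic operation, which is where the saving in the tight loop would come from.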

Patrick

Cameron Zwarich

Aug 28, 2014, 9:06:11 PM

to Patrick Walton, dev-...@lists.mozilla.org

On Aug 28, 2014, at 6:00 PM, Patrick Walton <pcwa...@mozilla.com> wrote:

> On 8/28/14 5:56 PM, Cameron Zwarich wrote:

>> It’s nice that it’s so close to the competition. It would be interesting to see numbers on ARM as well, since the relative cost of the atomic instructions might be higher, even in the uncontended case.

I assumed that was the case, but was going to wait for his response before the obvious follow-up question. We did a similar thing with iOS WebKit: a recursive mutex that was only released on the turn of an event loop. It was universally regarded as being a terrible idea, but nobody had a better solution.

This does mean that we get little-to-no parallelism with things like interactive touch event processing, but that might just be impossible with the web as-is.

Cameron

Patrick Walton

Aug 28, 2014, 9:09:59 PM

to Cameron Zwarich, dev-...@lists.mozilla.org

On 8/28/14 6:06 PM, Cameron Zwarich wrote:
> I assumed that was the case, but was going to wait for his response before the obvious follow-up question. We did a similar thing with iOS WebKit: a recursive mutex that was only released on the turn of an event loop. It was universally regarded as being a terrible idea, but nobody had a better solution.
>
> This does mean that we get little-to-no parallelism with things like interactive touch event processing, but that might just be impossible with the web as-is.

Yeah, there's only so far we can go with the Web APIs as they exist
today. But I think that it may be worth thinking about either
introducing new APIs or ways to encourage Web authors to use existing
ones to get better parallelism. For example, in this case, Web
developers could use `setTimeout(0)`/`postMessage()`/`setImmediate()` to
drop the mutex, and if we can show that the parallelism enables real
performance gains then that's not a bad outcome.
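
As an illustration of the `setTimeout` variant, a page could split a long loop into chunks and yield between them. This is a sketch only: `processInChunks` is a hypothetical helper, and whether a per-turn lock is actually dropped at each yield depends on the design discussed above.

```javascript
// Process `items` in chunks, yielding to the event loop between chunks.
// `schedule` defaults to setTimeout(fn, 0); it is a parameter only so the
// sketch can be driven by other schedulers.
function processInChunks(items, chunkSize, work, done, schedule) {
  schedule = schedule || function (fn) { setTimeout(fn, 0); };
  var index = 0;
  function runChunk() {
    var end = Math.min(index + chunkSize, items.length);
    for (; index < end; index++) {
      work(items[index]);
    }
    if (index < items.length) {
      schedule(runChunk); // end this turn; a per-turn lock could be dropped here
    } else if (done) {
      done();
    }
  }
  runChunk();
}

var total = 0;
processInChunks([1, 2, 3, 4, 5], 2, function (x) { total += x; }, function () {
  console.log("total:", total);
});
```

Between chunks the script task returns to the event loop, so layout would get a chance to run in parallel before the next chunk starts.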

Patrick

Robert O'Callahan

Aug 28, 2014, 10:05:09 PM

to Cameron Zwarich, dev-...@lists.mozilla.org, Clark Gaebel

On Fri, Aug 29, 2014 at 12:56 PM, Cameron Zwarich <zwa...@mozilla.com> wrote:

> Is it strictly enforced that the script task never sees inconsistent views
> of layout? This came up in the other thread about threading, but what
> prevents this incorrect scenario?
>
> 1) The script task takes the mutex to access one property of layout.
> 2) The script task releases the mutex.
> 3) Layout changes the property that was accessed.
> 4) The script task takes the mutex again to access the same property, in
> the same turn of the event loop without modifying layout in any intervening
> work since the last attempt.
> 5) The script task reads a different value from before.
>

I'm confused. Before or during step 1, the layout must be brought up to
date (flushed, in Gecko parlance). So step 3 shouldn't happen since layout
would already be fully up to date.
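
To make "flushed" concrete, here is a minimal sketch of the flush-before-read pattern. `LazyBox` and its fields are hypothetical names for illustration and do not describe Gecko's or Servo's actual data structures.

```javascript
// Hypothetical element whose geometry is computed lazily.
function LazyBox(width) {
  this.styleWidth = width; // value script asked for
  this.layoutWidth = 0;    // value layout last computed
  this.dirty = true;       // layout is stale until flushed
  this.flushes = 0;
}

LazyBox.prototype.setWidth = function (width) {
  this.styleWidth = width;
  this.dirty = true; // defer the work; do not lay out yet
};

// Bring layout up to date only if something changed since the last flush.
LazyBox.prototype.flush = function () {
  if (!this.dirty) return;
  this.layoutWidth = this.styleWidth;
  this.dirty = false;
  this.flushes++;
};

// A geometry query flushes first, so it never observes stale layout.
LazyBox.prototype.getWidth = function () {
  this.flush();
  return this.layoutWidth;
};

var box = new LazyBox(100);
box.setWidth(120);
var w1 = box.getWidth(); // flushes once
var w2 = box.getWidth(); // already up to date, no second flush
console.log(w1, w2, box.flushes); // 120 120 1
```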

Rob
--
oIo otoeololo oyooouo otohoaoto oaonoyooonoeo owohooo oioso oaonogoroyo
owoiotoho oao oboroootohoeoro oooro osoiosotoeoro owoiololo oboeo
osouobojoeocoto otooo ojouodogomoeonoto.o oAogoaoiono,o oaonoyooonoeo
owohooo
osoaoyoso otooo oao oboroootohoeoro oooro osoiosotoeoro,o o‘oRoaocoao,o’o
oioso
oaonosowoeoroaoboloeo otooo otohoeo ocooouoroto.o oAonodo oaonoyooonoeo
owohooo
osoaoyoso,o o‘oYooouo ofooooolo!o’o owoiololo oboeo oiono odoaonogoeoro
ooofo
otohoeo ofoioroeo ooofo ohoeololo.

Patrick Walton

Aug 28, 2014, 10:10:49 PM

to rob...@ocallahan.org, Cameron Zwarich, dev-...@lists.mozilla.org, Clark Gaebel

It might happen if layout is flushed from outside the script task; window resizing/device rotation being what immediately comes to mind, as today in Servo those events go straight from compositor to layout without hitting the script task at all. (As an alternative design, we could route such events through the script task; this would remove the necessity of the mutex but would block layout for such events if script is running, even if the script hasn't touched the DOM.)

Patrick

Cameron Zwarich

Aug 28, 2014, 10:14:02 PM

to Patrick Walton, dev-...@lists.mozilla.org, rob...@ocallahan.org, Clark Gaebel

Do such events always cause the layout task to require DOM access to create the flow tree? If so, the layout task would still have to wait for the script task to finish, meaning that layout still can’t occur unless forced by script.

Cameron

On Aug 28, 2014, at 7:10 PM, Patrick Walton <pwa...@mozilla.com> wrote:

> It might happen if layout is flushed from outside the script task; window resizing/device rotation being what immediately comes to mind, as today in Servo those events go straight from compositor to layout without hitting the script task at all. (As an alternative design, we could route such events through the script task; this would remove the necessity of the mutex but would block layout for such events if script is running, even if the script hasn't touched the DOM.)
>
> Patrick

Robert O'Callahan

Aug 28, 2014, 10:18:27 PM

to Patrick Walton, Cameron Zwarich, dev-...@lists.mozilla.org, Clark Gaebel

On Fri, Aug 29, 2014 at 2:10 PM, Patrick Walton <pwa...@mozilla.com> wrote:

> It might happen if layout is flushed from outside the script task; window
> resizing/device rotation being what immediately comes to mind, as today in
> Servo those events go straight from compositor to layout without hitting
> the script task at all. (As an alternative design, we could route such
> events through the script task; this would remove the necessity of the
> mutex but would block layout for such events if script is running, even if
> the script hasn't touched the DOM.)
>

Hmm. So given

var v = e.getBoundingClientRect();
// layout change is triggered by window resizing or whatever
var v2 = e.getBoundingClientRect();

what in Servo, prior to Clark's work, ensures v and v2 are the same?

Patrick Walton

Aug 28, 2014, 10:18:13 PM

to Cameron Zwarich, dev-...@lists.mozilla.org, rob...@ocallahan.org, Clark Gaebel

Good point. I believe that the answer is no in general, but there are special cases in which the flow tree must be rebuilt at least in part. Normally the flow tree can be reused on window resize/device rotation/CSS animation, but there are special cases in which it can't (e.g. media queries). But we can test for that up front.

Patrick

On August 28, 2014 7:14:02 PM PDT, Cameron Zwarich <zwa...@mozilla.com> wrote:
>Do such events always cause the layout task to require DOM access to
>create the flow tree? If so, the layout task would still have to wait
>for the script task to finish, meaning that layout still can’t occur
>unless forced by script.
>
>Cameron

Patrick Walton

Aug 28, 2014, 10:20:27 PM

to rob...@ocallahan.org, Cameron Zwarich, dev-...@lists.mozilla.org, Clark Gaebel

I believe the answer today is "nothing"--i.e. it's a Servo bug. Clark's work doubles as a nice way to fix it :)

Patrick

On August 28, 2014 7:18:27 PM PDT, Robert O'Callahan <rob...@ocallahan.org> wrote:
>Hmm. So given
>
>var v = e.getBoundingClientRect();
>// layout change is triggered by window resizing or whatever
>var v2 = e.getBoundingClientRect();
>
>what in Servo, prior to Clark's work, ensures v and v2 are the same?

Cameron Zwarich

Aug 28, 2014, 10:26:07 PM

to Clark Gaebel, dev-...@lists.mozilla.org

On Aug 28, 2014, at 5:45 PM, Clark Gaebel <cga...@mozilla.com> wrote:

> Running this on Firefox takes 500 ns/iteration. Chrome takes 700 ns/iteration.
>
> Servo before [1] lands took 8100 ns! That's paying a lot (some would say too much) for a parallel architecture, when simple queries experience a 10x slowdown.
>
> However, thanks to [1], Servo is down to 950 ns/iteration. This is very competitive with Firefox and Chrome, especially when considering the mutex involved. I'm sure with some micro-optimization work we can get closer.

As a side point, why is there a 7 us overhead here for message-passing between green threads? Is it really that bad? Are script and layout currently green tasks, or did something land causing this to not be the case?

Cameron

Lars Bergstrom

Aug 29, 2014, 8:27:13 AM

to dev-...@lists.mozilla.org

Yes, the current perf issue is a bug caused by a workaround that landed to move script onto a native task, to avoid some bugs related to Servo's very old SpiderMonkey but very new Web Workers support:
https://github.com/servo/servo/pull/2842

That architecture will probably persist until we get a SpiderMonkey upgrade that will allow us to switch script back to also being on a green thread, per the discussion in this thread:
https://github.com/servo/servo/pull/2915#issuecomment-50355651

At that point, I'd expect script-to-layout calls via message passing to be much cheaper than a mutex acquisition, but would be very interested to see numbers comparing them!

In general, I'd be a little wary of adding global locks into Servo unless it's really necessary (i.e., if it turns out we can't architect away the race condition Cameron brought up in any other way). Since we haven't found a good way to do any concurrent protocol verification yet for Servo, it's really easy to end up writing deadlocky or racy code (mainly w.r.t. underlying native resource allocation/destruction). For example, I spent way too much time getting shutdown "cleaned up" so that we don't intermittently crash due to attempting to render to graphics contexts that had been destroyed too early, and that was just tracking through our spaghetti message passing code, with no locks to reason about.
- Lars

Josh Matthews

Aug 29, 2014, 9:52:51 AM

to mozilla-...@lists.mozilla.org

On 2014-08-29 8:27 AM, Lars Bergstrom wrote:
> On Aug 28, 2014, at 9:26 PM, Cameron Zwarich <zwa...@mozilla.com> wrote:
>> As a side point, why is there a 7 us overhead here for message-passing between green threads? Is it really that bad? Are script and layout currently green tasks, or did something land causing this to not be the case?
>
> Yes, the current perf issue is a bug due to a workaround landed to move script onto a native task to avoid some bugs related to Servo's very-old SpiderMonkey but very-new Web Workers support:
> https://github.com/servo/servo/pull/2842

That only affected workers, not the main script task.

> That architecture will probably persist until we get a SpiderMonkey upgrade that will allow us to switch script back to also being on a green thread, per the discussion in this thread:
> https://github.com/servo/servo/pull/2915#issuecomment-50355651

We should not assume that anybody on the SpiderMonkey team is doing that
work. If we're going to bet on this, we're going to need to push on
them/do it ourselves.

> At that point, I'd expect script to layout calls via message passing to be much cheaper than a mutex acquisition, but would be very interested to see numbers comparing them!
>
> In general, I'd be a little wary of adding global locks into Servo unless it's really necessary (i.e., if it turns out we can't architect away the race condition Cameron brought up in any other way). Since we haven't found a good way to do any concurrent protocol verification yet for Servo, it's really easy to end up writing deadlocky or racy code (mainly w.r.t. underlying native resource allocation/destruction). For example, I spent way too much time getting shutdown "cleaned up" so that we don't intermittently crash due to attempting to render to graphics contexts that had been destroyed too early, and that was just tracking through our spaghetti message passing code, with no locks to reason about.
> - Lars
>

Cheers,
Josh

Josh Matthews

Aug 29, 2014, 10:08:36 AM

to mozilla-...@lists.mozilla.org

On 2014-08-29 9:52 AM, Josh Matthews wrote:
> On 2014-08-29 8:27 AM, Lars Bergstrom wrote:
>> At that point, I'd expect script to layout calls via message passing
>> to be much cheaper than a mutex acquisition, but would be very
>> interested to see numbers comparing them!

To be clear, we're spawning a green task for each script task today
(despite the fact that we should not be, according to our JSAPI use).

Patrick Walton

Aug 29, 2014, 12:30:25 PM

to dev-...@lists.mozilla.org

On 8/29/14 5:27 AM, Lars Bergstrom wrote:
> In general, I'd be a little wary of adding global locks into Servo
> unless it's really necessary (i.e., if it turns out we can't
> architect away the race condition Cameron brought up in any other
> way). Since we haven't found a good way to do any concurrent protocol
> verification yet for Servo, it's really easy to end up writing
> deadlocky or racy code (mainly w.r.t. underlying native resource
> allocation/destruction). For example, I spent way too much time
> getting shutdown "cleaned up" so that we don't intermittently crash
> due to attempting to render to graphics contexts that had been
> destroyed too early, and that was just tracking through our spaghetti
> message passing code, with no locks to reason about.

I agree in general. `MutexArc` should be your last resort.

I think that this one is worth making an exception for, though: it's
pretty fundamental to the design, and we can potentially abstract it
away via the "RPC" mechanism. I definitely agree that we shouldn't end
up with a system where you have to sprinkle "lock_dom()" calls
throughout script and layout: the locking should happen in a single
well-defined place.
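
One way to picture "a single well-defined place" is a wrapper that is the only code allowed to touch the lock. The sketch below is illustrative JavaScript, not the Rust design: `layoutState`, `withLayoutLock` and `queryBoundingRect` are hypothetical names, and a depth counter stands in for the real mutex.

```javascript
// Hypothetical layout state; `lockDepth` models whether the lock is held.
var layoutState = { left: 10, top: 20, lockDepth: 0 };

// The only function that touches the lock. try/finally guarantees release
// even if the query throws.
function withLayoutLock(query) {
  layoutState.lockDepth++;
  try {
    return query(layoutState);
  } finally {
    layoutState.lockDepth--;
  }
}

// Queries are written against the wrapper and never lock by themselves.
function queryBoundingRect() {
  return withLayoutLock(function (state) {
    return { left: state.left, top: state.top };
  });
}

var rect = queryBoundingRect();
console.log(rect.left, rect.top, layoutState.lockDepth); // 10 20 0
```

Callers never see the lock, so there is no scattering of explicit lock calls to audit.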

Patrick

Cameron Zwarich

Aug 29, 2014, 5:52:18 PM

to Patrick Walton, dev-...@lists.mozilla.org, rob...@ocallahan.org, Clark Gaebel

I filed both of the issues discussed on this thread:

https://github.com/servo/servo/issues/3187
https://github.com/servo/servo/issues/3188

Cameron

On Aug 28, 2014, at 7:20 PM, Patrick Walton <pwa...@mozilla.com> wrote:

> I believe the answer today is "nothing"--i.e. it's a Servo bug. Clark's work doubles as a nice way to fix it :)
>
> Patrick