
Unanswered questions from a Q&A about Servo


Josh Matthews

Jul 12, 2013, 1:51:28 PM
to mozilla-...@lists.mozilla.org
I talked to a room of Gecko platform engineers about Servo today. I drew
a diagram of the most recent architecture plans (constellation; multiple
pipelines of script (with copy-on-write DOM planned), layout, renderer,
and compositor; script tasks with multiple pages per origin;
cross-origin frames linked to separate pipelines; resource and image
cache tasks) and answered people's questions to the best of my ability.
Here are the questions for which I had no good answers; please discuss
them!

* Are there viability criteria? How much better (performance-wise) than
other engines does Servo have to be for continued investment to be
worthwhile?

* What are the expectations of parallel wins relative to Gecko? Where
are the wins, and how much?

* What's the timeline for determining viability?

* What is the plan to handle cycles between the DOM and JS?

* Do same-origin pages that can't communicate synchronously need to be
on the same script task? (scenario: two tabs pointing at google.com
properties)

* What's the status of allocating DOM objects on the JS heap and/or
using the JS GC for the DOM task?

* Are we ignoring the major known issue on the web (synchronous layout
operations) in favour of fixing the less important ones? (mentioned
running GC during sync layout; audience not particularly impressed)

* Could we "freeze" background pages (on a language level?) and unload
them in order to conserve memory and avoid extra processing work, with
the expectation of resurrecting them later as needed?

* How can we handle individual task failure, such as layout/rendering?
Could we recover by spawning a new task, or do we need to create a whole
pipeline from scratch?

* Is abortable layout being considered for short-circuiting in-progress
operations when new results are needed?

* If compositing ends up happening no sooner than in Gecko (i.e. we're still
a sequential pipeline starting from layout), aren't we wasting effort
being parallel (more cores, more work, same amount of time)?

* Do we have data showing how many security bugs we could be avoiding in
Servo in comparison to Gecko? Is the security benefit truly as valuable
if expected performance benefits don't pan out?

* For cases with lots of synchronous layout, could there be a way to run
layout and script sequentially, same-task, such that no async
message-passing is necessary?

* Can we have more traffic on dev-servo for people who want to follow
what's going on? (one suggestion: posting links to meeting notes every week)

Cheers,
Josh

Benjamin Smedberg

Jul 12, 2013, 2:04:02 PM
to Josh Matthews, mozilla-...@lists.mozilla.org
On 7/12/2013 1:51 PM, Josh Matthews wrote:
>
> * Are there viability criteria? How much better (performance-wise)
> than other engines does Servo have to be for continued investment to
> be worthwhile?
I suspect that the answer to this depends also on the security question
asked below. If we get "on-par" performance with Gecko, but we're
writing in a memory-safe language, is that worth it?

I'd like to understand what the parallelism targets are... if DOM and
layout are not parallelized within a single DOM tree but we can run
multiple of these tasks at once (so that tabs are isolated from
each other), is that good enough for a v1 product? Is that a different
target than the Servo team is currently looking at as a research project?


>
>
> * What is the plan to handle cycles between the DOM and JS?
This seems like a pretty fundamental question! I was sure that at some
point in the past we were committed to Servo having a single memory
system across DOM nodes and JS. I understood that we were still
discussing whether a cycle collector, a GC, or some hybrid of the two
was the best option. If the current plan is different, what is it?


>
> * Do same-origin pages that can't communicate synchronously need to be
> on the same script task? (scenario: two tabs pointing at google.com
> properties)
I cannot think of any reason they would have to be.


>
> * Can we have more traffic on dev-servo for people who want to follow
> what's going on? (one suggestion: posting links to meeting notes every
> week)
Hear hear. I'm only lurking on servo stuff at the moment, but I'd like
to keep an understanding of the current state of research and
decision-making on the project.

--BDS

Patrick Walton

Jul 12, 2013, 2:33:28 PM
to dev-...@lists.mozilla.org
Here are my attempts at answers.

On 7/12/13 10:51 AM, Josh Matthews wrote:
> * Are there viability criteria? How much better (performance-wise) than
> other engines does Servo have to be for continued investment to be
> worthwhile?

No idea. There are both security and performance wins, so both must be
considered.

> * What are the expectations of parallel wins relative to Gecko? Where
> are the wins, and how much?

Leo Meyerovich's work (Fast and Parallel Webpage Layout) has some
numbers on the parallel speedups he achieved (see section 5.4,
Performance Evaluation):

http://www.eecs.berkeley.edu/~lmeyerov/projects/pbrowser/pubfiles/playout.pdf

> * What's the timeline for determining viability?

Hard to estimate at this stage; as much as possible we're trying to get
early numbers, but of course we have to ensure that our numbers mean
something. (A browser that does nothing will be very fast, but that
number doesn't mean much!)

In general we've been finding that our small microbenchmarks, such as
the time spent performing synchronous message sends (discussed later),
are encouraging, but until most of the pieces (for example, pure-Rust
CSS selector matching and dirty bits for reflow) are in, the numbers
will not be competitive with Gecko, for the simple reason that the
layout engine is incomplete.

> * What is the plan to handle cycles between the DOM and JS?

The JavaScript garbage collector handles all DOM objects, so the JS
garbage collector will trace all DOM-JS cycles and clean them up.

> * Do same-origin pages that can't communicate synchronously need to be
> on the same script task? (scenario: two tabs pointing at google.com
> properties)

No. Chromium uses separate processes for these and the plan as far as I
understand it is to do the same with tasks/processes.

> * What's the status of allocating DOM objects on the JS heap and/or
> using the JS GC for the DOM task?

The DOM task doesn't use `@` pointers very much, but where it does,
they're in the Rust heap and can't have strong references to DOM nodes
(other than the root of the document).

> * Are we ignoring the major known issue on the web (synchronous layout
> operations) in favour of fixing the less important ones? (mentioned
> running GC during sync layout; audience not particularly impressed)

I wouldn't say we're ignoring them; we've talked about prototyping, and
proposing for standardization, asynchronous layout operations (e.g.
getBoundingClientRectAsync). Running GC during sync layout is for
helping extant Web content; ideally the Web shouldn't be blocking on
layout operations in the first place, and Servo can help it get there.

> * Could we "freeze" background pages (on a language level?) and unload
> them in order to conserve memory and avoid extra processing work, with
> the expectation of resurrecting them later as needed?

Rust does have good support for serialization of object graphs, so this
is potentially feasible. However, the JS engine is a question mark here:
probably work would need to be done to serialize the JS state of a page.
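
To make that concrete, here is a minimal sketch of what freezing a page
might look like, written with the present-day serde crate standing in
for the serialization support mentioned above (the 2013 libraries were
different, and every type and field here is invented for illustration;
the JS heap, the hard part, is exactly what this skips):

use serde::{Deserialize, Serialize};

// Hypothetical snapshot of a frozen page. Real DOM state would need
// its own serializable representation.
#[derive(Serialize, Deserialize)]
struct FrozenPage {
    url: String,
    scroll_y: f64,
    dom_snapshot: Vec<u8>,
}

fn freeze(page: &FrozenPage) -> Vec<u8> {
    // JSON is used only for illustration; a binary format is likelier.
    serde_json::to_vec(page).expect("serialization failed")
}

fn thaw(bytes: &[u8]) -> FrozenPage {
    serde_json::from_slice(bytes).expect("deserialization failed")
}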

> * How can we handle individual task failure, such as layout/rendering?
> Could we recover by spawning a new task, or do we need to create a whole
> pipeline from scratch?

We should be able to recover by spawning a new task.

> * Is abortable layout being considered for short-circuiting in-progress
> operations when new results are needed?

We've thought about it, but it isn't currently being done.

> * If compositing ends up happening no sooner than in Gecko (i.e. we're still
> a sequential pipeline starting from layout), aren't we wasting effort
> being parallel (more cores, more work, same amount of time)?

If we don't use the pipelining, it's true that it's a loss. But the
hypothesis is that pipelining will kick in enough for it to be useful:
we want to allow e.g. script to run while layout is running, or for
layout to be able to return to script while Azure is rendering the new
tiles for display.

> * Do we have data showing how many security bugs we could be avoiding in
> Servo in comparison to Gecko? Is the security benefit truly as valuable
> if expected performance benefits don't pan out?

We've been talking to some members of the security team (Jesse, Brian).
In general the main class of security vulnerabilities that Rust offers a
layer of defense against is memory safety problems in layout, rendering,
and compositing code. Use-after-free is the big one here, but there are
others. I'm not in the security group so I can't run the numbers myself,
but I am told this constitutes a large class of security vulnerabilities.

The pwn2own and Pwnium results demonstrate, at least to me, that memory
safety is still valuable even in the presence of intricate sandboxing.
We need defense in depth, and Rust's type system provides a strong layer
of defense for new code.

Servo is *also* designed to be amenable to OS sandboxing, so that
processes compromised via unsafe code or the JIT can be stopped from
taking over the system. In general, although we don't have fine-grained
sandboxing today, we try to specify the interface so that we can add
process-level sandboxing in the future and keep most of the code intact.
Rust's type system helps a lot here by carefully circumscribing where
memory can be shared. Single-process shared-nothing message passing
designs should be able to be readily ported to multi-process designs.

Of course, Gecko can do the latter with e10s, but the viability of e10s
is not certain on desktop, as I understand things (though please correct
me if I'm wrong).

> * For cases with lots of synchronous layout, could there be a way to run
> layout and script sequentially, same-task, such that no async
> message-passing is necessary?

Well, if you did it as written, you would need to then migrate the
render tree to another thread in the asynchronous case.

However, Brian has been doing a lot of work in the Rust runtime to
achieve the benefits of this. In the new scheduler, synchronously
sending a message to a sleeping task immediately switches to that task
*on the same OS thread*, without a trip through the OS kernel or Rust
scheduler. This significantly improves message passing performance in
the synchronous case. (IIRC the early, unoptimized numbers in the new
scheduler were on the order of a couple of microseconds, which is
dwarfed by the time actually spent to do the layout, and they should be
improvable.)
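
For a feel of the pattern being optimized, here is a minimal sketch of
a blocking script-to-layout query built from ordinary channels, written
with today's std::sync::mpsc rather than the 2013 API under discussion;
ContentBoxQuery and the reply shape are invented for illustration:

use std::sync::mpsc::{channel, Sender};
use std::thread;

struct ContentBox { width: f32, height: f32 }

enum LayoutQuery {
    // Ask for the content box of a node, identified here by an id.
    ContentBoxQuery { node_id: usize, reply: Sender<ContentBox> },
}

fn spawn_layout_task() -> Sender<LayoutQuery> {
    let (tx, rx) = channel();
    thread::spawn(move || {
        for msg in rx {
            match msg {
                LayoutQuery::ContentBoxQuery { node_id: _, reply } => {
                    // Real layout would consult the flow tree here.
                    let _ = reply.send(ContentBox { width: 100.0, height: 20.0 });
                }
            }
        }
    });
    tx
}

fn main() {
    let layout = spawn_layout_task();
    let (reply_tx, reply_rx) = channel();
    layout
        .send(LayoutQuery::ContentBoxQuery { node_id: 1, reply: reply_tx })
        .unwrap();
    // This recv() is the synchronous part: script sleeps until layout
    // replies, which is the switch the new scheduler makes cheap.
    let rect = reply_rx.recv().unwrap();
    println!("content box: {} x {}", rect.width, rect.height);
}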

> * Can we have more traffic on dev-servo for people who want to follow
> what's going on? (one suggestion: posting links to meeting notes every
> week)

Sure, we absolutely should do that.

Patrick

Patrick Walton

Jul 12, 2013, 2:36:14 PM
to dev-...@lists.mozilla.org
On 7/12/13 11:04 AM, Benjamin Smedberg wrote:
> I'd like to understand what the parallelism targets are... if DOM and
> layout are not parallelized within a single DOM tree but we can run
> multiple of these tasks at once (so that tabs are isolated from
> each other), is that good enough for a v1 product? Is that a different
> target than the Servo team is currently looking at as a research project?

It depends on how much the pipelining buys us. Does the ability to
return immediately to script events instead of waiting for Azure to
render tiles (for example) help us a lot? We must measure. (But to have
reasonable measurements, we also need to get some of the basic
optimizations implemented, for example incremental reflow, so that we
stack up reasonably against Gecko in the control case.)

Note that layout is sort of parallelized today, in that image decoding
happens in parallel, but that's only a small part (and one that Gecko
does too).

Patrick

Josh Matthews

Jul 12, 2013, 3:18:11 PM
to mozilla-...@lists.mozilla.org
On 07/12/2013 02:33 PM, Patrick Walton wrote:
>> * What is the plan to handle cycles between the DOM and JS?
>
> The JavaScript garbage collector handles all DOM objects, so the JS
> garbage collector will trace all DOM-JS cycles and clean them up.

What does this mean precisely? Are we going to need to add trace hooks
to every DOM object, ensuring that we trace the JS wrappers of any other
DOM objects that are owned? Off the top of my head that sounds like it
should catch all possible cycles, but someone like Kyle Huey can
probably confirm.

>> * What's the status of allocating DOM objects on the JS heap and/or
>> using the JS GC for the DOM task?
>
> The DOM task doesn't use `@` pointers very much, but where it does,
> they're in the Rust heap and can't have strong references to DOM nodes
> (other than the root of the document).

When you say "very much", I get worried. All codegen'ed DOM code uses @
pointers, and I don't understand what you mean by "can't have strong
references to DOM nodes"; consider events we eventually want to dispatch
that are targeted at DOM nodes and have properties by which this target
can be retrieved.

Kyle Huey

Jul 12, 2013, 3:22:37 PM
to Josh Matthews, mozilla-...@lists.mozilla.org
On Fri, Jul 12, 2013 at 12:18 PM, Josh Matthews <jo...@joshmatthews.net> wrote:

> On 07/12/2013 02:33 PM, Patrick Walton wrote:
>
>>> * What is the plan to handle cycles between the DOM and JS?
>>
>> The JavaScript garbage collector handles all DOM objects, so the JS
>> garbage collector will trace all DOM-JS cycles and clean them up.
>
> What does this mean precisely? Are we going to need to add trace hooks to
> every DOM object, ensuring that we trace the JS wrappers of any other DOM
> objects that are owned? Off the top of my head that sounds like it should
> catch all possible cycles, but someone like Kyle Huey can probably confirm.

What is going to handle DOM-DOM references (e.g. in the node tree)?

- Kyle

Josh Matthews

Jul 12, 2013, 3:29:45 PM
to mozilla-...@lists.mozilla.org
Every DOM object comes into existence with a JS wrapper, so the trace
hooks handle that, yes? For example, Elements have a trace hook that
traces the parent, first and last children, and next and previous
siblings
(https://github.com/mozilla/servo/blob/master/src/components/script/dom/bindings/element.rs#L44).
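
In sketch form, such a trace hook might look like the following;
JSTracer stands in for SpiderMonkey's tracer handle, and the field
layout and the trace_wrapper helper are invented, not Servo's actual
bindings:

struct JSTracer; // opaque handle supplied by the JS engine

struct Element {
    wrapper: *mut u8, // the JS reflector for this node
    parent: Option<*mut Element>,
    first_child: Option<*mut Element>,
    last_child: Option<*mut Element>,
    prev_sibling: Option<*mut Element>,
    next_sibling: Option<*mut Element>,
}

unsafe fn trace_wrapper(trc: *mut JSTracer, node: *mut Element) {
    // A real binding would call into the JS engine here (e.g. via
    // JS_CallTracer) to mark the node's wrapper object.
    let _ = (trc, (*node).wrapper);
}

// Called by the GC when it reaches this element's wrapper: mark the
// wrappers of every DOM object this element holds an edge to.
unsafe fn trace_element(trc: *mut JSTracer, elem: *mut Element) {
    for edge in [
        (*elem).parent,
        (*elem).first_child,
        (*elem).last_child,
        (*elem).prev_sibling,
        (*elem).next_sibling,
    ] {
        if let Some(node) = edge {
            trace_wrapper(trc, node);
        }
    }
}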

Kyle Huey

Jul 12, 2013, 3:34:52 PM
to Josh Matthews, mozilla-...@lists.mozilla.org
On Fri, Jul 12, 2013 at 12:29 PM, Josh Matthews <jo...@joshmatthews.net> wrote:

> Every DOM object comes into existence with a JS wrapper, so the trace
> hooks handle that, yes? For example, Elements have a trace hook that
> traces the parent, first and last children, and next and previous
> siblings
> (https://github.com/mozilla/servo/blob/master/src/components/script/dom/bindings/element.rs#L44).

That means you have to eagerly create JS wrappers for every node in the DOM
tree. So you can't do lazy prototype setup, lazy wrapper creation, etc.
Those are big speed and size wins in Gecko ...

- Kyle

Josh Matthews

Jul 12, 2013, 3:43:55 PM
to mozilla-...@lists.mozilla.org
Yes, this is the bed we have made for ourselves in Servo at this point.

Patrick Walton

Jul 12, 2013, 4:06:26 PM
to dev-...@lists.mozilla.org
On 7/12/13 12:18 PM, Josh Matthews wrote:
> When you say "very much", I get worried. All codegen'ed DOM code uses @
> pointers

Does it need to use @ pointers? That seems unfortunate.

> and I don't understand what you mean by "can't have strong
> references to DOM nodes"; consider events we eventually want to dispatch
> that are targeted at DOM nodes and have properties by which this target
> can be retrieved.

OK, we'll need strong references there too. Anyway, that shouldn't be a
deal breaker unless they're cyclic. Anything that can have cycles should
be traceable by JS.

Patrick

Kyle Huey

Jul 12, 2013, 4:08:28 PM
to Josh Matthews, mozilla-...@lists.mozilla.org
On Fri, Jul 12, 2013 at 12:43 PM, Josh Matthews <jo...@joshmatthews.net> wrote:

> On 07/12/2013 03:34 PM, Kyle Huey wrote:
>
>> That means you have to eagerly create JS wrappers for every node in the
>> DOM tree. So you can't do lazy prototype setup, lazy wrapper creation,
>> etc. Those are big speed and size wins in Gecko ...
>
> Yes, this is the bed we have made for ourselves in Servo at this point.

Declaring everything to the JS GC will make you not leak, but like I said,
I'm not too fond of the side effects.

- Kyle

Patrick Walton

Jul 12, 2013, 4:12:33 PM
to dev-...@lists.mozilla.org
On 7/12/13 12:43 PM, Josh Matthews wrote:
> Yes, this is the bed we have made for ourselves in Servo at this point.

Can we just have the JS trace hook for a DOM node recursively search
through its children, even those that don't have wrappers, looking for
wrappers to mark?

To illustrate, suppose we have this DOM:

      O -- 1
     / \
    X   X
   / \   \
  X   X   2
          |
          3
          |
          4

O is the DOM node that has a wrapper which the JS garbage collector has
invoked the trace hook on. 1, 2, 3, and 4 are DOM nodes with wrappers.
Xs are DOM nodes without wrappers, because the wrappers have not been
created yet. When the trace hook is called on O, it searches recursively
through children and siblings and marks them, so it will mark 1, 2, and
3 as live. It stops once it finds a child node with a wrapper. Because 3
was marked, the JS engine then invokes the trace hook on it. In turn, it
marks 4 as live.
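
In sketch form, with invented types, the recursion looks like this: the
walk marks the first wrapped node found along each path and stops
there, since that node's own trace hook will continue the traversal.

struct Node {
    wrapper: Option<*mut u8>, // Some(..) if a JS reflector exists
    children: Vec<Box<Node>>,
}

fn trace_children(node: &Node, mark_live: &mut dyn FnMut(*mut u8)) {
    for child in &node.children {
        match child.wrapper {
            // Wrapped child: mark it and stop descending; the GC will
            // invoke its own trace hook and continue from there.
            Some(w) => mark_live(w),
            // Wrapperless child: keep searching below it.
            None => trace_children(child, mark_live),
        }
    }
}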

(I would prefer not to go the CC-and-GC route if we can avoid it,
although if we have to we can do that too, with good support from Rust.
Note that Chromium is adopting a full-GC model, like Servo's, with the
Oilpan project, although that's mostly driven by Dart.)

Patrick

Boris Zbarsky

Jul 12, 2013, 4:25:33 PM
to mozilla-...@lists.mozilla.org
On 7/12/13 3:34 PM, Kyle Huey wrote:
> That means you have to eagerly create JS wrappers for every node in the DOM
> tree.

True.

> So you can't do lazy prototype setup

You still can as long as you don't have any nodes for a given prototype...

> lazy wrapper creation

Indeed.

> Those are big speed and size wins in Gecko ...

We should do some measurements here. Your typical JSObject is 4 words.
DOM JS objects in Gecko have 3 reserved slots. So that's 4 words + 24
bytes == 40 bytes on 32-bit and 56 bytes on 64-bit. Your typical web
page has several thousand nodes, so we're talking a few hundred KB. The
HTML5 spec has ~400k nodes, so there we're talking 20MB difference...

On the other hand, if we always have a JSObject for the node, then we
can maybe fuse the allocation and save one reserved slot (pointing to
the node) and one word in the node (pointing to the JS object). Maybe.
Then we're talking 28 bytes and 40 bytes overhead respectively. And
in practice web pages often touch their nodes from JS anyway, so this
would not be a cost for _every_ node, and for the ones that get touched
we can actually get savings from the fused allocation. I'm pretty sure
we'll still lose on things like the HTML5 spec, though.

The other big loser here will be things like XHR result documents.

The other source of bloat would be the prototype objects. We should
measure how often it happens that a page has nodes that would want a
given prototype but never has to actually instantiate that prototype object.

As far as performance goes, we should just measure it. E.g. in Gecko we
could try just forcing it to create JS wrappers for every node and
seeing how much of a hit there is on things like Tp and other benchmarks.

The other possible concern here is Olli's concern about GC having to
trace through known-live stuff while CC can actually stop traversing the
graph when it hits known-live things, which makes it easier to optimize
out the effect of large DOMs on the CC than on the GC... I'm not sure
what we can do about that.

-Boris

Boris Zbarsky

Jul 12, 2013, 4:27:22 PM
to mozilla-...@lists.mozilla.org
On 7/12/13 4:12 PM, Patrick Walton wrote:
> On 7/12/13 12:43 PM, Josh Matthews wrote:
>> Yes, this is the bed we have made for ourselves in Servo at this point.
>
> Can we just have the JS trace hook for a DOM node recursively search
> through its children, even those that don't have wrappers, looking for
> wrappers to mark?

You can, but how does memory management then work for the nodes with no
wrappers?

> To illustrate, suppose we have this DOM:
>
>       O -- 1
>      / \
>     X   X
>    / \   \
>   X   X   2
>           |
>           3
>           |
>           4

What keeps the "X" that is a child of "2" alive?

-Boris

Patrick Walton

Jul 12, 2013, 4:31:09 PM
to dev-...@lists.mozilla.org
On 7/12/13 1:27 PM, Boris Zbarsky wrote:
> What keeps the "X" that is a child of "2" alive?

Interesting question. I had assumed that X's were kept alive by their
parents. After all, JS will not try to free things that it doesn't know
about (and presumably JS doesn't know about wrapperless nodes at all).

If there is no wrapper created for a node, then the ways that that node
can be destroyed seem limited to me (JS can't remove it, since it can't
see it without first creating a wrapper). But I could be wrong here.

Patrick

Boris Zbarsky

Jul 12, 2013, 4:35:57 PM
to mozilla-...@lists.mozilla.org
On 7/12/13 4:31 PM, Patrick Walton wrote:
> On 7/12/13 1:27 PM, Boris Zbarsky wrote:
>> What keeps the "X" that is a child of "2" alive?
>
> Interesting question. I had assumed that X's were kept alive by their
> parents.

Consider this DOM fragment, disconnected from the document:

  X
 / \
1   2

JS is holding a reference to 1 in a global variable.

What keeps 2 alive? What keeps X alive? They both need to remain
alive... What frees X if the page stops referencing 1?

I can write an example of a script that would produce such a situation,
if desired.

> If there is no wrapper created for a node, then the ways that that node
> can be destroyed seem limited to me

Well, it can be (needs to be!) destroyed when no one is referencing it
anymore on the Rust side... how do we detect that?

-Boris

Jack Moffitt

Jul 12, 2013, 4:40:04 PM
to Boris Zbarsky, mozilla-...@lists.mozilla.org
> I can write an example of a script that would produce such a situation, if
> desired.

Yes, please. I would love to have a concrete example.

jack.

Josh Matthews

Jul 12, 2013, 4:42:35 PM
to mozilla-...@lists.mozilla.org
On 07/12/2013 04:06 PM, Patrick Walton wrote:
> On 7/12/13 12:18 PM, Josh Matthews wrote:
>> When you say "very much", I get worried. All codegen'ed DOM code uses @
>> pointers
>
> Does it need to use @ pointers? That seems unfortunate.

At one point I was trying to write bindings code that could deal with
either ~ or @ pointers, because objects like ClientRectList need to hold
on to a list of ClientRect objects, and the compiler objected when
ClientRect created JS wrappers for itself because the ~self value was
moved in doing so. I hit a wall trying to write code that could deal
with both types of pointers, though I forget the specifics, so I gave up
after several days of banging my head against the compiler.

Josh Matthews

Jul 12, 2013, 4:48:04 PM
to mozilla-...@lists.mozilla.org
So in this model, any time we perform an operation that will remove a
reference to a DOM node, we now need to check for the presence of a JS
wrapper and potentially run the finalizer, I guess? It sounds like
AbstractNode would need to be reference counted now; otherwise I can't
figure out a way to determine when all references to the node have been
released.

Nicholas Nethercote

Jul 12, 2013, 5:15:32 PM
to Patrick Walton, dev-...@lists.mozilla.org
> The pwn2own and Pwnium results demonstrate, at least to me, that memory
> safety is still valuable even in the presence of intricate sandboxing. We
> need defense in depth, and Rust's type system provides a strong layer of
> defense for new code.

Yeah. IMO, memory safety is a BFD.

Nick

Boris Zbarsky

Jul 12, 2013, 5:22:14 PM
to mozilla-...@lists.mozilla.org
<!DOCTYPE html>
<div id="foo">
  <span>
    <span id="bar"></span>
    <span id="baz"></span>
  </span>
</div>
<script>
  // Create the wrapper for "1" (bar) and store it in a global var
  var global = document.getElementById("bar");
  // Create the wrapper for "2" (baz)
  document.getElementById("baz");
  // Disconnect the subtree from the DOM, without ever creating a
  // wrapper for the parent of "1" and "2"
  document.getElementById("foo").innerHTML = "";
</script>

-Boris

Ehsan Akhgari

Jul 12, 2013, 5:41:41 PM
to Patrick Walton, dev-...@lists.mozilla.org
On 2013-07-12 2:33 PM, Patrick Walton wrote:
> Servo is *also* designed to be amenable to OS sandboxing, so that
> processes compromised via unsafe code or the JIT can be stopped from
> taking over the system. In general, although we don't have fine-grained
> sandboxing today, we try to specify the interface so that we can add
> process-level sandboxing in the future and keep most of the code intact.
> Rust's type system helps a lot here by carefully circumscribing where
> memory can be shared. Single-process shared-nothing message passing
> designs should be able to be readily ported to multi-process designs.

Can you please talk a bit more about what facilities Rust provides for
OS-level sandboxing? This seems very interesting.

> Of course, Gecko can do the latter with e10s, but the viability of e10s
> is not certain on desktop, as I understand things (though please correct
> me if I'm wrong).

We have a plan of action right now, and work is under way based on that.
There are no fundamental reasons why e10s on desktop should not be
viable (we have a few other engines as examples of the feasibility of
this approach), but of course it is a huge engineering task.

Cheers,
Ehsan

Patrick Walton

Jul 12, 2013, 7:45:39 PM
to dev-...@lists.mozilla.org
On 7/12/13 1:40 PM, Jack Moffitt wrote:
>> I can write an example of a script that would produce such a situation, if
>> desired.
>
> Yes, please. I would love to have a concrete example.

So I think there are basically three options:

1. Use the JS GC for everything; eat the cost of eagerly creating all
wrappers. As Boris mentioned, maybe this isn't so bad. I would assume
this is what Oilpan is doing in Blink.

2. Use the JS GC for wrapped objects and reference counting for
non-wrapped objects. This assumes there are no cycles between them,
which I believe to be the case (though could be wrong), because strong
references from Rust to DOM nodes should be fairly minimal and acyclic
(the reference to the root, events, maybe others?)

3. Use the JS GC for wrapped objects and unique ownership coupled with
weak pointers for non-wrapped objects. In other words, there is a single
"strong owner" for each node and all other pointers from Rust to the DOM
are weak. (For example, event targets for queued events would be weak
and if the DOM node dies then the event is dropped.) It is not clear to
me that this is feasible; however, it might be.
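
As a sketch of what option 3 could look like, with invented types and
std::rc's Weak standing in for whatever weak pointer the DOM would
actually use, a queued event would hold only a weak reference and be
dropped if its target has died:

use std::rc::{Rc, Weak};

struct Node { name: String }

struct QueuedEvent {
    target: Weak<Node>, // does not keep the node alive
}

fn dispatch(event: &QueuedEvent) {
    match event.target.upgrade() {
        Some(node) => println!("firing event at <{}>", node.name),
        None => { /* target is gone; the event is dropped */ }
    }
}

fn main() {
    let node = Rc::new(Node { name: "img".to_string() });
    let event = QueuedEvent { target: Rc::downgrade(&node) };
    dispatch(&event); // fires: the node is still alive
    drop(node);
    dispatch(&event); // target died; nothing happens
}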

Perhaps at this point we should go with (1) and measure what the cost of
the wrappers is, as Boris suggested.

Patrick

Patrick Walton

Jul 12, 2013, 8:25:31 PM
to dev-...@lists.mozilla.org
On 7/12/13 4:45 PM, Patrick Walton wrote:
> 1. Use the JS GC for everything; eat the cost of eagerly creating all
> wrappers. As Boris mentioned, maybe this isn't so bad. I would assume
> this is what Oilpan is doing in Blink.

After talking to Terrence from the JS team, apparently with a month or
two of hacking on SpiderMonkey we might be able to allocate Rust objects
of 18 64-bit words or fewer into the JS heap directly. In other words,
fuse the Rust structure with its wrapper. We could spill to the Rust
heap from the wrapper if and only if the object is more than 18 words.

This would be similar to Blink's Oilpan project, but better because we
can take advantage of the compiler to generate the trace hooks instead
of having to manually write trace hooks and trace all the pointers.

Patrick

Robert O'Callahan

Jul 12, 2013, 8:38:54 PM
to Patrick Walton, dev-...@lists.mozilla.org
What would the developer experience be like with this approach? I mean,
what kind of code would developers have to write to declare classes managed
by JS and to declare references to such objects?

Rob

Patrick Walton

Jul 12, 2013, 8:43:42 PM
to rob...@ocallahan.org, dev-...@lists.mozilla.org
On 7/12/13 5:38 PM, Robert O'Callahan wrote:
> What would the developer experience be like with this approach? I mean,
> what kind of code would developers have to write to declare classes
> managed by JS and to declare references to such objects?

I suspect it would be something like:

#[deriving(JsManagable)]
struct MyObject {
    field_a: int,
    field_b: float,
    field_c: Option<JsManaged<OtherObject>>,
    ...
}

fn f(x: JsManaged<MyObject>) {
    ... do something with x ...

    // make a loop, why not?
    x.mutate().field_c = Some(x.clone());
}

fn g(x: &JsManaged<MyObject>) {
    ... do something with x ...
}

fn main() {
    let object = new(JsManaged) MyObject {
        field_a: 10,
        field_b: 20.0,
        field_c: None,
        ...
    };
    f(object.clone()); // make a new reference
    g(&object); // or don't and just borrow
}

Patrick

Robert O'Callahan

Jul 12, 2013, 8:48:54 PM
to Patrick Walton, dev-...@lists.mozilla.org
On Fri, Jul 12, 2013 at 5:43 PM, Patrick Walton <pwa...@mozilla.com> wrote:

> I suspect it would be something like:
>
> #[deriving(JsManagable)]
> struct MyObject {
>     field_a: int,
>     field_b: float,
>     field_c: Option<JsManaged<OtherObject>>,
>     ...
> }

Great, that looks OK.

Does Option<JSManaged<>> compile down to a single machine word with no
overhead for dereferencing?

Robert O'Callahan

Jul 12, 2013, 8:55:31 PM
to Patrick Walton, dev-...@lists.mozilla.org
On Fri, Jul 12, 2013 at 11:33 AM, Patrick Walton <pwa...@mozilla.com> wrote:

>> * Do we have data showing how many security bugs we could be avoiding in
>> Servo in comparison to Gecko? Is the security benefit truly as valuable
>> if expected performance benefits don't pan out?
>
> We've been talking to some members of the security team (Jesse, Brian). In
> general the main class of security vulnerabilities that Rust offers a layer
> of defense against is memory safety problems in layout, rendering, and
> compositing code. Use-after-free is the big one here, but there are others.
> I'm not in the security group so I can't run the numbers myself, but I am
> told this constitutes a large class of security vulnerabilities.

A quick scan suggests that all 34 sec-critical bugs filed against Web Audio
so far are either buffer overflows (array-access-out-of-bounds, basically)
or use-after-free. In many cases the underlying bug is something quite
different, sometimes integer overflows.

Rust and Servo can potentially be pushed further to get additional
interesting security properties, but that requires more research.

Having said that, if we can't get superior performance, it won't fly no
matter what security we get.

Patrick Walton

Jul 12, 2013, 9:00:54 PM
to rob...@ocallahan.org, dev-...@lists.mozilla.org
On 7/12/13 5:48 PM, Robert O'Callahan wrote:
> Does Option<JSManaged<>> compile down to a single machine word with no
> overhead for dereferencing?

The Rust compiler now implements optimizations to compile Option of a
pointer down to a nullable pointer (although I would have to verify that
it indeed works in this case).
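
The space side of that optimization can be checked directly in today's
Rust, using a reference and std::ptr::NonNull as stand-ins for a
JsManaged pointer:

use std::mem::size_of;
use std::ptr::NonNull;

fn main() {
    // None is encoded as the null pointer, so Option adds no space.
    assert_eq!(size_of::<Option<&u64>>(), size_of::<&u64>());
    assert_eq!(size_of::<Option<NonNull<u64>>>(), size_of::<*mut u64>());
    println!("Option of a non-nullable pointer is one machine word");
}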

I think in most cases there will be some overhead for dereferencing, due
to the write barrier required by a generational/incremental GC. There is
also a write barrier needed by Rust for ensuring soundness of mutation
(see the "Imagine Never Hearing the Phrase 'Aliasable, Mutable' Again"
blog post for details).

In the current design reads need a barrier too, something that I just
realized -- I should probably talk this over with Niko and pick some
approach to fixing it in the Rust compiler. (There are various
approaches; probably the easiest one is something like a ReadPtr trait
known to the compiler.)

In general the overhead for writes and borrows will probably be only a
couple of instructions over a dereference operation. Reads should be
unbarriered, although I think we need a little more design to make that
happen in the current language.
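
A minimal sketch of the shape such a write barrier could take, with
entirely invented types: mutation goes through a method that first
records the object in a remembered set, then hands out the mutable
reference.

use std::cell::UnsafeCell;

struct RememberedSet {
    dirty: Vec<*const u8>,
}

impl RememberedSet {
    fn record(&mut self, ptr: *const u8) {
        self.dirty.push(ptr);
    }
}

struct JsManaged<T> {
    value: UnsafeCell<T>,
}

impl<T> JsManaged<T> {
    // The write barrier: note the object so an incremental or
    // generational GC rescans it, then allow the mutation.
    fn mutate<'a>(&'a self, set: &mut RememberedSet) -> &'a mut T {
        set.record(self as *const Self as *const u8);
        unsafe { &mut *self.value.get() }
    }
}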

Patrick

Robert O'Callahan

Jul 12, 2013, 9:02:07 PM
to Patrick Walton, dev-...@lists.mozilla.org
On Fri, Jul 12, 2013 at 6:00 PM, Patrick Walton <pwa...@mozilla.com> wrote:

> In general the overhead for writes and borrows will probably be only a
> couple of instructions over a dereference operation. Reads should be
> unbarriered, although I think we need a little more design to make that
> happen in the current language.
>

That sounds good, thanks.

Robert O'Callahan

Jul 12, 2013, 9:05:29 PM
to Patrick Walton, dev-...@lists.mozilla.org
On Fri, Jul 12, 2013 at 5:55 PM, Robert O'Callahan <rob...@ocallahan.org> wrote:

> A quick scan suggests that all 34 sec-critical bugs filed against Web
> Audio so far are either buffer overflows (array-access-out-of-bounds,
> basically) or use-after-free. In many cases the underlying bug is something
> quite different, sometimes integer overflows.
>

There are 4 sec-high bugs --- DOS with a null-pointer-deref, and a few bugs
reading uninitialized memory. The latter would be prevented by Rust, and
the former would be mitigated to the extent Servo uses the fine-grained
isolation Rust offers.

There are no sec-low bugs.

Web Audio is an example of a feature which has very little security impact
of its own. Its security impact is entirely due to bugs where violation of
language rules can trigger arbitrary behavior. Rust prevents such bugs. A
lot of Web features are in this category.

Boris Zbarsky

Jul 12, 2013, 10:09:46 PM
to mozilla-...@lists.mozilla.org
On 7/12/13 7:45 PM, Patrick Walton wrote:
> 2. Use the JS GC for wrapped objects and reference counting for
> non-wrapped objects. This assumes there are no cycles between them,
> which I believe to be the case (though could be wrong)

If we can have both wrapped and unwrapped DOM nodes, I don't see how we
can have no cycles between the two...

> 3. Use the JS GC for wrapped objects and unique ownership coupled with
> weak pointers for non-wrapped objects. In other words, there is a single
> "strong owner" for each node and all other pointers from Rust to the DOM
> are weak. (For example, event targets for queued events would be weak
> and if the DOM node dies then the event is dropped.)

I rather doubt this is web-compatible... Consider this simple testcase:

(function foo() {
  var img = new Image();
  img.addEventListener("load", function(e) { window.x = e; });
  img.src = whatever;
})();

The event needs to fire. Furthermore, once it's fired, window.x.target
needs to be the DOM node... I suppose the DOM event could trace the DOM
node and the image load could be the unique "strong owner", but this
seems fragile...

Option 4 is to go with a GC and CC setup and refcounting on the Rust
side, I guess.

-Boris

Boris Zbarsky

Jul 12, 2013, 10:11:40 PM
to mozilla-...@lists.mozilla.org
On 7/12/13 8:25 PM, Patrick Walton wrote:
> On 7/12/13 4:45 PM, Patrick Walton wrote:
>> 1. Use the JS GC for everything; eat the cost of eagerly creating all
>> wrappers. As Boris mentioned, maybe this isn't so bad. I would assume
>> this is what Oilpan is doing in Blink.
>
> After talking to Terrence from the JS team, apparently with a month or
> two of hacking on SpiderMonkey we might be able to allocate Rust objects
> of 18 64-bit words or fewer into the JS heap directly.

Hmm. 18 64-bit words is enough for a basic element, I'd think, though
with the copy-on-write setup we would need to measure carefully.
Subclasses might need to heap-allocate some of their members, though,
which is a bit of a pain.

-Boris

Boris Zbarsky

Jul 12, 2013, 10:15:49 PM
to mozilla-...@lists.mozilla.org
On 7/12/13 10:09 PM, Boris Zbarsky wrote:
> Option 4 is to go with a GC and CC setup and refcounting on the Rust
> side, I guess.

Though the developer experience of CC is not that great, to be honest.

What would the implementation of a DOM object that ends up not fitting
in 18 64-bit words look like? I assume the compiler would prevent people
from doing that by accident, right?

Also, I assume that when we say "DOM object" here we mean not "a Node"
but "anything reflected into JS"?

-Boris

Patrick Walton

Jul 13, 2013, 12:07:28 AM
to dev-...@lists.mozilla.org
On 7/12/13 7:11 PM, Boris Zbarsky wrote:
> Hmm. 18 64-bit words is enough for a basic element, I'd think, though
> with the copy-on-write setup we would need to measure carefully.
> Subclasses might need to heap-allocate some of their members, though,
> which is a bit of a pain.

Right; unique pointers should make that fairly painless in Rust though.
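
A sketch of that spilling pattern, with invented types: the hot fields
stay inline in the fixed-size GC cell, and rarely-used members live
behind a unique pointer that is allocated only on first use.

struct RareData {
    // Rarely-populated members live out-of-line on the Rust heap.
    dataset: Vec<(String, String)>,
}

struct Element {
    // Hot, fixed-size fields stay inline in the GC cell.
    node_type: u32,
    flags: u32,
    // One word inline; the payload is allocated lazily.
    rare_data: Option<Box<RareData>>,
}

impl Element {
    fn rare_data_mut(&mut self) -> &mut RareData {
        self.rare_data
            .get_or_insert_with(|| Box::new(RareData { dataset: Vec::new() }))
    }
}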

Patrick

Patrick Walton

Jul 13, 2013, 12:10:32 AM
to dev-...@lists.mozilla.org
On 7/12/13 7:09 PM, Boris Zbarsky wrote:
> On 7/12/13 7:45 PM, Patrick Walton wrote:
>> 2. Use the JS GC for wrapped objects and reference counting for
>> non-wrapped objects. This assumes there are no cycles between them,
>> which I believe to be the case (though could be wrong)
>
> If we can have both wrapped and unwrapped DOM nodes, I don't see how we
> can have no cycles between the two...

Yeah, you're right. We would need some sort of strongly-connected-cycle
detector, either ad hoc or a cycle collector. Neither sounds appealing,
but especially the former. Probably the best thing to do in this case
would be RC+CC, like Gecko does, but compiler-assisted.

At this point I'm most inclined to try to implement the
allocate-in-the-JS-GC heap strategy, and see how far that gets us. If
that isn't feasible, we can try wrapping all DOM nodes and then measure
the overhead (actually, this is what we're doing now, so this is more
easily testable). If none of those pan out, then we can investigate
RC+CC schemes.

Patrick

David Herman

Jul 13, 2013, 1:51:36 AM
to Patrick Walton, dev-...@lists.mozilla.org
On Jul 12, 2013, at 11:33 AM, Patrick Walton <pwa...@mozilla.com> wrote:

>> * Could we "freeze" background pages (on a language level?) and unload
>> them in order to conserve memory and avoid extra processing work, with
>> the expectation of resurrecting them later as needed?
>
> Rust does have good support for serialization of object graphs, so this is potentially feasible. However, the JS engine is a question mark here: probably work would need to be done to serialize the JS state of a page.

Note that this would be a pretty killer feature in general. It would enable things like "stashing" the state of a running app to disk, extremely fine-grained sync'ing and restoring of running apps across devices, etc.

Dave

David Bruant

Jul 13, 2013, 2:18:18 PM
to
Correct me if I misunderstood what you described, but I believe
equivalent work is underway on top of V8 [1] (not an official Google
project, though). A student of Tom Van Cutsem is working on that [2]. I
believe Mark Miller could be a point of contact for that work too.

David

[1] https://github.com/supergillis/v8-ken/
[2] http://www.eros-os.org/pipermail/cap-talk/2012-December/015585.html