
Table-less string interning


Keegan McAllister

Apr 23, 2014, 7:38:14 PM
to dev-...@lists.mozilla.org
I opened this as a GitHub issue (https://github.com/mozilla/servo/issues/2217) but I thought I'd mention it here as well.

The traditional approach for string interning is:

* Interned strings are pointers
* To intern a string, look in a hash table mapping strings to pointers
* If it's not found, allocate a copy of the string owned by the hash table, and insert it

Unfortunately this hash table needs to be shared between all threads that are creating interned strings which might be compared to each other. If we are parsing CSS in parallel and two threads independently intern the same class name, we will incorrectly interpret the class names as not equal.

There's been much discussion (#1153, #1135) about what kind of concurrent hash table we should use for this.

Here is an alternative that does not require such a table:

* Interned strings are 128 bits
* To intern a string of 16 bytes or fewer, just store it "as is"
* To intern a string of more than 16 bytes, hash it using a 128-bit cryptographic hash function and store that. If the hash function is not broken, we can assume collisions don't happen.

To prevent an attacker-chosen immediate string from colliding with a known hash value, set the top two bits of the first hash byte to `10`, which makes the first byte a UTF-8 continuation byte. Also use hashing for any string containing `NULL`, so that immediate strings don't need to include their length.
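
Roughly, the representation would look something like this (untested sketch in present-day Rust; the 128-bit hash is left abstract as a hypothetical `hash128` parameter rather than committing to a particular function):

    #[derive(Clone, Copy, PartialEq, Eq)]
    struct Atom([u8; 16]);

    fn intern(s: &str, hash128: impl Fn(&[u8]) -> [u8; 16]) -> Atom {
        let bytes = s.as_bytes();
        if bytes.len() <= 16 && !bytes.contains(&0) {
            // Short, NUL-free strings are stored immediately; unused trailing
            // bytes stay zero, which is why NUL-containing strings go through
            // the hash path instead.
            let mut buf = [0u8; 16];
            buf[..bytes.len()].copy_from_slice(bytes);
            Atom(buf)
        } else {
            // Longer strings are replaced by their hash. Forcing the top two
            // bits of the first byte to `10` makes it a UTF-8 continuation
            // byte, which can never start a valid immediate string.
            let mut h = hash128(bytes);
            h[0] = (h[0] & 0b0011_1111) | 0b1000_0000;
            Atom(h)
        }
    }

Equality is then just a 128-bit compare via the derived `PartialEq`.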

So we can intern strings and compare them for equality without any shared state. But there's a third operation we need to support: getting the original characters of an interned string. For this we do need to build a table mapping hash values to strings. But each thread can build this table independently, since they will always get the same result. We can merge them later, in parallel with selector matching etc. Only when we need to read out original strings (for CSSOM?) do we block on construction of the merged table. And we don't need it at all for strings ≤ 16 bytes.
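
A sketch of that read-out side (again untested, with a plain HashMap standing in for whatever table type we end up with): each thread records the original text of the hashed atoms it creates, and because the key → string mapping is deterministic, merging can never produce conflicting entries.

    use std::collections::HashMap;

    // Per-thread reverse table from hashed atoms back to their original text.
    type ReverseTable = HashMap<[u8; 16], String>;

    fn merge(tables: Vec<ReverseTable>) -> ReverseTable {
        let mut merged = ReverseTable::new();
        for table in tables {
            for (key, text) in table {
                // Duplicate keys always carry identical strings, so keeping
                // the first one we see is enough.
                merged.entry(key).or_insert(text);
            }
        }
        merged
    }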

So the things to investigate here are:

* How much does a 128-bit compare cost, vs. 64 bits for a pointer?
* How much does bloating the DOM cost, in terms of memory usage and cache pressure?
* Can we make the cryptographic hash fast enough? On my machine, SHA-256 of a 64-byte string takes about 500 ns (`openssl speed sha256`). Rust already uses SipHash, which is supposed to be fast and crypto-strength. But we would need to run it twice to get 128 bits; 64 bits is not enough to assume collisions don't happen.
* How much does it cost to build and merge the hash → string tables?

We can study some of this using standalone programs before testing this approach in Servo.
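
For example, one such standalone program could time the run-SipHash-twice idea. This is an untested sketch assuming the third-party `siphasher` crate, and it leaves open the question of whether two independently keyed 64-bit SipHash outputs really give 128-bit collision resistance:

    use std::hash::Hasher;
    use std::time::Instant;

    use siphasher::sip::SipHasher24;

    // Concatenate two independently keyed 64-bit SipHash-2-4 outputs.
    fn hash128(bytes: &[u8]) -> [u8; 16] {
        let mut a = SipHasher24::new_with_keys(0, 1);
        let mut b = SipHasher24::new_with_keys(2, 3);
        a.write(bytes);
        b.write(bytes);
        let mut out = [0u8; 16];
        out[..8].copy_from_slice(&a.finish().to_le_bytes());
        out[8..].copy_from_slice(&b.finish().to_le_bytes());
        out
    }

    fn main() {
        let input = [b'x'; 64];
        let iterations: u32 = 1_000_000;
        let mut sink = 0u8; // keep the result live so the loop isn't optimized away
        let start = Instant::now();
        for _ in 0..iterations {
            sink ^= hash128(&input)[0];
        }
        let per_hash = start.elapsed().as_nanos() / iterations as u128;
        println!("~{} ns per 128-bit hash (sink = {})", per_hash, sink);
    }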

I should also note that we can do the small-string optimization even when the non-optimized case is a pointer. But in that case we'll only optimize 4- or 8-byte strings.
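
For example (a hypothetical encoding, not something I've settled on): on a 64-bit build, real pointers are word-aligned, so their low bit is always zero and can serve as the immediate tag. The tag costs a payload byte, so this particular scheme tops out at 7 NUL-free bytes rather than 8.

    // Pack a short NUL-free string into a tagged 64-bit word, or return None
    // to fall back to the pointer (table-interned) representation.
    fn pack_immediate(s: &str) -> Option<u64> {
        let bytes = s.as_bytes();
        if bytes.len() > 7 || bytes.contains(&0) {
            return None;
        }
        let mut word: u64 = 1; // low tag bit distinguishes immediates from pointers
        for (i, &b) in bytes.iter().enumerate() {
            word |= (b as u64) << (8 * (i + 1));
        }
        Some(word)
    }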

keegan

Robert O'Callahan

Apr 23, 2014, 8:35:29 PM
to Keegan McAllister, dev-...@lists.mozilla.org
It seems like it would be a good idea to instrument Gecko to gather a trace
of intern operations from regular Web browsing that you can run against
test programs.

Making the interned atom for <= 8-char or 4-char ASCII strings be the
string itself is a really interesting idea.

Another idea I like is to introduce a separate immutable prepopulated (with
tag names, attribute values, CSS properties and keywords, known-to-be
popular values, etc) table that uses the same hash values as the mutable
concurrent table (so we don't have to compute the hash value twice) and use
tag bits to identify the table an intern value belongs to.
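
Concretely, I'm imagining something like this (hypothetical sketch, names made up), so checking which table an atom belongs to is just a bit test:

    // Hypothetical sketch: the low bit of an Atom says which table owns it.
    // Pointers into the dynamic table are word-aligned, so their low bit is 0;
    // indices into the prepopulated static table get the bit set.
    const STATIC_TAG: usize = 1;

    #[derive(Clone, Copy, PartialEq, Eq)]
    struct Atom(usize);

    impl Atom {
        fn from_static_index(index: usize) -> Atom {
            Atom((index << 1) | STATIC_TAG)
        }
        fn from_dynamic_ptr(ptr: *const u8) -> Atom {
            debug_assert!((ptr as usize) & STATIC_TAG == 0);
            Atom(ptr as usize)
        }
        fn is_static(self) -> bool {
            (self.0 & STATIC_TAG) == 1
        }
    }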

Seems like combining those ideas might get good results without being
complex.

Rob

Keegan McAllister

Apr 23, 2014, 8:51:10 PM
to rob...@ocallahan.org, dev-...@lists.mozilla.org
> Another idea I like is to introduce a separate immutable prepopulated (with tag names, attribute values, CSS properties and keywords, known-to-be popular values, etc) table that uses the same hash values as the mutable concurrent table (so we don't have to compute the hash value twice)

That sounds good. Right now I'm putting together static interning in the HTML parser using rust-phf [1], but converging on the same hash as the dynamic table definitely makes sense.
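
For the curious, the static side is roughly shaped like this (illustrative only, with made-up entries and atom ids, assuming phf's `phf_map!` macro interface rather than whatever the final parser code looks like):

    use phf::phf_map;

    // Illustrative only: a compile-time map from known names to small atom ids.
    static STATIC_ATOMS: phf::Map<&'static str, u32> = phf_map! {
        "a" => 0,
        "div" => 1,
        "span" => 2,
        "class" => 3,
        "href" => 4,
    };

    fn lookup_static(name: &str) -> Option<u32> {
        STATIC_ATOMS.get(name).copied()
    }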

> and use tag bits to identify the table an intern value belongs to

If an interned string is a pointer to a canonical copy of the string, then most operations won't care whether it's a pointer to static or dynamic memory. And if we do need to distinguish (for refcounting or something), we can look at the memory map.

keegan

[1] https://github.com/sfackler/rust-phf

Brian Anderson

Apr 23, 2014, 8:53:15 PM
to dev-...@lists.mozilla.org
On 04/23/2014 04:38 PM, Keegan McAllister wrote:
> I opened this as a GitHub issue (https://github.com/mozilla/servo/issues/2217) but I thought I'd mention it here as well.
>
> Rust already uses SipHash, which is supposed to be fast and crypto-strength. But we would need to run it twice to get 128 bits; 64 bits is not enough to assume collisions don't happen.

I'm not sure how relevant this is, but Rust does use 64-bit hashes for
versioning and assumes collisions don't happen. We're assuming nobody is
going to be attacking Rust symbols though, so it doesn't need to be
truly 'secure'. On the other hand we also use 64-bit hashes for type
ID's, and that is something somebody might want to forge for sure
(though we don't use dynamic typing for much at all in Rust).

Boris Zbarsky

Apr 23, 2014, 9:02:45 PM
to mozilla-...@lists.mozilla.org
On 4/23/14, 7:38 PM, Keegan McAllister wrote:
> * How much does a 128-bit compare cost, vs. 64 bits for a pointer?

Are we more or less assuming we're on 64-bit systems?

My memory is that the 128-bit IID compares we end up doing in
QueryInterface sorta suck, but we do a bunch of them at a time, which
can't help....

Measuring is probably the right plan here, but need to make sure to do
it on a variety of hardware.

> * How much does bloating the DOM cost, in terms of memory usage and cache pressure?

Note that in Gecko the DOM does not store too much in the way of atoms.
For example, element tag names are stored in nodeinfo structs (shared
across all elements with the same document/localname/namespace/prefix),
not in the elements themselves.

We do store an atom-or-nodeinfo in nsAttrName in Gecko to avoid the
extra indirection in the super-common case of an attribute in the null
namespace, but obviously one could always use nodeinfo here.

So I suspect memory usage is not a huge issue in practice.

Cache pressure is a much bigger deal, unfortunately. Selector matching
in particular is not all that cache-friendly in Gecko....

> * Can we make the cryptographic hash fast enough?

We should see how often we end up atomizing things that are over 16 chars.

I guess a fairly large number of IDs and classes might in fact be there...

-Boris

Boris Zbarsky

Apr 23, 2014, 9:06:19 PM
to mozilla-...@lists.mozilla.org
On 4/23/14, 8:53 PM, Brian Anderson wrote:
> We're assuming nobody is going to be attacking Rust symbols though

Oh, right, that's the other worry.

We've had security issues in the past due to things like <input
type="fİle"> being treated as a file input by some parts of the system
but not others. So anything that allows collisions between
user-provided things and built-in atoms for attribute names and values
is bad.

On the bright side, it may be possible to enforce that all built-in
atoms are under the 16 char limit and hence not susceptible to collisions.

-Boris

Patrick Walton

Apr 23, 2014, 9:26:38 PM
to dev-...@lists.mozilla.org
Disclaimer: I am not a cryptographer; please correct me mercilessly.

A collision in a widely used cryptographic hash would be a major,
publishable security advance. SipHash was concerned with attacks that
could somehow exploit the properties of `hash(message) %
hash_table_size`, but we're taking the whole hash here. So I don't think
that would be an issue. (We could mix in a secret, per-browser-session
nonce if we were really paranoid, but since collisions are a death
sentence for cryptographic hashes in the first place it's likely overkill.)

I'm more concerned about the potential impact of a large hash on memory
bandwidth (e.g. if we want to match multiple selectors in parallel using
SIMD), but I haven't had a whole lot of luck with techniques that would
benefit from that anyhow. So I'm totally fine with this move, assuming
its performance characteristics are otherwise favorable; best not to
hold up things that we know *do* provide benefits in favor of nebulous
performance concerns for algorithms we don't know how to make work in
the first place.

As an added note, looks like Skylake is going to have SHA instructions
[1]. Based on my experience with AES-NI I wouldn't be surprised if using
these instructions ends up faster than interning in software anyway.

Patrick

[1]: http://en.wikipedia.org/wiki/Intel_SHA_extensions

Keegan McAllister

Apr 24, 2014, 5:07:38 PM
to Patrick Walton, danie...@gmail.com, dev-...@lists.mozilla.org, edy....@gmail.com
> A collision in a widely used cryptographic hash would be a major,
> publishable security advance.

That's true for SHA-256 itself. But I ran the numbers, and it turns out brute force search for a collision on a 128-bit hash (even with no algorithmic weakness) is feasible. I wrote more on the GitHub ticket.
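
For reference, the back-of-the-envelope number is the birthday bound: against an ideal 128-bit hash, finding *some* pair of colliding inputs takes about sqrt(2^128) = 2^64 ≈ 1.8 × 10^19 hash evaluations, whereas hitting one specific existing value (e.g. a built-in atom) is a preimage search and stays around 2^128. So, if I have the threat model right, the worry is an attacker who can get both halves of a precomputed colliding pair into the same page, not forgery of a particular known atom.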

I think the per-session random secret would fix this issue, but I will think about it more. And it will slow things down at least some.

(cc:ing some people I discussed this with on IRC)

> As an added note, looks like Skylake is going to have SHA instructions

Yeah, that's one reason to pursue this approach. If we can access crypto accelerators on mobile, it might be an even bigger win there. We could even support SHA interning alongside table-based interning, and select one based on the availability of crypto acceleration.

keegan

Keegan McAllister

Apr 28, 2014, 6:02:45 PM
to Patrick Walton, danie...@gmail.com, dev-...@lists.mozilla.org, edy burt
Ultimately I'm sort of pessimistic about table-less interning, because it proposes to speed up parsing at the expense of layout. I think that's a tradeoff we wouldn't want to make unless the cost/benefit ratio is really favorable.

Storing small strings as immediates and using a table for the rest seems promising, though.

I also liked the idea of using Gecko to gather information about interned strings.

keegan


Boris Zbarsky

Apr 29, 2014, 4:26:04 AM
to mozilla-...@lists.mozilla.org
On 4/28/14, 6:02 PM, Keegan McAllister wrote:
> Ultimately I'm sort of pessimistic about table-less interning, because it proposes to speed up parsing at the expense of layout.

A thought.

If the goal is to speed up parsing in a separate task by avoiding
contention on a table, have we considered a setup where we do part of
the work of interning (e.g. computing the hashcode) in the parsing task,
leave things in that state, and finish up the interning when we ship
the data from the parsing task to the task that ends up using the data?

This is similar to the approach the JS engine is using for its
off-main-thread parsing in Gecko: there's a fixup step when the parsed
representation is shipped back to the thread that requested the parse.
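
Something like the following shape, perhaps (hypothetical sketch; the real types would obviously look different, and collision handling is elided — a real table would compare strings, not just hashes):

    use std::collections::HashMap;

    // The parsing task computes the hash eagerly but defers the table insert;
    // the receiving task "finishes" the intern against its own table.
    enum PendingAtom {
        Resolved(usize), // index into the receiver's intern table
        Unresolved { hash: u64, text: String },
    }

    struct InternTable {
        by_hash: HashMap<u64, usize>,
        strings: Vec<String>,
    }

    impl InternTable {
        fn finish(&mut self, pending: PendingAtom) -> usize {
            match pending {
                PendingAtom::Resolved(index) => index,
                PendingAtom::Unresolved { hash, text } => {
                    match self.by_hash.get(&hash) {
                        Some(&index) => index,
                        None => {
                            let index = self.strings.len();
                            self.strings.push(text);
                            self.by_hash.insert(hash, index);
                            index
                        }
                    }
                }
            }
        }
    }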

One issue there is that layout and the DOM might need to share the table,
depending on where we do the "finish up" step of CSS parsing. Is the
plan for that to be in the layout task or the DOM task, or sometimes one
and sometimes the other?

> Storing small strings as immediates and using a table for the rest seems promising, though.

Agreed.

> I also liked the idea of using Gecko to gather information about interned strings.

I'm happy to write up some patches here if people know what info we want
to gather.

-Boris