| Table-less string interning | Keegan McAllister | 4/23/14 4:38 PM | I opened this as a GitHub issue (https://github.com/mozilla/servo/issues/2217) but I thought I'd mention it here as well.
The traditional approach for string interning is: * Interned strings are pointers * To intern a string, look in a hash table mapping strings to pointers * If it's not found, allocate a copy of the string owned by the hash table, and insert it Unfortunately this hash table needs to be shared between all threads that are creating interned strings which might be compared to each other. If we are parsing CSS in parallel and two threads independently intern the same class name, we will incorrectly interpret the class names as not equal. There's been much discussion (#1153, #1135) about what kind of concurrent hash table we should use for this. Here is an alternative that does not require such a table: * Interned strings are 128 bits * To intern a string of 16 bytes or fewer, just store it "as is" * To intern a string of more than 16 bytes, hash it using a 128-bit cryptographic hash function and store that. If the hash function is not broken, we can assume collisions don't happen. To prevent an attacker-chosen immediate string from colliding with a known hash value, set the top two bits of the first hash byte to `10`, which makes the first byte a UTF-8 continuation byte. Also use hashing for any string containing `NULL`, so that immediate strings don't need to include their length. So we can intern strings and compare them for equality without any shared state. But there's a third operation we need to support: getting the original characters of an interned string. For this we do need to build a table mapping hash values to strings. But each thread can build this table independently, since they will always get the same result. We can merge them later, in parallel with selector matching etc. Only when we need to read out original strings (for CSSOM?) do we block on construction of the merged table. And we don't need it at all for strings ≤ 16 bytes. So the things to investigate here are: * How much does a 128-bit compare cost, vs. 64 bits for a pointer? * How much does bloating the DOM cost, in terms of memory usage and cache pressure? * Can we make the cryptographic hash fast enough? On my machine, SHA-256 of a 64-byte string takes about 500 ns (`openssl speed sha256`). Rust already uses SipHash, which is supposed to be fast and crypto-strength. But we would need to run it twice to get 128 bits; 64 bits is not enough to assume collisions don't happen. * How much does it cost to build and merge the hash → string tables? We can study some of this using standalone programs before testing this approach in Servo. I should also note that we can do the small-string optimization even when the non-optimized case is a pointer. But in that case we'll only optimize 4- or 8-byte strings. keegan |
| Re: Table-less string interning | Robert O'Callahan | 4/23/14 5:35 PM | It seems like it would be a good idea to instrument Gecko to gather a trace
of intern operations from regular Web browsing that you can run against test programs. Making the interned atom for <= 8-char or 4-char ASCII strings be the string itself is a really interesting idea. Another idea I like is to introduce a separate immutable prepopulated (with tag names, attribute values, CSS properties and keywords, known-to-be popular values, etc) table that uses the same hash values as the mutable concurrent table (so we don't have to compute the hash value twice) and use tag bits to identify the table an intern value belongs to. Seems like combining those ideas might get good results without being complex. Rob -- Jtehsauts tshaei dS,o n" Wohfy Mdaon yhoaus eanuttehrotraiitny eovni le atrhtohu gthot sf oirng iyvoeu rs ihnesa.r"t sS?o Whhei csha iids teoa stiheer :p atroa lsyazye,d 'mYaonu,r "sGients uapr,e tfaokreg iyvoeunr, 'm aotr atnod sgaoy ,h o'mGee.t" uTph eann dt hwea lmka'n? gBoutt uIp waanndt wyeonut thoo mken.o w |
| Re: Table-less string interning | Keegan McAllister | 4/23/14 5:51 PM | > Another idea I like is to introduce a separate immutable prepopulated (with tag names, attribute values, CSS properties and keywords, known-to-be popular values, etc) table that uses the same hash values as the mutable concurrent table (so we don't have to compute the hash value twice)That sounds good. Right now I'm putting together static interning in the HTML parser using rust-phf [1], but converging on the same hash as the dynamic table definitely makes sense. If an interned string is a pointer to a canonical copy of the string, then most operations won't care whether it's a pointer to static or dynamic memory. And if we do need to distinguish (for refcounting or something), we can look at the memory map. keegan [1] https://github.com/sfackler/rust-phf |
| Re: Table-less string interning | Brian Anderson | 4/23/14 5:53 PM | On 04/23/2014 04:38 PM, Keegan McAllister wrote: > Rust already uses SipHash, which is supposed to be fast and crypto-strength. But we would need to run it twice to get 128 bits; 64 bits is not enough to assume collisions don't happen.I'm not sure how relevant this is, but Rust does use 64-bit hashes for versioning and assumes collisions don't happen. We're assuming nobody is going to be attacking Rust symbols though, so it doesn't need to be truly 'secure'. On the other hand we also use 64-bit hashes for type ID's, and that is something somebody might want to forge for sure (though we don't use dynamic typing for much at all in Rust). |
| Re: Table-less string interning | Boris Zbarsky | 4/23/14 6:02 PM | On 4/23/14, 7:38 PM, Keegan McAllister wrote:Are we more or less assuming we're on 64-bit systems? My memory is that the 128-bit IID compares we end up doing in QueryInterface sorta suck, but we do a bunch of them at a time, which can't help.... Measuring is probably the right plan here, but need to make sure to do it on a variety of hardware. Note that in Gecko the DOM does not store too much in the way of atoms. For example, element tag names are stored in nodeinfo structs (shared across all elements with the same document/localname/namespace/prefix), not in the element themselves. We do store an atom-or-nodeinfo in nsAttrName in Gecko to avoid the extra indirection in the super-common case of an attribute in the null namespace, but obviously one could always use nodeinfo here. So I suspect memory usage is not a huge issue in practice. Cache pressure is a much bigger deal, unfortunately. Selector matching in particular is not all that cache-friendly in Gecko.... We should see how often we end up atomizing things that are over 16 chars. I guess a fairly large number of IDs and classes might in fact be there... -Boris |
| Re: Table-less string interning | Boris Zbarsky | 4/23/14 6:06 PM | On 4/23/14, 8:53 PM, Brian Anderson wrote:Oh, right, that's the other worry. We've had security issues in the past due to things like <input type="fİle"> being treated as a file input by some parts of the system but not others. So anything that allows collisions between user-provided things and built-in atoms for attribute names and values is bad. On the bright side, it may be possible to enforce that all built-in atoms are under the 16 char limit and hence not susceptible to collisions. -Boris |
| Re: Table-less string interning | Patrick Walton | 4/23/14 6:26 PM | Disclaimer: I am not a cryptographer; please correct me mercilessly.
A collision in a widely used cryptographic hash would be a major, publishable security advance. SipHash was concerned with attacks that could somehow exploit the properties of `hash(message) % hash_table_size`, but we're taking the whole hash here. So I don't think that would be an issue. (We could mix in a secret, per-browser-session nonce if we were really paranoid, but since collisions are a death sentence for cryptographic hashes in the first place it's likely overkill.) I'm more concerned about the potential impact of a large hash on memory bandwidth (e.g. if we want to match multiple selectors in parallel using SIMD), but I haven't had a whole lot of luck with techniques that would benefit from that anyhow. So I'm totally fine with this move, assuming its performance characteristics are otherwise favorable; best not to hold up things that we know *do* provide benefits in favor of nebulous performance concerns for algorithms we don't know how to make work in the first place. As an added note, looks like Skylake is going to have SHA instructions [1]. Based on my experience with AES-NI I wouldn't be surprised if using these instructions ends up faster than interning in software anyway. Patrick [1]: http://en.wikipedia.org/wiki/Intel_SHA_extensions |
| Re: Table-less string interning | Keegan McAllister | 4/24/14 2:07 PM | > A collision in a widely used cryptographic hash would be a major,That's true for SHA-256 itself. But I ran the numbers, and it turns out brute force search for a collision on a 128-bit hash (even with no algorithmic weakness) is feasible. I wrote more on the GitHub ticket. I think the per-session random secret would fix this issue, but I will think about it more. And it will slow things down at least some. (cc:ing some people I discussed this with on IRC) Yeah, that's one reason to pursue this approach. If we can access crypto accelerators on mobile, it might be an even bigger win there. We could even support SHA interning alongside table-based interning, and select one based on the availability of crypto acceleration. keegan |
| Re: Table-less string interning | Keegan McAllister | 4/28/14 3:02 PM | Ultimately I'm sort of pessimistic about table-less interning, because it proposes to speed up parsing at the expense of layout. I think that's a tradeoff we wouldn't want to make unless the cost/benefit ratio is really favorable.
Storing small strings immediate and using a table for the rest seems promising, though. I also liked the idea of using Gecko to gather information about interned strings. keegan _______________________________________________ dev-servo mailing list dev-...@lists.mozilla.org https://lists.mozilla.org/listinfo/dev-servo |
| Re: Table-less string interning | Boris Zbarsky | 4/29/14 1:26 AM | On 4/28/14, 6:02 PM, Keegan McAllister wrote:A thought. If the goal is to speed up parsing in a separate task by avoiding contention on a table, have we considered a setup where we do part of the work of interning (e.g. computing the hashcode) in the parsing task but then leaving things in that state and finishing up the interning when we ship the data from the parsing task to the task that ends up using the data? This is similar to the approach the JS engine is using for its off-main-thread parsing in Gecko: there's a fixup step when the parsed representation is shipped back to the thread that requested the parse. One issue there is that layout and dom might to share the table, depending on where we do the "finish up" step of CSS parsing. Is the plan for that to be in the layout task or the DOM task, or sometimes one and sometimes the other? Agreed. I'm happy to write up some patches here if people know what info we want to gather. -Boris |