External UTF-8 String


Tekman

Jan 12, 2021, 8:32:40 PM
to v8-users
Hi,

What is the most efficient way to compile a (large) script that is being passed in as a UTF-8 char buffer?

The ExternalOwningOneByteStringResource is really nice for ASCII buffers, but it looks like for UTF-8 we need to copy the entire contents and convert with NewFromUtf8. It also looks like this memory (the converted script source) sticks around for the lifetime of the Isolate.

For really large bundles, it's a waste to keep 2 copies of the script source in memory; are there any optimizations / alternatives to consider?

How feasible would it be to create something like ExternalOwningUTF8StringResource?

Thank you!

Alex Kodat

Jan 12, 2021, 8:57:21 PM
to v8-users
I think the string source needs to be two-byte Unicode (UTF-16) or Latin-1 so that character offsets convert easily to byte offsets. That's not possible with a variable-width encoding like UTF-8. If you're really concerned about the memory, you could get rid of the original UTF-8 string once it's been converted to UTF-16, or convert the UTF-8 to UTF-16 yourself (into an external string) and then get rid of the UTF-8. But even if you keep both the UTF-8 and UTF-16 copies of the script, I suspect the compilation data (byte-code, optimized code, objects, constants, property names, function templates, etc.) will end up taking more memory than your two copies of the source. Plus, memory's ubiquitous and cheap these days. So don't worry, be happy.
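
To illustrate the offset point, here is a minimal sketch in plain C++ (no V8 headers). The function names are made up for this example, and "character" is taken to mean a code point in the UTF-8 case and a 16-bit code unit in the two-byte case. It assumes well-formed UTF-8 input.

```cpp
#include <cstddef>
#include <string>

// Byte offset of the code point at index `cp_index` in a well-formed UTF-8
// buffer. A linear scan is needed because each code point takes 1 to 4 bytes.
// Returns utf8.size() when cp_index is past the end.
std::size_t Utf8ByteOffset(const std::string& utf8, std::size_t cp_index) {
  std::size_t seen = 0;
  for (std::size_t i = 0; i < utf8.size(); ++i) {
    // Continuation bytes look like 10xxxxxx; any other byte starts a code point.
    bool is_lead = (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80;
    if (is_lead) {
      if (seen == cp_index) return i;
      ++seen;
    }
  }
  return utf8.size();
}

// With fixed-width 16-bit code units the same lookup is plain arithmetic.
std::size_t TwoByteOffset(std::size_t unit_index) { return unit_index * 2; }
```

For the three code points in "a", "é", "z" encoded as UTF-8, the third one starts at byte 3 rather than byte 2, which is why a scan (or a side table) is needed.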

That said, if you're really keen on saving memory you could always use a minifier, but IMO that's valuing memory more than people.

HTH

Ben Noordhuis

Jan 13, 2021, 2:28:39 AM
to v8-users
There is no way around the conversion. One way or another, you're
going to have two copies, because V8 doesn't use UTF-8 internally. JS
strings are either one-byte or two-byte. V8 does the UTF-8 conversion
at the edges.

The way I solved it in Node.js is by storing the built-in scripts as an
ExternalOneByteStringResource when they're Latin-1, and as an
ExternalStringResource (two-byte) otherwise.
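
The Latin-1 versus two-byte decision can be sketched in plain C++ as below. This is not the Node.js implementation. TranscodedSource, DecodeOne and Transcode are hypothetical names for this example, and the decoder assumes well-formed UTF-8 and does no validation.

```cpp
#include <cstddef>
#include <string>

// Hypothetical holder: exactly one of the two buffers is filled.
struct TranscodedSource {
  bool one_byte = true;
  std::string latin1;       // used when every code point is <= U+00FF
  std::u16string two_byte;  // UTF-16 otherwise
};

// Decodes one code point starting at s[i] and advances i past it.
// Assumes well-formed UTF-8; a truncated sequence simply ends early.
static char32_t DecodeOne(const std::string& s, std::size_t& i) {
  unsigned char b = static_cast<unsigned char>(s[i++]);
  if (b < 0x80) return b;
  int extra = (b >= 0xF0) ? 3 : (b >= 0xE0) ? 2 : 1;
  char32_t cp = b & (0x3F >> extra);  // payload bits of the lead byte
  while (extra-- > 0 && i < s.size()) {
    cp = (cp << 6) | (static_cast<unsigned char>(s[i++]) & 0x3F);
  }
  return cp;
}

TranscodedSource Transcode(const std::string& utf8) {
  TranscodedSource out;
  // First pass: check whether everything fits in Latin-1.
  for (std::size_t i = 0; i < utf8.size();) {
    if (DecodeOne(utf8, i) > 0xFF) { out.one_byte = false; break; }
  }
  // Second pass: emit one byte per code point, or UTF-16 code units.
  for (std::size_t i = 0; i < utf8.size();) {
    char32_t cp = DecodeOne(utf8, i);
    if (out.one_byte) {
      out.latin1.push_back(static_cast<char>(cp));
    } else if (cp < 0x10000) {
      out.two_byte.push_back(static_cast<char16_t>(cp));
    } else {
      cp -= 0x10000;  // encode as a surrogate pair
      out.two_byte.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
      out.two_byte.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
    }
  }
  return out;
}
```

In an embedder, the resulting buffer would presumably back a subclass of v8::String::ExternalOneByteStringResource or v8::String::ExternalStringResource, after which the original UTF-8 copy can be freed. For built-in scripts the conversion could also run at build time so that only the converted form ships.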

It's not great for big scripts because approximately half of that
two-byte memory is zero, but on the flip side, it's in read-only
memory and doesn't count towards the JS heap limit.

If you're bundling scripts into your executable and you're worried
about executable size, one approach is to store the sources compressed
and decompress-and-stream them using ScriptCompiler::StartStreaming().
The uncompressed script source stays around for the lifetime of the
script, however.
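
For the streaming route, the embedder supplies the source in chunks. The sketch below is a stand-in in plain C++ and does not include V8 headers. ChunkedSource is a hypothetical class whose GetMoreData has the same shape as, to my understanding, v8::ScriptCompiler::ExternalSourceStream::GetMoreData: it hands out a new[]-allocated chunk that the caller owns, and returns 0 when the data ends. A real implementation would pull each chunk from a decompressor instead of from an in-memory string.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <utility>

// Stand-in for a source stream; not derived from any V8 class.
class ChunkedSource {
 public:
  ChunkedSource(std::string bytes, std::size_t chunk_size)
      : bytes_(std::move(bytes)), chunk_size_(chunk_size) {}

  // Writes a pointer to the next chunk into *src and returns its length.
  // The chunk is allocated with new[] and ownership passes to the caller,
  // who must delete[] it. Returns 0 once all data has been handed out.
  std::size_t GetMoreData(const std::uint8_t** src) {
    std::size_t n = std::min(chunk_size_, bytes_.size() - pos_);
    if (n == 0) return 0;
    std::uint8_t* chunk = new std::uint8_t[n];
    std::memcpy(chunk, bytes_.data() + pos_, n);
    pos_ += n;
    *src = chunk;
    return n;
  }

 private:
  std::string bytes_;       // placeholder for decompressor output
  std::size_t chunk_size_;  // maximum bytes handed out per call
  std::size_t pos_ = 0;     // bytes handed out so far
};
```

Handing out small chunks keeps the peak size of the transient buffers low, but as noted above it does not change the fact that the uncompressed source stays alive with the script.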