Comment #23 on issue 2869 by a...@adamhooper.com: Substring of huge string retains huge string in memory
https://bugs.chromium.org/p/v8/issues/detail?id=2869#c23
I keep hearing of people running into this issue, so I'll brainstorm some ideas:
1. Modify the language specification so this bug becomes a feature
In Go, the leak is built into the language specification: Slices explicitly reference their underlying Arrays. Go developers can easily understand the problem, and the solution is intuitive. The problem and solution are documented in a couple of paragraphs at
https://blog.golang.org/go-slices-usage-and-internals.
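The Go behaviour described above can be sketched roughly as follows. This is a minimal illustration, not code taken from the blog post: `firstBytes` and `firstBytesCopy` are hypothetical helper names, and the 1 MiB buffer is just a stand-in for any large allocation.

```go
package main

import "fmt"

// firstBytes returns a slice that shares big's backing array.
// While the result is reachable, the whole array stays reachable too.
func firstBytes(big []byte, n int) []byte {
	return big[:n]
}

// firstBytesCopy returns an independent copy of the first n bytes,
// so it does not keep big's backing array alive.
func firstBytesCopy(big []byte, n int) []byte {
	out := make([]byte, n)
	copy(out, big[:n])
	return out
}

func main() {
	big := make([]byte, 1<<20) // stand-in for a huge buffer

	shared := firstBytes(big, 4)
	copied := firstBytesCopy(big, 4)

	// cap shows how much of a backing array each slice can still reach.
	fmt.Println(len(shared), cap(shared))
	fmt.Println(len(copied), cap(copied))
}
```

The shared slice reports the full capacity of the original buffer, while the copy reports only its own four bytes. That difference is visible from inside the language, which is what makes the problem easy to explain to Go developers.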
Here's that idea applied to JavaScript. The JavaScript spec would document that a String consists of a "backing store" and a "slice". Each String method would specify whether it reuses an argument's "backing store." And finally, there would be a force-copy function so users have a clear workaround -- for instance, "`new String()` always creates a new backing store."
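For comparison, Go's standard library already has a force-copy function of this kind for its own strings: `strings.Clone` (Go 1.18 and later), which is documented as guaranteeing a fresh allocation. The sketch below is Go, not JavaScript, and only illustrates what such a workaround looks like in practice; `keyOf` and `keyOfCopy` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"strings"
)

// keyOf returns the part of line before the first '='.
// The result is a substring and may share line's memory.
func keyOf(line string) string {
	key, _, _ := strings.Cut(line, "=")
	return key
}

// keyOfCopy returns the same text, but forces a copy into a new
// allocation so the result does not reference line's memory.
func keyOfCopy(line string) string {
	return strings.Clone(keyOf(line))
}

func main() {
	// Stand-in for a short key followed by a very large value.
	line := "name=" + strings.Repeat("x", 1<<20)

	fmt.Println(keyOf(line), keyOfCopy(line))
}
```

Both helpers return equal strings. The only difference is which memory the result keeps reachable, and that is exactly the detail a JavaScript spec change would need to spell out.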
2. Stop slicing
In OpenJDK, Java Strings used to share a backing store. In 2012, they simply ... stopped. If I'm reading correctly, the OpenJDK community determined that the harm of shared backing stores outweighed the good.
http://mail.openjdk.java.net/pipermail/core-libs-dev/2012-May/010257.html
Should V8 do the same? Are shared backing stores doing more harm than good, as they did in OpenJDK? (Perhaps shared backing stores are better on 32-bit architectures than on 64-bit architectures, and the whole idea is now an expensive relic?)
3. Use two kinds of string
Rust is explicit about `str` being a slice ("borrowing" from a backing store) and `String` being not-a-slice:
https://doc.rust-lang.org/std/primitive.str.html
https://doc.rust-lang.org/std/string/struct.String.html
This makes it easy to reason about what objects consume memory.
C++17 has a similar dichotomy: `std::string` is akin to a Rust `String` ("backing store + length"), and the new `std::string_view` is akin to a Rust `str` ("[borrowed] pointer + length").
(Yes, Go Slices are kinda the same thing. But Rust and C++ aren't garbage-collected, so Rust/C++ developers think in terms of "ownership" and "borrowing".)
JavaScript could do something similar, and there's already an opening: JavaScript already _has_ two kinds of string! Maybe "primitive string" could be a "slice" and "String" could be a "not-a-slice", or maybe vice-versa. (This probably can't be achieved without big changes to the language spec ... but cut me some slack -- I'm brainstorming here :).)
...
Personally, I'm really curious about my second idea. OpenJDK made the switch; why shouldn't V8?