Sure thing, I've opened
12668 - isolate script cache stores streamed scripts incorrectly - v8 (chromium.org). However, I think I was incorrect earlier: I didn't figure out
the problem, just
a problem. The isolate script cache does a very nice job of deduplicating SharedFunctionInfos when a bunch of tabs are opened quickly, but that's not a very realistic customer behavior. Customers are more likely to interact with a tab for a while, causing plenty of allocations and GC cycles, before opening another tab, so ageing is a problem in this scenario. I completely understand that we should clear out the root SFI for a script after a while because it's unlikely to be used again. However, other SFIs for functions defined within the script may stay alive much longer. I think our main problem arises from the fact that when generating a new root SFI, V8 attaches it to a new Script instance, meaning SFIs for nested functions end up getting duplicated. I have a proposal that I think would help; what do you think of the following?
When an entry in the isolate script cache becomes old, instead of deleting it, we could replace the value with a weak pointer to the Script and update the source pointer in the key to be weak too. That way, the entry would not retain any large objects. When the cache needs resizing, we could iterate the table and delete any entries that point to cleared weak pointers.
If the cache lookup returns a Script, that means the calling code must create a new root SFI but attach it to the existing Script. I imagine this would require some additional plumbing through the various code paths (synchronous parse, finalizing a streaming parse, and finalizing a code cache deserialization), but I think that the additional complexity would be worthwhile for the result of fully reliable deduplication. In places where the parsing or deserialization code currently creates a new SFI other than the root, that code would first need to check whether the Script already has one. If the SFI already exists but has been flushed, the parsing or deserialization code should attach new bytecode to it.
This proposal is only concerning the script cache, not the eval or regexp caches.
Looking forward to hearing your thoughts. Thanks!