Hi Conrad,
I'll make an attempt at answering, though I'm not an expert on OSR, so others like Jakob or Mythri may have more precise answers.
1. Why would this property access be polymorphic?
If you're talking about polymorphic access, you're probably familiar with hidden classes (also called "object shapes" or "Maps but not
those Maps"). Regardless, I'll include a link: the most complete and accurate description I've found of how hidden classes work in V8 is
Fast properties in V8.
In this example, the object {a: 0} has a hidden class which says it has one property named "a". The object {a: 0, b: 1} has a different hidden class which says it has two properties, named "a" and "b".

After the function getA has performed a lookup to get property "a" from {a: 0}, it keeps a pointer to that object's hidden class and another value indicating where the property can be found (in this case, the first in-object property slot). When running getA({a: 0, b: 1}), V8 checks whether the object {a: 0, b: 1} has the same hidden class as {a: 0}, which it does not. So V8 remembers a second pair of values: the hidden class for {a: 0, b: 1} and where its "a" property can be found (also the first in-object property slot). Now the feedback state for that load operation is polymorphic. The "mono" in monomorphic refers to a single hidden class, not to a single answer for where the property can be found.
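To make that concrete, here's a minimal sketch of the situation (I'm assuming getA looks roughly like this; the exact function body doesn't matter, only the property load):

```javascript
// Assumed shape of the function under discussion.
function getA(o) {
  return o.a;
}

// {a: 0} and {a: 0, b: 1} get different hidden classes, even though
// property "a" lives in the first in-object slot of both objects.
const first = getA({ a: 0 });        // feedback: one hidden class seen (monomorphic)
const second = getA({ a: 0, b: 1 }); // feedback: two hidden classes seen (polymorphic)

console.log(first, second); // 0 0 — same result, but the feedback is now polymorphic
```

The results are identical either way; it's only the set of hidden classes observed at the load site that grows.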
This behavior tends to be particularly problematic in codebases with a lot of class inheritance, because loading a field defined by a base class is often a megamorphic operation, even if that base class's constructor always sets the same properties in the same order.
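A sketch of that inheritance pattern (class names invented; the commonly cited V8 limit is four hidden classes at one load site before feedback goes megamorphic, though that threshold is an internal detail and could change):

```javascript
class Base {
  constructor() { this.id = 1; } // Base always sets `id` first, in the same order...
}
// ...but each subclass still produces its own distinct hidden class.
class A extends Base { constructor() { super(); this.x = 0; } }
class B extends Base { constructor() { super(); this.y = 0; } }
class C extends Base { constructor() { super(); this.z = 0; } }
class D extends Base { constructor() { super(); this.w = 0; } }
class E extends Base { constructor() { super(); this.v = 0; } }

function getId(o) {
  return o.id; // sees five hidden classes, so this load becomes megamorphic
}

for (const obj of [new A(), new B(), new C(), new D(), new E()]) {
  getId(obj);
}
```

Even though `id` is always the base class's first property, V8 tracks the hidden class of the whole object, not just the fields the base class defined.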
2. Why would polymorphic code optimized by turbofan be a full 3x slower than unoptimized bytecode?
It seems that you may be misunderstanding the somewhat cryptic output from --trace-opt. In particular, OSR means
on-stack replacement. Copying some text from that link:
When V8 identifies that certain functions are hot it marks them for optimization on the next call. When the function executes again, V8 compiles the function using the optimizing compiler and starts using the optimized code from the subsequent call. However, for functions with long running loops this is not sufficient. V8 uses a technique called on-stack replacement (OSR) to install optimized code for the currently executing function. This allows us to start using the optimized code during the first execution of the function, while it is stuck in a hot loop.
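For reference, here's my guess at the rough shape of the benchmark (the names sum, array1, and array2 come from your trace; the sizes and contents are assumptions on my part — in particular I'm guessing array2 ends with a few two-property objects, which would explain the "last four items" bailouts below):

```javascript
function sum(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) {
    total += arr[i].a; // the property load whose feedback state matters
  }
  return total;
}

const N = 1e6; // presumably 1e8 in the real benchmark
const array1 = Array.from({ length: N }, () => ({ a: 1 }));

// Guess: array2 is the same, plus four {a, b} objects at the end.
const array2 = array1.slice();
for (let i = 0; i < 4; i++) array2.push({ a: 1, b: 2 });

console.time("array1");
sum(array1);
console.timeEnd("array1");

console.time("array2");
sum(array2);
console.timeEnd("array2");

// Run with: node --trace-opt bench.js
// To stay in the interpreter throughout: node --no-opt bench.js
```

Because the loop body runs millions of times within a single call, V8 doesn't wait for the next call to use optimized code; it swaps it in mid-loop via OSR.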
Iterating through an array of 100 million items certainly counts as a "hot loop", so the vast majority of the time in all of your measurements is spent in optimized code produced by Turbofan, not in the interpreter. You can try running unoptimized code by passing the command-line flag --no-opt, which I expect will go much more slowly than what you've measured thus far. I've added some possibly more human-readable annotations to the output you provided:
# This call started in the interpreter, but was replaced by optimized code while running (this process is referred to as OSR).
[compiling method 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) using TurboFan OSR]
[optimizing 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) - took 0.000, 0.541, 0.000 ms]
array1: 115.701ms
# This call reused the OSR code from the first call.
[found optimized code for 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) at OSR bytecode offset 35]
array1: 113.721ms
# At this point, the function got compiled normally (not using OSR), so future calls will use this optimized code.
[compiling method 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) using TurboFan]
[optimizing 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) - took 0.000, 0.500, 0.041 ms]
# These three calls used fully optimized code.
array1: 80.069ms
array1: 79.72ms
array1: 79.245ms
# This call mostly used optimized code, until it bailed out to the interpreter for the last four items.
array2: 78.906ms
# This call reused the OSR code from the very first call. This is surprising to me; I didn't realize that the OSR code was still available at this point, after the non-OSR version of the function has bailed out. However, it seems to work nicely in this case. Once again, it bailed out to the interpreter for the last four items.
[found optimized code for 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) at OSR bytecode offset 35]
array2: 112.758ms
# At this point, the function got compiled normally again, so future calls will use this optimized code.
[compiling method 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) using TurboFan]
[optimizing 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) - took 0.000, 0.500, 0.042 ms]
# These three calls used that newly compiled version of the code, which uses a megamorphic load.
array2: 350.273ms
array2: 351.822ms
array2: 357.311ms
In closing, I'll just echo Ryan: "JS perf is extremely hard to reason about".
Best,
Seth