Polymorphism slower than bytecode?


Conrad Buck

May 20, 2022, 1:11:37 PM
to v8-dev
I'm looking at a perf example shared by Ryan Cavanaugh of Typescript, and I'm very much failing to understand what is happening and why. The particular contradictions upset my entire mental model of how to write performant javascript. What's going on internally?

Here is the example: https://gist.github.com/conartist6/642dcfbd6fa444da92f211bcb405692b

The two specific things I don't understand are:

1. If I have this code:

```js
function getA(o) {
  // Why would this property access be polymorphic?
  // Isn't the offset for the `a` property always the same?
  return o.a;
}
getA({ a: 0 })
getA({ a: 0, b: 1 })
```

2. Why would polymorphic code optimized by turbofan be a full 3x slower than unoptimized bytecode?

seth.b...@microsoft.com

May 20, 2022, 4:07:42 PM
to v8-dev
Hi Conrad,

I'll make an attempt at answering, though I'm not an expert on OSR, so others like Jakob or Mythri may have more precise answers.

1. Why would this property access be polymorphic?

If you're talking about polymorphic access, you're probably familiar with hidden classes (also called "object shapes" or "Maps but not those Maps"). Regardless, I'll include a link: the most complete and accurate description I've found of how hidden classes work in V8 is Fast properties in V8.

In this example, the object {a: 0} has a hidden class which says it has one property named "a". The object {a: 0, b:1} has a different hidden class which says it has two properties, named "a" and "b". After the function getA has performed a lookup to get property "a" from {a: 0}, it keeps a pointer to that object's hidden class and another value indicating where the property can be found (for this case, the first in-object property slot). When running getA({a:0, b:1}), V8 checks whether the object {a:0, b:1} has the same hidden class as {a: 0}, which it does not. So V8 remembers a second pair of values: the hidden class for {a:0, b:1} and where its "a" property can be found (also the first in-object property slot). Now the feedback state for that load operation is polymorphic. The "mono" in monomorphic refers to a single hidden class, not to a single result of where the property can be found.
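That feedback mechanism can be sketched as a toy model in JavaScript (purely illustrative: the names `makeLoadIC` and `shapeOf` are invented here, and V8's real inline caches are generated machine code keyed on actual hidden-class pointers, not strings):

```js
// Toy model of a load inline cache. Each feedback entry pairs a "shape"
// (a stand-in for a hidden class) with the slot where the property lives
// for objects of that shape.
function shapeOf(o) {
  return Object.keys(o).join(','); // crude stand-in for a hidden class
}

function makeLoadIC(prop) {
  const feedback = []; // list of { shape, slot } pairs
  return function load(o) {
    const shape = shapeOf(o);
    let entry = feedback.find((e) => e.shape === shape);
    if (!entry) {
      // Cache miss: do the slow lookup once, then remember shape -> slot.
      entry = { shape, slot: Object.keys(o).indexOf(prop) };
      feedback.push(entry);
    }
    // One shape seen = monomorphic; a few = polymorphic; many = megamorphic.
    const state = feedback.length === 1 ? 'monomorphic'
      : feedback.length <= 4 ? 'polymorphic'
      : 'megamorphic';
    return { value: o[prop], state };
  };
}

const getA = makeLoadIC('a');
console.log(getA({ a: 0 }).state);       // 'monomorphic'
console.log(getA({ a: 0, b: 1 }).state); // 'polymorphic': new shape, same slot
```

Note that both entries end up with `slot` 0, which is exactly the situation in the question: the feedback is polymorphic even though every shape agrees on where `a` lives.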

This behavior tends to be particularly problematic in codebases with a lot of class inheritance, because loading a field defined by a base class is often a megamorphic operation, even if that base class's constructor always sets the same properties in the same order.
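A hedged sketch of that inheritance pitfall (class names invented; the megamorphic claim is about what V8's feedback would typically record, since each constructor gives its instances a distinct hidden class):

```js
// Every subclass gives its instances a different hidden class, even though
// Base's constructor always sets `id` first, in the same slot.
class Base {
  constructor(id) { this.id = id; }
}
class A extends Base { constructor() { super(1); this.a = 0; } }
class B extends Base { constructor() { super(2); this.b = 0; } }
class C extends Base { constructor() { super(3); this.c = 0; } }
class D extends Base { constructor() { super(4); this.d = 0; } }
class E extends Base { constructor() { super(5); this.e = 0; } }

// This load sees five distinct hidden classes, so its feedback would
// typically go megamorphic, despite `id` living in the same place each time.
function getId(o) { return o.id; }

let sum = 0;
for (const o of [new A(), new B(), new C(), new D(), new E()]) {
  sum += getId(o);
}
console.log(sum); // 15
```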

2. Why would polymorphic code optimized by turbofan be a full 3x slower than unoptimized bytecode?

It seems that you may be misunderstanding the somewhat cryptic output from --trace-opt. In particular, OSR means on-stack replacement. Copying some text from that link:

When V8 identifies that certain functions are hot it marks them for optimization on the next call. When the function executes again, V8 compiles the function using the optimizing compiler and starts using the optimized code from the subsequent call. However, for functions with long running loops this is not sufficient. V8 uses a technique called on-stack replacement (OSR) to install optimized code for the currently executing function. This allows us to start using the optimized code during the first execution of the function, while it is stuck in a hot loop.

Iterating through an array of 100 million items certainly counts as a "hot loop", so the vast majority of the time in all of your measurements is spent in optimized code produced by Turbofan, not in the interpreter. You can try running unoptimized code by passing the command-line flag --no-opt, which I expect will go much more slowly than what you've measured thus far. I've added some possibly more human-readable annotations to the output you provided:

# This call started in the interpreter, but was replaced by optimized code while running (this process is referred to as OSR).
[compiling method 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) using TurboFan OSR]
[optimizing 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) - took 0.000, 0.541, 0.000 ms]
array1: 115.701ms

# This call reused the OSR code from the first call.
[found optimized code for 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) at OSR bytecode offset 35]
array1: 113.721ms

# At this point, the function got compiled normally (not using OSR), so future calls will use this optimized code.
[compiling method 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) using TurboFan]
[optimizing 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) - took 0.000, 0.500, 0.041 ms]

# These three calls used fully optimized code.
array1: 80.069ms
array1: 79.72ms
array1: 79.245ms

# This call mostly used optimized code, until it bailed out to the interpreter for the last four items.
array2: 78.906ms

# This call reused the OSR code from the very first call. This is surprising to me; I didn't realize that the OSR code was still available at this point, after the non-OSR version of the function has bailed out. However, it seems to work nicely in this case. Once again, it bailed out to the interpreter for the last four items.
[found optimized code for 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) at OSR bytecode offset 35]
array2: 112.758ms

# At this point, the function got compiled normally again, so future calls will use this optimized code.
[compiling method 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) using TurboFan]
[optimizing 0x3b21df5b9ad9 <JSFunction sum (sfi = 0x10c4d4712831)> (target TURBOFAN) - took 0.000, 0.500, 0.042 ms]

# These three calls used that newly compiled version of the code, which uses a megamorphic load.
array2: 350.273ms
array2: 351.822ms
array2: 357.311ms
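For reference, a minimal benchmark in the spirit of the linked gist can reproduce this kind of trace (the gist's actual code may differ; `sum`, `array1`, and the array size here are assumptions). Run it with `node --trace-opt` to see the OSR and compilation messages, or with `node --no-opt` to measure pure interpreter speed:

```js
// Hot-loop benchmark: OSR should kick in during the first call.
function sum(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) {
    total += arr[i].a;
  }
  return total;
}

const N = 1e6; // the gist uses a much larger array; smaller here for speed
const array1 = Array.from({ length: N }, () => ({ a: 1 }));

for (let run = 0; run < 3; run++) {
  console.time('array1');
  sum(array1);
  console.timeEnd('array1');
}
```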


In closing, I'll just echo Ryan: "JS perf is extremely hard to reason about".

Best,
Seth

Jakob Kummerow

May 20, 2022, 5:05:16 PM
to v8-dev
That's an excellent answer, Seth.


Conrad Buck

May 22, 2022, 8:36:59 AM
to v8-dev
Thanks for taking the time to write out such a detailed and thoughtful answer. I knew some parts of that like maps/shapes and transitions, but not other parts like OSR.

I'm still left with my fundamental question. Given that there is a transition tree for maps, why does the engine not realize that there is one map (for objects with a single property a) that is valid for both objects? In other words, why does the engine cache the most specific applicable map instead of the most generally applicable map in this case?

Conrad Buck

May 22, 2022, 10:12:29 AM
to v8-dev
Maybe it would be wise to keep my question less specific. What I really want to know is: is there any possible way to define the conditions for the optimization so that they do not break down when properties are added? The alternative is rather gruesome to think about -- that it is not safe to extend classes or otherwise use objects that do not have identical transition trees in perf-sensitive code. I'm currently writing a parser, and to work around that limitation I'd need to write this code!

```js
{
  type: 'NodeType',
  prop1: ...,
  prop2: ...,
  prop3: ...,
  prop4: ...,
  prop5: ...,
  // hope more than 6 properties are never needed
}

buildIfStatement(test, consequent, alternate) {
  return {
    type 'IfStatement',
    prop1: test,
    prop2: consequent,
    prop3: alternate,
    prop4: null,
    prop5: null,
  }
}
```

Conrad Buck

May 22, 2022, 10:28:51 AM
to v8-dev
Sorry that example code was gibberish. I'm so used to being able to edit posts I make (and use syntax highlighting) that I'm very sloppy. The correct example code was:

```js
function buildIfStatement(test, consequent, alternate) {
  return {
    type: 'IfStatement',
    prop1: test,
    prop2: consequent,
    prop3: alternate,
    prop4: null,
    prop5: null,
  }
}
```

But actually I think it was utterly pointless code, as it misunderstands the real impact of the problem. AST-handling code tends to look like:

```js
function print(node) {
  switch (node.type) { // <-- only this access is needlessly megamorphic
    case 'IfStatement': return 'if (' + node.test + ')' + ... // <-- this access is monomorphic
  }
}
```

Thus, while there is megamorphism, it is not likely to dominate the total cost of real execution.

seth.b...@microsoft.com

May 23, 2022, 12:45:57 PM
to v8-dev
>  Given that there is a transition tree for maps, why does the engine not realize that there is one map (for objects with a single property a) that is valid for both objects? In other words, why does the engine cache the most specific applicable map instead of the most generally applicable map in this case?

Checking whether the object's map exactly matches some pointer is very fast. Traversing the transition tree to find out whether the object's map derives from some other map could be very slow, depending on how many layers of tree must be loaded along the way. Unfortunately, there's currently no simple and fast way to detect that a map is a member of a large group of maps that have some matching properties.
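The cost difference can be sketched like so (a toy model with invented names; V8's real maps and transition trees are C++ data structures):

```js
// Toy transition tree: each map records the map it transitioned from.
const mapEmpty = { parent: null, props: [] };
const mapA = { parent: mapEmpty, props: ['a'] };
const mapAB = { parent: mapA, props: ['a', 'b'] };

// Fast path: what a monomorphic IC actually does -- one identity comparison.
function icHit(objectMap, cachedMap) {
  return objectMap === cachedMap;
}

// Hypothetical slow alternative: check whether `ancestor` is on the object's
// transition chain. Cost grows with the depth of the chain.
function derivesFrom(objectMap, ancestor) {
  for (let m = objectMap; m !== null; m = m.parent) {
    if (m === ancestor) return true;
  }
  return false;
}

console.log(icHit(mapAB, mapA));       // false: different maps, IC misses
console.log(derivesFrom(mapAB, mapA)); // true, but it needed a tree walk
```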

This behavior is indeed troubling, and occasionally I write design documents with wild ideas for improving the situation, like this one from a couple of months ago (which I never shared until now), or this other one from 2019. I know that other people have also given serious consideration to scenarios like the one you described, but the current behavior remains, mostly because the megamorphic stub cache is pretty good, so meaningful improvement is hard.

> I'm currently writing a parser, and to work around that limitation I'd need to write this code!

The code you wrote is one option; another option would be to put all type-specific data inside a nested object, like so:

```js
{
  nodeType: 'IfStatement',
  data: {
    test: ...,
    consequent: ...,
    alternate: ...
  }
}
```

That way, all Node objects match precisely and have just the two properties 'nodeType' and 'data'.
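A sketch of builders using that pattern (function and field names are illustrative, not from the original parser):

```js
// All per-type fields live in `data`, so every node's top-level shape is
// exactly { nodeType, data }.
function buildIfStatement(test, consequent, alternate) {
  return {
    nodeType: 'IfStatement',
    data: { test, consequent, alternate },
  };
}

function buildReturnStatement(argument) {
  return {
    nodeType: 'ReturnStatement',
    data: { argument },
  };
}

// Both nodes share one hidden class at the top level, so a switch on
// `node.nodeType` stays monomorphic; only `node.data` accesses vary by type.
const n1 = buildIfStatement('x > 0', 'f()', null);
const n2 = buildReturnStatement('x');
console.log(n1.nodeType, n2.nodeType); // IfStatement ReturnStatement
```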

> that example code was gibberish

Looked fine to me 🙂.

> Thus while there is megamorphism it is not likely to be dominant in the total costs of real execution.

That seems logical, but I'll plagiarize from Ryan again: "moreso than in other systems, you need to measure measure measure measure, and make sure your measurements are as near as possible to the real thing you're trying to build."

Best,
Seth