De-optimization of hot function when constructor adds methods directly to object


jMerliN

Jul 18, 2012, 9:04:35 PM
to v8-users
So I can't get my head around why this happens (I haven't dug through
v8's code to try to figure it out either), but this is really
inconsistent to me with how v8 constructs hidden classes in general.
The following is running in Node.js v0.8.2 (V8 v3.11.10.12).

Here's the code:
http://pastebin.com/2gKWrfHp

Here's the output, and the deopt trace:
http://pastebin.com/WerQuGLZ

Calling Foo.prototype.runTest with any Foo object gives similar
performance (unless you change the hidden class, as expected). Bar
deoptimizes, as expected: abc is stored on the prototype and isn't
actually on the constructed object until the first call, so the
optimized function (which only gets hot after the object has already
changed hidden class) bails out on the next attempt with a new Bar
object.

It gets weird with Foobar. There, test is added directly to the object;
the only difference is that it is a function, not a primitive, so it
seems like the hidden classes of objects from Foobar's constructor
should be the same. The first run is performant, equivalent to Foo
(as expected), but running the test again with a new Foobar
deoptimizes it. I can't understand why.
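
Since the pastebin code isn't inlined here, the following is a minimal sketch of what the three constructors described above might look like. It is a hypothetical reconstruction from the description, not the pastebin contents; the method bodies and the loop-count parameter n are assumptions.

```javascript
// Hypothetical reconstruction of the three cases; not the original pastebin code.

// Foo: abc is an own property set in the constructor.
function Foo() {
  this.abc = 0;
}
Foo.prototype.runTest = function (n) {
  for (let i = 0; i < n; i++) this.abc++;
  return this.abc;
};

// Bar: abc lives on the prototype until the first write creates an own property.
function Bar() {}
Bar.prototype.abc = 0;
Bar.prototype.runTest = function (n) {
  for (let i = 0; i < n; i++) this.abc++;
  return this.abc;
};

// Foobar: test is a function created per instance inside the constructor.
function Foobar() {
  this.abc = 0;
  this.test = function () { this.abc++; };
}
Foobar.prototype.runTest = function (n) {
  for (let i = 0; i < n; i++) this.test();
  return this.abc;
};
```

In this sketch, a Bar instance only gains its own abc property once runTest has written to it, which is the hidden class change mentioned above.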

Thanks,
Justin

jMerliN

Jul 18, 2012, 9:26:56 PM
to v8-users
Additionally, I understand that (new Foobar()).test !== (new
Foobar()).test, so these functions are created fresh for each
instance. But if you add a constructor parameter that initializes 'abc'
to a number, a differently valued SMI on the object doesn't change the
hidden class, so why does a different function? They're both Function
objects, so that seems weird.

A potentially related question is why functions are treated so weirdly
in objects depending on how they're added. For instance:

var z = {test: function () {}};
z.test2 = function () {};

var i;
console.time('test speed');
for (i = 0; i < 10000000; i++) {
  z.test();
}
console.timeEnd('test speed');
console.time('test2 speed');
for (i = 0; i < 10000000; i++) {
  z.test2();
}
console.timeEnd('test2 speed');

Result:

test speed: 99ms
test2 speed: 21ms

- Justin

Vyacheslav Egorov

Jul 19, 2012, 5:27:11 AM
to v8-u...@googlegroups.com
Hi Justin,

V8's hidden classes are not limited to tracking the fields you assign to
an object; V8 also tries to capture the methods you assign (just as
classes in any object-oriented language capture both data and behavior).

That is why the first and second objects produced by Foobar will have
different hidden classes: they have different methods.
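
A toy model may make this concrete. It is not V8's implementation: `ShapeNode`, `shapeOf`, and the transition-key scheme are hypothetical names and simplifications. The sketch assumes only what is stated above, namely that a method's identity becomes part of the hidden class while a plain field's value does not.

```javascript
// Toy model only; real V8 maps and transitions are far more involved.
class ShapeNode {
  constructor() {
    this.transitions = new Map(); // property name -> (key -> next ShapeNode)
  }

  // Fields transition on the name alone; functions transition on name + identity.
  transition(name, value) {
    let byName = this.transitions.get(name);
    if (!byName) {
      byName = new Map();
      this.transitions.set(name, byName);
    }
    const key = typeof value === "function" ? value : "<field>";
    let next = byName.get(key);
    if (!next) {
      next = new ShapeNode();
      byName.set(key, next);
    }
    return next;
  }
}

const rootShape = new ShapeNode();

// Walks the transitions for a list of [name, value] pairs, in assignment order.
function shapeOf(props) {
  return props.reduce((shape, [name, value]) => shape.transition(name, value), rootShape);
}

const sharedFn = function () {};
const sameForNumbers = shapeOf([["abc", 1]]) === shapeOf([["abc", 2]]);
const sameForFreshFns = shapeOf([["test", function () {}]]) === shapeOf([["test", function () {}]]);
const sameForSharedFn = shapeOf([["test", sharedFn]]) === shapeOf([["test", sharedFn]]);

console.log(sameForNumbers);  // true: different numbers, same shape
console.log(sameForFreshFns); // false: a new closure each time, different shapes
console.log(sameForSharedFn); // true: the same function object, same shape
```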

As to your second question: they are not treated differently. If you
rewrite your test like this:

var z = {test: function () {}};
z.test2 = function () {};

function foo(z) {
  var i;
  console.time('test speed');
  for (i = 0; i < 10000000; i++) z.test();
  console.timeEnd('test speed');
  console.time('test2 speed');
  for (i = 0; i < 10000000; i++) z.test2();
  console.timeEnd('test2 speed');
}

foo(z);
foo(z);

You will see something like:

test speed: 38ms
test2 speed: 12ms
test speed: 11ms
test2 speed: 11ms

The truth is that V8 optimizes the code while the first loop is still _running_
(this is called On-Stack Replacement, aka OSR). So the first "test speed"
measurement is a sum of the time spent in unoptimized code, in the compiler,
and in optimized code, while the first "test2 speed" measurement is purely time
spent in optimized code. If you call the same code a second time, you see
timings purely for optimized code. This is why benchmarks
should always contain a warm-up phase to let the optimizing JIT kick in.
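
A warm-up phase can be sketched roughly as follows. The helper name `timeWithWarmup`, the number of warm-up runs, and the reduced iteration count are assumptions chosen for illustration; nothing here guarantees when (or whether) a given engine version optimizes the code.

```javascript
// Hypothetical helper: runs fn a few times untimed, then times one more run.
function timeWithWarmup(label, fn, warmupRuns = 3) {
  for (let i = 0; i < warmupRuns; i++) fn(); // give the optimizing compiler a chance to kick in
  const start = Date.now();
  fn();
  const elapsedMs = Date.now() - start;
  console.log(label + ": " + elapsedMs + "ms");
  return elapsedMs;
}

const ITERATIONS = 1000000; // smaller than the 10000000 used in the thread, to keep the sketch quick

const z = {test: function () {}};
z.test2 = function () {};

function callTest() {
  for (let i = 0; i < ITERATIONS; i++) z.test();
}

function callTest2() {
  for (let i = 0; i < ITERATIONS; i++) z.test2();
}

const testMs = timeWithWarmup("test speed", callTest);
const test2Ms = timeWithWarmup("test2 speed", callTest2);
```

Only the final run of each function is timed, so compile time and time spent in unoptimized code are less likely to leak into the reported numbers.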

Hope this explains it.

--
Vyacheslav Egorov

Stephan Beal

Jul 19, 2012, 7:26:34 AM
to v8-u...@googlegroups.com
On Thu, Jul 19, 2012 at 3:04 AM, jMerliN <jme...@jmerlin.net> wrote:
> So I can't get my head around why this happens (I haven't dug through
> v8's code to try to figure it out either), but this is really
> inconsistent to me with how v8 constructs hidden classes in general.
> The following is running in Node.js v0.8.2 (V8 v3.11.10.12).

FWIW: v8's internal optimizations are internal implementation details, and any client code which optimizes specifically for them will, long term, require more maintenance and possibly have more bugs/regressions. It is, in general, poor practice for client code to assume that it knows ANYTHING with certainty about a 3rd-party library beyond what is documented in the library's API docs... errr... well, okay, that might be the problem right there.

-- 
----- stephan beal
http://wanderinghorse.net/home/stephan/

jMerliN

Jul 19, 2012, 2:03:11 PM
to v8-users
Vyacheslav,

When I run the code you posted, I see a much bigger discrepancy
between test and test2 in the first pass, and in the second pass a
slight reduction in test's time but still a large discrepancy
(indicating OSR happened during the first loop the first time around),
similar to what I was seeing yesterday. That's running on Node.js,
though, and I haven't rebuilt Node.js against the latest stable V8
code; the issue is completely gone in the current nightly Canary build.

I think I better understand the method issue now. V8 actually treats
methods set on this. differently from other properties; the generated
assembly looks aggressively inlined. If you cheat and set this.test
to a number and then to the method, it effectively disables those
optimizations in V8 and the object ends up being treated as a normal
object. Even though that doesn't cause deoptimizations (all objects
have the same hidden class), it's significantly slower than the
inlined method call. The real issue in my example is that test is
per-object and runTest is static, if runTest was assigned via this., it
should only ever see one hidden class, unless you do something evil
like .apply.

Though this test seems to indicate that this only occurs when building
the hidden class: http://pastebin.com/JbuLaEUt

Even though it never deoptimizes, I'd expect each of those to have
similar performance, but only the first Foobar created is performant.


On a related note, has there been any consideration for making v8 not
de-optimize when a hidden class is ancestral to another (and therefore
compatible)? I mean, if you have {a: 7, b: 7} and a really
hot loop that only touches a and b, and you then add a c property, the
object has transitioned from the hidden class that loop was optimized
for to a superset of it (with the same indices in its property access
table), so the hot function could keep assuming the {a, b} hidden class.
This is similar to how classical inheritance works (Foo extends Bar,
so functions that operate on Bar can also operate on Foo), but in this
case a hidden class transition yields a strict superset, which lets you
make really nice assumptions.
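
The scenario can be written out as a small sketch. The function name `sumAB` and the iteration count are assumptions for illustration; the comments about hidden classes restate the discussion in this thread rather than anything the code itself can observe.

```javascript
// Hot function that only touches a and b.
function sumAB(o) {
  return o.a + o.b;
}

// Warm the function up on objects that only ever have {a, b}.
const plain = {a: 7, b: 7};
let total = 0;
for (let i = 0; i < 100000; i++) {
  total += sumAB(plain);
}

// Same a and b, but adding c moves the object to a different hidden class.
const extended = {a: 7, b: 7};
extended.c = 1;

// As described in the thread, a check that compares hidden classes by pointer
// treats this object as incompatible, even though sumAB never reads c.
const viaExtended = sumAB(extended);

console.log(total);       // 1400000
console.log(viaExtended); // 14
```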

Vyacheslav Egorov

Jul 19, 2012, 4:00:25 PM
to v8-users
Knowing that you are running it in node.js, I can confirm that there is
indeed a difference between the test and test2 properties. The reason is
that we don't convert test to a CONSTANT_FUNCTION if the object literal
is not in global scope. This is a heuristic based on the assumption
that top-level code is executed once and non-top-level code many times
(thus the object literal would get a different map every time):
https://github.com/v8/v8/blob/master/src/parser.cc#L4272-4279 . In the
past we would not have made test2 a CONSTANT_FUNCTION either, because we
required the function to be in old space. I think we might want to change
this to make it consistent, and I've filed a bug
(https://code.google.com/p/v8/issues/detail?id=2246). node.js wraps module
bodies in an anonymous function, which is why the slowdown is not
reproducible in Chrome or the d8 shell unless the code is wrapped the same way:

(function () {
  var z = {test: function () {}};
  z.test2 = function () {};
  function foo(z) {
    var i;
    console.time('test speed');
    for (i = 0; i < 10000000; i++) z.test();
    console.timeEnd('test speed');
    console.time('test2 speed');
    for (i = 0; i < 10000000; i++) z.test2();
    console.timeEnd('test2 speed');
  }

  foo(z);
  foo(z);
})();

> The real issue in my example is that test is per-
> object and runTest is static, if runTest was assigned via this., it
> should only ever see one hidden class, unless you do something evil
> like .apply.

This will not help, because type feedback is currently shared between
all instances of the same function literal: V8 mostly gets type
feedback from IC stubs that are referenced by inline caches in
unoptimized code, and the unoptimized code object is the same for any
closure created from the same function literal.
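
To illustrate the setup being discussed, here is a sketch; the constructor body is an assumption, since the original pastebin code is not shown. Assigning runTest via this. creates a fresh closure per instance, but every one of those closures comes from the same function literal. Per the explanation above, they share one unoptimized code object and therefore one set of type feedback, which is internal to V8 and cannot be observed from this code.

```javascript
// Hypothetical variant of Foobar where runTest is also assigned via this.
function Foobar() {
  this.test = function () {};
  this.runTest = function () {
    this.test(); // one call site in one literal, shared by every closure made from it
  };
}

const first = new Foobar();
const second = new Foobar();

// Each instance gets its own closure object...
const distinctClosures = first.runTest !== second.runTest;
// ...but both closures were created from the same source literal.
const sameLiteralSource = first.runTest.toString() === second.runTest.toString();

console.log(distinctClosures);  // true
console.log(sameLiteralSource); // true
```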

> On a related note, has there been any consideration for making v8 not
> de-optimize when a hidden class is ancestral to another (and therefore
> compatible)?

This would be great, but there is no easy way to check that two hidden
classes are compatible. Hidden classes are currently compared by
pointer equivalence, which boils down to two instructions (compare and
jump). Checking for inheritance would lead to pretty complicated
code. The most efficient way, it seems, to implement such a check is
to record the transition path in every map and then check whether a fixed
position in the transition path is equal to a fixed map. This is much more
complex and I am not sure it benefits any real-world code.
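
The check described above can be sketched as a toy model. `PathMap`, `extend`, and `isCompatible` are hypothetical names, not V8 internals, and a real implementation would store the path far more compactly. The sketch assumes only what the paragraph states: each map records its transition path, and a check compares a fixed position in that path against a fixed map.

```javascript
// Toy model: each map stores the full transition path from the root map to itself.
class PathMap {
  constructor(parent, property) {
    this.property = property;
    this.path = parent ? parent.path.concat(this) : [this];
  }

  // Adding a property yields a new map one step further along the path.
  extend(property) {
    return new PathMap(this, property);
  }
}

// The expected map sits at a fixed depth; a candidate is compatible when
// its own path holds that exact map at the same depth.
function isCompatible(candidate, expected) {
  const depth = expected.path.length - 1;
  return candidate.path[depth] === expected;
}

const rootMap = new PathMap(null, null);
const mapA = rootMap.extend("a");
const mapAB = mapA.extend("b");
const mapABC = mapAB.extend("c");
const mapAC = mapA.extend("c");

console.log(isCompatible(mapAB, mapAB));  // true: identical map
console.log(isCompatible(mapABC, mapAB)); // true: {a, b, c} was reached through {a, b}
console.log(isCompatible(mapAC, mapAB));  // false: {a, c} took a different branch
console.log(isCompatible(mapA, mapAB));   // false: {a} is shorter than the expected path
```

Compared with a plain pointer comparison, this needs an extra load of the path entry before the compare, which is part of the added complexity mentioned above.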

--
Vyacheslav Egorov

jMerliN

Jul 19, 2012, 6:41:55 PM
to v8-users
> This would be great, but there is no easy way to check that two hidden
> classes are compatible. Hidden classes are currently compared by
> pointer equivalence, which boils down to two instructions (compare and
> jump). Checking for inheritance would lead to pretty complicated
> code. The most efficient way, it seems, to implement such a check is
> to record the transition path in every map and then check whether a fixed
> position in the transition path is equal to a fixed map. This is much more
> complex and I am not sure it benefits any real-world code.

I'll try to find a good real-world example of where this causes
violent deops from common practices. I've seen it done quite a few
times.

If there are only 25-35 allowable properties in a klass, you can
potentially make a really fast check for this. If you store pointers
to the klasses in a contiguous array such that higher indices are
always superklass pointers of lower indices (regardless of
transition), you can determine compatibility with 2 cmps (one compat,
one bounds checking). You could still do the normal cmp/jmp into
optimized code, but if the cmp fails (not equal), you can do 2 more
cmps (if > optimized-for-klass and < end of block) to determine if
this is a parent klass, and if so you can jmp to the optimized code
and only if those cmps fail do you deoptimize.

The downside is that the generated optimized code would need to
dereference once just to get the klass pointer, adding a few extra
cycles to each optimized IC. Though I suppose you could move
that code out and do the actual klass pointer equivalence cmp first; if
that fails, go back to this block and do a bounds check, and if it's a
parent, jmp into the optimized code keeping the klass pointer. That
pushes the extra work into the case where the klass pointers aren't
equivalent but are compatible (which should be rare). Storing those
compat blocks would add memory overhead, and the non-monomorphic
check could prevent a deoptimization at the cost of a few more
instructions. It shouldn't reduce performance, though.

You could also potentially partition such a compat block structure so
as to minimize the number of pointers needed to do a reasonable job of
guarding against deoptimization from extended objects.
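
As a rough model of the range check proposed above, array indices can stand in for klass pointers laid out in one contiguous block. The names `block`, `blockEnd`, and `canUseOptimizedCode` are hypothetical, and the sketch assumes a single linear chain of transitions; how a branching transition tree would be laid out is not specified in this post.

```javascript
// Toy model: indices stand in for klass pointers in one contiguous block.
// Higher indices extend lower ones, so every later entry is a superset of the earlier ones.
const block = ["{a}", "{a, b}", "{a, b, c}", "{a, b, c, d}"];
const blockEnd = block.length;

function canUseOptimizedCode(klassIndex, optimizedForIndex) {
  // Fast path: the usual equality compare and jump.
  if (klassIndex === optimizedForIndex) return true;
  // Slow path: two more compares before giving up and deoptimizing.
  return klassIndex > optimizedForIndex && klassIndex < blockEnd;
}

console.log(canUseOptimizedCode(1, 1)); // true: exact match
console.log(canUseOptimizedCode(2, 1)); // true: {a, b, c} extends {a, b}
console.log(canUseOptimizedCode(0, 1)); // false: {a} lacks b
console.log(canUseOptimizedCode(4, 1)); // false: outside the block
```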


Vyacheslav Egorov

Jul 20, 2012, 7:05:21 AM
to v8-u...@googlegroups.com
> If there are only 25-35 allowable properties in a klass, you can
> potentially make a really fast check for this.

Yep, I know. You are describing exactly what I described above, just
in different words :-) It's an old and well-known way to implement
inheritance checks in single-inheritance languages (at least Oberon
compilers used it back in the 80s).

--
Vyacheslav Egorov