mixed calling convention is a little more complex, but debuggability is
improved a lot. not sure about impact on performance.
how about
s.up__.up__.a
is there a reason to use numeric slots instead of symbolic slots?
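to make the numeric-vs-symbolic question concrete, here's a sketch of the two frame layouts being discussed. only `up__` and the `s.up__.up__.a` / `a[0][0][1]` access shapes come from this thread; the surrounding frame structure and the slot-index constants are my own illustration, not actual Cajita translator output.

```javascript
// Symbolic slots: each translated frame is an object whose parent link
// and variables keep their names, so s.up__.up__.a reads naturally in
// a debugger.
var symbolicFrame = {
  up__: {
    up__: { a: 42 }  // grandparent frame holding variable `a`
  }
};

// Numeric slots: the same frames flattened to arrays; slot 0 is the
// parent link, later slots are variables, so `a` becomes a bare index.
var UP = 0, A = 1;
var numericFrame = [[[null, 42]]];

console.log(symbolicFrame.up__.up__.a);  // 42
console.log(numericFrame[UP][UP][A]);    // 42
```

the debuggability difference is visible right in the access expressions: the symbolic form names the variable, the numeric form is an opaque chain of indices unless you keep the index constants around.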
ok, for me, speed beats debuggability, especially since cajita programs
are runnable without translation.
maybe add a few sentences about this to the proposal?
no wait, I take that back. the typical use of cajita will be within a
complex container, and it might be hard to run the cajita code usefully
outside the container. debuggability beats performance. hm.
I guess it isn't too hard to keep the names of things as comments near
the translated code; that'll help some. but it will still be hard to
read objects in object tree browsers, which can be solved by emitting
debug symbols and teaching firebug about them.
I'm just wary of that complexity. symbolic slots is potentially more
usable more quickly.
debuggability beats performance.
in IE7, the speed of numeric vs symbolic slots is indistinguishable.
in IE8b1, numeric wins slightly but it's still mostly indistinguishable.
in opera 9.51, symbolic slots are faster.
in nightly firefox and nightly webkit, numeric slots are faster.
IE is the most important browser, and it evolves slowly.
in the other browsers, performance is a moving target and is changing
rapidly. optimizing for their current performance characteristics is a
mistake. the other browsers are likely to improve their runtimes to match
opera, since making these basic operations faster benefits everyone, not
just cajita.
therefore, performance of these micro-operations is a red herring.
symbolic slots beat numeric slots, due to ease of implementation and
ease of debugging.
does that seem sensible?
here are the performance numbers I got:
ie7 (in a vm):
readGrandparentNumerically - 1250 μs
writeGrandparentNumerically - 1095 μs
readGrandparentSymbolically - 1170 μs
writeGrandparentSymbolically - 1095 μs
writeNewArraySlot - 1640 μs
writeNewObjectSlot - 1875 μs
new3ElementArray - 18905 μs
new3MemberObject - 19220 μs
ie8 (in a vm):
readGrandparentNumerically - 1095 μs
writeGrandparentNumerically - 1015 μs
readGrandparentSymbolically - 1175 μs
writeGrandparentSymbolically - 1015 μs
writeNewArraySlot - 1325 μs
writeNewObjectSlot - 1800 μs
new3ElementArray - 17110 μs
new3MemberObject - 17655 μs
opera 9.51:
readGrandparentNumerically - 227.5 μs
writeGrandparentNumerically - 302.5 μs
readGrandparentSymbolically - 275 μs
writeGrandparentSymbolically - 205 μs
writeNewArraySlot - 315 μs
writeNewObjectSlot - 295 μs
new3ElementArray - 1362.5 μs
new3MemberObject - 1095 μs
nightly firefox:
readGrandparentNumerically - 647.5 μs
writeGrandparentNumerically - 577.5 μs
readGrandparentSymbolically - 705 μs
writeGrandparentSymbolically - 620 μs
writeNewArraySlot - 675 μs
writeNewObjectSlot - 1392.5 μs
new3ElementArray - 1762.5 μs
new3MemberObject - 3912.5 μs
nightly webkit:
readGrandparentNumerically - 135 μs
writeGrandparentNumerically - 110 μs
readGrandparentSymbolically - 162.5 μs
writeGrandparentSymbolically - 157.5 μs
writeNewArraySlot - 77.5 μs
writeNewObjectSlot - 1125 μs
new3ElementArray - 1317.5 μs
new3MemberObject - 1562.5 μs
my earlier benchmarking gets different numbers than yours, because yours
looks like
for (k = 0; k < n; ++k) { t = a[0][0][1]; }
and mine looks like:
for (k = 0; k < n; ++k) { a[0]; a[1]; a[3]; a[4]; }
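the two loop shapes above measure different things: one chained grandparent read per iteration versus four independent shallow reads. here's a hedged sketch of both shapes in a runnable harness; the timing scaffolding is mine, not the code that produced the numbers above, so absolute values won't match.

```javascript
// Minimal timing harness (illustrative only, not the original benchmark).
function time(label, n, body) {
  var start = Date.now();
  for (var k = 0; k < n; ++k) body();
  console.log(label + ': ' + (Date.now() - start) + ' ms');
}

var a = [[[0, 1]], 2, 3, 4, 5];
var t;

// Shape 1: one deep chained read per iteration (grandparent access).
time('chained read', 1e5, function () { t = a[0][0][1]; });

// Shape 2: four independent shallow reads per iteration.
time('shallow reads', 1e5, function () { a[0]; a[1]; a[3]; a[4]; });
```

the chained read depends on each previous lookup completing, while the shallow reads are independent and can hit a property cache separately, so the two shapes can rank engines differently.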
I'm confused as to why you are expecting this to reduce overall allocation.
Doesn't it end up allocating a bunch of array objects that would not
otherwise have been needed? Note that these can't be deallocated immediately
when a stack frame exits, since they might be captured in closures created
by the function. I.e. you're essentially using a spaghetti heap. That can
be a perfectly reasonable approach *if* the whole language implementation
is designed around it, but existing JS implementations are not. You're also
replacing local parameter accesses with array accesses, which should be
less efficient.
--
David-Sarah Hopwood
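the spaghetti-heap concern can be illustrated with a small sketch. the array-frame translation here is my own guess at the scheme under discussion, not actual translator output: once a closure captures the frame, the "stack frame" escapes to the heap and can only be reclaimed by the GC.

```javascript
// A function whose frame is heap-allocated as an array:
// slot 0 = parent frame link, slot 1 = local `count`.
function makeCounter() {
  var frame = [null, 0];
  // The returned closure captures `frame`, so the frame array cannot
  // be freed when makeCounter returns; it lives until the closure dies.
  return function () { return ++frame[1]; };
}

var next = makeCounter();
console.log(next()); // 1
console.log(next()); // 2
```

an engine would need escape analysis to see that frames *not* captured this way follow stack discipline and could be stack-allocated; without it, every call pays a heap allocation.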
I think you need to concentrate on IE. for the other browsers, it seems
more useful to put the effort into improving the underlying js engine,
especially if cajita.eval() becomes a native function.
Yes they do. Existing JS interpreters will not be able to see that the
use of these array objects follows stack discipline. They will just leave
it to the GC, and so the GC will run more often.
(It's not a difficult optimization -- just straightforward escape analysis
-- but JS interpreters are *painfully* unoptimized.)
> However, we are making a deal with the devil in some sense. We reduce the
> number of per-object allocations for objects that have a large number of
> closures. In return, we incur one heap allocation every time we call a
> function. Is this the tradeoff you are calling into question?
>
>> Note that these can't be deallocated immediately
>> when a stack frame exits, since they might be captured in closures created
>> by the function. I.e. you're essentially using a spaghetti heap.
>
> I assume "spaghetti heap" == allocating stack frames on the heap, aka a
> stackless interpreter?
Yes. Actually I meant to say "spaghetti stack":
<http://en.wikipedia.org/wiki/Spaghetti_stack>.
> Or am I missing something?
>
> In any case, yes, closures will pin the stack frames, which is precisely
> what we want. Is your point that the whole stack frame will be pinned,
> rather than simply a structure containing only those slots which the closure
> refers to?
That's another problem, yes: you will get potentially huge space leaks.
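the whole-frame-pinning leak can be sketched like this (my own illustration, not from the thread): the closure needs only one slot, but because it captures the entire frame array, every other local in the frame stays reachable too.

```javascript
// Frame: slot 0 = parent link, slot 1 = `label`, slot 2 = a large
// temporary that the returned closure never uses.
function makeLabeler(label) {
  var frame = [null, label, new Array(1000000).fill(0)];
  // Only frame[1] is needed, but the closure pins the whole array,
  // so the million-element temporary in frame[2] leaks until the
  // closure itself is collected.
  return function () { return 'label: ' + frame[1]; };
}

var f = makeLabeler('x');
console.log(f()); // "label: x"
```

per-variable capture (or nulling out dead slots at translation time) would avoid this, at the cost of more translator complexity.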
>> That can be a perfectly reasonable approach *if* the whole language
>> implementation is designed around it,
... for example, the escape analysis for stack frames mentioned above is
essential ...
>> but existing JS implementations are not.
>
> Yes, I think we are gambling that we can implement 10% of a VM inside the
> other 90% and get better performance. This is subject to benchmarking.
>
>> You're also replacing local parameter accesses with array accesses, which
>> should be less efficient.
>
> Well so I think we are trading time for space and hoping that we don't make
> things too bad.
I'm still extremely skeptical; I suspect that you will lose in both time and
space.
--
David-Sarah Hopwood