Generator bindings approaches

Jason Orendorff

May 13, 2020, 1:54:09 PM

to JS Internals list

Ted and I had a fun conversation just now about bindings in generators.
Jan, I'd like to know what you think about this.

Currently, in generators, all bindings—arguments, locals, etc.—are
marked as "aliased". It's slow. Two possible fixes:

1. LOCALS ON GENERATOR

Add bytecode instructions for getting and setting locals in a
generator. Just as non-aliased locals in normal functions are
optimized into stack slots, in generators optimize them into slots
on the GeneratorObject, where these special instructions can reach
them.

The expression stack remains as-is: on yield, copy it from the stack
to the generator; on resume, copy it back.
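
A rough sketch of approach 1 as a toy model, not SpiderMonkey code. Every name here (`ToyGenerator`, `ToyFrame`, `opGetGenLocal`, `opSetGenLocal`, `opYield`, `resume`) is hypothetical, and `Value` is just an integer. It only shows the shape of the idea: locals live in slots on the generator, and the expression stack is copied out on yield and back in on resume.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Toy model only: these names are hypothetical, not SpiderMonkey's.
using Value = int64_t;

struct ToyGenerator {
    std::vector<Value> localSlots;  // unaliased locals live here, not in the frame
    std::vector<Value> savedStack;  // expression stack, filled only while suspended
    explicit ToyGenerator(size_t nlocals) : localSlots(nlocals, 0) {}
};

struct ToyFrame {
    ToyGenerator* gen;              // the generator this frame is running
    std::vector<Value> exprStack;   // stands in for the interpreter's expression stack
};

// Hypothetical "get generator local" instruction: push the slot's value.
inline void opGetGenLocal(ToyFrame& f, size_t slot) {
    f.exprStack.push_back(f.gen->localSlots.at(slot));
}

// Hypothetical "set generator local" instruction: store the top of the
// stack into the slot. In this toy the value stays on the stack.
inline void opSetGenLocal(ToyFrame& f, size_t slot) {
    f.gen->localSlots.at(slot) = f.exprStack.back();
}

// On yield: copy the expression stack from the frame to the generator.
inline void opYield(ToyFrame& f) {
    f.gen->savedStack = f.exprStack;
    f.exprStack.clear();
}

// On resume: build a fresh frame and hand the saved stack back to it.
inline ToyFrame resume(ToyGenerator& g) {
    ToyFrame f{&g, std::move(g.savedStack)};
    g.savedStack.clear();  // leave the moved-from vector in a known empty state
    return f;
}
```

Note that the locals never move: only the expression stack is copied, so a yield with an empty expression stack copies nothing in this model.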

2. FRAME ON GENERATOR

When calling a generator, create a normal interpreter/baseline stack
frame, but allocate it on the heap, as part of the GeneratorObject,
instead of on the stack. Copy the arguments there. That heap-allocated
frame is the one we actually run in; it is never physically located
on the stack.

In Interpreter.cpp, make `REGS.fp` point to the current stack frame
even when it's allocated in a generator, and likewise `REGS.sp`.
Same for the corresponding registers in Baseline/blinterp. All uses
of `BaselineFrameReg` we looked at would still work if it pointed to
a heap-allocated frame.

The CPU's `rsp` would still point to the C stack as usual.

Now just remove the special case in the frontend that marks all
bindings in generators as aliased. Emit normal GetLocal, SetLocal,
GetArg, etc. instructions. "Everything" "just" works.
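
A rough sketch of approach 2 as a toy model, not SpiderMonkey code. `ToyRegs` only loosely mirrors the `REGS.fp` / `REGS.sp` pair, and all the other names (`ToyFrame`, `ToyGenerator`, `enterFrame`, `opGetLocal`, `opSetLocal`, `opPush`, `opYield`, `resume`) are hypothetical. The point it tries to show is that the same local-access ops work whether the frame sits on the C++ stack or inside a heap-allocated generator, and that yield has nothing to copy.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Toy model only: these names are hypothetical, not SpiderMonkey's.
using Value = int64_t;

constexpr size_t kMaxSlots = 8;  // fixed size, no bounds checks, to keep the toy short

// Locals followed by expression-stack slots, in one contiguous block.
struct ToyFrame {
    size_t nlocals = 0;
    Value slots[kMaxSlots] = {};
};

struct ToyRegs {
    ToyFrame* fp = nullptr;  // current frame, wherever it happens to be allocated
    Value* sp = nullptr;     // one past the top of the expression stack
};

struct ToyGenerator {
    ToyFrame frame;          // the frame is a part of the generator object
    size_t savedDepth = 0;   // expression-stack depth at the last yield
};

inline void enterFrame(ToyRegs& regs, ToyFrame& frame, size_t nlocals) {
    frame.nlocals = nlocals;
    regs.fp = &frame;
    regs.sp = frame.slots + nlocals;  // empty expression stack
}

// Ordinary ops: they never ask where the frame lives.
inline void opPush(ToyRegs& regs, Value v) { *regs.sp++ = v; }
inline void opGetLocal(ToyRegs& regs, size_t i) { *regs.sp++ = regs.fp->slots[i]; }
inline void opSetLocal(ToyRegs& regs, size_t i) { regs.fp->slots[i] = regs.sp[-1]; }

// Yield: the frame already lives in the generator, so only record the depth.
inline void opYield(ToyRegs& regs, ToyGenerator& g) {
    g.savedDepth = static_cast<size_t>(regs.sp - (regs.fp->slots + regs.fp->nlocals));
    regs = ToyRegs{};
}

// Resume: point the registers back at the generator's frame.
inline void resume(ToyRegs& regs, ToyGenerator& g) {
    regs.fp = &g.frame;
    regs.sp = g.frame.slots + g.frame.nlocals + g.savedDepth;
}
```

The toy leaves out everything that makes the real problem hard, in particular the boundaries between execution modes.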

Approach 1 is more flexible and calls might be faster. In approach 2,
`yield` and resume are faster, and we avoid having new opcodes. However,
the complexity settles at boundaries between execution modes, which
would now have to cope with frames possibly being noncontiguous and
stored on the heap.

What do you think?

-j

Iain Ireland

May 13, 2020, 11:16:34 PM

My initial instinct is that #2 adds more weird corner cases than #1, which makes me lean towards #1 unless there's a significant performance gap.

How many new opcodes are we talking for #1? (I assume that we've considered and rejected the option of having the normal opcodes do something different when we're inside a generator because we don't want to add extra branches to the hot path.)

Do we have any intuition for how big the expression stack tends to be when yielding? If it's small/empty, does that minimize #2's advantage of not having to copy the expression stack?

How do calls work in #2? Would we have to add new call opcodes?

- Iain

Jan de Mooij

May 14, 2020, 2:05:08 AM

to Jason Orendorff, JS Internals list

On Wed, May 13, 2020 at 7:54 PM Jason Orendorff <joren...@mozilla.com>
wrote:

> Jan, I'd like to know what you think about this.
>

Approach 2 is what we used to do before the big generator rewrite that
resulted in Baseline JIT support. Problems I see with bringing it back:

- It ties the generator's heap representation to JIT internals, resulting
in weird edge cases. For example: what happens if we execute in the
Baseline JIT, yield from the generator, discard the BaselineScript on GC,
then resume? We'd need to fix up the heap-allocated frame somehow or
Baseline-compile immediately. Debugger and profiler make these things worse.
- There are places
<https://searchfox.org/mozilla-central/rev/9f074fab9bf905fad62e7cc32faf121195f4ba46/js/src/jit/BaselineCodeGen.cpp#611-615>
where we calculate the BaselineFrame's size based on FP/SP distance. This
especially matters for blinterp because it doesn't know anything about the
script statically.
- blinterp /must/ access expression stack slots via SP because the
offset from FP isn't statically known (depends on number of locals and
expression stack slots).
- It makes Ion/Warp support hard because its frame layout is very
different (regalloc spills to 'arbitrary' stack slots).

For approach 1 we could potentially share get/set opcodes for
locals/formals because it's a single array (or maybe not due to
ArgumentsObject weirdness).

Here's a hybrid approach 3:

- GeneratorObject owns a list-of-Values (for formals and locals) as in
approach 1, but InterpreterFrame and BaselineFrame store a raw pointer to
that array.
- We'd need different get/set opcodes for generator locals, but they'd
only be a single load (fp->generatorArray) slower than normal locals [0].
- Maybe we could implement this by ensuring the "list of Values" == "the
GeneratorObject's dynamic slots". That way we get things like nursery
allocation of that array for free.
- For expression stack slots: the list-of-Values could reserve space for
them (based on script->nslots) and then we emit store ops before the yield
and load + push ops after. This way yield/resume with expression stack
slots would be faster than what we do now (ArrayObject allocation...) and if
there are no expression stack slots we don't have any runtime overhead.
This could potentially be optimized in the frontend so that if you have a
yield right after another yield, you only have to emit stores for the part
of the expression stack that actually changed.
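
A rough sketch of the hybrid approach as a toy model, not SpiderMonkey code. All names are hypothetical (`ToyGenerator`, `ToyFrame`, `makeFrame`, `opGetGenLocal`, `opSetGenLocal`, `opStoreStackSlot`, `opLoadStackSlot`), and a plain `std::vector` stands in for the generator's dynamic slots. It shows the frame caching a raw pointer to the generator's array, and explicit store ops before the yield paired with load + push ops after it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: these names are hypothetical, not SpiderMonkey's.
using Value = int64_t;

struct ToyGenerator {
    // One array: [formals and locals | space reserved for expression stack slots].
    // It is sized once and never resized, so raw pointers into it stay valid.
    std::vector<Value> slots;
    size_t nfixed;
    ToyGenerator(size_t fixedCount, size_t maxStackDepth)
        : slots(fixedCount + maxStackDepth, 0), nfixed(fixedCount) {}
};

struct ToyFrame {
    Value* generatorArray;         // raw pointer to the generator's array
    std::vector<Value> exprStack;  // ordinary expression stack, lives in the frame
};

inline ToyFrame makeFrame(ToyGenerator& g) {
    return ToyFrame{g.slots.data(), {}};
}

// Generator-local access: one extra load (the array pointer), then the slot.
inline void opGetGenLocal(ToyFrame& f, size_t i) {
    f.exprStack.push_back(f.generatorArray[i]);
}
inline void opSetGenLocal(ToyFrame& f, size_t i) {
    f.generatorArray[i] = f.exprStack.back();
}

// Emitted before a yield, once per live expression stack slot.
inline void opStoreStackSlot(ToyFrame& f, size_t nfixed, size_t stackIndex) {
    f.generatorArray[nfixed + stackIndex] = f.exprStack[stackIndex];
}

// Emitted after the yield, in the same order: load the slot and push it.
inline void opLoadStackSlot(ToyFrame& f, size_t nfixed, size_t stackIndex) {
    f.exprStack.push_back(f.generatorArray[nfixed + stackIndex]);
}
```

If the expression stack is empty at a yield, no store or load ops are emitted at all, which matches the "no runtime overhead" case in the list above.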

[0] For each approach: we probably want to avoid pre-barriers and
post-barriers when storing generator locals while executing. The old code
had some complexity around pre-barriers; it's probably worth trying to
understand what it did.

Jan

> _______________________________________________
> dev-tech-js-engine-internals mailing list
> dev-tech-js-en...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals
>