Instruction scheduling and selection in IonMonkey

Ting-Yuan Huang

Mar 15, 2013, 6:20:01 AM

to dev-tech-js-en...@lists.mozilla.org

Hi,

It seems that there's no instruction scheduler in IonMonkey. If so, may I know why? Modern processors should benefit a lot from an instruction scheduler. I'd like to know if it is worth doing before diving in :-)

Also, I didn't see a "formal" (textbook-style) instruction selector, such as tiling a tree/DAG by dynamic programming, or a peephole optimizer. I'm not sure, but it seems that the quality of instruction selection relies on the lowering process from MIR to LIR, so that a direct mapping from LIR to assembly code is efficient enough, right?

Thanks!

Andreas Gal

Mar 15, 2013, 12:17:42 PM

to Ting-Yuan Huang, dev-tech-js-en...@lists.mozilla.org

We did some research work on this for JIT compilers way back at UCI as part of my thesis. This was 5 years ago and the architecture world was different, and this was focused on x86, but the rough result was that on x86 all that matters is scheduling division and memory access. The rest was irrelevant. The hardware can see much further ahead in the dynamic instruction stream than you can easily do in software, especially when compiling under time pressure (it's a JIT!). ARM is a different beast and might benefit more, especially at the Ion level.
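
To make the kind of scheduling under discussion concrete, here is a minimal sketch of a greedy list scheduler run against a toy in-order, single-issue machine model. Everything in it is an assumption for illustration: the `Instr` tuple, the latency table, and the four-instruction block are hypothetical and are not taken from IonMonkey or from the research mentioned above. The model has no out-of-order execution, so it shows the best case for static scheduling; hardware that reorders on its own would hide most of the difference.

```python
from typing import Dict, List, NamedTuple


class Instr(NamedTuple):
    name: str        # name of the value this instruction produces
    op: str          # opcode, used only to look up a latency
    deps: List[str]  # names of values that must be ready first


# Hypothetical latencies in cycles; real numbers depend on the core.
LATENCY: Dict[str, int] = {"load": 4, "div": 20, "mul": 3, "add": 1}


def cycles_in_order(order: List[Instr]) -> int:
    """Cycles for an in-order core that issues at most one instruction
    per cycle and stalls until every operand is ready."""
    done: Dict[str, int] = {}
    issue = 0
    for ins in order:
        start = max([issue] + [done[d] for d in ins.deps])
        done[ins.name] = start + LATENCY[ins.op]
        issue = start + 1
    return max(done.values(), default=0)


def list_schedule(program: List[Instr]) -> List[Instr]:
    """Greedy list scheduling: among the instructions whose operands are
    already placed, pick the one with the longest latency-weighted path
    to the end of the block."""
    users: Dict[str, List[str]] = {ins.name: [] for ins in program}
    for ins in program:
        for d in ins.deps:
            users[d].append(ins.name)

    # Program order is a topological order, so walking it backwards
    # visits every user before the instruction that feeds it.
    prio: Dict[str, int] = {}
    for ins in reversed(program):
        prio[ins.name] = LATENCY[ins.op] + max(
            (prio[u] for u in users[ins.name]), default=0)

    scheduled: List[Instr] = []
    placed = set()
    remaining = list(program)
    while remaining:
        ready = [i for i in remaining if all(d in placed for d in i.deps)]
        best = max(ready, key=lambda i: prio[i.name])
        scheduled.append(best)
        placed.add(best.name)
        remaining.remove(best)
    return scheduled


# A load feeding an add, plus an independent division; both feed the last add.
program = [
    Instr("a", "load", []),
    Instr("b", "add", ["a"]),
    Instr("c", "div", []),
    Instr("d", "add", ["b", "c"]),
]
scheduled = list_schedule(program)
print([i.name for i in scheduled])
print(cycles_in_order(program), cycles_in_order(scheduled))
```

In this toy block the only thing the scheduler changes is hoisting the division ahead of the load, so that the load's latency is hidden behind it.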

Andreas

> _______________________________________________
> dev-tech-js-engine-internals mailing list
> dev-tech-js-en...@lists.mozilla.org
> https://lists.mozilla.org/listinfo/dev-tech-js-engine-internals

Nicolas B. Pierron

Mar 15, 2013, 6:56:43 PM

Hi,

On 03/15/2013 03:20 AM, Ting-Yuan Huang wrote:
> It seems that there's no instruction scheduler in IonMonkey. If so, may I know why? Modern processors should benefit a lot from an instruction scheduler. I'd like to know if it is worth doing before diving in :-)
>
> Also, I didn't see a "formal" (textbook-style) instruction selector, such as tiling a tree/DAG by dynamic programming, or a peephole optimizer. I'm not sure, but it seems that the quality of instruction selection relies on the lowering process from MIR to LIR, so that a direct mapping from LIR to assembly code is efficient enough, right?

Indeed, our macro assembler writes directly into the buffer. At the same
time, the code that we produce contains many checks, which might make it
hard for assembly-level optimizations to trigger, as we need to handle
corner cases such as bailouts.

In IonMonkey's case, I think this might be interesting in terms of code size
and avoiding redundant operations. For example, we could avoid a test
operation after an ALU operation when we only check whether the last computed
register is zero, and also get rid of scratch register initialization on
x64. But I guess this would mostly be a code-size issue.
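
As a rough illustration of the test-after-ALU case, here is a minimal peephole sketch over a toy instruction list. The `Instr` tuple, the opcode names, and the set of flag-setting operations are hypothetical and are not the IonMonkey macro assembler API. The pass assumes the only consumer of the flags is a zero check (`jz`/`jnz`), since on x86 `test` and the arithmetic operations differ in how they set the carry and overflow flags.

```python
from typing import List, NamedTuple, Optional


class Instr(NamedTuple):
    op: str
    dst: Optional[str] = None
    src: Optional[str] = None


# Hypothetical set of ALU ops assumed to set the zero flag from their result.
FLAG_SETTING_ALU = {"add", "sub", "and", "or", "xor"}


def drop_redundant_tests(code: List[Instr]) -> List[Instr]:
    """Remove `test r, r` when the instruction kept just before it is an
    ALU op that wrote r, because the zero flag already reflects r."""
    out: List[Instr] = []
    for ins in code:
        prev = out[-1] if out else None
        is_self_test = ins.op == "test" and ins.dst == ins.src
        if (is_self_test and prev is not None
                and prev.op in FLAG_SETTING_ALU and prev.dst == ins.dst):
            continue  # flags are already set by the ALU op
        out.append(ins)
    return out


code = [
    Instr("sub", "eax", "ebx"),
    Instr("test", "eax", "eax"),  # redundant: sub already set the zero flag
    Instr("jz", "bailout"),
    Instr("mov", "ecx", "eax"),
    Instr("test", "ecx", "ecx"),  # needed: mov does not set flags
    Instr("jz", "bailout"),
]
optimized = drop_redundant_tests(code)
for ins in optimized:
    print(ins.op, ins.dst or "", ins.src or "")
```

Only the adjacent pair is matched here; a real pass would also have to prove that nothing between the ALU operation and the branch clobbers the flags or the register.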

In the asm.js case (codename OdinMonkey), I think this might be interesting
to test, as we are trying to recover the assembly out of infallible
JavaScript (except for bounds checks on ARM). I don't know if the quality of
the assembly that we are producing is good enough or not; Luke and Marty
might know more about it.

In the case of ARM, I don't know what the impact of such optimizations is.

--
Nicolas B. Pierron

Jeff Walden

Mar 17, 2013, 7:55:08 PM

On 03/15/2013 09:17 AM, Andreas Gal wrote:
> We did some research work on this for JIT compilers way back at UCI as part of my thesis. This was 5 years ago and the architecture world was different, and this was focused on x86, but the rough result was that on x86 all that matters is scheduling division and memory access. The rest was irrelevant. The hardware can see much further ahead in the dynamic instruction stream than you can easily do in software, especially when compiling under time pressure (it's a JIT!).

This basically matches my knowledge from a compiler class I took around the same time. Someone took the test programs we were to compile and did a ton of hand-scheduling of things (to decide which optimizations we should implement in the limited time we had) and found it didn't make any difference at all to ultimate speed.

I don't know anything about ARM capabilities now, but to the extent ARM chips aren't as smart, I'd want to know what the chances are that newer ARM chips will be smarter in this regard, before spending a lot of time on instruction scheduling. There seems to be a lot of other low-hanging fruit we should pick first, even if it were valuable on ARM.

Jeff

Ting-Yuan Huang

Mar 18, 2013, 4:26:23 AM

to Jeff Walden, dev-tech-js-en...@lists.mozilla.org

Thanks for the explanation! It sounds like those dynamic techniques (out-of-order execution, register renaming, etc.) dominate a static scheduler in the compiler, especially on beasts like x86.

AFAIK, ARM introduced out-of-order execution and register renaming with the Cortex-A9, but I have no idea how effective it is. I can do some experiments on various ARM CPUs if needed.

BTW, could you please suggest some low-hanging fruit? :-) Before this thread I used to think of an instruction scheduler as a big fruit.