Switch statement range checking

149 views
Skip to first unread message

Mihir Shah

unread,
May 15, 2021, 1:03:47 AM5/15/21
to v8-dev
Hi,

I was working on a jump table implementation for switch statements with all constant Smi case labels (in the parse step, instead of generating the traditional if-else-if kind of logic), and needed to do range checking at the top. 

I was wondering, then, was there a better way to do range checking, i.e. does value in accumulator register x lie in range (known at compile time) [a,b]? I think the standard trick of reducing a<=x<=b to 0<=x-a<=b-a and then doing unsigned comparison here would not work because Smi is signed. 

Because right now my idea of the bytecode is something like this (which feels very inefficient):

Load x into accumulator and register r1 ...
TestLessThan [b]
Star [r2]
Ldar [r1]
TestGreaterThan [a]
TestEqual [r2]
JumpIfFalse <after the switch>
Ldar [r1]

... proceed with SwitchOnSmi...

Thank you!


Leszek Swirski

unread,
May 15, 2021, 2:45:42 AM5/15/21
to v8-dev
You could save a register and a TestEqual if you did two jumps (one for x<a, one for x>=b) but otherwise there's not much you can do for a range check. Keep in mind though that SwitchOnSmi does a range check itself (and falls through on failure), so unless you want to exclude a range that's inside your Switch range, you don't need a range check.

Another thing to keep in mind is that SwitchOnSmi expects a Smi value in the accumulator, so you'll need to add a Smi check to the bytecode handler. Thankfully 'switch' comparison semantics are that of strict equality, so afaict you don't need to add a ToPrimitive or anything like that, but I think you might have to add an explicit -0 check before the Smi check.

Hope that helps, happy to review when you get a patch together - this has been on my backlog for literal years, so I'm glad to see someone else doing it!

- Leszek

--
--
v8-dev mailing list
v8-...@googlegroups.com
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups "v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to v8-dev+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/v8-dev/6df36377-1d02-4de9-aec4-5890003af416n%40googlegroups.com.

Mihir Shah

unread,
May 15, 2021, 12:12:29 PM5/15/21
to v8-dev
Oh ok I will add the smi check then, thank you!

Mihir Shah

unread,
May 16, 2021, 8:55:21 PM5/16/21
to v8-dev
Hi, so I implemented an Smi check by checking (x|0) === x,  (since I was not able to get subtracting 0 to work with some of the special case in test/mjstest/switch.js like feeding a NaN or a number very close to an integer case). If the equality check fails, then I jump to the default case or after the switch, depending on whether there is a default case.

This passes all the test cases in release mode, but in debug mode, the assertion in the SwitchOnSmi in src/interpreter/interpreter_generator.cc that checks whether the tag of the switch is an Smi fails for certain malformed inputs to the switch (when running the switch.js test). 

I was not sure how I could get around this, since I did not see any convert to Smi bytecode.

Thank you!

Leszek Swirski

unread,
May 17, 2021, 7:35:23 AM5/17/21
to v8-dev
I actually meant implementing a Smi check in the bytecode handler itself, in interpreter-generator.cc, not as additional bytecodes.

Mihir Shah

unread,
Jun 23, 2021, 11:45:35 AM6/23/21
to v8-dev
Hi, thank you for all the feedback on the PR, I learnt a lot from the process!

Also I was interested in doing another project, I remember you had mentioned that opportunistic inlining of Smi variables/eliding it entirely (Commit message · Gerrit Code Review (googlesource.com)) could be a good future project.

I was wondering whether you think this would be a good next project for me to work on, and if so, I was wondering what would be the acceptance criteria for such a PR (since I can't find an issue on the v8 issue tracker with a similar purpose)?

Thank you!

Leszek Swirski

unread,
Jun 24, 2021, 6:35:53 AM6/24/21
to v8-dev
Hi Mihir,

Thanks for your contribution! I think you have two good options for followups:

  a) Port your switch statement prologue from being in bytecodes to being in Sparkplug and TurboFan (effectively, the first version we almost submitted, except working well with the other compilers).

  b) As you say, this Smi variable thing. There's no bug for it, I think it's actually quite an original idea from your side. This sort of thing would do well with a design doc first, specifying more precisely what sort of variables you'd be able to inline (my guess would be never-assigned vars with Literal initializers), and what the consequences would be (e.g. the additional memory cost of repeated LdaSmi, smaller stack frames because there's fewer registers, potential for constant folding(?)). I actually have no idea if this would be a net benefit or not, so it'd be good to have an analysis. You could follow the template in our other design docs, e.g. https://docs.google.com/document/d/1qR_C8qYdKsDQFbFliAZLw2zmZ0nhA0xUCld1ba3N2Zk/edit#heading=h.n63k76b3zfwa

- Leszek

Mihir Shah

unread,
Jun 24, 2021, 7:25:31 PM6/24/21
to v8-dev
Hi,

So just to update, I got this email with the following regression test result,  Pinpoint (pinpoint-dot-chromeperf.appspot.com), which seems to show that my patch caused a significant performance decrease. I as well think it automatically opened this issue  1223133 - [V8 Perf Sheriff]: 3 regressions in v8.browsing_desktop - chromium. I wasn't sure if this was due to my patch, and if it was, if I needed to address it or if it was a noisy test?

Thank you!

Mihir Shah

unread,
Jun 24, 2021, 8:08:59 PM6/24/21
to v8-dev
Sorry I forgot to mention this in the earlier message, but I was googling for this error, and I found in a lot of places it was marked with status WontFix (which is why I thought it may be noisy)

Santiago Aboy Solanes

unread,
Jun 25, 2021, 4:46:56 AM6/25/21
to v8-...@googlegroups.com
Hi Mihir,

The name of the error is a generic one since it is automatically created when it sees a regression so you'd see "N regressions in test_suite_name" quite often. Some regressions are flakes and are closed as won't fix, but this one doesn't seem to be.
In some cases, we are trading off saving memory for performance regression and we consider it acceptable.

In any case, you currently have three bugs associated with your CL. I will merge them into only one so it is easier to track them but be aware that it has more regressions than those three.
I will also CC the current performance sheriff who may be able to assist on how to proceed. The performance sheriff is a rotation that we have of people who make sure no big performance regressions go into V8.

Cheers,
Santiago

Mihir Shah

unread,
Jun 30, 2021, 11:30:56 AM6/30/21
to v8-dev
Hi, sorry to bug you but just to follow up, I was not understanding why creating a chain of branches would improve upon a large switch node ( 1223133 - [V8 Perf Sheriff]: 3 regressions in v8.browsing_desktop - chromium), in improving the optimize duration performance?
Thank you!
On Thursday, June 24, 2021 at 5:35:53 AM UTC-5 les...@chromium.org wrote:

Leszek Swirski

unread,
Jul 5, 2021, 7:01:55 AM7/5/21
to
(v8-dev to bcc)

I replied on the bug, basically I imagine that Switch nodes aren't well supported by the optimizer -- either because of node traversal costs, or because of optimization passes being designed only for Branch.

Mihir Shah

unread,
Jul 6, 2021, 11:45:01 PM7/6/21
to v8-dev
Hi,

Sorry to bug you again, but I was thinking maybe if we eliminated the extra bytecodes that could potentially reduce optimize::duration, since the optimizer has to handle/optimize on more than a dozen bytecodes extra. I created a design doc for preliminaries on porting (which we were planning on doing later I think anyway), https://docs.google.com/document/d/1Nu5dBnmoL7IcFI-ZPvtjhk97rqu8B6m9IgQgQXLgPl4/edit?resourcekey=0-L-yAxQwVqRlKtIXmomJ1-g, can you please let me know if this sounds good?

Thank you!

Mihir Shah

unread,
Jul 7, 2021, 10:43:42 AM7/7/21
to v8-dev
Hi,

Thank you for this feedback, I will go back to the old method then.
Also just to follow up, in one of your comment you mentioned, "We could even consider making the remaining conversion code "deferred" so that it doesn't fall out of line." I wasn't understanding what would this look like?

Thank you!

Leszek Swirski

unread,
Jul 7, 2021, 11:54:53 AM7/7/21
to mihirsh...@gmail.com
v8-dev to bcc

It's easier to discuss this further in the doc, but basically in CSA code you can mark a rarely-taken label as "deferred" to move that chunk of code to the end of the generated code. This makes the fast-path of not taking the label faster.

Mihir Shah

unread,
Jul 8, 2021, 3:20:05 PM7/8/21
to v8-dev
Hi,

Also I had another question, I noticed in the TryTaggedToInt32AsIntPtr() method, in the case it is not an Smi but is a Heap number, we load the float64, truncate to an int32, and then do the strict equality check between the original and truncated version to make sure the new form is equivalent. 

I was wondering, in what case would the float64 ever be equivalent to an int32, except for -0.0? (Because I remember like we were saying that any float representable as an Smi will be stored as an Smi, back when we were doing the switch case analysis) In this case, I was wondering if we could streamline the logic just to check for a -0.0?

Thank you!

Leszek Swirski

unread,
Jul 9, 2021, 4:10:16 AM7/9/21
to v8-...@googlegroups.com
There's almost certainly cases in the runtime where we don't normalize a HeapNumber to a Smi -- my comment on floats representable as Smis being normalized to Smis was only for the numbers generated by the parser.

Mihir Shah

unread,
Jul 13, 2021, 1:21:00 AM7/13/21
to v8-dev
Hi,

So I was implementing the baseline compiler routine to do the Smi conversion, and I just wanted to make sure I was on the right track:
Here is a short document outlining what I am doing, whenever it is possible can you please look through this?

Thank you!

Mihir Shah

unread,
Jul 19, 2021, 12:40:38 AM7/19/21
to v8-dev
Hi,

Just to update, so I implemented the sparkplug code for the x86/arm machines for 32-bit and 64-bit; I wasn't sure did I need to support MIPS or any of the other architectures, since the  Handling of ports · V8 link says that MIPS is not officially supported?

Thank you!

Leszek Swirski

unread,
Jul 19, 2021, 3:50:29 AM7/19/21
to v8-...@googlegroups.com
Correct, x86, x64, arm and arm64 are all you need to do.

Mihir Shah

unread,
Jul 20, 2021, 12:28:35 AM7/20/21
to v8-dev
Hello, I was wondering, I think maybe Turbofan is already doing Smi checking/necessary conversions?

Here is a javascript file I wrote, switch2.js, and the output.txt (https://drive.google.com/file/d/12Zz4kznIesa1RzxCnCKPLXPmsuSrOGSZ/view?usp=sharing)
  • Run with --print-code and --print-bytecode turned on.
  • I also verified that sparkplug is not running in this case. 
  • The file has the bytecode generated at the front, and the source of switch2.js on line 85 or so
  • switch2.js is verified to call the BytecodeGraphBuilder method (I put a print statement to verify)
  • I noticed as well it is optimized and presumably deoptimized several times when run with more iterations than 6000 currently in source, not sure if this is relevant information (maybe because i is switching between an integer-representable type and back again)
The output looks like Smi conversion is happening (line 200 in output.txt, the ucomisd and vcvttsd2si instructions are happening, before the indirect jump with the jump table). I think this is not from any sparkplug code I already wrote, since that was verified as not running. I am guessing it is collecting type feedback maybe?

I was wondering whether there was another use case I was missing that I should be testing on, since I tried a couple cases like this, and all seemed to work with no further modifications, so I wasn't sure.

Thank you!

Leszek Swirski

unread,
Jul 20, 2021, 4:23:28 AM7/20/21
to v8-...@googlegroups.com
Turbofan guards against the input value not being a Smi (with CheckSmi). The float conversions that you're seeing are to check if the result of i%7 is representable as Smi, and if not, it deopts (the HeapNumber that would normally be created there is elided, since it doesn't escape the function, that's why you don't see HeapNumber checks and instead checks on the Float64 value of i%7).

So, as you've observed, this will deopt instead of crashing whenever a non-Smi value is received. This is not behaviour that we want, deopts are expensive and this would be what we'd consider a "deopt loop" which we don't want to have.

Mihir Shah

unread,
Jul 22, 2021, 1:22:43 AM7/22/21
to v8-dev
Hi,

Sorry to bug you again about this, but I was having trouble understanding how to implement the Turbofan port,

In the doc a couple comments ago you had mentioned "add[ing] an new op, generated in BytecodeGraphBuilder, which checks if an incoming Object is either a Smi or a HeapNumber whose value can compare equal to a Smi."

1. Just to clarify, op is something in opcodes.h?
2. I was thinking, would it have to be a new op, or could it be implemented in terms of existing opcodes's?
3. Hybrid - could I use some of the existing opcodes, and then create an opcode wrapping a call to my TurboAssembler::JumpIfHeapNumberNotSmi method in codegen (that I wrote for the use of the baseline-compiler)?

Also, I was wondering, would you recommend reading any particular docs to understand Turbofan's design (I found the docs on v8.dev/docs were more about its benefits, etc rather than the architecture), since I was having trouble figuring out how the pieces worked together (and thereby where I would put methods).

Thank you so much!

Leszek Swirski

unread,
Jul 22, 2021, 3:20:20 AM7/22/21
to mihirsh...@gmail.com, Michael Stanton, Georg Neis
bcc v8-dev, and cc mvstanton@ and neis@

Mike, Georg, can you help Mihir with this? The TL;DR is that he wants to make the SwitchOnSmi bytecode take arbitrary values, not just Smis, where the switch wants to have === behaviour for cases, i.e. fallthrough on non-numbers, and convert HeapNumbers to int32 where possible (i.e. convert float64 to int32, convert int32 back to float64, compare the two float64s, return the int32 if they compare equal). He has this behaviour implemented in the bytecode handler, and wants to port it to the bytecode-graph-builder; I think we might need a new OpCode.

- Leszek

Mihir Shah

unread,
Aug 5, 2021, 1:49:52 AM8/5/21
to v8-dev
Hello,

I followed up with the cc'ed recipients from last message, but I think it may be going to their spam, since I'm not sending from a Google/Chromium address.

I was thinking, instead of using a new opcode (I'm not sure, but I think this could require making a lot of changes all across the codebase, since so many optimizations switch on the opcode type, etc.), could we implement everything in bytecode-graph-builder.cc?

Attached is the sea-of-nodes graph I was visualizing for the method:

So far, I wasn't sure mainly,
1. How do merges and branches work in the code? Because in a lot of the code I see branches as

NewBranch(...)
{
   SubEnvironment...
   BuildJumpIfFalse...
   ...
   MergeIntoSuccessorOffset(int offset) or MergeLeaveFunction
}
BuildJumpIfTrue...

But since the merging I want is between two nodes, neither of these kind of merges I think would apply. I was thinking of using MergeControl, but this doesn't seem to jive well with the scoping here. I was also wondering, would I have to do anything with the environment() as well?

Finally, I was wondering if there is more documentation on the bytecode-graph-builder.cc code, at this more granular level of these methods, outside of v8.dev/docs/turbofan (so I can understand it better?) 

Thank you so much!

Leszek Swirski

unread,
Aug 5, 2021, 5:02:52 AM8/5/21
to v8-...@googlegroups.com
This diagram definitely looks like a plausible solution, if those lower-level F2I/I2F nodes are available to you at that point (I think they might be?) -- TF has some amount of restrictions on what "level" of node is available at what stage of the pipeline.

As for documentation, we're not in a fantastic state. You might get something out of the turbofan handbook, and I have an old presentation on how graph building works in general, but sadly can't find a recording of it. The general idea is that it's roughly a symbolic execution of the bytecode, with the "environment" representing the current state of the execution (including current control nodes, current register values, etc.). If you want to branch the execution, you have to copy the current environment, if you want to switch to another execution branch you have to set the current environment, and if you want to merge two branches you have to merge their environments. "SubEnvironment" is a helper for copying and setting the current environment, but it's not strictly necessary to use it.

For your case, with two branches, you could have something like

Environment* after;
{
  SubEnvironment sub(this);
  NewIfTrue();
  ...
  
after = environment()
  set_environment(nullptr);
}

{
  SubEnvironment sub(this);
  NewIfFalse();
  ...
  
after = after->Merge(environment(), /* probably need the current bytecode's liveness here, see other uses of Merge */)
  set_environment(nullptr);
}
set_environment(after);


--
--
v8-dev mailing list
v8-...@googlegroups.com
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups "v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to v8-dev+un...@googlegroups.com.

Mihir Shah

unread,
Aug 10, 2021, 1:28:36 AM8/10/21
to v8-dev
Hello,

Thank you for all these links, they were very useful in understanding Turbofan more.
Upon trying stuff more, I think that maybe putting everything (even low-level logic) in bytecode-graph-builder.cc like I was thinking earlier would be starkly different than how the rest of the Visit..Bytecode methods are, and additionally be painful to implement.

Alternatively, I was thinking of a new simplified/VM-level opcode, TryTaggedToWord32, which returns sort of an Optional<Word32>, so in Turbofan it would be a tuple (bool,word32) returned by the opcode, which I could then extract out using projections for the core code in bytecode-graph-builder.cc, I was wondering do you think this would be an ok design?

The main part I was having trouble with?
1. How to construct an opcode that returns such a tuple? (I was looking through the codebase, and the only such examples were in js-operator (too high level))? 
2. Or maybe should my opcode should really be a machine-op level opcode, similar to TryTruncateFloat64...?

Also, I was wondering should I create the PR for this, just so you can view my code so far (or should I hold off until I have a working version?)

Thank you so much!

Leszek Swirski

unread,
Aug 10, 2021, 5:07:38 AM8/10/21
to mihirsh...@gmail.com
v8-dev to bcc again, please feel free to just email me directly about this rather than messaging the entire mailing list.

A new simplified opcode sounds reasonable I think, I'm reasonably sure that creating a node with two outputs is not much more complex than specifying in the opcode constructor that it has two value outputs (the fourth of that list of six numbers in opcode constructors). The WithOverflow opcodes (e.g. Int32AddWithOverflow) are the canonical examples here. You'll then need to add a lowering from that simplified opcode to machine nodes.

Definitely upload a PR if you have one, it's always easier to talk about existing code than hypothetical code.

Reply all
Reply to author
Forward
0 new messages