V8 Turbofan Optimization: Modulo 2 vs. Bitwise AND


Sỹ Trần Dũng

Mar 11, 2025, 5:20:13 AM
to v8-dev

I have a question regarding V8's compiler optimization, specifically concerning the modulo 2 operation. In compilers like GCC and Clang, it's common to see the operation n % 2 optimized to a bitwise AND (n & 1) or a bit check instruction, as these are generally more efficient.

I've been examining the bytecode generated by V8, and I've observed that a modulo instruction is used for n % 2. 

[generated bytecode for function: my_mod (0x3de244c5b401 <SharedFunctionInfo my_mod>)]
Bytecode length: 17
Parameter count 2
Register count 1
Frame size 8
   23 S> 0x32c69b60dc80 @    0 : 0b 03             Ldar a0
   29 E> 0x32c69b60dc82 @    2 : 4b 02 00          ModSmi [2], [0]
         0x32c69b60dc85 @    5 : c9                Star0
         0x32c69b60dc86 @    6 : 0d 01             LdaSmi [1]
   33 E> 0x32c69b60dc88 @    8 : 6f f9 01          TestEqual r0, [1]
         0x32c69b60dc8b @   11 : 9e 04             JumpIfFalse [4] (0x32c69b60dc8f @ 15)
   45 S> 0x32c69b60dc8d @   13 : 11                LdaTrue
   57 S> 0x32c69b60dc8e @   14 : ae                Return
   64 S> 0x32c69b60dc8f @   15 : 12                LdaFalse
   77 S> 0x32c69b60dc90 @   16 : ae                Return

I'm curious if this behavior changes when the code is "heated" and optimized by Turbofan.

Could someone please tell me whether Turbofan performs this particular optimization?

Thank you for your time and expertise.

Jakob Kummerow

Mar 11, 2025, 6:35:11 AM
to v8-...@googlegroups.com
Why don't you test it and find out yourself?



Sỹ Trần Dũng

Mar 11, 2025, 9:35:29 AM
to v8-dev
I ran d8 with the --allow-natives-syntax --turbofan --print-opt-code flags on the following code, but I don't get any output.

function my_mod(n) {
  if (n  % 2 == 1)
    return true;
  return false;
}

my_mod(2);
my_mod(1);
my_mod(3);

%OptimizeFunctionOnNextCall(my_mod);

my_mod(100)


I'm not sure which flags to use here. 

Jakob Kummerow

Mar 11, 2025, 9:42:08 AM
to v8-...@googlegroups.com
You need either %PrepareFunctionForOptimization(my_mod); before you start collecting unoptimized feedback (i.e. before the my_mod(2) call), or more unoptimized calls until feedback collection kicks in on its own. And of course you need a build that has disassembler support enabled.
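
The second option mentioned above (more unoptimized calls until feedback collection and tier-up happen on their own) can be sketched without any natives syntax. This is only an illustration under assumptions: WARMUP_CALLS is a hypothetical value, the real tier-up thresholds are internal heuristics that vary between builds, and this is not necessarily how the output in this message was produced.

```javascript
// Same function as in the question.
function my_mod(n) {
  if (n % 2 == 1)
    return true;
  return false;
}

// Hypothetical warm-up count; V8 decides on its own when to tier up,
// so this number is an assumption, not a documented threshold.
const WARMUP_CALLS = 100000;

// Call the function in a hot loop so that feedback is collected and the
// function becomes a candidate for optimization without natives syntax.
let oddCount = 0;
for (let i = 0; i < WARMUP_CALLS; i++) {
  if (my_mod(i)) oddCount++;
}
```

Running this with d8 and --print-opt-code should print the optimized code if and when the function gets optimized, provided the build has disassembler support.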

--- Raw source ---
(n) {
 if (n  % 2 == 1)
   return true;
 return false;
}


--- Optimized code ---
optimization_id = 0
source_position = 340
kind = TURBOFAN_JS
name = my_mod
compiler = turbofan
address = 0x176e001401a1

Instructions (size = 176)
0x55a9e0000040     0  55                   push rbp
0x55a9e0000041     1  4889e5               REX.W movq rbp,rsp
0x55a9e0000044     4  56                   push rsi
0x55a9e0000045     5  57                   push rdi
0x55a9e0000046     6  50                   push rax
0x55a9e0000047     7  4883ec08             REX.W subq rsp,0x8
0x55a9e000004b     b  488975e0             REX.W movq [rbp-0x20],rsi
0x55a9e000004f     f  493b65a0             REX.W cmpq rsp,[r13-0x60] (external value (StackGuard::address_of_jslimit()))
0x55a9e0000053    13  0f865e000000         jna 0x55a9e00000b7  <+0x77>
0x55a9e0000059    19  488b5518             REX.W movq rdx,[rbp+0x18]
0x55a9e000005d    1d  f6c201               testb rdx,0x1
0x55a9e0000060    20  0f857b000000         jnz 0x55a9e00000e1  <+0xa1>
0x55a9e0000066    26  488bca               REX.W movq rcx,rdx
0x55a9e0000069    29  d1f9                 sarl rcx, 1
0x55a9e000006b    2b  85d2                 testl rdx,rdx
0x55a9e000006d    2d  0f8c08000000         jl 0x55a9e000007b  <+0x3b>
0x55a9e0000073    33  83e101               andl rcx,0x1
0x55a9e0000076    36  e90f000000           jmp 0x55a9e000008a  <+0x4a>
0x55a9e000007b    3b  f7d9                 negl rcx
0x55a9e000007d    3d  83e101               andl rcx,0x1
0x55a9e0000080    40  85c9                 testl rcx,rcx
0x55a9e0000082    42  0f845d000000         jz 0x55a9e00000e5  <+0xa5>
0x55a9e0000088    48  f7d9                 negl rcx
0x55a9e000008a    4a  83f901               cmpl rcx,0x1
0x55a9e000008d    4d  0f841e000000         jz 0x55a9e00000b1  <+0x71>
0x55a9e0000093    53  498d4655             REX.W leaq rax,[r14+0x55]
0x55a9e0000097    57  488b4de8             REX.W movq rcx,[rbp-0x18]
0x55a9e000009b    5b  488be5               REX.W movq rsp,rbp
0x55a9e000009e    5e  5d                   pop rbp
0x55a9e000009f    5f  4883f902             REX.W cmpq rcx,0x2
0x55a9e00000a3    63  7f03                 jg 0x55a9e00000a8  <+0x68>
0x55a9e00000a5    65  c21000               ret 0x10
0x55a9e00000a8    68  415a                 pop r10
0x55a9e00000aa    6a  488d24cc             REX.W leaq rsp,[rsp+rcx*8]
0x55a9e00000ae    6e  4152                 push r10
0x55a9e00000b0    70  c3                   retl
0x55a9e00000b1    71  498d4671             REX.W leaq rax,[r14+0x71]
0x55a9e00000b5    75  ebe0                 jmp 0x55a9e0000097  <+0x57>
0x55a9e00000b7    77  ba40000000           movl rdx,0x40
0x55a9e00000bc    7c  52                   push rdx
0x55a9e00000bd    7d  48bb00405fc7a9550000 REX.W movq rbx,0x55a9c75f4000
0x55a9e00000c7    87  b801000000           movl rax,0x1
0x55a9e00000cc    8c  48bee51a1800f57d0000 REX.W movq rsi,0x7df500181ae5    ;; object: 0x7df500181ae5 <NativeContext[302]>
0x55a9e00000d6    96  e825a246e8           call 0x55a9c846a300  (CEntry_Return1_ArgvOnStack_NoBuiltinExit)    ;; near builtin entry
0x55a9e00000db    9b  e979ffffff           jmp 0x55a9e0000059  <+0x19>
0x55a9e00000e0    a0  90                   nop
0x55a9e00000e1    a1  41ff55d8             call [r13-0x28]
0x55a9e00000e5    a5  41ff55d8             call [r13-0x28]
0x55a9e00000e9    a9  41ff55e0             call [r13-0x20]
0x55a9e00000ed    ad  0f1f00               nop

Inlined functions (count = 0)

Deoptimization Input Data (deopt points = 3)
index  bytecode-offset    pc
    0                2    NA
    1                2    NA
    2               -1    9b

Safepoints (stack slots = 6, entries = 1, byte size = 16)
0x55a9e00000db     9b  slots (sp->fp): 100000  deopt      2 trampoline:     a9

RelocInfo (size = 5)
0x55a9e00000ce  full embedded object  (0x7df500181ae5 <NativeContext[12e]>)
0x55a9e00000d7  near builtin entry

--- End code ---
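
Reading the listing, Turbofan does appear to use an AND rather than a division. After the Smi check and untagging (testb, sarl), non-negative inputs go through andl rcx,0x1. Negative inputs are negated, masked, and negated back, with a jump to a deoptimization exit when the masked value is zero, presumably because the JavaScript result would be -0, which a small integer cannot represent. A plain n & 1 would not be enough on its own, since in JavaScript -3 % 2 is -1 while -3 & 1 is 1. The sketch below models that logic in plain JavaScript; mod2ViaAnd is a hypothetical name, and this is an interpretation of the disassembly, not V8 source.

```javascript
// Model of the integer path in the optimized code (an interpretation of the
// listing, not V8 source). Assumes n is a small integer.
function mod2ViaAnd(n) {
  if (n >= 0) {
    // Non-negative path: a single AND, as in "andl rcx,0x1".
    return n & 1;
  }
  // Negative path: negate, mask, negate back.
  const r = (-n) & 1;
  if (r === 0) {
    // The optimized code deoptimizes here; the JavaScript result is -0.
    return -0;
  }
  return -r;
}

// Compare against the built-in operator, distinguishing +0 from -0.
function agreesWithModulo(n) {
  return Object.is(mod2ViaAnd(n), n % 2);
}
```

For example, agreesWithModulo(-3) and agreesWithModulo(-2) both hold, the latter only because -0 is handled separately.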
