AST -> bytecode -> opcode question

95 views
Skip to first unread message

yoshi chen

unread,
Feb 12, 2020, 12:56:13 AM2/12/20
to v8-dev
Hi,

I am trying to understand how AST is translated into TurboFan bytecode graph, then update the graph for JIT optimization.

So far from googling, online blogs, and stepping through the v8 code, I have gathered that:
Compiler -> AST -> GnerateUnoptimizedCode -> unoptimied BytecodeGraph -> turbofan optimization stages when code is hot

Most of the info are from this doc:

But I can't find any documentation at a high level, how an AST node maps to a JSOpcode.
The test cases gives some insights, but an overall documentation would be great.
I have also been playing with d8 with --trace-turbo flag.

For example, just doing
console.log('stuff')

will create the following opcodes:
JSLoadGlobal (for 'console')
JSLoadNamed (for 'log')
  -> JSLoadNamed creates FrameState, and PeeHole constant  in case the 'log' property doesnt exist

HeapConstant (for 'stuff')

The process of stepping through code, looking at tests, playing with d8 has been illuminating but very slow and tedious.
Does anybody know if there are any docs around this area of v8?
I am assuming anybody that want to become a new v8 contributor would need to go through this process and might have built some docs already.

Thanks in advance,
Yoshi

Santiago Aboy Solanes

unread,
Feb 12, 2020, 5:22:06 AM2/12/20
to v8-...@googlegroups.com
Hi Yoshi,

With Dominik we presented "Life of a <script>" which is a high level overview of V8's pipeline. It should shed some light on how V8 works.

As a super quick drawing, the V8's phases are:
Loading -> Parsing (generate the AST) -> Interpreting (Ignition) -> Execution -> Optimization (TurboFan, if necessary)
                                                                              ^                                                             |
                                                                               \______________________________/
                                                                                          deoptimization (if necessary)

Also, we have a flag you can use if you want to take a look at the generated bytecode, --print-bytecode.

Both --print-bytecode and --trace-turbo can be used with an extra flag that acts as a filter (i.e --print-bytecode--filter=function_name , --trace-turbo-filter=function_name).
It is useful to use the filters so you just get the information for the functions that you want to analyse.

Cheers,
Santiago

--
--
v8-dev mailing list
v8-...@googlegroups.com
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups "v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to v8-dev+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/v8-dev/91385b8f-a198-4dd1-8c51-585309c26e3f%40googlegroups.com.

yoshi chen

unread,
Feb 13, 2020, 12:24:59 PM2/13/20
to v8-dev
Thanks for the vid link Santiago. It definitely clear things up a bit.

I am still trying to figure out if there is 1 to 1 mapping between AST node -> bytecode -> JSOpcode

For example, there are a lot of bytecode node such as StackCheck, which I think is for stack overflow check.
Then in TurboFan bytecode graph I see FrameState/StateValue, and those are probably for accessing properties in closure?

I dont think a StackCheck bytecode maps to an AST node, and is inserted when BytecodeGenerator walks a FunctionStatement AST node, because v8 needs to check for stackoverflow for every function execution.

Is there a design doc that says FunctionStatement AST node -> StackCheck bytecode, {other bytecodes in the function body statements...}?

Thanks in advance,
Yoshi
To unsubscribe from this group and stop receiving emails from it, send an email to v8-...@googlegroups.com.

Santiago Aboy Solanes

unread,
Feb 14, 2020, 5:32:13 AM2/14/20
to v8-...@googlegroups.com
There is not a 1 to 1 to 1 mapping of AST node -> bytecode -> JSOpcode.

StackChecks are actually not stack overflow checks. They are a way we have to interrupt running code. They are added in two places: on function entry, and in loop iterations.
Coincidentally, I am working on making them implicit (as in, not having an explicit Bytecode for them). You can learn more about StackChecks and how we are planning on making them implicit on this design doc.

You can learn more about bytecode generation in the Ignition design doc. It is from 2016 but Ignition it's still what we use today.

If you want to take a look at what TurboFan nodes we create from Bytecodes, you can check out BytecodeGraphBuilder.
Also, you can use Turbolizer to visualize optimized code along the various phases of Turbofan's optimization pipeline.

Hope this helps!

To unsubscribe from this group and stop receiving emails from it, send an email to v8-dev+un...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/v8-dev/fe63e255-ee49-4b6c-8811-28b4eb774c55%40googlegroups.com.

Ross McIlroy

unread,
Feb 14, 2020, 6:33:54 AM2/14/20
to v8-dev
> I am still trying to figure out if there is 1 to 1 mapping between AST node -> bytecode -> JSOpcode

To answer this question directly, there isn't a 1:1 mapping between AST nodes and bytecode - we generate multiple bytecodes depending on what the AST node represents, e.g., BytecodeGenerator::VisitPropertyLoad shows the possible bytecodes we might generate for the Property AST node. The bytecode-generator.cc file defines this mapping of AST to bytecode.

For Bytecode -> JSOpcode, this is intentionally more of a 1:1 mapping. It's still not completely 1:1, but in most cases is close - e.g: BytecodeGraphBuilder::VisitLdaNamedProperty takes a LdaNamedProperty bytecode and turns it into a JSLoadNamed JSOpcode (it also tries to lower directly to simplified IR, but that is an optimization). These mappings are defined in bytecode-graph-builder.cc.

Cheers,
Ross


yoshi chen

unread,
Feb 16, 2020, 11:46:50 PM2/16/20
to v8-dev
Thank you Santiago and Ross for all the guidance and patience.

I have some confusion around LdaHole, and its usage.
I enable --print-ast as well in D8 trying to understand the mapping between AstNode and output bytecodes.

ORIGINAL JS CODE
d8> const x = 10; function f1() {return x;}


AST, V8 generates variable proxy for hoisting, then scope analysis will bind proxy to variable declaration
--- AST ---
FUNC at 0
. KIND 0
. SUSPEND COUNT 0
. NAME ""
. INFERRED NAME ""
. DECLS
. . VARIABLE (0x7f94e001ce28) (mode = CONST) "x"
. . FUNCTION "f1" = function f1
. BLOCK NOCOMPLETIONS at -1
. . EXPRESSION STATEMENT at 10
. . . INIT at 10
. . . . VAR PROXY context[4] (0x7f94e001ce28) (mode = CONST) "x"
. . . . LITERAL 10



This is what i see in the code for LdaHole, which telss me LdaTheHole is for variable declaration, and maybe for variable proxy
case VariableLocation::LOCAL:
     if (variable->binding_needs_init()) {
       Register destination(builder()->Local(variable->index()));
       builder()->LoadTheHole().StoreAccumulatorInRegister(destination);
     }
     break;



Here is the generated bytecode from D8, please see my question in comments below:

/** FUNCTION PROLOGUE START **/
         0xccc69fa90ca @    0 : 12 00             LdaConstant [0]
         0xccc69fa90cc @    2 : 26 f9             Star r2
         0xccc69fa90ce @    4 : 27 fe fa          Mov <closure>, r1
         0xccc69fa90d1 @    7 : 5e 7f 01 fa 02    CallRuntime [NewScriptContext], r1-r2
         0xccc69fa90d6 @   12 : 16 fa             PushContext r1
/** FUNCTION PROLUGE END **/


/** VARIABLE PROXY in ContextSlot 4? **/
         0xccc69fa90d8 @   14 : 0f                LdaTheHole
         0xccc69fa90d9 @   15 : 1d 04             StaCurrentContextSlot [4]

/** NO CreateClosure bytecode? but Mov <closure>? **/
         0xccc69fa90db @   17 : 12 01             LdaConstant [1] // accumulator = SCRIPT_SCOPE
         0xccc69fa90dd @   19 : 26 f9             Star r2 // r2 = SCRIPT_SCOPE
         0xccc69fa90df @   21 : 0b                LdaZero
         0xccc69fa90e0 @   22 : 26 f8             Star r3 // r3 = 0
         0xccc69fa90e2 @   24 : 27 fe f7          Mov <closure>, r4 // r4 = <closure>
         0xccc69fa90e5 @   27 : 5e 76 01 f9 03    CallRuntime [DeclareGlobals], r2-r4


/** entering function, because there's StackCheck bytecode **/
    0 E> 0xccc69fa90ea @   32 : a0                StackCheck

/** store small integer 10 into variable proxy in ContextSlot 4? **/
   10 S> 0xccc69fa90eb @   33 : 0c 0a             LdaSmi [10]
   10 E> 0xccc69fa90ed @   35 : 1d 04             StaCurrentContextSlot [4]

         0xccc69fa90ef @   37 : 0d                LdaUndefined
   38 S> 0xccc69fa90f0 @   38 : a4                Return
Constant pool (size = 2)
0xccc69fa9059: [FixedArray] in OldSpace
 - map: 0xccc20b82361 <Map>
 - length: 2
           0: 0xccc69fa8f39 <ScopeInfo SCRIPT_SCOPE [12]>
           1: 0xccc69fa8fa9 <FixedArray[4]>


--
--
v8-dev mailing list
v8-...@googlegroups.com
http://groups.google.com/group/v8-dev
---
You received this message because you are subscribed to the Google Groups "v8-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to v8-...@googlegroups.com.

Leszek Swirski

unread,
Feb 17, 2020, 5:01:19 AM2/17/20
to v8-dev
On Mon, Feb 17, 2020 at 5:46 AM yoshi chen <yoshi...@gmail.com> wrote:
I have some confusion around LdaHole, and its usage.

Holes are used, in this example, to initialise a lexical variable (i.e. let or const) during its "temporal dead zone", a.k.a "TDZ", which is the time between function entry and the actual variable being initialised. It's there so that we can throw errors for code such as "print(x); let x = 0;" or even "let x = x + 1;", where a lexical variable is accessed before being initialised. Note that this doesn't happen to "var" variables, because they are "hoisted" to the top of the function and initialised with undefined.

Here is the generated bytecode from D8, please see my question in comments below:

/** FUNCTION PROLOGUE START **/
         0xccc69fa90ca @    0 : 12 00             LdaConstant [0]
         0xccc69fa90cc @    2 : 26 f9             Star r2
         0xccc69fa90ce @    4 : 27 fe fa          Mov <closure>, r1
         0xccc69fa90d1 @    7 : 5e 7f 01 fa 02    CallRuntime [NewScriptContext], r1-r2
         0xccc69fa90d6 @   12 : 16 fa             PushContext r1
/** FUNCTION PROLUGE END **/


/** VARIABLE PROXY in ContextSlot 4? **/
         0xccc69fa90d8 @   14 : 0f                LdaTheHole
         0xccc69fa90d9 @   15 : 1d 04             StaCurrentContextSlot [4]

This is initializing the const. Top-level lexical variables are allocated in the script context (so that they are visible across scripts, via the "script context table"), which is why this is a context slot access.
 

/** NO CreateClosure bytecode? but Mov <closure>? **/
         0xccc69fa90db @   17 : 12 01             LdaConstant [1] // accumulator = SCRIPT_SCOPE
         0xccc69fa90dd @   19 : 26 f9             Star r2 // r2 = SCRIPT_SCOPE
         0xccc69fa90df @   21 : 0b                LdaZero
         0xccc69fa90e0 @   22 : 26 f8             Star r3 // r3 = 0
         0xccc69fa90e2 @   24 : 27 fe f7          Mov <closure>, r4 // r4 = <closure>
         0xccc69fa90e5 @   27 : 5e 76 01 f9 03    CallRuntime [DeclareGlobals], r2-r4


This function is the top-level script function, which means functions you define in it are globals. This part of the code is setting up those globals, by passing through the FixedArray specifying top-level declarations (element 1 in the constant pool, notably *not* the script scope as you thought here) and the *current* closure (i.e. the top level script) into the DeclareGlobals runtime function. Inside of this runtime function, we'll create the closure for f1.
 

/** entering function, because there's StackCheck bytecode **/
    0 E> 0xccc69fa90ea @   32 : a0                StackCheck

Entering the top-level script function, specifically, not f1 (note -- these StackChecks have gone away now IIRC).
 

/** store small integer 10 into variable proxy in ContextSlot 4? **/
   10 S> 0xccc69fa90eb @   33 : 0c 0a             LdaSmi [10]
   10 E> 0xccc69fa90ed @   35 : 1d 04             StaCurrentContextSlot [4]

Yup, this is the "x = 10" part of "const x = 10".
Reply all
Reply to author
Forward
0 new messages