Issue 3016 in v8: Rewrite the V8 parser

codesite...@google.com

Nov 18, 2013, 9:45:16 AM

to v8-...@googlegroups.com

Status: Accepted
Owner: ----
CC: joc...@chromium.org, dcar...@chromium.org, ul...@chromium.org,
mstar...@chromium.org, da...@chromium.org, ad...@chromium.org,
ma...@chromium.org, to...@chromium.org
Labels: Type-Bug Priority-Medium

New issue 3016 by ma...@chromium.org: Rewrite the V8 parser
http://code.google.com/p/v8/issues/detail?id=3016

What we want from the new parser:
- needs to be able to parse UTF-8
- needs to be able to start parsing data before everything has arrived from
the network
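
As a rough illustration of these two requirements together (not the design of
the new parser), the sketch below decodes UTF-8 incrementally as chunks arrive,
so a consumer could start work before the whole script is available. It uses
the standard TextDecoder API; the function name decodeChunks and the way the
bytes are split into chunks are assumptions made up for this example.

```javascript
// Sketch only: incremental UTF-8 decoding of network chunks.
// TextDecoder is the standard Encoding API; decodeChunks and the
// chunk boundaries below are hypothetical.
function decodeChunks(chunks) {
  const decoder = new TextDecoder("utf-8");
  const pieces = [];
  for (const chunk of chunks) {
    // stream: true buffers an incomplete multi-byte sequence
    // until the next chunk supplies the remaining bytes.
    pieces.push(decoder.decode(chunk, { stream: true }));
  }
  // A final call without arguments flushes any buffered bytes.
  pieces.push(decoder.decode());
  return pieces;
}

// "h\u00e9llo" in UTF-8, split in the middle of the two-byte
// sequence 0xC3 0xA9.
const exampleChunks = [
  new Uint8Array([0x68, 0xc3]),
  new Uint8Array([0xa9, 0x6c, 0x6c, 0x6f]),
];
const decodedPieces = decodeChunks(exampleChunks);
console.log(decodedPieces.join(""));
```

The first piece can be handed to a lexer as soon as it is decoded; the
split character only appears once its second byte has arrived.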

The work is ongoing in this branch:
https://code.google.com/p/v8/source/browse#svn%2Fbranches%2Fexperimental%2Fparser

in src/lexer and tools/lexer_generator.

Progress so far:
- We've experimented with generating a lexer with re2c, and now we're
experimenting with writing our own regex -> lexer engine. (dcarney, ulan,
marja)
- API changes are under discussion. (mstarzinger et al.)
- Parts of the parser that need to access the Isolate are being moved as
late as possible in parsing, so that parsing can run on a non-main thread.
(mstarzinger)


codesite...@google.com

Nov 18, 2013, 10:11:14 AM

to v8-...@googlegroups.com

Updates:
Owner: mstar...@chromium.org

Comment #1 on issue 3016 by ma...@chromium.org: Rewrite the V8 parser
http://code.google.com/p/v8/issues/detail?id=3016

(No comment was entered for this change.)

codesite...@google.com

Nov 18, 2013, 10:51:28 AM

to v8-...@googlegroups.com

Comment #2 on issue 3016 by to...@chromium.org: Rewrite the V8 parser
http://code.google.com/p/v8/issues/detail?id=3016

Streaming parsing will be awesome! Really excited to see this happening.

Sorry to potentially derail the bug, but I have a few naive questions while
we are considering a major re-write:

- There is a lot of dead JS on the web. For example, a page imports all of
jquery, but only uses one method, $(...). Are lazy parsing designs
feasible? Are they being considered? If so, we may want to think about
teaching the parser whether it is on the critical path currently. If we are
blocked on the network, be eager. If we have the whole script and are
blocking compilation, be lazy.

- Are there more valuable artifacts that we could cache post-parsing than
the current preparsing data? Caching that data didn't save a significant
amount of parsing effort. But we still have all the memory/disk cache
plumbing in Chrome/Blink if V8 can give us something more valuable to cache.

- Rather, could the full parse output be made cacheable? Then we could
consider pre-warming same-domain renderers with cached parse output. This
would significantly help with common cases like "Google Search Results
--(click)--> Search Result on a different domain --(back)--> Google Search
Results again in a cold renderer that needs to reparse/recompile
everything."

- Can parsing be made thread safe so that we can consider offloading it
from the main thread? I believe IE does this.

P.S. The above questions would also apply to compilation, but I expect the
answers there are less rosy than what I'm hoping for with parsing.

codesite...@google.com

Nov 18, 2013, 10:59:41 AM

to v8-...@googlegroups.com

Comment #3 on issue 3016 by yan...@chromium.org: Rewrite the V8 parser
http://code.google.com/p/v8/issues/detail?id=3016

As for the compiler, we compile lazily wherever we can. The parser only
creates a shared function info with the source offsets, so that when we
actually run the function, it's compiled from the function definition at
that source offset.
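
The offset-based scheme described above can be imitated in plain JavaScript.
This is only a sketch of the idea, not V8's implementation (which lives in
C++ inside the engine): makeLazyFunction, compileCount and the example source
string are all hypothetical names, and new Function stands in for the real
compiler.

```javascript
// Sketch only: keep just the source offsets of a function and
// compile it the first time it is called.
function makeLazyFunction(source, start, end) {
  let compiled = null;
  let compileCount = 0;
  const lazy = function (...args) {
    if (compiled === null) {
      // Compile only the slice [start, end) of the original source.
      compiled = new Function("return (" + source.slice(start, end) + ");")();
      compileCount++;
    }
    return compiled.apply(this, args);
  };
  // Exposed so callers can observe that compilation happens once.
  lazy.compileCount = () => compileCount;
  return lazy;
}

const scriptSource = "var a = 1; function foo() { return 42; } var b = 2;";
const fooStart = scriptSource.indexOf("function foo");
const fooEnd = scriptSource.indexOf("}", fooStart) + 1;
const lazyFoo = makeLazyFunction(scriptSource, fooStart, fooEnd);

console.log(lazyFoo.compileCount()); // nothing compiled yet
console.log(lazyFoo());              // first call triggers compilation
```

Because the compile step runs inside the first call, it blocks that call,
which matches the point made in the next paragraph.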

The unoptimized code is not produced in a separate thread, and the gains
would be rather small since producing it is quite fast. Since it's done
lazily, it will block execution either way. The optimized code is already
being compiled in a second thread in most cases (excluding OSR); only code
generation happens on the main thread.

We do have a script-to-code cache. I'm not 100% sure about this, but at
least top-level JavaScript code is cached, so as long as the embedder
(Chrome) tells us to create code from the same (external) string, we should
just be fetching it from the cache. I don't remember the details anymore,
but it probably covers part of your Google Search example.

We could cache the abstract syntax tree, but given that we might also not
use the AST format in the optimizing compiler altogether in the long-term
future, I'm not sure work in this area is a good investment.

codesite...@google.com

Nov 18, 2013, 11:08:15 AM

to v8-...@googlegroups.com

Comment #4 on issue 3016 by to...@chromium.org: Rewrite the V8 parser
http://code.google.com/p/v8/issues/detail?id=3016

> but it probably covers part of your Google Search example.

It does not. The problem is that it involves cross origin navigations which
entail new renderers with completely cold caches (including all V8 caches).
I think we should come up with a scheme to allow some of this stuff to
persist across cross-origin navigations so renderers are cold less often.
Something like:
https://groups.google.com/a/chromium.org/d/msg/blink-dev/QP9bEjbXqgs/r9NBPSfjFJMJ

I'm not sure yet how that proposal will pan out, but I'd just like to
consider in the API that Chrome or Blink may be in the best position to let
caches survive longer as well as to appropriately manage memory.

> but given that we might also not use the AST format in the optimizing
> compiler altogether in the long-term future

Could we just version whatever opaque data we tell the embedder to cache?
Then if V8 changes, it knows not to use it?
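
A minimal sketch of what such versioning could look like, assuming a JSON
envelope purely for readability (a real cache would more likely be binary).
CACHE_FORMAT_VERSION, packCacheData and unpackCacheData are hypothetical
names for illustration, not V8 or Blink API.

```javascript
// Sketch only: tag opaque cache data with a format version so stale
// data is rejected instead of being misread after an engine change.
const CACHE_FORMAT_VERSION = 1; // hypothetical version number

function packCacheData(payload, version = CACHE_FORMAT_VERSION) {
  return JSON.stringify({ version, payload });
}

function unpackCacheData(blob, expectedVersion = CACHE_FORMAT_VERSION) {
  let parsed;
  try {
    parsed = JSON.parse(blob);
  } catch (e) {
    return null; // corrupt data is treated like a cache miss
  }
  if (parsed === null || typeof parsed !== "object") return null;
  if (parsed.version !== expectedVersion) return null; // stale format
  return parsed.payload;
}

const freshBlob = packCacheData({ functions: ["foo"] });
const staleBlob = packCacheData({ functions: ["foo"] }, 0);
console.log(unpackCacheData(freshBlob)); // payload is returned
console.log(unpackCacheData(staleBlob)); // null, so the caller reparses
```

The embedder only stores and returns the blob; the version check stays on
the engine side, so a mismatch simply falls back to a normal parse.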

codesite...@google.com

Nov 18, 2013, 11:17:59 AM

to v8-...@googlegroups.com

Comment #5 on issue 3016 by sven...@chromium.org: Rewrite the V8 parser
http://code.google.com/p/v8/issues/detail?id=3016

I don't know what is meant by "lazy parsing designs" (jquery example
above). The problem in a sloppy language like JavaScript is that you have
to see everything before something is used, because one can assign to,
e.g., a name initially bound to a function:

function foo() { return 42; }
// ... [200kB JavaScript code] ...
foo = 3.1415;

Perhaps I misunderstood your point, though.

Furthermore, I am not really convinced that deserializing some form of an
AST from disk/cache is faster than just parsing the initial string again.
To be more exact: I highly doubt that. :-)

codesite...@google.com

Nov 18, 2013, 1:39:56 PM

to v8-...@googlegroups.com

Comment #6 on issue 3016 by to...@chromium.org: Rewrite the V8 parser
http://code.google.com/p/v8/issues/detail?id=3016

#c5: Please forgive my ignorance as I'm not sure exactly where parsing
stops and compilation begins. But my thought was that in your example, only
the "return 42" could be lazy.
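
To make that concrete, here is a deliberately naive sketch of a pre-scan that
records where each top-level function body starts and ends, counting braces
without otherwise analyzing the body. It is an assumption-laden toy, not how
V8's preparser works: skipFunctionBodies is a hypothetical name, and the scan
ignores strings, comments, regular expressions and template literals, any of
which can contain braces.

```javascript
// Sketch only: find top-level `function name(...) { ... }` declarations
// and record body offsets, skipping over the body by brace counting.
// Braces inside strings, comments or regex literals would confuse it.
function skipFunctionBodies(src) {
  const found = [];
  const header = /function\s+([A-Za-z_$][\w$]*)\s*\([^)]*\)\s*\{/g;
  let match;
  while ((match = header.exec(src)) !== null) {
    const bodyStart = header.lastIndex - 1; // index of the opening "{"
    let depth = 0;
    let i = bodyStart;
    for (; i < src.length; i++) {
      if (src[i] === "{") depth++;
      else if (src[i] === "}" && --depth === 0) break;
    }
    if (depth !== 0) break; // unbalanced braces: give up
    found.push({ name: match[1], bodyStart, bodyEnd: i + 1 });
    header.lastIndex = i + 1; // resume scanning after the body
  }
  return found;
}

const sample = "function foo() { return 42; }\nfoo = 3.1415;";
for (const fn of skipFunctionBodies(sample)) {
  console.log(fn.name, sample.slice(fn.bodyStart, fn.bodyEnd));
}
```

Everything outside the recorded bodies, including the later assignment to
foo, is still seen by the scan; only the body contents are deferred.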

Regarding loading serialized version vs actual parsing: you may be right.
It would be interesting to verify with data though.

codesite...@google.com

Nov 19, 2013, 4:04:44 AM

to v8-...@googlegroups.com

Updates:
Cc: rossb...@chromium.org

Comment #7 on issue 3016 by sven...@chromium.org: Rewrite the V8 parser
http://code.google.com/p/v8/issues/detail?id=3016

IIRC there are things in ES6 (and some even in ES5) which make all this
discussion around lazy parsing moot. Adding Andreas, who knows more (I
hope :-)...
