grab source text in parser


Gus Caplan

Oct 4, 2020, 12:53:37 PM
to v8-dev

Hey folks, I'm trying to figure out how to capture a range of source text in the parser. Ideally the container of the source text is something that can easily be turned into a heap string.

int start = peek_position();
ParseAssignmentExpression();
int end = end_position();
auto source = somehow_get_source_text(start, end);

Any advice is appreciated.

Leszek Swirski

Oct 5, 2020, 3:38:04 AM
to v8-dev
We don't have an API like that in the scanner/parser, for two reasons:

a) Chunking -- data comes to the scanner in chunks (e.g. for streaming), and we do very little rewinding when parsing, so we want to be able to discard chunks once they're no longer needed.
b) Escaping -- you can have Unicode escapes in strings and even in identifiers, and those should be treated as the equivalent Unicode character.

So, for things like strings and identifiers, we actually copy the source data into a local buffer while scanning, to be able to turn them into heap strings after. It looks like you want to have a substring representation of the source instead? Would it make sense for you to store the start and end positions, and use those to get a substring of the script's source string?
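A minimal sketch of the "store positions, take a substring later" idea, written as standalone C++ rather than against V8's internals. `SourceRange` and `SourceTextOf` are hypothetical names, and the sketch assumes the offsets recorded by `peek_position()` and `end_position()` index the same units as the source string:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical type: a half-open [start, end) range of offsets into the
// script source, as recorded while parsing.
struct SourceRange {
  int start;
  int end;
};

// Returns the raw source text covered by `range`. Escapes are not decoded,
// so the result is the text as written, not a "cooked" value.
std::string_view SourceTextOf(std::string_view script_source,
                              SourceRange range) {
  assert(0 <= range.start && range.start <= range.end);
  assert(static_cast<std::size_t>(range.end) <= script_source.size());
  return script_source.substr(range.start, range.end - range.start);
}
```

With the source `debugger.log(a + b);` and the range {13, 18}, this returns `a + b`. Because the result is a view into the full source, it only works while the whole source string is still available, which is exactly what chunked streaming does not guarantee during parsing.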


Gus Caplan

Oct 5, 2020, 11:26:55 AM
to v8-dev
My ultimate goal is to store the substring in the constant pool. If I just store the start/end positions, am I able to get the "current" source text when the constant pool is being filled in the bytecode gen?

Leszek Swirski

Oct 5, 2020, 11:34:08 AM
to v8-dev
Our current streaming API + off-thread finalization makes that a bit tricky: we don't necessarily have the source text when the constant pool is being filled in. Could you store the positions as raw values in the constant pool (or as a tuple or whatever) and generate the substring at runtime? Do you have more details on the underlying problem you're solving?
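One way to picture the suggestion is a pool whose entries are either finished strings or raw position pairs that get resolved later. This is a standalone illustration, not V8's constant pool: `PoolEntry`, `SourceRange` and `ResolveEntry` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>
#include <variant>
#include <vector>

// Hypothetical: a half-open [start, end) range of source offsets.
struct SourceRange {
  int start;
  int end;
};

// Hypothetical pool entry: either a ready-made string constant, or a raw
// range stored at bytecode-generation time without touching the source.
using PoolEntry = std::variant<std::string, SourceRange>;

// Resolves an entry at runtime, when the full source text is available.
std::string ResolveEntry(const PoolEntry& entry,
                         std::string_view script_source) {
  if (const auto* range = std::get_if<SourceRange>(&entry)) {
    assert(0 <= range->start && range->start <= range->end);
    assert(static_cast<std::size_t>(range->end) <= script_source.size());
    return std::string(
        script_source.substr(range->start, range->end - range->start));
  }
  return std::get<std::string>(entry);
}
```

Filling the pool then needs only the two integers, so it does not matter whether the source text is present at that point; the substring is built the first time the entry is actually read.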

Gus Caplan

Oct 5, 2020, 11:44:22 AM
to v8-dev
Sure, I'm experimenting with https://github.com/tc39/proposal-standardized-debug/ (just for my own amusement, not planning to try to merge this into v8 anytime soon). I'd like to pass the source text of the argument to debugger.log/break without a lot of overhead (e.g. taking the substring of the stack top's script's source text each time), so only generating it once and using the same string from then on seemed like a good path.
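The "generate it once and reuse it" part could look roughly like the slot below: it keeps the recorded positions and builds the substring on first use only. `LazySourceText` is a hypothetical name, and the counter exists purely to make the single materialization observable; none of this is an existing V8 facility.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string>
#include <string_view>

// Hypothetical per-call-site slot: stores the [start, end) range and, once
// computed, the cached substring.
class LazySourceText {
 public:
  LazySourceText(int start, int end) : start_(start), end_(end) {}

  // Builds the substring on the first call, then returns the cached copy.
  const std::string& Get(std::string_view script_source) {
    if (!cached_) {
      assert(0 <= start_ && start_ <= end_);
      assert(static_cast<std::size_t>(end_) <= script_source.size());
      cached_.emplace(script_source.substr(start_, end_ - start_));
      ++materializations_;
    }
    return *cached_;
  }

  // Number of times the substring was actually built (0 or 1).
  int materializations() const { return materializations_; }

 private:
  int start_;
  int end_;
  std::optional<std::string> cached_;
  int materializations_ = 0;
};
```

Calling `Get` repeatedly on the same slot returns the same string object each time, so the cost of taking the substring is paid once per call site rather than once per call.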