Working with Tokens emitted by the official Dart Scanner.

valauska...@gmail.com

Dec 21, 2022, 6:41:26 PM
to Dart Analyzer Discussion
I'm working on an autogenerated lexer for Dart, and I'd like to confirm that its output matches that of the scanner in _fe_analyzer_shared.

I'm trying to map the tokens that the scanner in _fe_analyzer_shared emits to my tokens. However, there are some things that I did not expect and don't know how to deal with. I wanted to ask for help with how to handle the following:

* Is there a way to extract the nested structure of a Token with .type == TokenType.MULTI_LINE_COMMENT? Or does the scanner not expose this information? (By nested structure I'm referring to the nestedness of multi line comments i.e. the fact that '/* /* */ */' are two multiline comments where one can be found inside of the other.)

* All Strings, complete ones such as '"foo"' and incomplete ones that are part of an interpolated string such as '"""foo' are emitted under .type == TokenType.STRING. Is there a way to extract their type (i.e. whether they represent a single line single quote string, a single line double quote string and so on) or would I have to examine the lexeme to discover this information?

* What is the purpose of TokenType.BACKPING and TokenType.BACKSLASH? It looks to me like they are used for error recovery and will not appear in code that represents a valid Dart token stream. Is that correct?

* Keyword.PATCH and Keyword.SOURCE don't seem to be part of the official lexical grammar. Can I safely ignore them or is there some Dart code that I should expect to contain these tokens?

* It looks to me like whitespace information (e.g. spaces, tabs, newlines) is dropped and there's no way to extract some sort of "WhitespaceToken". Is that correct?

Brian Wilkerson

Dec 21, 2022, 8:13:30 PM
to analyzer...@dartlang.org
* Is there a way to extract the nested structure of a Token with .type == TokenType.MULTI_LINE_COMMENT? Or does the scanner not expose this information?

The scanner doesn't expose that information. We don't currently have any tooling that I'm aware of that would make use of that information. Dartdoc (and the analysis server) do look at the content of comments for various reasons, but that's all done outside the scanner (and they both ignore the possibility of nested comments inside doc comments).
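
Since the scanner hands back the whole comment as one lexeme, the nesting would have to be recovered by rescanning that lexeme. The following is a minimal sketch in Python, written only against the lexeme string and not against any _fe_analyzer_shared API; the function name `comment_nesting` is made up for illustration.

```python
def comment_nesting(lexeme: str) -> list[tuple[int, int, int]]:
    """Return (start, end, depth) for every /* ... */ pair in the lexeme.

    start and end are offsets into the lexeme (end is exclusive);
    depth 0 is the outermost comment.
    """
    spans = []
    stack = []  # start offsets of comments that are still open
    i = 0
    while i < len(lexeme) - 1:
        pair = lexeme[i:i + 2]
        if pair == "/*":
            stack.append(i)
            i += 2
        elif pair == "*/" and stack:
            start = stack.pop()
            # After the pop, len(stack) is the number of enclosing comments.
            spans.append((start, i + 2, len(stack)))
            i += 2
        else:
            i += 1
    return sorted(spans)


print(comment_nesting("/* /* */ */"))
```

For the example from the question, this reports an outer comment covering the whole lexeme at depth 0 and an inner one at depth 1.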


* All Strings, complete ones such as '"foo"' and incomplete ones that are part of an interpolated string such as '"""foo' are emitted under .type == TokenType.STRING. Is there a way to extract their type (i.e. whether they represent a single line single quote string, a single line double quote string and so on) or would I have to examine the lexeme to discover this information?

You'd have to look at the lexeme to discover that information. See `StringLexemeHelper`.
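
As a rough illustration of what "looking at the lexeme" could mean, here is a minimal Python sketch that inspects only the opening characters of a string lexeme. It is independent of `StringLexemeHelper`, whose API is not shown in this thread; the names `StringKind` and `classify_string_lexeme` are hypothetical. It assumes the lexeme is the first piece of a string (the one that carries the opening quote). Later pieces of an interpolated string have no opening quote, so the caller has to remember the kind of the piece that opened the string.

```python
from typing import NamedTuple, Optional


class StringKind(NamedTuple):
    raw: bool        # lexeme starts with the r prefix
    multiline: bool  # opened with three quote characters
    quote: str       # "'" or '"'


def classify_string_lexeme(lexeme: str) -> Optional[StringKind]:
    """Classify the opening piece of a string; None if no opening quote is found."""
    raw = lexeme.startswith("r")
    body = lexeme[1:] if raw else lexeme
    if not body or body[0] not in ("'", '"'):
        return None
    quote = body[0]
    multiline = body.startswith(quote * 3)
    return StringKind(raw, multiline, quote)


print(classify_string_lexeme('"foo"'))
print(classify_string_lexeme('"""foo'))
```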


* What is the purpose of TokenType.BACKPING and TokenType.BACKSLASH? It looks to me like they are used for error recovery and will not appear in code that represents a valid Dart token stream. Is that correct?

To the best of my knowledge, that's correct. It's quite possible that they are left over from some abandoned experiment.

* Keyword.PATCH and Keyword.SOURCE don't seem to be part of the official lexical grammar. Can I safely ignore them or is there some Dart code that I should expect to contain these tokens?

To the best of my knowledge you can ignore them.

* It looks to me like whitespace information (e.g. spaces, tabs, newlines) is dropped and there's no way to extract some sort of "WhitespaceToken". Is that correct?

That's correct. Each token knows its offset, so it's easy enough to compute the character range of the whitespace should that ever be interesting. The only use case I can think of in the current code base is in the analysis server, when we need to replicate indentation, and that's rare enough that the memory cost of all those extra tokens would outweigh the cost of computing the data on demand.
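
Computing those ranges on demand could look like the following Python sketch. It assumes each token has already been reduced to an `(offset, length)` pair in source order; the function name `gaps_between_tokens` is made up for illustration. Note that a gap is simply whatever text lies between two tokens, so if comments are not part of the token list passed in, they will show up inside the gaps as well.

```python
from typing import Iterable, Iterator, Tuple


def gaps_between_tokens(
    source: str, tokens: Iterable[Tuple[int, int]]
) -> Iterator[Tuple[int, int, str]]:
    """Yield (start, end, text) for each non-empty gap before, between, and after tokens."""
    position = 0  # end offset of the previous token
    for offset, length in tokens:
        if offset > position:
            yield position, offset, source[position:offset]
        position = offset + length
    if position < len(source):
        yield position, len(source), source[position:]


source = "var  x\t= 1;\n"
tokens = [(0, 3), (5, 1), (7, 1), (9, 1), (10, 1)]  # var, x, =, 1, ;
for gap in gaps_between_tokens(source, tokens):
    print(gap)
```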


valauska...@gmail.com

Dec 21, 2022, 8:48:27 PM
to Dart Analyzer Discussion, brianwilkerson
Thank you Brian.