The most probable cause of these kinds of problems is mistakes in the grammar itself, specifically in the actions where the parser assigns a location (position and length) to the AST nodes it produces by calling loc. For example, lines 229 to 230:
```
bracketed_expression
  : expression LBRACK access_args endcomma RBRACK =LBRACK { result = val[0].access(val[2]); loc result, val[0], val[4] }
```
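To show what the loc call in this rule is meant to achieve, here is a minimal Ruby sketch. It assumes that loc gives the node a location starting where its first argument starts and ending where its last argument ends. Token, Node and this loc are hypothetical stand-ins, not Puppet's classes, and every val[i] is modelled as a plain offset/length pair even though in the real parser val[0] is an expression node and val[2] an argument list.

```ruby
# Hypothetical token: offset and length in the source text.
Token = Struct.new(:offset, :length) do
  def end_offset
    offset + length
  end
end

# Hypothetical AST node carrying the location assigned in a grammar action.
Node = Struct.new(:kind, :offset, :length)

# Assumed semantics of `loc result, first, last`: the node spans from the
# start of the first token to the end of the last one. With a single token
# it covers only that token.
def loc(node, first, last = first)
  node.offset = first.offset
  node.length = last.end_offset - first.offset
  node
end

source = "$a[1]"
val = [
  Token.new(0, 2), # val[0]: expression  "$a"
  Token.new(2, 1), # val[1]: LBRACK      "["
  Token.new(3, 1), # val[2]: access_args "1"
  nil,             # val[3]: endcomma (empty, no trailing comma here)
  Token.new(4, 1)  # val[4]: RBRACK      "]"
]

access = loc(Node.new(:access), val[0], val[4])
puts source[access.offset, access.length] # prints $a[1]
```

Because the rule passes the first and the last reduced tokens, the node covers the whole access expression.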
The grammar rules should let location information bubble upwards, and when they do, each rule must give its result a location based on the correct reduced tokens. In the rule I referenced above for expr[expr], the correct tokens (val[0] and val[4]) are used. For the selector expression, however (found [here|http://example.com]), it seems to be wrong: the loc call stops at val[1] (the FARROW token) and never includes val[2], which means that the AST node for a selector entry will stop at the =>. Here is the source:
```
selector_entry
  : expression FARROW expression { result = Factory.MAP(val[0], val[2]) ; loc result, val[1] }
```
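A small Ruby sketch of what text a selector entry node would cover depending on which tokens are passed to loc. As before, Token, Node, loc and covered are hypothetical stand-ins, and loc is assumed to span from its first to its last argument. Under that assumption, passing val[0] and val[2] would cover the whole entry; whether that is the right fix in the real grammar would still need to be checked against Puppet's loc and the factory.

```ruby
# Hypothetical token: offset and length in the source text.
Token = Struct.new(:offset, :length) do
  def end_offset
    offset + length
  end
end

# Hypothetical AST node carrying the location assigned in a grammar action.
Node = Struct.new(:kind, :offset, :length)

# Assumed semantics: span from the start of `first` to the end of `last`.
def loc(node, first, last = first)
  node.offset = first.offset
  node.length = last.end_offset - first.offset
  node
end

# Text of the source that a node's recorded location covers.
def covered(source, node)
  source[node.offset, node.length]
end

source = "'x' => 1"
val = [
  Token.new(0, 3), # val[0]: expression "'x'"
  Token.new(4, 2), # val[1]: FARROW     "=>"
  Token.new(7, 1)  # val[2]: expression "1"
]

only_arrow  = loc(Node.new(:map), val[1])          # operator only
up_to_arrow = loc(Node.new(:map), val[0], val[1])  # ends at =>
full_entry  = loc(Node.new(:map), val[0], val[2])  # whole entry

puts covered(source, only_arrow)  # prints =>
puts covered(source, up_to_arrow) # prints 'x' =>
puts covered(source, full_entry)  # prints 'x' => 1
```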
Unfortunately, location information is recorded inconsistently throughout the grammar: for some AST nodes the location is the position and length of the operator, while for others it is the position and length of the entire expression. That is naturally where the method extract_tree_text comes in, as it should find the earliest source position and longest length starting from the AST node it operates on. Since the off-by-one error is not visible for every node, it is most likely not the calculation itself that is wrong, but the information recorded by the parser. The problem could also be in the factory methods that produce the AST nodes. For anyone tackling this, there is an overview of how this works in Puppet Internals - Puppet LanguageDevelopment.

PS: it would have been handy if the puppet parser dump command had an option to output the location information, as it would then be trivial to manually verify the locations and lengths produced by the parser. As it is, some coding is unfortunately required.
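A minimal Ruby sketch of the reasoning above, using a hypothetical Node struct rather than Puppet's model classes. tree_extent takes the earliest start and the furthest end of any node in a subtree, which is the idea attributed to extract_tree_text here, not its actual implementation. dump_locations (also hypothetical) prints each node's recorded location, roughly the kind of output a location-aware dump would give. The example shows that the same calculation yields truncated text when one child's recorded length is off by one.

```ruby
# Hypothetical AST node: the location the parser recorded, plus child nodes.
Node = Struct.new(:kind, :offset, :length, :children) do
  def end_offset
    offset + length
  end
end

# Earliest start and furthest end over the node and all its descendants.
def tree_extent(node)
  start = node.offset
  stop = node.end_offset
  node.children.each do |child|
    child_start, child_stop = tree_extent(child)
    start = [start, child_start].min
    stop = [stop, child_stop].max
  end
  [start, stop]
end

# Source text covered by the whole subtree.
def tree_text(source, node)
  start, stop = tree_extent(node)
  source[start...stop]
end

# One line per node: kind, recorded offset/length and the text it covers.
def dump_locations(source, node, depth = 0, lines = [])
  lines << format("%s%s offset=%d length=%d %p",
                  "  " * depth, node.kind, node.offset, node.length,
                  source[node.offset, node.length])
  node.children.each { |child| dump_locations(source, child, depth + 1, lines) }
  lines
end

source = "'a' => 10"

# Map node located only at "=>"; both children have correct locations.
good = Node.new(:map, 4, 2, [Node.new(:str, 0, 3, []), Node.new(:int, 7, 2, [])])
# Same tree, but the value's recorded length is one too short.
bad = Node.new(:map, 4, 2, [Node.new(:str, 0, 3, []), Node.new(:int, 7, 1, [])])

puts tree_text(source, good) # prints 'a' => 10
puts tree_text(source, bad)  # prints 'a' => 1
puts dump_locations(source, bad)
```

The extent calculation is identical for both trees; only the recorded child location differs, which matches the suspicion that the parser (or factory) records the wrong information rather than the calculation being at fault.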