Hi, I have a question about identifying words in a language and checking whether they are valid identifiers. In this language an identifier cannot be a reserved word.
I attach a test script that includes the grammar definition.
The interesting parts, as far as this topic is concerned, are:
class Word (Grammar):
    grammar = (WORD("A-Za-z", "A-Za-z0-9_"))

class ReservedWord (Grammar):
    grammar = (L("Form") | L("Data") | L("End"))

class Identifier (Grammar):
    grammar = (EXCEPT(Word, ReservedWord))

class FormHeader (Grammar):
    grammar = (L("Form"), Identifier)

class Form (Grammar):
    grammar = (FormHeader, FormData, FormEnd)
Here's the test input:
Form End
Form Data
End Data
End Form
This input is incorrect and should fail to parse. The reason is that the word "Form" on the first line must be followed by an identifier, but "End" is not a valid identifier because it is a reserved word.
When the test script is run with this input, the parse error I see is:
ParseError: [line 2, column 8] Expected 'Form': Found 'd\n Form Data\n'
I think this means that modgrammar matches "Form" on line 1 and then matches only the "En" of "End" as the identifier. It cannot match "End" as an identifier because "End" is a reserved word, so it backtracks and matches "En" instead, even though that is not a complete word.
Is that thinking correct?
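To make the suspected behaviour concrete, here is a small model of it in plain Python. It is not modgrammar code and makes no claim about modgrammar's internals; the names RESERVED, prefixes_longest_first and backtracking_identifier are made up for illustration. It only shows how "try the longest word, then fall back to shorter leading sub-words" ends up accepting "En".

```python
# Hypothetical model of the suspected backtracking; not modgrammar code.
RESERVED = {"Form", "Data", "End"}


def prefixes_longest_first(word):
    """Yield every leading sub-word of `word`, longest first."""
    for end in range(len(word), 0, -1):
        yield word[:end]


def backtracking_identifier(word):
    """Return the longest leading sub-word that is not a reserved word."""
    for candidate in prefixes_longest_first(word):
        if candidate not in RESERVED:
            return candidate
    return None


# "End" is rejected as reserved, so the model falls back to "En".
print(backtracking_identifier("End"))  # prints: En
```

If this model is right, the leftover "d" is what the parser then sees where it expects the next 'Form'.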
My main question is: how can I resolve the problem? The Word grammar element should commit to the longest possible word it can see and should never consider leading sub-words. Is that possible?
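For comparison, this is the behaviour I am after, sketched in plain Python with the standard re module rather than modgrammar. The names RESERVED, WORD_RE and match_identifier are made up for illustration, and the character classes are copied from the Word definition above. The word is matched once, at its full length, and only then checked against the reserved words; no shorter prefix is ever tried.

```python
import re

# Hypothetical sketch of the desired behaviour; not modgrammar code.
RESERVED = {"Form", "Data", "End"}

# Same character classes as the Word grammar: a letter, then letters,
# digits or underscores. The greedy `*` consumes the whole word.
WORD_RE = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def match_identifier(text, pos=0):
    """Return the full word at `pos` if it is a valid identifier, else None."""
    m = WORD_RE.match(text, pos)
    if m is None:
        return None  # no word starts at this position
    word = m.group()
    # Commit to the whole word: a reserved word is rejected outright.
    return None if word in RESERVED else word


print(match_identifier("Form End", 5))     # prints: None
print(match_identifier("Form Ending", 5))  # prints: Ending
```

With this behaviour, "Form End" would fail at "End" on line 1 instead of failing later at the stray "d".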
Many thanks for your attention.
Leigh.