Understanding longest token matching in SLIF

16 views
Skip to first unread message

amon

unread,
Jul 21, 2013, 12:58:39 PM7/21/13
to marpa-...@googlegroups.com
Crosspost http://stackoverflow.com/q/17773976/1521179. I am trying to parse a file with Marpa with the difficulty that longest token matching of the SLIF seems to get in my way. My grammar contains:

record ::= ('!') key (':') value
key     ~  [\w]+
value   ~  [^\s]+

Now when I let it parse "! key : value", then everything works fine. But when the input is  "! key:value", the longest token after the "!" is the "key:value" of type "value", although it cannot occur at this position. I would be interested in learning
  1. … why the SLIF considers terminals that cannot be accepted at this position.
  2. … how I must change my program so that both the input with and without spaces parse the same. If possible, without writing my own scanner (this would defeat the whole point of the SLIF).
It would probably be best if the answer to the second question gets posted to stackoverflow, where more people can benefit. I suspect the answer will utilize events, but I cannot figure out how to do this myself.

Jeffrey Kegler

unread,
Jul 21, 2013, 1:05:29 PM7/21/13
to marpa-...@googlegroups.com
Answering just the first question: "Why the SLIF considers terminals that cannot be accepted at this position?"� Two reasons:

1.)� It's easier to implement.

2.)� It's is how lexers have worked in the past -- they hadn't a clue what the parser would do.� This meant the SLIF's working would be more obvious to programmers used to the state of the art in lexers.

I've considered making the SLIF smarter about this, but my guess that the majority of programmers seem to be used to lexers being dumb about what tokens are expected or not seems to have been correct.

-- jeffrey

amon wrote:
Crosspost�http://stackoverflow.com/q/17773976/1521179. I am trying to parse a file with Marpa with the difficulty that longest token matching of the SLIF seems to get in my way. My grammar contains:

record ::= ('!') key (':') value
key � � ~ �[\w]+
value � ~ �[^\s]+

Now when I let it parse "! key : value", then everything works fine. But when the input is �"! key:value", the longest token after the "!"�is the "key:value"�of type "value", although it cannot�occur at this position. I would be interested in learning
  1. � why the SLIF considers terminals that cannot be accepted at this position.
  2. � how I must change my program so that both the input with and without spaces parse the same. If possible, without writing my own scanner (this would defeat the whole point of the SLIF).
It would probably be best if the answer to the second question gets posted to stackoverflow, where more people can benefit. I suspect the answer will utilize events, but I cannot figure out how to do this myself.
--
You received this message because you are subscribed to the Google Groups "marpa parser" group.
To unsubscribe from this group and stop receiving emails from it, send an email to marpa-parser...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
�
�

Jeffrey Kegler

unread,
Jul 21, 2013, 1:13:26 PM7/21/13
to marpa-...@googlegroups.com
Oh, yes, I remembered the 3rd reason, an important one.� If, on a failed longest token, I try shorter tokens, the parse can go quadratic.� It's easy to write the lexer this way by accident.� As the SLIF is designed, it is hard to, by accident, write a grammar that does not parse in linear time.� This is a perhaps under-appreciated feature.� -- jeffrey
Reply all
Reply to author
Forward
0 new messages