Re: [javacc/javacc] minimumSize=1 generated for choice that contains child with zero size (#86)

19 views
Skip to first unread message

JavaCC Support

unread,
May 31, 2019, 12:05:59 AM5/31/19
to reply+AE2B3LPC3QL42KUJVR...@reply.github.com, javacc...@googlegroups.com

Adding javacc users  so more people can chime in.

> Like I said, it's not incorrect. There are some really subtle rules.
> Let me read carefully the lookahead mini tutorial and get back to you.
> Remember the 2 rules - 1. LOOKEHAD is the max we want it to use so
> it's free to use anything upto that and 2.  javacc does not do
> lookahead during lookahead. As for your examples, the number of choice
> points are different so you can't  expect the same behaviour. Only
> thing I would say if a rule works without lookahead, then should also
> work with lookahead.
>
> So I'm still not convinced there is a bug. You can always reorder your
> rules to put <BAR><BAR> first - that's standard practice in javacc. It
> simply takes the first choice at a choice point that's feasible due to
> implicit/explicit lookahead. It does not try all possibilites.
>
> In fact, I say keep the bug open as low priority and use the
> workaround of reordering the rules.
>
>>     An empty action is alway taken because the lookahead is trivially
>>     true.
>>
>> @new-javacc <https://github.com/new-javacc> , also, the lookahead for
>> empty action is true, but the action does't consume lookahead counts.
>> Say:
>>
>>   * Grammar |({} <FOO> | <BAR> )| is good for parsing input |BAR|;
>>   * Grammar |(( <DUMMY> | {} ) <FOO> | <BAR> )| is good for parsing
>>     input |BAR|;
>>   * HOWEVER grammer |( LOOKAHEAD(2) ( <DUMMY> | {} ) <BAR> <DUMMY> |
>>     <BAR> <BAR> )| is not able to parse input |BAR BAR|.
>>
>> This is simply INCORRECT.
>>
>> —
>> You are receiving this because you were mentioned.
>> Reply to this email directly, view it on GitHub
>> <https://github.com/javacc/javacc/issues/86?email_source=notifications&email_token=AE2B3LKZ6MFXYLPLNHBW5PLPYCMXBA5CNFSM4HCOZM22YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWUDQVY#issuecomment-497563735>,
>> or mute the thread
>> <https://github.com/notifications/unsubscribe-auth/AE2B3LOQCT2XBLNWILBDNELPYCMXBANCNFSM4HCOZM2Q>.
>>
>

JavaCC Support

unread,
May 31, 2019, 12:57:33 PM5/31/19
to javacc/javacc, javacc...@googlegroups.com, javacc/javacc, Mention
Oops hit send too soon.

Basically your example perfectly illustrates the need for lookahead. So in your second example below sicne the default lookahead is 1, it will see that bar matches in the first choice and so it commits to that. So the two examples are *not* equivalent.

I strongly suggest you run the grammar with debug_lookahead option and carefully see the output to understand what it's doing.


On May 31, 2019 9:23 AM, JavaCC Support <sup...@javacc.org> wrote:
Yeas that's because the default lookahead is 1

On May 31, 2019 2:51 AM, Hongze Zhang <notifi...@github.com> wrote:

Thanks @new-javacc .

Just to reiterate LOOKAHEAD(2) says upto two tokens but if the choices are less than 2 tokens long, it will not use 2.

Sorry I don't think so (If I understand your point correctly). For example, I can write following grammars:

  1. ( LOOKAHEAD(2) ( <FOO> | <BAR> ) <DUMMY> | <BAR> <BAR> )
  2. ( ( <FOO> | <BAR> ) <DUMMY> | <BAR> <BAR> )

By your explanation, the 2 grammars should be effectively equivalent, because the leading expansion ( <FOO> | <BAR> ) is a choice with no longer than 2 tokens. However, grammar 1 can parse BAR BAR correctly, where grammar 2 cannot. I think the behavior is right.

Per lookahead tutorial and JavaCC generated code, I believe a lookahead hint in choice should be the lookahead for the whole choice branch, not only the first expansion. Say, the lookahead in LOOKAHEAD(2) ( <FOO> | <BAR> ) <DUMMY> | ... should consider ( <FOO> | <BAR> ) <DUMMY>, not only the sub choice ( <FOO> | <BAR> ), so I don't think length of the sub choice matters.


You are receiving this because you were mentioned.

Reply to this email directly, view it on GitHub, or mute the thread.



JavaCC Support

unread,
Jun 1, 2019, 3:07:59 PM6/1/19
to reply+AE2B3LN56OQO2QQYAB...@reply.github.com, javacc...@googlegroups.com
Same issue - it's not a test case for the "bug". You need to factor out
left side because this a top-down parser and write a javacc grammar
properly taking into account how javacc works. It will simply take the
first choice it can with the minimal lookahead - so *, [] and empty
choice {} are always taken. So you need to rework your grammar to make
sure you don't have crazy combinations in the prefix especially
optional/empty. And anyway these kinds of grammars hard to understand
and maintain. So if you have real grammar with this issue, send it over
and we can see what can be done.

> Please consider:
>
> |void Branch1() : {} { ( <FOO1> | [ <FOO3> ] ) <BAR> <DUMMY> } void
> Branch2() : {} { ( <FOO2> | ( <FOO4> )* ) <BAR> <BAR> } |
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/javacc/javacc/issues/86?email_source=notifications&email_token=AE2B3LKU7AGS3BNJGJ64LYDPYLBAXA5CNFSM4HCOZM22YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWXGSII#issuecomment-497969441>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/AE2B3LNHWHAACJKSVKZTBGDPYLBAXANCNFSM4HCOZM2Q>.
>

JavaCC Support

unread,
Jun 1, 2019, 4:08:43 PM6/1/19
to javacc...@googlegroups.com
Actually, your example should work with lookahead(3)

JavaCC Support

unread,
Jun 2, 2019, 2:16:20 AM6/2/19
to reply+AE2B3LK2VSWVUNGVR3...@reply.github.com, javacc...@googlegroups.com
So the issue is you don't know that. JavaCC doesn't let you specify
"just empty" as a production so at least one of the choices has to have
> 0 tokens. That's why to avoid infinite looping, we do this. Your
example is too simplistic. If have recursion, this will loop because
lookahead will never go down and cause major problems. It's been a long
time (> 20yrs :)) but now I remember this.
>
> *, [] and empty choice {} are always taken.
>
> And I think I understood your concern above. |( <FOO> )*| / |[ <FOO>
> ]| / |{}| are all min=0 expansions, they can always be taken, that's
> no problem. But we can't consider that they consume one lookahead
> count, they should all consume zero. We can take |[ <FOO> ]|, but
> since we don't have any |<FOO>|s, we should keep the rest lookahead
> count numbers and consume them afterwards. Decreasing the number by
> *1* is incorrect.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> <https://github.com/javacc/javacc/issues/86?email_source=notifications&email_token=AE2B3LOOUM6BSHIAF73FSUDPYNNEDA5CNFSM4HCOZM22YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGODWXOMPY#issuecomment-498001471>,
> or mute the thread
> <https://github.com/notifications/unsubscribe-auth/AE2B3LKOYFGQE4T2RJMBNWLPYNNEDANCNFSM4HCOZM2Q>.
>

T.S.Norvell

unread,
Jun 3, 2019, 12:47:19 PM6/3/19
to javacc...@googlegroups.com

Just in case anyone else is wondering what this is about.  It's issue 86 on the GitHub.

        https://github.com/javacc/javacc/issues/86

I mention this, because it took me a long time to figure out what this discussion is about.  I'd suggest that further discussion should be on GitHub because that's where it started, because that's where the context is, and because it's best to have discussions like this in one place.

But thanks for bringing this to our attention.

Theo

Reply all
Reply to author
Forward
0 new messages