EOF token start greater than end on syntax error

mark.fi...@thoughtwire.ca

Aug 22, 2013, 11:35:36 AM
to antlr-di...@googlegroups.com, Mark Fitzgerald
Good day, all. I've run into a possible minor issue with ANTLR 4.1 for Java.

When parsing text in which an expected terminating token is missing, my ANTLRErrorListener is notified of a syntax error via the syntaxError() method with the message "no viable alternative at input '<EOF>'". So far so good. The possible issue I noticed is that Token.getStartIndex() of the offending token points just past the last character of my input, while Token.getStopIndex() points at the last character of my input. Having start > stop seems rather confusing, and is not hinted at by the Javadocs.

Thoughts?

Mark A. Fitzgerald
ThoughtWire Corporation

Terence Parr

Aug 22, 2013, 3:23:46 PM
to antlr-di...@googlegroups.com
That sounds familiar. I thought we set it up to some unusual state to indicate end of file rather than pointing at a nonexistent character after the end of the real characters.

Ter
> --
> You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussi...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

Sam Harwell

Aug 22, 2013, 6:27:03 PM
to antlr-di...@googlegroups.com
Hi Mark,

The end position of a token is inclusive (which should certainly be documented), so the formula is (start + length - 1), where length is the number of input symbols matched for the token. The start position of the EOF token is input.length, and the length of the EOF token is 0 (it doesn't match any real input symbols), leaving you with end=start-1.
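The arithmetic above can be checked with a small, self-contained sketch. It does not use the ANTLR runtime; the class name `InclusiveSpanSketch` and the helper `stopIndex` are hypothetical names chosen for illustration, and the code only mirrors the formula described in this post.

```java
// Illustrative sketch only; this is not ANTLR source code.
final class InclusiveSpanSketch {

    // Inclusive stop index of a token that starts at `start`
    // and covers `length` input symbols.
    static int stopIndex(int start, int length) {
        return start + length - 1;
    }

    public static void main(String[] args) {
        String input = "abc";

        // An ordinary token covering all of "abc": start 0, length 3, stop 2.
        System.out.println(stopIndex(0, input.length()));

        // EOF starts at input.length() and matches no symbols,
        // so its stop index is start - 1, i.e. the last real character.
        int eofStart = input.length();
        int eofStop = stopIndex(eofStart, 0);
        System.out.println(eofStart + " > " + eofStop);
    }
}
```

For a three-character input this gives an EOF start of 3 and a stop of 2, which matches the start > stop observation in the original report.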

Sam

mark.fi...@thoughtwire.ca

Aug 29, 2013, 11:06:24 AM
to antlr-di...@googlegroups.com
On Thursday, August 22, 2013 6:27:03 PM UTC-4, Sam Harwell wrote:
> The end position of a token is inclusive (which should certainly be documented), so the formula is (start + length - 1), where length is the number of input symbols matched for the token. The start position of the EOF token is input.length, and the length of the EOF token is 0 (it doesn't match any real input symbols), leaving you with end=start-1.

Thanks for confirming this. (Sorry for the late reply.) Chances are it is documented in either the book or one of the documentation pages I've not yet read. :)

I mostly work from the Javadocs; the Javadoc for Token#getStopIndex() does not indicate whether the end index is inclusive or exclusive: http://www.antlr.org/api/Java/org/antlr/v4/runtime/Token.html#getStopIndex(). Perhaps that could be clarified in the Javadoc?

Have a good morning,

Mark

Mike Lischke

Aug 29, 2013, 12:19:48 PM
to antlr-di...@googlegroups.com
>> The end position of a token is inclusive (which should certainly be documented), so the formula is (start + length - 1), where length is the number of input symbols matched for the token. The start position of the EOF token is input.length, and the length of the EOF token is 0 (it doesn't match any real input symbols), leaving you with end=start-1.


IMO, it should not be startIndex and stopIndex but startIndex and length, which wouldn't need any documentation because it is crystal clear what is meant. That would have saved me quite some grief too.
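For callers who prefer to think in start plus length, the conversion from an inclusive start/stop pair is one line. The sketch below is plain Java with no ANTLR dependency; `StartLengthSketch`, `lengthOf`, and `textOf` are hypothetical helper names, and the only assumption is the inclusive stop index described earlier in the thread.

```java
// Illustrative sketch only; the helper names are not part of ANTLR.
final class StartLengthSketch {

    // Number of symbols covered by an inclusive [startIndex, stopIndex] span.
    static int lengthOf(int startIndex, int stopIndex) {
        return stopIndex - startIndex + 1;
    }

    // Text covered by the span. String.substring takes an exclusive end,
    // so the inclusive stop index is shifted by one.
    static String textOf(String input, int startIndex, int stopIndex) {
        return input.substring(startIndex, stopIndex + 1);
    }

    public static void main(String[] args) {
        String input = "abc";

        // Ordinary span over "bc": start 1, stop 2.
        System.out.println(lengthOf(1, 2) + " " + textOf(input, 1, 2));

        // EOF-style span: start == input.length(), stop == start - 1.
        // The length comes out as 0 and the covered text is empty.
        int start = input.length();
        int stop = start - 1;
        System.out.println(lengthOf(start, stop) + " [" + textOf(input, start, stop) + "]");
    }
}
```

Under this conversion the start > stop case is simply an empty span at the end of the input.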

Mike
--
www.soft-gems.net

Terence Parr

Aug 29, 2013, 1:10:45 PM
to antlr-di...@googlegroups.com
Sam has argued for start+length, but I think I killed the idea since we had too much already built the other way. Something like that.
Ter


--
Dictation in use. Please excuse homophones, malapropisms, and nonsense. 