ANTLRWorks 2 cannot open grammar files with UTF-8 encoding?

mdakin

Feb 12, 2013, 4:13:19 AM
to antlr-di...@googlegroups.com
I am trying to open an ANTLR 4 lexer grammar file that contains some Turkish characters, something like this:

// Letters
fragment TurkishLetters
    : [a-zâîûçğıöşü];

fragment TurkishLettersCapital
    : [A-ZÂÎÛÇĞİÖŞÜ];

When I try to open it with ANTLRWorks 2, it gives me this error message:

The file <snip>/TurkishLexer.g4 cannot be safely opened with encoding US-ASCII. Do you want to continue opening it? Yes/cancel

If I continue, the Turkish characters are garbled, and I could not find an option to select a default encoding. Do I have to keep grammars ASCII-only and use \u escapes for non-ASCII characters?

Thanks.


Sam Harwell

Feb 12, 2013, 9:38:42 AM
to antlr-di...@googlegroups.com

Have you tried saving your grammar in UTF-8 with the byte order mark? The byte order mark is the only reliable way of auto-detecting the encoding of a file, and since NetBeans doesn’t expose an “Open with Encoding” option I’m not sure how the encoding is set otherwise.
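For anyone whose editor cannot save with a byte order mark, the suggestion above can be tried with a few lines of Java. This is only a sketch: the class name `BomWriter` and its methods are hypothetical helpers, not part of ANTLR, ANTLRWorks, or NetBeans, and it assumes the grammar file is already valid UTF-8.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper (not an ANTLR or NetBeans API): prepends the
// UTF-8 byte order mark to a file that is assumed to be UTF-8 already.
final class BomWriter {
    // The UTF-8 BOM is the three bytes EF BB BF.
    static final byte[] UTF8_BOM = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // True when the content already starts with the UTF-8 BOM.
    static boolean hasBom(byte[] content) {
        return content.length >= UTF8_BOM.length
                && content[0] == UTF8_BOM[0]
                && content[1] == UTF8_BOM[1]
                && content[2] == UTF8_BOM[2];
    }

    // Returns the content with a BOM in front; content that already
    // has one is returned unchanged.
    static byte[] withBom(byte[] content) {
        if (hasBom(content)) {
            return content;
        }
        byte[] out = new byte[UTF8_BOM.length + content.length];
        System.arraycopy(UTF8_BOM, 0, out, 0, UTF8_BOM.length);
        System.arraycopy(content, 0, out, UTF8_BOM.length, content.length);
        return out;
    }

    // Rewrites the file in place with a leading BOM.
    static void addBom(Path file) throws IOException {
        Files.write(file, withBom(Files.readAllBytes(file)));
    }
}
```

Calling `BomWriter.addBom(Paths.get("TurkishLexer.g4"))` would rewrite the grammar in place. Whether the IDE then picks up the encoding is a separate question that this sketch does not answer.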

 

Thank you,

--

Sam Harwell

Owner, Lead Developer

http://tunnelvisionlabs.com


mdakin

Feb 12, 2013, 10:38:32 AM
to antlr-di...@googlegroups.com
I added a BOM as well, but it did not work. Maybe it is failing because my locale is tr_TR.UTF-8. Anyway, we are back to \u escape codes. (Btw, IMHO the default encoding for ANTLRWorks should always be UTF-8.)
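If the grammar has to stay ASCII-only, the \u escapes do not need to be typed by hand. The following is a rough sketch of a converter: `UnicodeEscaper` is a hypothetical name, not an ANTLR utility, and it assumes every non-ASCII character is in the Basic Multilingual Plane (true for the Turkish letters here), so one four-digit escape per character is enough.

```java
import java.util.Locale;

// Hypothetical helper (not part of ANTLR): rewrites every non-ASCII
// character as a backslash-u escape with four hex digits.
final class UnicodeEscaper {
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                // Plain ASCII is copied through unchanged.
                out.append(c);
            } else {
                // Locale.ROOT keeps the formatting independent of the
                // default locale (for example tr_TR).
                out.append(String.format(Locale.ROOT, "\\u%04X", (int) c));
            }
        }
        return out.toString();
    }
}
```

With this sketch, `UnicodeEscaper.escape("[a-zçğ]")` yields `[a-z\u00E7\u011F]`, which is plain ASCII and can be pasted into the lexer rule.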