'??' and the trigraph warning


will...@williamzeitler.com

Aug 27, 2018, 1:21:34 PM
to antlr-discussion

In my grammar I have to recognize the token '??' (required by a 3rd party, not my choice).

When I compile I'm getting  "warning: trigraph ??' ignored".

I need to recognize this token.

Do I disable trigraphs with "-Wno-trigraphs"? (Unsure of the implications.)

Or is there a way to escape '??' that avoids this issue?

Or is there yet some other way to handle this?

(My parser is Linux/GCC/C++, and I'm using the latest ANTLR 4.7.1)

MANY THANKS!

William Zeitler


Mike Lischke

Aug 28, 2018, 3:18:04 AM
to antlr-di...@googlegroups.com

> In my grammar I have to recognize the token '??' (required by a 3rd party, not my choice).

> When I compile I'm getting "warning: trigraph ??' ignored".

> I need to recognize this token.

> Do I disable trigraphs with "-Wno-trigraphs"? (Unsure of the implications.)

Trigraphs are a relic from the dawn of the programming age. They let certain three-character ASCII sequences stand in for other characters (e.g. ??( represents [ and ??> represents }). Disabling them will very likely not harm anything.
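
For illustration, the nine trigraphs and the substitution the preprocessor performs on them can be sketched in C++. This is a minimal sketch: the names trigraphReplacement and expandTrigraphs are made up for this example and belong to no library. The lookup is keyed on the third character so that the source itself never contains a trigraph.

```cpp
#include <cstddef>
#include <string>

// The nine trigraphs of C (and of C++ before C++17), keyed on the character
// that follows the two question marks. Returns '\0' if there is no trigraph.
inline char trigraphReplacement(char third) {
    switch (third) {
        case '=':  return '#';
        case '/':  return '\\';
        case '\'': return '^';
        case '(':  return '[';
        case ')':  return ']';
        case '!':  return '|';
        case '<':  return '{';
        case '>':  return '}';
        case '-':  return '~';
        default:   return '\0';
    }
}

// Hypothetical helper: replaces trigraphs in `text` left to right, the way
// the first translation phase does when trigraphs are enabled.
inline std::string expandTrigraphs(const std::string& text) {
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (i + 2 < text.size() && text[i] == '?' && text[i + 1] == '?') {
            const char replacement = trigraphReplacement(text[i + 2]);
            if (replacement != '\0') {
                out += replacement;
                i += 2;  // skip the rest of the trigraph
                continue;
            }
        }
        out += text[i];
    }
    return out;
}
```

Applied to the three characters of the warning in this thread (two question marks and an apostrophe), expandTrigraphs yields a single ^, which is why the compiler flags the generated literal.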


> Or is there a way to escape '??' that avoids this issue?

You could escape them as usual (a leading backslash before the second question mark).
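
At the C++ source level, that escape could look like the sketch below. It also shows a second option, which relies on adjacent string literals being concatenated only after trigraph replacement has taken place. Both function names are hypothetical, and the sketch says nothing about what the ANTLR grammar syntax accepts.

```cpp
#include <string>

// "\?" is a standard C++ escape sequence for a plain question mark, so this
// literal holds the characters ? ? ' and the compiler never sees a trigraph.
inline std::string escapedQuestionMarks() {
    return "?\?'";
}

// Adjacent string literals are joined in a later translation phase than
// trigraph replacement, so splitting the literal keeps the characters apart.
inline std::string splitQuestionMarks() {
    return "?" "?'";
}
```

Both functions return the same three-character string, whether or not the compiler has trigraphs enabled.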



William Zeitler

Aug 28, 2018, 11:24:03 AM
to antlr-di...@googlegroups.com

'?\?' is an 'invalid escape sequence' when I use the ANTLR tool to generate my C++. (Likewise '\??', and '\?\?')

I'll try  -Wno-trigraphs


Mikoláš Janota

Aug 28, 2018, 11:37:11 AM
to antlr-discussion
-Wno-trigraphs only disables the warning, so it effectively does nothing. I believe this should be fixed in the generator: it should escape trigraphs in the code it generates.
-M

Mike Lischke

Aug 28, 2018, 11:40:06 AM
to antlr-discussion

> '?\?' is an 'invalid escape sequence' when I use the ANTLR tool to generate my C++. (Likewise '\??', and '\?\?')



Surprising. When I wrote my .rc file parser that was what I implemented for the trigraph handling.

> I'll try -Wno-trigraphs



That's the only option then (or changing the language to disallow ?? ;-) ).

Mike Lischke

Aug 29, 2018, 2:50:57 AM
to antlr-discussion

> -Wno-trigraphs will only disable the warning. So effectively not doing anything.

Any source that confirms this?

> I believe this should be fixed within the generator. The generator should escape trigraphs in the generated code.
> -M


How can the generator fix that when escaping trigraphs doesn't work? Also keep in mind that the parser class is assembled dynamically using nothing but text replacements in templates.





Loring Craymer

Aug 29, 2018, 3:32:14 AM
to antlr-discussion
If you are using '??' in your parser grammar as a literal, that is not a good idea; avoiding keyword literals is good practice. What you really want is a lexer production of the form
QQ : '?' '?' ;
that precedes the usual
QUESTION: '?' ;
production.
That should avoid any conflicts with trigraph recognition.

--Loring

Mikoláš Janota

Aug 30, 2018, 11:12:46 AM
to antlr-discussion
> Any source that confirms this?

My man page on gcc is fairly clear on this.

> How can the generator fix that when escaping trigraphs doesn't work?

For me it works. If I compile the following with "g++ -ansi", I get two different outputs, ^ and ??'. Without -ansi, gcc ignores trigraphs.

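#include <iostream>

int main() {
    // Unescaped: read as a trigraph and printed as ^ when trigraphs are enabled.
    std::cout << "??'" << std::endl;
    // Escaped: \? is a plain question mark, so the text is printed unchanged.
    std::cout << "\?\?'" << std::endl;
    return 0;
}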
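
A generator-side fix along these lines could be sketched as a small escaping pass over each string literal before it is written into the generated C++. This is only an illustration under assumptions: escapeTrigraphs is a hypothetical name, it is not part of ANTLR, and the actual ANTLR tool is not written in C++.

```cpp
#include <string>

// Hypothetical helper (not part of ANTLR): rewrites the body of a C++ string
// literal so that no two question marks are adjacent in the emitted source.
// Every '?' that directly follows another '?' is emitted as the escape \?.
inline std::string escapeTrigraphs(const std::string& literalBody) {
    std::string out;
    bool prevWasQuestion = false;
    for (const char c : literalBody) {
        if (c == '?' && prevWasQuestion) {
            out += "\\?";  // backslash followed by a question mark
        } else {
            out += c;
        }
        prevWasQuestion = (c == '?');
    }
    return out;
}
```

Escaping every second question mark, rather than only the nine real trigraphs, keeps the pass simple and leaves the compiled string unchanged, since \? always denotes a plain question mark.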

Mikoláš Janota

Aug 30, 2018, 7:59:07 PM
to antlr-discussion
Could you elaborate on why it's not good practice?


> that precedes the usual QUESTION: '?' ;

Does the order matter? Wouldn't the longest always get matched?

Loring Craymer

Aug 30, 2018, 10:54:33 PM
to antlr-discussion
?? can be interpreted by the lexer either as two '?' tokens or as one '??' token. Syntactic ambiguities like this are resolved in favor of whichever token is defined first (PEG-style ambiguity resolution, which Ter introduced in ANTLR 3).

Your '??' literal is one example of why avoiding literals is good practice: in this case a "??" string appears in the generated code, which does not happen when '?' '?' is used to match two characters. Confusion over ambiguity resolution is the more common issue, though. Having a '111' literal appear after a '11' literal, for example, will give unexpected results: encountering 111 in the input stream will then cause the lexer to emit 11 and 1 tokens (assuming the 1 is recognized as a NUMBER in the lexer grammar).

String literals for keywords can look nice in grammars, but experience leads to the conclusion that it is better to avoid them in favor of specifying an ordering of token definitions in the lexer grammar that guarantees the desired result. Even if you automatically think through how syntactic ambiguities will be resolved and always pick the desired ordering (well, to the extent that anyone "always" does the right thing), the newbie stuck with maintaining the legacy grammar after you have left the project or company probably will not, and it is best to avoid giving him a chance to make a mistake.
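
The ordering effect described in this post can be illustrated with a toy tokenizer in which the first matching definition wins. This is a sketch of that policy only, under the assumption stated in the post; it is not ANTLR's lexer algorithm, and tokenizeFirstMatch is a made-up name.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Toy tokenizer: at each position the FIRST literal in `literals` that
// matches wins. If none matches, one character is emitted as its own token.
inline std::vector<std::string> tokenizeFirstMatch(
        const std::string& input, const std::vector<std::string>& literals) {
    std::vector<std::string> tokens;
    std::size_t pos = 0;
    while (pos < input.size()) {
        std::string matched(1, input[pos]);  // fallback: a single character
        for (const std::string& lit : literals) {
            if (!lit.empty() && input.compare(pos, lit.size(), lit) == 0) {
                matched = lit;
                break;  // definition order decides, not match length
            }
        }
        tokens.push_back(matched);
        pos += matched.size();
    }
    return tokens;
}
```

Under this policy, the input 111 with the literals listed as 11 then 111 splits into 11 and 1, while listing 111 first yields a single 111 token.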

--Loring