parser grammar Test;
options {language=Cpp;tokenVocab=TestL;}
eval: additionExp ;
additionExp: multiplyExp ( PLUS multiplyExp | MINUS multiplyExp )* ;
multiplyExp: atomExp ( STAR atomExp | DIV atomExp )* ;
atomExp: NUMBER | LPAR additionExp RPAR ;
// this is a comment
5 + 3 * 5
Comment : '//' -> channel(HIDDEN) ;
I had an earlier version of the C++ runtime and antlr4 jar file that worked fairly well. I've tried to upgrade to the newest version of the C++ runtime, but now it seems to fail when trying to add a standard comment rule in the lexer.
The lexer file is:

lexer grammar TestL;
options {language=Cpp;}
NUMBER : ('0'..'9')+ ('.' ('0'..'9')+) ;
Comment: '//' .*? ('\r')? '\n' -> channel(HIDDEN) ;
WS : (' ' | '\t' | '\r'| '\n') -> channel(HIDDEN) ;
PLUS : '+' ;
MINUS : '-';
STAR : '*';
DIV : '/';
LPAR : '(';
RPAR : ')';

The problem is that while the whitespace rule works fine, the comment rule does not (I've tried every form of comment rule that I could find online).
Thank you. I downloaded both the source and binary distributions of the new runtime, as well as the jar file (from http://www.soft-gems.net/index.php/all-downloads). The results are exactly the same - i.e., it's not working like it is supposed to.
Yes, that sounds more reasonable. Here is the printout when printing the type and text of the tokens. Seems like the beginning of the comment is not recognized, but taken as two '/' divide tokens.

Getting tokens 24
Token 0: type: 7 text: '/'
Token 1: type: 7 text: '/'
Token 2: type: 3 text: ' '
Token 3: type: 3 text: ' '
Mike,
I think I have found the problem. In LexerATNSimulator.cpp:318, the method matches is called as follows:

if (trans->matches((int)t, std::numeric_limits<char32_t>::min(), std::numeric_limits<char32_t>::max())) {
Notice the last parameter, std::numeric_limits<char32_t>::max(): this yields the maximum of an unsigned 32-bit type. However, the method it calls, in WildcardTransition.cpp:45 as shown below, accepts parameters of type ssize_t:

bool WildcardTransition::matches(ssize_t symbol, ssize_t minVocabSymbol, ssize_t maxVocabSymbol)

In an x86 build ssize_t is defined as int, which means that the max value passed in gets converted to -1.
Thus the statement on line 46:

return symbol >= minVocabSymbol && symbol <= maxVocabSymbol;

will always fail. That's why wildcards never work in the new distro if you build for x86 (which I did for various legacy reasons).
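To make the failure concrete, here is a small standalone sketch of the same comparison. It does not use the ANTLR runtime: SSize32, matchesNarrow, matchesWide, callNarrow and callWide are hypothetical names, and std::int32_t is only an assumed stand-in for what ssize_t becomes on a 32-bit build.

```cpp
#include <cstdint>
#include <limits>

// Hypothetical stand-in for ssize_t on a 32-bit (x86) build.
using SSize32 = std::int32_t;

// Same range check as the statement quoted from WildcardTransition::matches,
// with 32-bit signed parameters.
bool matchesNarrow(SSize32 symbol, SSize32 minVocabSymbol, SSize32 maxVocabSymbol) {
  return symbol >= minVocabSymbol && symbol <= maxVocabSymbol;
}

// Same range check with parameters wide enough to hold every char32_t value.
bool matchesWide(std::int64_t symbol, std::int64_t minVocabSymbol, std::int64_t maxVocabSymbol) {
  return symbol >= minVocabSymbol && symbol <= maxVocabSymbol;
}

// Mimics the call site: the char32_t limits are narrowed to 32-bit signed.
// The maximum (0xFFFFFFFF on common platforms) wraps to -1 on mainstream
// compilers; the result is implementation-defined before C++20.
bool callNarrow(char32_t t) {
  return matchesNarrow(static_cast<SSize32>(t),
                       static_cast<SSize32>(std::numeric_limits<char32_t>::min()),
                       static_cast<SSize32>(std::numeric_limits<char32_t>::max()));
}

// The same call with 64-bit parameters keeps the upper bound positive.
bool callWide(char32_t t) {
  return matchesWide(static_cast<std::int64_t>(t),
                     static_cast<std::int64_t>(std::numeric_limits<char32_t>::min()),
                     static_cast<std::int64_t>(std::numeric_limits<char32_t>::max()));
}
```

With these definitions, callNarrow(U'/') tests 47 <= -1 and returns false, while callWide(U'/') returns true. That matches the symptom of the wildcard in the comment rule never matching on x86.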
>
> I'm a bit busy, or I would offer to help. I have done a lot of work on ANTLR internals, although I have not looked at ANTLR4--I have a variant of ANTLR derived from ANTLR3 for which I am currently debugging redundancy removal code for my linear GLL recognition algorithm. When I get around to a C++ port, I will probably start with your code.
Would be great, Loring. However, be prepared, the differences between ANTLR3 and 4 are non-trivial ;-)
>
> For this problem, I suggest making token types unsigned
That's one of the bigger problems. There is one negative token type: EOF. Only because of that the entire token type machinery must use signed types. It's a PITA. I hesitate to assign a different (positive) value to it, as that would break compatibility with all other targets. In other cases -1 is used to indicate non-initialized values (e.g. token positions) in otherwise always unsigned properties. I wonder if C++ users would accept a rule that says: whenever you see a -1 in Java ANTLR code use e.g. npos instead in C++?
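As an illustration of that npos idea, a sentinel modeled on std::string::npos could look roughly like this. It is only a sketch: INVALID_INDEX, TokenInfo and hasPosition are hypothetical names, not part of the C++ runtime.

```cpp
#include <cstddef>
#include <limits>

// Hypothetical sentinel, modeled on std::string::npos: the largest size_t
// value stands for "not set" wherever the Java code would use -1.
constexpr std::size_t INVALID_INDEX = std::numeric_limits<std::size_t>::max();

// Hypothetical, minimal token record; not the runtime's actual Token class.
struct TokenInfo {
  std::size_t startIndex = INVALID_INDEX;  // unset until the lexer fills it in
  std::size_t stopIndex = INVALID_INDEX;
};

// A position is valid only if neither index still holds the sentinel.
bool hasPosition(const TokenInfo &token) {
  return token.startIndex != INVALID_INDEX && token.stopIndex != INVALID_INDEX;
}
```

The properties stay unsigned throughout, and the check against the sentinel is symbolic, so no code needs to know the actual value.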
> This should only take a day or two--once you get the code to compile again, it should pass testing after only a few minor tweaks. At least that has been my experience in doing similar things with the ANTLR source.
Unfortunately, the runtime test coverage is far from 100%, so we might miss special cases like the one which led to this discussion, which you find only after weeks or even months. But the pure replacement is indeed not such a big deal. I'm more worried about breaking something.
Mike
--
www.soft-gems.net
--
You received this message because you are subscribed to the Google Groups "antlr-discussion" group.
To unsubscribe from this group and stop receiving emails from it, send an email to antlr-discussion+unsubscribe@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
I think it is totally fine to use different values for EOF in C++. There are no unsigned ints in Java, hence this type of thing happens. Personally, I think it would probably be OK to define EOF as token 0 and start real tokens after that. However, perhaps there are idioms in the source code as it stands that expect EOF to be negative? There shouldn't be, really, as it should always be used symbolically without assuming anything about the value, but such things creep in. Maybe just try it and test it?
I feel that you should take the C++ runtime towards "perfect C++", whatever that means, over time. Sometimes this is bound to mean that the internals of C++ are different to Java, but I think that is just implementation specific and we should not worry about it.
Hey Jim,

> I think it is totally fine to use different values for EOF in C++. There are no unsigned ints in Java, hence this type of thing happens. Personally, I think it would probably be OK to define EOF as token 0 and start real tokens after that. However, perhaps there are idioms in the source code as it stands that expect EOF to be negative? There shouldn't be, really, as it should always be used symbolically without assuming anything about the value, but such things creep in. Maybe just try it and test it?

Yes, you are probably right. I'm just leaning towards using a high value (max unsigned probably) instead of 0 for it, so we don't have to shift token values. We still have to compare between Java and C++ sometimes.

> I feel that you should take the C++ runtime towards "perfect C++", whatever that means, over time. Sometimes this is bound to mean that the internals of C++ are different to Java, but I think that is just implementation specific and we should not worry about it.

Right, I see that the same way.
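A rough sketch of the "high value for EOF" option, to show why real token values would not need to shift. Everything in the sketch namespace is hypothetical and not taken from the runtime; NUMBER = 1 and DIV = 7 assume the usual numbering by definition order in the lexer grammar from this thread, which agrees with the type 7 shown for '/' in the printout.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>

namespace sketch {

// Hypothetical: EOF takes the largest unsigned value instead of -1, so the
// real token types can keep the numbers they have in the other targets.
constexpr std::size_t EOF_TYPE = std::numeric_limits<std::size_t>::max();
constexpr std::size_t NUMBER = 1;  // first rule in the lexer grammar
constexpr std::size_t DIV = 7;     // seventh rule in the lexer grammar

// For comparing against Java output: map the C++ sentinel back to -1.
std::int64_t toJavaTokenType(std::size_t type) {
  return type == EOF_TYPE ? -1 : static_cast<std::int64_t>(type);
}

// The reverse mapping: Java's -1 becomes the unsigned sentinel.
std::size_t fromJavaTokenType(std::int64_t type) {
  return type == -1 ? EOF_TYPE : static_cast<std::size_t>(type);
}

}  // namespace sketch
```

Only the EOF value needs translating at the boundary. With EOF defined as 0 instead, every real token type would have to be offset by one when comparing against Java.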