Indentation instead of brackets for Java Simple Grammar in Antlr 4


George

Jul 25, 2013, 12:43:00 PM
to antlr-di...@googlegroups.com
I'm currently attending the course "Compilers Construction" and I'm developing a simplified Java grammar in Antlr 4. The part involving the Java grammar works just fine, except that it is slightly different from the original Java. My teacher told me I should not use brackets and should instead use indentation, the same way Python does. I have tried a few examples from the internet, but I found only examples for Antlr 3, which won't work in Antlr 4.

Can anyone help me with this indentation issue, or provide me with a Python grammar for Antlr 4?

Thanks for listening.



Regards,
George

George S Cowan

Jul 29, 2013, 2:17:11 PM
to antlr-di...@googlegroups.com
Hi, George,

I don't really know much about Python, but as a dues-paying member of the George Solidarity Club, I thought I would give you pointers that I could find in a hurry.

The Antlr4 Reference (http://pragprog.com/book/tpantlr2/the-definitive-antlr-4-reference) has a section on handling the indentation/out-dentation in Python (see p. 214, "Fun with Python Newlines"). There is also lexmagic/SimplePython.g4 in the book's accompanying source code.

Good luck translating this into a Java-like language.

George S. Cowan

mycircuit

Jul 31, 2013, 4:19:55 PM
to antlr-di...@googlegroups.com
Please be aware that the examples you mention DO NOT handle indentation; they focus solely on handling comments and whitespace across multiple lines (because in argument lists in Python you can have a comment after each argument on separate lines).

I have tried to do indentation stuff, but it gets pretty ugly very quickly. I am planning to try some two-pass handling but have not yet started.
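
As a rough illustration of what a two-pass approach could look like (a sketch only, not code from this thread or from the ANTLR book): a first pass rewrites indentation into explicit braces, so that an ordinary brace-based grammar can parse the result in the second pass. The function name indent_to_braces is made up for this example, and it assumes indentation uses spaces only.

```python
def indent_to_braces(source: str) -> str:
    """Pre-pass: rewrite indentation-structured text into brace-delimited text.

    Assumption: indentation is made of spaces only (no tabs).
    """
    stack = [0]   # indentation widths of the currently open blocks
    out = []
    for line in source.splitlines():
        if not line.strip():
            continue  # blank lines carry no structure
        width = len(line) - len(line.lstrip(" "))
        if width > stack[-1]:
            # Deeper than the enclosing block: open a new block.
            stack.append(width)
            out.append("{")
        while width < stack[-1]:
            # Shallower: close blocks until we are back at a known level.
            stack.pop()
            out.append("}")
        if width != stack[-1]:
            raise SyntaxError(f"inconsistent dedent: {line!r}")
        out.append(line.strip())
    # Close any blocks still open at end of input.
    out.extend("}" for _ in stack[1:])
    return "\n".join(out)


if __name__ == "__main__":
    print(indent_to_braces("if (a == b)\n    a = c;\n    b = d;\nc = 1;"))
```

The output of such a pass could then be fed to an existing grammar that already understands braces, which keeps the indentation logic out of the grammar itself.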

mycircuit

George S Cowan

Jul 31, 2013, 5:05:41 PM
to antlr-di...@googlegroups.com
Oh, a quick rescan of the topic in the Antlr4 Ref shows that I jumped the gun. Thanks for setting the record straight.

I did find PythonTokenSource.py that might be an approach you can use. It's Antlr3 code and I don't know whether it will require a rewrite. Also, it doesn't help our original poster.

George S. Cowan

mycircuit

Jul 31, 2013, 5:15:02 PM
to antlr-di...@googlegroups.com
Great, thanks for the link; hopefully there will be an ANTLR4 Python grammar one day (:- . There is also the Python grammar from ANTLR3, which might be useful to you (including the indentation magic), but it won't work straight out of the box as an ANTLR4 grammar; you have to tweak it.

mycircuit

Jim Idle

Jul 31, 2013, 10:49:48 PM
to antlr-di...@googlegroups.com
The problems 
    caused by 
making indentation/whitespace
 significant in
a language are seemingly innumerable. I suggest that no 
  one 
    repeats 
          this mistake with a new language/DSL
if you 
    must 
         parse Python,
  then
     hopefully you are converting it
 to something
else.

E.J. Thrib. Age 13 3/4 Anacondas






mycircuit

Aug 1, 2013, 5:36:00 AM
to antlr-di...@googlegroups.com
Dear Jim

I beg to disagree. There are several languages that use indentation besides Python (most notably Haskell), and it clearly makes sense to many people deeply involved in language design.
Not doing it at the language level just because it is difficult to implement doesn't sound like a convincing reason to me.

Besides, suppose I want a grammar that allows me to define path and directory names the "intuitive" way, like so:

rootdir/
   dir1
and so on. Or some other form of tree-structured data. I would need to manage indentation-based structuring as well.
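
For what it is worth, reading such a listing by hand is short when done outside a grammar. The following is a minimal sketch under stated assumptions: parse_tree is a hypothetical helper, indentation uses spaces only, and any line indented more than the previous one is treated as its child. It is lenient and does not report inconsistent indentation.

```python
def parse_tree(text: str) -> list:
    """Parse indented names into nested [name, children] nodes."""
    root = []                 # top-level nodes
    stack = [(-1, root)]      # (indent width, children list) of open nodes
    for line in text.splitlines():
        if not line.strip():
            continue
        width = len(line) - len(line.lstrip(" "))
        while width <= stack[-1][0]:
            stack.pop()       # close nodes that are not ancestors of this line
        node = [line.strip(), []]
        stack[-1][1].append(node)
        stack.append((width, node[1]))
    return root


if __name__ == "__main__":
    print(parse_tree("rootdir/\n   dir1\n   dir2/\n      file"))
```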

I believe language designers should have tools to do indentation based grammars without adhoceries.

mycircuit

Jim Idle

Aug 1, 2013, 6:23:39 AM
to antlr-di...@googlegroups.com, antlr-di...@googlegroups.com
I have also designed many languages. The reasons not to do this are purely at the usage level. Nothing to do with difficulty of parsing. 

The most obvious problem is that an accidental indent or format results in syntactically valid input with different semantics. That's just asking for trouble. 
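
To make the concern concrete, here is a small Python sketch (the function names are invented for illustration). The two functions differ only in the indentation of the last line; both are syntactically valid, but they compute different things.

```python
def count_positive(values):
    count = 0
    for v in values:
        if v > 0:
            count += 1
    return count              # runs once, after the loop


def count_positive_misindented(values):
    count = 0
    for v in values:
        if v > 0:
            count += 1
        return count          # one level too deep: returns during the first iteration


if __name__ == "__main__":
    print(count_positive([1, 2, 3]))
    print(count_positive_misindented([1, 2, 3]))
```

No parser can flag the second version, because nothing about it is malformed; only a test or a careful reader would notice.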

Interestingly, people seem to argue the "good to write" argument without considering what you are denying the parser and compiler the ability to check for you. This is very important.

I think Haskell is a great concept by the way, syntactical decisions aside. 

But how long before we are back to COBOL punched cards with defined columns?

Jim

mycircuit

Aug 1, 2013, 9:14:00 AM
to antlr-di...@googlegroups.com


On Thursday, August 1, 2013 12:23:39 PM UTC+2, Jim Idle wrote:
> I have also designed many languages.

=> Sorry, I didn't mean to be personal. I meant to say that different people value syntactic aspects (like braces and semicolons) differently. I tend to value syntax that is easy on the eyes very highly, but
your point about explicit correctness is certainly worth consideration.

> The reasons not to do this are purely at the usage level. Nothing to do with difficulty of parsing.
>
> The most obvious problem is that an accidental indent or format results in syntactically valid input with different semantics. That's just asking for trouble.

=> I agree. Still, I can't think of a Python script I have written so far where an accidental indent didn't throw a runtime error.
In the same vein, I can't think of a Haskell program that would even have got past the compiler with an accidental indent.

But there are certainly examples that prove you correct.

> Interestingly, people seem to argue the "good to write" argument without considering what you are denying the parser and compiler the ability to check for you. This is very important.
>
> I think Haskell is a great concept by the way, syntactical decisions aside.
>
> But how long before we are back to COBOL punched cards with defined columns?

=> I knew people who made a fortune from getting COBOL punched cards right (;-

George S Cowan

Aug 1, 2013, 2:17:01 PM
to antlr-di...@googlegroups.com
I agree with you, mycircuit. Output of tree-structured data often uses indentation, and indentation alone, as a readable way to indicate the level in the tree. For instance, Ter has done this on occasion. It seems obvious to me that we should be able to parse what we write, though I am willing to listen to arguments that even computer output should have a syntactic structure that duplicates the indentation.

For me, the problem with COBOL was not the indentation requirements, it was the stupid period at the end of some arbitrary line, which syntactically ended the statement no matter what the indentation was. There can be a related problem with indentation under a Java if-statement that doesn't use blocks, e.g.,

if (a == b)
  a = c;
  b = d;

I have to admit, though, that I have seen it only once and found it quickly with the debugger. But IMHO that's enough evidence to change the language. I haven't used indentation-based languages enough to encounter the problem that Jim mentioned: accidentally changing the program semantics with the wrong indentation level.

Well, where I've arrived after meandering around this topic is that language syntax definitions should require indentation that matches the syntax nesting. This should be checked in compilers, not just IDEs, because error-prone is error-prone, wherever it is. Besides, IDEs have to use parsers, too.
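
A very rough sketch of such a check, written in Python for brevity and under several assumptions: check_indentation is a hypothetical name, every block is delimited by braces, the indentation unit is fixed, and braces inside strings or comments are not handled. A real compiler would do this on the parse tree rather than by counting characters.

```python
def check_indentation(source: str, unit: int = 2) -> list:
    """Return 1-based line numbers whose indentation disagrees with brace depth."""
    depth = 0
    bad = []
    for number, line in enumerate(source.splitlines(), start=1):
        text = line.strip()
        if not text:
            continue
        # A line that starts with '}' belongs to the enclosing level.
        expected = depth - 1 if text.startswith("}") else depth
        width = len(line) - len(line.lstrip(" "))
        if width != expected * unit:
            bad.append(number)
        # Naive: counts every brace, including ones in strings or comments.
        depth += text.count("{") - text.count("}")
    return bad


if __name__ == "__main__":
    print(check_indentation("if (a == b) {\n  a = c;\n}\n  b = d;"))
```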

Let's face it, computer languages are two-dimensional, even if parser theory is not.

And this brings the whole discussion back to Antlr. I propose that the Antlr syntax should provide the ability for checking the level of indentation. (Even if we have to wait for Antlr5.)

George S. Cowan
(wait, where is that flame-proof jacket?)

Terence Parr

Aug 1, 2013, 5:53:47 PM
to antlr-di...@googlegroups.com

On Aug 1, 2013, at 11:17 AM, George S Cowan wrote:
> And this brings the whole discussion back to Antlr. I propose that the Antlr syntax should provide the ability for checking the level of indentation. (Even if we have to wait for Antlr5.)

If it were just indentation, then it's an easy character stream object that sits on top of the usual ANTLR stuff. Unfortunately, indentation is intertwined with \n processing. R and Python differ in how they handle which \n to ignore.

3+
4

is fine in R but not in Python. Python has to be inside a nested ( or [ or whatever to ignore \n, whereas R will ignore \n inside of a valid expression. I needed to use a nested grammar to get this right in my extras/RFilter.g4 example from the book. For Python it's much easier because the lexer alone can handle tracking nested grouping symbols.
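
The Python half of this can be sketched without any grammar at all. The following is only an illustration of the idea of tracking nested grouping symbols, not ANTLR code and not the book's example; significant_newlines is a made-up name, and string literals and comments are ignored for simplicity.

```python
OPENERS = "([{"
CLOSERS = ")]}"


def significant_newlines(source: str) -> list:
    """Return the offsets of newlines that are not inside (), [] or {}."""
    depth = 0
    offsets = []
    for offset, ch in enumerate(source):
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            depth = max(0, depth - 1)   # tolerate stray closers in this sketch
        elif ch == "\n" and depth == 0:
            offsets.append(offset)      # this newline ends a logical line
    return offsets


if __name__ == "__main__":
    print(significant_newlines("3+\n4\n"))
    print(significant_newlines("(3+\n4)\n"))
```

In the first input both newlines are significant, so under Python-style rules the statement ends after "3+"; in the second, the newline inside the parentheses is skipped.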

Ter

Jim Idle

Aug 1, 2013, 10:08:59 PM
to antlr-di...@googlegroups.com
George - you are of course talking about output, not input, which means we can test the programs so that they always generate correct output, and reading with the human eye rather than a parser allows the power of the brain and context to decipher issues. But consider that in written languages we have syntactic sugar, yet in general we can "work it out" without that stuff, even if a sentence becomes ambiguous ('and' and ',' are the easiest to contemplate). However, if I add more whitespace and new lines (in English at least), I don't change the context. Why shouldthissuddenlybe 
    a goodidea
  for a computer language?

We use whitespace and space in general to aid in design, clarity, form, aesthetics. Nobody is going to convince me that a computer language where whitespace is significant is a good idea. It doesn't mean that I am emphatically correct and everyone else who disagrees must be wrong.

Requiring indent gets us back towards COBOL and having to 'execute' the code with your eyes to try and make sure you typed it in correctly, for zero benefit. A well-designed language (syntactically) means that an automated formatter is easy to write and the whitespace goes back to its real usefulness: making it easy on the eye.

But, horses for courses. I would not use languages like Python or Ruby, but clearly they have many followers.

I don't think that ANTLR should do anything special for this, however, mainly because I don't think it should be encouraged, and if it is made easier by ANTLR, then everyone will unleash such beasts upon the world! ;)

Good luck,

Jim



mycircuit

Aug 2, 2013, 4:03:34 AM
to antlr-di...@googlegroups.com
Hi Jim

I believe that this is really an interesting discussion, and it certainly makes me think again about what I thought to be a clearly positive thing (indentation-based structure).
It also happens that indentation is, as far as I can tell, almost always used by more recent languages that include garbage collection and other nice things like lists, hashes, etc. out of the box as part of the language.
I always feel that reading Python or Ruby or Haskell is so much easier than C or Java, but that might just be lack of experience in the latter languages.

I should add that there are cases in human language where "indentation" matters, namely in poetry, and this might be for good reason because poetry is one of the most condensed forms of writing.
Also, "indentation" in a wider sense is crucial in musical scores because it says what happens in parallel and what is a sequence.

Thanks to the great ANTLR4 - which I find such a big improvement over the previous version (most notably because I could never get excited about inventing grammars with that Java code intermingled) -
it is now really fun to experiment with different languages.

mycircuit