Indentation sensitivity issues

Tony Arcieri

Feb 21, 2009, 3:32:16 PM
to re...@googlegroups.com
Like Erlang and Ruby (and unlike Python), Reia has a purely expression-based grammar where everything returns a value. In Python, all indentation-based constructs are statements, and by nature of being statements cannot return values.
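
To make the statement/expression split concrete, here is a small Python sketch that asks Python's own parser whether an indentation-based `if` can sit on the right-hand side of an assignment. The helper name `parses` and the two sample strings are made up for illustration.

```python
import ast


def parses(source):
    """Return True if Python's parser accepts `source`, False on a SyntaxError."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# An indentation-based `if` is a statement, so it cannot be assigned from.
if_statement_as_value = "v = if False:\n    'works'\nelse:\n    'doesnt'\n"

# Python's one-line conditional expression is the form of `if` that yields a value.
conditional_expression = "v = 'works' if False else 'doesnt'\n"

print(parses(if_statement_as_value))
print(parses(conditional_expression))
```

The first string is rejected and the second is accepted, which is the restriction Reia is trying to lift for multi-line constructs.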

Reia attempts to introduce Ruby-style "blocks" into a Python-like grammar. Blocks are attached to a grammatical element which must be an expression: a function call. So, in order to have multi-line blocks, Reia must have expressions with indentation blocks. I would also like to retain the purely expression-based nature of Erlang.

Guido found it unacceptable to embed an indentation-based block in the middle of an expression. I'm starting to see some of the cases where doing this allows for ugly and strange syntax, and can see why Guido wanted to disallow it for the sake of syntactic purity.

Multi-line expressions allow some strange (and arguably ugly) syntax:

fun(a, b, c) do
  a + b + c
.to_s()

Furthermore, every multiline expression will need to be "terminated" with a newline.  So this:

v = if false
  'works'
else
  'doesnt'
puts(v)

...will be invalid.  Every if statement, case statement, function call that accepts a multiline block, or multiline lambda declaration will need a blank line after it to terminate the expression:

v = if true
  'works'
else
  'doesnt'

puts(v)

or you could use the weird syntax to invoke the puts method on the return value:

if true
  'works'
else
  'doesnt'
.puts()

This sort of odd syntax is also permissible as an effect of the way the grammar works:

v = if true
  'works'
else
  'doesnt'
; puts(v)

Here semicolon is acting as the statement terminator, rather than a newline.

I'm going to add support for this in the lexer this weekend.  However, I'm not really sure I'm happy about this syntax.

What do you think?  Weird?  Annoying? Or does it actually make sense?

This is the best I'm able to do in terms of solving Guido's "unsolvable puzzle" of multi-line expressions in a Python-like grammar.  I'm certain Guido would not be happy with this solution, and I think he'd equally dislike that this has come as the result of something of a puzzle-solving exercise, rather than looking for the most elegant approach.  I'll certainly admit it is somewhat inelegant.

Someone on #python pointed me at a language called Logix:

http://www.livelogix.net/logix/tutorial/3-Introduction-For-Python-Folks.html#3.1

which is another Python-like language with an expression-based grammar.  It makes use of the off-side rule in a way more akin to Haskell (whereas Reia uses it more like Python).  It's another potential direction I could go.  The other would be abandoning indentation sensitivity altogether and going with a more Ruby-like grammar.

Logix appears to have been the project of Hobo author Tom Locke.  If anyone has his e-mail address I'd be curious to ask him what he thinks about this.

--
Tony Arcieri
medioh.com

Phil Pirozhkov

Feb 22, 2009, 6:59:11 AM
to re...@googlegroups.com
I'd rather go for "coding guidelines" than implement coding restrictions in the language.
Yes, all the examples you've given look really ugly.
I remember Linus's phrase "If you need more than 3 levels of indentation, you're screwed anyway, and should fix your program" (http://en.wikiquote.org/wiki/Linus_Torvalds), and no matter what language I'm using, I try to follow it.
No one can prevent people from using a good language to express ugly things.

Cheers, Phil

Michael Sparks

Feb 22, 2009, 8:09:49 AM
to re...@googlegroups.com, Tony Arcieri
On Saturday 21 February 2009 20:32:16 Tony Arcieri wrote:
> or you could use the weird syntax to invoke the puts method on the return
> value:
>
> if true
>   'works'
> else
>   'doesnt'
> .puts()
>
> This sort of odd syntax is also permissible as an effect of the way the
> grammar works:
>
> v = if true
>   'works'
> else
>   'doesnt'
> ; puts(v)
>
> Here semicolon is acting as the statement terminator, rather than a
> newline.

I don't know if it's helpful, but a few years back I played with writing a
grammar with the following attributes:
* Was written test first (to see if TDD grammar design was doable)
* No keywords at all
* Used indentation in a similar way to python
* Used a ":" to indicate the start of an indentation block

Due to using no keywords at all, everything was treated as a function call.
(The aim in a way was to see if it was possible to put a pythonlike syntax on
the front of a lisp-like language - using indentation & infix rather than
prefix / brackets)

The key result I found was that it was possible, but that indented blocks
needed some form of end keyword. That keyword didn't need to be defined
(though it's nice to have conventions :), but it did need to be there.

I never attached a backend to the parser/lexer, but the code works and is
still accessible. I even gave a (slightly tongue in cheek :) lightning talk
on it at Europython in 2005. As a result:

Slides:
http://www.slideshare.net/kamaelian/swp-a-generic-language-parser

Code:
http://www.cerenity.org/SWP-0.0.0.tar.gz

Other relevant stuff:
http://www.cerenity.org/SWP/

I'm only posting this in case it's useful :)

It may be of interest because, fundamentally, it treats everything as a
function call.

Regards,


Michael
--
http://yeoldeclue.com/blog
http://twitter.com/kamaelian
http://www.kamaelia.org/Home

Tony Arcieri

Feb 22, 2009, 3:08:42 PM
to re...@googlegroups.com
On Sun, Feb 22, 2009 at 4:59 AM, Phil Pirozhkov <pi...@mail.ru> wrote:

> I'd rather go for "coding guidelines", than implementing coding restrictions in the language.

The goal isn't so much being restrictive as using indentation to allow for a cleaner look.

Of course, I'm not really sure if it's being accomplished in this case.

The role of the blank line is the same as the "end" keyword in an expression-based grammar like Ruby's.  Having it there implicitly seems somewhat weird to me though.

Requiring a blank line after a function which takes a block or a case/if statement seems like the one thing someone would have to do day-to-day which I don't necessarily like.  The other things are just syntactic oddities that result from the way the grammar works at present.

That said, I'm going to go with it for now, then investigate a more Haskell/Logix-like grammar in another branch.

--
Tony Arcieri
medioh.com

Tony Arcieri

Feb 22, 2009, 3:12:23 PM
to re...@googlegroups.com
On Sun, Feb 22, 2009 at 6:09 AM, Michael Sparks <spar...@gmail.com> wrote:
> The key result I found was that it was possible, but that indented blocks
> needed some form of end keyword. That keyword didn't need to be defined
> (though it's nice to have conventions :), but it did need to be there.

I'm running into something similar, except expressions with indentation blocks need a statement separator to terminate them.

I'd like to avoid an explicit ending token (as I feel that goes against what an indentation sensitive grammar is trying to accomplish in the first place)

Using a blank line as an expression separator "token" seems like a decent compromise.

--
Tony Arcieri
medioh.com

Phil Pirozhkov

Feb 22, 2009, 5:07:31 PM
to re...@googlegroups.com
> From: Tony Arcieri <to...@medioh.com>
> On Sun, Feb 22, 2009 at 6:09 AM, Michael Sparks <spar...@gmail.com> wrote:
>
> > The key result I found was that it was possible, but that indented blocks
> > needed some form of end keyword. That keyword didn't need to be defined
> > (though it's nice to have conventions :), but it did need to be there.
> >
>
> I'm running into something similar, except expressions with indentation
> blocks need a statement separator to terminate them.

Is there a special reason for that? It looks at a glance that

if x
  if y
    puts("y")
  else
    puts("not y")
  puts("i'm only seen when x is true")
puts("outside all ifs")

can be parsed, and I don't see any possible conflicts here.
Empty lines with any number of tabs/spaces, possibly ending with a comment,
should be skipped ('$empty') during parsing.

Currently,

module A
  def a
    if true
      puts("y")
  def b
    ..

doesn't work, and this means there's something wrong with indentation

> I'd like to avoid an explicit ending token (as I feel that goes against what
> an indentation sensitive grammar is trying to accomplish in the first place)
Sure, an ending token breaks the beauty and just eats rows.

> Using a blank line as an expression separator "token" seems like a decent
> compromise.
I'm sure this can be avoided, too


Cheers, Phil

Eero Saynatkari

Feb 22, 2009, 5:21:06 PM
to reia
Excerpts from Tony Arcieri's message of Sun Feb 22 22:12:23 +0200 2009:

For what it is worth, I am in favour of the Logix/Haskell approach.


--
Magic is insufficiently advanced technology.

Phil Pirozhkov

Feb 22, 2009, 6:24:04 PM
to re...@googlegroups.com
Imagine we have a file called indent.re:

if x
  puts("x==true")


puts("outside all ifs")

Let's scan it with these commands:

load('lib/file.re')
(~ok, scanned, _) = reia_scan::scan(File.read('indent.re').to_list())

This currently scans to the following tokens:

[(~if,1),(~identifier,1,~x),(~eol,1),
(~indent,2),(~identifier,2,~puts),(~'(',2),(~string,2,"x==true"),(~')',2),(~eol,2),
(~dedent,3),
(~identifier,3,~puts),(~'(',3),(~string,3,"outside all ifs"),(~')',3),(~eol,3)]

And it looks like it should be easily parsed.
But reia_parser claims that "syntax error before: puts" on row 3
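
The shape of that token stream can be reproduced with a rough Python sketch of an indentation-tracking scanner. This is not Reia's scanner: the function name `layout_tokens` is made up, each non-blank line is kept as a single `line` token instead of being split into identifiers and strings, and indentation that falls between two open levels is not diagnosed.

```python
def layout_tokens(source):
    """Return (kind, line_no[, text]) tuples with indent/dedent/eol markers."""
    tokens = []
    stack = [0]  # indentation widths of the blocks that are currently open
    for line_no, raw in enumerate(source.splitlines(), start=1):
        text = raw.strip()
        if not text:
            continue  # blank lines are skipped entirely
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            tokens.append(("indent", line_no))
        while width < stack[-1]:
            stack.pop()
            tokens.append(("dedent", line_no))
        tokens.append(("line", line_no, text))
        tokens.append(("eol", line_no))
    return tokens


source = 'if x\n  puts("x==true")\nputs("outside all ifs")\n'
tokens = layout_tokens(source)
for token in tokens:
    print(token)
```

As in the stream above, the dedent on line 3 is followed directly by the next statement, with no eol in between.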


Let's look at it. This is how the parser expects an if expression to look:
if_expr -> if_op expr eol indent statements dedent :

(~if,1) matches if_op,
(~identifier,1,~x) matches expr, etc.

But if we look at the very beginning, we can see that:

statements -> statement : ['$1'].
statements -> statement statement_ending : ['$1'].
statements -> statement statement_ending statements : ['$1'|'$3'].
...
statement_ending -> ending_token : '$1'.
statement_ending -> statement_ending ending_token : '$1'.
ending_token -> ';' : '$1'.
ending_token -> eol : '$1'.

Let's modify our scanned:
scanned = [(~if,1),(~identifier,1,~x),(~eol,1),
(~indent,2),(~identifier,2,~puts),(~'(',2),(~string,2,"x==true"),(~')',2),(~eol,2),
(~dedent,3),(~eol,3),
(~identifier,3,~puts),(~'(',3),(~string,3,"outside all ifs"),(~')',3),(~eol,3)]

Yes, right, we've added an additional EOL after the dedent, as STATEMENTS expects it before the next STATEMENT.

Let's parse it
reia_parse::parse(scanned)

(~ok,[(~if,1,(~identifier,1,~x),[(~funcall,2,(~identifier,2,~puts),[(~string,2,"x==true")])],(~else_clause,1,[(~atom,1,nil)])),(~funcall,3,(~identifier,3,~puts),[(~string,3,"outside all ifs")])])

YES!
We've found a snake, now it's time to think what to do with it.

It's quite clear that most statements end with EOL (end-of-line), but some end with EOL _and_ DEDENT,
while the parser expects an EOL even after a multi-line, multi-level STATEMENT.

It's late at night already, and I can't see the keyboard very sharply, nor the solution.

Cheers, Phil

Tony Arcieri

Feb 22, 2009, 8:00:57 PM
to re...@googlegroups.com
On Sun, Feb 22, 2009 at 3:07 PM, Phil Pirozhkov <pi...@mail.ru> wrote:
> Is there a special reason for that?

Yes, it's a result of the expression-based grammar.  I'll explain below
 
> It looks at a glance that
>
> if x
>  if y
>    puts("y")
>  else
>    puts("not y")
>  puts("i'm only seen when x is true")
> puts("outside all ifs")
>
> can be parsed, and i don't see any possible conflicts here

Yes, this parses fine in Python.  This is because Python separates out statements from expressions in its grammar, and has a special pushdown for "compound statements":

http://www.python.org/doc/2.5.2/ref/grammar.txt
statement ::= stmt_list NEWLINE | compound_stmt
So here we see, Python's stmt_lists (single-line statements separated by semicolons) need a NEWLINE at the end, but compound_stmts (which have indentation blocks) do not.  In fact, Python's statements with indentation blocks have no statement separators whatsoever.
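
The same shape is visible at the token level. As a quick check, this sketch uses the standard `tokenize` module from a modern Python 3 (not the 2.5 release whose grammar is linked above) to list the token types for a tiny compound statement. The helper name `token_names` is made up for illustration.

```python
import io
import tokenize


def token_names(source):
    """Return the names of the token types Python's tokenizer emits for `source`."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


# An `if` block followed immediately by another statement at the outer level.
source = "if x:\n    y\nz\n"
names = token_names(source)
print(names)

# The DEDENT that closes the block is followed directly by the next statement's
# first token; no NEWLINE is emitted between them.
dedent_at = names.index("DEDENT")
print(names[dedent_at - 1], names[dedent_at], names[dedent_at + 1])
```

The NEWLINE before the DEDENT belongs to the last simple statement inside the block, so the compound statement itself ends on the DEDENT alone.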

Unfortunately, Reia has a pure expression-based grammar, which permits this sort of syntax:
 
 x = if true
   'yay'
 else
   'boo'

Or to give an example which really necessitates being expression-based, a function call which takes a multiline block:

 x = foo(1,2,3) do |bar|
   baz = bar * 42
   baz ** 2

...or so forth.  This means I can't separate out multiline expressions and special case them in the grammar the way Guido was able to do in Python.  And that said, I'll definitely admit Guido's approach is more elegant.  But if we want to go with Guido's approach, that means we can't have any multiline expressions, which means no blocks.  And I want blocks...

In a poll I put together of "Ruby-features", blocks were (unsurprisingly) the most popular feature:

http://www.twiigs.com/poll/Technology/Computers/25588

...and blocks don't really make sense in an indentation-sensitive grammar unless you have expressions which use indentation blocks.

> empty lines with any number of tabs/spaces, possibly ending with a comment
> should be skipped ('$empty') during parsing

That's how the lexer works now, which is fine for a Python-like grammar. Indeed, if you compare Reia's lexer to Python's, you'll see they work identically, although I wasn't specifically trying to copy Python's lexer.

I would need to add a special case for using blank lines to terminate expressions.
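
One way such a special case might look, as a rough Python sketch rather than anything in Reia's actual leex scanner: remember the most recent blank line, and turn it into an eol only when the next non-blank line closes an indent block. Everything here is an assumption for illustration, including the name `layout_tokens`, the whole-line `line` tokens, and the rule for when a blank line counts.

```python
def layout_tokens(source):
    """Return layout tokens; a blank line right before a dedent becomes an eol."""
    tokens = []
    stack = [0]           # indentation widths of the open blocks
    pending_blank = None  # line number of the most recent blank line, if any
    for line_no, raw in enumerate(source.splitlines(), start=1):
        text = raw.strip()
        if not text:
            pending_blank = line_no
            continue
        width = len(raw) - len(raw.lstrip(" "))
        if width > stack[-1]:
            stack.append(width)
            tokens.append(("indent", line_no))
        closed_block = False
        while width < stack[-1]:
            stack.pop()
            tokens.append(("dedent", line_no))
            closed_block = True
        if closed_block and pending_blank is not None:
            # The blank line acts as the terminator for the multi-line expression.
            tokens.append(("eol", pending_blank))
        pending_blank = None
        tokens.append(("line", line_no, text))
        tokens.append(("eol", line_no))
    return tokens


source = "v = if true\n  'works'\nelse\n  'doesnt'\n\nputs(v)\n"
kinds = [token[0] for token in layout_tokens(source)]
print(kinds)
```

In this sketch the dedent in front of `else` gets no extra eol because no blank line precedes it, while the dedent after the blank line does, which matches the "blank line terminates the expression" rule without the scanner needing to know the grammar.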



Here's the problem: you're adding EOL to terminate the expression. But the scanner doesn't know where the expression ends... that requires knowledge of the grammar. So we'd need what Guido describes as a "Rube Goldberg machine" in the scanner using some kind of lexer/parser feedback. I'm avoiding that for a few reasons, first because I agree with Guido that such an approach is silly and second because I don't really know how to implement complex lexer/parser feedback with leex/yecc.

It works in the case you're testing, because it has only one "clause".  Try adding an else clause.  That's where things start getting more difficult.  Try a case statement with multiple clauses.

Then try to assign the return value of the if statement to a variable.  This is where things really start to break down.  If we wish to treat things like if and case statements (and more importantly, function calls with indent-based multiline blocks) as expressions, they need the same sort of terminator token that expressions require.
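
To see the problem concretely, here is a small Python sketch (not Reia code) of a scanner post-pass that blindly adds an eol after every dedent. The function name `add_eol_after_dedent` and the hand-written token stream are assumptions for illustration, and each call is simplified to a single `line` token.

```python
def add_eol_after_dedent(tokens):
    """Insert an ('eol', line) token after every ('dedent', line) token."""
    patched = []
    for token in tokens:
        patched.append(token)
        if token[0] == "dedent":
            patched.append(("eol", token[1]))
    return patched


# Hand-written stream for:
#   if x
#     puts("yes")
#   else
#     puts("no")
#   puts("done")
tokens = [
    ("if", 1), ("identifier", 1, "x"), ("eol", 1),
    ("indent", 2), ("line", 2, 'puts("yes")'), ("eol", 2),
    ("dedent", 3), ("else", 3), ("eol", 3),
    ("indent", 4), ("line", 4, 'puts("no")'), ("eol", 4),
    ("dedent", 5), ("line", 5, 'puts("done")'), ("eol", 5),
]

kinds = [token[0] for token in add_eol_after_dedent(tokens)]
print(kinds)
```

The first dedent closes the `if` body, so the added eol lands between that body and `else`. A grammar that treats eol as a statement terminator would presumably see the expression end before `else` arrives. Deciding which dedent really ends the expression takes grammar knowledge the scanner does not have.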

--
Tony Arcieri
medioh.com

Tony Arcieri

Feb 22, 2009, 8:04:26 PM
to re...@googlegroups.com
On Sun, Feb 22, 2009 at 3:21 PM, Eero Saynatkari <proj...@kittensoft.org> wrote:
> For what it is worth, I am in favour of the Logix/Haskell approach.

I certainly think it's worthy of investigation and from what I can tell feels like it meshes better with a purely expression-based grammar.

After modifying the lexer to work as best I think possible with the current approach, I will start (and push) a branch using this approach for comparison.

After both are available for evaluation, I'll await feedback and make a decision as to which path to pursue.

--
Tony Arcieri
medioh.com

Phil Pirozhkov

Feb 23, 2009, 12:04:12 PM
to re...@googlegroups.com
Fix me if i'm incorrect
The final goal here is to allow:

foo(1,2,3) do |bar|
  bar * 2
.print()

and

if x
  "yes"
else
  "no"
.print()

mappings.add(Person)
.add(Account)
.add(Bank)

What prevents adding a
puts("ground") at the same level of indentation?
This shouldn't confuse the parser much, since, yes, it begins with a dot

All to be expression-based?
I don't really think the following should be supported:
if if s == "30"
   30
 else
   20
 > 25
 ...

ugly
Cheers, Phil

Tony Arcieri

Feb 23, 2009, 1:26:57 PM
to re...@googlegroups.com
On Mon, Feb 23, 2009 at 10:04 AM, Phil Pirozhkov <pi...@mail.ru> wrote:

> Fix me if i'm incorrect
> The final goal here is to allow:
>
> foo(1,2,3) do |bar|
>  bar * 2
> .print()
>
> and
>
> if x
>  "yes"
> else
>  "no"
> .print()

I wasn't pointing out these examples as something particularly desirable to have, only as valid syntax due to a fluke in the way indent blocks in the middle of expressions work.  I assume things like this are the reason Guido didn't want indent blocks in expressions.

The final goal here would be to facilitate indent blocks in expressions without the need for a terminal newline, so...

x = if true
  'yes'
else
  'no'
puts(x)

or perhaps more importantly, in the case of blocks:

x = [1,2,3].map do |n|
  n * 42
puts(x)

In Python, statements (such as if) are among the highest precedence parts of the grammar, whereas in Reia (and Ruby, and Erlang) they are the lowest.  Function calls are second from lowest.  This means we can do things like use them in match expressions (as in the above), or unary expressions, or any other type of higher precedence expression that we desire.

To do this, the grammar needs to be able to handle expressions with or without indent blocks interchangeably, and this is what doesn't let us special-case statements the way Python does. We can't go "for this set of statements, don't require a terminal newline" because the only set the grammar has to operate on is the entire set of expressions.
 
> mappings.add(Person)
> .add(Account)
> .add(Bank)

This isn't valid because the ".add" calls aren't at the end of an indent block.
 
> what prevents from adding a
> puts("ground") on the same level of indentation?
> This shouldn't confuse parser a lot, since yes, it begins with a dot

To give the Ruby equivalent of why you can't add a puts immediately after an indent block, it would look something like this:

if true
  'yes'
else
  'no'
end puts(x)

the "end puts(x)" part is not syntactically valid, and this is exactly how Reia's parser sees tokens in the case of a dedent immediately followed by another expression.
 
> i don't really think the following should be supported:
> if if s == "30"
>    30
>  else
>    20
>  > 25
>  ...
>
> ugly

Yes

--
Tony Arcieri
medioh.com

Tony Arcieri

Mar 2, 2009, 3:29:14 PM
to Tom Locke, Reia Mailing List
On Sun, Mar 1, 2009 at 4:00 AM, Tom Locke <t...@tomlocke.com> wrote:
> Hi Tony
>
> The Haskell-like indentation rules in Logix worked out pretty well, but I never did anything like
>
> if true
>  'works'
> else
>  'doesnt'
> .puts()
>
> Although I can't remember if that was impossible or I just always parenthesised such things.

Yeah, that's not exactly a syntactic goal so much as an oddity of the way the current grammar works.
 
> These days I find I actually prefer Ruby's syntax, with the 'end' markers, to something like python.

If I can't get the indentation-based grammar to work in a way I like, I'm considering ditching indentation-based syntax and just going with Ruby-style syntax.

--
Tony Arcieri
medioh.com

Phil Pirozhkov

Mar 3, 2009, 1:31:51 AM
to re...@googlegroups.com
> From: Tony Arcieri <to...@medioh.com> Date: Mon, 2 Mar 2009 13:29:14 -0700

>
> If I can't get the indentation-based grammar to work in a way I like, I'm
> considering ditching indentation-based syntax and just going with Ruby-style
> syntax.
No objections.
The main reason I chose Ruby over Python some time ago was Python's compulsory indentation.

Cheers, Phil

ian eyberg

Mar 3, 2009, 9:35:18 AM
to re...@googlegroups.com
I totally agree with this as well
--
ian eyberg
i...@telematter.com
573.219.06858

Matthew King

Mar 3, 2009, 10:53:33 AM
to re...@googlegroups.com
On Tue, Mar 3, 2009 at 8:35 AM, ian eyberg <i...@telematter.com> wrote:
>
> I totally agree with this as well
>
> On Tue, Mar 03, 2009 at 09:31:51AM +0300, Phil Pirozhkov wrote:
>>
>> > From: Tony Arcieri <to...@medioh.com> Date: Mon, 2 Mar 2009 13:29:14 -0700
>> >
>> > If I can't get the indentation-based grammar to work in a way I like, I'm
>> > considering ditching indentation-based syntax and just going with Ruby-style
>> > syntax.

+1

Carsten Nielsen

Mar 3, 2009, 10:57:56 AM
to Reia
+1

Tony Arcieri

Mar 3, 2009, 11:18:25 PM
to re...@googlegroups.com
On Sun, Mar 1, 2009 at 4:00 AM, Tom Locke <t...@tomlocke.com> wrote:
> These days I find I actually prefer Ruby's syntax, with the 'end' markers, to something like python.

On Mon, Mar 2, 2009 at 11:31 PM, Phil Pirozhkov <pi...@mail.ru> wrote:
> No objections.
> The main reason I chose Ruby over Python some time ago was Python's compulsory indentation.

On Tue, Mar 3, 2009 at 7:35 AM, ian eyberg <i...@telematter.com> wrote:
> I totally agree with this as well

On Tue, Mar 3, 2009 at 8:57 AM, Carsten Nielsen <heyca...@gmail.com> wrote:
> +1

On Tue, Mar 3, 2009 at 8:53 AM, Matthew King <autom...@gmail.com> wrote:
> +1

Okay, not seeing a lot of people here sticking up for an indentation-sensitive syntax :)

Given that, perhaps I'll put together and push a branch with end keywords in lieu of an indentation-sensitive syntax. If it's well received, I'll merge it into master.

--
Tony Arcieri
medioh.com