Like Erlang and Ruby (and unlike Python), Reia has a purely expression-based grammar where everything returns a value. In Python, all indentation-based constructs are statements, and by nature of being statements they cannot return values.
Reia attempts to introduce Ruby-style "blocks" into a Python-like grammar. Blocks are attached to a grammatical element which must be an expression: a function call. So, in order to have multi-line blocks Reia must have expressions with indentation blocks. I would also like to retain the pure-expression based nature of Erlang.
Guido found it unacceptable to embed an indentation-based block in the middle of an expression. I'm starting to see some of the cases where doing this allows for ugly and strange syntax, and can see why Guido wanted to disallow it for the sake of syntactic purity.
Multi-line expressions allow some strange (and arguably ugly) syntax:
    fun(a, b, c) do
      a + b + c
    .to_s()
Furthermore, every multiline expression will need to be "terminated" with a newline. So this:
    v = if false
      'works'
    else
      'doesnt'
    puts(v)
...will be invalid. Every if statement, case statement, function call that accepts a multiline block, or multiline lambda declaration will need a blank line after it to terminate the expression:
    v = if true
      'works'
    else
      'doesnt'

    puts(v)
Or you could use the weird syntax to invoke the puts method on the return value:
    if true
      'works'
    else
      'doesnt'
    .puts()
This sort of odd syntax is also permissible as an effect of the way the grammar works:
    v = if true
      'works'
    else
      'doesnt'
    ; puts(v)
Here the semicolon is acting as the statement terminator, rather than a newline.
I'm going to add support for this in the lexer this weekend. However, I'm not really sure I'm happy about this syntax.
What do you think? Weird? Annoying? Or does it actually make sense?
This is the best I'm able to do in terms of solving Guido's "unsolvable puzzle" of multi-line expressions in a Python-like grammar. I'm certain Guido would not be happy with this solution, and I think he'd equally dislike that this has come as the result of something of a puzzle-solving exercise, rather than looking for the most elegant approach. I'll certainly admit it is somewhat inelegant.
Someone on #python pointed me at a language called Logix:
which is another Python-like language with an expression-based grammar. It makes use of the off-side rule in a way more akin to Haskell (whereas Reia uses it more like Python). It's another potential direction I could go. The other would be abandoning indentation sensitivity altogether and going with a more Ruby-like grammar.
Logix appears to have been the project of Hobo author Tom Locke. If anyone has his e-mail address I'd be curious to ask him what he thinks about this.
I'd rather go for "coding guidelines", than implementing coding restrictions in the language.
Yes, all the examples you've given look really ugly.
I remember Linus's phrase "If you need more than 3 levels of indentation, you're screwed anyway, and should fix your program" (http://en.wikiquote.org/wiki/Linus_Torvalds), and no matter what language I'm using, I try to follow this.
No one can prevent people from using a good language to express ugly things.
-----Original Message-----
From: Tony Arcieri <t...@medioh.com>
To: reia@googlegroups.com
Date: Sat, 21 Feb 2009 13:32:16 -0700
Subject: [reia] Indentation sensitivity issues
I don't know if it's helpful, but a few years back I played with writing a grammar with the following attributes:
* Was written test first (to see if TDD grammar design was doable)
* No keywords at all
* Used indentation in a similar way to python
* Used a ":" to indicate the start of an indentation block
Due to using no keywords at all, everything was treated as a function call. (The aim in a way was to see if it was possible to put a pythonlike syntax on the front of a lisp-like language - using indentation & infix rather than prefix / brackets)
The key result I found was that it was possible, but that indented blocks needed some form of end keyword. That keyword didn't need to be defined (though it's nice to have conventions :), but it did need to be there.
I never attached a backend to the parser/lexer, but the code works and is still accessible. I even gave a (slightly tongue in cheek :) lightning talk on it at Europython in 2005. As a result:
On Sun, Feb 22, 2009 at 4:59 AM, Phil Pirozhkov <p...@mail.ru> wrote:
> I'd rather go for "coding guidelines", than implementing coding
> restrictions in the language.
The goal isn't so much being restrictive as using indentation to allow for a cleaner look.
Of course, I'm not really sure if it's being accomplished in this case.
The role of the blank line is the same as the "end" keyword in an expression-based grammar like Ruby's. Having it there implicitly seems somewhat weird to me though.
Requiring a blank line after a function which takes a block or a case/if statement seems like the one thing someone would have to do day-to-day which I don't necessarily like. The other things are just syntactic oddities that result from the way the grammar works at present.
That said, I'm going to go with it for now, then investigate a more Haskell/Logix-like grammar in another branch.
On Sun, Feb 22, 2009 at 6:09 AM, Michael Sparks <spark...@gmail.com> wrote:
> The key result I found was that it was possible, but that indented blocks
> needed some form of end keyword. That keyword didn't need to be defined
> (though its nice to have conventions :), but it did need to be there.
I'm running into something similar, except expressions with indentation blocks need a statement separator to terminate them.
I'd like to avoid an explicit ending token (as I feel that goes against what an indentation sensitive grammar is trying to accomplish in the first place)
Using a blank line as an expression separator "token" seems like a decent compromise.
> From: Tony Arcieri <t...@medioh.com>
> On Sun, Feb 22, 2009 at 6:09 AM, Michael Sparks <spark...@gmail.com> wrote:
> > The key result I found was that it was possible, but that indented blocks
> > needed some form of end keyword. That keyword didn't need to be defined
> > (though its nice to have conventions :), but it did need to be there.
> I'm running into something similar, except expressions with indentation
> blocks need a statement separator to terminate them.
Is there a special reason for that? It looks at a glance that
    if x
      if y
        puts("y")
      else
        puts("not y")
      puts("i'm only seen when x is true")
    puts("outside all ifs")
can be parsed, and i don't see any possible conflicts here
empty lines with any number of tabs/spaces, possibly ending with a comment
should be skipped ('$empty') during parsing
Currently,
    module A
      def a
        if true
          puts("y")
      def b
        ..
doesn't work, and this means there's something wrong with indentation
> I'd like to avoid an explicit ending token (as I feel that goes against what
> an indentation sensitive grammar is trying to accomplish in the first place)
sure, ending token breaks the beauty, and just eats rows
> Using a blank line as an expression separator "token" seems like a decent
> compromise.
Excerpts from Tony Arcieri's message of Sun Feb 22 22:12:23 +0200 2009:
> On Sun, Feb 22, 2009 at 6:09 AM, Michael Sparks <spark...@gmail.com> wrote:
> > The key result I found was that it was possible, but that indented blocks
> > needed some form of end keyword. That keyword didn't need to be defined
> > (though its nice to have conventions :), but it did need to be there.
> I'm running into something similar, except expressions with indentation
> blocks need a statement separator to terminate them.
> I'd like to avoid an explicit ending token (as I feel that goes against what
> an indentation sensitive grammar is trying to accomplish in the first place)
> Using a blank line as an expression separator "token" seems like a decent
> compromise.
For what it is worth, I am in favour of the Logix/Haskell approach.
Let's scan it with the commands:
    load('lib/file.re')
    (~ok, scanned, _) = reia_scan::scan(File.read('indent.re').to_list())
This currently scans to the following tokens:
    [(~if,1),(~identifier,1,~x),(~eol,1),
     (~indent,2),(~identifier,2,~puts),(~'(',2),(~string,2,"x==true"),(~')',2),(~eol,2),
     (~dedent,3),
     (~identifier,3,~puts),(~'(',3),(~string,3,"outside all ifs"),(~')',3),(~eol,3)]
And it looks like it should be easily parsed. But reia_parser claims that "syntax error before: puts" on row 3
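To make the token stream above easier to experiment with, here is a minimal sketch in Python of a scanner that follows the same off-side rule. It is a hypothetical toy, not Reia's actual leex scanner: the `scan` function, the coarse `("line", ...)` token that stands in for a whole line of real tokens, and the assumption that indentation uses spaces only are all made up for illustration.

```python
def scan(source):
    """Toy off-side-rule scanner (hypothetical, not Reia's leex scanner).

    Emits ("line", n, text), ("eol", n), ("indent", n) and ("dedent", n).
    """
    tokens = []
    indents = [0]   # stack of indentation widths currently open
    lineno = 0
    for lineno, raw in enumerate(source.splitlines(), start=1):
        text = raw.strip()
        if not text:
            continue                      # blank lines produce no tokens
        width = len(raw) - len(raw.lstrip(" "))
        if width > indents[-1]:
            indents.append(width)
            tokens.append(("indent", lineno))
        while width < indents[-1]:
            indents.pop()
            tokens.append(("dedent", lineno))
        tokens.append(("line", lineno, text))
        tokens.append(("eol", lineno))
    for _ in indents[1:]:                 # close blocks still open at EOF
        tokens.append(("dedent", lineno + 1))
    return tokens


SOURCE = 'if x\n  puts("x==true")\nputs("outside all ifs")\n'
kinds = [tok[0] for tok in scan(SOURCE)]
print(kinds)
```

In this sketch, as in the token list above, the dedent on row 3 is followed directly by the next expression, with no eol in between. That missing separator is what the parser is complaining about.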
Let's look at it. That's how the parser expects an if expression to be:
    if_expr -> if_op expr eol indent statements dedent :
(~if,1) matches if_op, (~identifier,1,~x) matches expr, etc.
But if we look at the very beginning, we can see that:
Yes, right, we've added an additional EOL after dedent, as STATEMENTS expect it before next STATEMENT
Let's parse it:
    reia_parse::parse(scanned)
    (~ok,[(~if,1,(~identifier,1,~x),[(~funcall,2,(~identifier,2,~puts),[(~string,2,"x==true")])],(~else_clause,1,[(~atom,1,nil)])),(~funcall,3,(~identifier,3,~puts),[(~string,3,"outside all ifs")])])
YES! We've found a snake, now it's time to think what to do with it.
It's quite clear that most statements end with an EOL (end-of-line), but some end with an EOL _and_ a DEDENT, while the parser expects an EOL even after a multi-line, multi-level STATEMENT.
It's late at night already, and I can't see the keyboard very clearly, nor the solution.
On Sun, Feb 22, 2009 at 3:07 PM, Phil Pirozhkov <p...@mail.ru> wrote:
> Is there a special reason for that?
Yes, it's a result of the expression-based grammar. I'll explain below
> It looks at a glance that
> if x
>   if y
>     puts("y")
>   else
>     puts("not y")
>   puts("i'm only seen when x is true")
> puts("outside all ifs")
> can be parsed, and i don't see any possible conflicts here
Yes, this parses fine in Python. This is because Python separates out statements from expressions in its grammar, and has a special pushdown for "compound statements":
So here we see, Python's stmt_lists (single-line statements separated by semicolons) need a NEWLINE at the end, but compound_stmts (which have indentation blocks) do not. In fact, Python's statements with indentation blocks have no statement separators whatsoever.
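The statement/expression split described above can be checked directly against CPython. This is only a small sketch using the standard `ast` module; the `parses` helper and the sample programs are made up for illustration, and the samples are written in Python syntax rather than Reia's.

```python
import ast


def parses(source):
    """Return True if CPython accepts `source` as a module."""
    try:
        ast.parse(source)
        return True
    except SyntaxError:
        return False


# Nested compound statements: no separator is needed after a block.
nested_ifs = (
    "if x:\n"
    "    if y:\n"
    "        print('y')\n"
    "    else:\n"
    "        print('not y')\n"
    "    print('only when x is true')\n"
    "print('outside all ifs')\n"
)

# An "if" used as a value: rejected, because "if" is a statement.
if_as_value = (
    "v = if True:\n"
    "    'works'\n"
    "else:\n"
    "    'doesnt'\n"
)

first = ast.parse(nested_ifs).body[0]
print(parses(nested_ifs), parses(if_as_value))
print(isinstance(first, ast.stmt), isinstance(first, ast.expr))
```

The first program is accepted and its top-level `if` is a statement node, not an expression node. The second is a syntax error, which is exactly the kind of construct an expression-based grammar has to allow.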
Unfortunately, Reia has a pure expression-based grammar, which permits this sort of syntax:
    x = if true
      'yay'
    else
      'boo'
Or to give an example which really necessitates being expression-based, a function call which takes a multiline block:
    x = foo(1,2,3) do |bar|
      baz = bar * 42
      baz ** 2
...or so forth. This means I can't separate out multiline expressions and special case them in the grammar the way Guido was able to do in Python. And that said, I'll definitely admit Guido's approach is more elegant. But if we want to go with Guido's approach, that means we can't have any multiline expressions, which means no blocks. And I want blocks...
In a poll I put together of "Ruby-features", blocks were (unsurprisingly) the most popular feature:
...and blocks don't really make sense in an indentation-sensitive grammar unless you have expressions which use indentation blocks.
> empty lines with any number of tabs/spaces, possibly ending with a comment
> should be skipped ('$empty') during parsing
That's how the lexer works now, which is fine for a Python-like grammar. Indeed if you compare the way Reia's lexer works as compared to Python's, you'll see they work identically, although I wasn't specifically trying to copy Python's lexer.
I would need to add a special case for using blank lines to terminate expressions.
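For comparison, Python's own tokenizer can be inspected with the standard `tokenize` module. The sketch below is only an illustration of Python's behaviour, not of Reia's lexer; `token_names` and the sample source are hypothetical names chosen for the example.

```python
import io
import tokenize

SOURCE = (
    "if x:\n"
    "    puts('x==true')\n"
    "\n"
    "puts('outside all ifs')\n"
)


def token_names(source):
    """Return the token type names produced by Python's tokenizer."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type]
            for tok in tokenize.generate_tokens(readline)]


names = token_names(SOURCE)
# NL marks a non-logical newline (such as a blank line); drop it to see
# only the tokens that carry structure.
significant = [name for name in names if name != "NL"]
print(names)
print(significant)
```

The blank line shows up only as an `NL` token, which is distinct from the `NEWLINE` that ends a logical line, so it carries no structural meaning. Once `NL` is dropped, the `DEDENT` sits directly between the block's final `NEWLINE` and the next `NAME`. Making a blank line act as a terminator would mean giving it a structural role that it does not have here.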
> reia_parser claims that "syntax error before: puts" on row 3
> (~indent,2),(~identifier,2,~puts),(~'(',2),(~string,2,"x==true"),(~')',2),(~eol,2),
> (~dedent,3),(~eol,3),
> (~identifier,3,~puts),(~'(',3),(~string,3,"outside all ifs"),(~')',3),(~eol,3)]
> Yes, right, we've added an additional EOL after dedent, as STATEMENTS
> expect it before next STATEMENT
> Let's parse it
> reia_parse::parse(scanned)
> (~ok,[(~if,1,(~identifier,1,~x),[(~funcall,2,(~identifier,2,~puts),[(~string,2,"x==true")])],(~else_clause,1,[(~atom,1,nil)])),(~funcall,3,(~identifier,3,~puts),[(~string,3,"outside all ifs")])])
> YES!
> We've found a snake, now it's time to think what to do with it.
Here's the problem: you're adding EOL to terminate the expression. But the scanner doesn't know where the expression ends... that requires knowledge of the grammar. So we'd need what Guido describes as a "Rube Goldberg machine" in the scanner using some kind of lexer/parser feedback. I'm avoiding that for a few reasons, first because I agree with Guido that such an approach is silly and second because I don't really know how to implement complex lexer/parser feedback with leex/yecc.
It works in the case you're testing, because it has only one "clause". Try adding an else clause. That's where things start getting more difficult. Try a case statement with multiple clauses.
Then try to assign the return value of the if statement to a variable. This is where things really start to break down. If we wish to treat things like if and case statements (and more importantly, function calls with indent-based multiline blocks) as expressions, they need the same sort of terminator token that expressions require.
On Sun, Feb 22, 2009 at 3:21 PM, Eero Saynatkari <proje...@kittensoft.org> wrote:
> For what it is worth, I am in favour of the Logix/Haskell approach.
I certainly think it's worthy of investigation and from what I can tell feels like it meshes better with a purely expression-based grammar.
After modifying the lexer to work as best I think possible with the current approach, I will start (and push) a branch using this approach for comparison.
After both are available for evaluation, I'll await feedback and make a decision as to which path to pursue.
On Mon, Feb 23, 2009 at 10:04 AM, Phil Pirozhkov <p...@mail.ru> wrote:
> Fix me if i'm incorrect
> The final goal here is to allow:
> foo(1,2,3) do |bar|
>   bar * 2
> .print()
> and
> if x
>   "yes"
> else
>   "no"
> .print()
I wasn't pointing out these examples as something particularly desirable to have, only as valid syntax due to a fluke in the way indent blocks in the middle of expressions work. I assume things like this are the reason Guido didn't want indent blocks in expressions.
The final goal here would be to facilitate indent blocks in expressions without the need for a terminal newline, so...
    x = if true
      'yes'
    else
      'no'
    puts(x)
or perhaps more importantly, in the case of blocks:
    x = [1,2,3].map do |n|
      n * 42
    puts(x)
In Python, statements (such as if) are among the highest precedence parts of the grammar, whereas in Reia (and Ruby, and Erlang) they are the lowest. Function calls are second from lowest. This means we can do things like use them in match expressions (as in the above), or unary expressions, or any other type of higher precedence expression that we desire.
To do this, the grammar needs to be able to handle expressions with or without indent blocks interchangeably, and this is what doesn't let us special-case statements the way Python does it. We can't go "for this set of statements, don't require a terminal newline" because the only set the grammar has to operate on is the entire set of expressions.
This isn't valid because the ".add" calls aren't at the end of an indent block.
> what prevents from adding a
> puts("ground") on the same level of indentation?
> This shouldn't confuse parser a lot, since yes, it begins with a dot
To give the Ruby equivalent of why you can't add a puts immediately after an indent block, it would look something like this:
    if true
      'yes'
    else
      'no'
    end puts(x)
The "end puts(x)" part is not syntactically valid, and this is exactly how Reia's parser sees tokens in the case of a dedent immediately followed by another expression.
> i don't really think the following should be supported:
> if if s == "30"
>   30
> else
>   20
> > 25
> ...
On Sun, Mar 1, 2009 at 4:00 AM, Tom Locke <t...@tomlocke.com> wrote:
> Hi Tony
> The Haskell-like indentation rules in Logix worked out pretty well, but I
> never did anything like
> if true
>   'works'
> else
>   'doesnt'
> .puts()
> Although I can't remember if that was impossible or I just always
> parenthesised such things.
Yeah, that's not exactly a syntactic goal so much as an oddity of the way the current grammar works.
> These days I find I actually prefer Ruby's syntax, with the 'end' markers,
> to something like python.
If I can't get the indentation-based grammar to work in a way I like, I'm considering ditching indentation-based syntax and just going with Ruby-style syntax.
> From: Tony Arcieri <t...@medioh.com> Date: Mon, 2 Mar 2009 13:29:14 -0700
> If I can't get the indentation-based grammar to work in a way I like, I'm
> considering ditching indentation-based syntax and just going with Ruby-style
> syntax.
No objections. The main reason I chose Ruby over Python some time ago was Python's compulsory indentation.
On Sun, Mar 1, 2009 at 4:00 AM, Tom Locke <t...@tomlocke.com> wrote:
> These days I find I actually prefer Ruby's syntax, with the 'end' markers,
> to something like python.
On Mon, Mar 2, 2009 at 11:31 PM, Phil Pirozhkov <p...@mail.ru> wrote:
> No objections.
> The main reason I chose Ruby over Python some time ago was Python's
> compulsory indentation
On Tue, Mar 3, 2009 at 7:35 AM, ian eyberg <i...@telematter.com> wrote:
> I totally agree with this as well
On Tue, Mar 3, 2009 at 8:57 AM, Carsten Nielsen <heycars...@gmail.com> wrote:
> +1
On Tue, Mar 3, 2009 at 8:53 AM, Matthew King <automatt...@gmail.com> wrote:
> +1
Okay, not seeing a lot of people here sticking up for an indentation-sensitive syntax :)
Given that, perhaps I'll put together and push a branch with end keywords in lieu of an indentation-sensitive syntax. If it's well received I'll merge it into master.