Syntax design brainstorming session

31 views
Skip to first unread message

Michiel

unread,
Mar 26, 2010, 5:07:20 PM3/26/10
to mist-discuss
There are some things I think would be good additions to the Mist
language. Most have been simmering in my head for a while. They make
sense. But some of them seem to conflict with each other. Let's look
at two of them. What's the best choice? Or can we somehow still have
our cake and eat it too?

These issues are mostly syntactical in nature. But am one of those who
believe syntax is important.

---------------(1)---------------
I quite like types that read from left to right, such as you may find
in Go. Examples:

[]int | Array of ints
^[[]char](int, int) | Pointer to string-map to (int, int) tuples
(int)(bool)int | int-function to bool-function to int

There have been plans for a template notation in Mist which, by
nature, acts the same way:

stack?int | Stack of ints
[]^stack?stack?bool | Array of pointers to stacks of stacks of bools

Variable declarations would look like this:

x: [][]int;

You could use the variables in the exact order shown in their
declaration:

x[1][2] | is of type int
---------------------------------

---------------(2)---------------
It seems (to me) quite natural to allow situations where either a type
or an expression is acceptable. More precisely, a type would be an
expression of type `type'. Not only would it simplify the grammar, it
would allow cool things like the following. First, two overloaded
functions for context:

void foo(int a, bool b, char c) { ... }
void foo(string s) { ... }

The following notation has long been planned for Mist:

foo!(int, bool, char) | The first overloaded function foo
foo!(string) | The second overloaded function foo

Note that these are function references, not calls. Now, allowing
expressions there would overload the !() notation for argument
binding:

foo!(42, bool, 'a') | Function of type (bool)void
foo!("bar") | Function of type ()void

There are other advantages to treating types and expressions the same
way. But the main reason is simplicity of the language.
---------------------------------

I think you can have either (1) or (2) but not both. If types are
written from left to right, and they can also be used in the same
places an expression could, there may be some ambiguity in the
grammar. For example, how should the following be parsed:

(x)(y)(z, z)

It might become a type: x-function to y-function to (z, z) tuple. It
might become an expression: Calling function x with argument y, which
returns a function which we call with arguments (z, z). Both should be
parsed differently, but we won't know which it is until some later
compiler phase. We can even complicate matters some more:

(x)(y)(z, z)(1, 2)

If the first part is a type, we're now calling its constructor. If
it's an expression, we're calling the function it returns.

I'm not sure if this is only an ambiguity on the grammar level. If
there is no ambiguity on the semantic level, we might even parse these
as sequences of generic things and disambiguate in the next compiler
phase. That's not very nice. But if it's completely unambiguous on the
semantic level... I might even consider it.

Or is there something else I'm missing?

Jonathan S. Shapiro

unread,
Mar 26, 2010, 7:34:36 PM3/26/10
to mist-d...@googlegroups.com
Two thoughts:
 
1. From experience, discussions of syntax will occupy infinite space, and for the most part will not be fruitful.
2. It's hard for people to evaluate syntax without a fairly complete picture of what the language syntax really looks like.

R. Matthew Schellhas

unread,
Mar 26, 2010, 10:57:26 PM3/26/10
to mist-discuss

On Mar 26, 4:07 pm, Michiel <mhelv...@gmail.com> wrote:
> I'm not sure if this is only an ambiguity on the grammar level. If
> there is no ambiguity on the semantic level, we might even parse these
> as sequences of generic things and disambiguate in the next compiler
> phase. That's not very nice. But if it's completely unambiguous on the
> semantic level... I might even consider it.

This is one of the core things I do in Tangent. There's a 3rd step
after lexing and parsing that does expression processing once the type
information is known (and identifiers properly resolved). It allows a
number of cool things, but it is rather awkward/novel for programmers
and it's fairly complex from an implementation standpoint. I'm not
sure it gains you much if all you're using it for is grammar
disambiguation. Though I'm not sure grammar disambiguation is the
correct phrase to use. I wouldn't be surprised if that sort of
ambiguity could be coerced into working by clever implementation of a
dynamic language.

Kannan Goundan

unread,
Mar 27, 2010, 1:41:41 AM3/27/10
to mist-discuss
A lot of syntax decisions boil down to personal aesthetic preferences,
and here's mine: I think that a little more verbosity in your type
expressions could make things more readable and uniform. Also, your
parens might be a little too overloaded. Something more explicit:

[]int --> array[int]
^[[]char](int, int) --> pointer[map[(int, int)]
(int)(bool)int --> func[int,func[bool,int]]
stack?int --> stack[int]
[]^stack?stack?bool --> array[pointer[stack[stack[bool]]]

I'm not totally against syntactic sugar, and I'd definitely add sugar
for functions (A -> B). Possibly for poitners, arrays, and maps, but
I'm not sure. But try and make sure user-defined types are also easy
to understand. For example, how would you handle types that accepted
multiple type parameters (like 'hashmap')?

> There are other advantages to treating types and expressions the same
> way. But the main reason is simplicity of the language.

One disadvantage is that the identifiers will now be in the same
namespace, which can be inconvenient to the programmer. For example,
if you had a type named 'color', you can now no longer have a variable
named 'color'. Some languages use capitalization to get around this
problem.

Also, this might make things confusing to the programmer. In your
last example:

> (x)(y)(z, z)(1, 2)

I don't like that the kind of operation being performed depends on
whether 'x' is bound to a type name vs a term name. At a glance, it
may be difficult for the programmer to remember what 'x' is bound to,
and therefore difficult to know what operation it is. IDEs can
mitigate this issue.

Michiel

unread,
Mar 27, 2010, 6:15:26 AM3/27/10
to mist-discuss
On Mar 27, 12:34 am, "Jonathan S. Shapiro" <s...@eros-os.org> wrote:

> Two thoughts:
>
> 1. From experience, discussions of syntax will occupy infinite space, and
> for the most part will not be fruitful.

True. I suppose I was banking on the small part that might be
fruitful.

> 2. It's hard for people to evaluate syntax without a fairly complete picture
> of what the language syntax really looks like.

Yes, good point. Of course, this is improbable, since I'm not sure
myself what the complete picture is yet. But I suppose it couldn't
hurt to write a page with the basics, readable within five minutes, to
which I could link when starting these discussions.

Michiel

unread,
Mar 27, 2010, 6:30:24 AM3/27/10
to mist-discuss
On Mar 27, 3:57 am, "R. Matthew Schellhas" <rms.tang...@gmail.com>
wrote:

> > I'm not sure if this is only an ambiguity on the grammar level. If
> > there is no ambiguity on the semantic level, we might even parse these
> > as sequences of generic things and disambiguate in the next compiler
> > phase. That's not very nice. But if it's completely unambiguous on the
> > semantic level... I might even consider it.
>
> This is one of the core things I do in Tangent. There's a 3rd step
> after lexing and parsing that does expression processing once the type
> information is known (and identifiers properly resolved).

Hm. I'd have to thoroughly figure out if this can't result in circular
reasoning somehow. (Mist has forward referencing of constant things
(functions and such).)

> It allows a number of cool things,

Now that I think about it, another cool thing would be user defined
operators with user defined precedence and associativity.

> but it is rather awkward/novel for programmers
> and it's fairly complex from an implementation standpoint.

That's certainly a downside. I was really hoping Mist parsing could
remain context-free.

> I'm not
> sure it gains you much if all you're using it for is grammar
> disambiguation. Though I'm not sure grammar disambiguation is the
> correct phrase to use. I wouldn't be surprised if that sort of
> ambiguity could be coerced into working by clever implementation of a
> dynamic language.

I'm not sure what a 'dynamic language' is, exactly (could mean many
things to me). I guess I should catch up on my reading.

Thanks!

Michiel

unread,
Mar 27, 2010, 8:50:46 AM3/27/10
to mist-discuss
On Mar 27, 6:41 am, Kannan Goundan <kan...@cakoose.com> wrote:

> A lot of syntax decisions boil down to personal aesthetic preferences,
> and here's mine: I think that a little more verbosity in your type
> expressions could make things more readable and uniform.  Also, your
> parens might be a little too overloaded.  Something more explicit:
>
> []int --> array[int]
> ^[[]char](int, int)  --> pointer[map[(int, int)]
> (int)(bool)int --> func[int,func[bool,int]]
> stack?int --> stack[int]
> []^stack?stack?bool --> array[pointer[stack[stack[bool]]]
>
> I'm not totally against syntactic sugar, and I'd definitely add sugar
> for functions (A -> B).  Possibly for poitners, arrays, and maps, but
> I'm not sure.

Ah, but *all* of those would be syntactic sugar for the template
notation. In the background, they are translated to something like
this:

[]int --> array?int
^[[]char](int, int) --> strong_pointer?map?(array?char, [int, int])

To clarify, [int, int] is a non-flattening tuple.

(int)(bool)int --> function?(int, function?(bool, int))
[]^stack?stack?bool --> array?strong_pointer?stack?stack?bool

> But try and make sure user-defined types are also easy
> to understand.  For example, how would you handle types that accepted
> multiple type parameters (like 'hashmap')?

Like this: map?(a, b)

(Not sure what type parameters a hashmap requires.)

> > There are other advantages to treating types and expressions the same
> > way. But the main reason is simplicity of the language.
>
> One disadvantage is that the identifiers will now be in the same
> namespace, which can be inconvenient to the programmer.  For example,
> if you had a type named 'color', you can now no longer have a variable
> named 'color'.  Some languages use capitalization to get around this
> problem.

A single namespace for everything was actually something I
intentionally borrowed from C++. When the programmer sees an
identifier, it would be immediately clear what is referenced, without
having to look at the context. This is, at least to me, an advantage.
Yes, I suppose the programmer would have to play with capitalization
and such.

> Also, this might make things confusing to the programmer.  In your
> last example:
>
> > (x)(y)(z, z)(1, 2)
>
> I don't like that the kind of operation being performed depends on
> whether 'x' is bound to a type name vs a term name.  At a glance, it
> may be difficult for the programmer to remember what 'x' is bound to,
> and therefore difficult to know what operation it is.  IDEs can
> mitigate this issue.

Yes, I completely agree. It's far from ideal. But I'm weighing this
against the advantages it might bring. It's a difficult issue.

R. Matthew Schellhas

unread,
Mar 27, 2010, 12:20:00 PM3/27/10
to mist-discuss

On Mar 27, 5:30 am, Michiel <mhelv...@gmail.com> wrote:
> On Mar 27, 3:57 am, "R. Matthew Schellhas" <rms.tang...@gmail.com>

> Hm. I'd have to thoroughly figure out if this can't result in circular
> reasoning somehow. (Mist has forward referencing of constant things
> (functions and such).)
>

I'm not quite sure what you mean. If you mean forward references like
C/C++, then... well, that's just a bad idea. I allow circular
references, and proper resolution wasn't too terribly difficult even
for a noob like me. Still, I kept that in mind during design. Some of
the things you want might make that hard/impossible.

> > It allows a number of cool things,
>
> Now that I think about it, another cool thing would be user defined
> operators with user defined precedence and associativity.

That was one of the primary motivators, though I infer the order of
operations rather than some arbitrary user defined precedence. It also
allows overloading by return type only, and generally needs less
parens/sugar. Still... it's a fairly big change and probably not worth
it if the language isn't built around it.

Michiel

unread,
Mar 27, 2010, 1:47:47 PM3/27/10
to mist-discuss
On Mar 27, 5:20 pm, "R. Matthew Schellhas" <rms.tang...@gmail.com>
wrote:

> > Hm. I'd have to thoroughly figure out if this can't result in circular


> > reasoning somehow. (Mist has forward referencing of constant things
> > (functions and such).)
>
> I'm not quite sure what you mean. If you mean forward references like
> C/C++, then... well, that's just a bad idea.

No, not like that. That's forward declarations. I'm talking about
forward referencing. Like a function call appearing textually before
the function declaration. Probably the same thing you're doing.
Circular references.

> I allow circular
> references, and proper resolution wasn't too terribly difficult even
> for a noob like me. Still, I kept that in mind during design. Some of
> the things you want might make that hard/impossible.

Yes, I'm considering so many crazy things, I need to make sure they're
actually safe. :-)

> > > It allows a number of cool things,
> >
> > Now that I think about it, another cool thing would be user defined
> > operators with user defined precedence and associativity.
>
> That was one of the primary motivators, though I infer the order of
> operations rather than some arbitrary user defined precedence.

Interesting. How does that work?

> It also
> allows overloading by return type only, and generally needs less
> parens/sugar. Still... it's a fairly big change and probably not worth
> it if the language isn't built around it.

Well, it wouldn't really be a `change' as such. :-) I've already
written two compiler versions of Mist from scratch. I'm starting again
with the third. Anyway, I'll have to think about this long and hard.

Thanks for your insight!

R. Matthew Schellhas

unread,
Mar 27, 2010, 4:16:19 PM3/27/10
to mist-discuss
On Mar 27, 12:47 pm, Michiel <mhelv...@gmail.com> wrote:
>
> Interesting. How does that work?
>

I make the assertion that all statements must result in void. Then
it's a matter of determining what permutation of operations 'makes
sense' (ie, result in void). Well, it's not quite that simple. There's
a set of rules/preferences that helps make sure that certain
permutations are more preferred than others to cut down on ambiguity.
And even then it's a balancing act between preventing compile-time or
runtime ambiguity and ease of use. Like operator associativity is
still an unsolved problem in the language at the moment.

Basically it's like type inference in reverse. Instead of taking the
operations and determining what type they result in, I force the
resultant type and determine what the operations are. Tangent exists
for me to explore if it was possible (it is), and if it's actually
usable/beneficial (the jury is still out on that one).

Reply all
Reply to author
Forward
0 new messages