The trouble with adopting a cat is that they always have kittens: hygiene for paradise and beyond

Eugene Burmako

Aug 10, 2013, 8:19:57 PM

to scala-l...@googlegroups.com

Hi folks,

Now that the tree manipulation problem is solved, another problem becomes noticeable - the problem of hygiene / referential transparency. To quote Simon from [1]:

You can write code like tq"""Enum[$className]""", but if you want to be really sure that things don't break if people define their own Enum class in scope, one better does: tq"""_root_.java.lang.Enum[$className]""".

Of course, this doesn't have to be like that. In most LISPs and in a lot of other macro-enabled programming languages, tree construction is hygienic in the sense that bindings in macro-generated code are established according to well-defined rules rather than by the HULK SMASH BIND principle. For example, in my candidacy exam write-up [2] I outlined such rules for Template Haskell, Nemerle and Racket, and of course there's a lot of other info to read on that matter on the web.

Therefore I propose that we now discuss what rules suit Scala macros and decide how to make it happen. I suggest we don't discuss whether these rules are going to be implementable or not - that's my problem - and focus solely on design.

1) The main goal to pursue is pretty clear: we want to write "import java.lang.Enum" at the definition site of the quasiquote mentioned above and then have this information carried to whatever macro expansion site the quasiquote ends up in.

2) And then there's the first tough choice. Do we preserve only top-level bindings, i.e. just "List" in "List(x, y)", or all sorts of bindings, i.e. "List", "x" and "y" in "List(x, y)"? The former represents a compromise between rigor and practicality, whereas the latter is rigorous, but quite inflexible. [3] provides a Nemerle vs Template Haskell discussion on this matter.

3) Sometimes we will want to break hygiene, so we need syntax for expressing the difference between hygienic and unhygienic names. Firstly, what should be the default? (I'd argue, hygiene). Secondly, how do we express the non-default? (I'd suggest introducing a special flavor of names that can then be spliced in identifier positions). Thirdly, do we provide special syntax for non-default? (In Template Haskell, they have/had %pi as a shortcut for $(dyn "pi"), but then we'll have to escape calls to methods named %).

4) How do we play nicely with 2.11, because with a very high probability we won't be ready by 2.11 code freeze (mid September)? Imho, the only thing to make sure here is aligning q"..." in paradise and q"..." in trunk. If we decide that hygiene should be the default, we should rename trunk's q/tq/cq/pq into something like uq/utq/ucq/upq to signify that they are unhygienic to make sure that we don't break people's code afterwards.
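
To make goal 1 concrete, here is a minimal sketch in plain Scala. It is a toy model, not the compiler's API: `ScopeSketch`, `Ident`, `quoteHygienic`, `quoteUnhygienic` and the two scope maps are hypothetical names invented for illustration. A scope is modelled as a map from simple names to fully qualified ones, and a hygienic quote remembers what each name meant where the quasiquote was written.

```scala
object ScopeSketch {
  // Toy scope: simple name -> fully qualified name (an assumption of this sketch).
  type Scope = Map[String, String]

  // An identifier inside a quoted tree. A hygienic quote fills in
  // boundAtDefinition; an unhygienic one leaves it empty.
  final case class Ident(name: String, boundAtDefinition: Option[String] = None)

  // Unhygienic quoting keeps only the bare name.
  def quoteUnhygienic(name: String): Ident = Ident(name)

  // Hygienic quoting looks the name up where the quasiquote is written.
  def quoteHygienic(name: String, definitionScope: Scope): Ident =
    Ident(name, definitionScope.get(name))

  // At the expansion site a remembered binding wins; only names that were
  // unbound at the definition site fall back to the expansion-site scope.
  def resolve(id: Ident, expansionScope: Scope): Option[String] =
    id.boundAtDefinition.orElse(expansionScope.get(id.name))

  // The macro author wrote `import java.lang.Enum`.
  val definitionSite: Scope = Map("Enum" -> "java.lang.Enum")

  // The macro user happens to define their own Enum.
  val expansionSite: Scope = Map("Enum" -> "user.Enum", "className" -> "user.Color")
}
```

With these definitions the unhygienic `Enum` resolves to the user's class, while the hygienic one keeps pointing at `java.lang.Enum`. Names that were never bound at the definition site, such as `className` here, still resolve at the expansion site.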

[1] https://groups.google.com/d/msg/scala-language/C7Pm6ab1sPs/-Sm4Pkz3O9EJ
[2] https://github.com/scalamacros/scalamacros.github.com/blob/master/paperstalks/2012-09-10-CandidacyExamPaper.pdf
[3] http://www.haskell.org/pipermail/template-haskell/2004-February/000250.html
[4] http://research.microsoft.com/~simonpj/tmp/notes2.ps

Cheers,
Eugene

martin odersky

Aug 11, 2013, 4:54:33 AM

to scala-l...@googlegroups.com

On Sun, Aug 11, 2013 at 2:19 AM, Eugene Burmako <xen...@gmail.com> wrote:

> Hi folks,
>
> Now that the tree manipulation problem is solved, another problem becomes noticeable - the problem of hygiene / referential transparency. To quote Simon from [1]:
>
> You can write code like tq"""Enum[$className]""", but if you want to be really sure that things don't break if people define their own Enum class in scope, one better does: tq"""_root_.java.lang.Enum[$className]""".

I just had an idea: If you want hygiene, why not write:

tq"""$Enum[$className]"""

?

Cheers

-- Martin

--
You received this message because you are subscribed to the Google Groups "scala-language" group.
To unsubscribe from this group and stop receiving emails from it, send an email to scala-languag...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Simon Ochsenreither

Aug 11, 2013, 5:36:38 AM

to scala-l...@googlegroups.com

> I just had an idea: If you want hygiene, why not write:
>
> tq"""$Enum[$className]"""
>
> ?

Yes, I had the exact same idea. It doesn't work. One basically loses the best features of quasi-quotes, and it's a lot _more_ verbose than copy-and-pasting it a few times around the code.

See other thread (more complete quote):

You can write code like ...

tq"""Enum[$className]"""

... but if you want to be really sure that things don't break if people define their own Enum class in scope, one better does:

tq"""_root_.java.lang.Enum[$className]"""

... but that gets pretty verbose and tedious, so one refactors it a bit:

val Enum = "_root_.java.lang.Enum"
tq"""$Enum[$className]"""

But this causes:

[error] exception during macro expansion:
[error] java.lang.AssertionError: assertion failed: "java.lang.Enum"
[error]     at scala.reflect.internal.Trees$AppliedTypeTree.<init>(Trees.scala:481)
[error]     at scala.reflect.internal.Trees$AppliedTypeTree$.apply(Trees.scala:478)
[error]     at scala.reflect.internal.Trees$AppliedTypeTree$.apply(Trees.scala:486)

Ouch. Of course, that's the right way to do it:

val Enum = Select(Select(Ident(newTermName("java")), newTermName("lang")), newTypeName("Enum"))

Now I suddenly have to care about TermNames vs. TypeNames again!
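
One way to keep the term/type distinction out of user code is a small helper that builds the selection chain from a dotted string. The sketch below is a toy model in plain Scala: the `Ident`, `Select`, `TermName` and `TypeName` case classes only mimic the shape of the reflection trees quoted above, and `typeRef` is a hypothetical helper, not something the reflection API is known to provide. It assumes every segment but the last names a package (a term) and the last one names a type.

```scala
object QualifiedNameSketch {
  // Toy stand-ins for the reflection API's names and trees (assumption).
  sealed trait Name { def value: String }
  final case class TermName(value: String) extends Name
  final case class TypeName(value: String) extends Name

  sealed trait Tree
  final case class Ident(name: Name) extends Tree
  final case class Select(qualifier: Tree, name: Name) extends Tree

  // Hypothetical helper: "java.lang.Enum" becomes
  // Select(Select(Ident(java), lang), Enum), with term names for the
  // package segments and a type name for the final segment.
  def typeRef(fullyQualified: String): Tree = {
    val segments = fullyQualified.split('.').toList
    require(segments.forall(_.nonEmpty), s"not a dotted name: $fullyQualified")
    segments match {
      case single :: Nil =>
        Ident(TypeName(single))
      case first :: rest =>
        // rest is non-empty here, so init and last are safe.
        val packages = rest.init.foldLeft[Tree](Ident(TermName(first))) {
          (qualifier, segment) => Select(qualifier, TermName(segment))
        }
        Select(packages, TypeName(rest.last))
      case Nil =>
        sys.error("unreachable: split never returns an empty array here")
    }
  }
}
```

Calling `QualifiedNameSketch.typeRef("java.lang.Enum")` yields the same nesting as the hand-written `val Enum` above, so the caller writes a plain string once and never touches term or type names directly.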

I think we have two issues here with the current situation:
1) The "easy" way is extremely fragile and one gets no warning about it.
2) Doing it "correctly" (fully-qualified names, no duplication) is a lot more verbose and invalidates one of the key features of quasi-quotes: Not having to mess with TermNames vs. TypeNames.
I think it is pretty unfortunate that the wrong way is easy, readable and concise while the right way is more complicated and verbose.

I really wonder why it breaks down so fast ... if string interpolators can figure out what's a type and what's a term in ...

tq"""_root_.java.lang.Enum[$className]"""

I really wonder why they can't do it for ...

val Enum = "_root_.java.lang.Enum"
tq"""$Enum[$className]"""

That feels really inconsistent.

Simon Ochsenreither

Aug 11, 2013, 5:40:15 AM

to scala-l...@googlegroups.com

Imho there should be a way to tell macros to enforce that types/terms resolve to the same thing at both declaration site and use site.
Or even better, only consider types available at the declaration site by default and make people introduce use-site types explicitly.

Eugene Burmako

Aug 11, 2013, 5:46:56 AM

to scala-l...@googlegroups.com

The problem is that in your example the string gets silently promoted to Literal(Constant("...")), and then the compiler tries to apply a type argument to it, which doesn't make sense.

To be honest, I'm proud of Scala starting to enforce some tree invariants. In the good old days, this would probably crash much later, somewhere in-between mixin and erasure. Maybe in refchecks :)
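
The failure mode can be reproduced in a toy model. The sketch below is plain Scala and deliberately does not use the compiler's trees: `InvariantSketch`, `TypeIdent` and `spliceString` are hypothetical names, and the `require` call stands in for whatever check the real `AppliedTypeTree` constructor performs (the stack trace above suggests an assertion on its first argument).

```scala
object InvariantSketch {
  sealed trait Tree { def isType: Boolean }

  // What a spliced String turns into in this model: a constant, i.e. a term.
  final case class Literal(value: String) extends Tree {
    def isType = false
  }

  // A reference to a type by name.
  final case class TypeIdent(name: String) extends Tree {
    def isType = true
  }

  // Tree invariant: type arguments can only be applied to a type tree.
  final case class AppliedTypeTree(tpt: Tree, args: List[Tree]) extends Tree {
    require(tpt.isType, s"cannot apply type arguments to $tpt")
    def isType = true
  }

  // Models the silent promotion of a string to a literal tree.
  def spliceString(s: String): Tree = Literal(s)
}
```

Building `AppliedTypeTree(spliceString("java.lang.Enum"), ...)` fails at construction time, which is the early crash described above, whereas `AppliedTypeTree(TypeIdent("Enum"), ...)` goes through.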

So back to the problem we were discussing. In a sense, quasiquotes do exactly what you requested, but I think we should still disallow splicing strings into positions where names are expected to avoid confusion.

***

As for Martin's suggestion, I think he meant that when we write:

import java.lang.Enum
q"$Enum"

Quasiquotes should: a) allow that, b) remember the lexical context for names introduced that way.

This is a very cute shortcut - concise and not requiring any special syntax. My only concern here is that in that case we make hygiene optional, because people (especially the newcomers) will sometimes forget to put dollars in front of those names.

Eugene Burmako

Aug 11, 2013, 5:50:07 AM

to scala-l...@googlegroups.com

Yes, I agree with your last suggestion here. That's what I meant by proposing to turn hygiene on by default.

You could read up more on how people model that in my candidacy write-up that I linked above (sections II-B and III, and also papers from References).

On Sunday, 11 August 2013, Simon Ochsenreither <simon.och...@gmail.com> wrote:

> Imho there should be a way to tell macros to enforce that types/terms resolve to the same thing at both declaration site and use site.
> Or even better, only consider types available at the declaration site by default and make people introduce use-site types explicitly.

martin odersky

Aug 11, 2013, 6:39:49 AM

to scala-l...@googlegroups.com

On Sun, Aug 11, 2013 at 11:46 AM, Eugene Burmako <eugene....@epfl.ch> wrote:

> The problem is that in your example the string gets silently promoted to Literal(Constant("...")), and then the compiler tries to apply a type argument to it, which doesn't make sense.
>
> To be honest, I'm proud of Scala starting to enforce some tree invariants. In the good old days, this would probably crash much later, somewhere in-between mixin and erasure. Maybe in refchecks :)
>
> So back to the problem we were discussing. In a sense, quasiquotes do exactly what you requested, but I think we should still disallow splicing strings into positions where names are expected to avoid confusion.
>
> ***
>
> As for Martin's suggestion, I think he meant that when we write:
>
> import java.lang.Enum
> q"$Enum"
>
> Quasiquotes should: a) allow that, b) remember the lexical context for names introduced that way.

Yes, exactly. Sorry if I was too obscure about it.

What do quasiquotes currently do with it? I don't see a reason why they would not allow it, but am not sure whether they retain the lexical context.

> This is a very cute shortcut - concise and not requiring any special syntax. My only concern here is that in that case we make hygiene optional, because people (especially the newcomers) will sometimes forget to put dollars in front of those names.

Yes, that's the tradeoff. But I like it because it's super simple.

Cheers

- Martin

Eugene Burmako

Aug 11, 2013, 6:46:13 AM

to scala-l...@googlegroups.com

In essence, quasiquotes require splicees to be either trees or liftables (i.e. convertible to trees via a type class). (There are more rules, but they are irrelevant to this discussion.)

Therefore, since Enum is neither (and, in fact, Enum is not even a term), it will be an error.
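
The "tree or liftable" rule can be pictured as an ordinary type class. The following is a minimal sketch in plain Scala, not the actual quasiquote implementation: the `Liftable` trait here, its instances and the `splice` function are assumptions made for illustration, and the trees are toy case classes.

```scala
object LiftableSketch {
  // Toy trees (assumption, not the compiler's).
  sealed trait Tree
  final case class Literal(value: Any) extends Tree
  final case class Block(stats: List[Tree]) extends Tree

  // Type class: evidence that a value of type T can be turned into a tree.
  trait Liftable[T] { def lift(value: T): Tree }

  object Liftable {
    implicit val liftInt: Liftable[Int] = new Liftable[Int] {
      def lift(value: Int): Tree = Literal(value)
    }
    implicit val liftString: Liftable[String] = new Liftable[String] {
      def lift(value: String): Tree = Literal(value)
    }
    // Trees are trivially liftable: they lift to themselves.
    implicit def liftTree[T <: Tree]: Liftable[T] = new Liftable[T] {
      def lift(value: T): Tree = value
    }
  }

  // A splice is accepted only if evidence exists for the spliced type.
  def splice[T](value: T)(implicit ev: Liftable[T]): Tree = ev.lift(value)
}
```

In this model `splice(42)` and `splice(Block(Nil))` compile, while `splice(new Object)` is rejected at compile time because no `Liftable[Object]` instance is in scope. A bare type name such as `Enum` is not a value at all, so it cannot even be passed as an argument.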

Retaining lexical context in an untyped setting is a challenging problem whose implementation strategy is not yet clear, so we didn't tackle it in 2.11.0-M4.

It is simple indeed, but even simpler is silently treating all identifiers as hygienic ones :)


Simon Ochsenreither

Aug 11, 2013, 8:06:05 AM

to scala-l...@googlegroups.com

> Yes, I agree with your last suggestion here. That's what I meant by proposing to turn hygiene on by default.
>
> You could read up more on how people model that in my candidacy write-up that I linked above (sections II-B and III, and also papers from References).

Thanks, will do!

Lex Spoon

Aug 11, 2013, 8:26:54 PM

to scala-l...@googlegroups.com

On Sun, Aug 11, 2013 at 5:46 AM, Eugene Burmako <eugene....@epfl.ch> wrote:
> My only concern here is that in that case we make hygiene optional, because
> people (especially the newcomers) will sometimes forget to put dollars in
> front of those names.

+1

Inside quasi-quotes, plain identifiers should lexically bind in the
scope at the site of the quasi-quote. If you instead want to do the
lookup at the place the macro was invoked, then you should have to use
some non-default syntax that makes it obvious you are doing something
risky.

I have not followed this issue closely, but if both Enum and $Enum are
going to be supported, surely it's best to make Enum be hygienic and
$Enum be the wild and wooly one.

Lex

Eugene Burmako

Sep 8, 2013, 6:07:27 AM

to scala-l...@googlegroups.com

Also see discussion at https://groups.google.com/forum/#!topic/scala-internals/ImJUXXMTTIM

Eugene Burmako

Sep 8, 2013, 6:23:07 AM

to scala-l...@googlegroups.com

A brief summary of how Clojure works, using information from http://www.mail-archive.com/clo...@googlegroups.com/msg15293.html and from google.

Their implementation is almost trivial in the sense that referential transparency is maintained by simple global name lookups during quasiquoting, and hygiene is left to the programmer (though with a nice syntax for gensym). Unfortunately, it also has several peculiarities that prevent lexical scoping from working consistently in quasiquotes. Situations like 1b and 1c show that full-fledged hygiene isn't just a tool for theorists writing crazy macros, but that it actually makes a lot of important things simpler to express and reason about.

1) Referentially transparent

1a) Preserves global bindings

bar=> (ns bar)
nil
bar=> `(let (list 100))
(clojure.core/let (clojure.core/list 100))

1b) Disregards local bindings

user=> (defmacro m [x] (let [list 1] `(list ~x)))
#'user/m
user=> (m 2)
(2)

1c) Doesn’t distinguish binders from bindees

bar=> `(let [foo 100] (foo 100))
(clojure.core/let [bar/foo 100] (bar/foo 100))

When used in a macro expansion, this won’t compile!

2) Not hygienic

2a) Locally introduced names (`’foo) can be inadvertently captured
2b) There’s no way to specify the target of the capture
2c) Has convenient gensym that works within the same quasiquote

user=> (defmacro m [x] `(let [x# 1] x#))
#'user/m
user=> (m 2)
1
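
The behaviour summarised above can be approximated in a few lines of Scala. This is a rough model written for this thread, not Clojure's reader: `syntaxQuote`, the `globals` table and the generated-name format are assumptions, and a quasiquote is flattened to a list of symbols. Each symbol is qualified by a global lookup, local bindings are never consulted, and symbols ending in `#` become fresh names that are shared only within one quasiquote.

```scala
object SyntaxQuoteSketch {
  // Hypothetical table of globally visible names.
  val globals: Map[String, String] = Map(
    "let"  -> "clojure.core/let",
    "list" -> "clojure.core/list"
  )

  private var gensymCounter = 0

  // Resolves the symbols of ONE quasiquote written in namespace currentNs.
  def syntaxQuote(symbols: List[String], currentNs: String): List[String] = {
    // Gensyms are remembered per quasiquote, so x# means the same fresh
    // name everywhere inside it and a different one in the next quasiquote.
    val gensyms = scala.collection.mutable.Map.empty[String, String]
    symbols.map { sym =>
      if (sym.endsWith("#"))
        gensyms.getOrElseUpdate(sym, {
          gensymCounter += 1
          s"${sym.dropRight(1)}__${gensymCounter}__auto__"
        })
      else
        // Global lookup only: binders and bindees are treated alike, and
        // unknown names are qualified with the current namespace.
        globals.getOrElse(sym, s"$currentNs/$sym")
    }
  }
}
```

For instance, `syntaxQuote(List("let", "foo", "foo"), "bar")` qualifies both occurrences of `foo` as `bar/foo`, including the one in binder position, which mirrors the problem in 1c.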

Eugene Burmako

Sep 8, 2013, 6:07:06 PM

to scala-l...@googlegroups.com

This is what I've inferred about Racket's approach to hygiene after some googling and chatting in blogs. I've yet to read the doc on Racket's syntax model [1], so my interpretation of how hygiene in Racket works might be somewhat incorrect.

1) Referential transparency. When building a quasiquote, Racket resolves every identifier in the enclosing lexical context and remembers that information for future use. When expanding a macro that contains multiple quasiquotes thrown together, Racket first resolves bindings within the expansion, and then if some identifiers can't be bound, it uses the remembered bindings. This guarantees referential transparency and also handles cases 1b and 1c in the Clojure summary above.

2) Hygiene. Before expanding a macro, Racket puts a unique mark on all the trees constituting the macro application. Then, immediately after the macro returns a resulting tree, Racket again puts the same mark on all the trees in the expansion. If a tree has two marks of the same kind, they cancel each other, so everything that comes from the original macro application will become unmarked, whereas newly generated trees will become marked. Afterwards, when determining bindings, a bindee and a binder are allowed to connect only if their marks are the same.

This algorithm, proposed by Dybvig et al. in [2], is very elegant in the sense that:
a) It naturally separates trees that come from the original program and trees that are synthetic.
b) It doesn't require quasiquotes to be hygiene-aware, as it works equally well independently of the facility that's used to construct trees.
c) It provides a facility for fine-grained control over lexical scope. For example, by manually putting a mark corresponding to the original program onto a synthetic tree, a programmer tells the macro system to bind the tree to something declared in the original program. Compare this with gensym, which tells the macro system to bind the generated identifier to whatever is nearest in scope, regardless of which transcription step that binding comes from.
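
The marking scheme from point 2 can be sketched on a tiny expression language. This is a toy model in plain Scala based only on the description above, so it may differ from what Racket or the paper actually do. All names (`MarkSketch`, `Id`, `expandHygienic`, the `addOne` macro) are hypothetical. An identifier is a name plus a set of marks, marking twice cancels, and a reference connects to a binder only when both name and marks agree.

```scala
object MarkSketch {
  // An identifier is a name plus the set of marks currently attached to it.
  final case class Id(name: String, marks: Set[Int] = Set.empty) {
    // Applying the same mark twice cancels it out.
    def mark(m: Int): Id =
      if (marks.contains(m)) copy(marks = marks - m) else copy(marks = marks + m)
  }

  sealed trait Tree
  final case class Num(value: Int) extends Tree
  final case class Ref(id: Id) extends Tree
  final case class Add(lhs: Tree, rhs: Tree) extends Tree
  final case class Let(binder: Id, rhs: Tree, body: Tree) extends Tree

  // Puts mark m on every identifier in the tree.
  def markAll(tree: Tree, m: Int): Tree = tree match {
    case Num(_)             => tree
    case Ref(id)            => Ref(id.mark(m))
    case Add(l, r)          => Add(markAll(l, m), markAll(r, m))
    case Let(b, rhs, body)  => Let(b.mark(m), markAll(rhs, m), markAll(body, m))
  }

  type Macro = Tree => Tree

  private var lastMark = 0
  private def freshMark(): Int = { lastMark += 1; lastMark }

  // Mark the argument, run the macro, mark the result with the same mark:
  // trees from the call site end up unmarked, synthetic trees stay marked.
  def expandHygienic(mac: Macro, arg: Tree): Tree = {
    val m = freshMark()
    markAll(mac(markAll(arg, m)), m)
  }

  // No marks at all: names bind to whatever is nearest in scope.
  def expandNaive(mac: Macro, arg: Tree): Tree = mac(arg)

  // Binder and bindee connect only if name AND marks are equal,
  // which is what keying the environment by Id achieves.
  def eval(tree: Tree, env: Map[Id, Int] = Map.empty): Int = tree match {
    case Num(v)            => v
    case Ref(id)           => env.getOrElse(id, sys.error(s"unbound identifier: $id"))
    case Add(l, r)         => eval(l, env) + eval(r, env)
    case Let(b, rhs, body) => eval(body, env + (b -> eval(rhs, env)))
  }

  // Hypothetical macro: m(x) expands to `let tmp = 1 in x + tmp`.
  val addOne: Macro = x => Let(Id("tmp"), Num(1), Add(x, Ref(Id("tmp"))))

  // User program: `let tmp = 10 in m(tmp)`.
  def program(expand: (Macro, Tree) => Tree): Tree =
    Let(Id("tmp"), Num(10), expand(addOne, Ref(Id("tmp"))))
}
```

Evaluating `program(expandNaive)` gives 2, because the macro's own `tmp` captures the user's reference. Evaluating `program(expandHygienic)` gives 11, because the user's `tmp` is unmarked after expansion and therefore cannot connect to the marked binder the macro introduced.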

[1] http://docs.racket-lang.org/reference/syntax-model.html
[2] http://www.cs.indiana.edu/~dyb/pubs/LaSC-5-4-pp295-326.pdf
