It is well known that the top level is hopeless.[1] At this point, it
seems unsalvageably ad-hoc. I’ve made peace with this, since I can
largely avoid touching the top level — Racket provides a nice,
predictable, decidedly non-hopeless module system. Unfortunately, I am
forced to face the top level’s existence, and therefore its unruliness,
every time I must modify my code to support the REPL.
Making Hackett support the Racket REPL has grown increasingly difficult.
The expansion model differs in subtle ways, but they are significant
enough to regularly cause problems, especially as Hackett grows ever
more complex. Generally, the contortions required have been localized
and tractable, but a recent change I’ve attempted — which works fine in
a module — falls flat on its face in the top level in a way I have not
managed to work around.
I started wondering: why does the REPL need to be synonymous with the
top level? Sure, there are things about the REPL that benefit from
weaker guarantees than those of a module, but as far as I can tell,
these are restricted to redefinition/shadowing and delayed binding of
variables to permit mutual recursion. Seeing as this delayed binding
doesn’t work in Hackett, anyway, since Hackett is typed, the lawlessness
of the top level seems like an unnecessarily expensive price to pay for
the little I get in return.
I had a conversation with Spencer Florence earlier today on this topic,
and he described a hypothetical alternative semantics for the REPL that
I find compelling. Given a sequence of REPL interactions:

    e_0 e_1 e_2 ... e_n

These could be interpreted the same way as a series of lexically nested
submodules:

    (module repl_0 racket
      e_0
      (module* repl_1 #f
        e_1
        (module* repl_2 #f
          e_2
          ...
          (module* repl_n #f
            e_n) ... )))

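As a concrete instance of this shape, here is a small sketch with three made-up interactions: (define x 1), (define y (+ x 1)), and (+ x y). The definitions, the name result, and the outer file module are assumptions for illustration only. It relies on module* with #f making the enclosing module's bindings visible in the submodule, and it assumes that visibility carries through two levels of nesting.

```racket
#lang racket/base

;; Each interaction becomes one more level of lexical nesting.
(module repl_0 racket
  (define x 1)                      ; interaction 0
  (module* repl_1 #f
    (define y (+ x 1))              ; interaction 1 sees x from repl_0
    (module* repl_2 #f
      (provide result)
      ;; interaction 2 sees x from repl_0 and y from repl_1;
      ;; `result` is a hypothetical name used to observe the value
      (define result (+ x y)))))

;; Instantiate the innermost "interaction" and look at what it computed.
(require (submod "." repl_0 repl_1 repl_2))
result ; => 3
```

Nothing is exported from repl_0 or repl_1 here; the inner submodules reach x and y purely through lexical scope, which is the property the nested translation depends on.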
This has some pleasant properties. Here are the ones that immediately
come to mind:
  1. Expansion is completely predictable and consistent with expansion
     in a module (assuming #%module-begin doesn’t do anything strange
     with module* submodules), and it provides a meaning for REPL
     interactions in terms of the much more well-defined semantics of
     Racket’s module system.
  2. Multiple definitions of the same identifier within the same REPL
     interaction (that is, combined with a begin form) are an error,
     just as they would be in a module. For multiple definitions of
     the same identifier in separate interactions, the later
     definition shadows the earlier one.
  3. Free variables in an expression are a syntax error, the same way
     they are within a module.
  4. Mutually recursive definitions can be defined in the same REPL
     interaction by defining both at once using begin. This is
     potentially less convenient than the behavior of the top level,
     but I think it is a small price to pay for predictable expansion
     and error reporting.
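To make the fourth point concrete, here is a minimal sketch of what a single interaction containing two mutually recursive definitions would look like once wrapped in a module. The function names my-even? and my-odd? are invented for the example, and the provide form is only there so the result can be observed from outside.

```racket
#lang racket/base

;; One REPL interaction: both definitions arrive together inside `begin`,
;; so each can refer to the other, exactly as in any module body.
(module repl_0 racket
  (provide (all-defined-out))
  (begin
    (define (my-even? n)
      (if (zero? n) #t (my-odd? (sub1 n))))
    (define (my-odd? n)
      (if (zero? n) #f (my-even? (sub1 n))))))

(require 'repl_0)
(my-even? 10) ; => #t
```

Entering my-even? alone in one interaction and my-odd? in the next would instead fail at expansion time under this model, since my-odd? would be a free variable in the first module.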
There are a few downsides. The obvious one is the change in mutually
recursive definition behavior, and another is that set! on
previously-declared identifiers no longer works. However, this could be
solved by implicitly rewriting definitions to set!-transformers, so it’s
possible to restore this behavior without losing a semantics defined
exclusively in terms of modules.
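One way the set!-transformer rewriting might look is sketched below. This is an assumption about how such a rewrite could be done, not something taken from an existing implementation: the name counter-cell, the box-based storage, and the two hand-written modules are all hypothetical. The earlier interaction stores its value in a private box and exports counter as a macro built with make-set!-transformer, so a later module can both read it and set! it.

```racket
#lang racket/base

;; Stands in for an interaction like (define counter 0) after rewriting.
(module repl_0 racket/base
  (require (for-syntax racket/base))
  (provide counter)
  ;; The value lives in a private box...
  (define counter-cell (box 0))
  ;; ...and `counter` is a macro that reads or writes that box.
  (define-syntax counter
    (make-set!-transformer
     (lambda (stx)
       (syntax-case stx (set!)
         [(set! _ new-value) #'(set-box! counter-cell new-value)]
         [(_ arg ...) #'((unbox counter-cell) arg ...)]
         [_ (identifier? stx) #'(unbox counter-cell)])))))

;; Stands in for a later interaction that mutates `counter`.
(module repl_1 racket/base
  (require (submod ".." repl_0))
  (provide counter-after)
  (set! counter (+ counter 41))   ; expands to set-box! on the private box
  (define counter-after counter))

(require 'repl_1)
counter-after ; => 41
```

A plain set! on a variable imported from another module is rejected by Racket, which is why the indirection through a macro is needed at all.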
Unfortunately, as far as I can tell, it isn’t possible to implement this
behavior in terms of the concepts Racket already provides, since one
cannot add a submodule to a module once it has already been declared.
However, since module* submodules are morally just modules declared at
the end of the current module, Spencer also pointed out that it’s
possible to perform a mostly-equivalent translation using ordinary
modules instead of nested submodules:

    (module repl_0 racket
      (provide (all-defined-out))
      e_0)
    (module repl_1 racket
      (require 'repl_0)
      (provide (all-defined-out))
      e_1)
    (module repl_2 racket
      (require 'repl_0 'repl_1)
      (provide (all-defined-out))
      e_2)
    ...
    (module repl_n racket
      (require 'repl_0 'repl_1 'repl_2 ... 'repl_n-1)
      (provide (all-defined-out))
      e_n)

However, this translation is imperfect: it does not preserve bindings
imported using require. Still, at least it is possible to implement.
Here is a short program that implements a naïve version of the above
translation:
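
    #lang racket

    (require syntax/datum
             syntax/stx
             syntax/strip-context)

    ;; names of the modules generated so far, most recent first
    (define modules-so-far '())

    (let ([old-eval (current-eval)])
      (current-eval
       (lambda (stx)
         (with-datum ([new-mod (gensym)]
                      [[mod-path ...] modules-so-far])
           ;; declare a fresh module that requires every earlier one
           (old-eval
            (quasidatum
             (module new-mod racket
               (require 'mod-path ...)
               (provide (all-defined-out))
               ;; stx-cdr drops the head of the form handed to the
               ;; handler (for REPL input, #%top-interaction)
               (undatum (strip-context (stx-cdr stx))))))
           ;; instantiate the module so the interaction actually runs
           (dynamic-require (datum 'new-mod) #f)
           (set! modules-so-far
                 (cons (datum new-mod) modules-so-far))))))
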
I imagine such a naïve approach has additional downsides in speed and
memory overhead, but the core technique does seem sound.
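The same translation can also be exercised without installing a current-eval handler, which makes it easier to poke at from a script. The sketch below is a variation under several assumptions: the helpers interact! and lookup and the variable repl-namespace are hypothetical names, the generated modules use racket/base rather than racket to keep instantiation light, and module names are numbered instead of generated with gensym.

```racket
#lang racket/base

;; A private namespace that holds the generated modules.
(define repl-namespace (make-base-namespace))

;; Names of the modules declared so far, most recent first.
(define modules-so-far '())

;; Wrap one interaction in a fresh module that requires all earlier ones,
;; declare it, and instantiate it. Returns the new module's name.
(define (interact! form)
  (define name
    (string->symbol (format "repl_~a" (length modules-so-far))))
  (parameterize ([current-namespace repl-namespace])
    (eval `(module ,name racket/base
             (require ,@(for/list ([m (in-list modules-so-far)])
                          `(quote ,m)))
             (provide (all-defined-out))
             ,form))
    (dynamic-require `(quote ,name) #f))
  (set! modules-so-far (cons name modules-so-far))
  name)

;; Fetch the value of an identifier exported by a generated module.
(define (lookup name id)
  (parameterize ([current-namespace repl-namespace])
    (dynamic-require `(quote ,name) id)))

(define m0 (interact! '(define x 1)))
(define m1 (interact! '(define y (+ x 41))))
(lookup m1 'y) ; => 42
```

Like the handler version, this inherits the limitation that bindings an interaction brings in with require are not passed along to later interactions.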
To be clear, I do NOT propose replacing the Racket REPL with one that
uses this new model, but I would like to have the ability to define a
REPL for my own language that takes this approach (or a similar one).
So I ask a few questions:
  1. Would people find such an approach to making the REPL less
     hopeless valuable?
  2. Are there any glaring flaws in the above description that would
     make it either difficult or undesirable to implement?
  3. If not, what is the right way to implement this? Could it be done
     in a way that is efficient, but shares machinery with existing
     module expansion?
In any case, I may try a fancier version of the naïve approach for
Hackett, to experiment with the idea. I would be happy to have a REPL
for my language without having to be so careful to please the top
level.
Alexis
[1]: https://gist.github.com/samth/3083053