I'm wondering if there's interest in a patch that opens up the read
table to manipulation from Clojure code, specifically for creating
reader macros a la Common Lisp. If there's interest, and Rich isn't
already doing this, I'd be delighted to put together a prototype that
could be a basis for further discussion.
An immediate application for such a patch would be writing a Clojure
subset of Edi Weitz's CL-INTERPOL. This would give us a lot of clarity
in the discussion on how/when/if Clojure should add explicit reader
syntax for regexps in the base distribution, because folks could
actually see the proposed reader syntax in action and try out corner
cases for themselves, live.
For reference, CL-INTERPOL provides facilities (largely inspired by
Perl/Ruby interpolating strings) such as:

1. Strings with arbitrary quote delimiters:

    CL-USER> #?|Bob said, "Welcome to the jungle!"|
    "Bob said, \"Welcome to the jungle!\""

2. Interpolation of variables and arbitrary Lisp forms within strings:

    CL-USER> (let ((x "Sun")) #?"The $(x) Dog")
    "The Sun Dog"
    CL-USER> (mapcar (lambda (x) #?"$(x) * 16 = $((* x 16))") '(1 2 3 4 5))
    ("1 * 16 = 16" "2 * 16 = 32" "3 * 16 = 48" "4 * 16 = 64" "5 * 16 = 80")

3. Special syntax for quoting regexps:

    CL-USER> #?"\\" ; Not a regex quote
    "\\"
    CL-USER> #?/\\/ ; This is a regex quote
    "\\\\"
    CL-USER> #?rx/ ^ \s+ \w+ \s* [=:] \s* ( \S .* ) / ; Extended regex quote
    "^\\s+\\w+\\s*[=:]\\s*(\\S.*)"

Having used CL-INTERPOL and CL-PPCRE for text mangling in CL (and a
lot of Perl and Ruby), I find it quite hard to do serious regex
slinging without convenient reader syntax. If there's interest, I'll
get cracking on a prototype reader macro system.
Thoughts?
Darshan
I believe Clojure can provide reader macros that do not affect
interoperability at all. Reader macros cause problems when a
read-table modification performed in one source file affects other
source files that did not expect such a modification. I'd suggest
preventing such read-table damage as follows:
1. Read-tables become immutable, thus the base Clojure read-table (or
any other read-table for that matter) can never be modified.
2. Read-tables are strictly scoped to their source files. A source file
that modifies the read-table replaces only its own immutable
read-table with the new version; it cannot affect the reader in other
source files.
For instance, if there's a string interpolation library
string-interpolate.clj that provides reader macros, and I want to use
it in another file, text-mangler.clj, I'd use it like this:
    (load "string-interpolate.clj")
    (string-interpolate/enable-interpolate-syntax)
Where the call to enable-interpolate-syntax is necessary to modify the
read-table for text-mangler.clj, and the read-table modification lasts
only until the end of text-mangler.clj.
This could be implemented as follows:
1. clojure.lang.Compiler maintains a thread-local Stack of
read-tables. Each call to Compiler.load(Reader) pushes the default
Clojure read-table onto the stack. Each call to LispReader.read from
Compiler passes in the top read-table in the thread-local stack for
the reader to use (if the reader is not provided a read-table
explicitly, or is passed a null read-table, it uses the base
read-table).
2. Any code that attempts to modify the read-table replaces the top
read-table in the current thread's read-table stack with the new
read-table. If there is no read-table in the stack, i.e. no
Compiler.load() call is currently active, attempts to modify the
read-table trigger an assert.
3. When Compiler.load(Reader) returns, it pops the read-table stack.
4. A source file may reset the read-table for the remainder of the
file to the default read-table at any time by calling a function,
maybe (reset-read-table), or just (set-read-table nil).
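
To make steps 1-4 concrete, here is a minimal Java sketch of the thread-local stack. Everything in it is an assumption for illustration: ReadTable, ReadTableStack and their methods are made-up names, not existing clojure.lang classes, and a reader macro is reduced to a function from String to Object. Note that current() here is the reader's view, so it falls back to the base table when no load is active.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical immutable read-table: dispatch character -> reader macro.
final class ReadTable {
    // Stand-in for the base Clojure read-table (no custom macros).
    static final ReadTable BASE = new ReadTable(Collections.emptyMap());

    private final Map<Character, Function<String, Object>> macros;

    private ReadTable(Map<Character, Function<String, Object>> macros) {
        this.macros = Collections.unmodifiableMap(macros);
    }

    // Returns a NEW table with one more dispatch macro; this table is unchanged.
    ReadTable with(char dispatchChar, Function<String, Object> macro) {
        Map<Character, Function<String, Object>> copy = new HashMap<>(macros);
        copy.put(dispatchChar, macro);
        return new ReadTable(copy);
    }

    boolean handles(char dispatchChar) {
        return macros.containsKey(dispatchChar);
    }
}

// Hypothetical per-thread stack that a Compiler.load(Reader) could maintain.
final class ReadTableStack {
    private static final ThreadLocal<Deque<ReadTable>> STACK =
        ThreadLocal.withInitial(ArrayDeque::new);

    // Step 1: entering a load pushes the default read-table.
    static void enterLoad() {
        STACK.get().push(ReadTable.BASE);
    }

    // Step 3: leaving a load pops its read-table.
    static void exitLoad() {
        STACK.get().pop();
    }

    // The table the reader should use: top of the stack, or the base
    // table when no load is active on this thread.
    static ReadTable current() {
        ReadTable top = STACK.get().peek();
        return top != null ? top : ReadTable.BASE;
    }

    // Steps 2 and 4: replace the top table; null resets to the base table.
    // Outside of a load there is nothing to replace, so this fails.
    static void setReadTable(ReadTable table) {
        Deque<ReadTable> stack = STACK.get();
        if (stack.isEmpty()) {
            throw new IllegalStateException("no load in progress");
        }
        stack.pop();
        stack.push(table != null ? table : ReadTable.BASE);
    }
}
```

Because each load pushes a fresh base table and pops it on return, a nested load never sees the caller's modifications, and the caller gets its own table back once the nested load finishes.
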
Such a system of read-tables cannot cause action-at-a-distance
read-table damage, and would still allow for extremely useful reader
macros. But we could go a little further and make them even more
handy:
* Provide functions to access the current read-table and the default
read-table: possibly (clojure/current-read-table) and
(clojure/default-read-table). current-read-table would only return
non-nil in the context of a Compiler.load(), but its value can be
remembered in a def for future use: (def x (current-read-table)).
* Allow passing in a read-table to (clojure/load) and (clojure/read)
from Clojure code. (load) and (read) will always default to the base
Clojure read-table, but the default can be overridden by explicitly
passing in a different read-table. This allows loading configuration
files with a read-table that provides reader-goodies for end-user
convenience:
    (load ".configure-myapp.clj" (my-read-table))
Other potential issues:
* Two different libraries may use the same dispatch characters by
default, but this can be solved easily if the libraries allow the
client code to specify the dispatch characters they should attach
their reader macros to. For instance:
    (enable-interpolate-syntax \I) ; Uses #I"" for interpolating strings.
If the libraries do not allow choosing the dispatch character, they
cannot be used together in the same source file. That is easily fixed,
though (presuming you have the source, or don't mind re-modifying the
read-table after the library has done its thing), and that little
inconvenience is preferable to not having any reader macro
support. :-)
* A library may use a dispatch character that Clojure subsequently
assigns a meaning to. Again, this is not a problem, because old code
that uses the library will not by definition be using the new reader
syntax, so that code will not be affected by the library taking over
that dispatch character, and new code can tell the library to use a
different character. Clojure could also reserve any dispatch
characters it likes for future use - perhaps even reserve all
#<punctuation>.
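
The first issue above (two libraries wanting the same dispatch character) can be sketched in a few lines of Java. This is only an illustration under my own assumptions: a read-table is modelled as an immutable map from dispatch character to a macro name, and enableInterpolateSyntax / enableRegexSyntax are made-up stand-ins for library entry points that accept the character to bind.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model: read-table = immutable map from dispatch character
// to the name of the reader macro bound to it.
final class DispatchCharDemo {
    static final Map<Character, String> BASE = Collections.emptyMap();

    // Returns a new table with c bound to macro; the input table is not
    // modified. An existing binding for c is replaced (last one wins).
    static Map<Character, String> bind(Map<Character, String> table, char c, String macro) {
        Map<Character, String> copy = new HashMap<>(table);
        copy.put(c, macro);
        return Collections.unmodifiableMap(copy);
    }

    // Made-up library entry points that let the caller pick the character.
    static Map<Character, String> enableInterpolateSyntax(Map<Character, String> table, char c) {
        return bind(table, c, "interpolate");
    }

    static Map<Character, String> enableRegexSyntax(Map<Character, String> table, char c) {
        return bind(table, c, "regex");
    }
}
```

If both libraries are enabled on '?', the second binding simply replaces the first in the derived table. Enabling interpolation on 'I' and regexes on '?' keeps both, and BASE stays empty either way.
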
----
As you note, reader macros provide a lot of power, and while most code
will never need reader macros, some code can benefit powerfully from
them. It is possible to fake reader macros in very roundabout ways
(provide custom definitions of load and read that check characters
against a dispatch table, and call the underlying read for characters
not in the custom dispatch table), but life would be a lot easier if
the language supports them directly.
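
For reference, the roundabout approach looks roughly like the Java sketch below. It is a toy under stated assumptions: Cursor is a made-up character source over a String, a "read" function is just a Function from Cursor to Object, and wrap is a hypothetical helper, not anything in Clojure's LispReader.

```java
import java.util.Map;
import java.util.function.Function;

final class FakeReaderMacros {
    // Minimal character source with lookahead (hypothetical helper).
    static final class Cursor {
        final String text;
        int pos = 0;

        Cursor(String text) {
            this.text = text;
        }

        // Character at pos + offset, or -1 when past the end of the text.
        int peek(int offset) {
            int i = pos + offset;
            return i < text.length() ? text.charAt(i) : -1;
        }

        // Consumes and returns everything that is left.
        String rest() {
            String r = text.substring(pos);
            pos = text.length();
            return r;
        }
    }

    // Wraps baseRead: input starting with "#c" is handed to the custom
    // macro registered for c (after consuming both characters); anything
    // else is passed to baseRead with nothing consumed.
    static Function<Cursor, Object> wrap(
            Map<Character, Function<Cursor, Object>> dispatch,
            Function<Cursor, Object> baseRead) {
        return in -> {
            if (in.peek(0) == '#' && in.peek(1) >= 0) {
                Function<Cursor, Object> macro = dispatch.get((char) in.peek(1));
                if (macro != null) {
                    in.pos += 2; // skip '#' and the dispatch character
                    return macro.apply(in);
                }
            }
            return baseRead.apply(in);
        };
    }
}
```

Even in this toy form, the wrapper has to duplicate the reader's lookahead and would need its own load as well, which is the inconvenience that direct language support would remove.
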
What do you think of this proposal? If there's no interest in (safe)
reader macros at all, I'll drop this discussion, but if there is
interest, I'd be delighted to put together a prototype.
Thanks,
Darshan