Proposal: New Reader Macro #s{ ... }

Greg

Oct 11, 2009, 5:38:14 PM
to clo...@googlegroups.com
Dear Clojure group and developers,

In a recent discussion on #clojure it was pointed out that another
language, newLISP, has an excellent feature that would be neat to
adopt in Clojure: its special text delimiters {} and [text][/text].
It uses these delimiters to specify verbatim text (i.e. what's between
the delimiters is *exactly* what the string is, including the newlines).

This feature makes it very easy to include various bits of text in
source code, such as example code or HTML, and it simplifies writing
regular expressions by avoiding the need for some escapes.

For example (newLISP code):

(replace {"quoted" text} my-str {"quoted" string})

vs

(replace "\"quoted\" text" my-str "\"quoted\" string")

Because this has numerous advantages, we discussed how such a
construct could be brought to Clojure, where both the {} and []
characters are already reserved (for maps and vectors).

The following candidates were considered and rejected for various
reasons:

<> ; rejected because it conflicts with expressions like (< x 1)
#"" ; rejected because it already denotes a regex
#[] ; rejected because it implies some sort of data structure, like sets (#{})
[t][/t] ; rejected because [] already denotes vectors

Finally we agreed that #s{ ... } would be a good fit, as it matches
Clojure's existing syntax and its tendency to use the sharp (#) to
signify a shorthand for something. On IRC, 'Chousuke' pointed out that
this construct could make it easier to write docstrings that include
sample code for Clojure's functions, but of course there are many
other uses for such a construct. (I should note that it exists in many
other languages as well, even bash, but I referenced newLISP because
it's also a Lisp and has a particularly elegant implementation.)
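To make the proposal concrete, here is a minimal sketch of how a reader could scan a #s{ ... } literal. Python is used only for illustration, and read_verbatim is a hypothetical name, not anything that exists in Clojure or newLISP. The sketch assumes no nesting and no escapes, so the first } ends the literal and everything before it is taken verbatim, newlines included.

```python
# Hypothetical sketch of the proposed #s{ ... } verbatim-string reader.
# Assumption: no nesting and no escapes, so the literal ends at the first "}".

def read_verbatim(src: str, pos: int = 0) -> tuple[str, int]:
    """Read a #s{...} literal starting at src[pos].

    Returns the verbatim contents and the index just past the closing brace.
    """
    opener = "#s{"
    if not src.startswith(opener, pos):
        raise SyntaxError(f"expected {opener!r} at position {pos}")
    start = pos + len(opener)
    end = src.find("}", start)  # the first closing brace ends the literal
    if end == -1:
        raise SyntaxError("unterminated #s{ literal")
    # Quotes, backslashes and newlines are kept exactly as written.
    return src[start:end], end + 1
```

With this rule, read_verbatim('#s{"quoted" text}') returns ('"quoted" text', 17), but a literal holding Clojure code such as (let [m {:a 1}] m) is cut off at its first }, leaving '(let [m {:a 1' as the contents.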

Any and all input is welcome on this proposal!

Kind regards and thanks in advance for taking this into consideration,
Greg (irc: itistoday)

James Reeves

Oct 11, 2009, 7:31:50 PM
to Clojure
What if you need to use braces? It seems to me that any syntax for
representing long strings needs a terminator that is unlikely to occur
within the string itself. For example, Python uses """, and XML CDATA
uses ]]>, both of which are character sequences unlikely to turn up in
a string. By contrast, an ending brace } is not rare enough to be used
as a terminator, IMO.

- James

Greg

Oct 11, 2009, 8:03:41 PM
to James Reeves, clo...@googlegroups.com
True, and it is why newLISP uses two delimiters for this purpose, {}
and [text][/text]. The latter might not work well in clojure though
because of its use of arrays.

The former is used for short strings, often for the purpose of regular
expressions or to avoid quoting quotes, and the latter is used when
you have a lot of text, or your text contains braces.

Your point is valid, though: say, for example, you want to include
Clojure sample code in the docs for a function; you are very likely to
run across braces. For this reason it may be useful to provide two
different delimiters: a shorthand for short strings (or strings that
don't include braces), and a longhand for when you need it.

For the shorthand I still think #s{} is a good fit for Clojure, and
for the longhand something else could be used. Personally I like the
triple-quote you suggested because it's effective and aesthetically
pleasing.

Or you can just go with """ and scrap the #s{}, either way though I
think this would make an exciting addition to Clojure.

Hopefully the devs watch this list? Or should I also suggest this on
clojure-dev?

- Greg

Richard Newman

Oct 11, 2009, 8:11:28 PM
to clo...@googlegroups.com
> True, and it is why newLISP uses two delimiters for this purpose, {}
> and [text][/text]. The latter might not work well in clojure though
> because of its use of arrays.

Reader macros have full access to the text stream, so it would be
straightforward to define a Perlish heredoc syntax for big literals,
e.g.

#_SOMETEXT
foo bar
...

SOMETEXT
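A heredoc reader of the kind described above can be sketched in a few lines. This is a hypothetical Python illustration, not a proposed implementation: read_heredoc is an invented name, and the marker is a parameter because #_ in the example is only a placeholder (in Clojure, #_ is already the discard reader macro). The rest of the marker line names the terminator, and the literal is every line up to a line consisting solely of that terminator.

```python
# Hypothetical sketch of a Perl-style heredoc reader; the marker syntax is
# an assumption, passed in rather than fixed.

def read_heredoc(src: str, marker: str = "#_") -> str:
    """Return the text between the marker line and the terminator line."""
    if not src.startswith(marker):
        raise SyntaxError(f"expected heredoc marker {marker!r}")
    first_newline = src.find("\n")
    if first_newline == -1:
        raise SyntaxError("heredoc marker must be followed by a newline")
    # Whatever follows the marker on its line is the terminator word.
    terminator = src[len(marker):first_newline].strip()
    if not terminator:
        raise SyntaxError("missing heredoc terminator name")
    body = []
    for line in src[first_newline + 1:].split("\n"):
        if line == terminator:  # a line holding only the terminator ends it
            return "\n".join(body)
        body.append(line)
    raise SyntaxError(f"heredoc terminator {terminator!r} not found")
```

Because the author picks the terminator, any text can be included without escaping, as long as it contains no line equal to the chosen word.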

I don't really see much point in this proposal as a whole, though --
almost every syntax will require escaping *something*, whether it be
quotes or curly braces -- so don't view this as me championing anything.

I certainly consider triple-quotes to be nicer than #s{}, which is
(IMO) visually repellent.

The reader macro approach has no significant advantage over triple
quotes, because either way involves a change to Clojure's Java code.
Indeed, it might be impossible to escape } within the reader macro
version, which defeats the point.

Greg

Oct 11, 2009, 9:01:49 PM
to clo...@googlegroups.com
> Reader macros have full access to the text stream, so it would be
> straightforward to define a Perlish heredoc syntax for big literals,
> e.g.

Sure, heredocs could be done; they have the nice advantage of being
flexible, guaranteeing that you can include any text. The only issue
is that, as in Perl, they're pretty ugly. I wouldn't object to them,
though, as long as there was another alternative shorthand for expressing
strings. The purpose of which would be primarily to make writing
regular expressions easier, but also to deal with strings that have
quotes in time, which are found quite frequently when dealing with UI
stuff and user-interaction.

> almost every syntax will require escaping *something*

Yes, but the chance that you'll find *both* a quote and a brace in a
short string is very low. Hence the utility of an alternative
shorthand for short strings.

And having a verbose alternative (heredocs, triple-quote, whatever)
will bullet-proof you against other situations.

- Greg

Greg

Oct 11, 2009, 9:04:00 PM
to clo...@googlegroups.com
On Oct 11, 2009, at 9:01 PM, Greg wrote:
> The purpose of which would be primarily to make writing regular
> expressions easier, but also to deal with strings that have quotes
> in time

Sorry, that should read "quotes in them". ;-)

Mark Derricutt

Oct 11, 2009, 10:18:31 PM
to clo...@googlegroups.com
{"quoted" text} is also a map, so conflicts with existing syntax.

Mark Derricutt

Oct 11, 2009, 10:19:36 PM
to clo...@googlegroups.com
Ignore that, replied before reading fully to see #s{...} :(

John Harrop

Oct 11, 2009, 11:04:18 PM
to clo...@googlegroups.com
How about borrowing a page from LaTeX? It has \verb+text+, which can use any desired delimiter character. My thought is to have something like $+.....+ turn whatever lies between the character following the $ (here, +) and the next occurrence of that character into a literal string. A way to escape the chosen character is still desirable, though in most cases the character could be chosen so that escaping isn't needed. Perhaps the sequence $+ recurring inside the string would translate to a literal + (and any other $x digraph would stay a literal $x digraph). $ seems like a good choice because it suggests the letter s, as in string; it is not already overloaded in Clojure for some following symbols (unlike #); and it very rarely occurs in text with a non-whitespace character after it. (This post is an obvious exception, but even so, I've used only $+ and $x, so using a third non-whitespace character after the $ would allow quoting this post verbatim in the same manner.)
The sole limitation on the following character would be that it not be whitespace. A stand-alone $ would be treated the same as it is now, as would a $ with no leading whitespace, so referencing nested Java classes would not be broken; multi-character symbols starting with $ would be broken, but I expect there is as yet little or no existing code with such symbols.
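One possible reading of the $x proposal, as a Python sketch. read_dollar_string is a hypothetical name, and the digraph handling (consume $ plus the following character as a pair, turning $ followed by the delimiter into the delimiter and leaving any other pair untouched) is an interpretation of the rules above, not something the post pins down.

```python
# Hypothetical sketch of the $x...x literal; the pairwise digraph rule is an
# assumed interpretation of the proposal.

def read_dollar_string(src: str, pos: int = 0) -> tuple[str, int]:
    """Read a $x...x literal starting at src[pos]; return (text, end_index)."""
    # The opener is "$" plus one non-whitespace delimiter character.
    if pos + 1 >= len(src) or src[pos] != "$" or src[pos + 1].isspace():
        raise SyntaxError("not a $x string literal")
    delim = src[pos + 1]
    out = []
    i = pos + 2
    while i < len(src):
        ch = src[i]
        if ch == "$" and i + 1 < len(src):
            # "$" + delimiter is an escaped delimiter; any other "$x"
            # digraph is copied through unchanged.
            nxt = src[i + 1]
            out.append(delim if nxt == delim else ch + nxt)
            i += 2
        elif ch == delim:
            return "".join(out), i + 1
        else:
            out.append(ch)
            i += 1
    raise SyntaxError(f"unterminated ${delim} string literal")
```

Under these assumptions, "$+something $+ else+" reads as "something + else" and "$#===$@===#" reads as "===$@===". One consequence of the pairwise rule is that the contents cannot end with a bare $ immediately before the closing delimiter.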

Danny Woods

Oct 12, 2009, 4:40:43 AM
to clo...@googlegroups.com
Perl and Ruby do something similar with regular expressions: in Perl,
the character following 'm' or 's' becomes the delimiter for that
expression (Ruby's %r works the same way), making 'm/\/some\/path/'
identical to 'm!/some/path!'. The delimiter can be 'smart' as well,
where the closing delimiter depends on the opening one (in the case of
brackets, parentheses and braces), leading to expressions like
'm(/some/path)'. It would be really
handy if Clojure had a similar #s!!, #s(), #s//, etc. facility.
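The "smart" delimiter rule can be sketched as follows, assuming a hypothetical #s prefix followed by any delimiter character. For (, [, { and < the closing delimiter is the matching bracket and nested pairs are counted (Perl does the same for its bracketing delimiters); any other character closes itself. read_quote_like and PAIRS are illustrative names only.

```python
# Hypothetical sketch of choose-your-own delimiters with "smart" brackets.

# Closing delimiter for each bracketing opener; any other character closes itself.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}

def read_quote_like(src: str, pos: int = 0) -> tuple[str, int]:
    """Read #s<open>...<close>; return (contents, index after the closer)."""
    prefix = "#s"
    if not src.startswith(prefix, pos) or pos + len(prefix) >= len(src):
        raise SyntaxError("expected #s followed by a delimiter")
    open_ch = src[pos + len(prefix)]
    close_ch = PAIRS.get(open_ch, open_ch)
    depth = 1
    start = i = pos + len(prefix) + 1
    while i < len(src):
        ch = src[i]
        if ch == close_ch:
            depth -= 1
            if depth == 0:
                return src[start:i], i + 1
        elif ch == open_ch and open_ch != close_ch:
            depth += 1  # only bracketing delimiters nest
        i += 1
    raise SyntaxError(f"unterminated #s{open_ch} literal")
```

For example, "#s!/some/path!" and "#s(/some/path)" both read as /some/path, and "#s(f (g x))" reads as f (g x) because the inner parentheses balance.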

DanL

Oct 12, 2009, 11:20:18 AM
to Clojure
I'm aware that this has already been discussed, and until recently I
was content with Clojure not exposing the readtable, because of the
inherent modularity problems. However, a nice approach to readtables
for CL was recently released:

http://trittweiler.blogspot.com/2009/10/ann-named-readtables-09.html

I have experimented with it for some time and, in my opinion, it is a
viable solution: even though Clojure already has plenty of reader
macros, such a facility could be the solution to this kind of problem.

Sure, one can like reader macros, and one can detest them, but in
general I tend to prefer freedom to restriction, and nobody would be
forced to use such a facility.

Regards,

dhl

Greg

Oct 12, 2009, 2:43:56 PM
to clo...@googlegroups.com
I'm not sure I follow what John is saying (some of the terminology is
unfamiliar to me, what the heck's a digraph? :p), but is he suggesting
that $+ not be followed by a whitespace to count as a string? If so, I
think that's undesirable as that is 1) confusing, and 2) limits the
sorts of strings that you can create with it when the point of this
addition would be to expand your freedom when creating strings. If
I've misunderstood however, please let me know.

> It would be really handy if Clojure had a similar #s!!, #s(), #s//,
> etc. facility.

I just vote for #s{ .. } because braces are visually appealing to me
and seem to imply a "grouping" around something. There's a nice
symmetry about them (unlike #s//), they're not associated with the
ever-pervasive parens (which are all over the place in Lisp, and
should be used to connote function calls or lists, and that's it), and
they aren't arbitrarily angry (looking at you, #s!!). ;-)

- Greg

Greg

Oct 12, 2009, 2:46:06 PM
to clo...@googlegroups.com
Forgive me, but I'm unfamiliar with the readtable, are you just
referring to where this syntax might be implemented? Or are you
suggesting an alternative syntax?

- Greg

DanL

Oct 12, 2009, 2:59:03 PM
to Clojure
On 12 Oct., 20:46, Greg <g...@kinostudios.com> wrote:

> Forgive me, but I'm unfamiliar with the readtable, are you just  
> referring to where this syntax might be implemented? Or are you  
> suggesting an alternative syntax?

I'm suggesting that the readtable might be exposed to user changes as
it is in CL, but only if a means of managing different readtables is
provided, hence the link pointing to named-readtables, which allows
just that (naming and merging readtables using an API resembling the
package API). This would allow introduction of new reader macros by
the user.

As I said, there already was some discussion on that topic, but I have
played around with named-readtables since the new release and came to
the conclusion that something akin to them might be nice in Clojure,
too. Things like heredocs are not something that every user needs
every day, so they would, IMHO, be a good candidate for a user-defined
reader macro.

Regards,

dhl

Michael Wood

Oct 12, 2009, 3:29:53 PM
to clo...@googlegroups.com
2009/10/12 Greg <gr...@kinostudios.com>:

>
> I'm not sure I follow what John is saying (some of the terminology is
> unfamiliar to me, what the heck's a digraph? :p), but is he suggesting

My understanding of a digraph is basically two characters that
together represent one real character. For example, if you type ^Ka'
in Vim, it turns the digraph consisting of a lowercase a and an
apostrophe into á (a acute).

> that $+ not be followed by a whitespace to count as a string? If so, I
> think that's undesirable as that is 1) confusing, and 2) limits the
> sorts of strings that you can create with it when the point of this
> addition would be to expand your freedom when creating strings. If
> I've misunderstood however, please let me know.

No, he's saying you could use $+ some string + or $% some string % or
$* some string * etc. to mean the same as " some string ", so the
thing immediately following the $ sign must not be a space, but could
be basically anything else.

The bit where he talks about digraphs was to do with being able to
quote the + character when you were using $+ and + for the delimiters.
He was proposing you use $+ to mean a + where it occurs in the middle
of the string. So:
$+something $+ else+ would be equivalent to "something + else". Of
course in this case you could just use $@something + else@ instead.

Where a $x is found in the middle of the string and the x is not a
delimiter (e.g. $#===$@===#) you would just leave it as-is (i.e. it
would be equivalent to "===$@===".)

--
Michael Wood <esio...@gmail.com>

Greg

Oct 12, 2009, 4:21:57 PM
to clo...@googlegroups.com
On Oct 12, 2009, at 3:29 PM, Michael Wood wrote:

> No, he's saying you could use $+ some string + or $% some string % or
> $* some string * etc. to mean the same as " some string ", so the
> thing immediately following the $ sign must not be a space, but could
> be basically anything else.


Ah, so it's pretty similar to what was suggested originally, except
with $ instead of # and a variable delimiter?

I think that sounds good. :-)

My only concern, and perhaps John could elaborate on this because he
touched on it, would be how are Java nested classes protected from
this? Wouldn't it interfere with them? Or will the delimiters be
restricted to non-alphanumeric characters?

- Greg

Greg

Oct 12, 2009, 4:25:46 PM
to clo...@googlegroups.com
Ah, I think I understand... but if you're saying that this should be
something each user can do individually, I think that's a bad idea:
it would lead to confusing-looking code and inconsistencies.

There should be an officially sanctioned .. whatever this thing is
called. That way, whenever you look at someone else's code and see it,
you know what it is, and people can update their code parsers/syntax
highlighters appropriately.

- Greg

André Ferreira

Oct 12, 2009, 5:15:41 PM
to Clojure
Lua's solution to long strings is having multiple levels of
delimiters. [[ is the opening delimiter of level 0 and is closed
by ]]; [=[ is of level 1, closed by ]=]; and so on.
No matter what string you are trying to represent, it can always be
represented literally this way, by choosing a level whose closing
delimiter does not appear in the string.
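Lua's long brackets can be illustrated with a small Python sketch: to write any text as a literal, pick the lowest level n whose closing sequence ]=...=] (n equals signs) cannot be mistaken for the end of the literal. The helper names are hypothetical, and the sketch ignores Lua's rule that a newline immediately after the opening bracket is dropped.

```python
import re

# Hypothetical helpers illustrating Lua-style leveled long brackets.

def lua_long_string(text: str) -> str:
    """Wrap text in the lowest-level long bracket that can hold it."""
    level = 0
    # Checking text + "]" also rejects text that ends in "]" plus n "=",
    # which would otherwise merge with the closing bracket.
    while "]" + "=" * level + "]" in text + "]":
        level += 1
    eq = "=" * level
    return f"[{eq}[{text}]{eq}]"

def read_long_string(src: str) -> str:
    """Parse a long bracket literal back into its contents."""
    m = re.match(r"\[(=*)\[", src)
    if not m:
        raise SyntaxError("not a long bracket literal")
    close = "]" + m.group(1) + "]"  # closer must have the same level
    end = src.find(close, m.end())
    if end == -1:
        raise SyntaxError("unterminated long bracket literal")
    return src[m.end():end]
```

For instance, "plain" becomes [[plain]], while "a ]] b" needs level 1 and becomes [=[a ]] b]=].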

John Harrop

Oct 12, 2009, 6:39:55 PM
to clo...@googlegroups.com
On Mon, Oct 12, 2009 at 4:21 PM, Greg <gr...@kinostudios.com> wrote:
> My only concern, and perhaps John could elaborate on this because he
> touched on it, would be how are Java nested classes protected from
> this? Wouldn't it interfere with them? Or will the delimiters be
> restricted to non-alphanumeric characters?

I thought I mentioned that in my previous post. Oh well.

The string literal starting $x would have to START with $x. A symbol with an internal $ wouldn't behave differently from now. Nor would the symbol $ (just a dollar sign by itself) other than that you would have to always have whitespace between it and any following parenthesis or other Clojure delimiter. Identifiers of multiple characters starting with a $ are the major backward-compatibility hit, but I think these are (at present) extremely uncommon to nonexistent.

One way to add backward compatibility would be to make the $x thing only work inside of (with-verbatim-strings ...) or something similar.

Actually, this suggests a way to allow general user-modifiable read tables without the problems everyone's worried about: custom read tables can be defined and associated with a var, but to use them, the block of code to which the custom read table should apply would be wrapped in some macro like (with-read-table rt ...). Presumably these could be nested, and as the reader left each of these forms the read table would revert to the previous one, using a stack of some sort with the default Clojure read table at the bottom. Custom read tables could be in libraries, e.g. (ns foo (require [package.mystuff :as my])) ... (with-read-table my/fancyrt ...), but would not affect anything unless desired (e.g. (ns foo (use package.mystuff)) won't cause fancyrt to magically take effect without further action).

This would require (with-read-table ...) to be a super-special form though, actually evaluated at read time. This would preclude programmatically generating a working (with-read-table ...) form in a macro, though not USING a custom read table in a macro where the read table to use was fixed rather than known only at macroexpansion time (i.e. was not computed from, or supplied directly via, one of the arguments); in that case, just use (with-read-table ...) around the macro or inside it.

The (with-read-table ...) should probably become (do ...) rather than be stripped out by the reader, so (if (cond) (with-read-table my-perl-interpreter $#*$&%&# @#*@&$&$%)) will behave as expected, evaluating both perl expressions in the then branch rather than treating them as the then and else branches as would occur if the (with-read-table ...) disappeared entirely from around the wrapped code, or treating the first as a function to call with the second as argument, as would occur if just the with-read-table and the symbol name disappeared. There should also be a function with-read-table in core that errors, to catch cases of macros outputting the thing.

The big trickiness here is that the reader will need access to certain vars; that is, (defreadtable name ...) would have to be evaluated by the reader, as would (ns ...), (require ...) and the like, in order to discover any read tables. I think multi-pass compilation becomes necessary: code is read up to the first (with-read-table ...), then the reader pauses while all completely read top-level forms are macroexpanded and evaluated. If the symbol is still not bound at this point, it's an error. This adds the restriction that (with-read-table foo ...) only works properly if foo is imported or defined in a previous top-level form, so (let [x 10] (defreadtable foo ...) (with-read-table foo ...)) won't work, and redefinitions within a top-level form won't take effect until the next one.

More sophisticated would be for a (with-read-table ...) to be read initially into a kind of placeholder token as a string. When it had to be evaluated, the token would be lazily read and macroexpanded, and hopefully by then the read table's name would have been bound. This would even allow one-off local read tables, e.g. (let [x (read-table ...)] (with-read-table x ...)), or even (with-read-table (read-table ...) ...). It would also be in the spirit of Clojure, I think, to use laziness in this way. An additional benefit is that (read-table ...) can be a normal macro or even a normal function call, rather than a special form, that returns a suitable data structure.

Allowing (with-read-table ...) to be generated by macros is feasible if we go a step further and make (with-read-table ...) a nearly-normal special form with the property that a) it cannot redefine the meaning of parentheses, so everything after the second subform until the matching ) can be read as a string, unambiguously, and the form after reading becomes (with-read-table table-expr "one-big-string-literal"), b) the evaluator of macro output performs the same transformation, and c) when the form is evaluated, it recursively invokes the read-macroexpand-eval chain on its second argument in a context where the first argument is pushed into the readtable stack.

Disallowing changing the behavior of whitespace and ( and ) in custom reader macros seems like a good idea anyway, especially if the access is to be through a lispy syntax resembling (with-read-table foo ...) which requires the meaning of that last ) and immediately-preceding whitespace, and of internal parenthesis pairs, to remain normal.

On the other hand, doing so would stop the verbatim-string-literal thing being added in that way...
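The stack of read tables described above (with-read-table pushing a table, leaving the form popping it, the default table at the bottom) can be modeled with a context manager. This Python sketch covers only the scoping and lookup; it does not attempt the read-time evaluation or the lazy re-reading discussed in the post. ReaderTables, with_read_table and the handler names are hypothetical, and rejecting tables that redefine parentheses or whitespace follows the restriction suggested above.

```python
from contextlib import contextmanager

# Hypothetical model of nested read tables; handlers are plain strings here.

class ReaderTables:
    """A stack of read tables with the default table at the bottom.

    Each table maps a character to a handler; lookups search from the
    innermost (most recently pushed) table outward.
    """

    def __init__(self, default: dict):
        self._stack = [default]

    def lookup(self, ch: str):
        for table in reversed(self._stack):
            if ch in table:
                return table[ch]
        return None  # not a macro character in any active table

    @contextmanager
    def with_read_table(self, table: dict):
        # Custom tables may not redefine ( ) or whitespace.
        forbidden = [c for c in table if c in "()" or c.isspace()]
        if forbidden:
            raise ValueError(f"custom read tables may not redefine {forbidden!r}")
        self._stack.append(table)
        try:
            yield self
        finally:
            self._stack.pop()  # leaving the form restores the previous table
```

Nested with blocks then behave like nested (with-read-table ...) forms: an inner table shadows an outer one only until its block ends.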

Jeff Valk

Oct 12, 2009, 10:49:28 PM
to clo...@googlegroups.com
If keeping the syntax simple and singular is the goal, the reader macro could use the proposed #s{ ... } form while supporting balanced braces in the verbatim text: the literal would close when the number of closing braces matches the number of opening braces. Strings containing unmatched braces still wouldn't work, but this would eliminate the most obvious counterargument: the inclusion of (well-formed) source code.

#s{...} ; sure!
#s{{...}{{{{}}}} ... } ; just fine!
#s{.{..} ; no can do...

Not nearly as flexible as some other ideas mentioned, but deserving of mention.
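The counting rule above is easy to state as code. In this hypothetical Python sketch (read_balanced_verbatim is an invented name), the brace in #s{ starts the depth at 1, every inner { increments it, every } decrements it, and the literal ends when the depth returns to zero.

```python
# Hypothetical sketch of #s{ ... } with balanced inner braces.

def read_balanced_verbatim(src: str, pos: int = 0) -> tuple[str, int]:
    """Read #s{...}, allowing balanced inner braces; return (text, end)."""
    opener = "#s{"
    if not src.startswith(opener, pos):
        raise SyntaxError(f"expected {opener!r}")
    depth = 1  # counts the opening brace of #s{ itself
    start = i = pos + len(opener)
    while i < len(src):
        if src[i] == "{":
            depth += 1
        elif src[i] == "}":
            depth -= 1
            if depth == 0:
                return src[start:i], i + 1
        i += 1
    raise SyntaxError("unterminated #s{ literal (unbalanced braces?)")
```

With this rule the three examples behave as described, and source code such as (let [m {:a 1}] m) survives intact because its braces are balanced.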

- Jeff

B Smith-Mannschott

Oct 13, 2009, 3:29:16 AM
to clo...@googlegroups.com
On Mon, Oct 12, 2009 at 01:31, James Reeves <weave...@googlemail.com> wrote:
>
> What if you need to use braces? It seems to me that any syntax for
> representing long strings needs a terminator that is unlikely to occur
> within the string itself. For example, Python uses """, and XML CDATA
> uses ]]>, both of which are character sequences unlikely to turn up in
> a string. By contrast, an ending brace } is not rare enough to be used
> as a terminator, IMO.

Yes, please. I'd like to pile on here with a few ideas and questions

(0) I'm not feeling the itch for verbatim strings, seeing as clojure
already does multi-line literals (with escaping) and has special
syntax for regex patterns.

(1) If it's all the same to everyone else, just use python's
triple-quotes. In practice, they work well enough, but if we're using
them for verbatim strings there still wouldn't be a way to embed, e.g.
a code fragment demonstrating use of a raw string in a raw string.
Is this so terrible?

(2) Perlish/Sedish choose-your-own-quote has always struck me as an
ugly hack. More important though are worries about making tooling more
complicated:

How much more complex would this make, e.g. correct syntax
highlighting in emacs, in eclipse?
What about tools that wish to read clojure code as data but are not
themselves clojure? We wouldn't be doing them any favors by
unnecessarily complicating the surface syntax.

(3) Perhaps something akin to Lua's approach (mentioned previously in
this thread) could address the limitations of (1) without the ugliness
of (2).

Just my 2c
Ben

Laurent PETIT

Oct 13, 2009, 4:04:08 AM
to clo...@googlegroups.com

2009/10/13 B Smith-Mannschott <bsmit...@gmail.com>

> (1) If it's all the same to everyone else, just use python's
> triple-quotes. In practice, they work well enough, but if we're using
> them for verbatim strings there still wouldn't be a way to embed, e.g.
> a code fragment demonstrating use of a raw string in a raw string.
> Is this so terrible?

What I really miss most is the equivalent of Python's triple-quotes, which let you avoid escaping double quotes.

Cheers,

--
Laurent
 

Greg

Oct 13, 2009, 1:54:28 PM
to clo...@googlegroups.com
> (0) I'm not feeling the itch for verbatim strings, seeing as clojure
> already does multi-line literals (with escaping) and has special
> syntax for regex patterns.

That's unfortunate, as I think it would be an important addition to
the language. I didn't know Clojure's #"" doesn't require extra
escaping (after some searching I was happy to discover this was
because Chouser did a great job lobbying for it: http://tr.im/BFCT),
and that's great news, but verbatim strings are still useful.

If you're writing a website and want to put some HTML into your
source, for example, it's essential; the same goes for any other
document you have a reason to put into your source, for example if
you're writing a command-line tool and want to provide detailed
information in your --help output. While you may not feel the itch
now, can you see how there are situations where it would be useful?

> (1) If it's all the same to everyone else, just use python's
> triple-quotes. In practice, they work well enough, but if we're using
> them for verbatim strings there still wouldn't be a way to embed, e.g.
> a code fragment demonstrating use of a raw string in a raw string.
> Is this so terrible?


No, I personally have no real issue with triple-quotes; they're the
same number of characters as #s{.

The only thing some might object to is that they're not "clojurey".
#s{ .. } fits with Clojure's other syntax structures.

> (2) Perlish/Sedish choose-your-own-quote has always struck me as an
> ugly hack. More important though are worries about making tooling more
> complicated:
>
> How much more complex would this make, e.g. correct syntax
> highlighting in emacs, in eclipse?


I think you make an excellent point here, and I agree with you: the
Perl "make your own thing up" path, while it has some advantages, will
lead to everyone having their own special convention (leading to
confusion when reading code), and it will make life difficult for
source highlighters and the people who write them.

- Greg