Raw strings


Mark Engelberg

Mar 17, 2013, 5:57:07 PM
to clojure
Python has a notation for "raw strings", i.e., strings where you don't need to worry about escape sequences because all the contents are taken verbatim.

This has come up several times on the Clojure group, but usually the conversation fades away without any particular resolution.  I want to resurrect this thread for two reasons:

1. I'm working on a project right now where the lack of raw strings is killing me.

2. Given the new work on edn and clojure.tools.reader, I'm hoping that the reader technology is now at a point where adding raw strings would be a trivial endeavor, thus tipping the value:complexity ratio of adding raw strings in favor of doing it.

WHY IT MATTERS

One of Clojure's value propositions is that it is good for making DSLs.  The reality is that Clojure is mostly only good for writing DSLs that use Clojure's syntax.  Its facilities for creating DSLs that use a novel syntax are quite limited.  If you want to come up with a new syntax, your DSL will need to be constructed and passed around as a string.  When possible, we tend to prefer DSLs built on Clojure data structures (e.g. ClojureQL) rather than string-based DSLs (e.g., SQL), but sometimes a string-based DSL is exactly what you need to achieve the desired clarity.  When you do, the lack of raw strings becomes a real hindrance -- the need to constantly use escape characters can deeply interfere with the readability of your DSL, especially if your DSL also requires its own use of quotes and escape characters.

THE OTHER REASON I'M MENTIONING THIS NOW

This week, at Clojure/west, Matthew Flatt is going to be talking about the impressive facilities that Racket has for creating DSLs.  My hope is that people who attend the conference will come back sufficiently envious that they will be talking about how to accomplish similar things in Clojure.  Raw strings are a step in the right direction, making it easier to develop and use string-based DSLs.

vemv

Mar 17, 2013, 9:07:13 PM
to clo...@googlegroups.com
    Python has a notation for "raw strings"

Python also has multiple inheritance :) What I mean is that some features have dubious value, regardless of whether they made it into language X or Y.


    I'm working on a project right now where the lack of raw strings is killing me.

Do you absolutely need to interleave code and data? Can't you e.g. store the strings in separate files, and keep references to those strings in your code (i18n style)?

There's also the possibility of using selector-based transformations, a la Enlive.

    Given the new work on edn and clojure.tools.reader, I'm hoping that the reader technology is now at a point where adding raw strings would be a trivial endeavor, thus tipping the value:complexity ratio of adding raw strings in favor of doing it.

Reading a raw string stored in a file is already trivial :)
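A minimal sketch of the file-based approach, assuming the query lives in its own file. The temporary file, its name, and the SQL text are hypothetical; the `spit` call only stands in for a file written by hand in an editor, where no Clojure escaping applies.

```clojure
;; Hypothetical setup: simulate a query file that would normally be written by hand.
(def query-file
  (doto (java.io.File/createTempFile "find-notes" ".sql")
    (.deleteOnExit)))

;; Escaping is needed here only because this sketch creates the file from Clojure code.
(spit query-file "SELECT * FROM notes WHERE body LIKE '%\\n%' AND title = \"a\"")

;; Reading the text back requires no escaping at all.
(def query (slurp query-file))

(println query)
;; prints: SELECT * FROM notes WHERE body LIKE '%\n%' AND title = "a"
```

The cost of this approach is the indirection: the text of the query is no longer visible at the place where it is used.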


    One of Clojure's value propositions is that it is good for making DSLs.

I don't remember a statement like that from the clojure.org texts or Rich's talks. Admittedly, Paul Graham's essays (which are all about DSLs) brought me (and many) to the Clojure world but it's easy to perceive that those values aren't particularly promoted by the language's design or Rich's discourse.


    The reality is that Clojure is mostly only good for writing DSLs that use Clojure's syntax.

I believe that data is the ultimate DSL anyway - it allows one to express arbitrarily specific information in an extensible way. Interleaving code and data (as some languages' DSL facilities foster) is inherently complex (and limited).

    Sometimes a string-based DSL is exactly what you need to achieve the desired clarity.

Nothing stops you from using/writing a parser for an alternative/custom syntax. Just don't mix that DSL with Clojure (in the same way that one doesn't mix server code, HTML, JS, SQL, all in the same files).

Mark Engelberg

Mar 18, 2013, 1:38:43 AM
to clo...@googlegroups.com
On Sun, Mar 17, 2013 at 6:07 PM, vemv <ve...@vemv.net> wrote:
    Reading a raw string stored in a file is already trivial :)

I'm aware that one can store a raw string in a file.  But in many instances, this would be absurd.  For the kind of rapid interactive development we have in Clojure, we don't necessarily want every single SQL query in a separate file.  Similarly, nobody wants to store every regexp in a separate file.  Actually, regexps are a great example, because the Clojure reader handles them specially, so they need much less escaping than they would as ordinary strings.  I just want that kind of capability for my own library.
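To make the regexp comparison concrete, here is a small sketch of the same pattern written both ways. The pattern and the sample input are made up for illustration.

```clojure
;; Regex literal: the reader passes backslashes through to the regex engine.
(def literal-re #"\d+\s\w+")

;; The same pattern built from an ordinary string needs every backslash doubled.
(def string-re (re-pattern "\\d+\\s\\w+"))

;; Both compile to the same pattern text.
(println (= (str literal-re) (str string-re)))   ; prints: true
(println (re-find literal-re "order 42 apples")) ; prints: 42 apples
```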

    I believe that data is the ultimate DSL anyway - it allows one to express arbitrarily specific information in an extensible way. Interleaving code and data (as some languages' DSL facilities foster) is inherently complex (and limited).

Clojure's edn format is convenient for many things, but it has its own set of limitations and there's nothing "ultimate" about it.  I don't see anyone proposing that regexps should be replaced by a Clojure-data DSL.  Some DSLs are based on a really handy notation that has been in use by computer scientists and/or mathematicians for decades or longer.  Trying to shoe-horn these convenient notations into something that looks like Clojure is not always the right way to go.


Softaddicts

Mar 18, 2013, 2:23:57 AM
to clo...@googlegroups.com

I find raw string handling in XML simply ugly :) Of course, I do have a strong opinion
about how badly XML turned out, going from a typesetting tool to an industry-wide cancer :)
but CDATA sections basically break the XML structure. Visually speaking it's a mess,
though obviously that's relative: XML by itself is not especially pleasing to the human eye...

I fear that adding raw strings to edn will make edn slip toward the not-so-tasteful
batch of ugly tools. Edn is all about structured representation, which is
favored by its Lisp-like syntax. To avoid breaking this, raw strings would have to be
a type of their own, and not merely an escape hatch for avoiding edn syntax
from time to time.

Do you have a suggestion for how to represent raw strings? Something concrete
we could discuss?

I have written several internal DSLs here, and nothing pleased me more than doing
this using Clojure/edn syntax, getting away from clumsy tasks like parsing strings
and ending up with structural representations with ease.

Luc P.

Mark Engelberg

Mar 18, 2013, 3:13:58 AM
to clo...@googlegroups.com
On Sun, Mar 17, 2013 at 11:23 PM, Softaddicts <lprefo...@softaddicts.ca> wrote:

    I find raw string handling in XML simply ugly :)

Agreed.
 

    Do you have a suggestion for how to represent raw strings? Something concrete
    we could discuss?


Several languages use a sequence of three double quotes to mark the beginning and the end of a raw string.

In Clojure, I think raw strings would actually enhance the power of the edn format rather than detract, specifically the tagged elements.

Right now, edn supports things like:

#uuid "f81d4fae-7dec-11d0-a765-00a0c91e6bf6"

and

#inst "1985-04-12T23:20:50.52Z"

and you can create your own.  The built-ins happen to not utilize double-quotes or backslashes, so escaping is not a big deal, but if you want your tagged element to use a syntax where quotes and backslashes are natural, you're out of luck and you get a big ugly mess.  Think how much uglier the above inst element would look if - and : happened to be characters that required escaping.  All I want is the ability to do something like

#my/tag """45 \ "a" \ 50"""

if that's the most natural way to express the element rather than having to type

#my/tag "45 \\ \"a\" \\ 50"

which is already starting to look ugly, and looks far uglier the more characters are in there that require escaping.
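A minimal sketch of what the escaped form costs today, using `clojure.edn/read-string` with a custom `:readers` map. The tag `my/tag` comes from the example above; using `identity` as its handler is an assumption made only to show the string the handler receives.

```clojure
(require '[clojure.edn :as edn])

;; The 13-character value the tag handler should receive: 45 \ "a" \ 50
(def raw-value "45 \\ \"a\" \\ 50")

;; pr-str produces the escaped form that has to appear in an edn document today.
(def edn-text (str "#my/tag " (pr-str raw-value)))

;; identity stands in for a real handler (an assumption for illustration).
(def parsed (edn/read-string {:readers {'my/tag identity}} edn-text))

(println edn-text) ; prints: #my/tag "45 \\ \"a\" \\ 50"
(println parsed)   ; prints: 45 \ "a" \ 50
```

The handler sees 13 characters, while the author of the edn text has to type the escaped version with six extra backslashes.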

Marko Topolnik

Mar 18, 2013, 4:16:50 AM
to clo...@googlegroups.com
Chas Emerick is also one of those who voiced their desire to see such a feature in Clojure.

Softaddicts

Mar 18, 2013, 4:57:06 AM
to clo...@googlegroups.com
Looks fine to me. If it's an extension to the literal syntax it's also a narrow scope change.

The standard escape sequences make things harder even for display purposes.
We would also benefit from this, we are handling huge amounts of raw text
and just for debugging purposes we have to mentally handle the \ escape
character... berk.

Our unit tests also suffer from this escaping mode.

Using a literal syntax like this one would make life easier.

Luc P.


Dave Sann

Mar 18, 2013, 6:50:37 AM
to clo...@googlegroups.com
I'd welcome the ability to change the delimiter. I've found it very useful in the past for avoiding illegible or hard-to-read strings.

Maik Schünemann

Mar 18, 2013, 7:25:07 AM
to clo...@googlegroups.com

vemv

Mar 18, 2013, 7:35:31 AM
to clo...@googlegroups.com
    Nobody wants to store every regexp in a separate file.

That's because regexes are 'atomic': you don't place Clojure expressions in the middle of them. SQL or math are vastly different from that. As for SQL, it *is* common practice to keep queries as isolated as possible.

I have not questioned the validity of notations other than Clojure's. What I do consider mistaken though, is the concrete strategy of extending Clojure's reader.


    Some DSLs are based on a really handy notation that has been in use by computer scientists and/or mathematicians for decades or longer. Trying to shoe-horn these convenient notations into something that looks like Clojure is not always the right way to go.

In fact your proposal is closer to "shoehorning" than mine :) By separating languages you can use an arbitrary syntax for each, without risk of interference. For example, one can list named physics/math functions using their "DSLs" (including funny Unicode characters etc.), and generate first-class callable code from that (no strings!). Like:

    File - example.math:
    
    add (x, y):
        x + y
    
    Compile example.math to Clojure and that to bytecode, ahead-of-time or on-the-fly.
    
    File - consumer.clj:
    
    (require 'example)
    (example/add 3 2)
    
Assuming the math/physics functions were substantial enough, I think developing a tiny compiler like that would be worth the effort.

Marko Topolnik

Mar 18, 2013, 7:50:13 AM
to clo...@googlegroups.com
On Monday, March 18, 2013 12:35:31 PM UTC+1, vemv wrote:
    Nobody wants to store every regexp in a separate file.

    That's because regexes are 'atomic': you don't place Clojure expressions in the middle of them. SQL or math are vastly different from that. As for SQL, it *is* common practice to keep queries as isolated as possible.

Dynamic regex building is a standard technique. Unfortunately, once you leave the regex literal world, you are back to escaping everything.

I also have tons of SQL (HQL, actually but same thing) inline with my Clojure. It is an approach that never backfired on me. I also have much need for dynamic SQL generation. I don't consider myself special because of that.

-marko

vemv

Mar 18, 2013, 8:11:25 AM
to clo...@googlegroups.com
  • doc strings with examples could be more human readable

(defn ^{:examples '[(with-out-str (clipboard (doc distinct)))]}
    clipboard [x] ...)

If only a mechanism like this were used more widely... we'd get syntax coloring for free, and a facility for programmatically querying examples.
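A minimal sketch of querying such examples programmatically. The function `square` and its examples are hypothetical, and `:examples` is just an ordinary metadata key, not an established convention; `eval` resolves the symbols against the current namespace.

```clojure
;; `square` and its examples are hypothetical; :examples is an ordinary metadata key.
(defn ^{:examples '[(square 3) (square -2)]} square
  [x]
  (* x x))

;; The examples are plain data attached to the var.
(def examples (:examples (meta #'square)))

;; Evaluate each example form and pair it with its result.
(def results (mapv (fn [form] [form (eval form)]) examples))

(println results) ; prints: [[(square 3) 9] [(square -2) 4]]
```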

  • XML, JSON or SQL generation tests become filled with escaped quotes

Again, one can either keep references to files in the tests, or develop e.g. an XML-testing DSL.

Even if there were cases where there wasn't a clear alternative to raw strings, such a feature, once added, could be used gratuitously.
Clojure offers e.g. limited inheritance possibilities for the same reason. I would rather have to resort to Java occasionally to write "advanced" classes than use a language that makes it easy for users to build implementation hierarchies.

vemv

Mar 18, 2013, 8:29:09 AM
to clo...@googlegroups.com
I wasn't familiar with dynamic regex building. Sounds like a task that would be best performed separately from normal Clojure reading/evaluation (i.e. using a different file).

I don't think dynamic SQL construction would benefit from raw strings. String interpolation is a different story.

Stefan Kamphausen

Mar 18, 2013, 3:37:19 PM
to clo...@googlegroups.com
Hi,


On Monday, March 18, 2013 12:50:13 PM UTC+1, Marko Topolnik wrote:
    Dynamic regex building is a standard technique. Unfortunately, once you leave the regex literal world, you are back to escaping everything.

this works pretty well, at least better than I expected, e.g.:

user=> (def r1 #"(\s.)")
#'user/r1
user=> (def r2 #"([abc])")
#'user/r2
user=> (def r3 (re-pattern (str r1 "|" r2)))
#'user/r3
user=> r3
#"(\s.)|([abc])"
user=> (re-find r3 " x")
[" x" " x" nil]
user=> (re-find r3 "b")
["b" nil "b"]


Regards,
Stefan

Stefan Kamphausen

Mar 18, 2013, 3:41:04 PM
to clo...@googlegroups.com


On Monday, March 18, 2013 12:25:07 PM UTC+1, Maik Schünemann wrote:


It would have been nice to still have #" available for this and #// for regexes.  That's probably my Perl heritage leaking through, though :)

Marko Topolnik

Mar 18, 2013, 3:48:49 PM
to clo...@googlegroups.com


On Monday, March 18, 2013 8:37:19 PM UTC+1, Stefan Kamphausen wrote:

    this works pretty well, at least better than I expected, e.g.:

    user=> (def r1 #"(\s.)")
    #'user/r1
    user=> (def r2 #"([abc])")
    #'user/r2
    user=> (def r3 (re-pattern (str r1 "|" r2)))
    #'user/r3
    user=> r3
    #"(\s.)|([abc])"
    user=> (re-find r3 " x")
    [" x" " x" nil]
    user=> (re-find r3 "b")
    ["b" nil "b"]

This is a nice trick, but obviously isn't universal: it works only if you build up from parts that are literal regexes. More often than not this will not be the case.

-marko
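When the parts arrive at run time as plain strings rather than regex literals, one option is to quote them instead of escaping them by hand. A minimal sketch using `java.util.regex.Pattern/quote` from the JDK; the helper `any-of` and its inputs are hypothetical and not from this thread.

```clojure
(require '[clojure.string :as string])
(import 'java.util.regex.Pattern)

;; Hypothetical helper: build an alternation that matches any of the given
;; strings literally, whatever regex metacharacters they contain.
(defn any-of
  [strings]
  (re-pattern (string/join "|" (map #(Pattern/quote %) strings))))

(def re (any-of ["a.b" "c+d"]))

(println (re-find re "xx c+d yy")) ; prints: c+d
(println (re-find re "axb"))       ; prints: nil
```

This only covers parts that should match literally; parts that are themselves regex fragments still have to be written as literals or as escaped strings.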
 

Andy Fingerhut

Mar 18, 2013, 3:59:27 PM
to clo...@googlegroups.com

str is based on the toString() method of java.util.regex.Pattern, so I do not understand why you say it only works if you build up from parts that are literal regexes.  For example:

user=> (def r1 (re-pattern "\\sfoo|bar\\d+"))
#'user/r1
user=> (def r2 (re-pattern "taking over the world( world)*\\b"))
#'user/r2
user=> (def r3 (re-pattern (str "(" r1 ")|(" r2 ")")))
#'user/r3
user=> r3
#"(\sfoo|bar\d+)|(taking over the world( world)*\b)"
user=> (re-find r3 "I don't like this fooing weather")
[" foo" " foo" nil nil]
user=> (re-find r3 "We should be taking over the world soon")
["taking over the world" nil "taking over the world" nil]

Andy

Marko Topolnik

Mar 18, 2013, 4:12:42 PM
to clo...@googlegroups.com


On Monday, March 18, 2013 8:59:27 PM UTC+1, Andy Fingerhut wrote:
    str is based on the toString() method of java.util.regex.Pattern, so I do not understand why you say it only works if you build up from parts that are literal regexes.  For example:

    user=> (def r1 (re-pattern "\\sfoo|bar\\d+"))
    #'user/r1
    user=> (def r2 (re-pattern "taking over the world( world)*\\b"))
    #'user/r2
    user=> (def r3 (re-pattern (str "(" r1 ")|(" r2 ")")))
    #'user/r3
    user=> r3

Well obviously, you aren't benefiting from the literal regex syntax here :) Stefan's point was using the literal syntax to avoid double-backslashes and such, while still programmatically building up the full regex.

-marko
 