Consistency of symbols between different readers/edn


Herwig Hochleitner

Sep 16, 2014, 2:40:32 PM
to cloju...@googlegroups.com
There are several clojure/edn readers
- Clojure's builtin reader
- clojure.tools.reader (now also the reader of cljs)
- clojure.edn

Even though EDN is a subset of clojure, I think it's easy to agree that there should be no difference in symbol/keyword syntax.

That said, clojure's reader, as well as clojure.edn, allows symbols like foo/bar/baz. clojure.tools.reader only allows foo//
All the readers allow symbols like foo', but fail to mention ' as a possible constituent character in their docs.

The language on / at http://clojure.org/reader#The%20Reader--Reader%20forms is quite ambiguous.

https://github.com/edn-format/edn#symbols seems to support the implementation of clojure.tools.reader (only foo// allowed), however clojure.edn happily reads foo/bar/baz
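
The discrepancy described above can be checked at a REPL. The following is a minimal sketch, not a definitive test: the helper names try-read and compare-readers are made up for illustration, and only the two readers that ship with Clojure are exercised (clojure.tools.reader is a separate dependency and is left out). What each reader returns for the disputed inputs depends on the Clojure version, so no results are claimed here.

```clojure
(require '[clojure.edn :as edn])

(defn try-read
  "Hypothetical helper: reads string s with read-fn. Returns
  [namespace name] when the result is a symbol, the value itself for
  anything else, or :error when the reader throws."
  [read-fn s]
  (try
    (let [v (read-fn s)]
      (if (symbol? v)
        [(namespace v) (name v)]
        v))
    (catch Exception _ :error)))

(defn compare-readers
  "Hypothetical helper: maps each input string to what the built-in
  reader and clojure.edn make of it."
  [inputs]
  (into {}
        (for [s inputs]
          [s {:clojure (try-read read-string s)
              :edn     (try-read edn/read-string s)}])))

;; The three forms discussed in this post:
(compare-readers ["foo/bar/baz" "foo//" "foo'"])
```

Both readers return only the first form in the string, so an input that is silently cut short will not show up as :error; comparing the returned name against the input is the more telling check.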

I hope to get to official consensus regarding this issue, hopefully leading to tickets and doc fixes.

kind regards

P.S.:
There has been persistent confusion on whether runtime-constructed, unreadable symbols like (symbol "foo bar") are considered valid, and I'd like to take this opportunity to propose that they should indeed continue to be supported by clojure runtimes.
Maybe the printer should recognize them as unreadable and print them as #<symbol "foo bar"> or even #symbol ["foo bar"]
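
For readers unfamiliar with the problem, here is a small sketch of why such a symbol is "unreadable". It uses only clojure.core; the predicate name round-trips? is an illustrative assumption, not an existing function.

```clojure
;; A symbol built at runtime may contain characters the reader
;; would never produce, such as a space.
(def s (symbol "foo bar"))

(name s)                   ;; => "foo bar"
(pr-str s)                 ;; => "foo bar" (printed without any quoting)
(read-string (pr-str s))   ;; => foo (the reader stops at the space)

(defn round-trips?
  "Hypothetical helper: true when x survives printing and re-reading."
  [x]
  (= x (read-string (pr-str x))))

(round-trips? 'foo/bar)    ;; => true
(round-trips? s)           ;; => false
```

The symbol itself is a perfectly usable value; only the print/read cycle loses information, which is what the printer proposal above is meant to address.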

Alex Miller

Sep 16, 2014, 3:29:41 PM
to cloju...@googlegroups.com
On Tue, Sep 16, 2014 at 1:40 PM, Herwig Hochleitner <hhochl...@gmail.com> wrote:
> There are several clojure/edn readers
> - Clojure's builtin reader
> - clojure.tools.reader (now also the reader of cljs)
> - clojure.edn
>
> Even though EDN is a subset of clojure, I think it's easy to agree that there should be no difference in symbol/keyword syntax.

I would say more specifically that edn readers should consistently read valid EDN, but may also read extensions to EDN depending on their purpose (for example I presume clojure.tools.reader reads more than EDN).

> That said, clojure's reader, as well as clojure.edn, allows symbols like foo/bar/baz. clojure.tools.reader only allows foo//

foo/bar/baz should be invalid afaik. foo// should be valid.

Afaik, all of the readers should allow the foo// form since 1.6.

Any questions on what EDN is should be filed as github issues on the edn repo. Any issues with readers that allow invalid things should be filed as issues in the appropriate issue tracker.

> All the readers allow symbols like foo', but fail to mention ' as a possible constituent character in their docs.

Should be filed as a bug in the edn repo github issues.

> The language on / at http://clojure.org/reader#The%20Reader--Reader%20forms is quite ambiguous.

If you want to file a ticket in jira, that would be helpful for me to track the clarification of this.

> https://github.com/edn-format/edn#symbols seems to support the implementation of clojure.tools.reader (only foo// allowed), however clojure.edn happily reads foo/bar/baz
>
> I hope to get to official consensus regarding this issue, hopefully leading to tickets and doc fixes.
>
> kind regards
>
> P.S.:
> There has been persistent confusion on whether runtime-constructed, unreadable symbols like (symbol "foo bar") are considered valid, and I'd like to take this opportunity to propose that they should indeed continue to be supported by clojure runtimes.

There is no confusion in intent. Symbols and keywords may be constructed programmatically with values that cannot be printed and read back by the reader. This is valid and (often) useful.

> Maybe the printer should recognize them as unreadable and print them as #<symbol "foo bar"> or even #symbol ["foo bar"]

There is some discussion about adding support for the CL-style pipe-escaping (#|). This came up in the context of feature expressions to allow other readers to read the extended symbols supported by ClojureCLR in mixed-platform source files (ClojureCLR reader already supports this). If so, then this would be one avenue to actually make these symbols valid in the print/read lifecycle. See: https://github.com/clojure/clojure-clr/wiki/Specifying-types
 


Francis Avila

Sep 16, 2014, 7:13:17 PM
to cloju...@googlegroups.com
There was considerable ticket activity on the closely related issue of the readable forms for keywords (which also has some blurry edges):

http://dev.clojure.org/jira/browse/CLJ-1286
http://dev.clojure.org/jira/browse/CLJS-677
(These link to other related tickets, open and closed.)

The clojure reader and edn specifications are also compared meticulously here:

https://github.com/wagjo/serialization-formats

The http://clojure.org/reader page is very old, unclear, and doesn't describe the current clojure reader it seems. It should probably be updated or removed considering the amount of confusion it causes.

What I really wish for is a formal BNF of the reader and edn forms (maybe BNF of edn, then extensions for Clojure readers as needed to guarantee edn forms are always a strict subset of readable Clojure forms). The implementations of Clojure and of alternate serializations of Clojure datastructures are multiplying and not having a very precise spec is a hassle.
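
To make the wish concrete, here is a rough sketch of what a machine-checkable fragment of such a grammar could look like, for symbols only. It is a paraphrase of the symbol rules in the edn README as I read them, written as a regex; every name in it (first-char, rest-char, sign-next, name-part, symbol-pattern, edn-symbol?) is made up for illustration. Two assumptions to note: foo// is accepted because this thread says it should be valid, and ' is rejected because the docs do not list it. It also ignores that nil, true and false are read as other values.

```clojure
;; Characters that may start a name (no digits, no sign or dot).
(def first-char "[A-Za-z*!_?$%&=<>]")
;; Characters allowed after the first one; : and # only appear here.
(def rest-char  "[A-Za-z0-9.*+!\\-_?$%&=<>:#]")
;; Same as rest-char minus digits: what may follow a leading . + or -
(def sign-next  "[A-Za-z.*+!\\-_?$%&=<>:#]")

;; name := first rest*  |  (. + -) followed by nothing or a non-digit
(def name-part
  (str "(?:" first-char rest-char "*"
       "|[.+\\-](?:" sign-next rest-char "*)?)"))

;; symbol := "/"  |  name  |  name "/" (name | "/")
(def symbol-pattern
  (re-pattern (str "/|" name-part "(?:/(?:/|" name-part "))?")))

(defn edn-symbol?
  "True when the whole string s matches the sketched symbol grammar."
  [s]
  (boolean (re-matches symbol-pattern s)))

(edn-symbol? "foo/bar")      ;; => true
(edn-symbol? "foo//")        ;; => true
(edn-symbol? "foo/bar/baz")  ;; => false
(edn-symbol? "foo'")         ;; => false
```

With something like this agreed on, each reader could be checked against the same set of strings, and any disagreement would be a bug in either the reader or the grammar rather than a matter of opinion.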

Reid McKenzie

Sep 17, 2014, 11:05:49 PM
to cloju...@googlegroups.com
Comments in line

On 09/16/2014 06:13 PM, Francis Avila wrote:
> There was considerable ticket activity on the closely related issue of the readable forms for keywords (which also has some blurry edges):
>
> http://dev.clojure.org/jira/browse/CLJ-1286
> http://dev.clojure.org/jira/browse/CLJS-677
> (These link to other related tickets, open and closed.)
>
> The clojure reader and edn specifications are also compared meticulously here:
>
> https://github.com/wagjo/serialization-formats
>
> The http://clojure.org/reader page is very old, unclear, and doesn't describe the current clojure reader it seems. It should probably be updated or removed considering the amount of confusion it causes.
>
> What I really wish for is a formal BNF of the reader and edn forms (maybe BNF of edn, then extensions for Clojure readers as needed to guarantee edn forms are always a strict subset of readable Clojure forms). The implementations of Clojure and of alternate serializations of Clojure datastructures are multiplying and not having a very precise spec is a hassle.
Writing a monolithic (E)BNF grammar for Clojure turns out to be a bit
tricky because whitespace may or may not be a token depending on context
(it is significant inside strings and patterns; elsewhere it only has an
implicit meaning as a token terminator). You really wind up with a set
of tokens and then a token-based grammar (which is totally fine and well
suited to traditional parser designs), unless you want to handle
whitespace/comments anywhere as syntactic elements (which the .NET
languages do), which gets messy and error-prone quickly.

The argument for doing this, and the reason that the .NET platform
chooses to parse whitespace and comments, is that it's convenient for
enabling documentation tools, and it allows refactoring engines to
preserve syntactically "linked" comments and indentation when moving
blocks rather than discarding them. One example of this in Clojure is
marginalia [1], which maintains its own custom reader and has an open
issue for hacking said reader to support ClojureScript forms.
> On Tuesday, September 16, 2014 1:40:32 PM UTC-5, Bendlas wrote:
>> There are several clojure/edn readers
>> - Clojure's builtin reader
>> - clojure.tools.reader (now also the reader of cljs)
>> - clojure.edn
>>
>>
>> Even though EDN is a subset of clojure, I think it's easy to agree that there should be no difference in symbol/keyword syntax.
Yep, which I think is a strong argument for having a clear formal grammar
that all implementations can be held to. Even if it changes in the
future, it makes life easier for everyone concerned.
>> That said, clojure's reader, as well as clojure.edn, allows symbols like foo/bar/baz. clojure.tools.reader only allows foo//
>> All the readers allow symbols like foo', but fail to mention ' as a possible constituent character in their docs.
>>
>>
>> The language on / at http://clojure.org/reader#The%20Reader--Reader%20forms is quite ambiguous.
>>
>>
>> https://github.com/edn-format/edn#symbols seems to support the implementation of clojure.tools.reader (only foo// allowed), however clojure.edn happily reads foo/bar/baz
>>
>>
>> I hope to get to official consensus regarding this issue, hopefully leading to tickets and doc fixes.
>>
>> kind regards
>>
>>
>>
>> P.S.:
>> There has been persistent confusion on whether runtime-constructed, unreadable symbols like (symbol "foo bar") are considered valid, and I'd like to take this opportunity to propose that they should indeed continue to be supported by clojure runtimes.
>> Maybe the printer should recognize them as unreadable and print them as #<symbol "foo bar"> or even #symbol ["foo bar"]
If those forms are going to be readable, using the reader macro notation
is a nice way to do it. Note however that as this would be a "standard"
reader macro like #uuid or #inst, #symbol need not take a vector and can
just have a bare string.

This would also be nice in terms of feature expressions, since it would
clearly solve the "Reading unreadable things" issue(s) mentioned on the
design page. [2]

Furthermore, formalizing the reader grammar would help with creating new
Clojure-like "dialects"/targets (oxcart, rhine [3] and others) and
ensure a minimum of readability between different platforms that doesn't
currently exist. I'm a fan of this.

Reid

[1] https://github.com/gdeer81/marginalia
[2] http://dev.clojure.org/display/design/Feature+Expressions
[3] https://github.com/artagnon/rhine

Herwig Hochleitner

Sep 18, 2014, 8:30:14 AM
to cloju...@googlegroups.com
2014-09-16 21:29 GMT+02:00 Alex Miller <al...@puredanger.com>:

> I would say more specifically that edn readers should consistently read valid EDN, but may also read extensions to EDN depending on their purpose (for example I presume clojure.tools.reader reads more than EDN).

Ok, my opinion was and is that simple stuff like symbols should not be extended from edn. I certainly can't anticipate every use case, but I'd be hard pressed to agree with any clojure dialect reading more symbols than EDN, even though they still might add new syntax, like e.g. complex numbers or the CLR style bars.

> Any questions on what EDN is should be filed as github issues on the edn repo. Any issues with readers that allow invalid things should be filed as issues in the appropriate issue tracker.
>
>> All the readers allow symbols like foo', but fail to mention ' as a possible constituent character in their docs.
>
> Should be filed as a bug in the edn repo github issues.
>
>> The language on / at http://clojure.org/reader#The%20Reader--Reader%20forms is quite ambiguous.
>
> If you want to file a ticket in jira, that would be helpful for me to track the clarification of this.
>
>> P.S.:
>> There has been persistent confusion on whether runtime-constructed, unreadable symbols like (symbol "foo bar") are considered valid, and I'd like to take this opportunity to propose that they should indeed continue to be supported by clojure runtimes.
>
> There is no confusion in intent. Symbols and keywords may be constructed programmatically with values that cannot be printed and read back by the reader. This is valid and (often) useful.

That's good to know, I'll point people saying otherwise to this thread.

> There is some discussion about adding support for the CL-style pipe-escaping (#|). This came up in the context of feature expressions to allow other readers to read the extended symbols supported by ClojureCLR in mixed-platform source files (ClojureCLR reader already supports this). If so, then this would be one avenue to actually make these symbols valid in the print/read lifecycle. See: https://github.com/clojure/clojure-clr/wiki/Specifying-types

I can't really offer a qualified opinion on the CLR bars, just a vaguely dismissive: "do we really need specialized quotes for symbols?"
An extension that I'd rather support is configurable delimiters, to allow embedding binary data, a la HTTP multipart. Even though that would be horrible for editor support, I like it for conceptual reasons.

2014-09-18 4:57 GMT+02:00 Reid McKenzie <rmcke...@gmail.com>:
> Yep which I think is a strong argument for having a clear formal grammar
> that all implementations can be held to. Even if it changes in the
> future it makes life easier for everyone concerned.

I agree that having a formal grammar around would be terrific; then one could just file bugs against non-conformant implementations, without having to establish consensus on the format.

> Note however that as this would be a "standard"
> reader macro like #uuid or #inst, #symbol need not take a vector and can
> just have a bare string.

Just a minor nitpick: It's not about "standard", I proposed a vector, because e.g. (symbol "non standard ns" "$weird/name%") => #symbol ["non standard ns" "$weird/name%"]
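
A sketch of how the vector form could round-trip today, using only the :readers option of clojure.edn/read-string. The #symbol tag is the proposal from this thread, not an existing tag, and the names symbol->tagged-str, symbol-readers and read-tagged are hypothetical. Strictly speaking, edn reserves unprefixed tags, so this is an experiment rather than something to ship.

```clojure
(require '[clojure.edn :as edn])

(defn symbol->tagged-str
  "Hypothetical printer: renders sym as #symbol [\"ns\" \"name\"],
  or #symbol [\"name\"] when it has no namespace."
  [sym]
  (str "#symbol "
       (pr-str (if-let [nspace (namespace sym)]
                 [nspace (name sym)]
                 [(name sym)]))))

;; Reader side: rebuild the symbol from the one- or two-element vector.
;; (symbol nil a) keeps a bare name intact even if it contains a slash.
(def symbol-readers
  {'symbol (fn [[a b]]
             (if b (symbol a b) (symbol nil a)))})

(defn read-tagged
  "Reads one edn form from s with the #symbol tag installed."
  [s]
  (edn/read-string {:readers symbol-readers} s))

(def weird (symbol "non standard ns" "$weird/name%"))

(symbol->tagged-str weird)
;; => "#symbol [\"non standard ns\" \"$weird/name%\"]"

(= weird (read-tagged (symbol->tagged-str weird)))
;; => true
```

The vector keeps namespace and name apart, which a bare string could not do without picking some separator that might itself occur in either part.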
 

> This would also be nice in terms of feature expressions, since it would
> clearly solve the "Reading unreadable things" issue(s) mentioned on the
> design page. [2]

Yeah, I suspect that most cases of someone wanting to extend clojure's syntax can now be solved by reader tags.

kind regards

David Miller

Sep 19, 2014, 12:11:48 PM
to cloju...@googlegroups.com

On Thursday, September 18, 2014 7:30:14 AM UTC-5, Bendlas wrote:

> I can't really offer a qualified opinion on the CLR bars, just a vaguely dismissive: "do we really need specialized quotes for symbols?"
> An extension that I'd rather support is configurable delimiters, to allow embedding binary data, a la HTTP multipart. Even though that would be horrible for editor support, I like it for conceptual reasons.

Yes. CLR needs some type of reader mechanism that returns symbols with characters that are not part of the allowed character set for symbols in the standard Clojure reader. Symbols are used to name types, and expressions cannot be used. I put in |-quoting (not using #|, by the way; neither does CL) as the simplest solution. Going with #| ... | would have been another solution. I'm sure you can find others. I was not in favor of adding even more characters, as in #symbol ["..." "..."], to names that were already very long. But if there is a solution agreed on across the implementations, ClojureCLR will follow.

-David
