clojure.edn/read isn't spec compliant

EuAndreh

Oct 16, 2020, 9:07:40 PM
to clo...@googlegroups.com

Hello there.

I was working on implementing a specification-compliant edn reader in
Rust, and I found that clojure.edn/read itself isn't specification
compliant.

I have three examples below that should all throw exceptions, but
instead they are valid values according to clojure.edn/read. The quotes
were taken verbatim from the text of the specification[0].

--8<---------------cut here---------------start------------->8---
;; "Per the symbol rules above, :/ and :/anything are not legal keywords."
[(edn/read-string ":/")
;; "It can be used once only in the middle of a symbol to separate
the _prefix_ (often a namespace) from the _name
_"
(name (edn/read-string "a/b/c"))

;; specification doesn't talk about namespaced maps
(edn/read-string "#:a{:k 1}")]

[:/ "b/c" #:a{:k 1}]
--8<---------------cut here---------------end--------------->8---

I couldn't find many references to these issues, other than a Jira
ticket[1] and a thread on clojure-dev[2]. Both talk about
clojure.edn/read being consistent with LispReader, though. I have no
opinions on that.

Since clojure.edn/read is an edn reader, shouldn't it comply with
the edn specification? Maybe not the namespaced maps part, which the
specification itself could be extended to cover. But the other two cases
are explicitly forbidden by the specification, and clojure.edn/read
allows them.
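
To make the two slash rules quoted above concrete, here is a minimal Rust
sketch of how a strict reader might check them. The names `slash_rule_ok`
and `keyword_slash_rule_ok` are made up for illustration and are not taken
from any existing edn library. The sketch covers only the `/` rules, not
the rest of the symbol grammar, and makes no special case for a lone `/`.

```rust
/// Checks the `/` rules on the part of a symbol or keyword that follows any
/// leading `:`. Hypothetical helper, not part of any real edn library.
fn slash_rule_ok(body: &str) -> bool {
    match body.split_once('/') {
        // No slash at all: nothing to check here.
        None => true,
        // One slash found: both sides must be non-empty, and the name part
        // must not contain a second slash ("used once only").
        Some((prefix, name)) => {
            !prefix.is_empty() && !name.is_empty() && !name.contains('/')
        }
    }
}

/// Applies the same rule to a keyword token such as ":my/fred".
fn keyword_slash_rule_ok(token: &str) -> bool {
    match token.strip_prefix(':') {
        // A keyword must begin with ':' and have something after it.
        Some(body) => !body.is_empty() && slash_rule_ok(body),
        None => false,
    }
}
```

With this check, `keyword_slash_rule_ok(":/")` and `slash_rule_ok("a/b/c")`
both return `false`, while `keyword_slash_rule_ok(":my/fred")` returns
`true`. Whether a reader then throws, returns an error value, or accepts
the token anyway is a separate decision.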

I'm willing to write a patch to fix those, but is it something that
would be welcome? One could consider it a breaking change since the
reader will stop accepting data that it now does, but I could also argue
that this is a bug in the reader that was fixed, and the behaviour was
changed to match the expected behaviour, which is the specification.

The specification itself could change to match the behaviour of the
reader, but this is not desirable since it would invalidate the work
that others have done to implement edn outside of Clojure.

The tension between breaking the reader and matching the specification
should, IMHO, be resolved in favour of matching the specification.
Otherwise, the actual specification isn't what edn-format.org says, but
it would instead be "whatever clojure.edn/read does", which is worse.
The value proposition of having a specification to begin with is lost.

WDYT? Is there any other resource on this that I missed?

[0]: https://raw.githubusercontent.com/edn-format/edn/a51127aecd318096667ae0dafa25353ecb07c9c3/README.md
[1]: https://clojure.atlassian.net/browse/CLJ-1530
[2]: https://groups.google.com/g/clojure-dev/c/b09WvRR90Zc/discussion

William la Forge

Oct 17, 2020, 8:14:00 PM
to Clojure
My understanding is that run-time validation is often left weak in favor of speed of execution, in contrast to validation by the "compiler". Thus Clojure throws many more exceptions than the edn reader does. --Bill la Forge

Justin Smith

Oct 17, 2020, 10:40:18 PM
to Clojure
Not only does clojure.edn accept invalid input, but the Clojure reader
also accepts invalid input for the same reason (prioritizing speed of
implementation over validation):

user=> (name 'a/b/c)
"b/c"
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
> ---
> You received this message because you are subscribed to the Google Groups "Clojure" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
> To view this discussion on the web visit https://groups.google.com/d/msgid/clojure/1a9c0924-0b94-4094-8fa0-c8cd8f9bc667n%40googlegroups.com.

EuAndreh

Oct 20, 2020, 7:42:28 PM
to clo...@googlegroups.com
Oops, part of the example lost formatting with word wrapping. Here it
is in full:

--8<---------------cut here---------------start------------->8---
(require '[clojure.edn :as edn])

;; "Per the symbol rules above, :/ and :/anything are not legal keywords."
[(edn/read-string ":/")
 ;; "It can be used once only in the middle of a symbol to separate the
 ;; _prefix_ (often a namespace) from the _name_"
 (name (edn/read-string "a/b/c"))

 ;; specification doesn't talk about namespaced maps
 (edn/read-string "#:a{:k 1}")]

;; => [:/ "b/c" #:a{:k 1}]
--8<---------------cut here---------------end--------------->8---

The text inside quotes is taken verbatim from the spec.

EuAndreh

Oct 20, 2020, 7:42:32 PM
to Justin Smith, Clojure
The speed-over-validation argument applies only to Clojure's LispReader,
not to clojure.edn. I'm completely fine with Clojure's reader keeping all
of those weird behaviours, and many others.

But that doesn't apply to clojure.edn: it is code for a format with a
specification, and it goes against the specification. Having it be
faster or slower is less relevant in the face of it not being correct,
where correct means "matches the specification".

Sean Corfield

Oct 20, 2020, 8:47:00 PM
to Clojure Mailing List
As someone who has spent a lot of time around standardization committees (eight years on ANSI X3J16 C++ and some time around the ANSI C work before that, as well as years of BSI work), here's how I view the EDN specification: it states what is valid or invalid; a compliant reader should parse valid input correctly; and what a compliant reader does with invalid input is either not specified or undefined.

This is very common in standards and specifications. C and C++ have undefined behavior (where the standard places no restrictions on what the system can do -- and does not require it be documented either!), implementation-defined behavior (where the standard allows systems to do what they want but it must be documented), and unspecified behavior (where the standard provides some guidelines but otherwise does not specify what a system should do, nor is it necessarily required to be documented, but it is "defined" behavior somehow, e.g., order of evaluation of arguments).

Undefined behavior is deliberately very broad: a system can silently accept erroneous input with any outcome it chooses or it can dump core or launch missiles. It's generally the user's responsibility to ensure they do not provide erroneous input.

In the case of Clojure's EDN implementation, it makes sense to match the Clojure reader's behavior in cases where the EDN is "not valid" (i.e., does not have a defined meaning). In other EDN implementations, it might make more sense for EDN that has an undefined meaning to be rejected.

If you find valid EDN that a particular EDN reader fails to process correctly, that's a bug. If you feed it invalid EDN, well, you may or may not get an error or a value or...

--
Sean A Corfield -- (904) 302-SEAN
An Architect's View -- https://corfield.org/
World Singles Networks, LLC. -- https://worldsinglesnetworks.com/

"Perfection is the enemy of the good."
-- Gustave Flaubert, French realist novelist (1821-1880)

James Reeves

Oct 20, 2020, 9:09:38 PM
to Clojure
On Wednesday, 21 October 2020 at 00:42:32 UTC+1 EuAndreh wrote:
> But that doesn't apply to clojure.edn: it is code for a format with a
> specification, and it goes against the specification.

Where in the specification does it say that the edn reader should throw exceptions on errors?

EuAndreh

Oct 22, 2020, 10:21:57 AM
to Sean Corfield, Clojure Mailing List
Sean Corfield <se...@corfield.org> writes:

> Undefined behavior is deliberately very broad

I acknowledge the value of having undefined behaviour,
implementation-defined behaviour and unspecified behaviour in an
implementation, and I embrace that approach.

However, none of those are distinguished in the spec, which limits
itself to saying things like "is not legal", where "legal" is also
unspecified. So we have to stretch that a bit and interpret what is
forbidden, undefined, unspecified, etc.

> a system can silently accept
> erroneous input with any outcome it chooses or it can dump core or launch
> missiles. It's generally the user's responsibility to ensure they do not
> provide erroneous input.

You're right in principle here, but there is a really fine
distinction between exploiting implementation-defined behaviour
and relying on it.

I can already see a user of an edn implementation other than clojure.edn
reporting a bug saying "implementation X can't process all edn that
clojure.edn does". The answer to that is also what you said:
implementation X is also correct, and the user is responsible for no
longer feeding it erroneous input. That's a WONT_FIX, because it isn't a
bug.

All of that said, it is probably true that what I called "not being spec
compliant" isn't a bug, but rather implementation details that leak up,
and it wouldn't merit a patch to "match the specification".

Thanks for the response. It helped me get a clearer view of the value
proposition of the specification.

EuAndreh

Oct 22, 2020, 10:22:01 AM
to James Reeves, Clojure
James Reeves <weave...@gmail.com> writes:

> Where in the specification does it say that the edn reader should throw
> exceptions on errors?

Well, it doesn't. I think I had the expectation that forbidden things
throw exceptions because some forbidden things do throw, while others
don't.

Both ":/" and ":/anything" are said to be "not legal keywords", and the
latter does throw an exception while the former doesn't.

Since "legal" isn't really defined, I indeed can't jump from "it is not
legal" to "it should throw an exception".

In fact, the spec doesn't even mention exceptions.

Gregg Reynolds

Oct 22, 2020, 12:00:49 PM
to clo...@googlegroups.com
On Fri, Oct 16, 2020 at 8:07 PM 'EuAndreh' via Clojure <clo...@googlegroups.com> wrote:

> Hello there.
>
> I was working on implementing a specification-compliant edn reader in
> Rust

I could put that to good use, even if it isn't 100% "compliant".  Is it available?

Thanks

Gregg

EuAndreh

Oct 22, 2020, 1:02:46 PM
to Gregg Reynolds, clo...@googlegroups.com
Gregg Reynolds <d...@mobileink.com> writes:

> I could put that to good use, even if it isn't 100% "compliant". Is it
> available?

You can find the current WIP code here:
https://git.euandreh.xyz/libedn/tree/src/core/rust

I'll announce on my website[0] once ready. Patches welcome.

Still missing:
- built-in tagged elements (#inst, #uuid)
- ad-hoc tagged elements
- quickcheck test coverage
- FFI binding validation

[0]: euandre.org

Gregg Reynolds

Oct 22, 2020, 2:50:51 PM
to EuAndreh, clo...@googlegroups.com
On Thu, Oct 22, 2020 at 12:00 PM EuAndreh <e...@euandre.org> wrote:
> Gregg Reynolds <d...@mobileink.com> writes:
>
>> I could put that to good use, even if it isn't 100% "compliant".  Is it
>> available?
>
> You can find the current WIP code here:
> https://git.euandreh.xyz/libedn/tree/src/core/rust

Awesome, thanks! 

EuAndreh

Oct 31, 2020, 8:38:16 AM
to Sean Corfield, Clojure Mailing List
Sean Corfield <se...@corfield.org> writes:

> If you find valid EDN that a particular EDN reader fails to process
> correctly, that's a bug. If you feed it invalid EDN, well, you may or may
> not get an error or a value or...

This is a good guideline. A valid edn reader should read valid edn, and
the behaviour for "illegal" edn is unspecified. In fact, it is helping
me to think about my own implementation. Good tip.

Other than a few more "illegal" things that clojure.edn accepts, I have
found a valid edn value that it doesn't:

user=> (edn/read-string ":a:")
Execution error at user/eval33 (REPL:1).
Invalid token: :a:

As per the spec, a keyword:
"Keywords follow the rules of symbols, except they can (and must) begin with `:`"

And for symbols:
"`: #` are allowed as constituent characters in symbols other than
as the first character."

It follows that ":a:" is a valid keyword, as ":a#" is. The first
produces an error, while the second is a valid keyword.

From what we've discussed on this thread, this is a bug.
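
The reading above can be sketched as a character-membership check. This is
a minimal Rust sketch, not code from any existing reader:
`is_symbol_constituent` and `keyword_chars_ok` are hypothetical names, the
punctuation list and the ASCII-only notion of "alphanumeric" are assumptions
based on the spec's symbol rules and should be checked against the spec
text, and the rules about `/` and leading digits are deliberately ignored.

```rust
/// Returns true if `c` may appear in a symbol at the given position.
/// Assumption: "alphanumeric" is treated as ASCII-only here.
fn is_symbol_constituent(c: char, is_first: bool) -> bool {
    // Punctuation assumed to be allowed anywhere in a symbol.
    const PUNCTUATION: &str = ".*+!-_?$%&=<>";
    if c.is_ascii_alphanumeric() || PUNCTUATION.contains(c) {
        return true;
    }
    // ':' and '#' are constituents everywhere except the first position.
    (c == ':' || c == '#') && !is_first
}

/// Treats a keyword as ':' followed by something that obeys the symbol
/// character rules, which is the reading used in the message above.
fn keyword_chars_ok(token: &str) -> bool {
    match token.strip_prefix(':') {
        Some(body) if !body.is_empty() => body
            .chars()
            .enumerate()
            .all(|(i, c)| is_symbol_constituent(c, i == 0)),
        // Missing ':' prefix, or nothing after it.
        _ => false,
    }
}
```

Under these assumptions `keyword_chars_ok(":a:")` and
`keyword_chars_ok(":a#")` both return `true`, which is the asymmetry
described above: the two tokens look equally valid by the quoted rules,
yet only one of them is accepted by clojure.edn.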

Did I miss anything?

Andy Fingerhut

Oct 31, 2020, 10:28:39 PM
to clo...@googlegroups.com, Sean Corfield
Here is some possibly relevant information.

I suspect the reason that `(clojure.edn/read-string ":a:")` gives an error is that Clojure's EDN reader implementation was originally developed as an adaptation from Clojure's reader, and `(read-string ":a:")` also gives an error.  The reference documentation for Clojure's reader here https://clojure.org/reference/reader#_symbols says "Symbols beginning or ending with ':' are reserved by Clojure.  A symbol can contain one or more non-repeating ':'s".  That is likely why Clojure's reader gives an error attempting to read ":a:".

Perhaps it was intended that the last sentence should be included in the EDN specification, too.  I do not know.

My personal guess: the authors of the EDN specification and implementation are content with their level of detail, and might not be interested in making them 100% equivalent in all ways.  (This is only my personal guess.  Realize that making specifications and implementations match can be an exhausting and unrewarding process.)

Andy

EuAndreh

Nov 1, 2020, 7:01:04 PM
to Andy Fingerhut, clo...@googlegroups.com, Sean Corfield
Andy Fingerhut <andy.fi...@gmail.com> writes:

> My personal guess: the authors of the EDN specification and
> implementation are content with their level of detail, and might not be
> interested in making them 100% equivalent in all ways. (This is only my
> personal guess. Realize that making specifications and implementations
> match can be an exhausting and unrewarding process.)

Agreed that "making the implementation match the specification" is an
arduous task, as I am doing it myself while working on an edn
implementation.

However, I don't see a way around this type of job when working with a
specification.

Matching Socks

Nov 7, 2020, 8:22:06 AM
to Clojure
This is not either/or.  
There is room for an alternative, spec-enforcing, EDN reader.  
A drop-in replacement, as it were, for those inclined to try it.
If you want speed, you use Transit anyway, right?

P.S.  Even better if the alternative, compliant, reader were compatibly licensed, to replace the original in Clojure 2.

Robert M. Mather

Nov 8, 2020, 4:22:48 PM
to clo...@googlegroups.com
In idealized algorithmic terms, is there an efficiency justification for distinguishing the ':/' and ':/something' cases as the reader does? 

Seems like an artifact of the implementation rather than a time or space optimization. Maybe that error is only recognized upon entering the sub-parser for the keyword part after the '/', even though you could recognize and bail as soon as you see ':/'. 

Or, vice versa, read both cases without throwing.

It's bad UX for the canonical reader to silently accept something that other impls reject, but people are more likely to blame the alt impl. The more important invariant for reader/writer impls is round-tripping. I'm curious what the writer writes when the reader has read ':/' 
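
As a rough illustration of the round-tripping invariant mentioned above, the following Rust sketch pairs a toy lenient keyword reader with a toy writer. Everything here is hypothetical (`Keyword`, `read_keyword_lenient`, `write_keyword`, `round_trips`); it does not reproduce what clojure.edn or the Clojure printer actually do, and it simply assumes the token is split at the first `/`.

```rust
/// Toy keyword representation: optional prefix plus name.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Keyword {
    prefix: Option<String>,
    name: String,
}

/// Toy lenient reader: accepts anything of the form ":<non-empty text>".
/// If splitting at the first '/' gives two non-empty parts, they become
/// prefix and name; otherwise the whole body is kept as the name.
fn read_keyword_lenient(token: &str) -> Option<Keyword> {
    let body = token.strip_prefix(':')?;
    if body.is_empty() {
        return None;
    }
    match body.split_once('/') {
        Some((prefix, name)) if !prefix.is_empty() && !name.is_empty() => {
            Some(Keyword {
                prefix: Some(prefix.to_string()),
                name: name.to_string(),
            })
        }
        _ => Some(Keyword {
            prefix: None,
            name: body.to_string(),
        }),
    }
}

/// Toy writer: prints the keyword back in ":prefix/name" or ":name" form.
fn write_keyword(kw: &Keyword) -> String {
    match &kw.prefix {
        Some(prefix) => format!(":{}/{}", prefix, kw.name),
        None => format!(":{}", kw.name),
    }
}

/// Round-trip check: reading the written form yields the same value.
fn round_trips(token: &str) -> bool {
    match read_keyword_lenient(token) {
        Some(kw) => {
            let written = write_keyword(&kw);
            read_keyword_lenient(&written) == Some(kw)
        }
        None => false,
    }
}
```

In this toy model `round_trips(":/")` holds because the writer emits the same token it was given. The catch is that the written text is still something a stricter reader may reject, so round-tripping within one implementation says nothing about portability to another.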

EuAndreh

Nov 9, 2020, 6:01:04 AM
to Matching Socks, Clojure
Matching Socks <phill...@gmail.com> writes:

> This is not either/or.

Sure, I agree. When I said "I don't see a way around this type of job",
I was responding to an earlier message that said that building a
specification and an implementation that matches it is a very tiresome
job.

My point was that working with specifications themselves is tiresome, so
I couldn't see a way to avoid this tiresome job.

But I agree that having other implementations available is a good thing.

EuAndreh

Nov 9, 2020, 6:01:18 AM
to Robert M. Mather, clo...@googlegroups.com
"Robert M. Mather" <robert.m...@gmail.com> writes:

> It's bad UX for the canonical reader to silently accept something that
> other impls reject, but people are more likely to blame the alt impl.

It isn't really bad UX, it is just unspecified behaviour that different
implementations interpret differently. And if people start relying on
that, they're locking themselves into an implementation, rather than a
specification.