Print/read form exchangeability

Marshall Bockrath-Vandegrift

2014/11/19 10:46:04
To: cloju...@googlegroups.com
Hi all,

I've recently run into an issue using EDN to ferry Clojure values over
an existing text channel, and I think it points to a somewhat
philosophical concern. That is, what should be the relationship
between a value and the result of `read`ing the Clojure-printed form of
that value?

Pure value data round-trips cleanly, obviously -- that's somewhat the
point. Ticket CLJ-1074 attempts to address a case where the special
floating-point values +/-Infinity/NaN do not. Although the ticket
seems to have sat idle for the past ~year, it seems obvious to me that
the current behavior is wrong; I can't think of any situation in which
it would be useful to print the floating-point NaN value and read back the
symbol `NaN`.
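
A minimal sketch of the read half of this problem, assuming only `clojure.edn` from the standard distribution. On the Clojure 1.6 line discussed here the special doubles print as the bare tokens `NaN`, `Infinity` and `-Infinity`; the sketch shows what those tokens become when read. (Clojure 1.9 and later print these values as `##NaN`, `##Inf` and `##-Inf` instead.)

```clojure
(require '[clojure.edn :as edn])

;; The bare tokens are ordinary symbols to the reader, not doubles.
(def read-nan (edn/read-string "NaN"))
(def read-inf (edn/read-string "Infinity"))

(symbol? read-nan)   ; the symbol NaN
(number? read-nan)   ; so it is no longer a number
```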

For my system, I've run into what I believe are two related cases: the
printing of vars and classes. Vars and classes both print to forms
which read back as values that *evaluate* to the originals rather than
*being* (or being equal to) the originals. This leads to results which
work as desired in many circumstances, but not all, and I believe is
fundamentally just as wrong as the JVM `Double/NaN` value printing as
the symbol `NaN`.
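
The distinction between "evaluates to" and "is" can be seen at a REPL. This sketch uses `clojure.core/read-string` on trusted, locally printed strings only; the names `var-form` and `class-form` are illustrative.

```clojure
;; Print a var and a class, then read the printed forms back.
(def var-form   (read-string (pr-str #'clojure.core/concat)))
(def class-form (read-string (pr-str String)))

var-form     ; the list (var clojure.core/concat), not a Var
class-form   ; the symbol java.lang.String, not a Class

;; Only after evaluation do the original values come back.
(= #'clojure.core/concat (eval var-form))
(= String (eval class-form))
```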

A proposal:

(1) Establish a consensus on the desired relationship between values
and the result of print-read roundtrip. I think that anything which
prints to a readable form should result in an equal value when the
form is read, but maybe I'm missing some subtlety.

(2) Fix the current print/read disparity cases via tagged literals.
The `#'example` reader macro syntax could expand to the `#var example`
tagged literal, while classes could begin printing as `#class Class`
literals. The default Clojure source reader data readers for these
forms could produce the current eval-equivalent source forms, but
applications which needed to transport the values themselves could
provide appropriate data readers for `#var` and `#class`.
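
A rough sketch of the application side of proposal (2), assuming the printer already emitted the hypothetical `#var` and `#class` tags. Neither tag exists in Clojure; `transport-readers` is an illustrative name, and the input strings are written by hand here.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical data readers that resolve the tagged symbols to live values.
;; Note: Class/forName on untrusted input can load and initialize classes.
(def transport-readers
  {'var   (fn [sym] (find-var sym))
   'class (fn [sym] (Class/forName (str sym)))})

(def read-var
  (edn/read-string {:readers transport-readers} "#var clojure.core/concat"))

(def read-class
  (edn/read-string {:readers transport-readers} "#class java.lang.String"))
```

With readers like these the read values are the Var and Class objects themselves, so no `eval` step is needed on the receiving side.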

Thoughts?

-Marshall

Reid McKenzie

2014/11/19 11:48:08
To: cloju...@googlegroups.com
Hey Marshall,

Yeah CLJ-1074 is something I've already voted for and wished for more
than once. I would encourage others to go vote it up as well since
that's the official indicator of interest in a patch besides talking
about it here.

So in Haskell, read and print are defined to be an isomorphism, which is
awesome because it means you can read anything you can print thus giving
you naive serialization for free. For most actual data types (besides
primitive arrays and so forth) Clojure's read-string and pr are an
isomorphism and I agree that they should be such over data.
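
One way to probe where the isomorphism holds is a small predicate. This is a sketch, not part of any library; `round-trips?` is a made-up name, and it treats any read failure as "does not round-trip".

```clojure
(require '[clojure.edn :as edn])

(defn round-trips?
  "True when reading the pr-str of x yields a value equal to x."
  [x]
  (try
    (= x (edn/read-string (pr-str x)))
    (catch Exception _ false)))

(round-trips? {:a [1 2 #{3}] :b "text" :c 1.5})   ; plain data
(round-trips? (int-array [1 2 3]))                ; primitive array
```

Plain data passes; the primitive array does not, because its printed form is not readable EDN.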

Last night, to hack around unreadable #< ... > forms occurring in
existing EDN files I cooked up this hack:
https://github.com/clojure-grimoire/lib-grimoire/blob/develop/src/grimoire/api.clj#L184-L191
which isn't even correct due to the possibility of nested #<> forms à la
#< .. #<>> but was good enough to get my prototype of the next Grimoire
release running.

The "real" fix,
https://github.com/clojure-grimoire/lein-grim/blob/develop/src/grimoire/doc.clj#L110
being "just don't pr unreadable things!" I find unsatisfactory because
the pr documentation http://conj.io/1.6.0/clojure.core/pr says that the
output of pr is readable when clearly it may not be. However, this leaves
us where you left off: trying to print things that have no really
meaningful read representation.

pr of a var could be #var clojure.core/concat or #'clojure.core/concat
which could be resolved when read to the var as interned in the reading
Clojure instance's memory. Note that while this _may_ be an isomorphism,
it need not be because a given var may have an altered binding or even a
different root definition in another Clojure instance. This is why the
pr of a var explicitly wraps the pr of the var's value. Maybe it does
make sense to just make the existing pr of a var readable, but that
still doesn't solve the subproblem that the value form must be readable.

What is the pr of a function? We can't reasonably pr an equivalent
lambda expression that could be read, nor can we simply pr the address
of the instance as we do now. This is specifically the case that I was
choking on last night with lib-grimoire: some functions in core have
:inline metadata which has a function as a value.
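
The `:inline` case can be reproduced directly. This sketch assumes `clojure.core/+` carries `:inline` metadata (it does in the releases discussed here) and uses the illustrative names `inline-fn` and `read-result`.

```clojure
(require '[clojure.edn :as edn])

;; The :inline metadata on a core var is a function object.
(def inline-fn (:inline (meta #'clojure.core/+)))

(fn? inline-fn)

;; Printing it succeeds, but the printed form cannot be read back as EDN.
(def read-result
  (try
    (edn/read-string (pr-str inline-fn))
    (catch Exception _ :unreadable)))
```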

What is the pr of Object[]? It could contain arbitrary unreadable
classes... itself even :|

I guess the "obvious" solution to me would be to have an "unreadable"
symbol which is defined to be the pr value of any value which cannot be
meaningfully serialized as outlined above. This is obviously
unsatisfactory, since it clearly admits that (comp read pr) is not an
identity operation. The good news is that we don't need a new
"unreadable" value (although we could have one). As "unreadable" would
be meaningless as a value, we already have a value that's meaningless:
nil. We can't define pr of unreadable things to be an empty string, because
then printing vars, maps and so forth could generate unreadable syntax and
we could no longer print them naively. So printing nil may just be the
least evil thing to do.

The only other thing we can really do is stay where we are at "don't
print unreadable things!" which requires that users be aware of what is
and isn't a readable value.

Just some thoughts.
Reid

Gary Fredericks

2014/11/19 12:33:05
To: cloju...@googlegroups.com
Would there be any value in using a special #unreadable tag which would be attached to whatever data about the unreadable object we might want to include, but exists to specifically flag situations where you can't get an equivalent object back? This way we wouldn't throw away potentially useful information, like the class name, and might end up being more useful than #<...> since at least you can read it as something if you want to.


--
You received this message because you are subscribed to the Google Groups "Clojure Dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure-dev+unsubscribe@googlegroups.com.
To post to this group, send email to cloju...@googlegroups.com.
Visit this group at http://groups.google.com/group/clojure-dev.
For more options, visit https://groups.google.com/d/optout.

James Reeves

2014/11/19 13:08:30
To: cloju...@googlegroups.com
On 19 November 2014 17:32, Gary Fredericks <frederi...@gmail.com> wrote:
> Would there be any value in using a special #unreadable tag which would be attached to whatever data about the unreadable object we might want to include, but exists to specifically flag situations where you can't get an equivalent object back?

This is the solution I was considering for a logging/tracing library. Something like:

    #unreadable {:class foo.bar.Baz, :string "foo.bar.Baz@3f5d7060"}

Since all objects are guaranteed to have .getClass and .toString methods.
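
A minimal sketch of producing and consuming such a form. The `#unreadable` tag and the `unreadable-form` helper are hypothetical; nothing in Clojure emits this tag. Class names that are not valid symbols (array classes such as `[I`) would need a string instead of a symbol for `:class`.

```clojure
(require '[clojure.edn :as edn])

(defn unreadable-form
  "Print x as a hypothetical #unreadable tagged map of class and toString."
  [x]
  (str "#unreadable "
       (pr-str {:class  (symbol (.getName ^Class (class x)))
                :string (str x)})))

(def printed (unreadable-form (Object.)))

;; A consumer that just wants the description can read the tag as a map.
(def read-back
  (edn/read-string {:readers {'unreadable identity}} printed))
```

A stricter consumer could instead supply a reader that throws, which keeps the choice between "describe" and "fail" on the reading side.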

It might be nice to have a ToEdn protocol that explicitly handles printing to edn, and either produces unreadable forms or straight errors depending on the options.

My current solution is the rather less elegant:

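    ;; assumes (require '[clojure.edn :as edn])
    (defn- ^String serialize [x]
      ;; postcondition: the printed string must read back as a value equal to x
      {:post [(= x (edn/read-string %))]}
      (pr-str x))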

- James

Reid McKenzie

2014/11/19 13:32:50
To: cloju...@googlegroups.com
On 11/19/2014 12:08 PM, James Reeves wrote:
> #unreadable {:class foo.bar.Baz, :string "foo.bar.Baz@3f5d7060"}

I actually like that a lot because the class and string must be
readable, and you may be able to recover something meaningful by parsing
the .toString if you try hard enough and decide that the result would be
meaningful for you.

Having an obviously extensible ToEdn defaulting to the above would be
really nice as well.

Unfortunately I don't think that lets us escape from your serialize
function if you do want strict serialization rather than just printing,
but it's a start.

Reid

Marshall Bockrath-Vandegrift

2014/11/20 7:29:53
To: cloju...@googlegroups.com
Interesting. I'll take this as a subtlety I missed, as the proposed
'#unreadable' tag (or any other way of reading something back from the
printed form of unreadable output) would become another instance of
the issue I'm trying to address -- printed forms which successfully read
as something other than their original values. I'm not convinced that
this would actually be valuable or desired in many circumstances, but
maybe could be a variant controlled by printing options? The same
mechanism could also support other behavior on unreadable output, such
as throwing an exception. Practically speaking it also seems like a
far more difficult change to make across the ecosystem, given the
current free-for-all philosophy & practice around printing the
unreadable.

I did think of another case of the problem I'm specifically trying to
address, which it seems like should already have a CLJ ticket but
which I'm unable to find: the lack of separate forms for sorted sets
and maps. In this case the original values and read values are
`.equiv`, but so too would be the same data in one of the equivalent
mutable Java standard library data structures so I don't think that
relationship is strong enough. This lack of isomorphism would also be
easy to fix with tagged literals, simply printing as #sorted-set and
#sorted-map respectively.
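
The sorted-collection case and the tagged-literal fix can be sketched as follows. The `#sorted-set` and `#sorted-map` tags are the hypothetical ones proposed above, `sorted-readers` is an illustrative name, and custom comparators are ignored entirely.

```clojure
(require '[clojure.edn :as edn])

(def original  (sorted-set 3 1 2))
(def read-back (edn/read-string (pr-str original)))

(= original read-back)   ; still equal as sets
(sorted? read-back)      ; but the sortedness is gone

;; Hypothetical readers for the proposed tags.
(def sorted-readers
  {'sorted-set (fn [items] (apply sorted-set items))
   'sorted-map (fn [m] (into (sorted-map) m))})

(def tagged
  (edn/read-string {:readers sorted-readers} "#sorted-set [1 2 3]"))

(sorted? tagged)
```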

I'm happy to try to pull these ideas into a patch/patches
(improving isomorphism; probably not changing handling of unreadable
printing), but I'd like to better-understand if these problems are
practically affecting anyone else before tossing my offerings into the
JIRA void.

Andy Fingerhut

2014/11/20 10:48:40
To: cloju...@googlegroups.com
A few comments:

If you want to preserve types during printing and reading back, there is *print-dup* in Clojure:

user=> (binding [*print-dup* true] (print {:a 1 :b 2}))
#=(clojure.lang.PersistentArrayMap/create {:b 2, :a 1})nil

Note that this might not be what you want for your application.  It prints things as #=(ClassName/method data), which relies on reading back in a way that can handle such forms, e.g. using clojure.core/read with *read-eval* true.  That allows arbitrary Java constructors to be called during reading, some of which can have nasty side effects like emptying your files: http://clojuredocs.org/clojure.core/read

I don't have a full knowledge of the philosophy here, but at least for the EDN subset I think there may be a design goal to *not* preserve the distinction between things like set and sorted-set -- they are all sets, and their contents don't change based on whether they are sorted.  Hopefully someone with better knowledge can correct that statement if I'm imagining things, which I sometimes do.

One detail: While .equiv often was true between immutable and mutable collections with .equiv elements in Clojure 1.5.1 and earlier, it is not true in general in Clojure 1.6.0:

user=> (clojure-version)
"1.6.0"
user=> (import '(java.util HashSet))
java.util.HashSet
user=> (set [1000 2000])
#{1000 2000}
user=> (HashSet. [1000 2000])
#{1000 2000}
user=> (= (set [1000 2000]) (HashSet. [1000 2000]))
true
user=> (= (set [(set [1000 2000])]) (set [(HashSet. [1000 2000])]))
false

Why?  http://dev.clojure.org/jira/browse/CLJ-1372 which may not be a bug.



Marshall Bockrath-Vandegrift

2014/11/23 10:21:15
To: cloju...@googlegroups.com
I knew about `*print-dup*`, but it's definitely not the behavior I
want. Good points on the sorted vs hash collections -- I think that's
very relevant.

While at the Conj I spent a few minutes harassing Rich re: this issue,
and he suggested at least considering side-by-side an alternate design
where users of printing can provide alternative print functions, in
the same way users of reading can provide alternative data readers.

I had thought of that approach when I first started trying to solve
this problem for my system, but rejected it because (a) it seemed
superfluous and (b) the printing mechanism (the `print-method`
multimethod) has no affordance for such local configuration. After
speaking with Rich I approached the problem again with fresh eyes and
see that I was wrong on both counts.

Local printing configuration is only superfluous if the printed
representation of every value contains all information all possible
consumers might consider relevant for reading back "the same" value.
This is not only an impossible bar, but prevents adapting default
printed forms for common uses. The current printed form of classes
and vars is arguably the most useful in the development contexts where
they most frequently appear. The "lossy" printing of sorted
collections, as Andy points out, represents a conscious design decision.
And as discussed earlier, there is a matrix of options for handling
objects without readable printed representations, none of which is
optimal for all applications.

As far as mechanism, compatibility appears to demand that
`print-method` remain a recursively-invokable `MultiFn` instance, but
nothing prevents it from having *additional* behavior as well. If
implemented in `clojure.core` as part of a new release, it might
suffice to simply make `print-method` dynamic. For an external
proof-of-concept for existing releases, it is necessary instead to
replace `print-method`'s root binding with a `MultiFn` subclass
allowing recursive dynamic resolution of printing functions.

And so: https://github.com/llasram/letterpress

Comments on the approach (and implementation) appreciated. I'd still
like to see a solution for this problem in Clojure itself.
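
For comparison, caller-local print configuration can be approximated without touching `print-method` at all. The sketch below is not the letterpress approach and not its API: it swaps in a simpler technique, rewriting vars and classes into tagged literals before printing. It assumes Clojure 1.7 or later for `tagged-literal` (newer than the 1.6.0 discussed in this thread), and the names `tag-refs`, `pr-str-tagged` and `readers` as well as the `#var` and `#class` tags are hypothetical.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.walk :as walk])

(defn tag-refs
  "Replace a var or class with a tagged literal; leave other values alone."
  [x]
  (cond
    (var? x)   (let [m (meta x)]
                 (tagged-literal 'var (symbol (str (ns-name (:ns m)))
                                              (str (:name m)))))
    (class? x) (tagged-literal 'class (symbol (.getName ^Class x)))
    :else      x))

(defn pr-str-tagged
  "Print data with vars and classes rewritten as #var / #class forms."
  [data]
  (pr-str (walk/prewalk tag-refs data)))

;; Matching readers on the consuming side.
(def readers
  {'var   find-var
   'class (fn [sym] (Class/forName (str sym)))})

(def data      {:fn #'clojure.core/concat :type String})
(def printed   (pr-str-tagged data))
(def read-back (edn/read-string {:readers readers} printed))
```

The trade-off is an extra walk over the data and no effect on code that calls `pr` directly, which is the gap a dynamic `print-method` would close.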