Tagged literals: undefined tags blow up reader


kovas boguta

Apr 27, 2012, 8:04:23 PM
to clo...@googlegroups.com
Thanks everyone involved for the 1.4 release. One issue:

In 1.4, tagged literals need to be defined, otherwise the reader blows up:

user=> [:a #foo/bar :b]
RuntimeException No reader function for tag foo/bar
clojure.lang.LispReader$CtorReader.readTagged (LispReader.java:1164)
RuntimeException Unmatched delimiter: ]
clojure.lang.Util.runtimeException (Util.java:170)

This is a show-stopper for using tagged literals as a data interchange format.

It's impossible to pass data through your system without every step
knowing about what it is.

I don't know what the best solution is, so I'm bringing this up here.

But however it looks, it would be great if undefined literals were
read into some kind of wrapper, and then the reader could go on with
its job.

Stuart Sierra

Apr 28, 2012, 10:34:47 AM
to clo...@googlegroups.com
Yes, I've been considering this.

Unknown tags could return some kind of "tagged object" that has the tag in its metadata. What I don't know is what the interface to that object should be.

-S

Steve Miner

Apr 28, 2012, 2:08:35 PM
to clo...@googlegroups.com
Fogus filed a bug about this a while back:

http://dev.clojure.org/jira/browse/CLJ-927

My suggestion is that an unknown tag should be read as a map with a couple of well-defined keys (such as :unknown-literal and :value) so that the program has a way to support data that it doesn't understand. This way the program has a chance to recreate the literal representation of the data so it can be passed on to some other process.

#unknown [1 2]
;=> {:unknown-literal unknown :value [1 2]}

Maybe the unknown tag should go on the metadata instead of in the returned value, but I was thinking that I actually want different unknown tags to be unequal. #unk1 [1 2] and #unk2 [1 2] should not accidentally be considered the same data value if the tags are unknown.
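
A minimal sketch of the map representation suggested above. It assumes a later Clojure (1.5 or newer), where clojure.edn/read-string accepts a :default handler of two arguments (the tag and the value); nothing like this exists in 1.4. The names unknown->map and read-lenient are hypothetical.

```clojure
(require '[clojure.edn :as edn])

(defn unknown->map
  "Hypothetical handler: wrap an unrecognized tag and its value in a map."
  [tag value]
  {:unknown-literal tag :value value})

;; Assumption: Clojure 1.5+; :default is called for tags with no registered reader.
(def read-lenient
  (partial edn/read-string {:default unknown->map}))

(read-lenient "[:a #foo/bar [1 2] :b]")
;=> [:a {:unknown-literal foo/bar, :value [1 2]} :b]

;; Because the tag is part of the value, different unknown tags stay unequal.
(= (read-lenient "#unk1 [1 2]") (read-lenient "#unk2 [1 2]"))
;=> false
```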

Steve Miner
steve...@gmail.com

kovas boguta

Jul 16, 2012, 7:03:00 PM
to clo...@googlegroups.com, the.stua...@gmail.com
Bumping this thread, since this is becoming more of a blocker for my
efforts with Session.

I think there are three operations you want to do on the unknown
literal wrapper:
1. print/serialize it (this would just output the original string you read in)
2. get the tag name
3. get the uninterpreted data structure

I think the wrapper should be a datatype rather than an ordinary
collection. That way, if you try to manipulate it, it will fail
immediately and in an obvious way. If it is a collection, there is a
potential for conflict if the "intended" tagged literal interpretation
implements some of the same protocols as the collection.

The minimal protocol could look like
(defprotocol IUnknownLiteral (literal-name [this]) (literal-data [this]))

and then you would simply have
(deftype UnknownLiteral [name data]
IUnknownLiteral
(literal-name [this] name)
(literal-data [this] data))

(In the cljs case it can also implement the printing protocol (which
doesn't exist in clj), though that's an implementation detail.)
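
A rough sketch of how the three operations listed above could look on the JVM side. It repeats the protocol and type from this message so it runs on its own. Hooking printing through the print-method multimethod is an assumption about how the clj side might do it, not a settled design.

```clojure
(defprotocol IUnknownLiteral
  (literal-name [this])
  (literal-data [this]))

(deftype UnknownLiteral [name data]
  IUnknownLiteral
  (literal-name [this] name)
  (literal-data [this] data))

;; Assumption: in clj, printing is customized via the print-method multimethod.
;; This writes the tag, a space, and then the wrapped data in its usual form.
(defmethod print-method UnknownLiteral [x ^java.io.Writer w]
  (.write w (str "#" (literal-name x) " "))
  (print-method (literal-data x) w))

(def x (UnknownLiteral. 'foo/bar [1 2]))

(pr-str x)        ;=> "#foo/bar [1 2]"
(literal-name x)  ;=> foo/bar
(literal-data x)  ;=> [1 2]
```

Since this is a plain deftype, calling collection functions such as assoc on it throws right away, which is the fail-fast behavior argued for above.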

Other names could be: (Generic|Undefined|Inert)(Literal|Data|Tag)

The only conceptual issue is what to do with metadata.

Should the UnknownLiteral be allowed to have its own metadata? Or
should it attempt to delegate its metadata to whatever metadata was
originally associated with its literal-data?

If it has its own metadata, this will be lost upon re-serialization.
Delegating to the literal-data will not work when the literal-data is
not a type that supports metadata.

The middle ground is to allow it to have its own metadata, and upon
serialization, just attach it to its literal-data, and upon
deserialization you get what you get. This may be the best option, but
this feels like a pretty hypothetical question.

It might be best to just not do anything with metadata right now and
decide later once there is some experience.

Thoughts?

kovas boguta

Jul 16, 2012, 7:35:14 PM
to clo...@googlegroups.com, the.stua...@gmail.com
The special but important case of compiling clojurescript also raises
some issues.

When clojurescript source code is read, this is done by clojure, and
thus the clojure tag readers apply. Currently they are re-bound to
read the tag literals into clojurescript source code which, when
executed, create the corresponding object.

There are two problems with this:
1. you have the definition of the cljs tags in 2 places, 1 in clj
during compile, and 1 in cljs during runtime
2. you have to manually juggle which readers clj should use when (this
can be a real problem for the cljs toolchain)

The root problem is that there is no representation of tagged literals
other than as strings, which is the problem we are trying to solve
here.

Instead of serializing objects directly into e.g. #inst "2012-01-01",
they could first be converted into a GenericLiteral type, and then
appropriately dealt with by the cljs emitter (which will emit code to
resolve the tag name at runtime).

So the procedure for creating user-defined tagged literals could be:
a) implement a function that takes data and returns the constructed
datastructure, b) implement a protocol to generate a GenericLiteral
from the constructed datastructure.

In addition to solving the above problems, it is simpler, easier, and
less error prone than forcing the user to assemble the string
themselves. It is also forward-compatible with more compact binary
representations of clojure data.
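
The two-step procedure (a, b) might look roughly like this. Every name here (GenericLiteral, ToLiteral, ->literal, Point, read-point, the my/point tag) is made up for illustration; it is a sketch of the shape, not a proposed API.

```clojure
;; Hypothetical in-memory representation of a tagged literal: a tag plus plain data.
(defrecord GenericLiteral [tag data])

;; Step b: a protocol that turns a constructed value back into a GenericLiteral.
(defprotocol ToLiteral
  (->literal [x]))

;; An example user-defined type.
(defrecord Point [x y])

;; Step a: a function from the literal's data to the constructed value.
(defn read-point [[x y]]
  (->Point x y))

(extend-protocol ToLiteral
  Point
  (->literal [p] (->GenericLiteral 'my/point [(:x p) (:y p)])))

(->literal (->Point 1 2))
;; a GenericLiteral with :tag my/point and :data [1 2]

;; Round trip: literal data -> value -> literal data, with no strings involved.
(read-point (:data (->literal (->Point 1 2))))
;; a Point with :x 1 and :y 2
```

An emitter (text, cljs source, or a binary format) would then only need to know how to write a GenericLiteral.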

Stuart Sierra

Jul 16, 2012, 9:40:21 PM
to clo...@googlegroups.com, the.stua...@gmail.com
Hi Kovas,

I considered the problem of what to do with undefined tags when I implemented this, but I didn't have a clear idea of what the result type should be, so I ignored it.

I also didn't know what to do with the metadata. For example, on the JVM, you can't put metadata on Java types like String or Date. If there's metadata on the tagged type, do you try to preserve it? For that matter, what does the reader do now if you try to put metadata on a tagged literal that evaluates to a type which doesn't accept metadata? I assume the reader blows up there too.

Other similar tagging systems allow undefined tags to be passed through, and I think this is the right approach. You can represent any tagged literal as a <symbol, data> pair, and this pair deserves its own type. I'd call it TaggedData. It should implement the interface that handles metadata (IObj in Clojure on the JVM I think).
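
As a quick illustration of the <symbol, data> pair: a defrecord already implements the metadata interface and value equality, so a one-line sketch covers both points. Using a record is an assumption made here for brevity. A record also behaves like a map, which the earlier message argues against, so a deftype with a hand-written IObj implementation would be the stricter alternative.

```clojure
;; Sketch only: TaggedData as a record holding the tag symbol and the raw data.
(defrecord TaggedData [tag data])

;; Records support metadata out of the box.
(def td (with-meta (->TaggedData 'foo/bar [1 2]) {:line 42}))
(meta td)
;=> {:line 42}

;; Metadata does not affect equality, but the tag does.
(= td (->TaggedData 'foo/bar [1 2]))
;=> true
(= (->TaggedData 'unk1 [1 2]) (->TaggedData 'unk2 [1 2]))
;=> false
```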

I like the idea of being able to define how tagged literals should be emitted without mucking about with strings. At first glance, your approach seems reasonable.

Are you on Clojure/dev? To move this forward, you can put a design proposal on dev.clojure.org. Keep in mind that tagged literals went through months of discussion before the implementation: http://dev.clojure.org/pages/viewpage.action?pageId=950382
-S

kovas boguta

Sep 18, 2012, 11:20:58 PM
to clo...@googlegroups.com
So the edn spec gives the following guidelines:
"If a reader encounters a tag for which no handler is registered, the
implementation can either report an error, call a designated 'unknown
element' handler, or create a well-known generic representation that
contains both the tag and the tagged element, as it sees fit. Note
that the non-error strategies allow for readers which are capable of
reading any and all edn, in spite of being unaware of the details of
any extensions present."

https://github.com/edn-format/edn

It would suffice then to have *unknown-element-handler*, and avoid
specifying the "well-known generic representation" for now if people
want to delay decisions on metadata and protocols.
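
For reference, a sketch of what such a handler var looks like in use. It assumes Clojure 1.5 or newer, which provides a dynamic var named *default-data-reader-fn* playing this role; the *unknown-element-handler* name above is only the placeholder used in this message. The pass-through function is hypothetical.

```clojure
(defn pass-through
  "Hypothetical handler: keep the tag and the unread form side by side."
  [tag value]
  {:tag tag :form value})

;; Assumption: Clojure 1.5+. When no reader is registered for a tag, the
;; reader calls the function bound to *default-data-reader-fn* with the
;; tag symbol and the already-read value, instead of throwing.
(def result
  (binding [*default-data-reader-fn* pass-through]
    (read-string "[:a #foo/bar [1 2] :b]")))

result
;=> [:a {:tag foo/bar, :form [1 2]} :b]
```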

Steve Miner

Sep 19, 2012, 11:49:35 AM
to clo...@googlegroups.com
I added the following comment to CLJ-927 as a possible solution.

It would be convenient if I could handle unknown tags using some sort of catch-all key in data-readers (say 'default). The associated function should take two arguments: the tag and the literal value. If there is no 'default key in data-readers, then an error would be thrown (same as Clojure 1.4).

I think it's a simple way to allow the programmer to take control without having to add new API or data types. It's just one distinguished key ('default, :default, or something like that) and one additional line of doc.

I have a patch that I'm planning to submit if dev people indicate support for it. The alternative of binding another dynamic var as the unknown tag handler would also work for me.
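
To make the catch-all proposal concrete, here is a small model of the lookup written as an ordinary function. It does not touch the real reader; read-tagged and the readers map are hypothetical stand-ins for what the reader would do with data-readers.

```clojure
(defn read-tagged
  "Model of the proposed dispatch: use the tag's own reader if present,
  else the 'default entry (called with tag and value), else throw as in 1.4."
  [readers tag value]
  (if-let [f (get readers tag)]
    (f value)
    (if-let [d (get readers 'default)]
      (d tag value)
      (throw (RuntimeException. (str "No reader function for tag " tag))))))

;; Hypothetical data-readers map with one known tag and a catch-all.
(def readers
  {'my/double (fn [v] (* 2 v))
   'default   (fn [tag v] {:unknown-literal tag :value v})})

(read-tagged readers 'my/double 21)
;=> 42
(read-tagged readers 'foo/bar [1 2])
;=> {:unknown-literal foo/bar, :value [1 2]}
```

Without the 'default entry, the second call throws the same "No reader function for tag" error as Clojure 1.4.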

Steve Miner
