source configurations: data_readers, conditional reading


Herwig Hochleitner

Oct 1, 2015, 1:24:30 AM
to cloju...@googlegroups.com
The details of how source code should be read are set up by the runtime / REPL. There doesn't seem to be a coherent concept of how those options can be customized in a project, so supporting them is left ad hoc to build tools (which mostly don't).

Let's discuss how we can replace data_readers with a more general and composable mechanism for setting up your source files.
If you are not sure whether this is desirable, please read [1] where I posted the paragraphs about the issues of data_readers after it got too long for the mail.

I have two angles of thinking about it:
- generalize data_readers and make it composable, for discussion see [2]
- enriching and configuring the ns clause

Ideally those approaches should converge into a coherent configuration system for all of cljc, skinnable enough for DSL folders with custom file suffixes as well as for regular projects embedding those DSLs.

I'd love to hear your thoughts

happy hammocking

Thomas Heller

Oct 1, 2015, 4:34:24 AM
to Clojure Dev
I was having problems with exactly this issue recently and came to the conclusion that reader literals SHOULD NOT be used in source code. They are for data transfer, not source code.

I implemented some date/time related tags a while back since I needed to transfer those between the frontend and backend. I quite liked the look of things and started using those tags in my source code, eg. I have a scheduler in my app which would accept args like

{:start #time/local-time [2 0]
:every #time/duration {:hours 24}}

This is the same format I used in edn config files, so it looked natural to also use it in code. Some data_readers and REPL trickery later, everything works out OK, except when it doesn't. I eventually gave up debugging why it sometimes doesn't work, but to be honest reader literals are a bad idea to use in code. It became very obvious that there was also no point in using them in the first place. If you use actual code (which also sets up :require correctly, not hidden somewhere), things look almost identical and some issues go away completely, since those things are now constructed when the code is EXECUTED, not when it is READ.

{:start (time/local-time 2 0)
:every (time/duration {:hours 24})}
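The read-time versus run-time difference described above can be sketched as follows. This is only an illustration, not code from the thread: `duration` is a hypothetical stand-in that returns milliseconds, and the tag is installed by binding `*data-readers*` around a `read-string` call.

```clojure
;; Hypothetical stand-in for a duration constructor: returns milliseconds.
(defn duration [{:keys [hours]}]
  (* hours 60 60 1000))

;; Tagged literal: the reader calls `duration` while READING, so the
;; resulting data structure already contains the constructed value.
(def read-form
  (binding [*data-readers* {'time/duration duration}]
    (read-string "{:every #time/duration {:hours 24}}")))

;; Plain call: the reader only produces a list; nothing is constructed
;; until that list is evaluated as code.
(def code-form
  (read-string "{:every (duration {:hours 24})}"))

(:every read-form) ; a number, built at read time
(:every code-form) ; the unevaluated list (duration {:hours 24})
```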

Anyways, that is my experience and I have since completely stopped using tagged literals in code. To be honest I'd call data_readers.clj a mistake but YMMV.

Just my 2 cents,
/thomas

Herwig Hochleitner

Oct 1, 2015, 12:21:49 PM
to cloju...@googlegroups.com
Thomas, your stance of "no reader tags in source" seems to be somewhat influenced by your negative experience. I wonder if you would conclude the same, had reader tags been perfectly integrated.
IOW what fault do you see in the principle of using reader tags in source? Isn't source code a means of "data transfer" into the compiler?

I think we agree that 1. homoiconicity is desirable because of simplicity 2. homoiconicity means that the language is expressed in its own data model.
My hypothesis goes: homoiconicity should also mean that **all** of the data model is available in the language, i.e. clojure needs to remain a proper superset of edn.

More concretely: A tagged-literal (vs a list) denotes a _constant_ in source code, which is tangibly different from its unevaluated constructor form, in a strict, dynamic language.
One possible point of a t-l is to convey: "This form can be fully understood in a static context. A compiler is free to constant-pool it, preallocate it in static memory; otherwise unaware readers may still derive a bloom filter [1] for it"

I agree that data_readers was a mistake, but only in its current design, not in its spirit of enabling reader tags in source.
The design mistake being, that, because it's just a runtime-global configuration, it doesn't facilitate composition of **source code** trees, but this is precisely what happens when you put uncompiled clojure libraries on the classpath.

best regards

[1] Equality in a boxed t-l implies equality of the runtime representation, but unequal t-l might still be equal at runtime (= #date "2000" #date [2000])
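The footnote's point about boxed tagged literals can be tried out with the generic `tagged-literal` box that Clojure 1.7 provides. This is a sketch only; `#date` is the hypothetical tag from the footnote, and no reader function for it is assumed to be installed.

```clojure
;; Read unknown tags into generic TaggedLiteral boxes instead of failing.
(defn read-boxed [s]
  (binding [*default-data-reader-fn* tagged-literal]
    (read-string s)))

(def a (read-boxed "#date \"2000\""))
(def b (read-boxed "#date \"2000\""))
(def c (read-boxed "#date [2000]"))

(= a b)   ; same tag and same form: the boxes are equal
(= a c)   ; different forms: the boxes differ, even if a real reader
          ; for #date would map both to the same runtime value
(:tag a)  ; the symbol date
(:form c) ; the vector [2000]
```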

Thomas Heller

Oct 1, 2015, 1:39:23 PM
to Clojure Dev
I should note that I actually used data literals in my code pretty much since 1.4 and it all worked as expected once I had everything configured. I recently reorganized my projects a bit and somehow messed up the config for the reader stuff which made it not work anymore. I just didn't feel like trying to get that working again and settled on rewriting the literals I used to instead use actual code.

The problem with tagged literals in source as I see it is the complexity they introduce for very little gain. (none actually but that is my bias)

Take my example: I basically just grabbed the #time/duration literal and called it "mine". What happens when clj-time (or other libraries in that domain) suddenly wants to do the same? Sure we could use fully qualified names but since we do not have the aliasing facilities the ns macro provides that would probably mean more typing. Also since the namespace I use the tagged literal in does not declare that it depends on that functionality we end up with something hidden in our dependency tree which is a problem in itself.

Your suggestion in the gist of moving the data_readers file into a directory that corresponds does not solve the conflict problem, it is just somewhere else now.

Part of what makes data literals so awesome is that I can redefine what a tag means at read time. I was totally amazed when I migrated my project to JDK8 and started using the new java.time stuff, I had some EDN data in my database (with #inst) but didn't want to go through it all and neither did I want to juggle both java.util.Date and java.time. So I just changed how the reader handled the "inst" tag and I had the new data types everywhere. No other changes needed. As awesome as that experience was, I honestly would not want someone to be able to change what my code means just because the user decided to mess with some data literal options.
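Redefining a tag at read time, as described above, might look roughly like this. It is a sketch using the Clojure reader and `*data-readers*`; the thread does not show the actual code, and the timestamp is made up.

```clojure
;; By default the Clojure reader turns #inst into a java.util.Date.
(def legacy (read-string "#inst \"2015-10-01T00:00:00Z\""))

;; Rebinding the tag at read time yields a java.time.Instant instead
;; (requires JDK 8+). The stored text itself does not change.
(def modern
  (binding [*data-readers* {'inst (fn [s] (java.time.Instant/parse s))}]
    (read-string "#inst \"2015-10-01T00:00:00Z\"")))
```

The same mechanism applies to any source text read while the binding is in effect, which is the concern raised here.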

You are introducing new data types to the _compiler_ that it does not know or understand. Not sure how much of an issue that is in Clojure, but I know that this would be a big issue in CLJS. Having spent quite a bit of time in the CLJS compiler internals, I can say that I definitely would not want that complexity there. As a "tool" author I can definitely say that I do not want anything that makes handling code more complex.

While my opinion is clearly biased I would not consider it to be so because something negative happened. I honestly do not see anything that even a "perfect" integration would provide? What do you gain (or hope to) with regards to tagged literals in code?

cheers,
/thomas

Herwig Hochleitner

Oct 2, 2015, 7:38:02 AM
to cloju...@googlegroups.com
TLDR: let's keep focussing on how we can improve the situation, given that tagged literals in source are here to stay (also, #IMO #my2c #lol #xoxo ^™ a-good-idea)

2015-10-01 19:39 GMT+02:00 Thomas Heller <th.h...@gmail.com>:
I should note that I actually used data literals in my code pretty much since 1.4 and it all worked as expected once I had everything configured. I recently reorganized my projects a bit and somehow messed up the config for the reader stuff which made it not work anymore.

You're making my point here. The way we configure those things should be more robust, so that we can enjoy just the "works as expected" part.
 
I just didn't feel like trying to get that working again and settled on rewriting the literals I used to instead use actual code. 

The problem with tagged literals in source as I see it is the complexity they introduce for very little gain. (none actually but that is my bias)

How do you feel about the conceptual benefit of using all of edn in source files? (are we even arguing this? edn is supposed to be a subset)
I'd like to think of this discussion as a quest for simplification, because I believe that most of the complexity you mention is incidental.

Take my example: I basically just grabbed the #time/duration literal and called it "mine". What happens when clj-time (or other libraries in that domain) suddenly wants to do the same?

I sketched a mechanism to enable global shorthands while preventing such collisions in thoughts on a solution :: shorthanding.
A good solution requires more thought. My best bet is to make the reader environment  configurable per namespace, with explicit importing and aliasing of fully qualified tags. Also it needs to be easy to share reader environments among namespaces. 
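One way to picture a per-namespace reader environment with explicit aliasing of fully qualified tags is the sketch below. Nothing like this exists in Clojure; `tag-registry`, `reader-env` and `read-in-env` are hypothetical names, and the reader functions are placeholders.

```clojure
;; Hypothetical global registry: fully qualified tag -> reader fn.
(def tag-registry
  {'com.example.time/duration (fn [{:keys [hours]}] (* hours 60 60 1000))
   'org.other.time/duration   (fn [s] [:other-duration s])})

;; A reader environment maps the short tags a namespace wants to write
;; onto fully qualified tags, much like :as aliases in the ns form.
(defn reader-env [aliases]
  (into {} (for [[short-tag full-tag] aliases]
             [short-tag (get tag-registry full-tag)])))

;; Read a string with only the given environment installed.
(defn read-in-env [env s]
  (binding [*data-readers* env]
    (read-string s)))

;; Two namespaces can use the same short tag without colliding.
(def env-a (reader-env {'time/duration 'com.example.time/duration}))
(def env-b (reader-env {'time/duration 'org.other.time/duration}))

(read-in-env env-a "#time/duration {:hours 24}")
(read-in-env env-b "#time/duration \"24h\"")
```

Sharing an environment among namespaces would then amount to referring to the same `env` value from each of them.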

Sure we could use fully qualified names but since we do not have the aliasing facilities the ns macro provides that would probably mean more typing.

I _was_ thinking about proposing an extension to accept tags like #::time/duration or even ##time/duration to piggyback on clojure's namespace aliasing, similar to how I did with my (as of yet unmerged) work on data.xml. However, with the reader environment being configurable per ns, this is not necessary.
 
Also since the namespace I use the tagged literal in does not declare that it depends on that functionality we end up with something hidden in our dependency tree which is a problem in itself.

The key is to only configure your own sub-namespace within a package, as dictated by general packaging discipline. Oh, the joys of a shared, mutable namespace.
Seriously though: The reader environment has to be passed along with the source code. We don't want to write down a reader-env more than once, so we need to be able to refer to it. And this has to fit within java's concept of a classpath, which implies that we can't naturally discover package boundaries and have to rely on the assumption that people only ship code in subfolders to which they personally own the corresponding domain name.

Your suggestion in the gist of moving the data_readers file into a directory that corresponds does not solve the conflict problem, it is just somewhere else now.

Yes, this place is called "the producer's responsibility". It's a better place than "source of entropy in the consumer's setup". To be clear: With the current mechanism, even a fully aware and cooperating producer _has no possibility_ of taking this burden off their consumers.

As awesome as that experience was, I honestly would not want someone to be able to change what my code means just because the user decided to mess with some data literal options.

That's the reason the configuration must be confined to either namespaces or namespace-subtrees. Also there are two very different cases, where either your users twiddle their reader tags and *unintentionally* affect your code, or, your users want to run your code with different reader tags because of reasons. I worry mainly about the first one. Your awesome experience seems to be rooted in the second one and I'm sure you'd grant your users the same.

You are introducing new data types to the _compiler_ it does not know or understand.

Arguably the compiler is the very piece of code that comes closest to *really understanding* any value in a language, since it formalizes the runtime behaviors for them and, in fact, all of the language. And if you think about it, the only things with special meaning to the compiler are sequences and symbols. Everything else evaluates to itself, hence only needs to be serializable.
 
Not sure how much of an issue that is in Clojure but I know that this would be a big issue in CLJS.

Last I looked, clojure used print-dup to embed arbitrary values during compilation. How is this a big issue in CLJS?
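The print-dup mechanism mentioned here can be observed on the JVM with a value that has a tagged-literal print form. A small sketch, covering Clojure only (it says nothing about how CLJS would handle this):

```clojure
(def d (java.util.Date. 0))

;; With *print-dup* bound to true, pr-str dispatches to print-dup, which
;; for java.util.Date emits an #inst tagged literal.
(def printed (binding [*print-dup* true] (pr-str d)))

;; Reading that text back reconstructs an equal value.
(def round-tripped (read-string printed))
```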
 
Having spent quite a bit of time in the CLJS compiler internals, I can say that I definitely would not want that complexity there. As a "tool" author I can definitely say that I do not want anything that makes handling code more complex.

While my opinion is clearly biased I would not consider it to be so because something negative happened. I honestly do not see anything that even a "perfect" integration would provide? What do you gain (or hope to) with regards to tagged literals in code?

So, you mentioned that you used to use tagged literals a lot, and still do so for "data" files. The reason you changed your opinion on using them in code seems to be that you found some disproportional complexity. It is not clear to me, how this complexity is essential to allowing tagged literals in source and not incidental in the current implementation.

I argue that any one of the following will lead to much (much) more complexity than a proper solution could ever incur:
- not closing the gap between aot and source distribution
- not closing the gap between clj and cljs
- bifurcating clojure and edn, i.e. not accepting all of edn as clojure source
- excluding reader tags from future versions of edn/clojure/cljs

When you maintain, that you'd rather not deal with the reading environment in your tool, then :read-cond :preserve and a generic tagged-literal box are for you.
However, tagged literals in source are already a thing and based on excluding the above possibilities, they are not going away. Even if you don't like tagged literals as a source feature in particular, you probably like the ecosystem enough to feel compelled to bring this feature towards its most "palatable" form, instead of leaving it as a wart to warn people off of.
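For tools that would rather not resolve anything, the `:read-cond :preserve` option and the generic tagged-literal box mentioned above can be combined as in this sketch. The `#time/duration` tag is the thread's example and is assumed to have no reader installed.

```clojure
;; Read source text without resolving reader conditionals or tags.
(def forms
  (binding [*default-data-reader-fn* tagged-literal]
    (read-string {:read-cond :preserve}
                 "[#?(:clj 1 :cljs 2) #time/duration {:hours 24}]")))

(def rc (first forms))  ; a preserved reader conditional
(def tl (second forms)) ; a generic tagged-literal box

(:form rc) ; the list (:clj 1 :cljs 2)
(:tag tl)  ; the symbol time/duration
(:form tl) ; the map {:hours 24}
```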

I hope it's now also clear how the compiler can handle tagged literals on the generic-box level, thus adding no significant complexity to the serialize-compiler-artefact -> initialize-runtime transition of the evaluator. However, properly supporting them will require rethinking their **resolution mechanism** in the evaluator. This is a bold statement, but I can show an example:

    Even if clojure is embedded and keeps its own context minimal, the current bunch of dynamic repl vars to set up the reader environment isn't expressive enough. At the very least, a hook into `require` would be necessary (monkey patching the var doesn't count), because otherwise build tools have no chance to preserve the reader environment for a library across inclusion in a project. That's because require is recursive and crosses reader-env boundaries, so during a require the reader-env needs to be updated before each specific source file is read as part of load.
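What "updating the reader-env before reading a specific source file" means can be approximated today with a dynamic binding around a load. This is a sketch with hypothetical names (`load-with-reader-env`, the `#lib/kw` tag); it is not a proposed API.

```clojure
;; Hypothetical helper: evaluate one source text under its own reader
;; environment. The binding is undone when the load returns.
(defn load-with-reader-env [env source]
  (binding [*data-readers* (merge *data-readers* env)]
    (load-string source)))

;; Hypothetical tag owned by a library.
(def lib-env {'lib/kw keyword})

(def result
  (load-with-reader-env lib-env "(name #lib/kw \"hello\")"))

;; Caveat: any require triggered inside `source` would inherit this
;; binding too, which is exactly the boundary-crossing problem above.
```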

Why not add support for specifying the reader config per namespace, within or without the ns clause, so build tools won't have to hook require in the first place?

Thomas Heller

Oct 2, 2015, 9:15:19 AM
to Clojure Dev
I'm totally for improving the situation but you'll first need to convince me that there is an actual problem this solves.

Can you provide an example (code) where any of what you are proposing is illustrated? I see things as they are in my world right now, your world probably looks different and you have a use-case that I cannot think of now. In my world I just don't see a problem that would be solved by tagged literals for CODE.

The tool I was referring to is shadow-build [1] which is a build library for CLJS. It interfaces very directly with the CLJS reader, analyzer and compiler and does some things very differently than cljs.closure. Integrating support for "custom" tagged literals, even on a per ns basis, is actually not hard. Getting the compiler to recognize them however is very hard. Reading&analysis happens in Clojure, the compiler then emits javascript. Say we now try to compile #time/duration, which results in a java.time.Duration. This class does not exist in Javascript and you'd somehow need to teach the analyzer&compiler how to construct something else in its place. If you'd just use (time/duration ...) in your code, the compiler does not need to know anything at all. It just creates a normal function call and the rest happens at runtime.

Again, which problem do you see in CODE that tagged literals solve?

To be honest I think that transit has a far better story for data transfer than EDN, since the "write" part of EDN currently is horrible. A global multi-method with a default handler does not make a robust system if it prints non-EDN data without complaining. But let's not get into a data discussion here, let's keep things on topic about code.

cheers,
/thomas


[1] https://github.com/thheller/shadow-build

Herwig Hochleitner

Oct 2, 2015, 3:39:52 PM
to cloju...@googlegroups.com
2015-10-02 15:15 GMT+02:00 Thomas Heller <th.h...@gmail.com>:
I'm totally for improving the situation but you'll first need to convince me that there is an actual problem this solves.

I'm sorry, I think you're misunderstanding my intentions. I'm looking for constructive discussion, not for people to convince.

Can you provide an example (code) where any of what you are proposing is illustrated?

I'm not proposing anything yet, so I don't have any example code. I'm still struggling with a coherent definition of the various problems we have with configuring the source. Will you help me find such a definition?

The tool I was referring to is shadow-build [1] which is a build library for CLJS. It interfaces very directly with the CLJS reader, analyzer and compiler and does some things very differently than cljs.closure. Integrating support for "custom" tagged literals, even on a per ns basis, is actually not hard. Getting the compiler to recognize them however is very hard. Reading&analysis happens in Clojure, the compiler then emits javascript.

Good point there! The compiler can't assume that the runtime will use the same representation as itself, especially during cross compilation. That's an important design constraint.
This also means that my previous statement about the requirement of being serializable, is wrong. The compiler can't directly serialize literals, but it has to work from the semantics of embedding #time/duration "42ms" as `(binding [*data-readers* ~target-source-readers] (read-string "#time/duration\"42ms\""))
 
Say we now try to compile #time/duration, which results in a java.time.Duration. This class does not exist in Javascript and you'd somehow need to teach the analyzer&compiler how to construct something else in its place. If you'd just use (time/duration ...) in your code, the compiler does not need to know anything at all. It just creates a normal function call and the rest happens at runtime.

Well, the compiler would know that java.time.Duration *prints* as #time/duration "...", and therefore it can know how to reconstruct the value at runtime. That implies that the compiler needs two matching sets of reader-tags when compiling AOT/CLJS: one for the source code representation (passed to macros and such) and one for the target representation. Do you agree that this would solve the issue?
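The idea of two matching sets of reader tags can be sketched as a host-side read that keeps literals boxed, plus a target-side table that says how to construct them. All names here (`read-source`, `target-ctors`, `emit-literal`, `my.time/duration`) are hypothetical; neither the Clojure nor the CLJS compiler works this way.

```clojure
;; Host side: keep literals as generic boxes while reading source.
(defn read-source [s]
  (binding [*default-data-reader-fn* tagged-literal]
    (read-string s)))

;; Target side (hypothetical): tag -> symbol of a constructor fn that
;; is assumed to exist in the target runtime.
(def target-ctors {'time/duration 'my.time/duration})

;; Rewrite a boxed literal into a constructor call for the target.
(defn emit-literal [tl]
  (list (get target-ctors (:tag tl)) (:form tl)))

(def emitted (emit-literal (read-source "#time/duration \"42ms\"")))
;; emitted is the form (my.time/duration "42ms")
```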

Thomas Heller

Oct 3, 2015, 6:11:22 AM
to Clojure Dev

I'm sorry, I think you're misunderstanding my intentions. I'm looking for constructive discussion, not for people to convince.

Isn't it the point of any good discussion to convince other people that their point of view might not be "correct"? I'm all for exploring new ideas and concepts, maybe I just don't see something obvious yet. I always like to learn new things, I have many opinions but in truth I know nothing. ;)



I'm not proposing anything yet, so I don't have any example code. I'm still struggling with a coherent definition of the various problems we have with configuring the source. Will you help me find such a definition?

I don't see any issues with configuring the source at all, except for data_readers, which I think should be removed, as I said before. That is why I'm very curious to learn what problems you think are addressed by data_readers, or anything else in that domain.

 

Good point there! The compiler can't assume that the runtime will use the same representation as itself, especially during cross compilation. That's an important design constraint.
This also means that my previous statement about the requirement of being serializable, is wrong. The compiler can't directly serialize literals, but it has to work from the semantics of embedding #time/duration "42ms" as `(binding [*data-readers* ~target-source-readers] (read-string "#time/duration\"42ms\""))

That would be problematic with advanced optimizations and would also have horrible runtime performance since you are basically parsing this over and over and over again if you use a literal in a function body. One could try to write a compiler pass that tries to extract those and turns them into real "constants" but for what gain?

 
 
Say we now try to compile #time/duration, which results in a java.time.Duration. This class does not exist in Javascript and you'd somehow need to teach the analyzer&compiler how to construct something else in its place. If you'd just use (time/duration ...) in your code, the compiler does not need to know anything at all. It just creates a normal function call and the rest happens at runtime.

Well, the compiler would know that java.time.Duration *prints* as #time/duration "...", and therefore it can know how to reconstruct the value at runtime. That implies that the compiler needs two matching sets of reader-tags when compiling AOT/CLJS: one for the source code representation (passed to macros and such) and one for the target representation. Do you agree that this would solve the issue?

This leaves way too many things for the compiler to worry about instead of letting the user just do it in code. Configuring a CLJS build is already way too complex, letting the user work this out would not be exactly user-friendly. I prefer to work with code rather than configuring a tool, which is also why using shadow-build is just writing clojure and not a config file. ;)

The code reusability problem is sort of addressed with Reader Conditionals. I don't quite like those BUT they are the best solution so far. I accept their complexity as a necessary evil, but I try to minimize their use as much as possible. Clojure code can already be incredibly dense, and in some situations it makes things much more difficult to think about the CLJ implementation while also looking at the CLJS implementation at the same time. Just look at [1]: I have a very hard time figuring out what the ns form for CLJ looks like. I fear that Reader Conditionals will make code much less maintainable over time, since you have to consider all implementations at all times. Looking at the recent self-host work in CLJS, I'm worried that it will slow things down considerably in the future, or that things will break on one side or the other in ways directly related to Reader Conditionals. While the end goal certainly has value and is interesting, I wonder if the price of the chosen implementation is too high, but time will tell and I hope things work out well.

I don't see tagged literals in code improving that situation which is why I don't think we should have them. 

Anyways, still curious to learn why you think they are such a great idea.

Cheers,
/thomas

Herwig Hochleitner

Oct 4, 2015, 8:42:38 PM
to cloju...@googlegroups.com
2015-10-03 12:11 GMT+02:00 Thomas Heller <th.h...@gmail.com>:
Isn't it the point of any good discussion to convince other people that their point of view might not be "correct"?

I disagree and I think this narrow focus on one possible outcome of a discussion is actually harmful to the process.

I don't see any issues with configuring the source at all, except for data_readers which I think should be removed as I said before.
 
Before I can think about supporting that approach, I'll have to be clear about a few things:
- How would you go about removing data_readers?
  - Would you leave in the current default reader tags for #inst, #uuid, ...?
    - If so, why only those?
- Would you, on the same grounds, also reject support for configuring custom read-cond flags?

The compiler can't directly serialize literals, but it has to work from the semantics of embedding #time/duration "42ms" as `(binding [*data-readers* ~target-source-readers] (read-string "#time/duration\"42ms\""))

That would be problematic with advanced optimizations and would also have horrible runtime performance since you are basically parsing this over and over and over again if you use a literal in a function body.

The key phrase is "work from the semantics of"

That implies that the compiler needs two matching sets of reader-tags, when compiling AOT/CLJS: One for the source code representation (passed to macros and such) and one for the target representation. Do you agree that this would solve the issue?

This leaves way too many things for the compiler to worry about instead of letting the user just do it in code. Configuring a CLJS build is already way too complex, letting the user work this out would not be exactly user-friendly. I prefer to work with code rather than configuring a tool, which is also why using shadow-build is just writing clojure and not a config file. ;)

I take it your response means: "Yes, that would solve it, but I still don't like the issue and therefore I don't like the solution either"?

Now to get it off my chest, I'll not answer to the rest of your email linewise, because AGAIN, I don't want to discuss whether reader tags, reader conditionals or any other feature already in clojure "are a good idea". I can only point you to the previous discussion that went into designing those features.

What I'd like to discuss is, given the form of clojure source code as allowed now, how to best support configuring it.
My justification is that configuration is necessary, because there needs to be an explicit API for build tools (which includes the repl).
It's easy to show that the current API, comprising load, require and several dynamic vars, is insufficient as soon as configuration boundaries are crossed.
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------

Look, I too am worried by the combinatoric explosion that feature-cond and tagged-literals might cause in the behaviors of source, I really am. But for the sake of thinking about their implementation, we need to suspend our disbelief and focus on the design. It's not possible to genuinely assess their value based on a broken implementation. Would you agree?

Thomas Heller

Oct 5, 2015, 5:58:34 AM
to Clojure Dev


Now to get it off my chest, I'll not answer to the rest of your email linewise, because AGAIN, I don't want to discuss whether reader tags, reader conditionals or any other feature already in clojure "are a good idea". I can only point you to the previous discussion that went into designing those features.


I'm well aware of the work that went into reader tags or reader conditionals and mostly agree with all of them. I'm still critical about their implementation and "impact" however, which is why I started engaging in this discussion in the first place.

I started advocating to remove data_readers, then we derailed into reader conditionals and other things.

So let me try to clarify my initial point of view on this topic:

clojure.core/read-string and clojure.edn/read-string currently share a lot of implementation details but already are considerably different.

clojure.core/read-string (which under the hood uses the same reader clojure uses to parse source code) supports read-cond, tagged-literals, metadata, #=(eval calls) via *read-eval*, #my.Record. {constructor calls} and probably more.

clojure.edn/read-string only supports tagged-literals.

I'm advocating removing tagged-literals from clojure.core/read-string, which is currently the only place data_readers is used, so there is no need to configure them in the first place. I'm not changing clojure.edn; that remains exactly as it is now. I'm under the impression that clojure.edn already ignores data_readers, but I just realized that might not be the case. It doesn't change anything though, as read handlers should be a function argument, not something hidden on the classpath.
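The difference between the two readers with respect to tag configuration can be checked directly. In this sketch `duration` is a hypothetical reader function returning milliseconds; the point is only where each reader looks for it.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical reader fn for the thread's example tag.
(defn duration [{:keys [hours]}]
  (* hours 60 60 1000))

(def input "#time/duration {:hours 24}")

;; clojure.core/read-string consults the dynamic var *data-readers*.
(def via-core
  (binding [*data-readers* {'time/duration duration}]
    (read-string input)))

;; clojure.edn/read-string takes its readers as a function argument.
(def via-edn
  (edn/read-string {:readers {'time/duration duration}} input))

;; Binding *data-readers* has no effect on the edn reader: without a
;; :readers entry (or a :default fn) the unknown tag is an error.
(def edn-ignores-binding
  (binding [*data-readers* {'time/duration duration}]
    (try (edn/read-string input)
         (catch Exception _ :no-reader))))
```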

With removing tagged-literals from clojure.core/read-string the need to configure the compiler and build tools completely goes away, which is arguably the simplest solution. We _could_ implement several solutions to "properly" support tagged-literals in clojure.core/read-string, I'm arguing the point that we _should not_ do that. 

I'm trying to understand your opinion on tagged-literals in clojure.core/read-string, as you clearly see value in them, which I do not. Please beware the distinction between the 2 readers I make here. One reads code, one reads data. clojure.core/read-string should never be used to read data, which is why clojure.edn was created in the first place.

I apologize if we are talking about different things or I did not make my point clear enough earlier.


Cheers,
/thomas

Herwig Hochleitner

Oct 5, 2015, 9:55:25 AM
to cloju...@googlegroups.com
2015-10-05 11:58 GMT+02:00 Thomas Heller <th.h...@gmail.com>:
clojure.core/read-string and clojure.edn/read-string currently share a lot of implementation details but already are considerably different.

Yes, in that clojure.edn/read-string reads a subset of the language accepted by clojure.core/read-string.

clojure.core/read-string (which under the hood uses the same reader clojure uses to parse source code) supports read-cond, tagged-literals, metadata, #=(eval calls) via *read-eval*, #my.Record. {constructor calls} and probably more.

clojure.edn/read-string only supports tagged-literals.

Well, all of the above could be read with basic tagged-literal machinery, but I get your point.

I'm advocating removing tagged-literals from clojure.core/read-string, which is currently the only place data_readers is used, so there is no need to configure them in the first place.

You ignored 2 crucial questions: 

So what about the current default data-readers, like #inst, #uuid, ... would those stay?
And if so, why only those?

With removing tagged-literals from clojure.core/read-string the need to configure the compiler and build tools completely goes away, which is arguably the simplest solution.

You ignored another crucial question:

Would you, on the same grounds, also reject support for configuring custom read-cond flags?
 
I'm trying to understand your opinion on tagged-literals in clojure.core/read-string, as you clearly see value in them, which I do not.

One word, arguably a simple one, I must have mentioned it like 100 times now: subset
 
Please beware the distinction between the 2 readers I make here. One reads code, one reads data.

I've been aware of that distinction you're making, and painfully so.
 
clojure.core/read-string should never be used to read data,

OK, now that statement would be laughable in any programming language. For a lisp, it crosses over into the firmly absurd.
 
which is why clojure.edn was created in the first place.

It was created because we wanted a formalized, controllable subset of code, for data read from untrusted peers.
Using the clojure reader on trusted data is completely fine.

I apologize if we are talking about different things or I did not make my point clear enough earlier.

Well, I think your point is clear now. From what I gather, you're talking about separation of code and data to avoid dealing with data as code. I think this is unacceptable, and I want to stop talking about it. Still, if you can answer just my 5 questions about your point, I'll continue that line of thought on the chance that I'm missing something:

- How would you go about removing data_readers? (You partially answered this, but not the sub-questions)
  - Would you leave in the current default reader tags for #inst, #uuid, ...?
    - If so, why only those?
- Would you, on the same grounds, also reject support for configuring custom read-cond flags?
- Would it be an error then, to return data, read by edn/read-string, from a macro, or passing it to eval by any other means? 

Alex Miller

Oct 5, 2015, 11:43:40 AM10/5/15
to Clojure Dev
Can we back this discussion up to the beginning and describe an actual problem we are trying to fix? 

Thomas reported that he ran into trouble with using tagged literals in code but did not describe what those problems actually were.

One specific issue mentioned was namespacing ("I grabbed time/..." but this could conflict) - the answer to this is to use your own less-general namespace. (I think grabbing "time" as a namespace is a bit presumptuous. Good thing you didn't also grab "space". :) 

All data_readers.clj on the classpath are read and combined. However, the general recommendation in the past is that libraries should not include them and should instead define how to install tagged literal readers in a single application level map. 

I feel like this thread has gone way down a path of speculation about specific things (some of which are simply not even on the table). Can we please re-orient it to describing a specific problem with tagged literals etc as they exist now?

Thomas Heller

Oct 5, 2015, 11:56:26 AM10/5/15
to Clojure Dev

OK, now that statement would be laughable in any programming language. For a lisp, it crosses over into the firmly absurd.

Yes, code is data. Absolutely no argument there. We are getting to the point where the words we use make things very complicated.

Since you keep using the word "subset":

Code is a "subset" of data composed of lists, vectors, keywords, symbols, etc. It does not include java.util.Date, java.util.UUID, clojure.lang.Atom or anything like that (IMHO).

Data includes just about everything, it is not a subset of code.


- How would you go about removing data_readers? (You partially answered this, but not the sub-questions)
  - Would you leave in the current default reader tags for #inst, #uuid, ...?
    - If so, why only those?

I do not know how I'd go about removing this stuff but I would remove everything.
 
- Would you, on the same grounds, also reject support for configuring custom read-cond flags?

No, please do not mistake anything I say about tagged-literals to apply to read-cond. read-cond is special and can do things tagged-literals can not do (ie. #?@ splicing). All for configuring those although I do not see why you'd want that.
 
- Would it be an error then, to return data, read by edn/read-string, from a macro, or passing it to eval by any other means?

Well, that depends on what you read.


Alex Miller

Oct 5, 2015, 12:10:41 PM10/5/15
to cloju...@googlegroups.com
Removing data_readers is a solution (not a problem) and is highly unlikely to happen, so discussing that is not useful.

Please state a specific problem you are encountering that needs to be solved or end this thread.




Thomas Heller

Oct 5, 2015, 12:11:56 PM10/5/15
to Clojure Dev
Thanks for jumping in Alex.

I could not reliably reproduce the problems I had. At certain points I would get a "java.lang.ExceptionInInitializerError" in a REPL when trying to load-file something that had tagged literals in it. Sometimes it worked, sometimes it didn't. I did not bother to try to find out why. "Have you tried turning it off and on again?" was simpler.

I'm well aware that removing tagged-literals is not on the table, I'm just proposing putting it there (even if just for Clojure 2.0, or something else far far off). I seem to be alone with that opinion so I'll shut up about it now.

Cheers,
/thomas

Herwig Hochleitner

Oct 5, 2015, 5:23:33 PM10/5/15
to cloju...@googlegroups.com
2015-10-05 17:43 GMT+02:00 Alex Miller <al...@puredanger.com>:
Can we back this discussion up to the beginning and describe an actual problem we are trying to fix? 

Sure, the problem to me is, that when (require '[dependency.foo]) crosses over a project boundary into another source dependency, clojure currently provides no mechanism to account for a different reader configuration there.
Currently possible workarounds would be:
- monkey patching require to set up the appropriate env
  -- let's just not do that
- enforcing a consistent source configuration over all clojure projects
  -- that's not realistic. it's one thing to encourage users to stick to vanilla clojure, but we need to account for allowed use cases and consistency of compilation results
- forbidding reader tags as well as read-cond in source
  -- well, no
- AOT compiling the dependency or dumbing it down with a transpiler
  -- such a transpiler might be nice to have as a mvn plugin anyway, since e.g. contrib projects wanting to retain clojure 1.4 compat could then utilize read-cond when interacting with newer language features
  -- relying on this as the only solution makes the repl less useful because it opens a semantic gap between AOT and eval.

One specific issue mentioned was namespacing ("I grabbed time/..." but this could conflict) - the answer to this is to use your own less-general namespace. (I think grabbing "time" as a namespace is a bit presumptuous. Good thing you didn't also grab "space". :)

Yes, at least with time, most people would agree that it's relative :o)

The generally correct answer, to grab a namespace that you DNS-own (or at least own on clojars ;-), is not always applicable or desirable. So, thinking about isolating source is the right mindset to also think about explicitly renaming tags, or shorthanding them. A system like clojure's shorthand - keywords ::sh/kw, based on namespace aliases, might be a good fit.

All data_readers.clj on the classpath are read and combined.

Ah, I wasn't aware of that. I thought that it was a first-one-in wins, as with other classpath resources.
 
However, the general recommendation in the past is that libraries should not include them and should instead define how to install tagged literal readers in a single application level map.

In effect this prevents a library from making use of tagged literals within its own source, if it wants to do source distribution. That is because the user might choose to include different data_readers or none at all.
For read-cond, the situation is similar. I think the decision to only allow single keywords as features is good, but a project needs to be able to derive its own features.
E.g. a project might enable usage of :jsr-310 on :clj when compiled for JDK >= 8 or with a compatibility jar.
Projects need to be able to express such configuration in a build-tool - agnostic way, so that we can guarantee consistent results on repl and aot.
I'm not saying we should invent a full blown project object model like maven, but there needs to be enough to not force build tools to monkey-patch require or transpile the source.

Can we please re-orient it to describing a specific problem with tagged literals etc as they exist now?

Yes, please!

Given two programs to combine, both utilizing a conflicting #time/duration tag, I'd like clojure to support any of the following resolutions:

1. You actually want to override reader-tag definitions in the source programs
  a. the two definitions actually construct the same type, so you can make a reader tag fn, that accepts a superset of both their input syntaxes.
  b. the constructed types can be made interchangeable via protocols
2. data-flows going through either library are held separate, conversion is done explicitly
3. definitions don't actually conflict, but the user decides to refer to the reader-tag by a different name
--
1. is the only option currently supported. Also, explicit overriding needs to remain available; maybe the overriding mechanism even needs to be extended to work consistently in AOT compiled code
2. and 3. may seem to work by just renaming the tags, until one of the libraries decides to utilize a reader tag in its own source, or a third program refers to the tags by their original name.

Ideally, there would be some means of forming a closure of reader-config over some source code, serializing the result to a jar-file or class-folder, and have everything be robust against composition, yet overridable.
Were it not for the "overridable" part, the answer might be as simple as AOT-compiling or transpiling, but since overriding is part of the intention, and since our evaluator directly understands reader-config, it should also understand a way to read non-uniformly configured code.

thanks!

Thomas, I'll respond to you off-list

Alex Miller

Oct 6, 2015, 12:08:22 PM10/6/15
to cloju...@googlegroups.com
As a general statement, I think that tagged literals are most appropriate for data and it is generally more challenging to use them safely in source code, particularly if there is a chance of them having more than one possible reader function for the same tag. I would use this technique sparingly, in cases where you have good control over the source involved (for instance, in your own application) and try to avoid it in libraries.

On Mon, Oct 5, 2015 at 4:23 PM, Herwig Hochleitner <hhochl...@gmail.com> wrote:
Sure, the problem to me is, that when (require '[dependency.foo]) crosses over a project boundary into another source dependency, clojure currently provides no mechanism to account for a different reader configuration there.

From the Clojure runtime pov, there is no "project boundary" and there is only one reader configuration - specifically the configuration defined by the sum of the data_readers.clj in the classpath (which must be non-conflicting).

I do not believe there is any desire or intention to create the situation where the same tag in source is read by different reader functions. That *is* possible when reading as data, by explicitly binding *data-readers* around the read call.
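
A minimal sketch of that explicit binding when reading data, assuming a hypothetical #time/duration tag and two made-up reader functions (duration-a and duration-b stand in for two libraries' readers; none of these names come from a real library):

```clojure
;; Two hypothetical reader functions for the same tag, standing in for
;; the readers that two different libraries might provide.
(defn duration-a [m] {:lib :a :hours (:hours m)})
(defn duration-b [m] {:lib :b :millis (* 3600000 (:hours m))})

(def text "#time/duration {:hours 24}")

;; Read `text` as data with the given function installed for
;; #time/duration, for the extent of this one read call only.
(defn read-with [reader-fn s]
  (binding [*data-readers* {'time/duration reader-fn}]
    (read-string s)))

(read-with duration-a text) ; read by the first function
(read-with duration-b text) ; same text, read by the second function
```

The binding is only in effect for the dynamic extent of each call, so the two reads do not interfere with each other or with whatever data_readers.clj set up globally.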
 
Currently possible workarounds would be:
[elided] 

The generally correct answer, to grab a namespace that you DNS-own (or at least own on clojars ;-), is not always applicable or desirable. So, thinking about isolating source is the right mindset to also think about explicitly renaming tags, or shorthanding them. A system like clojure's shorthand - keywords ::sh/kw, based on namespace aliases, might be a good fit.

All data_readers.clj on the classpath are read and combined.

Ah, I wasn't aware of that. I thought that it was a first-one-in wins, as with other classpath resources.

Additionally, conflicts generate an exception - there is no overloading or preference allowed.
 
 
However, the general recommendation in the past is that libraries should not include them and should instead define how to install tagged literal readers in a single application level map.

In effect this prevents a library from making use of tagged literals within its own source, if it wants to do source distribution. That is because the user might choose to include different data_readers or none at all.

It is possible to include resources that are used at compile or test time but not shipped with the output, so this is not correct. However, for the reasons above, I agree that this is inadvisable.
 
For read-cond, the situation is similar. I think the decision to only allow single keywords as features is good, but a project needs to be able to derive its own features.

Reader conditionals are a totally different thing and I would not include them in this discussion. Reader features are *not* an open set - the only valid tags in source are :clj, :cljs, and :cljr. (You can invoke either the Clojure reader or tools.reader with additional features, but this will never be done with source code via the normal read paths.)
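
For illustration only, invoking the Clojure reader with an additional feature might look like the sketch below, assuming Clojure 1.7 or later. The :jsr-310 feature is the hypothetical one from earlier in the thread, and read-with-features is a made-up helper name:

```clojure
;; A reader conditional that branches on a custom (hypothetical) feature.
(def form-text "#?(:jsr-310 :java-time :default :legacy)")

;; Invoke the reader directly with an explicit feature set.
;; This is a data read; it is not how source files are loaded.
(defn read-with-features [features]
  (read-string {:read-cond :allow :features features} form-text))

(read-with-features #{:jsr-310}) ; selects the :jsr-310 branch
(read-with-features #{})         ; falls through to :default
```

This only affects the explicit read-string call; require and load still read source with the platform's fixed feature set.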
 
Projects need to be able to express such configuration in a build-tool - agnostic way, so that we can guarantee consistent results on repl and aot.

data_readers.clj is build-tool agnostic.

Re aot, note that it is not possible to represent arbitrary object instances or tagged literals in Java bytecode. The call to the data reader function (as defined at the time of compilation) will be embedded in the class file for the function using the literal. Additionally, you may need to add a require to load the ns containing that function in the source ns to make that work. Because the reader calls are made into RT at static class initialization time, there are possible load ordering problems, which could result in the exceptions that were mentioned by Thomas. In other words, aot (not surprisingly) makes something already complicated, more complicated.
 
Given two programs to combine, both utilizing a conflicting #time/duration tag, I'd like clojure to support any of the following resolutions:

1. You actually want to override reader-tag definitions in the source programs
  a. the two definitions actually construct the same type, so you can make a reader tag fn, that accepts a superset of both their input syntaxes.
  b. the constructed types can be made interchangeable via protocols
2. data-flows going through either library are held separate, conversion is done explicitly
3. definitions don't actually conflict, but the user decides to refer to the reader-tag by a different name

None of these seem like a good idea to me in tandem with *source* usage. In the context of *data* usage, if the libraries do not supply a data_readers.clj, then the application is free to construct a single data_readers.clj that resolves the issues. Or, the application can explicitly bind *data-readers* around the point of data read to resolve things in an appropriate manner (which may use one library or another). Printing is a separate (but related) matter - whatever type is being constructed may need to be printed and that depends on what's installed into print-method/print-dup. 
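
As a sketch of the first option, a single application-level data_readers.clj at the root of the classpath is just a map from tag symbols to fully qualified reader function symbols. The tag and the my-app.readers namespace below are hypothetical:

```clojure
;; data_readers.clj (application level, at the classpath root)
;; Maps each tag symbol to the var that reads it. The namespace
;; my-app.readers and its functions are assumptions for illustration.
{time/duration   my-app.readers/read-duration
 time/local-time my-app.readers/read-local-time}
```

The application, not the libraries, owns this file, so it decides which reader function each tag resolves to.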
 
Ideally, there would be some means of forming a closure of reader-config over some source code, serializing the result to a jar-file or class-folder, and have everything be robust against composition, yet overridable.

That does not seem like a feasible goal based on what I know. The notion of "projects" with self-contained code and config is at odds with how the classpath is actually constructed and used by classloaders. I do not think making this *more* complicated is likely to make anything better.
 
Were it not for the "overridable" part, the answer might be as simple as AOT-compiling or transpiling, but since overriding is part of the intention, and since our evaluator directly understands reader-config, it should also understand a way to read non-uniformly configured code.

There is no intention to enable "non-uniformly configured code", as far as I understand the design.
