How to escape a space in a keyword?


Frank Siebenlist

Mar 6, 2012, 1:44:21 PM
to Clojure, Frank Siebenlist
I kind of accidentally discovered that you can create keywords with an embedded space… not sure if that's a good idea, but you can:

user=> (def k (keyword "jaja nee"))
#'user/k
user=> (str k)
":jaja nee"
user=> (name k)
"jaja nee"
user=> (keyword? k)
true
user=> (keyword? :jaja nee)
CompilerException java.lang.RuntimeException: Unable to resolve symbol: nee in this context, compiling:(NO_SOURCE_PATH:15)
user=> (keyword? :jaja\ nee)
RuntimeException Unsupported character: \ nee clojure.lang.Util.runtimeException (Util.java:156)
RuntimeException Unmatched delimiter: ) clojure.lang.Util.runtimeException (Util.java:156)
user=> (keyword? :jaja\\ nee)
CompilerException java.lang.RuntimeException: Unable to resolve symbol: nee in this context, compiling:(NO_SOURCE_PATH:17)
user=>

The Q is how do you escape such an embedded space such that the reader will interpret it as a keyword?

-FrankS.

Frank Siebenlist

Mar 6, 2012, 2:48:17 PM
to Clojure, Frank Siebenlist
One other option I should have tried before:

user=> (pr-str k)
":jaja nee"
user=> (read-string (pr-str k))
:jaja
user=>

but unfortunately that doesn't help either.
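For anyone following along, the round-trip failure can be reproduced in a few lines. This is a minimal sketch using only `clojure.core`; `round-trip` is a helper name made up for illustration:

```clojure
;; A keyword built with `keyword` can contain a space...
(def k (keyword "jaja nee"))

;; ...and it is interned like any other keyword.
(identical? k (keyword "jaja nee")) ;=> true

;; Hypothetical helper: print, then read the printed text back.
(defn round-trip [x]
  (read-string (pr-str x)))

;; The reader stops at the first whitespace, so only :jaja comes back.
(round-trip k)             ;=> :jaja
(= k (round-trip k))       ;=> false
(= :foo (round-trip :foo)) ;=> true
```

The trailing `nee` is simply left unread in the string; no error is raised, which is why the problem is easy to miss.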

-FrankS

Moritz Ulrich

Mar 6, 2012, 2:49:18 PM
to clo...@googlegroups.com
I don't think you're supposed to use spaces in keywords.

> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en

--
Moritz Ulrich

Timothy Baldridge

Mar 6, 2012, 2:56:19 PM
to clo...@googlegroups.com
> I don't think you're supposed to use spaces in keywords.


Using spaces in keywords is completely valid, as is using spaces in
symbols. You just have to be aware that there may be times when you
can't represent them in a literal form. pr-str could be extended to do
this though:

(pr-str :foo)
":foo"

(pr-str k) ; as in your example
"(keyword \"jaja nee\")"
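The suggested printer can be sketched as a standalone function rather than a change to `pr` or `print-method`. The names `readable-kw?` and `pr-str-kw` are hypothetical, and using `clojure.edn/read-string` to test readability is an assumption of this sketch. Note that the output is code: it has to be *evaluated*, not just read, to get the keyword back.

```clojure
(require '[clojure.edn :as edn])

(defn readable-kw?
  "Hypothetical helper: true when kw's default printed form reads back as kw."
  [kw]
  (= kw (try (edn/read-string (pr-str kw))
             (catch Exception _ nil))))

(defn pr-str-kw
  "Hypothetical printer: the normal form when it round-trips, otherwise a
  (keyword ...) call that must be evaluated to recreate the keyword."
  [kw]
  (cond
    (readable-kw? kw) (pr-str kw)
    (namespace kw)    (pr-str (list 'keyword (namespace kw) (name kw)))
    :else             (pr-str (list 'keyword (name kw)))))

(pr-str-kw :foo)                 ;=> ":foo"
(pr-str-kw (keyword "jaja nee")) ;=> "(keyword \"jaja nee\")"

;; Reading alone gives back a list; only eval turns it into the keyword.
(= (keyword "jaja nee")
   (eval (read-string (pr-str-kw (keyword "jaja nee"))))) ;=> true
```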

There are many times when you may actually want to have spaces as
keywords. Consider the following json:

{"first name" : "John",
 "last name" : "Doe",
 "age" : 42}

If we convert this json to clojure, it would be nice to be able to do:

user=> (:age data)
42

And we can still do this:

user=> ((keyword "first name") data)
"John"
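The conversion described here can be sketched with `clojure.walk/keywordize-keys`, which turns string map keys into keywords via `keyword` and therefore keeps the spaces. The literal map below stands in for parsed JSON; no particular JSON library is assumed:

```clojure
(require '[clojure.walk :as walk])

;; Stand-in for the parsed JSON: a map with string keys.
(def json-data {"first name" "John", "last name" "Doe", "age" 42})

;; keywordize-keys calls `keyword` on every string key, spaces included.
(def data (walk/keywordize-keys json-data))

(:age data)                   ;=> 42
((keyword "first name") data) ;=> "John"
```

Printing `data` at the REPL shows keys such as `:first name`, which is exactly the form the reader cannot read back.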

Why restrict the user needlessly?


Timothy

Frank Siebenlist

Mar 6, 2012, 3:05:43 PM
to clo...@googlegroups.com, Frank Siebenlist
Those are convincing arguments/examples that spaces should be allowed in keywords.

Your solution to enhance "pr":

(pr-str k) ; as in your example
"(keyword \"jaja nee\")"

should work.

Btw., symbols seem to have the exact same issue:

user=> (def s (symbol "with space"))
#'user/s
user=> s
with space
user=> (pr-str s)
"with space"
user=> (read-string (pr-str s))
with
user=>

Is this a bug?

-FrankS.

Stuart Halloway

Mar 6, 2012, 3:13:20 PM
to clo...@googlegroups.com
>> I don't think you're supposed to use spaces in keywords.
>
> Using spaces in keywords is completely valid, as is using spaces in
> symbols.

Legal characters in keywords and symbols are documented at http://clojure.org/reader :
"Symbols begin with a non-numeric character and can contain alphanumeric characters and *, +, !, -, _, and ? ... Keywords are like symbols ..."

Stu



Timothy Baldridge

Mar 6, 2012, 3:25:28 PM
to clo...@googlegroups.com
> "Symbols begin with a non-numeric character and can contain alphanumeric
> characters and *, +, !, -, _, and ? ... Keywords are like symbols ..."

But this is the documentation for the reader...not necessarily for
symbols/keywords.

My argument is that clojure in no way validates the input to (symbol)
or (keyword) so why should we call this a bug? IMO, it's the same
problem with using Atoms and Refs with pr-str. If an Atom doesn't
round-trip through pr-str and the reader, is it somehow invalid? No,
it just means you shouldn't use it in cases where you need it to
round-trip. The same applies to keywords with unicode characters, or
symbols with spaces.

Timothy

Frank Siebenlist

Mar 6, 2012, 3:28:33 PM
to clo...@googlegroups.com, Frank Siebenlist
So… spaces are not allowed in symbol and keyword identifiers according to the "spec"…

although Stu doesn't quote the phrase following the allowed chars, which reads:

(other characters will be allowed eventually…)

which seems to keep the door open for allowing spaces in the future (?).

If spaces are not allowed, then it seems we have a bug in (keyword …) and (symbol …),
if they will be allowed in the future, then (pr …) could/should be enhanced.

-FrankS.

Tassilo Horn

Mar 6, 2012, 4:40:07 PM
to clo...@googlegroups.com
Timothy Baldridge <tbald...@gmail.com> writes:

>> I don't think you're supposed to use spaces in keywords.
>
> Using spaces in keywords is completely valid, as is using spaces in
> symbols. You just have to be aware that there may be times when you
> can't represent them in a literal form. pr-str could be extended to do
> this though:
>
> (pr-str :foo)
> ":foo"
>
> (pr-str k) ; as in your example
> "(keyword \"jaja nee\")"

I thought I could trick the reader by writing :jaja\u0020nee, but it
won't work. Thereby, I discovered that clojure diverges quite a bit
from java in its unicode escapes handling. Whereas in java, you can
replace every character in the source code by its unicode sequence, in
clojure you cannot.

For example, this is all legal java:

"foo\u0022 // => "foo"
1 \u002b 2 // => 3

// class Foo { int x; }
\u0063\u006C\u0061\u0073\u0073\u0020\u0046\u006F\u006F\u0020\u007B\u0020\u0069\u006E\u0074\u0020\u0078\u003B\u0020\u007D

But the clojure reader doesn't accept "foo\u0022 as a valid string, and
neither is (\u002b 1 2) a valid call of clojure.core/+.

So basically, in java unicode escapes are replaced before parsing,
whereas in clojure any unicode escape evaluates to a character, e.g.,
(\u002b 1 2) is (\+ 1 2), not (+ 1 2).
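This is easy to check: in Clojure `\u002b` is a character literal, and `\u0022` inside a string is just an escaped character that does not close the string. A small sketch:

```clojure
;; \u002b reads as a Character, the same value as the literal \+
(char? \u002b)          ;=> true
(= \u002b \+)           ;=> true

;; Inside a string, \u0022 is an escaped double quote and does not
;; end the string the way it would in Java source.
(= "foo\u0022" "foo\"") ;=> true
(count "foo\u0022")     ;=> 4

;; (\u002b 1 2) therefore tries to call a Character as a function.
(try
  (\u002b 1 2)
  (catch ClassCastException _ :not-a-function)) ;=> :not-a-function
```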

Well, I think the clojure way is the saner one. And if you really need
to instantiate

(defrecord इ [x])

as (\u0907. 1), you can add your own `with-replaced-unicode' macro and
be done.

Bye,
Tassilo

Weber, Martin S

Mar 6, 2012, 4:22:00 PM
to clo...@googlegroups.com
I was looking for something akin to Common Lisp's |weIrD SymBol!`| already,
too...

On 2012-03-06 15:28 , "Frank Siebenlist" <frank.si...@gmail.com>
wrote:

>So… spaces are not allowed in symbol and keyword identifiers according to
>the "spec"…
>
>although Stu doesn't quote the phrase following the allowed chars, which
>reads:
>
>(other characters will be allowed eventually…)
>
>which seems to keep the door open for allowing spaces in the future (?).
>
>If space are not allowed, then it seems we have a bug in (keyword …) and
>(symbol …),
>if they will be allowed in the future, then (pr …) could/should be
Meikel Brandmeyer

Mar 6, 2012, 6:23:17 PM
to clo...@googlegroups.com
Hi,

On 06.03.2012, at 21:28, Frank Siebenlist wrote:

> So… spaces are not allowed in symbol and keyword identifiers according to the "spec"…
>
> although Stu doesn't quote the phrase following the allowed chars, which reads:
>
> (other characters will be allowed eventually…)
>
> which seems to keep the door open for allowing spaces in the future (?).
>
> If space are not allowed, then it seems we have a bug in (keyword …) and (symbol …),
> if they will be allowed in the future, then (pr …) could/should be enhanced.

I think this has been discussed several times in the past.

This is *not* a bug in keyword or symbol. A string with spaces is simply not in the domain of these functions. So by definition it cannot be a bug. What happens is a pure decision by the implementer. The functions could throw an exception. Or they could return an invalid value. symbol and keyword do the latter. (Whether this is good or bad might be a matter of discussion, but it is not a bug.)

And of course vice versa: the behaviour of symbol or keyword is *not* the definition of valid keywords/symbols. That is what Stu cited.

I doubt that spaces will be officially allowed until there is – say – #|foo bar| for symbols and #:foo bar: for keywords.

Until then a translation of {"Foo Bar" 35} to {:foo-bar 35} is a quite feasible workaround.
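Such a translation could look something like the following sketch. Lower-casing and replacing whitespace with `-` is an assumption about the desired naming; `->readable-key` and `translate-keys` are hypothetical helper names:

```clojure
(require '[clojure.string :as str])

(defn ->readable-key
  "Hypothetical helper: lower-case the string, collapse each run of
  whitespace into a single hyphen, then make a keyword."
  [s]
  (keyword (str/replace (str/lower-case s) #"\s+" "-")))

(defn translate-keys
  "Hypothetical helper: apply ->readable-key to every key of map m."
  [m]
  (into {} (map (fn [[k v]] [(->readable-key k) v])) m))

(translate-keys {"Foo Bar" 35}) ;=> {:foo-bar 35}
```

The translation is lossy: "Foo Bar" and "foo-bar" map to the same keyword, so it does not help when the original strings must be recovered later.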

BTW: Similarly you can use non-ASCII characters for symbols in your clojure code. This code would then be broken. The reader doesn't complain, but the “spec” says: only ASCII characters are allowed. Same situation as above.

BTW2: There are a lot of things which might be desirable for various reasons. Spaces in symbols or keywords might be one of these things. But that doesn't mean it's a good idea to implement it. In Clojure's past there were several things which were requested with a lot of pressure (often combined with “sky is falling” comparisons), which were not implemented in the language. Only after long considerations changes went into the language (often) providing a superior solution to the previously requested versions. Patience is a required skill of a clojurian. ;)

Sincerely
Meikel

Phil Hagelberg

Mar 6, 2012, 7:01:50 PM
to clo...@googlegroups.com
Meikel Brandmeyer <m...@kotka.de> writes:

> I think this has been discussed several times in the past.
>
> This is *not* a bug in keyword or symbol.

Here's the relevant issue:

http://dev.clojure.org/jira/browse/CLJ-17

Closed as "declined" in October, so I think it's safe to say the
"you're on your own"ness is at least somewhat intentional.

-Phil

Leif

Mar 6, 2012, 7:11:28 PM
to clo...@googlegroups.com

Unfortunately, the reader does not actually follow this spec, e.g. it will happily accept :div#id.cla$$ as a valid keyword.  Some web programming clojure[script] libraries use this pseudo-CSS syntax in keywords, so if the reader was changed to strictly follow these rules, a lot of web code would probably break.

Frank Siebenlist

Mar 6, 2012, 7:37:01 PM
to clo...@googlegroups.com, Frank Siebenlist
Thanks for the explanation - I tried to search for previous discussions and bug reports about this issue before I posted, but it's clear I didn't look hard enough...

I could still argue that there is a bug in the documentation that should spell this out - especially when it has been discussed several times in the past… it could save a lot of time, confusion and rehashing.

Maybe the keyword/symbol doc string should add something like: "Note that the identifier-string is not checked for correctness, which implies that the function will return an invalid value for an invalid identifier.".

Personally I think it really helps if the doc-string specifies the contract, especially when it has been decided to leave it somewhat ambiguous on purpose.

Thanks, FrankS.

PS Guess we can see the consequences of silently returning invalid values: possibly creating legacy that depends on those, like the json examples.

Tassilo Horn

Mar 7, 2012, 2:17:54 AM
to clo...@googlegroups.com
Tassilo Horn <tas...@member.fsf.org> writes:

> So basically, in java unicode escapes are replaced before parsing,
> whereas in clojure any unicode escape evaluates to a character, e.g.,
> (\u002b 1 2) is (\+ 1 2), not (+ 1 2).
>
> Well, I think the clojure way is the saner one. And if you really
> need to instantiate
>
> (defrecord इ [x])
>
> as (\u0907. 1), you can add your own `with-replaced-unicode' macro and
> be done.

Oops, that statement is wrong, because the reader cannot read symbols
with unicode escapes, and it wouldn't work for strings like "foo\u0022
anyway.

Bye,
Tassilo

Meikel Brandmeyer

Mar 7, 2012, 5:00:25 PM
to clo...@googlegroups.com
Hi,

On 07.03.2012, at 01:11, Leif wrote:

> Unfortunately, the reader does not actually follow this spec, e.g. it will happily accept :div#id.cla$$ as a valid keyword. Some web programming clojure[script] libraries use this pseudo-CSS syntax in keywords, so if the reader was changed to strictly follow these rules, a lot of web code would probably break.

It's the other way around: the reader *does* follow the spec. Feed it a valid sequence of characters and you will get a valid keyword. Feed it an invalid sequence of characters and you will get an invalid keyword. This is not in conflict with the spec. A bad decision to do so instead of bailing out? Maybe. Maybe not.

However documenting everywhere that feeding garbage in might result in shit coming out has this “things in the mirror might be closer than they appear” taste. WTF? The mirror is not the problem. Nor is the car manufacturer to be held responsible. If the driver of a car causes an accident, it's her fault. *She* is responsible. No one else.

The developers of said libraries obviously didn't live up to their responsibility. Either they did not research how the reader works and what valid keyword characters are. Then they did not do their job correctly. Or they knew how the reader works but knowingly decided to operate it outside defined limits. Then they basically gambled that things keep working and that the characters will be eventually allowed in future versions of the reader.

In this particular case I would expect the bet to pay off. But if suddenly the reader is changed to handle # differently, then all these libraries will break. And now guess who is responsible. The reader? No.

Don't pass the buck.

Sincerely
Meikel

PS: I hope Clojure's development mode is not switched to “fait accompli driven.”

Frank Siebenlist

Mar 7, 2012, 7:16:48 PM
to Clojure, Frank Siebenlist
Clojure is a young language, and I believe there is little argument that some of the interfaces/implementations and associated docs could be improved. You can find plenty of examples where functions would throw exceptions for invalid input, others return nil in that case, and a number return garbage for garbage. Some of this perceived inconsistency could be improved upon, either by changing/adding validation code or by adding clarifications to the documentation of the functions.

The symbol function doc consists of only: "Returns a Symbol with the given namespace and name." Unless you suggest that we all look at the symbol code itself before we start using those parameters, it may help to add a little bit more text to guide the novice user of the interface such that one knows that this particular function doesn't throw exceptions or return nil, but returns invalid output for invalid input… for that suggestion I'm rewarded with a condescending "…WTF…" gender-neutral (?) rant - guess you're truly encouraging people on this list to ask questions and suggest solutions/improvements ;-).

My suggestion still stands: add more text to the doc-string of some of the functions such that the user has a better understanding of what to expect: no validation, nil, exception, expected parameter type, ???. I'd be happy to suggest wording.

Another option that may help us to use valid identifiers in our code is to have a function like: (clojure.core/valid-name? a-str), which is maintained by core (and not by us all writing some regex based on the current specs on a web page). We can use it in our code and tests to ensure we're following the current specs. If the valid characters are extended in the next clojure version, so is the function, and we can automatically conform. I'd be happy to provide code.
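A rough sketch of such a predicate, written only against the character list quoted from http://clojure.org/reader earlier in the thread. `valid-name?` is the hypothetical name proposed above, not an existing core function. The sketch restricts "alphanumeric" to ASCII letters and digits and ignores `/`, `.` and other characters with special meaning, so it is stricter than what the reader actually accepts:

```clojure
(def ^:private documented-name
  ;; first char: an ASCII letter or one of * + ! _ ? -
  ;; rest: ASCII letters, digits, or those same symbols
  #"[A-Za-z*+!_?\-][A-Za-z0-9*+!_?\-]*")

(defn valid-name?
  "Hypothetical predicate: true if s only uses the characters the reader
  documentation listed for symbol and keyword names."
  [s]
  (boolean (re-matches documented-name s)))

(valid-name? "jaja-nee") ;=> true
(valid-name? "jaja nee") ;=> false
(valid-name? "1+2")      ;=> false
```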

"Sincerely", FrankS.

Phil Hagelberg

Mar 7, 2012, 7:27:08 PM
to clo...@googlegroups.com
Frank Siebenlist <frank.si...@gmail.com> writes:

> My suggestion still stands: add more text to the doc-string of some of
> the functions such that the user has a better understanding of what to
> expect: no validation, nil, exception, expected parameter type, ???.
> I'd be happy to suggest wording.

Agreed. This is only one instance of a number of cases that the user
must hunt around on clojure.org for things that should be available in
docstrings, which is unreasonable for a number of reasons.

> Another option that may help us to use valid identifiers in our code
> is to have a function like: (clojure.core/valid-name? a-str), which is
> maintained by core (and not by us all writing some regex based on the
> current specs on a web page).

If you're just interested in what's round-trippable, that could be done
with this: (apply = ((juxt read-string symbol) "symbol-name"))

But something more formal wouldn't hurt either.
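The one-liner above can be hardened slightly. This sketch is an adaptation, not the code from the message: it swaps in `clojure.edn/read-string` so nothing can be evaluated during the check, and treats reader exceptions (for example on "1+2") as "not round-trippable". The name `round-trippable-symbol?` is hypothetical:

```clojure
(require '[clojure.edn :as edn])

(defn round-trippable-symbol?
  "Hypothetical predicate: true if reading s yields the same symbol that
  (symbol s) builds. Uses the edn reader, which never evaluates."
  [s]
  (try
    (= (symbol s) (edn/read-string s))
    (catch Exception _ false)))

(round-trippable-symbol? "symbol-name") ;=> true
(round-trippable-symbol? "with space")  ;=> false
(round-trippable-symbol? "1+2")         ;=> false, the reader throws on it
(round-trippable-symbol? "")            ;=> false
```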

-Phil

Justin Steward

Mar 7, 2012, 6:55:05 PM
to clo...@googlegroups.com
On Thu, Mar 8, 2012 at 9:00 AM, Meikel Brandmeyer <m...@kotka.de> wrote:
> However documenting everywhere that feeding garbage in might result in shit coming out has this “things in the mirror might be closer than they appear” taste. WTF? The mirror is not the problem. Nor is the car manufacturer to be held responsible. If the driver of a car causes an accident, it's her fault. *She* is responsible. No one else.

If the driver is unaware that what they see in the mirror is
distorted, the driver will assume it is not. A driver who has been
supplied incomplete information about the operation of their vehicle
cannot reasonably be held fully responsible, which is why we have so
many warning labels.

Documentation for programming languages should take the same approach.
If any input other than what is specified will lead to undefined
results, that needs to be documented. Otherwise, someone will feed in
invalid input, get what looks like valid output, and assume that the
documentation is out of date, incomplete, or something equally inane,
and continue to misuse the feature. You can never underestimate the
power of ignorance, no matter a person's creed, intelligence,
lifestyle, or profession. If invalid input will not throw an error
immediately, then it DOES need to be documented that invalid input
will result in undefined output.

~Justin

Cedric Greevey

Mar 12, 2012, 12:35:24 PM
to clo...@googlegroups.com

That's an apples-to-oranges comparison. Atoms and Refs are identities.
Nobody expects identities to round-trip and keep their semantics. On
the other hand, Keywords and Symbols are values. Everybody expects
values to round-trip and keep their semantics, so when that fails to
happen in some instance it violates least surprise.

Armando Blancas

Mar 12, 2012, 1:39:59 PM
to clo...@googlegroups.com

> If invalid input will not throw an error
> immediately, then it DOES need to be documented that invalid input
> will result in undefined output.
>
> ~Justin

Documented by whom? By you and FrankS? Maybe the push back is for lotta suggestin' but little doin'.

Sean Corfield

Mar 12, 2012, 7:56:20 PM
to clo...@googlegroups.com

Surely undocumented behavior is undefined behavior by definition?
That's certainly the approach taken by many programming language
standards. In which case, giving any function invalid input is
immediately in undefined behavior territory and the output is
"guaranteed" to be undefined - unless explicitly documented to the
contrary (e.g., when given invalid input, this function shall throw an
exception).
--
Sean A Corfield -- (904) 302-SEAN
An Architect's View -- http://corfield.org/
World Singles, LLC. -- http://worldsingles.com/

"Perfection is the enemy of the good."
-- Gustave Flaubert, French realist novelist (1821-1880)

Frank Siebenlist

Mar 13, 2012, 1:11:54 AM
to clo...@googlegroups.com, Frank Siebenlist
This is my last reply for this thread as the support for "improved" doc strings for symbol and keyword has been kind of underwhelming - which is fine - time to move on. (besides, now I personally know more than enough about the implementation of those functions to use them conformantly ;-) )

> Surely undocumented behavior is undefined behavior by definition?

True - but that kind of assumes that there is documented behavior… the one-liner for (symbol ns name) doesn't say anything about input parameter value types, and leaves the valid character set up to the user knowing where to find it on the "http://clojure.org/reader" page. Note that I had to find the valid types for [name] and [ns name] by looking thru the clojure.core Clojure and Java code.

> ...unless explicitly documented to the
> contrary (e.g., when given invalid input, this function shall throw an
> exception).


It took me about 3 minutes to scan thru the API list and testing at the repl to find alias, bases and bound? - all throwing exceptions while it is not mentioned in their docs. I'm sure you could have found many more in the same time before you wrote your reply - not sure what that tells you (?).

Also not sure what the big issue is to help the (novice) users out by being explicit in the doc-string about the contract and behaviour of a function. As far as I can tell, you have functions that validate and barf, validate and return some well-known value like nil for invalid input, and you have the garbage-in, garbage-out type. Unless you can guarantee that the first two types are always clearly identified in their docs, you can leave the last one open… however, why not mention it also explicitly? Especially for symbol, which is a pretty key entity for the clojure language. We have a one-liner, low on content for symbol/keyword and a page-long doc for deftype/defprotocol.

So one last time… instead of having the current docs:

clojure.core/symbol
([name] [ns name])
Returns a Symbol with the given namespace and name.

clojure.core/keyword
([name] [ns name])
Returns a Keyword with the given namespace and name. Do not use :
in the keyword strings, it will be added automatically.

My suggested docs are:

clojure.core/symbol
([name] [ns name])
Returns a Symbol with the given namespace and name.
(symbol name): name can be a string or a symbol.
(symbol ns name): ns and name must both be strings.
A symbol string begins with a non-numeric character
and can contain alphanumeric characters and *, +, !, -, _, and ?
(see "http://clojure.org/reader" for details).
Note that the function does not validate the input strings for ns and name,
and may return improper symbols with undefined behavior for non-conformant ns and name.

clojure.core/keyword
([name] [ns name])
Returns a Keyword with the given namespace and name. Do not use :
in the keyword strings, it will be added automatically.
(keyword name): name can be a string, symbol or keyword.
(keyword ns name): ns and name must both be strings.
A keyword string, like a symbol, begins with a non-numeric
character and can contain alphanumeric characters and *, +, !, -, _, and ?
(see "http://clojure.org/reader" for details).
Note that the function does not validate the input strings for ns and name,
and may return improper keywords with undefined behavior for non-conformant ns and name.


Take it or leave it.

Regards, FrankS.


Andy Fingerhut

Mar 13, 2012, 2:46:58 AM
to clo...@googlegroups.com, Frank Siebenlist

Meikel Brandmeyer (kotarak)

Mar 13, 2012, 2:53:48 AM
to clo...@googlegroups.com, Frank Siebenlist
Hi,

On Tuesday, March 13, 2012 07:46:58 UTC+1, Andy Fingerhut wrote:

And right below is an example of invalid usage.

Sincerely
Meikel
 

Andy Fingerhut

Mar 13, 2012, 3:04:42 AM
to clo...@googlegroups.com, Frank Siebenlist
Which one?

(symbol 'foo)

(symbol "foo")

(symbol "clojure.core" "foo")

I don't see it, but I'm probably having a senior moment.

clojuredocs.org are editable to anyone willing to create a free account, by the way.  I'm nobody special there.

Andy

Kevin Ilchmann Jørgensen

Mar 13, 2012, 3:07:04 AM
to clo...@googlegroups.com
So
user=> (map symbol (rest (.split "1+2+3+4" "")))

can be deleted ?

/Kevin

Andy Fingerhut

Mar 13, 2012, 3:14:48 AM
to Andy Fingerhut, clo...@googlegroups.com, Frank Siebenlist
Ah, my senior moment was not noticing the invalid example use of symbol in the second example, which was passing strings of decimal digits to symbol.  I went ahead and deleted that one.

Thanks,
Andy

Didier

Aug 8, 2018, 7:55:08 PM
to Clojure
Reviving an old thread. I have a case where I convert ION to Clojure, and ION has a SYMBOL type, which can contain any UTF-8 character, including spaces. I thought of making them keywords in Clojure, since they serve the same purpose, to be used as identifiers. I can create such keywords with the keyword function, but they don't serialize to EDN and back using the default printer and reader.

I'm thinking of extending the printer like tbc++ says so keywords are printed as (keyword "string") instead. Does anyone believe there is something that's going to bite me later if I do this?

Andy Fingerhut

Aug 8, 2018, 8:37:11 PM
to clo...@googlegroups.com
If you want to serialize the data to EDN and back, and print them out into the EDN file as (keyword "arbitrary-char-sequence"), then using the normal Clojure functions for reading the EDN data in will leave those expressions as the lists (keyword "arbitrary-char-sequence").  Of course you could write a simple function that walks the data structure looking for such lists and replacing them with the corresponding keywords.  However, that breaks round-tripability of the data if you ever have an occurrence of such a list in your original data before printing it to EDN.  If you believe, or can somehow ensure, that will never happen, seems workable to me.

Using a custom data-reader like #my.namespace/keyword "arbitrary-char-sequence" with a globally unique namespace that you own would be less susceptible to such aliasing problems.
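A minimal sketch of the data-reader idea, handling a single top-level keyword. The tag `my.namespace/keyword` comes from the message above; the helper names `kw->edn` and `read-edn` are hypothetical, and writing keywords nested inside collections would additionally need a printer hook or a walk over the data:

```clojure
(require '[clojure.edn :as edn])

;; Tag from the example above; in practice use a namespace you own.
(def kw-tag 'my.namespace/keyword)

(defn kw->edn
  "Hypothetical writer: the plain form when it reads back correctly,
  otherwise the keyword's text (minus the leading colon) behind kw-tag."
  [kw]
  (let [plain (pr-str kw)]
    (if (= kw (try (edn/read-string plain) (catch Exception _ nil)))
      plain
      (str "#" kw-tag " " (pr-str (subs (str kw) 1))))))

(defn read-edn
  "Hypothetical reader: plain edn plus a data reader for kw-tag."
  [s]
  (edn/read-string {:readers {kw-tag keyword}} s))

(def k (keyword "jaja nee"))
(kw->edn k)                  ;=> "#my.namespace/keyword \"jaja nee\""
(= k (read-edn (kw->edn k))) ;=> true
```

Because the one-argument `keyword` splits its string at a `/`, a namespace-less keyword whose name contains `/` would not survive this particular encoding.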

Andy

---
You received this message because you are subscribed to the Google Groups "Clojure" group.
To unsubscribe from this group and stop receiving emails from it, send an email to clojure+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Alex Miller

Aug 8, 2018, 9:15:47 PM
to Clojure
Why are you fighting so hard to make keywords with spaces? If you need things with spaces, use strings.

Didier

Aug 9, 2018, 12:48:12 AM
to Clojure
Thanks Andy, ya I actually realized this, I'm using a custom reader literal now instead.

> Why are you fighting so hard to make keywords with spaces? If you need things with spaces, use strings.

Why have keywords at all then? What does a space add that somehow negates the premise of keywords?

I see keywords over strings give you interning, namespaces, and the semantic that this value serves as an identifier. What's wrong with having that and a space as well?

Also, to be more specific to my use case. If I convert ION symbols to Clojure strings, I've lost information, and can no longer convert back to the same ION.

I need to do ION -> EDN -> ION in a lossless way. Piggybacking on keywords seemed easiest.

Erik Assum

Aug 9, 2018, 2:23:56 AM
to clo...@googlegroups.com
I was wondering about the rationale for the unreadable keywords a while ago.

Weavejester pointed me to 
and

Alex Miller

Aug 9, 2018, 8:34:09 AM
to Clojure
I don’t understand what ION -> EDN -> ION means?

James Reeves

Aug 9, 2018, 10:08:06 AM
to clo...@googlegroups.com
If Clojure lacks a type that exactly matches ION's symbol type, why not add your own type with a record, then add a data reader for it.

For example: #ion/symbol "foo"
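This suggestion could be sketched as below. `IonSymbol`, its `text` field and the `read-edn` helper are hypothetical; the sketch assumes the `#ion/symbol` tag from the message above and that a plain string is enough to carry an ION symbol's text:

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical record standing in for an ION symbol.
(defrecord IonSymbol [text])

;; Print instances as a tagged literal instead of the default record syntax.
(defmethod print-method IonSymbol [sym ^java.io.Writer w]
  (.write w "#ion/symbol ")
  (print-method (:text sym) w))

(defn read-edn
  "Hypothetical reader: edn with a data reader for #ion/symbol."
  [s]
  (edn/read-string {:readers {'ion/symbol ->IonSymbol}} s))

(def ion-sym (->IonSymbol "foo bar"))
(pr-str ion-sym)                        ;=> "#ion/symbol \"foo bar\""
(= ion-sym (read-edn (pr-str ion-sym))) ;=> true
```

Because it is a record, `IonSymbol` gets value equality and hashing without extra work.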

Didier

Aug 10, 2018, 12:27:35 AM
to Clojure
Ion is a data serialization format from Amazon http://amzn.github.io/ion-docs/

One of its supported data types is symbol, defined as: Interned, Unicode symbolic atoms (aka identifiers)

I interact with systems that interchange their data using ION. My systems use EDN though internally. Thus I need to convert from ION to EDN and back in a lossless manner.

If you give me ION with symbols in them, and I convert them to EDN, I need to be able to go back to the exact same ION symbol when converting my EDN back to ION.

If I make symbols strings, I lose the type info, and would convert it back to an ION string. This would break my clients.

I thought I could get away not having to build a custom representation for them in Clojure and EDN, and piggyback on keywords.

It now seems to me like I can still piggyback on keywords when in Clojure, since `keyword` and clojure.lang.Keyword appear to support full unicode.

But it seems EDN does not support full unicode serialization for them. So I've extended the edn serialization and deserialization so that it does.

I did so by overriding print-method for keywords so that when it prints in an unreadable way, it instead outputs a custom literal #myproject/keyword "namespace/name". And I've added a reader for this which uses keyword to get a Clojure keyword back.
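A sketch of the approach just described, to make it concrete. It is an illustration, not the actual code from this message: the `#myproject/keyword` tag and the "namespace/name" string follow the description, while the readability test via `clojure.edn/read-string` and the helper names are assumptions. Note that `print-method` is global, so this changes how every keyword in the process prints:

```clojure
(require '[clojure.edn :as edn])

;; Keep the built-in Keyword printer so readable keywords print as usual.
(defonce ^:private default-kw-print
  (get-method print-method clojure.lang.Keyword))

(defn- readable-kw-text?
  "Hypothetical helper: true when the keyword's plain text reads back as
  the same keyword."
  [kw]
  (= kw (try (edn/read-string (str kw)) (catch Exception _ nil))))

(defmethod print-method clojure.lang.Keyword [kw ^java.io.Writer w]
  (if (readable-kw-text? kw)
    (default-kw-print kw w)
    (do (.write w "#myproject/keyword ")
        (print-method (subs (str kw) 1) w))))

;; Reader side: hand the "namespace/name" string back to `keyword`.
(def edn-readers {'myproject/keyword keyword})

(pr-str [:a (keyword "jaja nee")])
;=> "[:a #myproject/keyword \"jaja nee\"]"

(= [:a (keyword "jaja nee")]
   (edn/read-string {:readers edn-readers}
                    (pr-str [:a (keyword "jaja nee")]))) ;=> true
```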

Does any of this seem unsound to anyone?

Timothy Baldridge

Aug 10, 2018, 12:36:29 AM
to clo...@googlegroups.com
Have you tried this with Transit instead of EDN? From what I understand by all this Transit shouldn't have problems with spaces in keywords/strings as it doesn't print them in the same way, it's more of a marshaling format than a printer/reader, and you get the big upside of Transit being *way* faster than EDN. 

Aside from that, I'd recommend just introducing a new type. It's easy to extend EDN, as you've mentioned, so why not just have a new type called IonSymbol that you use when talking to ION. Use defrecord, and you'll get equality semantics out of the box. 



--
“One of the main causes of the fall of the Roman Empire was that–lacking zero–they had no way to indicate successful termination of their C programs.”
(Robert Firth)

Alex Miller

Aug 10, 2018, 12:41:51 AM
to Clojure
Thanks, wasn't familiar (and this was clashing in my head with Datomic Ions). Sounds to me like you should make a new tagged literal edn type like #ion/symbol rather than trying to force this into edn symbols, which do not support the semantics you want. Printing as (keyword "foo bar") means you would require evaluation, which edn does not do on read and adds all sorts of constraints on how you could use this. By contrast, using a tagged literal is a read-time feature, fully round-trippable and not even necessarily requiring intermediaries to understand the tag.
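The point that intermediaries need not understand the tag can be illustrated with `clojure.edn/read-string` and its `:default` option: passing `tagged-literal` (Clojure 1.7+) as the default reader function keeps unknown tags as `TaggedLiteral` values, which print back out unchanged. A minimal sketch; the sample data is made up:

```clojure
(require '[clojure.edn :as edn])

;; An intermediary that knows nothing about #ion/symbol can still read it:
;; with :default tagged-literal, unknown tags become TaggedLiteral values.
(def text "{:id 1, :sym #ion/symbol \"foo bar\"}")

(def data (edn/read-string {:default tagged-literal} text))

(:tag (:sym data))   ;=> ion/symbol
(:form (:sym data))  ;=> "foo bar"

;; ...and print it back out unchanged, tag included.
(pr-str (:sym data)) ;=> "#ion/symbol \"foo bar\""
(= data (edn/read-string {:default tagged-literal} (pr-str data))) ;=> true
```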