user=> (def k (keyword "jaja nee"))
#'user/k
user=> (str k)
":jaja nee"
user=> (name k)
"jaja nee"
user=> (keyword? k)
true
user=> (keyword? :jaja nee)
CompilerException java.lang.RuntimeException: Unable to resolve symbol: nee in this context, compiling:(NO_SOURCE_PATH:15)
user=> (keyword? :jaja\ nee)
RuntimeException Unsupported character: \ nee clojure.lang.Util.runtimeException (Util.java:156)
RuntimeException Unmatched delimiter: ) clojure.lang.Util.runtimeException (Util.java:156)
user=> (keyword? :jaja\\ nee)
CompilerException java.lang.RuntimeException: Unable to resolve symbol: nee in this context, compiling:(NO_SOURCE_PATH:17)
user=>
So the question is: how do you escape such an embedded space so that the reader interprets it as a keyword?
-FrankS.
user=> (pr-str k)
":jaja nee"
user=> (read-string (pr-str k))
:jaja
user=>
But unfortunately, that doesn't help either.
-FrankS
> --
> You received this message because you are subscribed to the Google
> Groups "Clojure" group.
> To post to this group, send email to clo...@googlegroups.com
> Note that posts from new members are moderated - please be patient with your first post.
> To unsubscribe from this group, send email to
> clojure+u...@googlegroups.com
> For more options, visit this group at
> http://groups.google.com/group/clojure?hl=en
--
Moritz Ulrich
Using spaces in keywords is completely valid, as is using spaces in
symbols. You just have to be aware that there may be times when you
can't represent them in a literal form. pr-str could be extended to do
this though:
(pr-str :foo)
":foo"
(pr-str k) ; as in your example
"(keyword "jaja nee")"
There are many times when you may actually want to have spaces in
keywords. Consider the following JSON:
{"first name": "John",
 "last name": "Doe",
 "age": 42}
If we convert this JSON to Clojure, it would be nice to be able to do:
user=> (:age data)
42
And we can still do this:
user=> ((keyword "first name") data)
"John"
Why restrict the user needlessly?
Timothy
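Timothy's JSON example can be sketched with `clojure.walk/keywordize-keys`. The sketch assumes the JSON has already been parsed into a map with string keys; the parsing step and the `parsed`/`data` names are illustrative only.

```clojure
(require '[clojure.walk :as walk])

;; Assumed input: the JSON above, already parsed into a string-keyed map.
(def parsed {"first name" "John", "last name" "Doe", "age" 42})

;; keywordize-keys calls (keyword k) on every string key, spaces included.
(def data (walk/keywordize-keys parsed))

(:age data)                   ; => 42
((keyword "first name") data) ; => "John"
```

The keys work fine for lookup, but `(pr-str data)` prints them as `:first name`, which will not read back as the same map.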
Your solution to enhance "pr":
(pr-str k) ; as in your example
"(keyword "jaja nee")"
should work.
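Below is a minimal sketch of Timothy's suggestion, written as a standalone function rather than as a change to `pr-str` itself. `pr-keyword` is a hypothetical name. It returns the usual literal when that literal reads back as the same keyword, and otherwise returns a `(keyword ...)` form.

```clojure
(defn pr-keyword
  "Hypothetical helper: returns the usual :literal when it reads back as k,
  otherwise a (keyword ...) form that evaluates to k."
  [k]
  (let [literal   (str k)
        readable? (try (= k (read-string literal))
                       (catch Exception _ false))]
    (if readable?
      literal
      (pr-str (if-let [kns (namespace k)]
                (list 'keyword kns (name k))
                (list 'keyword (name k)))))))

(pr-keyword :foo)                 ; => ":foo"
(pr-keyword (keyword "jaja nee")) ; => "(keyword \"jaja nee\")"
```

The fallback output is code. It has to be evaluated, not merely read, to get the keyword back, so it is not a drop-in replacement in a pure data format.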
Btw., symbols seem to have the exact same issue:
user=> (def s (symbol "with space"))
#'user/s
user=> s
with space
user=> (pr-str s)
"with space"
user=> (read-string (pr-str s))
with
user=>
Is this a bug?
-FrankS.
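The round-trip failures shown above for keywords and symbols can be checked with a small predicate. This is only a sketch: `round-trips?` is a hypothetical helper, not part of clojure.core. It uses `read-string`, which should not be given untrusted input; `clojure.edn/read-string` is the safer choice in that case.

```clojure
(defn round-trips?
  "Hypothetical helper (not in clojure.core): true when x survives
  pr-str followed by read-string as an equal value."
  [x]
  (try
    (= x (read-string (pr-str x)))
    (catch Exception _ false)))

(round-trips? :foo)                  ; => true
(round-trips? (keyword "jaja nee"))  ; => false, reads back as :jaja
(round-trips? 'foo)                  ; => true
(round-trips? (symbol "with space")) ; => false, reads back as with
```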
I don't think you're supposed to use spaces in keywords.
Using spaces in keywords is completely valid, as is using spaces in
symbols.
But this is the documentation for the reader...not necessarily for
symbols/keywords.
My argument is that clojure in no way validates the input to (symbol)
or (keyword) so why should we call this a bug? IMO, it's the same
problem with using Atoms and Refs with pr-str. If an Atom doesn't
round-trip through pr-str and the reader, is it somehow invalid? No,
it just means you shouldn't use it in cases where you need it to
round-trip. The same applies to keywords with unicode characters, or
symbols with spaces.
Timothy
So… spaces are not allowed in symbol and keyword identifiers according to the "spec"…
although Stu doesn't quote the phrase following the allowed chars, which reads:
(other characters will be allowed eventually…)
which seems to keep the door open for allowing spaces in the future (?).
If spaces are not allowed, then it seems we have a bug in (keyword …) and (symbol …);
if they will be allowed in the future, then (pr …) could/should be enhanced.
-FrankS.
>> I don't think you're supposed to use spaces in keywords.
>
> Using spaces in keywords is completely valid, as is using spaces in
> symbols. You just have to be aware that there may be times when you
> can't represent them in a literal form. pr-str could be extended to do
> this though:
>
> (pr-str :foo)
> ":foo"
>
> (pr-str k) ; as in your example
> "(keyword "jaja nee")"
I thought I could trick the reader by writing :jaja\u0020nee, but it
won't work. In the process, I discovered that clojure diverges quite a bit
from java in how it handles unicode escapes. Whereas in java, you can
replace every character in the source code by its unicode sequence, in
clojure you cannot.
For example, this is all legal java:
"foo\u0022 // => "foo"
1 \u002b 2 // => 3
// class Foo { int x; }
\u0063\u006C\u0061\u0073\u0073\u0020\u0046\u006F\u006F\u0020\u007B\u0020\u0069\u006E\u0074\u0020\u0078\u003B\u0020\u007D
But the clojure reader doesn't accept "foo\u0022 as a valid string, and
neither is (\u002b 1 2) a valid call of clojure.core/+.
So basically, in java unicode escapes are replaced before parsing,
whereas in clojure any unicode escape evaluates to a character, e.g.,
(\u002b 1 2) is (\+ 1 2), not (+ 1 2).
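The difference is easy to see at the REPL. The sketch below relies only on how the Clojure reader treats `\u` escapes in character and string literals.

```clojure
;; \u002b is read as a character literal: the same value as \+
(= \u002b \+)                 ; => true
(char? \u002b)                ; => true

;; A Character is not a function, so (\u002b 1 2) cannot act like (+ 1 2).
(ifn? \u002b)                 ; => false
(ifn? +)                      ; => true

;; Inside a string literal, \u0022 is an escaped double quote,
;; so it does not terminate the string the way it does in Java source.
(= "foo\u0022bar" "foo\"bar") ; => true
(count "foo\u0022bar")        ; => 7
```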
Well, I think the clojure way is the saner one. And if you really need
to instantiate
(defrecord इ [x])
as (\u0907. 1), you can add your own `with-replaced-unicode' macro and
be done.
Bye,
Tassilo
On 06.03.2012 at 21:28, Frank Siebenlist wrote:
> So… spaces are not allowed in symbol and keyword identifiers according to the "spec"…
>
> although Stu doesn't quote the phrase following the allowed chars, which reads:
>
> (other characters will be allowed eventually…)
>
> which seems to keep the door open for allowing spaces in the future (?).
>
> If spaces are not allowed, then it seems we have a bug in (keyword …) and (symbol …),
> if they will be allowed in the future, then (pr …) could/should be enhanced.
I think this has been discussed several times in the past.
This is *not* a bug in keyword or symbol. A string with spaces is simply not in the domain of these functions. So by definition it cannot be a bug. What happens is a pure decision by the implementer. The functions could throw an exception. Or they could return an invalid value. symbol and keyword do the latter. (Whether this is good or bad might be a matter of discussion, but it is not a bug.)
And of course vice versa: the behaviour of symbol or keyword is *not* the definition of valid keywords/symbols. That is what Stu cited.
I doubt that spaces will be officially allowed until there is – say – #|foo bar| for symbols and #:foo bar: for keywords.
Until then, translating {"Foo Bar" 35} to {:foo-bar 35} is a perfectly feasible workaround.
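Here is a minimal sketch of Meikel's workaround. It assumes the desired mapping is "trim, lower-case, and replace each run of whitespace with a hyphen". `key->keyword` and `normalize-keys` are hypothetical helper names.

```clojure
(require '[clojure.string :as str])

(defn key->keyword
  "Hypothetical helper: trims and lower-cases a string key and replaces
  each run of whitespace with a hyphen, e.g. \"Foo Bar\" -> :foo-bar."
  [s]
  (-> s str/trim str/lower-case (str/replace #"\s+" "-") keyword))

(defn normalize-keys
  "Applies key->keyword to every key of a (non-nested) string-keyed map."
  [m]
  (into {} (map (fn [[k v]] [(key->keyword k) v])) m))

(normalize-keys {"Foo Bar" 35}) ; => {:foo-bar 35}
```

The mapping is lossy. "Foo Bar" and "foo bar" both become :foo-bar, so the original string keys cannot be recovered from the result.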
BTW: Similarly you can use non-ASCII characters for symbols in your clojure code. This code would then be broken. The reader doesn't complain, but the “spec” says: only ASCII characters are allowed. Same situation as above.
BTW2: There are a lot of things which might be desirable for various reasons. Spaces in symbols or keywords might be one of these things. But that doesn't mean it's a good idea to implement them. In Clojure's past there were several things which were requested with a lot of pressure (often combined with “sky is falling” comparisons) and were not implemented in the language. Changes went into the language only after long consideration, (often) providing a solution superior to the versions previously requested. Patience is a required skill of a clojurian. ;)
Sincerely
Meikel
> I think this has been discussed several times in the past.
>
> This is *not* a bug in keyword or symbol.
Here's the relevant issue:
http://dev.clojure.org/jira/browse/CLJ-17
Closed as "declined" in October, so I think it's safe to say the
"you're on your own"ness is at least somewhat intentional.
-Phil
I could still argue that there is a bug in the documentation that should spell this out - especially when it has been discussed several times in the past… it could save a lot of time, confusion and rehashing.
Maybe the keyword/symbol doc string should add something like: "Note that the identifier-string is not checked for correctness, which means that the function will return an invalid value for an invalid identifier.".
Personally I think it really helps if the doc-string specifies the contract, especially when it has been decided to leave it somewhat ambiguous on purpose.
Thanks, FrankS.
PS Guess we can see the consequences of silently returning invalid values: possibly creating legacy that depends on those, like the json examples.
> So basically, in java unicode escapes are replaced before parsing,
> whereas in clojure any unicode escape evaluates to a character, e.g.,
> (\u002b 1 2) is (\+ 1 2), not (+ 1 2).
>
> Well, I think the clojure way is the saner one. And if you really
> need to instantiate
>
> (defrecord इ [x])
>
> as (\u0907. 1), you can add your own `with-replaced-unicode' macro and
> be done.
Oops, that statement is wrong, because the reader cannot read symbols
with unicode escapes, and it wouldn't work for strings like "foo\u0022
anyway.
Bye,
Tassilo
On 07.03.2012 at 01:11, Leif wrote:
> Unfortunately, the reader does not actually follow this spec, e.g. it will happily accept :div#id.cla$$ as a valid keyword. Some web programming clojure[script] libraries use this pseudo-CSS syntax in keywords, so if the reader was changed to strictly follow these rules, a lot of web code would probably break.
It's the other way around: the reader *does* follow the spec. Feed it a valid sequence of characters and you will get a valid keyword. Feed it an invalid sequence of characters and you will get an invalid keyword. This is not in conflict with the spec. A bad decision to do so instead of bailing out? Maybe. Maybe not.
However documenting everywhere that feeding garbage in might result in shit coming out has this “things in the mirror might be closer than they appear” taste. WTF? The mirror is not the problem. Nor is the car manufacturer to be held responsible. If the driver of a car causes an accident, it's her fault. *She* is responsible. No one else.
The developers of said libraries obviously didn't live up to their responsibility. Either they did not research how the reader works and what valid keyword characters are. Then they did not do their job correctly. Or they knew how the reader works but knowingly decided to operate it outside defined limits. Then they basically gambled that things keep working and that the characters will be eventually allowed in future versions of the reader.
In this particular case I would expect the bet to pay off. But if suddenly the reader is changed to handle # differently, then all these libraries will break. And now guess who is responsible. The reader? No.
Don't pass the buck.
Sincerely
Meikel
PS: I hope Clojure's development mode is not switched to “fait accompli driven.”
The symbol function doc consists of only: "Returns a Symbol with the given namespace and name." Unless you suggest that we all look at the symbol code itself before we start using those parameters, it may help to add a little bit more text to guide the novice user of the interface such that one knows that this particular function doesn't throw exceptions or return nil, but returns invalid output for invalid input… for that suggestion I'm rewarded with a condescending "…WTF…" gender-neutral (?) rant - guess you're truly encouraging people on this list to ask questions and suggest solutions/improvements ;-).
My suggestion still stands: add more text to the doc-string of some of the functions such that the user has a better understanding of what to expect: no validation, nil, exception, expected parameter type, ???. I'd be happy to suggest wording.
Another option that may help us to use valid identifiers in our code is to have a function like: (clojure.core/valid-name? a-str), which is maintained by core (and not by us all writing some regex based on the current specs on a web page). We can use it in our code and tests to ensure we're following the current specs. If the valid characters are extended in the next clojure version, so is the function, and we can automatically conform. I'd be happy to provide code.
"Sincerely", FrankS.
> My suggestion still stands: add more text to the doc-string of some of
> the functions such that the user has a better understanding of what to
> expect: no validation, nil, exception, expected parameter type, ???.
> I'd be happy to suggest wording.
Agreed. This is only one instance of a number of cases that the user
must hunt around on clojure.org for things that should be available in
docstrings, which is unreasonable for a number of reasons.
> Another option that may help us to use valid identifiers in our code
> is to have a function like: (clojure.core/valid-name? a-str), which is
> maintained by core (and not by us all writing some regex based on the
> current specs on a web page).
If you're just interested in what's round-trippable, that could be done
with this: (apply = ((juxt read-string symbol) "symbol-name"))
But something more formal wouldn't hurt either.
-Phil
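Below is a sketch of what Frank's proposed `valid-name?` might look like. It combines the conservative character set quoted from the reader page with Phil's round-trip check, wrapped in `try` because `read-string` throws on inputs such as "" or "1abc". The function is hypothetical. Its regex is deliberately narrower than what the reader actually accepts; for example, `.` is not included.

```clojure
(defn valid-name?
  "Hypothetical sketch of the proposed valid-name?: true when s uses only
  the conservative character set (a non-numeric first character, then
  alphanumerics and * + ! - _ ?) AND reads back as the same symbol."
  [s]
  (boolean
    (and (string? s)
         (re-matches #"[A-Za-z*+!_?\-][A-Za-z0-9*+!_?\-]*" s)
         (try
           ;; Phil's idiom: does reading s give the same thing as (symbol s)?
           (apply = ((juxt read-string symbol) s))
           (catch Exception _ false)))))

(valid-name? "foo-bar")  ; => true
(valid-name? "jaja nee") ; => false
(valid-name? "+1")       ; => false, reads as the number 1
```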
If the driver is unaware that what they see in the mirror is
distorted, the driver will assume it is not. A driver who has been
supplied incomplete information about the operation of their vehicle
cannot reasonably be held fully responsible, which is why we have so
many warning labels.
Documentation for programming languages should take the same approach.
If any input other than what is specified will lead to undefined
results, that needs to be documented. Otherwise, someone will feed in
invalid input, get what looks like valid output, and assume that the
documentation is out of date, incomplete, or something equally inane,
and continue to misuse the feature. You can never underestimate the
power of ignorance, no matter a person's creed, intelligence,
lifestyle, or profession. If invalid input will not throw an error
immediately, then it DOES need to be documented that invalid input
will result in undefined output.
~Justin
That's an apples-to-oranges comparison. Atoms and Refs are identities.
Nobody expects identities to round-trip and keep their semantics. On
the other hand, Keywords and Symbols are values. Everybody expects
values to round-trip and keep their semantics, so when that fails to
happen in some instance it violates least surprise.
If invalid input will not throw an error
immediately, then it DOES need to be documented that invalid input
will result in undefined output.
~Justin
Surely undocumented behavior is undefined behavior by definition?
That's certainly the approach taken by many programming language
standards. In which case, giving any function invalid input is
immediately in undefined behavior territory and the output is
"guaranteed" to be undefined - unless explicitly documented to the
contrary (e.g., when given invalid input, this function shall throw an
exception).
--
Sean A Corfield -- (904) 302-SEAN
An Architect's View -- http://corfield.org/
World Singles, LLC. -- http://worldsingles.com/
"Perfection is the enemy of the good."
-- Gustave Flaubert, French realist novelist (1821-1880)
> Surely undocumented behavior is undefined behavior by definition?
True - but that kind of assumes that there is documented behavior… the one-liner for (symbol ns name) doesn't say anything about input parameter value types, and leaves the set of valid characters up to the user knowing where to find it on the "http://clojure.org/reader" page. Note that I had to find the valid types for [name] and [ns name] by looking thru the clojure.core clj&java code.
> ...unless explicitly documented to the
> contrary (e.g., when given invalid input, this function shall throw an
> exception).
It took me about 3 minutes to scan thru the API list and testing at the repl to find alias, bases and bound? - all throwing exceptions while it is not mentioned in their docs. I'm sure you could have found many more in the same time before you wrote your reply - not sure what that tells you (?).
Also not sure what the big issue is to help the (novice) users out by being explicit in the doc-string about the contract and behaviour of a function. As far as I can tell, you have functions that validate and barf, validate and return some well-known value like nil for invalid input, and you have the garbage-in, garbage-out type. Unless you can guarantee that the first two types are always clearly identified in their docs, you can leave the last one open… however, why not mention it also explicitly? Especially for symbol, which is a pretty key entity for the clojure language. We have a one-liner, low on content for symbol/keyword and a page-long doc for deftype/defprotocol.
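The three kinds of contract described above can be illustrated side by side. As its validity test, this sketch assumes a print/read round trip; that is not how clojure.core defines validity. `keyword-or-throw` and `keyword-or-nil` are hypothetical names.

```clojure
(defn- readable-keyword?
  "Assumed validity test: the keyword's printed form reads back as itself."
  [k]
  (try (= k (read-string (str k)))
       (catch Exception _ false)))

(defn keyword-or-throw
  "Validate and barf."
  [s]
  (let [k (keyword s)]
    (if (readable-keyword? k)
      k
      (throw (IllegalArgumentException.
               (str "Not a readable keyword name: " (pr-str s)))))))

(defn keyword-or-nil
  "Validate and return a well-known value (nil) for invalid input."
  [s]
  (let [k (keyword s)]
    (when (readable-keyword? k) k)))

;; Garbage in, garbage out: clojure.core/keyword performs no such check.
(keyword "jaja nee")          ; a keyword that does not read back
(keyword-or-nil "jaja nee")   ; => nil
(try (keyword-or-throw "jaja nee")
     (catch IllegalArgumentException e (.getMessage e)))
;; => "Not a readable keyword name: \"jaja nee\""
```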
So one last time… instead of having the current docs:
clojure.core/symbol
([name] [ns name])
Returns a Symbol with the given namespace and name.
clojure.core/keyword
([name] [ns name])
Returns a Keyword with the given namespace and name. Do not use :
in the keyword strings, it will be added automatically.
My suggested docs are:
clojure.core/symbol
([name] [ns name])
Returns a Symbol with the given namespace and name.
(symbol name): name can be string or symbol.
(symbol ns name): ns and name must both be string.
A symbol string begins with a non-numeric character
and can contain alphanumeric characters and *, +, !, -, _, and ?.
(see "http://clojure.org/reader" for details).
Note that this function does not validate input strings for ns and name,
and may return improper symbols with undefined behavior for non-conformant ns and name.
clojure.core/keyword
([name] [ns name])
Returns a Keyword with the given namespace and name. Do not use :
in the keyword strings, it will be added automatically.
(keyword name): name can be string, symbol or keyword.
(keyword ns name): ns and name must both be string.
A keyword string, like a symbol, begins with a non-numeric
character and can contain alphanumeric characters and *, +, !, -, _, and ?.
(see "http://clojure.org/reader" for details).
Note that this function does not validate input strings for ns and name,
and may return improper keywords with undefined behavior for non-conformant ns and name.
Take it or leave it.
Regards, FrankS.
can be deleted ?
/Kevin
> Why are you fighting so hard to make keywords with spaces? If you need things with spaces, use strings.
Why have keywords at all then? What does a space add that somehow negates the premise of keywords?
As I see it, keywords (as opposed to strings) give you interning, namespaces, and the semantics that the value serves as an identifier. What's wrong with having that and a space as well?
Also, to be more specific about my use case: if I convert ION symbols to Clojure strings, I've lost information and can no longer convert back to the same ION.
I need to do ION -> EDN -> ION in a lossless way. Piggybacking on keywords seemed easiest.
One of its supported data types is symbol, defined as: "Interned, Unicode symbolic atoms (aka identifiers)".
I interact with systems that interchange their data using ION. My systems use EDN though internally. Thus I need to convert from ION to EDN and back in a lossless manner.
If you give me ION with symbols in them, and I convert them to EDN, I need to be able to go back to the exact same ION symbol when converting my EDN back to ION.
If I make symbols strings, I lose the type info and would convert them back to ION strings. This would break my clients.
I thought I could get away not having to build a custom representation for them in Clojure and EDN, and piggyback on keywords.
It now seems to me like I can still piggyback on keywords when in Clojure, since `keyword` and clojure.lang.Keyword appear to support full unicode.
But it seems EDN does not support full unicode serialization for them. So I've extended the edn serialization and deserialization so that it does.
I did so by overriding print-method for keywords so that when it prints in an unreadable way, it instead outputs a custom literal #myproject/keyword "namespace/name". And I've added a reader for this which uses keyword to get a Clojure keyword back.
Does any of this seem unsound to anyone?
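A minimal sketch of the approach described above follows. The tag `#myproject/keyword` comes from the message. Everything else, including the `edn-readable?` check and the `edn-opts` map, is an assumption made for illustration. Note that overriding `print-method` for `clojure.lang.Keyword` changes keyword printing for the whole JVM. Also, the string payload is quoted only when printing readably, as `pr` and `pr-str` do.

```clojure
(require '[clojure.edn :as edn])

(defn- edn-readable?
  "True if the keyword's normal printed form reads back as the same keyword."
  [k]
  (try (= k (edn/read-string (str k)))
       (catch Exception _ false)))

;; Replaces the default Keyword printer globally: readable keywords print
;; as usual, everything else as #myproject/keyword "namespace/name".
(defmethod print-method clojure.lang.Keyword [k ^java.io.Writer w]
  (if (edn-readable? k)
    (.write w (str k))
    (do (.write w "#myproject/keyword ")
        (print-method (subs (str k) 1) w))))

;; keyword splits "ns/name" at the first slash, so a name that itself
;; contains "/" would not round-trip with this payload format.
(def edn-opts {:readers {'myproject/keyword keyword}})

(def k (keyword "jaja nee"))
(pr-str k)                                  ; => "#myproject/keyword \"jaja nee\""
(= k (edn/read-string edn-opts (pr-str k))) ; => true
```

A payload that keeps namespace and name separate, such as a two-element vector, would avoid the slash ambiguity at the cost of a less compact literal.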