Keyword names and namespaces


Rob Lachlan

Oct 18, 2010, 6:02:26 PM
to Clojure
There seems to be a discrepancy between which keyword names the reader
documentation says are allowed and which ones the reader actually
accepts. For instance, periods are supposed to be disallowed in keyword
names, and only one forward slash is allowed, but no error is thrown
for something like this:

{:f/o/o.o :bar}

The key :f/o/o.o is read as a keyword with namespace f/o and
name o.o.

Using the keyword function, we seem to be able to make keywords out of
any pair of arbitrary strings, even including spaces. This might seem
pathological, but since keywords just evaluate to themselves there
doesn't seem to be great harm in allowing this kind of liberal
behavior. (Note also that keywords don't create a namespace, so we
don't have to worry about inadmissible namespaces for keywords.)
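
A minimal sketch of the behaviour described above, assuming a standard Clojure REPL. The strings passed in are arbitrary illustrations, not anything the documentation endorses:

```clojure
;; Illustration only: these namespace/name strings are made up.
;; The two-argument keyword function takes both parts as given,
;; without checking them against the reader's rules.
(def k (keyword "f/o" "o.o"))

(namespace k) ; the namespace part, slash included
(name k)      ; the name part, period included

;; Spaces are accepted as well, in either position.
(def spaced (keyword "has space" "also has space"))
(keyword? spaced)
```

Neither call throws, even though the reader documentation would not allow these names in source text.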

On the other hand, if this isn't to be allowed, then shouldn't the
keyword function throw an error when inadmissible strings are provided
for namespaces or names? I should point out that the symbol function
is also similarly permissive, and that seems like it might be more
worrisome. I would be in favor of keeping the behavior of the keyword
function as is, but possibly making the symbol function a bit
stricter.

Note: I'm using Clojure 1.2. This is all motivated by a Stack Overflow
discussion: http://stackoverflow.com/questions/3951761/what-are-the-allowed-characters-in-a-clojure-keyword/

Abhishek Reddy

Oct 18, 2010, 7:21:37 PM
to clo...@googlegroups.com
Hi,

The reader (LispReader) and the interning functions (symbol and keyword) are separate.  The reader tries to enforce some constraints, but overlooks some edge cases, before eventually interning.  The interning functions do not validate input.

Besides the problems you raised, there are some confusing edge cases involving colons.  For example, there is no implicit way to produce a symbol of the name ":b", but you can get away with the qualified form (read-string "user/:b").  Similarly, (read-string ":user/:b") produces a keyword whose name is ":b".

Also, repeating colons are disallowed, and checked in the reader.  Presumably, this is to prevent reading something like "a/::b".  But then the rule could probably be relaxed to only check for colons at the start of the name.  (Incidentally, this would be useful for applying the reader to data -- as swank-clojure tries to -- from other lisps such as CL, where foo::bar is meaningful.)

Meanwhile, the interning functions do not check for any of this.  You can get away with (symbol "a::b") and so on.  I suspect it would take some more serious refactoring to get them to run the same checks as the reader, but I don't know if they are intentionally or accidentally more liberal in the first place.  Anyway, I would like to see at least the reader adopt a more complete and consistent validation routine too.
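
A small sketch of that difference, assuming a standard Clojure REPL. The var names here are hypothetical and only serve the illustration:

```clojure
;; The interning function does not validate its input, so a repeated
;; colon inside the name goes through unchecked.
(def interned (symbol "a::b"))
(name interned)

;; The reader checks for repeated colons and rejects the same token.
;; The exact exception class varies between Clojure versions, so this
;; sketch catches the general Exception type.
(def reader-rejects?
  (try
    (read-string "a::b")
    false
    (catch Exception _ true)))
```

So the same string is a valid argument to symbol but an invalid token for read-string.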

Cheers





--
Abhishek Reddy
http://abhishek.geek.nz


Phil Hagelberg

Oct 19, 2010, 12:24:15 AM
to clo...@googlegroups.com
On Mon, Oct 18, 2010 at 3:02 PM, Rob Lachlan <robert...@gmail.com> wrote:
> There seems to be a discrepancy between what keyword names are
> supposed to be allowed, according to the reader documentation, and
> which the reader actually allows.  For instance, periods are supposed
> to be disallowed in keyword names, and only one forward slash allowed,
> but no errors are thrown at something like this:

I think the official stance is that there's a big difference between
what is officially supported and what you happen to be able to do in
the current version without things blowing up.

> Using the keyword function, we seem to be able to make keywords out of
> any pair of arbitrary strings, even including spaces.

I submitted a patch for this over a year ago, but I gather there were
some concerns about the runtime cost of such behaviour. It's one of
the most long-standing tickets still open:

https://www.assembla.com/spaces/clojure/tickets/17-gc-issue-13-%09validate-in-(keyword-s)-and-(symbol-s)

-Phil

Rob Lachlan

Oct 19, 2010, 2:18:34 AM
to Clojure
I see. Thank you for linking to the ticket, Phil; that really clarifies
things. I suppose that I would tend more to Chas Emerick's view in
his Sept 28 comment (on the ticket), questioning whether there is a
need to validate keywords (and possibly symbols) stringently. But
I'll take your point that we shouldn't count on the current behaviour
continuing.

Rob

On Oct 18, 9:24 pm, Phil Hagelberg <p...@hagelb.org> wrote:
> https://www.assembla.com/spaces/clojure/tickets/17-gc-issue-13-%09val...)
>
> -Phil

Alessio Stalla

Oct 19, 2010, 6:08:40 AM
to Clojure
On Oct 19, 8:18 am, Rob Lachlan <robertlach...@gmail.com> wrote:
> I see, thank you for linking to the ticket, Phil that really clarifies
> things.  I suppose that I would tend more to Chas Emerick's view in
> his sept 28 comment (on the ticket), questioning whether there is a
> need to validate Keywords (and possibly symbols) stringently.  But
> I'll take your point that we shouldn't count on the current behaviour
> continuing.

FWIW, in Common Lisp no validation is done on symbol names; they can
be arbitrary strings. Strictly speaking, the reader doesn't validate
them, either, but in order to parse them it has to place some
restrictions, e.g. to disallow ambiguous strings like foo::bar:baz.
However, the CL reader allows one to quote characters in symbol names,
so you can effectively intern any string with it: for example,
foo::| abCDef gh:123::@&"| is a valid symbol in the FOO package.
Personally I don't see any value in restricting what can be interned;
symbols are not necessarily only used as keyword or variable names.
However, consistency between how a symbol is printed and how it is
read back in is important.
In Clojure, also, Java interop could be a problem in principle if
Clojure symbols are to be used as class, method or field names;
however, in such cases the problem can be solved locally, by
disallowing certain symbols when compiling to a Java class, or by
mangling them.
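
The print/read consistency point can be sketched in Clojure as follows, assuming a standard REPL. The keyword name is an arbitrary example containing a space:

```clojure
;; Intern a keyword whose name contains a space.
(def k (keyword "a b"))

;; Printing it gives text that looks like two separate tokens.
(def printed (pr-str k))

;; read-string reads only the first form from that text, so the value
;; read back is a different keyword from the one that was printed.
(def read-back (read-string printed))

(def round-trips? (= k read-back))
```

Here round-trips? is false: the keyword can be interned and printed, but it cannot be read back from its printed form.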

Cheers,
Alessio