clojurescript: var names with "-" and "_" are rendered to the same internal name (?)

Frank Siebenlist

Sep 24, 2012, 01:00:02
to Clojure, Frank Siebenlist
The following cljs-repl session shows the issue:

--------

ClojureScript:cljs.user> (def my-var "YES")
"YES"
ClojureScript:cljs.user> my-var
"YES"
ClojureScript:cljs.user> (def my_var "NO")
"NO"
ClojureScript:cljs.user> my_var
"NO"
ClojureScript:cljs.user> my-var
"NO"
ClojureScript:cljs.user> (set! my-var "MAYBE")
"MAYBE"
ClojureScript:cljs.user> my_var
"MAYBE"
ClojureScript:cljs.user>

--------

The official Clojure reader docs allow both "-" and "_" in symbol names (http://clojure.org/reader), and the ClojureScript docs don't mention anything about this as far as I could tell.

Not sure if this has been reported before - I searched JIRA and the mailing list, but couldn't find anything, but it's difficult to search for "_"…

Bug?
Mapping limitation?

-FrankS.

Raju Bitter

Sep 24, 2012, 04:45:11
to clo...@googlegroups.com, Frank Siebenlist
Identifiers in JavaScript cannot contain a hyphen/minus character:
https://developer.mozilla.org/en-US/docs/JavaScript/Guide/Values,_variables,_and_literals
> A JavaScript identifier must start with a letter, underscore (_), or dollar sign ($);
> subsequent characters can also be digits (0-9). Because JavaScript is case
> sensitive, letters include the characters "A" through "Z" (uppercase) and the
> characters "a" through "z" (lowercase).
> Starting with JavaScript 1.5, you can use ISO 8859-1 or Unicode letters such
> as å and ü in identifiers. You can also use the \uXXXX Unicode escape
> sequences as characters in identifiers.

ClojureScript maps all hyphens in identifiers to underscores.
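To make the consequence concrete, here is a minimal sketch in Java. It assumes only the hyphen rule described above; the class name `HyphenMunge` and the `jsGlobals` map (standing in for the emitted JavaScript scope) are hypothetical and not part of the ClojureScript compiler.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the compiler's name mangling; only the
// hyphen-to-underscore rule from this thread is modelled.
class HyphenMunge {
    static String munge(String name) {
        return name.replace('-', '_');
    }

    public static void main(String[] args) {
        // Simulated JavaScript scope, keyed by the emitted identifier.
        Map<String, String> jsGlobals = new LinkedHashMap<>();
        jsGlobals.put(munge("my-var"), "YES");
        jsGlobals.put(munge("my_var"), "NO"); // same key, so "YES" is overwritten
        System.out.println(jsGlobals);        // prints {my_var=NO}
    }
}
```

Both symbols end up as the single identifier `my_var`, which matches what the REPL session at the top of the thread shows.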

This has been reported as a bug for protocols before:
http://dev.clojure.org/jira/browse/CLJS-336

- Raju

Herwig Hochleitner

Sep 24, 2012, 09:53:17
to clo...@googlegroups.com
> Not sure if this has been reported before - I searched JIRA and the mailing list, but couldn't find anything, but it's difficult to search for "_"…
>
> Bug?
> Mapping limitation?

This behavior comes from cljs.compiler/munge. I'd say it's a mapping limitation that should be considered a bug.
A possible fix would be to replace _ with _UNDERSCORE_ in munge.
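A rough sketch of what that suggestion could look like, written in Java to mirror clojure.lang.Compiler/munge. The class `UnderscoreMunge` is hypothetical, and it handles only the two characters under discussion rather than the full mapping table.

```java
// Hypothetical variant of munge: '-' still becomes "_", but a literal '_'
// is spelled out so the two characters no longer share an output.
class UnderscoreMunge {
    static String munge(String name) {
        StringBuilder sb = new StringBuilder();
        for (char c : name.toCharArray()) {
            if (c == '-') {
                sb.append("_");
            } else if (c == '_') {
                sb.append("_UNDERSCORE_");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(munge("my-var")); // prints my_var
        System.out.println(munge("my_var")); // prints my_UNDERSCORE_var
    }
}
```

Under this scheme `my-var` and `my_var` get distinct names. The mapping is still not strictly one-to-one, though: a symbol such as `my-UNDERSCORE-var` would also come out as `my_UNDERSCORE_var`, the same kind of residual overlap the other _WORD_ mappings have.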

Notice that clojure.core/munge has the same limitation and should probably be fixed as well:

(defn foo-bar []
  "DASH")

(defn foo_bar []
  "UNDERSCORE")

(defn -main []
  (println "Dash version: " (foo-bar))
  (println "Underscore version: " (foo_bar)))


When AOT compiling this example and running it, it prints:

Dash version:  UNDERSCORE
Underscore version:  UNDERSCORE

as opposed to the expected

Dash version:  DASH
Underscore version:  UNDERSCORE

when running from source.

Frank Siebenlist

Sep 24, 2012, 13:01:05
to rajub...@gmail.com, Frank Siebenlist, clo...@googlegroups.com
That CLJS-336 feels like a different issue that doesn't map to what I'm seeing...

Frank Siebenlist

Sep 24, 2012, 13:15:59
to clo...@googlegroups.com, Frank Siebenlist
Thanks for digging.

The mapping of "-" to "_" does indeed come from clojure.lang.Compiler/munge, which is called by cljs.compiler/munge:

--------------------
user=> (#'cljs.compiler/munge "-")
"_"
user=> (clojure.lang.Compiler/munge "-")
"_"
user=> (clojure.lang.Compiler/munge "_")
"_"
user=>
--------------------

Looking at the java-source, the reason can be found in the CHAR_MAP mapping table in Compiler.java:

--------------------
static final public IPersistentMap CHAR_MAP =
        PersistentHashMap.create('-', "_",
//                               '.', "_DOT_",
                                 ':', "_COLON_",
                                 '+', "_PLUS_",
                                 '>', "_GT_",
                                 '<', "_LT_",
                                 '=', "_EQ_",
                                 '~', "_TILDE_",
                                 '!', "_BANG_",
                                 '@', "_CIRCA_",
                                 '#', "_SHARP_",
                                 '\'', "_SINGLEQUOTE_",
                                 '"', "_DOUBLEQUOTE_",
                                 '%', "_PERCENT_",
                                 '^', "_CARET_",
                                 '&', "_AMPERSAND_",
                                 '*', "_STAR_",
                                 '|', "_BAR_",
                                 '{', "_LBRACE_",
                                 '}', "_RBRACE_",
                                 '[', "_LBRACK_",
                                 ']', "_RBRACK_",
                                 '/', "_SLASH_",
                                 '\\', "_BSLASH_",
                                 '?', "_QMARK_");

static public String munge(String name){
    StringBuilder sb = new StringBuilder();
    for(char c : name.toCharArray())
        {
        String sub = (String) CHAR_MAP.valAt(c);
        if(sub != null)
            sb.append(sub);
        else
            sb.append(c);
        }
    return sb.toString();
}
--------------------

Note that it's in the first entry of CHAR_MAP - a little obscured by the (original) formatting.

What's puzzling is that all mappings use a convention to map the char to a _WORD_, except the "-", which gets mapped to "_" directly.

What's further puzzling is that in the clj-repl this mapping doesn't seem to be used:

--------------------
user=> (def my-var "YES")
#'user/my-var
user=> (def my_var "NO")
#'user/my_var
user=> my-var
"YES"
swimtimer=>
--------------------

Guess the AOT compiler uses it but the REPL-one doesn't (???).

Confused - FrankS.

Herwig Hochleitner

Sep 24, 2012, 18:24:41
to clo...@googlegroups.com
2012/9/24 Frank Siebenlist <frank.si...@gmail.com>
> Guess the AOT compiler uses it but the REPL-one doesn't (???).

The reason for this is two-fold:

1) Vars in Clojure don't suffer from this issue, since they are reified. That means their field in the Java class has a generated name and they are interned by their string name, so a def works either way.

2) With defn, the *function* classes are named after their namespace var, so (type foo-bar) => core$foo_bar. When a defn is evaluated from the REPL, the fn class is emitted, loaded into the class loader, and an instance of it is assigned to the var. When AOT compiling, however, the fn classes are written as .class files and instantiated (and assigned to their vars) in the namespace constructor.

When you AOT compile my example and look at the classes with jd, you see that there is just one fn class file, core$foo_bar.class. That means the first one got overwritten.

In core__init.class you see that two vars are created with their correct names, but both are assigned an instance of the same class.
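The effect can be imitated with a small Java simulation. This is only a sketch of the explanation above, not of the real compiler: the class `AotOverwriteSketch`, the `classFiles` map (standing in for the output directory), and the simplified naming rule are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Simulation only: vars keep their own names, but fn classes are written
// to files named after the munged var name, so two defns can share a file.
class AotOverwriteSketch {
    // Simplified naming rule taken from the explanation: <ns>$<munged name>.class
    static String classFileName(String ns, String fnName) {
        return ns + "$" + fnName.replace('-', '_') + ".class";
    }

    // Returns, for each var name, the body of the class it ends up bound to.
    static Map<String, String> resolve() {
        Map<String, String> classFiles = new LinkedHashMap<>(); // file name -> fn body
        Map<String, String> vars = new LinkedHashMap<>();       // var name  -> file name
        String[][] defns = { {"foo-bar", "DASH"}, {"foo_bar", "UNDERSCORE"} };
        for (String[] d : defns) {
            String file = classFileName("core", d[0]);
            classFiles.put(file, d[1]); // the second defn overwrites the first file
            vars.put(d[0], file);
        }
        Map<String, String> bound = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : vars.entrySet()) {
            bound.put(e.getKey(), classFiles.get(e.getValue()));
        }
        return bound;
    }

    public static void main(String[] args) {
        System.out.println(resolve()); // prints {foo-bar=UNDERSCORE, foo_bar=UNDERSCORE}
    }
}
```

Two distinct vars survive, yet both resolve to the class written last, which lines up with the "UNDERSCORE / UNDERSCORE" output of the AOT-compiled example earlier in the thread.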