Re: UTF-8 behavior ClojureScript (vs. Clojure)


Andy Fingerhut

Oct 18, 2012, 8:16:47 AM
to clo...@googlegroups.com
Hopefully someone else can explain the difference in the output of the str function.  I suspect that ClojureScript simply inherits JavaScript's default of displaying a character with a code point in the range 128 through 255 as \x followed by two hex digits, whereas Clojure/JVM displays it using the character set encoding currently specified for the underlying JVM.
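
A small JavaScript sketch of that suspicion, for illustration only: \xF8 is one of several source-level spellings of the same one-character string, so the difference would lie in how the string is displayed, not in what it contains. The variable names are made up for the example.

```javascript
// Three spellings of the same one-character string in JavaScript source.
const literal = "ø";
const hexEscape = "\xF8";       // \x plus two hex digits
const unicodeEscape = "\u00F8"; // \u plus four hex digits

// All three denote the same string value.
const sameString = literal === hexEscape && hexEscape === unicodeEscape;

// The single character has code point 248 (0xF8).
const codePoint = literal.codePointAt(0);

console.log(sameString, codePoint, literal.length);
```

Of the two escaped spellings, only the \u form is also a valid escape inside a Clojure string literal, which matches the "Unsupported escape character: \x" error reported in this thread.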

As far as Clojure/JVM being able to read strings encoded in this way, there is an enhancement request ticket CLJ-1025 open, and there is recent discussion on the Clojure Dev group about whether this enhancement should be included in the yet-to-be-released Clojure 1.5:


Andy

On Oct 18, 2012, at 4:12 AM, Henrik Mohr wrote:

Hi there!

I'm wondering why ClojureScript seems to handle international characters differently from Clojure.

Simple example in Clojure (= my preferred behaviour):
user=> (str "ø")
"ø"

The same example in ClojureScript:
ClojureScript:cljs.user>   #_=> (str 'ø')
"\xF8'"

Can anyone explain to me why ClojureScript behaves like that?

I need to send strings from ClojureScript to a remote service, so I need the output from ClojureScript to be straight UTF-8 encoded strings.

When the (Clojure-based) remote service receives the string from ClojureScript, read-string fails to decode it:
Exception: java.lang.RuntimeException: Unsupported escape character: \x
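
To make the distinction concrete, here is a minimal sketch (assuming a runtime that provides the standard TextEncoder, such as a current browser or Node.js) that compares the UTF-8 bytes of the character itself with the bytes of the four-character escaped text \xF8 that the reader rejects. The variable names are illustrative.

```javascript
const encoder = new TextEncoder(); // TextEncoder always produces UTF-8

// The actual one-character string (U+00F8).
const charBytes = Array.from(encoder.encode("\u00F8"));

// The escaped spelling as plain text: backslash, x, F, 8.
const escapedBytes = Array.from(encoder.encode("\\xF8"));

console.log(charBytes);    // 195, 184 (0xC3 0xB8)
console.log(escapedBytes); // 92, 120, 70, 56 (four ASCII bytes)
```

If the service receives the second byte sequence, the text on the wire is valid UTF-8 but contains an escape that the Clojure reader does not understand.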

Anyone?

Thanks.

Best regards,
Henrik

Henrik Mohr

Oct 19, 2012, 4:52:41 AM
to clo...@googlegroups.com
Still I hope someone can answer the question on why ClojureScript behaves differently from Clojure.

Output from Clojure:
user=> (str "ø")
"ø"

Output from ClojureScript:
  #_=> (str "ø")
"\xF8"

Output from node.js:
> console.log ("ø");
ø

Output from Chrome Console:
console.log ("ø")
ø

Anyone from Clojure Core that can comment on this?

Thanks in advance.

BRgds,
Henrik

David Nolen

Oct 19, 2012, 11:25:30 AM
to clo...@googlegroups.com
I believe this may be due to the logic in compiler.clj on lines 70-84. Perhaps the condition on line 82 should be a bit broader: (< 31 cp 256) instead of (< 31 cp 127)?

I'm not sure ... if somebody else could chime in on that logic that would help.
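
Assuming the quoted condition decides which code points are written out verbatim (an assumption; the actual code in compiler.clj is not shown in this thread), the effect of the two bounds on ø can be sketched in JavaScript. The isVerbatim helper is hypothetical.

```javascript
// Hypothetical predicate with the same shape as the range checks quoted above.
const isVerbatim = (cp, upperBound) => 31 < cp && cp < upperBound;

const cp = "\u00F8".codePointAt(0); // 248, the code point of ø

const narrow = isVerbatim(cp, 127); // false: ø would be escaped
const broad = isVerbatim(cp, 256);  // true: ø would be emitted as-is

// The broader bound also lets through DEL (127) and the C1 controls (128-159).
const letsControlThrough = isVerbatim(0x85, 256); // true

console.log(narrow, broad, letsControlThrough);
```

The last line shows one reason to be careful with the broader bound: it would also pass control characters through unescaped.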

David 

Chas Emerick

Oct 19, 2012, 12:57:17 PM
to clo...@googlegroups.com
It's simpler than that; cljs is just using an inappropriate string quoting mechanism.

See http://groups.google.com/group/clojure-dev/msg/f679b8759b3fc54f and the linked issue and proposed patch.  (…which I've not filed with an issue yet because I think we should first come to some closure on whether it or the originally-proposed enhancement to JVM Clojure in CLJ-1025 is a better path.)
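
For illustration only, here is a hypothetical JavaScript sketch of a quoting function whose output is acceptable to both a JavaScript engine and the Clojure reader: it escapes everything outside printable ASCII as \u plus four hex digits. This is not the linked patch, and quoteString is a made-up name.

```javascript
// Hypothetical helper; not taken from ClojureScript or from the proposed patch.
function quoteString(s) {
  let out = '"';
  for (let i = 0; i < s.length; i++) {
    const ch = s[i];
    const code = s.charCodeAt(i); // UTF-16 code unit
    if (ch === '"' || ch === "\\") {
      out += "\\" + ch; // escape double quote and backslash
    } else if (code > 31 && code < 127) {
      out += ch; // printable ASCII stays as-is
    } else {
      // \u plus four uppercase hex digits, e.g. \u00F8
      out += "\\u" + code.toString(16).toUpperCase().padStart(4, "0");
    }
  }
  return out + '"';
}

const quoted = quoteString("s\u00F8ster");
console.log(quoted); // "s\u00F8ster"
```

The sketch works on UTF-16 code units, so a character outside the Basic Multilingual Plane comes out as two \u escapes.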

- Chas


Dave Sann

Oct 19, 2012, 6:18:06 PM
to clo...@googlegroups.com
Chas, if your patch works without issue, it is probably the better option, because it will then work with existing versions of Clojure. ClojureScript is changing faster, and people are probably upgrading it faster.

I don't think it does any harm for Clojure to be able to read these chars, but fixing the interchange is the real issue (for me).

Stu closed CLJ-1025 already.

Dave

Chas Emerick

Oct 19, 2012, 8:55:16 PM
to Clojure
I've filed a CLJS issue for this, and attached a patch:

http://dev.clojure.org/jira/browse/CLJS-400

Thanks for keeping on this, Dave. :-)

- Chas

Dave Sann

Oct 23, 2012, 4:55:44 AM
to clo...@googlegroups.com
I notice this is fixed on ClojureScript master.

thanks guys. 

I can now delete my special edition clojure :)