UTF-8

David Powell

Apr 29, 2009, 7:30:35 AM
to clo...@googlegroups.com

There seem to be two separate issues regarding character sets in Clojure:

1) What encoding should Clojure use to read/write *in*, *out*, and *err*?

2) What encoding should Clojure use to load .clj files?


For 1) Clojure currently uses UTF-8; this is hardcoded in the constants for those vars in RT.java.
I suppose there is no guarantee what encoding stdout is expecting, but the platform default encoding seems
a better bet. It might not always be correct: e.g. people might pipe the output of a Clojure script to a
file, or they might be using a console that uses a non-default encoding.

I notice that some tools, e.g. cmd.exe, allow the caller to specify the encoding of stdin/stdout via a
command-line flag.

In Clojure, I suppose, if an application really does want to write Unicode to stdout, then it is just a
matter of rebinding *out* first to wrap System.out in a writer with the preferred encoding.
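
A minimal Java-level sketch of that wrapping, under stated assumptions: the class name EncodedOut and its helper methods are hypothetical and are not taken from RT.java. It only shows that the same text becomes different bytes depending on the charset handed to the writer; in Clojure, a writer built this way is the kind of object one would rebind *out* to.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodedOut {
    // Wrap a byte stream (for example System.out) in a character writer
    // that encodes with the given charset. Auto-flush is on for println.
    static PrintWriter writerFor(OutputStream out, Charset cs) {
        return new PrintWriter(new OutputStreamWriter(out, cs), true);
    }

    // Helper for illustration: return the bytes the writer produces for s.
    static byte[] encode(String s, Charset cs) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        PrintWriter w = writerFor(buf, cs);
        w.print(s);
        w.flush();
        return buf.toByteArray();
    }

    public static void main(String[] args) {
        // Explicit choice: always emit UTF-8, whatever the platform says.
        PrintWriter utf8Out = writerFor(System.out, StandardCharsets.UTF_8);
        utf8Out.println("caf\u00e9");

        // Platform default: what the console is most likely to expect.
        PrintWriter defaultOut = writerFor(System.out, Charset.defaultCharset());
        defaultOut.println("caf\u00e9");
    }
}
```

If the console is not UTF-8, the first line printed by main will likely show a garbled final character while the second will not, which is the mismatch described above.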


For 2) I think that it is good for Clojure to pick a fixed encoding for loading Clojure files. Portable
code is much nicer than .clj files that fail to load on a server because the server happens to have a
different platform default encoding.
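
To illustrate why a fixed encoding matters for source files, here is a small Java sketch (the class FixedEncodingLoad and its decode method are hypothetical, not Clojure's actual loader). The same bytes decode to different text depending on the charset given to the reader, so a loader that relied on the platform default would see different source on different machines.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class FixedEncodingLoad {
    // Decode raw file bytes with an explicitly chosen charset.
    static String decode(byte[] bytes, Charset cs) {
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(bytes), cs)) {
            StringBuilder sb = new StringBuilder();
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The UTF-8 encoding of U+00E9 (e with acute accent) is two bytes.
        byte[] source = {(byte) 0xC3, (byte) 0xA9};

        // Fixed encoding: the same text on every machine.
        System.out.println(decode(source, StandardCharsets.UTF_8).length());

        // A different default (here ISO-8859-1) yields two wrong characters.
        System.out.println(decode(source, StandardCharsets.ISO_8859_1).length());
    }
}
```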

So, I'd like to see the constants in RT.java changed to not specify UTF-8, but for the encoding used by
the compiler to continue to specify UTF-8.

Anyone have any opinions?

--
Dave

Stephen C. Gilardi

Apr 29, 2009, 8:52:19 AM
to clo...@googlegroups.com

On Apr 29, 2009, at 7:30 AM, David Powell wrote:

> So, I'd like to see the constants in RT.java changed to not specify
> UTF-8, but for the encoding used by
> the compiler to continue to specify UTF-8.
>
> Anyone have any opinions?

I think your explanation, reasoning, and conclusions are all exactly
correct and Clojure should change as you described.

--Steve

Toralf

Apr 29, 2009, 9:13:30 AM
to Clojure
On Apr 29, 1:30 pm, David Powell <djpow...@djpowell.net> wrote:
> So, I'd like to see the constants in RT.java changed to not specify UTF-8, but for the encoding used by
> the compiler to continue to specify UTF-8.

+1

Laurent PETIT

Apr 29, 2009, 9:25:04 AM
to clo...@googlegroups.com
Yes, not fixing it before 1.0 would be the kind of thing that people
willing to make FUD on clojure would exploit really easily with great
success:

"clojure does not even handle encoding problems well"

:'(

Rich Hickey

Apr 29, 2009, 12:54:55 PM
to Clojure


On Apr 29, 9:25 am, Laurent PETIT <laurent.pe...@gmail.com> wrote:
> Yes, not fixing it before 1.0 would be the kind of thing that people
> willing to make FUD on clojure would exploit really easily with great
> success:
>
> "clojure does not even handle encoding problems well"
>

Ok - patch welcome ASAP (not singling out you Laurent :)

Thanks,

Rich

Stephen C. Gilardi

Apr 29, 2009, 1:24:07 PM
to clo...@googlegroups.com

On Apr 29, 2009, at 12:54 PM, Rich Hickey wrote:

> Ok - patch welcome ASAP (not singling out you Laurent :)


I've entered an issue and provided a patch:

http://code.google.com/p/clojure/issues/detail?id=112

Thanks,

--Steve

Rich Hickey

Apr 29, 2009, 2:37:23 PM
to Clojure
Patch applied - rev 1360 - thanks!

Rich

Perry Trolard

Apr 29, 2009, 4:23:33 PM
to Clojure
For those on OS X, who probably don't want the default MacRoman,
or anyone else who wants to override the system default, I wanted to
point out that setting the "file.encoding" system property when
invoking java (e.g. `java -Dfile.encoding=UTF-8 ... clojure.main`)
will set the encoding for *in*, *out*, and *err*. If you want to
override globally, this is easier than re-binding the vars.
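
One way to check what the JVM actually picked up is to print the property and the resulting default charset. This is only a sketch; the class name DefaultEncoding is hypothetical. Run it once plainly and once with `-Dfile.encoding=UTF-8` and compare. On recent JDKs the default may already be UTF-8 regardless of platform, so the two runs can match.

```java
import java.nio.charset.Charset;

public class DefaultEncoding {
    // Report the file.encoding property and the charset the JVM uses by default.
    static String describe() {
        return "file.encoding=" + System.getProperty("file.encoding")
             + ", default charset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```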

Perry

Laurent PETIT

Apr 29, 2009, 4:53:33 PM
to clo...@googlegroups.com
2009/4/29 Stephen C. Gilardi <sque...@mac.com>:

That was quick! :-)
