z/OS and codepage issues?


David Andrews

Mar 10, 2009, 3:40:31 PM
to Clojure
I'm having difficulty running Clojure under z/OS. If I download a
current Clojure onto my Gentoo Linux system and build it so:
svn checkout http://clojure.googlecode.com/svn/trunk/ clojure-read-only
cd clojure-read-only
ant
Then I can treat myself to a delicious REPL:
~/clojure-read-only $ java -cp clojure.jar clojure.lang.Repl
Clojure
user=> (+ 1 1)
2

So I move this write-once-read-anywhere bundle to z/OS: I tar the
clojure-read-only directory, binary-ftp it to my z/OS system, ssh in
to *that* and untar the archive, ending up with an identical
clojure-read-only directory on the z/OS filesystem.

But now I have serious code page issues. The best I can get out of
Clojure here is:
$ java -Dfile.encoding=ISO8859-1 -Dconsole.encoding=IBM-1047 -cp clojure.jar clojure.lang.Repl
Clojure
user=> (+ 1 1)
java.lang.Exception: Unable to resolve symbol: MN in this context (NO_SOURCE_FILE:0)
java.lang.Exception: Unable to resolve symbol: ��� in this context (NO_SOURCE_FILE:0)

One person suggested to me that "The code is probably doing something
odd with how it reads input, making bad assumptions about the codepage
being "ASCII". Running with ISO8859-1 as the file encoding usually
fixes this, but apparently not in this case." I'm well out of my
depth here, and wonder if anyone has a suggestion on how I can
proceed?

David Andrews

Mar 10, 2009, 1:03:24 PM
to Clojure

Rich Hickey

Mar 26, 2009, 12:03:50 PM
to Clojure


On Mar 10, 1:03 pm, David Andrews <dammi...@gmail.com> wrote:
> I'm having difficulty running Clojure under z/OS. If I download a
> current Clojure onto my Gentoo Linux system and build it so:
> svn checkout http://clojure.googlecode.com/svn/trunk/ clojure-read-only
> cd clojure-read-only
> ant
> Then I can treat myself to a delicious REPL:
> ~/clojure-read-only $ java -cp clojure.jar clojure.lang.Repl
> Clojure
> user=> (+ 1 1)
> 2
>
> So I move this write-once-read-anywhere bundle to z/OS: I tar the
> clojure-read-only directory, binary-ftp it to my z/OS system, ssh in
> to *that* and untar the archive, ending up with an identical
> clojure-read-only directory on the z/OS filesystem.
>
> But now I have serious code page issues. The best I can get out of
> Clojure here is:
> $ java -Dfile.encoding=ISO8859-1 -Dconsole.encoding=IBM-1047 -cp
> clojure.jar clojure.lang.Repl
> Clojure
> user=> (+ 1 1)
> java.lang.Exception: Unable to resolve symbol: MN in this context
> (NO_SOURCE_FILE:0)
> java.lang.Exception: Unable to resolve symbol: in this context
> (NO_SOURCE_FILE:0)
>
> One person suggested to me that "The code is probably doing something
> odd with how it reads input, making bad assumptions about the codepage
> being "ASCII". Running with ISO8859-1 as the file encoding usually
> fixes this, but apparently not in this case." I'm well out of my
> depth here, and wonder if anyone has a suggestion on how I can
> proceed?


Can anyone help out with this issue?

Rich

Stephen C. Gilardi

Mar 26, 2009, 12:32:21 PM
to clo...@googlegroups.com

On Mar 10, 2009, at 1:03 PM, David Andrews wrote:

> One person suggested to me that "The code is probably doing something
> odd with how it reads input, making bad assumptions about the codepage
> being "ASCII". Running with ISO8859-1 as the file encoding usually
> fixes this, but apparently not in this case." I'm well out of my
> depth here, and wonder if anyone has a suggestion on how I can
> proceed?

Clojure's default stream for input (including REPL input) is set up to
expect the input encoding to be UTF-8. Since all the characters in
"(+ 1 1)" are 7-bit ASCII characters, and since all those characters are
encoded identically in ASCII and UTF-8, Clojure is, effectively,
expecting ASCII input in this case.
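
To see where the "MN" in the error message comes from, here is a small sketch. It assumes the z/OS terminal delivers "(+ 1 1)" as IBM-1047 (EBCDIC) bytes, where "(" is 0x4D, "+" is 0x4E, space is 0x40, "1" is 0xF1 and ")" is 0x5D. The bytes are hard-coded so the example does not depend on the JVM shipping an EBCDIC charset. The class and method names are made up for illustration and are not part of Clojure.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: the class and method names are hypothetical, not Clojure code.
final class EbcdicAsUtf8Sketch {

    // "(+ 1 1)" as IBM-1047 (EBCDIC) bytes, hard-coded as an assumption about
    // what the z/OS terminal sends: ( + space 1 space 1 )
    static final byte[] EBCDIC_INPUT = {
        (byte) 0x4D, (byte) 0x4E, (byte) 0x40, (byte) 0xF1,
        (byte) 0x40, (byte) 0xF1, (byte) 0x5D
    };

    // Decode the bytes the way a UTF-8 reader would; malformed sequences
    // become U+FFFD replacement characters.
    static String decodeAsUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String seen = decodeAsUtf8(EBCDIC_INPUT);
        // The first two characters come out as "MN", the symbol in the error message.
        System.out.println(seen.substring(0, 2));
    }
}
```

0x4D and 0x4E are "M" and "N" in ASCII, so a reader expecting UTF-8 sees a symbol named MN rather than an opening paren and a plus sign. The 0xF1 bytes are not valid UTF-8 on their own, which would account for the unprintable second symbol.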

There was a request on the mailing list a month or two ago for Clojure
not to specify UTF-8 as the encoding for its input stream, which the
poster suggested would allow it to accept the platform default input
encoding. Perhaps that change would help here.

Note that Compiler/loadFile explicitly specifies UTF-8 as the encoding
for Clojure source files as well.

The input stream encoding is set on line 175 of
src/jvm/clojure/lang/RT.java. Removing ", UTF8" from the
InputStreamReader constructor would cause Clojure to use the platform's
default character set rather than expect UTF-8 or its ASCII subset.

   Var.intern(CLOJURE_NS, Symbol.create("*in*"),
-             new LineNumberingPushbackReader(new InputStreamReader(System.in, UTF8)));
+             new LineNumberingPushbackReader(new InputStreamReader(System.in)));
 final static public Var ERR =
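
For reference, the two constructor forms in the diff differ only in whether a charset is passed. The sketch below is not Clojure's code; it only reports which encoding each kind of reader ends up with on the JVM it runs on. The class and method names are illustrative assumptions.

```java
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Illustrative only: hypothetical names, not taken from RT.java.
final class ReaderEncodingSketch {

    // Encoding reported by a reader built with an explicit UTF-8 charset,
    // as in the line the diff removes.
    static String explicitUtf8(InputStream in) {
        return new InputStreamReader(in, StandardCharsets.UTF_8).getEncoding();
    }

    // Encoding reported by a reader built without a charset, as in the line
    // the diff adds; the JVM's default charset is used.
    static String platformDefault(InputStream in) {
        return new InputStreamReader(in).getEncoding();
    }

    public static void main(String[] args) {
        System.out.println("explicit: " + explicitUtf8(System.in));
        System.out.println("default:  " + platformDefault(System.in));
        System.out.println("JVM default charset: " + Charset.defaultCharset());
    }
}
```

Running it with and without -Dfile.encoding=... should show whether that flag changes the default reader's encoding on a given JVM.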


--Steve

David Andrews

Mar 26, 2009, 2:44:43 PM
to Clojure
Great catch, Steve. Thanks! My z/OS system now tells me:

$ java -Dfile.encoding=ISO8859-1 -Dconsole.encoding=IBM-1047 -cp clojure.jar clojure.lang.Repl
Clojure
user=> (+ 1 1)
2

Stephen C. Gilardi

Mar 26, 2009, 4:00:48 PM
to clo...@googlegroups.com

You're quite welcome, David. I gather (via Google search) that you've
been interested in this since November; I'm sorry I wasn't aware of it
earlier.

With your modified Clojure, you may be able to get rid of the "-D"
command line args as well.

I see in the mailing list history that the UTF-8 encoding for the
default input and output streams was from a patch that was primarily
intended to institute a standard for Clojure's file input format as
UTF-8. Even at the time, the requester (Chas Emerick) was unsure of
the advisability of a change away from Java's default encoding for
Clojure's default input and output streams. The patch included both
those changes, so that's the current behavior of Clojure.

I think as an interoperability enhancer, having Clojure source files
be required to be UTF-8 encoded is a good idea. In contrast, I think
this z/OS experience shows that doing so for the default input and
output streams (and therefore for the REPL) may do more harm than
good. Presumably our JVM host is providing default encodings for each
platform for a good reason and we should (by default) honor that.

Right now I think this should be an issue against Clojure and would
welcome some discussion about it. My inclination currently is that we
should keep the UTF-8 standard for encoding Clojure source files, but
change the default input and output streams (and therefore the default
REPL) back to using the platform's default encoding(s).

There is more discussion in this thread:

http://groups.google.com/group/clojure/browse_thread/thread/39ba33d15e7633e0/191ee8b83f815189

--Steve

Laurent PETIT

Mar 26, 2009, 4:27:20 PM
to clo...@googlegroups.com
Hello Stephen,

Well, it seems that javac (the Java compiler) accepts this option:

-encoding encoding
    Set the source file encoding name, such as EUCJIS/SJIS. If -encoding is not specified, the platform default converter is used.

So maybe something along those lines, for everything that is loaded from streams, could be interesting?

Also, I'm guessing that someone who uses encoding XX in his/her source files will also want to use (by default) the same encoding XX in the REPL, if he/she wants a smooth copy/paste experience between the two?
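
One way to read this suggestion is an optional encoding setting that falls back to the platform default, the way javac treats -encoding. The sketch below assumes a made-up system property name, clojure.repl.encoding; Clojure does not define such a property, and the class and method names are hypothetical too.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;

// Illustrative only: nothing here exists in Clojure.
final class EncodingOptionSketch {

    // Hypothetical property name, used only for this illustration.
    static final String PROPERTY = "clojure.repl.encoding";

    // Return the named charset if it is given and supported,
    // otherwise fall back to the JVM's default charset.
    static Charset resolve(String name) {
        if (name == null || name.isEmpty()) {
            return Charset.defaultCharset();
        }
        try {
            return Charset.isSupported(name) ? Charset.forName(name)
                                             : Charset.defaultCharset();
        } catch (IllegalCharsetNameException e) {
            // Syntactically invalid name: treat it like an unset option.
            return Charset.defaultCharset();
        }
    }

    public static void main(String[] args) {
        Charset cs = resolve(System.getProperty(PROPERTY));
        System.out.println("REPL streams would use: " + cs);
    }
}
```

With no property set, the sketch behaves like the patched Clojure above and uses the platform default; setting the property would let source files and the REPL share one encoding.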

My 0,02€,

--
Laurent


2009/3/26 Stephen C. Gilardi <sque...@mac.com>