
"Platform default encoding"


Tim McDaniel

Feb 28, 2011, 1:01:12 PM

I'm not sure which comp.lang.java.* newsgroup is appropriate.

javac documentation at (inter alia)
http://download.oracle.com/javase/6/docs/technotes/tools/windows/javac.html

javac [ options ] [ sourcefiles ] [ classes ] [ @argfiles ]

...

-encoding encoding

Set the source file encoding name, such as EUC-JP and
UTF-8. If -encoding is not specified, the platform default
converter is used.

This is frustrating, because
- it doesn't say what the valid choices are.
- it doesn't tell how to actually *determine* the "platform default
converter"! There might be a way to invoke Sun's javac on such a
machine to find out, but even better, I'd like to have a URL to a
page provided by Sun (or whoever) listing them.

--
Tim McDaniel, tm...@panix.com

Joshua Cranmer

Feb 28, 2011, 1:21:20 PM

On 02/28/2011 01:01 PM, Tim McDaniel wrote:
> This is frustrating, because
> - it doesn't say what the valid choices are.
> - it doesn't tell how to actually *determine* the "platform default
> converter"! There might be a way to invoke Sun's javac on such a
> machine to find out, but even better, I'd like to have a URL to a
> page provided by Sun (or whoever) listing them.

If you follow the charset links in the Java API docs,
they all lead to one page:
<http://download.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html>.

The valid choices are determined by what your current JRE supports
(javac is actually implemented in Java). Determining the default
character set is easily done with a simple Java program (left as an
exercise for the reader); the short answer is that it is probably the
appropriate Cp* charset for your locale on Windows (e.g., Cp1252, a
close relative of ISO 8859-1 that differs in the 0x80-0x9F range), and
probably UTF-8 on any other platform you develop on.
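
The "simple Java program" mentioned above could look roughly like the following. This is only a sketch: the class and method names are made up for illustration, and it reports what the running JVM considers its default, on the assumption (stated in the thread, not in the javac documentation) that javac's "platform default converter" is the same thing.

```java
import java.nio.charset.Charset;
import java.util.Set;

public class DefaultCharsetDemo {

    /** Canonical name of the charset this JVM uses by default. */
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    /** Canonical names of every charset this JVM can encode or decode. */
    static Set<String> supportedCharsetNames() {
        return Charset.availableCharsets().keySet();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
        // file.encoding is an implementation detail; printed only for comparison.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
        System.out.println("Supported charsets (" + supportedCharsetNames().size() + "):");
        for (String name : supportedCharsetNames()) {
            System.out.println("  " + name);
        }
    }
}
```

Running it on the machine in question answers both questions for that JRE: which charset is the default, and which names are available at all.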

--
Beware of bugs in the above code; I have only proved it correct, not
tried it. -- Donald E. Knuth

markspace

Feb 28, 2011, 2:49:13 PM

On 2/28/2011 10:01 AM, Tim McDaniel wrote:
> I'm not sure which comp.lang.java.* newsgroup is appropriate.


Just FYI, clj.programmer, .help and .gui are pretty much read by all the
same folks. Post in one, and you'll get the same answers, so it really
doesn't matter.

Just please don't multi-post; it really breaks up the discussion, because
we all read those groups. Also, posting with a follow-up set to a
different group fouls up the discussion in the same way, so that's to be
eschewed as well.

Just pick any one group, post, and it'll be good.

Lew

Feb 28, 2011, 5:09:36 PM

On Feb 28, 1:01 pm, t...@panix.com (Tim McDaniel) wrote:
> I'm not sure which comp.lang.java.* newsgroup is appropriate.
>
> javac documentation at (inter alia)
> http://download.oracle.com/javase/6/docs/technotes/tools/windows/java...
>
>     javac [ options ] [ sourcefiles ] [ classes ] [ @argfiles ]
>
>     ...
>
>     -encoding encoding
>
>         Set the source file encoding name, such as EUC-JP and
>         UTF-8. If -encoding is not specified, the platform default
>         converter is used.
>
> This is frustrating, because
> - it doesn't say what the valid choices are.
> - it doesn't tell how to actually *determine* the "platform default
>   converter"!  There might be a way to invoke Sun's [sic] javac on such a
>   machine to find out, but even better, I'd like to have a URL to a
>   page provided by Sun (or whoever) listing them.
>

Your platform documentation will tell you what encoding it uses. No
way Oracle can know that for you.

As for legal encodings, those are universal:
http://ietf.org/rfc/rfc2278.txt

You can find those by reading the Javadocs pertaining to encoding,
e.g.,
http://download.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html
(This class is linked from java.lang.String and
java.io.InputStreamReader)

Note that the Javadocs there list:

> Charset Description
>
> US-ASCII Seven-bit ASCII, a.k.a. ISO646-US, a.k.a. the Basic Latin block of the Unicode character set
> ISO-8859-1 ISO Latin Alphabet No. 1, a.k.a. ISO-LATIN-1
> UTF-8 Eight-bit UCS Transformation Format
> UTF-16BE Sixteen-bit UCS Transformation Format, big-endian byte order
> UTF-16LE Sixteen-bit UCS Transformation Format, little-endian byte order
> UTF-16 Sixteen-bit UCS Transformation Format, byte order identified by an optional byte-order mark
>

Or who knows? You might even consider reading the occasional
tutorial!

http://download.oracle.com/javase/tutorial/index.html

which features

http://download.oracle.com/javase/tutorial/i18n/text/string.html

So the answer is: Read the documentation, and remember -

GIYF!

--
Lew

Tim McDaniel

Feb 28, 2011, 6:17:37 PM

In article <87576bda-dafa-43de...@n16g2000prc.googlegroups.com>,
Lew <l...@lewscanon.com> wrote:
>No way Oracle can know that for you.

I downloaded Java via a sun.com URL; they ought to know what they
support.

>As for legal encodings, those are universal:
>http://ietf.org/rfc/rfc2278.txt

I don't see any there. They're actually listed at another URL you mentioned,
<http://download.oracle.com/javase/6/docs/api/java/nio/charset/Charset.html>
"Every implementation of the Java platform is required to support the
following standard charsets": US-ASCII, ISO-8859-1, UTF-8, UTF-16BE,
UTF-16LE, and UTF-16, as you noted. It seems a reasonable assumption
that "required to support" includes Java source code and not just the
run-time behavior of Charset.

<http://download.oracle.com/javase/6/docs/technotes/guides/intl/encoding.doc.html>
lists encodings for "The classes java.io.InputStreamReader,
java.io.OutputStreamWriter, java.lang.String, and classes in the
java.nio.charset package ... Sun's Java SE Development Kit 6 for all
platforms (Solaris(TM) operating environment, Linux, and Microsoft
Windows) and the Java SE Runtime Environment 6 for Solaris and Linux
support all encodings shown on this page.". I think it likely that
those are also supported by "javac -encoding". Though it would take a
while to test all 160 or so, I likely need no more than 4 or so.
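
Checking only the handful of names you need is quick to script. The sketch below asks the runtime whether each name resolves to a charset; it assumes, as inferred above but not confirmed by the javac documentation, that "javac -encoding" accepts the same names the runtime does. The class name, method name, and sample list of encodings are made up for illustration.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

public class EncodingSupportCheck {

    /** Maps each candidate name to whether this JVM has a charset for it. */
    static Map<String, Boolean> checkSupport(String... names) {
        Map<String, Boolean> result = new LinkedHashMap<>();
        for (String name : names) {
            boolean supported;
            try {
                supported = Charset.isSupported(name);
            } catch (IllegalArgumentException e) {
                // Thrown when the name itself is syntactically illegal.
                supported = false;
            }
            result.put(name, supported);
        }
        return result;
    }

    public static void main(String[] args) {
        // Sample candidates; replace with the encodings you actually care about.
        Map<String, Boolean> report =
                checkSupport("US-ASCII", "ISO-8859-1", "UTF-8", "EUC-JP", "Cp1252");
        for (Map.Entry<String, Boolean> entry : report.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

The definitive test for compile-time behavior is still to run javac with each -encoding value on a small source file.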


None of which documents what Oracle's Java considers the DEFAULT
encoding.


>Your platform documentation will tell you what encoding it uses.

...


>So the answer is: Read the documentation

I would like to read the documentation on the default encoding for
javac, but I don't know where it is. For supported encodings, I can
make inferences from the above, but I'd prefer an explicit statement
about compile-time behavior, not just run time.

>GIYF!

I did Google. I didn't see it, though I did stop after 30 or so hits,
as few seemed pertinent.

--
Tim McDaniel, tm...@panix.com

Lew

Feb 28, 2011, 7:01:54 PM

On 02/28/2011 06:17 PM, Tim McDaniel wrote:
> In article<87576bda-dafa-43de...@n16g2000prc.googlegroups.com>,
> Lew<l...@lewscanon.com> wrote:
>> No way Oracle can know that for you.
>
> I downloaded Java via a sun.com URL; they ought to know what they
> support.
>
>> As for legal encodings, those are universal:
>> http://ietf.org/rfc/rfc2278.txt
>
> I don't see any there. They're actually listed at another URL you mentioned,

From that URL:
"Examples of coded character sets are ISO 10646 [ISO-10646], US-ASCII
[US-ASCII], and the ISO-8859 series [ISO-8859]."

and
"A Character Encoding Scheme (CES) is a mapping from a Coded Character Set or
several coded character sets to a set of octets. A given CES is typically
associated with a single CCS; for example, UTF-8 applies only to ISO 10646."

From there you can use the terms they provide for further googling.

GIYF.

...

> None of which documents what Oracle's Java considers the DEFAULT
> encoding.

And your answer is:


>> Your platform documentation will tell you what encoding it uses.
> ...
>> So the answer is: Read the documentation
>
> I would like to read the documentation on the default encoding for
> javac, but I don't know where it is. For supported encodings, I can

It's whatever the default encoding is for your platform. No way Oracle can
know that. Read the documentation for your platform, as recommended.

Regardless, all your source code should be in UTF-8 encoding anyway.

>> GIYF!

> I did Google. I didn't see it, though I did stop after 30 or so hits,
> as few seemed pertinent.

Really? I got lots of useful hits from
http://lmgtfy.com/?q=Java+character+encoding+what+is+the+default

which quickly yielded
http://mindprod.com/jgloss/encoding.html#DEFAULTENCODING
http://stackoverflow.com/questions/361975/setting-the-default-java-character-encoding
http://vietunicode.sourceforge.net/howto/java/encoding.html
http://www.rgagnon.com/javadetails/java-0505.html
http://www.rgagnon.com/javadetails/encoding.html
and
http://publib.boulder.ibm.com/infocenter/iwedhelp/v6r0/index.jsp?topic=/com.ibm.db2e.doc/dbeapc1606.html

all on the first page of hits. And I haven't even tried Wikipedia yet!

--
Lew
Honi soit qui mal y pense.

Roedy Green

Feb 28, 2011, 8:23:58 PM

On Mon, 28 Feb 2011 18:01:12 +0000 (UTC), tm...@panix.com (Tim
McDaniel) wrote, quoted or indirectly quoted someone who said :

>- it doesn't say what the valid choices are.
>- it doesn't tell how to actually *determine* the "platform default
> converter"!

see http://mindprod.com/jgloss/encoding.html for a list of possible
choices.

see http://mindprod.com/applet/wassup.html

You can determine what it is by looking at the system property.

It depends on the OS.

--
Roedy Green Canadian Mind Products
http://mindprod.com
Refactor early. If you procrastinate, you will have
even more code to adjust based on the faulty design.
.

Roedy Green

Mar 1, 2011, 2:21:27 PM

On Mon, 28 Feb 2011 13:21:20 -0500, Joshua Cranmer
<Pidg...@verizon.invalid> wrote, quoted or indirectly quoted someone
who said :

>The valid choices are determined by what your current JRE supports
>(javac is actually implemented in Java). Determining the default
>character set is easily done with a simple Java program (left as an
>exercise for the reader)

source code available at http://mindprod.com/applet/encodings.html

Roedy Green

Mar 1, 2011, 2:26:02 PM

On Mon, 28 Feb 2011 23:17:37 +0000 (UTC), tm...@panix.com (Tim
McDaniel) wrote, quoted or indirectly quoted someone who said :

>None of which documents what Oracle's Java considers the DEFAULT
>encoding.

There is no universal default encoding. It depends on the platform.

Roedy Green

Mar 1, 2011, 2:28:23 PM

On Mon, 28 Feb 2011 18:01:12 +0000 (UTC), tm...@panix.com (Tim
McDaniel) wrote, quoted or indirectly quoted someone who said :

>I'm not sure which comp.lang.java.* newsgroup is appropriate.

From a practical point of view, stick to ASCII, using \uxxxx for
awkward characters, and don't worry about the encoding.

OR

use UTF-8 and specify UTF-8 on the javac command line.

Java source code needs to be shared and anything else will be a PITA.
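
The second option can also be exercised from code through the standard javax.tools API. This is a sketch only: it assumes a JDK rather than a bare JRE (otherwise no system compiler is available), and the class name, helper names, and the temporary Composer.java file are made up for illustration. The command-line equivalent is "javac -encoding UTF-8 Composer.java".

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.tools.JavaCompiler;
import javax.tools.ToolProvider;

public class CompileWithEncoding {

    /** Compiles one file with an explicit -encoding; 0 means success. */
    static int compileUtf8(Path sourceFile, Path outputDir) {
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        if (compiler == null) {
            throw new IllegalStateException("No system compiler; run this on a JDK");
        }
        // Same arguments one would pass to javac on the command line.
        return compiler.run(null, null, null,
                "-encoding", "UTF-8",
                "-d", outputDir.toString(),
                sourceFile.toString());
    }

    /** Writes a tiny UTF-8 source file with non-ASCII text and compiles it. */
    static int demo() {
        try {
            Path dir = Files.createTempDirectory("encdemo");
            Path src = dir.resolve("Composer.java");
            String code = "public class Composer { String name = \"Dvo\u0159\u00e1k\"; }\n";
            Files.write(src, code.getBytes(StandardCharsets.UTF_8));
            return compileUtf8(src, dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("javac exit code: " + demo());
    }
}
```

Passing the encoding explicitly means the result no longer depends on whatever the platform default happens to be on the build machine.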

Tim McDaniel

Mar 1, 2011, 2:35:18 PM

In article <g1iqm6t68v4sea15g...@4ax.com>,
Roedy Green <see_w...@mindprod.com.invalid> wrote:
>On Mon, 28 Feb 2011 23:17:37 +0000 (UTC), tm...@panix.com (Tim
>McDaniel) wrote, quoted or indirectly quoted someone who said :
>
>>None of which documents what Oracle's Java considers the DEFAULT
>>encoding.
>
>There is no universal default encoding. It depends on the platform.

I know. It would still be nice if there were convenient documentation
per platform, or a javac option to show interesting values including
this (and the default VM size and the default max VM size and ...), or
some such.

--
Tim McDaniel, tm...@panix.com

Roedy Green

Mar 1, 2011, 3:37:31 PM

On Mon, 28 Feb 2011 14:09:36 -0800 (PST), Lew <l...@lewscanon.com>
wrote, quoted or indirectly quoted someone who said :

>As for legal encodings, those are universal:
>http://ietf.org/rfc/rfc2278.txt

That RFC has been supplanted by 2978.

Roedy Green

Mar 1, 2011, 3:58:51 PM

On Tue, 1 Mar 2011 19:35:18 +0000 (UTC), tm...@panix.com (Tim McDaniel)
wrote, quoted or indirectly quoted someone who said :

>I know. It would still be nice if there were convenient documentation
>per platform, or a javac option to show interesting values including
>this (and the default VM size and the default max VM size and ...), or
>some such.

Much of that information is available by looking at the system
properties. See http://mindprod.com/applet/wassup.html for a simple
properties viewer.
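
A bare-bones properties viewer takes only a few lines. The following is a sketch with made-up class and method names; it is not the source of the viewer linked above.

```java
import java.util.Map;
import java.util.TreeMap;

public class ShowProperties {

    /** Copies the JVM's system properties into a map sorted by key. */
    static Map<String, String> sortedProperties() {
        Map<String, String> sorted = new TreeMap<>();
        for (String key : System.getProperties().stringPropertyNames()) {
            sorted.put(key, System.getProperty(key));
        }
        return sorted;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> entry : sortedProperties().entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

On Sun's JREs the output includes file.encoding along with os.name, user.language, and the like.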

There IS no universal default value for -Xms or -Xmx. The initial
value is computed based on the system configuration.

The question you want to ask is "What value for -Xms is java.exe
currently using?" I don't know how to determine that off the top of my
head.
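
One way to look at this from inside a running JVM is the java.lang.management API, sketched below with made-up class and method names. It reports the initial and maximum heap sizes the JVM is actually using, whether they came from -Xms/-Xmx or from computed defaults; it does not say which of the two was the source.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class HeapSettings {

    /** Initial heap size in bytes as reported by the JVM, or -1 if undefined. */
    static long initialHeapBytes() {
        MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
        return heap.getInit();
    }

    /** Maximum memory the JVM will attempt to use, in bytes. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("Initial heap (bytes): " + initialHeapBytes());
        System.out.println("Maximum heap (bytes): " + maxHeapBytes());
    }
}
```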

You might want to look at Jet which dynamically adjusts memory use
throughout execution. See http://mindprod.com/jgloss/jet.html

You can get solid state drives now, in the 64 to 256 GB range, quite
cheaply. I would think putting the page file and commonly used files
on the drive would give you a heck of a boost. It might be a lot
easier than trying to tweak java.exe parameters. See
http://mindprod.com/bgloss/ssd.html. I have not yet experimented with
one, but it is on my to-do list. I want to improve my SATA on the
motherboard first.

You could put the entire OS on it, plus selected apps.

Tim McDaniel

Mar 1, 2011, 4:17:12 PM

In article <g9mqm6dsvmsosrp01...@4ax.com>,
Roedy Green <see_w...@mindprod.com.invalid> wrote:
>There IS no universal default value for

To repeat: I know!

>>It would still be nice if there were convenient documentation per
>>platform, or a javac option to show interesting values including
>>this (and the default VM size and the default max VM size and ...),
>>or some such.

A javac option would be nice because you wouldn't have to depend on
the documentation being findable and accurate, and it needs to know
its own defaults anyway. As an analogy, "perl -V" gives pretty
comprehensive info on its defaults and properties.

--
Tim McDaniel, tm...@panix.com

Lew

Mar 1, 2011, 4:29:02 PM

Tim McDaniel wrote:

> Roedy Green wrote:
>> There IS no universal default value for
>
> To repeat: I know!

And yet you keep asking for it.

>>> It would still be nice if there were convenient documentation per platform

That's the platform vendor's job, not Oracle's.

>>>, or a javac option to show interesting values including
>>> this (and the default VM size and the default max VM size and ...),
>>> or some such.

What values could it show? It can only show one value, actually, either the
one that the OS provides or the one that you do.

> A javac option would be nice because you wouldn't have to depend on
> the documentation being findable and accurate, and it needs to know

You *always* have to depend on the documentation being at hand and accurate.

Any other way lies madness and wrong results.

> its own defaults anyway. As an analogy, "perl -V" gives pretty
> comprehensive info on its defaults and properties.

What kind of javac option do you mean? How would javac gain this information?

javac doesn't "know" its own defaults - it asks the OS for them. They can
change between one run and the next.

If you have a documentation problem, it's with the OS, not Java. Java doesn't
know enough to document your OS - it relies on the OS vendor for that.

So should you.

Lew

Mar 1, 2011, 4:30:02 PM

On 03/01/2011 03:37 PM, Roedy Green wrote:
> On Mon, 28 Feb 2011 14:09:36 -0800 (PST), Lew<l...@lewscanon.com>
> wrote, quoted or indirectly quoted someone who said :
>
>> As for legal encodings, those are universal:
>> http://ietf.org/rfc/rfc2278.txt
>
> That RFC has been supplanted by 2978.

Good show. Thanks for the correction.

Lew

Mar 1, 2011, 4:40:07 PM

Roedy Green wrote:
> From a practical point of view, stick to ASCII, using \uxxxx for
> awkward characters, and don't worry about the encoding.
>
> OR
>
> use UTF-8 and specify UTF-8 on the javac command line.
>
> Java source code needs to be shared and anything else will be a PITA.

Java source should always be UTF-8 encoded. Same with XML.

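"\uxxxx" is a good suggestion, but beware that Unicode escapes are translated
before the source is tokenized, so an escaped quote or line terminator
behaves exactly like the literal character, even inside a comment or string.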

The trouble with sticking with ASCII is that it makes your code too hard to
read when you break out that eighth or higher bit. Which would you prefer to
maintain:

public static final String STYLE_FAULT = "That's pass\u00e9!";
or
public static final String STYLE_FAULT = "That's passé!";
?

Or how about:

String composerName = "Anton\u00edn Leopold Dvo\u0159\u00e1k";
vs.
String composerName = "Antonín Leopold Dvořák";

I'd pick the latter.
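
The point about Unicode escapes being processed before parsing can be seen in a small sketch (class and method names are made up for illustration). The compiler translates the escapes first, so they can stand in for a string delimiter or part of an identifier, not just for characters inside a literal.

```java
public class UnicodeEscapeDemo {

    /** Both string delimiters are written as escapes for the double quote. */
    static String quotedByEscapes() {
        // After translation the compiler sees an ordinary literal here.
        return \u0022pass\u0022;
    }

    /** The first letter of the variable name is written as an escape. */
    static int escapedIdentifier() {
        int \u0061bc = 42;   // declares a variable named abc
        return abc;
    }

    /** An escape inside a literal yields a single accented character. */
    static int accentedLength() {
        return "pass\u00e9".length();
    }

    public static void main(String[] args) {
        System.out.println(quotedByEscapes());
        System.out.println(escapedIdentifier());
        System.out.println(accentedLength());
    }
}
```

The same mechanism is why an escaped line terminator inside a comment ends the comment, which is the usual way this bites people.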

Roedy Green

Mar 2, 2011, 1:43:10 PM

On Tue, 1 Mar 2011 21:17:12 +0000 (UTC), tm...@panix.com (Tim McDaniel)
wrote, quoted or indirectly quoted someone who said :

>>There IS no universal default value for
>
>To repeat: I know!

That is irrelevant. This is not email. We have an audience.

Roedy Green

Mar 2, 2011, 1:55:50 PM

On Tue, 01 Mar 2011 16:40:07 -0500, Lew <no...@lewscanon.com> wrote,
quoted or indirectly quoted someone who said :

>I'd pick the latter.

When Java started out, UTF-8 editors/IDEs were rare. Now they are all
but universal, so you might as well go UTF-8. Code is a lot easier to
proofread. Some of our readers may be using quite old equipment, so
UTF-8 may not be the obvious choice for them.

Roedy Green

Mar 2, 2011, 2:04:46 PM

On 2 Mar 2011 01:23:17 GMT, r...@zedat.fu-berlin.de (Stefan Ram) wrote,
quoted or indirectly quoted someone who said :

>http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4175635


This is a spurious bug. The author did not understand what the
file.encoding system property was for.

You can determine the default char set this way:

// needs: import java.nio.charset.Charset;
String defaultEncoding = System.getProperty( "file.encoding" );
String canonicalName = Charset.forName( defaultEncoding ).name();

The default character encoding is a property of the OS. Think back to
the days of 8-bit char sets. Your character set depended on your
locale. It is not some universal constant, though it would make sense
now for Microsoft/Oracle to make the default UTF-8 everywhere.


To find out your default character set view
http://mindprod.com/applet/encodings.html

Source code provided.

The initial values for the various system properties are computed by a
mysterious native method. I suppose now you could get a look at the C
source for it.

Lew

Mar 4, 2011, 2:40:16 PM

On Mar 2, 2:04 pm, Roedy Green <see_webs...@mindprod.com.invalid>
wrote:

> On 2 Mar 2011 01:23:17 GMT, r...@zedat.fu-berlin.de (Stefan Ram) wrote,
> quoted or indirectly quoted someone who said :
>
> >http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=4175635
>
> This is a spurious bug. The author did not understand what the
> file.encoding system property was for.
>

Well, Roedy, obviously Stefan posted the link so the OP could read,
"- The java.nio.charset.Charset class description now mentions that
'Every instance of the Java virtual machine has a default charset,
which may or may not be one of the standard charsets. The default
charset is determined during virtual-machine startup and typically
depends upon the locale and charset being used by the underlying
operating system.'

"- The same class also offers a method defaultCharset, which lets an
application find out about the actual default encoding.

"- The String and InputStreamReader specifications refer to the
Charset class multiple times.

"There are still some gaps though - for example, the javac
documentation refers to "platform default converter" with no
explanation what that is.

"The file.encoding is intentionally not documented in the platform
specification. It's an implementation detail of Sun's JREs, and
applications should not rely on it. See bug 4163515."

Of course, some folks who frequent this newsgroup get their knickers
in a twist any time someone suggests they read another link or do a
little research on their own, and scream, rant, cry and hold their
breath until they turn purple until someone spoon-feeds them the
answer, which they then misinterpret and argue against and complain
that Python does it better.

So for those folks I spoon-feed you the relevant parts of the
referenced link.

--
Lew
