Compiler needs Java files with ASCII encoding?

Jey

Jan 18, 2010, 2:08:29 PM
to android-platform

Twice in build/core/definitions.mk
[http://android.git.kernel.org/?p=platform/build.git;a=blob;f=core/definitions.mk;h=3221525d813afc8537129bfaab3823126151bf58;hb=HEAD#l1254]
the javac compiler is invoked with an explicit ASCII encoding flag. Is
there a particular reason for this? To improve code readability, would
changing it from

$(HOST_JAVAC) -encoding ascii
to
$(HOST_JAVAC) -encoding utf-8

be a problem? Is there any dependency on the AIDL-to-Java conversion
tools in Android? I would think UTF-8 should cause no problems, as it
contains ASCII as a subset. Any objections to the above change?

thanks,
Jey

lbcoder

Jan 18, 2010, 2:54:49 PM
to android-platform
utf-8 is NOT backwards compatible with ascii.

Jim Ancona

Jan 19, 2010, 10:31:38 AM
to android-platform

On Jan 18, 2:54 pm, lbcoder <lbco...@gmail.com> wrote:
> utf-8 is NOT backwards compatible with ascii.

It certainly is. Quoting RFC 3629, UTF-8 "has the quality of
preserving the full US-ASCII [US-ASCII] range: US-ASCII characters are
encoded in one octet having the normal US-ASCII value, and any octet
with such a value can only stand for a US-ASCII character, and nothing
else." (http://tools.ietf.org/html/rfc3629)
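Both halves of the RFC 3629 property can be checked with nothing but the JDK. The sketch below (the class name AsciiSubsetCheck is hypothetical, not part of AOSP) decodes all 128 ASCII byte values under both charsets, then confirms that no UTF-8 multi-byte sequence reuses a byte in the 0x00-0x7F range.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical helper: checks the two halves of the RFC 3629 property using only the JDK.
final class AsciiSubsetCheck {

    // Every byte 0x00-0x7F decodes to the same text under US-ASCII and UTF-8,
    // and that text encodes back to the same bytes under both charsets.
    static boolean asciiRangeIdentical() {
        byte[] all = new byte[128];
        for (int i = 0; i < all.length; i++) {
            all[i] = (byte) i;
        }
        String asAscii = new String(all, StandardCharsets.US_ASCII);
        String asUtf8 = new String(all, StandardCharsets.UTF_8);
        return asAscii.equals(asUtf8)
                && Arrays.equals(asUtf8.getBytes(StandardCharsets.UTF_8), all)
                && Arrays.equals(asAscii.getBytes(StandardCharsets.US_ASCII), all);
    }

    // No non-ASCII code point encodes to a UTF-8 sequence containing a byte in 0x00-0x7F,
    // so such a byte "can only stand for a US-ASCII character".
    static boolean multiByteSequencesAvoidAsciiRange() {
        for (int cp = 0x80; cp <= Character.MAX_CODE_POINT; cp++) {
            if (cp >= Character.MIN_SURROGATE && cp <= Character.MAX_SURROGATE) {
                continue; // lone surrogates are not scalar values and cannot be encoded
            }
            byte[] encoded = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            for (byte b : encoded) {
                if ((b & 0x80) == 0) {
                    return false; // found a byte with the high bit clear
                }
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("ASCII range identical: " + asciiRangeIdentical());
        System.out.println("Multi-byte sequences avoid 0x00-0x7F: " + multiByteSequencesAvoidAsciiRange());
    }
}
```

Both checks should report true on a conforming JDK; the second loop walks every code point up to U+10FFFF, so it takes a moment.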

What makes you think otherwise?

Jim

lbcoder

Jan 19, 2010, 11:02:06 AM
to android-platform
ASCII is a SUBSET of UTF-8 using 1 octet. UTF-8 supports up to 4
octets.
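To make the 1-to-4-octet point concrete, here is a small sketch (hypothetical class Utf8Lengths) that asks the JDK's encoder how many octets it uses for a few code points, alongside the same count derived from the RFC 3629 code point ranges. The samples are written as numeric code points, so the source itself stays ASCII.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: how many octets UTF-8 spends per code point.
final class Utf8Lengths {

    // Octet count as reported by the JDK's UTF-8 encoder.
    static int encodedLength(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8).length;
    }

    // The same count from the RFC 3629 ranges (valid for scalar values, i.e. not lone surrogates).
    static int lengthFromRange(int codePoint) {
        if (codePoint <= 0x7F) return 1;   // US-ASCII: one octet, value unchanged
        if (codePoint <= 0x7FF) return 2;
        if (codePoint <= 0xFFFF) return 3;
        return 4;                          // supplementary planes, up to U+10FFFF
    }

    public static void main(String[] args) {
        // 'A', LATIN SMALL LETTER SHARP S, EURO SIGN, and a supplementary-plane emoji
        int[] samples = {0x41, 0xDF, 0x20AC, 0x1F600};
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d octet(s)%n", cp, encodedLength(cp));
        }
    }
}
```

Running main prints U+0041 -> 1, U+00DF -> 2, U+20AC -> 3 and U+1F600 -> 4 octet(s). Only the first row is something an ASCII-only file can contain, which is why every ASCII file is also a valid UTF-8 file.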

Jim Ancona

Jan 19, 2010, 12:46:01 PM
to android-platform

On Jan 19, 11:02 am, lbcoder <lbco...@gmail.com> wrote:
> ASCII is a SUBSET of UTF-8 using 1 octet. UTF-8 supports up to 4
> octets.

Exactly. All valid US-ASCII Java source files are also valid UTF-8
Java source files. So changing the javac setting to compile using
UTF-8 shouldn't break anything, which is what most people mean by
backwards compatible.

Having said that, I'm in favor of continuing to require US-ASCII
encoding for source files. Java provides ways to generate Unicode
characters in locations where you would want them (comments, Javadoc,
string literals). AFAIK, the only thing you can do with UTF-8 source
files that you can't do with ASCII is use non-ASCII characters in Java
code (class names, method names, etc.), which I see as undesirable due
to the possibility of confusion or error. Is there some other use-case
I'm missing?
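The provisions in question are Unicode escapes in string and char literals, and HTML character entities in Javadoc (which is rendered as HTML). A minimal sketch with a hypothetical class name follows. Because javac translates Unicode escapes before tokenizing (JLS §3.3), they are honored anywhere in a source file, comments included.

```java
/**
 * Javadoc is rendered as HTML, so an entity such as &szlig; keeps this comment ASCII-only.
 */
final class AsciiOnlyLiterals {

    // A Unicode escape inside a string literal: the source stays ASCII,
    // but the compiled string contains U+00DF (LATIN SMALL LETTER SHARP S).
    static final String STREET = "Stra\u00DFe";

    public static void main(String[] args) {
        System.out.println(STREET.length());                           // 6: the escape is one char
        System.out.println(Integer.toHexString(STREET.charAt(4)));     // df
        System.out.println(STREET.equals("Stra" + (char) 0xDF + "e")); // true
    }
}
```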

Jim

Jey Michael

Jan 20, 2010, 1:55:47 AM
to android-...@googlegroups.com
What's the provision in Java, Jim? (Escape characters?)

http://android.git.kernel.org/?p=platform/packages/inputmethods/LatinIME.git;a=blob;f=src/com/android/inputmethod/latin/LatinIME.java;h=8b76dbd3950982e3ca3eb41e36d0ad94d5120205;hb=HEAD#l695
(line 695)

generates compiler warnings in the AOSP tree today (even though it's
just a comment).

-Jey


On Tue, Jan 19, 2010 at 9:46 AM, Jim Ancona <j...@anconafamily.com> wrote:
>
> Having said that, I'm in favor of continuing to require US-ASCII
> encoding for source files. Java provides ways to generate Unicode
> characters in locations where you would want them (comments, Javadoc,
> string literals)....
> Jim

Jim Ancona

Jan 20, 2010, 11:00:03 AM
to android-platform
On Jan 20, 1:55 am, Jey Michael <jey.mich...@gmail.com> wrote:
> Whats the provision in Java, Jim?  (escape characters?)
>
> http://android.git.kernel.org/?p=platform/packages/inputmethods/Latin...
> (line 695)
>
> generates compiler warnings today in AOSP tree. (even though its just
> a comment)

In a Java literal you could write 'ß' as '\u00DF'.

With ASCII-only source, I'd probably write the comment like so:

// TODO: This doesn't work with \u00DF (Eszett), need to fix it in the
// next release.

That has the advantage of being unambiguous to the eye. Out of
context, I first thought the character in the comment might be a Greek
beta (β \u03B2).
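Converting existing non-ASCII text into escapes like the one in the TODO line is mechanical. The sketch below (hypothetical class EscapeNonAscii, similar in spirit to the native2ascii tool that older JDKs shipped) rewrites each non-ASCII UTF-16 unit as a Java Unicode escape, and uses Character.getName (Java 7+) to show the names that tell Eszett and beta apart.

```java
// Hypothetical helper: rewrites non-ASCII UTF-16 units as Java Unicode escapes,
// so text such as a TODO comment can live in an ASCII-only source file.
final class EscapeNonAscii {

    static String toJavaEscapes(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c); // ASCII passes through unchanged
            } else {
                // Java escapes name UTF-16 code units, so a supplementary character
                // becomes two escapes (a surrogate pair), which javac accepts.
                out.append(String.format("\\u%04X", (int) c));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String todo = "// TODO: This doesn't work with \u00DF";
        System.out.println(toJavaEscapes(todo));
        // Character.getName tells visually similar letters apart.
        System.out.println(Character.getName(0xDF));  // LATIN SMALL LETTER SHARP S
        System.out.println(Character.getName(0x3B2)); // GREEK SMALL LETTER BETA
    }
}
```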

Jim

lbcoder

Jan 20, 2010, 11:17:41 AM
to android-platform
No, that won't break, but taking that source and trying to compile it
on any OTHER system that IS using ASCII *WILL*, hence the ASCII
requirement.
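The failure described here can be approximated outside javac by reading UTF-8 bytes back with a US-ASCII decoder. This sketch (hypothetical class MisdecodeDemo; it uses java.nio rather than javac itself, so it illustrates the mechanism, not javac's exact diagnostics) shows both outcomes: lenient decoding silently substitutes U+FFFD, while a strict CharsetDecoder rejects the input.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: UTF-8 bytes read back by something that assumes US-ASCII.
final class MisdecodeDemo {

    // Lenient decoding: each byte 0x80-0xFF becomes U+FFFD (REPLACEMENT CHARACTER).
    static String decodeLeniently(byte[] bytes) {
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    // Strict decoding: a fresh CharsetDecoder reports malformed input by throwing.
    static boolean decodesStrictly(byte[] bytes) {
        CharsetDecoder strict = StandardCharsets.US_ASCII.newDecoder();
        try {
            strict.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 7 bytes: the sharp s is the two-byte sequence C3 9F
        byte[] utf8 = "Stra\u00DFe".getBytes(StandardCharsets.UTF_8);
        String misread = decodeLeniently(utf8);
        System.out.println(misread.length());         // 7
        System.out.println(misread.indexOf('\uFFFD')); // 4
        System.out.println(decodesStrictly(utf8));     // false
    }
}
```

The two-byte sequence for the sharp s turns into two replacement characters, so the 6-character string comes back as 7; an explicit -encoding flag on every machine avoids this kind of silent damage.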

Jey Michael

Feb 4, 2010, 11:45:19 PM
to android-...@googlegroups.com
Sticking to the lowest common denominator and working with escaped
characters is not really what I am concerned about. This came to my
attention while cleaning up a mess of warnings. Unless these specific
warnings are turned into errors, the code is, and will remain, in an
inconsistent state.

I.e., we expect developers not to use Unicode, yet the build system
still lets Unicode in, creating a broken window.
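One way to turn stray non-ASCII characters into a hard failure without changing the compiler setup is a pre-build check that rejects any byte outside 0x00-0x7F. The sketch below is hypothetical (NonAsciiCheck is not an AOSP tool): it reports each offending line and exits with status 1. javac's -Werror flag would also fail the build, but it promotes every warning, not only these.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-build check: fails if any given source file contains non-ASCII bytes.
final class NonAsciiCheck {

    // Returns the 1-based numbers of lines containing at least one byte outside 0x00-0x7F.
    static List<Integer> linesWithNonAscii(byte[] content) {
        List<Integer> lines = new ArrayList<>();
        int line = 1;
        boolean flagged = false; // report each line at most once
        for (byte b : content) {
            if (b == '\n') {
                line++;
                flagged = false;
            } else if (b < 0 && !flagged) { // Java bytes are signed: 0x80-0xFF are negative
                lines.add(line);
                flagged = true;
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        boolean clean = true;
        for (String path : args) {
            for (int line : linesWithNonAscii(Files.readAllBytes(Paths.get(path)))) {
                System.err.println(path + ":" + line + ": non-ASCII byte");
                clean = false;
            }
        }
        System.exit(clean ? 0 : 1);
    }
}
```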

-Jey

Chih-Wei

Feb 5, 2010, 12:40:33 AM
to android-platform
Well, the source.android.com website clearly says that
only Linux and Mac OS can be used to compile AOSP.
Even Windows supports UTF-8.
So what OTHER system do you expect to compile AOSP on?
The ancient MS-DOS?
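Whether a given machine would misread the sources depends on its default charset, which javac falls back on when -encoding is not given. A quick check of what a particular JVM uses is sketched below (hypothetical class name). On JDK 18 and later the default is UTF-8, per JEP 400; earlier JDKs derive it from the platform locale.

```java
import java.nio.charset.Charset;

// Hypothetical check: prints the charset this JVM uses when no encoding is given explicitly.
final class DefaultCharsetInfo {

    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + defaultCharsetName());
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```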