How many character sets are supported by Android?

Gordon

Mar 10, 2009, 5:17:23 AM
to android-platform
It seems that only the "GSM" and "IRA" character sets are supported, and that they are treated as the same. Can anyone provide some information about character set support?

Hildum Eric-XFQ473

Mar 10, 2009, 12:35:08 PM
to android-...@googlegroups.com
Are you asking about character sets (the list of displayable characters) or character encodings (the assignment of characters to numerical values)? Note that most of what is published on the web regarding this is wrong; these errors have propagated into standards as well, where "character encoding" is frequently used to mean "character set" and vice versa. In particular, the encoding implies the character set in some character encoding systems, but not in all of them.
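
To make the distinction concrete, here is a minimal sketch in plain Java (it assumes a Java 7 or later runtime for `StandardCharsets`; the class name `EncodingDemo` and the helper `hexBytes` are made up for illustration). The same character, U+00E9, is turned into different byte sequences depending on the encoding chosen:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingDemo {
    // Hypothetical helper: hex dump of the bytes produced when
    // `text` is encoded with `charset`.
    public static String hexBytes(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String eAcute = "\u00E9"; // LATIN SMALL LETTER E WITH ACUTE

        // One character, three encodings, three byte sequences.
        System.out.println(hexBytes(eAcute, StandardCharsets.ISO_8859_1)); // E9
        System.out.println(hexBytes(eAcute, StandardCharsets.UTF_8));      // C3 A9
        System.out.println(hexBytes(eAcute, StandardCharsets.UTF_16BE));   // 00 E9
    }
}
```

Whether a device can actually display the character is a separate question that depends on the installed fonts, not on the encoding.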

As for character sets, the testing I have done with the most recent SDK indicates that the character set supported by Android is quite complete, containing Latin-1, Chinese, and Japanese. I would not be surprised if the character set is largely Unicode complete based on what I have seen in the emulator.

Not having a G1 to play with, I cannot tell you what glyphs are supported in the supplied fonts on an actual device.

Finally, I have not tested much, but the emulator appears to properly support UTF-8 character encoding for the XML resources. Internally, the Java implementation apparently supports either UCS-2 or UTF-16; it is not clear which from the documentation. Sun originally defined the Java language to use UCS-2 internally, but with the 1.5 release of the language made an incredibly daft decision and changed the internal representation to UTF-16, meaning that we now have all the compatibility and encoding issues in Java that we have in all other languages for anything outside the Basic Multilingual Plane. (The correct decision would have been to allow both UCS-2 and UCS-4 strings to exist, with two additional string-handling bytecodes and automatic string promotion; basically the same mechanisms that are used to handle floating-point representations.) It is not clear what Google chose.
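
As a rough illustration of the UTF-16 issue, the sketch below uses only standard `String` and `Character` methods available since Java 1.5 (the class name `SurrogateDemo` and its two helper methods are made up for illustration). It shows that a character outside the Basic Multilingual Plane occupies two `char` values, so `length()` no longer counts characters. This describes a standard Java runtime; whether Android behaves identically is exactly the open question above.

```java
public class SurrogateDemo {
    // Number of UTF-16 code units (Java `char` values) in the string.
    public static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    public static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D11E MUSICAL SYMBOL G CLEF lies outside the Basic Multilingual Plane.
        String clef = new String(Character.toChars(0x1D11E));

        System.out.println(codeUnits(clef));   // 2: stored as a surrogate pair
        System.out.println(codePoints(clef));  // 1: still a single character
        System.out.println(Character.isHighSurrogate(clef.charAt(0))); // true
        System.out.println(Integer.toHexString(clef.codePointAt(0)));  // 1d11e
    }
}
```

Code that indexes strings with `charAt` or truncates them by `length()` can split such a pair in half, which is the kind of compatibility problem described above.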

I have not tested any right-to-left languages, Korean, or languages that have complex glyph selection rules.


Eric Hildum
Senior Product Manager, Mobile Developer Tools & SDK
Developer Platforms and Services
Ecosystem and Market Development
Motorola
Direct: +1-408-541-6809

809 11th Avenue
Sunnyvale, CA 94089
USA
