As for character sets, the testing I have done with the most recent SDK indicates that the character set supported by Android is quite complete, containing Latin-1, Chinese, and Japanese. I would not be surprised if the character set is largely Unicode complete based on what I have seen in the emulator.
Not having a G1 to play with, I cannot tell you what glyphs are supported in the fonts supplied on an actual device.
Finally, I have not tested much, but it appears that the emulator properly supports a UTF-8 character encoding for the XML resources. Internally, Java apparently supports either UCS-2 or UTF-16; the documentation does not make clear which. Sun originally defined the Java language to use UCS-2 internally, but with the 1.5 release of the language made an incredibly daft decision and changed the internal representation to UTF-16, meaning that we now have in Java all the compatibility and encoding issues that we have with other languages for anything outside the Basic Multilingual Plane. (The correct decision would have been to allow both UCS-2 and UCS-4 strings to exist, with two additional string-handling bytecodes and automatic string promotion; basically the same mechanisms used to handle the different floating-point representations.) It is not clear what Google chose.
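To illustrate the UTF-16 issue, here is a minimal sketch of how a character outside the Basic Multilingual Plane behaves in a String on a standard Java 1.5 or later runtime. The class name Utf16Demo and its helper methods are my own names for illustration, and I have not confirmed that Android's runtime behaves the same way; that is the open question above.

```java
public class Utf16Demo {
    // Number of UTF-16 code units (Java chars) in the string.
    static int codeUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points; a surrogate pair counts as one.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // U+1D538 lies outside the Basic Multilingual Plane, so UTF-16
        // needs two chars (a surrogate pair) to represent it.
        String s = "A" + new String(Character.toChars(0x1D538));

        System.out.println("code units:  " + codeUnits(s));
        System.out.println("code points: " + codePoints(s));
        System.out.println("charAt(1) is a high surrogate: "
                + Character.isHighSurrogate(s.charAt(1)));
    }
}
```

If the two counts differ on a given runtime, its strings are UTF-16 rather than UCS-2, and any code that indexes by char has to be prepared to land in the middle of a surrogate pair.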
I have not tested any right-to-left languages, Korean, or languages that have complex glyph selection rules.
Eric Hildum
Senior Product Manager, Mobile Developer Tools & SDK
Developer Platforms and Services
Ecosystem and Market Development
Motorola
Direct: +1-408-541-6809
809 11th Avenue
Sunnyvale, CA 94089
USA