Strange encoding issue

Seth Goldenberg

May 5, 2014, 7:55:42 PM

to adt...@googlegroups.com

I have a strange issue with merging resources. My build machine is botching the encoding of some Unicode characters in my XML when it merges resources, while the same project builds fine on my own machine.

This line is in an exploded AAR prior to being merged:

(From build/exploded-aar/com.facebook/facebook-android-sdk/3.5.2/res/values/values.xml)

And like this after being merged:

(From build/res/all/release/values/values.xml)

aapt (version 19.0.3) chokes with this error on the line above:

error: Error parsing XML: not well-formed (invalid token)

This error does not happen on my personal machine. Both are running Mac OS X 10.9.2. What dependency does aapt have that the same version of it would produce different outputs on different machines?
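One way to narrow down an "invalid token" error like this is to check whether the merged file is still valid UTF-8 on each machine. The following is only a diagnostic sketch, not something from the build tools: the class name `Utf8Validator` is hypothetical, it assumes Java 7 or later, and the file path is passed on the command line.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper: strictly decodes a file as UTF-8 and reports malformed bytes.
final class Utf8Validator {

    /** Returns true if the bytes decode as UTF-8 without any malformed sequence. */
    static boolean isValidUtf8(byte[] bytes) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // Example argument: the merged values.xml from the build directory.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(args[0] + (isValidUtf8(bytes) ? ": valid UTF-8" : ": NOT valid UTF-8"));
    }
}
```

Running it against both the exploded-AAR file and the merged file would show whether the bytes were damaged during the merge or were already wrong beforehand.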

Thanks,

Seth

Christopher Pickslay

May 5, 2014, 7:59:10 PM

to adt...@googlegroups.com

Seth, I'm having the same problem, and it seems to be caused by the 0.10.0 Gradle plugin. Try changing your plugin version to com.android.tools.build:gradle:0.9.2 for now, and run gradle with the --refresh-dependencies flag. That fixed it for me.

In my case, it's not just Unicode characters: it chokes on HTML entities as well. Strangely, this happens only in the default strings file, not in any localized files.

Seth Goldenberg

May 5, 2014, 8:40:56 PM

to adt...@googlegroups.com

It seems related to the bug linked below, but I'm already using Xavier's workaround to write back to UTF-8. I can also only reproduce this on one machine.

https://code.google.com/p/android/issues/detail?id=61613

Seth Goldenberg

May 6, 2014, 12:50:59 PM

to adt...@googlegroups.com

I was actually running version 0.9.x of the Android Gradle plugin when I ran into this issue. I tried moving up to 0.10.x to see whether the problem was fixed, and it wasn't. I've been running ./gradlew clean after each change to make sure it takes effect.

I have a couple of scripts like the one below that replace strings in XML files. Removing them made no difference. I'll reiterate that this is happening on only one machine, so it might be a system configuration issue.

variant.mergeResources.doLast {
    println("Checking for the Jenkins build number")
    // BUILD_NUMBER is only present when the build runs under Jenkins.
    ext.env = System.getenv()
    def buildNumber = env.BUILD_NUMBER
    if (buildNumber != null) {
        File valuesFile = file("${buildDir}/res/all/${variant.dirName}/values/values.xml")
        println("Replacing revision number in " + valuesFile)
        println("Build number = " + buildNumber)
        // Read and write explicitly as UTF-8 rather than the platform default encoding.
        String content = valuesFile.getText('UTF-8')
        content = content.replaceAll(/devBuild/, buildNumber)
        valuesFile.write(content, 'UTF-8')
    }
}

Xavier Ducrohet

May 6, 2014, 12:57:16 PM

to adt...@googlegroups.com

I cannot reproduce this either. Normally we force the encoding to UTF-8, so there shouldn't be any difference in behavior between machines.

Can you check which JVM your two machines use?
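For comparing the two machines, a small program along these lines could print the JVM version together with the default encoding it picked up. This is a sketch only: the class name `JvmEncodingReport` is hypothetical, and the choice of properties and environment variables is an assumption about what is most likely to differ.

```java
import java.nio.charset.Charset;

// Hypothetical helper: prints JVM and locale settings that can differ between machines.
final class JvmEncodingReport {

    static String report() {
        StringBuilder sb = new StringBuilder();
        String[] keys = {"java.version", "java.vendor", "java.home", "file.encoding"};
        for (String key : keys) {
            sb.append(key).append(" = ").append(System.getProperty(key)).append('\n');
        }
        sb.append("default charset = ").append(Charset.defaultCharset().name()).append('\n');
        // LANG and LC_ALL may influence the default charset on Unix-like systems;
        // they print as "null" when unset (common for daemon-launched build agents).
        sb.append("LANG = ").append(System.getenv("LANG")).append('\n');
        sb.append("LC_ALL = ").append(System.getenv("LC_ALL")).append('\n');
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(report());
    }
}
```

Running it the same way the build is launched on each machine (for example from the CI job rather than an interactive shell) gives the most comparable output.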

--
You received this message because you are subscribed to the Google Groups "adt-dev" group.
To unsubscribe from this group and stop receiving emails from it, send an email to adt-dev+u...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

--
Xavier Ducrohet
Android SDK Tech Lead
Google Inc.
http://developer.android.com | http://tools.android.com

Please do not send me questions directly. Thanks!

Seth Goldenberg

May 6, 2014, 1:04:26 PM

to adt...@googlegroups.com

All the machines I've checked are running Java SE 1.6.0_65.

Tor Norbye

May 6, 2014, 1:13:45 PM

to adt...@googlegroups.com

Xav, it looks like the ValueResourceParser2 reads in files like this:

    stream = new BufferedInputStream(new FileInputStream(file));
    InputSource is = new InputSource(stream);
    ...
    return builder.parse(is);

Note that it constructs the input stream and XML input source without specifying a UTF-8 encoding.

The above is correct when you want to pick up the encoding from the XML file itself: the XML prolog can specify an encoding, so the XML parser expects to get a raw byte stream and to do its own encoding handling.

However, we really discourage users from using custom or default platform encodings, and lint will complain about any XML files it finds without UTF-8 encoding.

Perhaps we should just read in the XML as UTF-8 characters instead -- and maybe we can have the XML parser abort / retry if it encounters an actual encoding pragma?
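As a rough illustration of the "read as UTF-8 characters" idea, the sketch below hands the parser a character stream instead of a byte stream. It is not the actual ValueResourceParser2 code: the class and method names are hypothetical, and it assumes Java 7 or later for `StandardCharsets`. When an `InputSource` wraps a `Reader`, the parser reads the characters directly and disregards any encoding declaration in the prolog.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical sketch: force UTF-8 decoding before the XML parser sees the data.
final class Utf8XmlRead {

    static Document parseAsUtf8(InputStream in) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Decoding happens here, so the parser never guesses or auto-detects an encoding.
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8);
        return builder.parse(new InputSource(reader));
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<resources><string name=\"s\">caf\u00e9</string></resources>"
                .getBytes(StandardCharsets.UTF_8);
        Document doc = parseAsUtf8(new ByteArrayInputStream(xml));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

One limitation of this approach: `InputStreamReader` does not strip a UTF-8 byte order mark, so a file that starts with one would need the mark skipped before parsing.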

In the meantime, Seth -- can you see whether your XML files start with an encoding prolog, and if not, try putting this at the top of your files (or at least the ones containing the character entities):

<?xml version="1.0" encoding="utf-8"?>
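To check many files for that prolog at once, something like the following could be used. It is a simplified sketch under stated assumptions: the class name `XmlPrologSniffer` is hypothetical, the regular expression is a loose approximation of the XML declaration syntax, and it only works for ASCII-compatible encodings (it will not recognise UTF-16 files).

```java
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: reports the encoding named in a file's XML declaration, if any.
final class XmlPrologSniffer {

    private static final Pattern ENCODING = Pattern.compile(
            "^<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"']([A-Za-z][A-Za-z0-9._-]*)[\"']");

    /** Returns the declared encoding, or null if the data has no encoding declaration. */
    static String declaredEncoding(byte[] data) {
        // The declaration must be at the very start, so the first bytes are enough.
        int len = Math.min(data.length, 256);
        // ISO-8859-1 maps every byte to one char, so this decoding can never fail.
        String head = new String(data, 0, len, StandardCharsets.ISO_8859_1);
        // Skip a UTF-8 byte order mark (bytes EF BB BF) if present.
        if (head.startsWith("\u00ef\u00bb\u00bf")) {
            head = head.substring(3);
        }
        Matcher m = ENCODING.matcher(head);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws Exception {
        for (String path : args) {
            String enc = declaredEncoding(Files.readAllBytes(Paths.get(path)));
            System.out.println(path + ": " + (enc == null ? "no encoding declared" : enc));
        }
    }
}
```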

Seth Goldenberg

May 6, 2014, 1:28:11 PM

to adt...@googlegroups.com

The XML file in the exploded AAR that had the Unicode characters in it specified UTF-8 at the top.

I actually solved the problem by deleting the .gradle directory in my workspace. Performing a clean before each build didn't do anything. I'm wondering whether the configuration can get into a state where the wrong encoding is stored somewhere.

Overall, this was a very bizarre issue. Thank you both for your help, Xavier and Tor. Let me know if I can provide anything else that would help prevent folks from running into this problem again.
