JNA string encoding issue

Aaron Madlon-Kay

Apr 9, 2016, 2:26:28 AM
to Java Native Access
Hello.

I am confused by this passage in the JNA Javadoc (emphasis mine):

When converting Java unicode characters into an array of char, the default platform encoding is used, unless the system property jna.encoding is set to a valid encoding. This property may be set to "UTF8", for example, to ensure all native strings use that encoding.

http://java-native-access.github.io/jna/4.2.1/overview-summary.html#strings

However in what appears to be the relevant section of the code we have:

    public static final String DEFAULT_ENCODING = "utf8";

    ...

    /**
     * @return The default string encoding.  Returns the value of the system
     * property <code>jna.encoding</code> or {@link Native#DEFAULT_ENCODING}.
     */
    public static String getDefaultStringEncoding() {
        return System.getProperty("jna.encoding", DEFAULT_ENCODING);
    }

https://github.com/java-native-access/jna/blob/master/src/com/sun/jna/Native.java

It seems that the documentation and code do not agree. If the documentation were correct then I'd expect DEFAULT_ENCODING to be set from e.g. Charset.defaultCharset().name(). So which is correct?

This mismatch appears to be the cause of an issue reported against an application I maintain: on Windows, a file path passed to a native API is misinterpreted when it contains non-ASCII characters, because JNA provides UTF-8 while the native code expects Cp1252. Setting jna.encoding to Cp1252 mitigates the issue, but it's not clear that doing so is safe for all of the libraries we use that rely on JNA under the hood.
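
To make the failure mode concrete, here is a minimal sketch of the mismatch using only the JDK, with no JNA calls involved. It assumes the JRE provides the windows-1252 charset (the Cp1252 mentioned above); the class name, method name, and sample file name are made up for illustration.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    /** Encodes text as UTF-8, then decodes those bytes as windows-1252 (Cp1252). */
    public static String misread(String text) {
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        // Assumption: windows-1252 is available in this JRE.
        return new String(utf8Bytes, Charset.forName("windows-1252"));
    }

    public static void main(String[] args) {
        // U+00E9 (e with acute accent) is written as an escape so the
        // example does not depend on the source file's encoding.
        String path = "caf\u00e9.txt";
        System.out.println(misread(path));
    }
}
```

The accented character becomes the two bytes C3 A9 in UTF-8, and a Cp1252 reader sees those as two separate characters (U+00C3 and U+00A9), so the file name no longer matches anything on disk. Pure ASCII names are unaffected, which is presumably why the problem only shows up for some users.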

Thanks,
Aaron

Aaron Madlon-Kay

Apr 9, 2016, 3:53:15 AM
to Java Native Access
Quick followup: I dug around some more and I understand that in JNA 4.0 you can now set the string encoding on a per-library basis via a couple of mechanisms, but in our case none of them seems to have the desired effect. We need to specify the encoding of strings going *from* Java *to* the native library, and it seems that only Native.getDefaultStringEncoding() is consulted in this codepath.

So unless I am mistaken, our only choice is to set jna.encoding globally. This makes me again wonder why the default is UTF-8 and not the platform's native charset per the docs.

-Aaron

Aaron Madlon-Kay

Apr 9, 2016, 4:30:38 AM
to Java Native Access
Sorry for the one-man thread here. Last update: I am thinking that unconditionally defaulting to UTF-8 must be an unintentional side-effect of the linked commit below.

https://github.com/java-native-access/jna/commit/14925461502b5bae2a185f983af6c634118dcc0c?diff=unified#diff-02cbcdcad9148f413bae67326f2b08c2L299

Before, Native.toString(byte[]) forwarded to toString(byte[], String), passing System.getProperty("jna.encoding") as the encoding. When jna.encoding was not set, the latter tried to create the string with new String(byte[], String); when that failed, it fell back to new String(byte[]), which correctly uses the platform's native encoding.
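
The difference between the two behaviors can be sketched roughly as follows. This is an illustration written from the description above, not the actual JNA source: the class and method names are hypothetical, and whether the original code detected the unset property through a null check or a caught exception is not shown in this thread, so the sketch handles both.

```java
import java.io.UnsupportedEncodingException;

public class DecodeFallbackSketch {
    /** Roughly the older behavior: try the named encoding, else the platform default. */
    public static String decodeWithPlatformFallback(byte[] buf, String encoding) {
        if (encoding != null) {
            try {
                return new String(buf, encoding);
            } catch (UnsupportedEncodingException e) {
                // Unknown encoding name: fall through to the platform default.
            }
        }
        // new String(byte[]) decodes with Charset.defaultCharset().
        return new String(buf);
    }

    /** Roughly the newer behavior: an unset encoding is replaced by "utf8". */
    public static String decodeWithUtf8Default(byte[] buf, String encoding) {
        String effective = (encoding != null) ? encoding : "utf8";
        return decodeWithPlatformFallback(buf, effective);
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xE9}; // "e with acute" in Cp1252, invalid as UTF-8
        String configured = System.getProperty("jna.encoding"); // null when unset
        System.out.println(decodeWithPlatformFallback(bytes, configured));
        System.out.println(decodeWithUtf8Default(bytes, configured));
    }
}
```

With jna.encoding unset, the first method decodes with whatever the platform uses, while the second always decodes as UTF-8, so the two only agree on machines whose default charset happens to be UTF-8.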

Now instead it is always using UTF-8 if jna.encoding is not set.

So this smells like a bug. If not, I think it deserves some better documentation.

-Aaron

Timothy Wall

Apr 9, 2016, 1:18:53 PM
to jna-...@googlegroups.com
You’re right, that looks like a bug. The default encoding in most cases _should_ be the platform default and not “utf8”.

Aaron Madlon-Kay

Apr 10, 2016, 4:19:26 AM
to Java Native Access
Thanks for the confirmation. In that case it sounds like all users of JNA 4.1.0 or later should call System.setProperty("jna.encoding", Charset.defaultCharset().name()) to ensure correct operation on Windows.
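
A minimal sketch of that workaround, using only the JDK. The class and method names are hypothetical, and leaving an already-set jna.encoding untouched is a design choice of this sketch rather than something discussed in the thread. The property would presumably need to be set before any JNA string conversion takes place. Note also that on JDK 18 and later Charset.defaultCharset() is itself UTF-8 unless configured otherwise, so this approach is mainly relevant to older runtimes.

```java
import java.nio.charset.Charset;

public class JnaEncodingWorkaround {
    /**
     * Sets jna.encoding to the JVM's default charset unless the property
     * is already set, and returns the value now in effect.
     */
    public static String applyPlatformEncoding() {
        if (System.getProperty("jna.encoding") == null) {
            System.setProperty("jna.encoding", Charset.defaultCharset().name());
        }
        return System.getProperty("jna.encoding");
    }

    public static void main(String[] args) {
        // Call this early in startup, before any native calls are made.
        System.out.println("jna.encoding = " + applyPlatformEncoding());
    }
}
```

Respecting an existing value keeps a -Djna.encoding=... command-line override working for users who need a specific encoding such as Cp1252.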

I have sent a PR on GitHub.

-Aaron