Fix JSON escaping of unicode characters to work in JDK 7. (issue1796803)


skyb...@google.com

Jul 25, 2012, 8:38:12 PM
to acl...@google.com, google-web-tool...@googlegroups.com, re...@gwt-code-reviews-hr.appspotmail.com
Reviewers: acleung

Description:
Fix JSON escaping of unicode characters to work in JDK 7.

JDK 7 supports Unicode 6 and some characters changed:
- zero-width-space is no longer a whitespace character
- invisible-plus is new

This caused JSON encoding tests to fail in HTMLUnit somehow, so
escape these characters just to be safe.

Fixes issue 7444.
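
To see how the running JDK classifies the two characters named above, a small check along these lines may help. This is only a sketch: the class name `UnicodeCategoryCheck` and the helper `describe` are made up for illustration and are not part of the patch. The output depends on the Unicode version bundled with the JDK, so no expected output is shown.

```java
public class UnicodeCategoryCheck {
  /** Describes how the running JDK classifies a code point (hypothetical helper). */
  static String describe(int codePoint) {
    return String.format("U+%04X type=%d whitespace=%b defined=%b",
        codePoint,
        Character.getType(codePoint),       // general category constant, e.g. Character.FORMAT
        Character.isWhitespace(codePoint),  // Java's own whitespace definition
        Character.isDefined(codePoint));    // false if unassigned in this Unicode version
  }

  public static void main(String[] args) {
    System.out.println(describe(0x200B)); // zero width space
    System.out.println(describe(0x2064)); // invisible plus
  }
}
```

Running this on JDK 6 and on JDK 7 should show the difference in the whitespace and defined columns that the description refers to.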


Please review this at http://gwt-code-reviews.appspot.com/1796803/

Affected files:
M user/src/com/google/gwt/core/client/JsonUtils.java


Index: user/src/com/google/gwt/core/client/JsonUtils.java
===================================================================
--- user/src/com/google/gwt/core/client/JsonUtils.java (revision 11175)
+++ user/src/com/google/gwt/core/client/JsonUtils.java (working copy)
@@ -28,7 +28,7 @@
* eval(). Control characters, quotes and backslashes are not affected.
*/
public static native String escapeJsonForEval(String toEscape) /*-{
- var s = toEscape.replace(/[\xad\u0600-\u0603\u06dd\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202e\u2060-\u2063\u206a-\u206f\ufeff\ufff9-\ufffb]/g, function(x) {
+ var s = toEscape.replace(/[\xad\u0600-\u0603\u06dd\u070f\u17b4\u17b5\u200b-\u200f\u2028-\u202e\u2060-\u2064\u206a-\u206f\ufeff\ufff9-\ufffb]/g, function(x) {
return @com.google.gwt.core.client.JsonUtils::escapeChar(Ljava/lang/String;)(x);
});
return s;
@@ -38,7 +38,7 @@
* Returns a quoted, escaped JSON String.
*/
public static native String escapeValue(String toEscape) /*-{
- var s = toEscape.replace(/[\x00-\x1f\xad\u0600-\u0603\u06dd\u070f\u17b4\u17b5\u200c-\u200f\u2028-\u202e\u2060-\u2063\u206a-\u206f\ufeff\ufff9-\ufffb"\\]/g, function(x) {
+ var s = toEscape.replace(/[\x00-\x1f\xad\u0600-\u0603\u06dd\u070f\u17b4\u17b5\u200b-\u200f\u2028-\u202e\u2060-\u2064\u206a-\u206f\ufeff\ufff9-\ufffb"\\]/g, function(x) {
return @com.google.gwt.core.client.JsonUtils::escapeChar(Ljava/lang/String;)(x);
});
return "\"" + s + "\"";
@@ -149,6 +149,7 @@
out[0x70f] = '\\u070f'; // Syriac abbreviation mark
out[0x17b4] = '\\u17b4'; // Khmer vowel inherent aq
out[0x17b5] = '\\u17b5'; // Khmer vowel inherent aa
+ out[0x200b] = '\\u200b'; // Zero width space
out[0x200c] = '\\u200c'; // Zero width non-joiner
out[0x200d] = '\\u200d'; // Zero width joiner
out[0x200e] = '\\u200e'; // Left-to-right mark
@@ -164,6 +165,7 @@
out[0x2061] = '\\u2061'; // Function application
out[0x2062] = '\\u2062'; // Invisible times
out[0x2063] = '\\u2063'; // Invisible separator
+ out[0x2064] = '\\u2064'; // Invisible plus
out[0x206a] = '\\u206a'; // Inhibit symmetric swapping
out[0x206b] = '\\u206b'; // Activate symmetric swapping
out[0x206c] = '\\u206c'; // Inherent Arabic form shaping
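
For readers who want to try the widened character class outside of JSNI, here is a rough plain-Java port of the patched escapeValue() regex. It is a sketch under assumptions, not the GWT implementation: the class `ConservativeJsonEscape` is hypothetical, and it emits a four-digit hex escape for everything except the quote and backslash, whereas the real escapeChar() uses the lookup table shown in the diff and may choose shorter forms.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ConservativeJsonEscape {
  // Same character class as the patched escapeValue() regex, written as a Java pattern.
  private static final Pattern TO_ESCAPE = Pattern.compile(
      "[\\x00-\\x1f\\xad\\u0600-\\u0603\\u06dd\\u070f\\u17b4\\u17b5\\u200b-\\u200f"
      + "\\u2028-\\u202e\\u2060-\\u2064\\u206a-\\u206f\\ufeff\\ufff9-\\ufffb\"\\\\]");

  /** Returns a quoted string with every matched character escaped (illustrative only). */
  static String escapeValue(String toEscape) {
    Matcher m = TO_ESCAPE.matcher(toEscape);
    StringBuffer out = new StringBuffer("\"");
    while (m.find()) {
      char c = m.group().charAt(0);
      // Quote and backslash get a backslash prefix; everything else gets a hex escape.
      String rep = (c == '"' || c == '\\')
          ? "\\" + c
          : String.format("\\u%04x", (int) c);
      m.appendReplacement(out, Matcher.quoteReplacement(rep));
    }
    m.appendTail(out);
    return out.append('"').toString();
  }

  public static void main(String[] args) {
    // Zero width space and invisible plus are both escaped with the new ranges.
    System.out.println(escapeValue("a\u200bb\u2064"));
  }
}
```

With the old ranges (starting at 200c and ending at 2063), the two characters in the example would have passed through unescaped.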


j...@jaet.org

Jul 25, 2012, 10:18:47 PM
to skyb...@google.com, acl...@google.com, google-web-tool...@googlegroups.com, re...@gwt-code-reviews-hr.appspotmail.com

t.br...@gmail.com

Jul 26, 2012, 3:02:08 AM
to skyb...@google.com, acl...@google.com, j...@jaet.org, google-web-tool...@googlegroups.com, re...@gwt-code-reviews-hr.appspotmail.com
I'm not against this kind of change, but IMO any deviation from the JSON
and/or ECMAScript specs should be documented, or we risk removing it
at a later time and breaking things again. That being said, I don't
understand what's "broken" here and how this CL "fixes" it wrt JDK 7, as
all I see is JavaScript / client code. If the issue is that HTMLUnit
doesn't follow the specs but relies on Java's Unicode support, then it
should be documented that this is a workaround for an HTMLUnit bug.

http://gwt-code-reviews.appspot.com/1796803/

skyb...@google.com

Jul 26, 2012, 5:07:41 PM
to acl...@google.com, j...@jaet.org, t.br...@gmail.com, google-web-tool...@googlegroups.com, re...@gwt-code-reviews-hr.appspotmail.com
Yes, sorry for being unclear. What we know is that
JSONTest.testParseEscaped() fails on JDK7 without this fix. It's
probably a bug in HTMLUnit, but we didn't find the root cause.

However, it seems okay to be conservative when escaping characters in
JSON strings. JSON only requires double quote, control characters, and
backslash to be escaped, but we were already escaping many more
characters than that.
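
For comparison, an escaper that does only what JSON requires could look roughly like the following. This is a sketch, and the class `MinimalJsonEscape` is a hypothetical name that exists nowhere in GWT. It handles only the double quote, the backslash, and control characters below U+0020, so a zero width space passes through raw.

```java
public class MinimalJsonEscape {
  /** Escapes only what the JSON grammar requires inside a string literal. */
  static String escapeRequired(String s) {
    StringBuilder out = new StringBuilder();
    for (int i = 0; i < s.length(); i++) {
      char c = s.charAt(i);
      if (c == '"' || c == '\\') {
        out.append('\\').append(c);                     // quote or backslash
      } else if (c < 0x20) {
        out.append(String.format("\\u%04x", (int) c));  // control character
      } else {
        out.append(c);                                  // everything else stays as-is
      }
    }
    return out.toString();
  }

  public static void main(String[] args) {
    System.out.println(escapeRequired("say \"hi\"\n"));
  }
}
```

Anything beyond this set, including the characters added in this CL, is extra caution: a conforming parser decodes the escaped and unescaped forms to the same string.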

I'll fix the description in the submitted patch.

http://gwt-code-reviews.appspot.com/1796803/