JSONParser can't parse some unescaped unicode characters

Boris Granveaud

Jun 7, 2010, 5:00:41 AM
to Google Web Toolkit
Hi,

It seems that JSONParser doesn't like some unicode characters if they
are not escaped with \uxxxx:

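import com.google.gwt.core.client.EntryPoint;
import com.google.gwt.json.client.JSONParser;
import com.google.gwt.json.client.JSONValue;

public class Test implements EntryPoint {
  public void onModuleLoad() {
    // Try each character in the range unescaped inside a JSON string.
    for (char c = 0x2000; c < 0x2050; c++) {
      String str = "{\"string\":\"" + c + "\"}";
      try {
        JSONValue json = JSONParser.parse(str);
      } catch (Exception e) {
        System.out.println("JSON parse error char=" +
            Integer.toHexString((int) c));
      }
    }
  }
}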

In the GWT 2.0.3 emulator, I got the following results:

- Chrome 5.0 and FF 3.6: errors with characters 0x2028 and 0x2029
- IE 8.0: no error

It works if I escape the characters with \u2028 and \u2029.

I'm using Jackson on the server side to generate the JSON string that is
sent to my GWT application, and by default it doesn't escape these
characters. As a workaround, I've implemented a custom Jackson
serializer.

Boris.

Thomas Broyer

Jun 7, 2010, 1:45:26 PM
to Google Web Toolkit


On Jun 7, 11:00 am, Boris Granveaud <bgran...@gmail.com> wrote:
> Hi,
>
> It seems that JSONParser doesn't like some unicode characters if they
> are not escaped with \uxxxx:
>
> public class Test implements EntryPoint {
>   public void onModuleLoad() {
>     for (char c = 0x2000; c < 0x2050; c++) {
>       String str = "{\"string\":\"" + c + "\"}";
>       try {
>         JSONValue json = JSONParser.parse(str);
>       } catch (Exception e) {
>         System.out.println("JSON parse error char=" +
> Integer.toHexString((int) c));
>       }
>     }
>   }
>
> }
>
> In GWT 2.0.3 emulator, I've got the following results:
>
> - Chrome 5.0 and FF 3.6: error with character 0x2028 and 0x2029
> - IE 8.0: no error
>
> It works if I escape the characters with \u2028 and \u2029.

This is because GWT, for now, uses eval() to "parse" JSON, and U+2028
and U+2029 are line terminators in JavaScript (per spec). A line
terminator cannot appear unescaped inside a JavaScript string literal,
so eval() fails with a syntax error, even though the text is valid JSON.
Quoting ECMAScript 5, which defines JSON.parse:
"""JSON uses a more limited set of white space characters than
WhiteSpace and allows Unicode code points U+2028 and U+2029 to
directly appear in JSONString literals without using an escape
sequence."""

> I'm using Jackson on server-side to generate the JSON string which is
> sent to my GWT application and it doesn't escape by default these
> characters. As a workaround, I've implemented a Jackson custom
> serializer.

Jackson is right, as these aren't special characters in JSON, but on
the other hand, it could escape them to cope with web apps that eval()
the result instead of JSON.parse()ing it.
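
If changing the serializer is not an option, a rough sketch of the same idea is to post-process the serialized text on the server before sending it. The class and method names below are made up for illustration and are not part of Jackson or GWT. It assumes the input is already valid JSON, in which case raw U+2028 and U+2029 can only occur inside string values (they are not JSON whitespace), so a plain text replacement does not change the meaning of the document.

```java
public final class JsonLineSeparatorEscaper {

    private JsonLineSeparatorEscaper() {
        // Static helper only; hypothetical name, not a Jackson or GWT class.
    }

    /**
     * Replaces raw LINE SEPARATOR (U+2028) and PARAGRAPH SEPARATOR (U+2029)
     * characters with their six-character JSON escape sequences.
     * Assumes the argument is valid, already-serialized JSON text.
     */
    public static String escape(String json) {
        return json
                .replace("\u2028", "\\u2028")
                .replace("\u2029", "\\u2029");
    }

    public static void main(String[] args) {
        // The literal below contains the two raw separator characters.
        String raw = "{\"string\":\"a\u2028b\u2029c\"}";
        // The printed text contains only backslash escapes, no raw separators.
        System.out.println(escape(raw));
    }
}
```

The escaped text is still valid JSON and decodes to the same string, so clients that use JSON.parse are unaffected, while clients that eval() the response no longer hit the line terminator problem.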

Boris Granveaud

Jun 8, 2010, 4:15:57 AM
to Google Web Toolkit
If anyone is interested in the workaround, it is now on the Jackson wiki:
http://wiki.fasterxml.com/JacksonSampleQuoteChars

There are also several issues regarding escaping in the Jackson JIRA:

http://jira.codehaus.org/browse/JACKSON-102
http://jira.codehaus.org/browse/JACKSON-262
http://jira.codehaus.org/browse/JACKSON-219

Boris.

Olivier Monaco

Jun 9, 2010, 4:01:20 AM
to Google Web Toolkit
Hi,

JSONParser must only be used with trusted JSON because it uses
"eval()". Think about using a real JSON parser. This will avoid
problems like this one.

As an example of a JSON parser, see my port of the JavaScript parser
from json.org: http://code.google.com/p/tyco/source/browse/#svn/trunk/tyco-gwt/src/main/com/googlecode/tyco/gwt/client/json

It uses the native JSON parser if available, and falls back to a
JavaScript implementation otherwise. I know someone tried to do
something similar in GWT but ran into some issues. I hope a true JSON
parser will come soon.

Olivier