Java Driver 3.0: problem with UTF-8 when decoding JSON

William Delanoue

May 3, 2015, 5:31:07 PM
to mongod...@googlegroups.com
Hi,

Where am I going wrong?

System.out.println(JSON.serialize(new Document("_id", "𑡜ᳫ鉠鮻罖᧭䆔瘉")));
System.out.println(Document.parse(JSON.serialize(new Document("_id", "𑡜ᳫ鉠鮻罖᧭䆔瘉")), new DocumentCodec()));

I get:
{ "_id" : "꼢𑡜ᳫ鉠鮻罖᧭䆔瘉"}
Document{{_id=꼢ᡜ?ᳫ鉠鮻罖᧭䆔瘉}}

The first line is correct, but the second is wrong: the second character is not decoded correctly.

(org.junit.ComparisonFailure: [1 6236 vs 55302]
Expected :'?'
Actual   :'ᡜ')


Am I using it the wrong way?


Jeff Yemin

May 4, 2015, 7:38:00 AM
to mongod...@googlegroups.com
Hi William,

This looks like a bug in JsonReader's parsing of strings containing Unicode surrogate pairs.  I've reported it in JAVA-1793, so please follow our progress on that ticket.

For now, you will be ok if you use Document.toJson() instead of JSON.serialize().  This is because Document.toJson() uses Unicode escapes for surrogate pairs (e.g. "\ud806\udc5c"), and JsonReader correctly handles those.
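Why the escaped form survives: a supplementary character such as U+1185C (the one Jeff's example escape encodes) is stored in a Java String as two chars, a surrogate pair, so escaping each char separately yields exactly "\ud806\udc5c". The sketch below shows that conversion in plain Java. escapeNonAscii is a hypothetical helper for illustration only; it is not how Document.toJson() is actually implemented, and it is not a complete JSON string encoder (it does not escape quotes or backslashes).

```java
class SurrogateEscapeDemo {

    // Hypothetical helper: copy printable ASCII as-is and write every other
    // UTF-16 char as a six-character escape (backslash, 'u', four hex digits).
    // Because it walks chars rather than code points, a surrogate pair
    // becomes two consecutive escapes.
    static String escapeNonAscii(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x20 && c < 0x7F) {
                sb.append(c);
            } else {
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Build U+1185C from its code point; Java stores it as two chars
        String s = new String(Character.toChars(0x1185C));
        System.out.println(s.length());        // 2: one code point, two chars
        System.out.println(escapeNonAscii(s)); // prints the escapes for D806 and DC5C
    }
}
```

A reader that handles each escape as one char, then lets the pair sit side by side in the output string, reconstructs the original character without ever needing to decode a full code point.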

Many thanks for reporting this issue.

Regards,
Jeff

William Delanoue

May 4, 2015, 11:12:31 AM
to mongod...@googlegroups.com
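Hi,

I changed JsonBuffer.read() so that it returns one char at a time:

class JsonBuffer {
    private final String buffer;
    private int position;
    JsonBuffer(final String buffer) { this.buffer = buffer; }
    // Return a single UTF-16 char (not a whole code point), so the two halves
    // of a surrogate pair are read, and copied, one after the other
    public int read() {
        return (position >= buffer.length()) ? -1 : buffer.charAt(position++);
    }
}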

and everything works (I use it in "fongo", https://github.com/fakemongo/fongo).
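To see why reading one char at a time fixes the decoding, here is a minimal sketch contrasting two copy loops on a single surrogate pair. It assumes, without confirmation from this thread, that the unpatched read() returned buffer.codePointAt(position++) and that the caller appended each returned value as a char; both methods below are hypothetical stand-ins, not the driver's JsonBuffer or JsonScanner.

```java
class ReadDemo {

    // Hypothetical "before" loop: reads a whole code point but advances the
    // position by only one char, then narrows the value back to a char.
    static String copyByCodePoint(String input) {
        StringBuilder out = new StringBuilder();
        int position = 0;
        while (position < input.length()) {
            int c = input.codePointAt(position++);
            out.append((char) c); // (char) 0x1185C keeps only the low 16 bits: 0x185C
        }
        return out.toString();
    }

    // "After" loop, matching the patch: one UTF-16 char per read.
    static String copyByChar(String input) {
        StringBuilder out = new StringBuilder();
        int position = 0;
        while (position < input.length()) {
            int c = input.charAt(position++);
            out.append((char) c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = new String(Character.toChars(0x1185C));
        System.out.println(copyByCodePoint(s).equals(s)); // false
        System.out.println(copyByChar(s).equals(s));      // true
    }
}
```

Under that assumption, the "before" loop turns the pair D806 DC5C into 185C followed by a lone DC5C, which matches the "ᡜ?" in the output William posted (6236 is 0x185C, 55302 is 0xD806).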

Thanks for the ticket

Jeff Yemin

May 4, 2015, 11:51:29 AM
to mongod...@googlegroups.com
Yes, that's exactly what I did too.

Regards,
Jeff

--
You received this message because you are subscribed to the Google Groups "mongodb-user" group.

For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.

William Delanoue

May 4, 2015, 5:39:18 PM
to mongod...@googlegroups.com
Sorry, me again :)

In org.bson.json.JsonWriter, specifically in org.bson.json.JsonWriter#doWriteBinaryData:

default:
writeStartDocument();
writeString("$binary", printBase64Binary(binary.getData()));
writeString("$type", Integer.toHexString(binary.getType() & 0xFF));
writeEndDocument();

Here $type is written as a String, but it should be an Integer (http://docs.mongodb.org/manual/reference/bson-types/).
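You can reproduce the problem with:

import com.mongodb.util.JSON;
import org.bson.BsonBinary;
import org.bson.Document;

final Document document = new Document("test", new BsonBinary("test".getBytes()));
final String json = document.toJson();
System.out.println(json);
JSON.parse(json); // throws ClassCastException

and you get: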
{ "test" : { "$binary" : "dGVzdA==", "$type" : "0" } }

java.lang.ClassCastException: java.lang.String cannot be cast to java.lang.Integer
    at com.mongodb.util.JSONCallback.objectDone(JSONCallback.java:127)
    at com.mongodb.util.JSONParser.parseObject(JSON.java:274)
    at com.mongodb.util.JSONParser.parse(JSON.java:227)
    at com.mongodb.util.JSONParser.parseObject(JSON.java:263)
    at com.mongodb.util.JSONParser.parse(JSON.java:227)
    at com.mongodb.util.JSONParser.parse(JSON.java:155)
    at com.mongodb.util.JSON.parse(JSON.java:92)
    at com.mongodb.util.JSON.parse(JSON.java:73)

(As I understand it, $type must be an Integer.)
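The writer line quoted above masks the subtype byte to an unsigned value and renders it in hexadecimal, which is why subtype 0 shows up as the string "0". A minimal sketch of that expression in isolation; subtypeAsJsonString is a hypothetical name, and only the Integer.toHexString(... & 0xFF) expression comes from the driver source quoted in this message.

```java
class SubtypeHexDemo {

    // Same expression as in JsonWriter#doWriteBinaryData: mask the byte to
    // 0..255, then format as lowercase hex without leading zeros.
    static String subtypeAsJsonString(byte subtype) {
        return Integer.toHexString(subtype & 0xFF);
    }

    public static void main(String[] args) {
        System.out.println(subtypeAsJsonString((byte) 0x00)); // 0
        System.out.println(subtypeAsJsonString((byte) 0x04)); // 4
        System.out.println(subtypeAsJsonString((byte) 0x80)); // 80
    }
}
```

The mask matters for user-defined subtypes: without it, (byte) 0x80 would sign-extend to -128 and be printed as "ffffff80".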

Jeff Yemin

May 4, 2015, 6:12:32 PM
to mongod...@googlegroups.com
The JsonWriter and JsonReader classes conform to the MongoDB Extended JSON specification for the binary BSON type, but it looks like the old JSON class does not.  If you'd like to see that addressed, please report a Jira issue.
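In other words, Extended JSON carries the binary subtype as a hex string, while the old JSON class casts it to an Integer. Code that must read JSON produced by either side could normalize the value itself. The helper below is a hypothetical sketch of that idea, not part of the driver's API, and it says nothing about how JAVA-1796 was eventually handled.

```java
class BinarySubtypeReader {

    // Hypothetical helper: accept "$type" either as the hex string that
    // JsonWriter emits (e.g. "0", "80") or as a number (what the old
    // com.mongodb.util.JSON callback expects), and check the byte range.
    static int parseBinarySubtype(Object typeValue) {
        final int subtype;
        if (typeValue instanceof String) {
            subtype = Integer.parseInt((String) typeValue, 16); // Extended JSON: hex string
        } else if (typeValue instanceof Number) {
            subtype = ((Number) typeValue).intValue();          // legacy: plain integer
        } else {
            throw new IllegalArgumentException("Unsupported $type value: " + typeValue);
        }
        if (subtype < 0 || subtype > 0xFF) {
            throw new IllegalArgumentException("Binary subtype out of range: " + typeValue);
        }
        return subtype;
    }

    public static void main(String[] args) {
        System.out.println(parseBinarySubtype("0"));  // 0
        System.out.println(parseBinarySubtype("80")); // 128
        System.out.println(parseBinarySubtype(0));    // 0
    }
}
```

Note that the two forms are not interchangeable as text: "10" read as a hex string is subtype 16, while the integer 10 is subtype 10, so a reader has to know which convention produced the document.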

Regards,
Jeff

William Delanoue

May 4, 2015, 6:31:48 PM
to mongod...@googlegroups.com
Thanks for the explanation !

https://jira.mongodb.org/browse/JAVA-1796