Gracefully handling serialization errors in Jackson


Zhenkai

Oct 16, 2013, 4:08:05 PM
to jacks...@googlegroups.com

Hello Jackson devs,

We use Jackson to serialize and deserialize some of our data. Sometimes bad data gets in (from other code paths), leading to exceptions like the two examples below. I looked through the docs and code, and I don't see any way to configure how Jackson handles such characters: throw an exception, ignore them, or substitute an "unknown character" glyph. This would be similar to what the JDK's CharsetDecoder offers with onUnmappableCharacter(). The feature would be valuable because, while we'd obviously rather not have bad data at all, it can't always be purged or handled at scale, especially when starting from an existing data set.

Is this on any current roadmap, and if not, would you accept a code contribution? Who would be the contact person for this?

Thanks,

Zhenkai

Caused by: java.io.IOException: Broken surrogate pair: first char 0xd83c, second 0x2e; illegal combination
    at org.codehaus.jackson.io.UTF8Writer.convertSurrogate(UTF8Writer.java:364)
    at org.codehaus.jackson.io.UTF8Writer.write(UTF8Writer.java:185)
    at org.codehaus.jackson.impl.WriterBasedGenerator._flushBuffer(WriterBasedGenerator.java:1050)
    at org.codehaus.jackson.impl.WriterBasedGenerator._writeString(WriterBasedGenerator.java:751)
    at org.codehaus.jackson.impl.WriterBasedGenerator.writeString(WriterBasedGenerator.java:207)
    at com.linkedin.data.codec.JacksonDataCodec$JsonTraverseCallback.stringValue(JacksonDataCodec.java:340)
    at com.linkedin.data.Data.traverse(Data.java:281)
    at com.linkedin.data.Data.traverse(Data.java:302)
    at com.linkedin.data.Data.traverse(Data.java:322)
    at com.linkedin.data.Data.traverse(Data.java:302)
    at com.linkedin.data.Data.traverse(Data.java:302)
    at com.linkedin.data.Data.traverse(Data.java:302)
    at com.linkedin.data.Data.traverse(Data.java:302)
    at com.linkedin.data.codec.JacksonDataCodec.writeObject(JacksonDataCodec.java:202)
    at com.linkedin.data.codec.JacksonDataCodec.objectToBytes(JacksonDataCodec.java:116)
    at com.linkedin.data.codec.JacksonDataCodec.mapToBytes(JacksonDataCodec.java:92)

or

java.io.IOException: Unmatched second part of surrogate pair (0xdc78)
    at org.codehaus.jackson.io.UTF8Writer.throwIllegal(UTF8Writer.java:379)
    at org.codehaus.jackson.io.UTF8Writer.write(UTF8Writer.java:178)
    at org.codehaus.jackson.impl.WriterBasedGenerator._writeSegment(WriterBasedGenerator.java:862)
    at org.codehaus.jackson.impl.WriterBasedGenerator._writeLongString(WriterBasedGenerator.java:821)
    at org.codehaus.jackson.impl.WriterBasedGenerator._writeString(WriterBasedGenerator.java:744)
    at org.codehaus.jackson.impl.WriterBasedGenerator.writeString(WriterBasedGenerator.java:207)
    at com.linkedin.data.codec.JacksonDataCodec$JsonTraverseCallback.stringValue(JacksonDataCodec.java:222)
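
Both traces point at lone UTF-16 surrogates in the Java String being written. In the first, the high surrogate 0xd83c is followed by 0x2e ('.') instead of a low surrogate; in the second, the low surrogate 0xdc78 has no high surrogate before it. One possible workaround on the writing side is to sanitize strings before they reach writeString(). The sketch below assumes you control that call site (for example, the JacksonDataCodec callback in the trace). The Utf8Sanitizer class is hypothetical, and U+FFFD is chosen as the substitute because it is the default replacement used by the JDK's charset decoders.

```java
/**
 * Hypothetical helper: replaces unpaired UTF-16 surrogates with U+FFFD so
 * that the resulting String can always be encoded as UTF-8.
 */
final class Utf8Sanitizer {
    static final char REPLACEMENT = '\uFFFD';

    private Utf8Sanitizer() {}

    static String sanitize(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c) && i + 1 < s.length()
                    && Character.isLowSurrogate(s.charAt(i + 1))) {
                // Well-formed pair: keep both halves and skip past the low one.
                out.append(c).append(s.charAt(i + 1));
                i++;
            } else if (Character.isSurrogate(c)) {
                // Unpaired high or low surrogate: substitute the replacement char.
                out.append(REPLACEMENT);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

Calling generator.writeString(Utf8Sanitizer.sanitize(value)) instead of generator.writeString(value) would avoid the IOException, at the cost of silently altering the data.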

Tatu Saloranta

Oct 16, 2013, 5:04:24 PM
to jacks...@googlegroups.com
This is a low-level error below the JSON layer, indicating that whatever is producing the UTF-8 content has corrupted the stream. Jackson has no settings to deal with that, and I don't think it is something I would want to handle at that level. When input is corrupt, trying to deal with it in a generic manner is risky and possibly counter-productive. I would rather that fundamental errors like this were used to push back on broken senders: the content is not valid Unicode, and by extension not valid JSON.
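
The push-back approach described above can be sketched as a strict check at the boundary, assuming the raw bytes are available before parsing. A JDK CharsetDecoder whose error actions are CodingErrorAction.REPORT throws a CharacterCodingException on the first malformed sequence instead of substituting anything, and that exception can be turned into an error returned to the sender. The StrictUtf8 class and its methods are hypothetical names for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical boundary check: accept only well-formed UTF-8 input. */
final class StrictUtf8 {
    private StrictUtf8() {}

    /** Decodes bytes as UTF-8, throwing instead of substituting on bad input. */
    static String decodeOrReject(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    /** Convenience check for callers that only need a yes/no answer. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            decodeOrReject(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

Rejecting at this point keeps corrupt content out of the data set entirely, which is the opposite trade-off from repairing it later.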

However, to work around this specific issue, you can implement (or reuse) a java.io.Reader that handles decoding while taking the broken input into account.
I don't know what the default JDK Reader would do:

   Reader r = new InputStreamReader(in, "UTF-8");

with respect to broken input; I think it might give you the "question mark" for unpaired surrogates. If so, this might be an acceptable workaround.
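
To avoid depending on whatever the default happens to be, the replacement policy can be made explicit by handing InputStreamReader a pre-configured CharsetDecoder. This is a minimal sketch, assuming the input arrives as an InputStream. With CodingErrorAction.REPLACE, the JDK's UTF-8 decoder substitutes its default replacement string, U+FFFD, for each malformed sequence. The resulting Reader could then be passed to Jackson's Reader-based factory method (JsonFactory.createJsonParser(Reader) in the 1.x codehaus API, createParser(Reader) in 2.x). The LenientUtf8 class and its methods are hypothetical.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical factory for a UTF-8 Reader that repairs instead of failing. */
final class LenientUtf8 {
    private LenientUtf8() {}

    static Reader reader(InputStream in) {
        // Explicit policy: substitute the replacement string for bad input
        // rather than throwing.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);
        return new InputStreamReader(in, decoder);
    }

    /** Reads everything from a Reader; used here only to show the result. */
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[4096];
        int n;
        while ((n = r.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}
```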

If that does not work, it should be possible to find other UTF-8 decoding readers and patch them. For example, the Woodstox XML parser (which I wrote) has "com.ctc.wstx.io.UTF8Reader", which could be modified to handle the issue in whatever way makes sense (remove/replace).
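
Patching UTF8Reader itself is beyond a short example, but the remove/replace choice can be illustrated on an already-decoded character stream. The following is a minimal sketch, not Woodstox code: a hypothetical Reader wrapper that, under an assumed REMOVE or REPLACE policy, either drops unpaired UTF-16 surrogates or substitutes U+FFFD for them. It keeps one character of lookahead so a high surrogate can be checked against the character that follows it.

```java
import java.io.IOException;
import java.io.Reader;

/**
 * Hypothetical wrapper that repairs unpaired UTF-16 surrogates in a decoded
 * character stream, either by dropping them or by replacing them with U+FFFD.
 */
final class SurrogateFixingReader extends Reader {
    enum Policy { REMOVE, REPLACE }

    private final Reader in;
    private final Policy policy;
    private int lookahead = -1;  // char read but not yet examined, or -1
    private int pairedLow = -1;  // validated low surrogate to emit next, or -1
    private boolean eof;

    SurrogateFixingReader(Reader in, Policy policy) {
        this.in = in;
        this.policy = policy;
    }

    /** Next raw char from the lookahead slot or the underlying reader. */
    private int take() throws IOException {
        if (lookahead >= 0) {
            int c = lookahead;
            lookahead = -1;
            return c;
        }
        if (eof) {
            return -1;
        }
        int c = in.read();
        if (c < 0) {
            eof = true;
        }
        return c;
    }

    private int readOne() throws IOException {
        if (pairedLow >= 0) {
            int c = pairedLow;
            pairedLow = -1;
            return c;
        }
        while (true) {
            int c = take();
            if (c < 0 || !Character.isSurrogate((char) c)) {
                return c;  // end of stream or ordinary character
            }
            if (Character.isHighSurrogate((char) c)) {
                int next = take();
                if (next >= 0 && Character.isLowSurrogate((char) next)) {
                    pairedLow = next;  // well-formed pair: emit high now, low next
                    return c;
                }
                lookahead = next;  // not part of this pair; examine it next time
            }
            // c is an unpaired high or low surrogate.
            if (policy == Policy.REPLACE) {
                return '\uFFFD';
            }
            // REMOVE: drop it and keep scanning.
        }
    }

    @Override
    public int read(char[] cbuf, int off, int len) throws IOException {
        if (len == 0) {
            return 0;
        }
        int n = 0;
        while (n < len) {
            int c = readOne();
            if (c < 0) {
                break;
            }
            cbuf[off + n++] = (char) c;
        }
        return n == 0 ? -1 : n;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

A correctly behaving UTF-8 decoder should never emit a lone surrogate, so a wrapper like this is mainly useful over character sources that can contain them, such as a StringReader over strings assembled elsewhere. Reading one character at a time keeps the logic simple at some cost in throughput.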

I hope this helps,

-+ Tatu +-



--
You received this message because you are subscribed to the Google Groups "jackson-dev" group.

Zhenkai

Nov 8, 2013, 4:14:14 PM
to jacks...@googlegroups.com
Hey Tatu,

Sounds reasonable. Thanks for the suggestion!

Zhenkai