Tatu Saloranta
Mar 10, 2021, 11:34:23 PM
to jacks...@googlegroups.com
One of the problems to tackle in 2.13 is whether, and how, to include
parts of the input source in `JsonProcessingException` messages. For example,
given input like:
{
X"id":124
}
(that is, 3 lines, with an invalid name for the JSON Object property)
we'd get an exception with a message like:
-------
Unexpected character ('X' (code 88)): was expecting double-quote to
start field name
at [Source: (String)"{
X"id":124
}"; line: 2, column: 2]
--------
So far so good. The first N characters of input are included, and there is
also a way to prevent this inclusion (disable
`StreamReadFeature.INCLUDE_SOURCE_IN_LOCATION`).
But:
1. If input contains control characters or other non-visible
characters, they are currently neither escaped nor quoted
2. Linefeeds and tabs are similarly included as-is
3. An attempt is even made to include the contents of binary formats
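To make problems (1) and (2) concrete, here is a minimal Java sketch of a check for characters that would be unsafe to print verbatim. The names `VisibilityCheck`, `needsEscaping` and `unsafeOffsets` are hypothetical (not Jackson API), and the exact set of Unicode categories treated as "non-visible" is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Jackson: flags characters that would be
// unsafe to print verbatim in an exception message.
class VisibilityCheck {

    // Assumption: "non-visible" = ISO control chars plus a few invisible Unicode categories
    static boolean needsEscaping(int cp) {
        // C0 controls (0x00-0x1F, including linefeed, carriage return and tab),
        // DEL (0x7F) and C1 controls (0x80-0x9F)
        if (Character.isISOControl(cp)) {
            return true;
        }
        switch (Character.getType(cp)) {
            case Character.FORMAT:              // e.g. zero-width space U+200B, BOM U+FEFF
            case Character.LINE_SEPARATOR:      // U+2028
            case Character.PARAGRAPH_SEPARATOR: // U+2029
            case Character.SURROGATE:           // unpaired surrogate halves
            case Character.PRIVATE_USE:
            case Character.UNASSIGNED:
                return true;
            default:
                return false;
        }
    }

    // Returns the char offsets of all code points in s that need escaping
    static List<Integer> unsafeOffsets(String s) {
        List<Integer> offsets = new ArrayList<>();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (needsEscaping(cp)) {
                offsets.add(i);
            }
            i += Character.charCount(cp);
        }
        return offsets;
    }

    public static void main(String[] args) {
        String input = "{\nX\"id\":124\n}";
        // Both linefeeds are flagged: prints [1, 11]
        System.out.println(unsafeOffsets(input));
    }
}
```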
I'll tackle all of these, but wanted feedback on (1) first. I think
non-visible/non-printable characters should absolutely be escaped, but
there are several ways to do that, and no real standard.
The leading choices I can think of, in rough descending order of
popularity, are as follows (shown for character 0x7F, DEL):
1) \u7F -- Java escape (or \u28FE for values beyond 8-bit)
2) %7F -- URL encode (presumably can also then use %28FE if need be?)
3) <7F> -- Unix "less" command
4) U+007F -- Unicode notation
5) &#x7F; -- XML
6) something else?
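For comparison, the following Java sketch renders a single code point in each of notations (1) to (5). The class and method names are hypothetical. Two details differ from the shorthand in the list: Java (and JSON) `\u` escapes always take exactly four hex digits, so 0x7F comes out as `\u007F`, and code points above U+FFFF need a surrogate pair; and standard URL percent-encoding encodes the UTF-8 bytes of a character, so U+28FE becomes `%E2%A3%BE` rather than `%28FE`. The `<7F>` form simply follows the description above; it is not checked against actual `less` output.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Jackson API: one rendering per notation in the list above
class EscapeStyles {

    // (1) Java-style: backslash-u plus exactly four hex digits per UTF-16 unit;
    // code points above U+FFFF become a surrogate pair of two escapes
    static String javaStyle(int cp) {
        StringBuilder sb = new StringBuilder();
        for (char c : Character.toChars(cp)) {
            sb.append(String.format("\\u%04X", (int) c));
        }
        return sb.toString();
    }

    // (2) URL percent-encoding: each UTF-8 byte written as %XX
    static String urlStyle(int cp) {
        StringBuilder sb = new StringBuilder();
        byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
        for (byte b : utf8) {
            sb.append(String.format("%%%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // (3) "less"-style as described in the post: hex value in angle brackets
    static String lessStyle(int cp) {
        return String.format("<%02X>", cp);
    }

    // (4) Unicode notation: "U+" followed by at least four hex digits
    static String unicodeStyle(int cp) {
        return String.format("U+%04X", cp);
    }

    // (5) XML numeric character reference, hexadecimal form
    static String xmlStyle(int cp) {
        return String.format("&#x%X;", cp);
    }

    public static void main(String[] args) {
        int del = 0x7F;
        System.out.println("(1) " + javaStyle(del));
        System.out.println("(2) " + urlStyle(del));     // %7F
        System.out.println("(3) " + lessStyle(del));    // <7F>
        System.out.println("(4) " + unicodeStyle(del)); // U+007F
        System.out.println("(5) " + xmlStyle(del));     // &#x7F;
        System.out.println(urlStyle(0x28FE));           // %E2%A3%BE
    }
}
```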
The goal here is NOT to make the result parseable, but just to:
(a) Help the developer see where the problem might be (or at least try to,
since for static content this is only the leading snippet of input)
(b) Keep the output sane (not mess up the console)
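As an illustration of goal (b), here is a minimal Java sketch that applies option (1) to a whole snippet so it prints on a single line. It is a sketch under assumptions, not Jackson's implementation: `SnippetEscaper` and `escapeJavaStyle` are hypothetical names, linefeed, carriage return and tab get the familiar short forms `\n`, `\r` and `\t`, and only ISO control and Unicode "format" characters get the four-digit `\u` form.

```java
// Hypothetical sketch, not Jackson code: Java-style escaping of a source snippet
// so that it prints on one line without invisible characters.
class SnippetEscaper {

    static String escapeJavaStyle(String snippet) {
        StringBuilder sb = new StringBuilder(snippet.length() + 16);
        for (int i = 0; i < snippet.length(); i++) {
            char c = snippet.charAt(i);
            switch (c) {
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (Character.isISOControl(c) || Character.getType(c) == Character.FORMAT) {
                        // Other control and format characters: backslash-u plus four hex digits
                        sb.append(String.format("\\u%04X", (int) c));
                    } else {
                        // Everything else, including backslashes and quotes, is kept as-is:
                        // the goal is readability, not a parseable result
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String input = "{\nX\"id\":124\n}";
        System.out.println(escapeJavaStyle(input));
    }
}
```

For the 3-line example input above this prints `{\nX"id":124\n}` on a single line. Leaving backslashes unescaped keeps JSON string escapes in the snippet readable, at the cost of making the output ambiguous to parse, which matches the stated goal.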
There is also the question of consistency vs optimality: that is, whether to:
(a) Use the same escaping for all formats, OR
(b) Vary escaping based on the format in question
My personal preference here is (a), although I am open to good arguments for (b).
But as to the main escaping question, at least for JSON snippets: which of
the presented choices (1) - (6) makes the most sense, and why?
-+ Tatu +-
ps. In the absence of feedback, I am leaning towards (1), Java-style escaping.