JsonParser#getCurrentLocation is not at the end of the current token, if current token is FIELD_NAME


Алексей Алефиров

Oct 29, 2019, 1:52:05 PM
to jackson-user
Hi,

For a project I'm working on, I need to get tokens from a source file along with their start and end locations. Using `JsonParser` and its methods `getTokenLocation` and `getCurrentLocation`, with their `getLineNr` and `getColumnNr`, seemed like a perfect solution. Unfortunately, it turned out that for a field name, `getCurrentLocation` is either the position where the field value token (the next token) ends, or the position before the next newline, whichever comes earlier. I would expect it to be right after the closing double-quote.

{
 
"fieldName" : "fieldValue"
}



Philosophical/Design question: is this okay?

Practical question: for my purposes, is it appropriate to use my original solution (with `getTokenLocation` and `getCurrentLocation`) and, for FIELD_NAME, override it with `getTokenLocation().getColumnNr()` + `getTextLength()` + 2 (for the opening and closing double-quotes)?
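As a sketch, here is the heuristic I have in mind (class and method names are mine, not a Jackson API); it assumes the raw field name contains no backslash escapes, so each character occupies exactly one input column:

```java
public class FieldNameEnd {
    // Column just past the closing double-quote of a FIELD_NAME token:
    // start column + name length + 2 (opening and closing quotes).
    public static int fieldNameEndColumn(int tokenStartColumn, int textLength) {
        return tokenStartColumn + textLength + 2;
    }

    public static void main(String[] args) {
        // "fieldName" starting at column 2, length 9 -> ends at column 13
        System.out.println(fieldNameEndColumn(2, 9));
    }
}
```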

Tatu Saloranta

Oct 29, 2019, 2:27:46 PM
to jackson-user
On Tue, Oct 29, 2019 at 10:52 AM Алексей Алефиров <alefir...@gmail.com> wrote:
Hi,

For a project I'm working on, I need to get tokens from a source file along with their start and end locations. Using `JsonParser` and its methods `getTokenLocation` and `getCurrentLocation`, with their `getLineNr` and `getColumnNr`, seemed like a perfect solution. Unfortunately, it turned out that for a field name, `getCurrentLocation` is either the position where the field value token (the next token) ends, or the position before the next newline, whichever comes earlier. I would expect it to be right after the closing double-quote.

Yes, this is due to an implementation detail: an optimization made at some point (2.0, maybe) changed handling a bit, so that in addition to tokenizing the FIELD_NAME itself, the start of the following token is inspected. This leads to the current location being a bit ahead of what might otherwise be expected.
 
 

{
 
"fieldName" : "fieldValue"
}



Philosophical/Design question: is this okay?

Practical question: for my purposes, is it appropriate to use my original solution (with `getTokenLocation` and `getCurrentLocation`) and, for FIELD_NAME, override it with `getTokenLocation().getColumnNr()` + `getTextLength()` + 2 (for the opening and closing double-quotes)?

Ok, so. Starting with the difference between "current" and "token" location: the former is meant to help with error messages, ideally indicating the specific character in the input stream where something problematic was found during tokenization. It may be in the middle of a token, so it typically won't be very helpful for automated use cases (like outlining a token or making changes).
But it is not designed or meant to give information on token boundaries: its value is affected by "lazy parsing" of tokens (for JSON Strings, for example, the location will be just after the opening double-quote once JsonToken.VALUE_STRING is returned, but will move once the actual contents are requested).
So it cannot be used to indicate token boundaries reliably.

Token location, on the other hand, should point to the very first character that is part of the token that was returned, excluding any preceding white space and/or separators (and, in non-compliant modes, comments).
If this location is incorrect, that would be a bug, and a new issue should be filed along with a reproduction.

The challenge in your case, then (assuming the token location is accurate), is finding the token end location. I think you are correct that for everything except JsonToken.FIELD_NAME, the current location will point to the character right after the token, as long as the value has been accessed.
That is:

* For String values, one of the String accessors must be called (getText(), or the text offset/length accessors)
* For numeric values, the matching accessors (or getNumber(), or even getText())
* All other tokens (start/end markers, null/true/false) are fully tokenized right away.
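A minimal sketch of that pattern, assuming jackson-core is on the classpath (the exact printed columns depend on the caveats above, in particular for FIELD_NAME):

```java
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.JsonLocation;
import com.fasterxml.jackson.core.JsonParser;
import com.fasterxml.jackson.core.JsonToken;

public class TokenSpans {
    public static void main(String[] args) throws Exception {
        String json = "{ \"fieldName\" : \"fieldValue\" }";
        try (JsonParser p = new JsonFactory().createParser(json)) {
            JsonToken t;
            while ((t = p.nextToken()) != null) {
                JsonLocation start = p.getTokenLocation();
                // Force full tokenization so that current location has
                // advanced past the token (getText() is legal for any token).
                p.getText();
                JsonLocation end = p.getCurrentLocation();
                System.out.println(t + " start=" + start.getColumnNr()
                        + " end=" + end.getColumnNr());
            }
        }
    }
}
```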

The FIELD_NAME case is trickier, however. The length of the field name is not sufficient, since there may be escaping (backslashes). If you have all the content, you could backtrack from the start of the following (value) token, although comments might be problematic. Or you could traverse from the token start towards the end, looking for the closing double-quote (while observing backslashes).
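The scan-forward approach can be sketched in plain Java, independent of the parser (the helper is mine, not a Jackson API; it assumes well-formed JSON input):

```java
public class FieldNameScanner {
    // Given the raw input and the offset of the field name's opening '"',
    // return the offset just past the closing '"', honoring backslash
    // escapes. Sketch only: assumes a well-formed, terminated name.
    public static int endOfFieldName(String input, int openQuoteOffset) {
        int i = openQuoteOffset + 1;      // first char after the opening quote
        while (i < input.length()) {
            char c = input.charAt(i);
            if (c == '\\') {
                i += 2;                   // skip the escaped character
            } else if (c == '"') {
                return i + 1;             // just past the closing quote
            } else {
                i++;
            }
        }
        throw new IllegalArgumentException("unterminated field name");
    }

    public static void main(String[] args) {
        System.out.println(endOfFieldName("{\"fieldName\" : 1}", 1));
    }
}
```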

-+ Tatu +- 
