One idea for handling Object Id forward references

437 views

Skip to first unread message

Tatu Saloranta

unread,

Dec 8, 2013, 12:33:45 AM12/8/13

to jacks...@googlegroups.com

Ok, one of the most wanted features for Jackson is ability to handle forward-references for Object Ids. Currently, deserialization of Object Ids requires that definition of an Object Id (that is, object with its id is encountered before a reference; excluding limited cases where one-level reordering resolves problem).
While Jackson itself respects this invariant, and ensures that Object Ids are always written in such an order, this may be broken by other libraries and languages.

The problem in resolving this problem is not so much the inability to keep track of unresolved references, but the separation of two parts:

1. Resolution of Object reference via id to actual Object

2. Assignment of Object to property

so at point where #1 is done, no information is available to retain to be able to do #2 when out-of-order reference is resolved.

One possibility would be to try to pipe more information through processing (property that is being assigned); but this can be invasive, and adds overhead for feature that is not really needed by majority of processing.

But I realized that by turning the problem "inside out", it should be possible to do the reverse: "push" information on unresolved reference up to assignment.

How? By doing something that at first seems like abuse of OO; throwing an exception with necessary information. But when you think about it, this is sort exceptional thing; expectation being that references can be resolved in regular flow.

So: SettableBeanProperty.deserialize() would be the place to catch exception (throw by ObjectIdReader for unsuccessful resolution), and store information about unresolved reference (including read Object Id and SettableBeanProperty instance; stored in DeserializationContext, so that unresolved refs can be caught in the end).

A hook would be also needed from the place where successful resolution occurs (place where Id is associated with instance, for future resolutions), to see if any unresolved ids exist.

Now, creation of exceptions is costly. It is probably possible to create a singleton Exception instance, and instead carry information in ThreadLocal (or such), if this proves to be very significant cost (I don't mind measurable lowish cost, like +10% for when feature is needed).

Another smaller concern is regarding try-catch block for 'deserialize()', since this is cost that all execution would incur. But it may well be insignificant cost.

The reason I share this idea is mostly because I probably will not have time to work on this for at least a month or so. But I think this is the most practical idea I have thought so far (in fact, the first one I consider worth implementing), and I thought that may be someone might be interested in implementing it.

-+ Tatu +-

Pascal Gelinas

unread,

Dec 9, 2013, 10:16:30 AM12/9/13

to jacks...@googlegroups.com

Comments inlined.

On 12/08/2013 12:33 AM, Tatu Saloranta wrote:
> Ok, one of the most wanted features for Jackson is ability to handle
> forward-references for Object Ids. Currently, deserialization of
> Object Ids requires that definition of an Object Id (that is, object
> with its id is encountered before a reference; excluding limited cases
> where one-level reordering resolves problem).
> While Jackson itself respects this invariant, and ensures that Object
> Ids are always written in such an order, this may be broken by other
> libraries and languages.

In fact, Jackson doesn't respect this invariant when
@JsonIdentityReference(alwaysAsId = true) is used.

>
> The problem in resolving this problem is not so much the inability to
> keep track of unresolved references, but the separation of two parts:
>
> 1. Resolution of Object reference via id to actual Object
> 2. Assignment of Object to property
>
> so at point where #1 is done, no information is available to retain to
> be able to do #2 when out-of-order reference is resolved.
>
> One possibility would be to try to pipe more information through
> processing (property that is being assigned); but this can be
> invasive, and adds overhead for feature that is not really needed by
> majority of processing.
>
> But I realized that by turning the problem "inside out", it should be
> possible to do the reverse: "push" information on unresolved reference
> up to assignment.
> How? By doing something that at first seems like abuse of OO; throwing
> an exception with necessary information. But when you think about it,
> this is sort exceptional thing; expectation being that references can
> be resolved in regular flow.

What would this new Exception be? A sub-class of
JsonProcessingException? I suspect you don't want to add another
exception to the signature of method.

>
> So: SettableBeanProperty.deserialize() would be the place to catch
> exception (throw by ObjectIdReader for unsuccessful resolution), and
> store information about unresolved reference (including read Object Id
> and SettableBeanProperty instance; stored in DeserializationContext,
> so that unresolved refs can be caught in the end).
> A hook would be also needed from the place where successful resolution
> occurs (place where Id is associated with instance, for future
> resolutions), to see if any unresolved ids exist.

This would work for bean properties, but not for Collection or Map
deserialization, because they add a level of indirection. Well, it could
be made to work if SettableBeanProperty.deserialize() then checks
whether its deserializing a Collection or a Map, but this feel hacky.
Also, SettableBeanProperty#deserialize doesn't know on which bean to set
that property, so when resolution occurs you would still lack some
information to properly set the resolved object.

>
> Now, creation of exceptions is costly. It is probably possible to
> create a singleton Exception instance, and instead carry information
> in ThreadLocal (or such), if this proves to be very significant cost
> (I don't mind measurable lowish cost, like +10% for when feature is
> needed).
> Another smaller concern is regarding try-catch block for
> 'deserialize()', since this is cost that all execution would incur.
> But it may well be insignificant cost.
>
> The reason I share this idea is mostly because I probably will not
> have time to work on this for at least a month or so. But I think this
> is the most practical idea I have thought so far (in fact, the first
> one I consider worth implementing), and I thought that may be someone
> might be interested in implementing it.

In fact, I've been working on this on and off for the past week and this
morning I have a working solution for all cases I've encountered until
now: Bean properties, Collections and Maps. There's only one last thing
I need to implement for this to be complete and that's a check at the
end of processing to ensure that all forward references were resolved.

The basic idea is to store the missing information for #2 in
DeserializationContext. I've started with storing this in an attribute
like you first suggested, but it was getting out of hand, so I've
created a separate property. This property is, for all intent and
purpose, a Stack where extra information can be passed to
down-processing without altering the signature of methods. When an
unresolved id is encountered, the stack of information is peeked for the
most recent "Referring" which then register itself for an id. When the
id is resolved, the "Referring" is then called to insert the resolved
object at the proper location (a Bean property, in a collection or in a
map).

Note that there's some drawback to this solution: some bean properties
(and map entries) will be set twice; first with null, because
unresolved, then with the proper resolved value. Also, the current way
to resolve Id in Collection incurs a copy of an "accumulating"
collection into the result and that each time an unresolved id is resolved.

So, that's the basis of what I've been working on for this issue. I'll
give a spin to the Exception idea though, as that would fix some of the
drawback of my current solution.
>
> -+ Tatu +-
>
> --
> You received this message because you are subscribed to the Google
> Groups "jackson-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send
> an email to jackson-dev...@googlegroups.com.
> For more options, visit https://groups.google.com/groups/opt_out.

Tatu Saloranta

unread,

Dec 9, 2013, 1:14:11 PM12/9/13

to jacks...@googlegroups.com

On Mon, Dec 9, 2013 at 7:16 AM, Pascal Gelinas <pascal....@gmail.com> wrote:

Comments inlined.

On 12/08/2013 12:33 AM, Tatu Saloranta wrote:

Ok, one of the most wanted features for Jackson is ability to handle forward-references for Object Ids. Currently, deserialization of Object Ids requires that definition of an Object Id (that is, object with its id is encountered before a reference; excluding limited cases where one-level reordering resolves problem).
While Jackson itself respects this invariant, and ensures that Object Ids are always written in such an order, this may be broken by other libraries and languages.

In fact, Jackson doesn't respect this invariant when @JsonIdentityReference(alwaysAsId = true) is used.

True. I left that detail out for sake of simplicity -- yes, you can break the invariant by this annotation.

The problem in resolving this problem is not so much the inability to keep track of unresolved references, but the separation of two parts:

1. Resolution of Object reference via id to actual Object
2. Assignment of Object to property

so at point where #1 is done, no information is available to retain to be able to do #2 when out-of-order reference is resolved.

One possibility would be to try to pipe more information through processing (property that is being assigned); but this can be invasive, and adds overhead for feature that is not really needed by majority of processing.

But I realized that by turning the problem "inside out", it should be possible to do the reverse: "push" information on unresolved reference up to assignment.
How? By doing something that at first seems like abuse of OO; throwing an exception with necessary information. But when you think about it, this is sort exceptional thing; expectation being that references can be resolved in regular flow.

What would this new Exception be? A sub-class of JsonProcessingException? I suspect you don't want to add another exception to the signature of method.

Correct, either subtype of that, or perhaps more likely a runtime exception unrelated to it.

So: SettableBeanProperty.deserialize() would be the place to catch exception (throw by ObjectIdReader for unsuccessful resolution), and store information about unresolved reference (including read Object Id and SettableBeanProperty instance; stored in DeserializationContext, so that unresolved refs can be caught in the end).
A hook would be also needed from the place where successful resolution occurs (place where Id is associated with instance, for future resolutions), to see if any unresolved ids exist.

This would work for bean properties, but not for Collection or Map deserialization, because they add a level of indirection. Well, it could be made

Similar handling would need to be added there.

to work if SettableBeanProperty.deserialize() then checks whether its deserializing a Collection or a Map, but this feel hacky. Also,

Right, that would not be pretty.

SettableBeanProperty#deserialize doesn't know on which bean to set that property, so when resolution occurs you would still lack some information to properly set the resolved object.

That would be problematic yes. I'll have to look into if and how this would be resolved.

Now, creation of exceptions is costly. It is probably possible to create a singleton Exception instance, and instead carry information in ThreadLocal (or such), if this proves to be very significant cost (I don't mind measurable lowish cost, like +10% for when feature is needed).
Another smaller concern is regarding try-catch block for 'deserialize()', since this is cost that all execution would incur. But it may well be insignificant cost.

The reason I share this idea is mostly because I probably will not have time to work on this for at least a month or so. But I think this is the most practical idea I have thought so far (in fact, the first one I consider worth implementing), and I thought that may be someone might be interested in implementing it.

In fact, I've been working on this on and off for the past week and this morning I have a working solution for all cases I've encountered until now: Bean properties, Collections and Maps. There's only one last thing I need to implement for this to be complete and that's a check at the end of processing to ensure that all forward references were resolved.

Sounds good, yes. One other case would be arrays (or maybe you are counting those under Collections).

The basic idea is to store the missing information for #2 in DeserializationContext. I've started with storing this in an attribute like you first suggested, but it was getting out of hand, so I've created a separate property. This property is, for all intent and purpose, a Stack where extra information can be passed to down-processing without altering the signature of methods. When an unresolved id is encountered, the stack of information is peeked for the most recent "Referring" which then register itself for an id. When the id is resolved, the "Referring" is then called to insert the resolved object at the proper location (a Bean property, in a collection or in a map).

Note that there's some drawback to this solution: some bean properties (and map entries) will be set twice; first with null, because unresolved, then with the proper resolved value. Also, the current way to resolve Id in Collection incurs a copy of an "accumulating" collection into the result and that each time an unresolved id is resolved.

So, that's the basis of what I've been working on for this issue. I'll give a spin to the Exception idea though, as that would fix some of the drawback of my current solution.

I like your approach. And I think you have same concerns wrt requiring changes to existing API; ideally improvements should be minimally intrusive.

The only concern I have is that cost of tracking "current POJO" should be minimal for cases where it is not needed (I don't mind some extra cost for cases where Object Id is known to be needed -- pay for what you need etc). Adding an explicit property is probably cheaper than use of general-purpose attributes, so it is probably good call.

-+ Tatu +-

-+ Tatu +-

--
You received this message because you are subscribed to the Google Groups "jackson-dev" group.

To unsubscribe from this group and stop receiving emails from it, send an email to jackson-dev+unsubscribe@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "jackson-dev" group.

To unsubscribe from this group and stop receiving emails from it, send an email to jackson-dev+unsubscribe@googlegroups.com.

Pascal Gelinas

unread,

Dec 9, 2013, 1:41:22 PM12/9/13

to jacks...@googlegroups.com

On 12/09/2013 01:14 PM, Tatu Saloranta wrote:

Sounds good, yes. One other case would be arrays (or maybe you are counting those under Collections).

I don't have a specific unit test for arrays, I was under the impression it fell within the realm of CollectionDeserializer. I'll have to add a unit test for that.

The basic idea is to store the missing information for #2 in DeserializationContext. I've started with storing this in an attribute like you first suggested, but it was getting out of hand, so I've created a separate property. This property is, for all intent and purpose, a Stack where extra information can be passed to down-processing without altering the signature of methods. When an unresolved id is encountered, the stack of information is peeked for the most recent "Referring" which then register itself for an id. When the id is resolved, the "Referring" is then called to insert the resolved object at the proper location (a Bean property, in a collection or in a map).

Note that there's some drawback to this solution: some bean properties (and map entries) will be set twice; first with null, because unresolved, then with the proper resolved value. Also, the current way to resolve Id in Collection incurs a copy of an "accumulating" collection into the result and that each time an unresolved id is resolved.

So, that's the basis of what I've been working on for this issue. I'll give a spin to the Exception idea though, as that would fix some of the drawback of my current solution.

I like your approach. And I think you have same concerns wrt requiring changes to existing API; ideally improvements should be minimally intrusive.

The only concern I have is that cost of tracking "current POJO" should be minimal for cases where it is not needed (I don't mind some extra cost for cases where Object Id is known to be needed -- pay for what you need etc). Adding an explicit property is probably cheaper than use of general-purpose attributes, so it is probably good call.

Tracking of "current POJO" is only done in case where Object Id is known to be needed. For now, that means either the property itself has a @JsonIdentityInfo or that the value deserializer (property, collection or map values) has an ObjectIdReader. Without those, no information for Object Id resolution is added to the context.

One thing I forgot to mention is that for property based deserialization to work properly I had to create a new subclass to SettableBeanProperty, ObjectIdReferenceProperty, which is responsible for adding information in the context. Only when "deserializeAndSet" and "deserializeSetAndReturn" are called though, because otherwise the current POJO being deserialized isn't known. They are created during the "resolve" phase of a deserializer, only when required (again, either ObjectIdInfo is known or value deser has ObjectIdReader) and that was based on the work for Managed properties (ie, it wraps another property). One side-effect this has is that AnySetter properties are not part of the common resolution loop, so I'll have to add some custom processing for that too.

-+ Tatu +-

-+ Tatu +-

--
You received this message because you are subscribed to the Google Groups "jackson-dev" group.

To unsubscribe from this group and stop receiving emails from it, send an email to jackson-dev...@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "jackson-dev" group.

To unsubscribe from this group and stop receiving emails from it, send an email to jackson-dev...@googlegroups.com.

For more options, visit https://groups.google.com/groups/opt_out.

--
You received this message because you are subscribed to the Google Groups "jackson-dev" group.

To unsubscribe from this group and stop receiving emails from it, send an email to jackson-dev...@googlegroups.com.

Tatu Saloranta

unread,

Dec 10, 2013, 2:09:02 PM12/10/13

to jacks...@googlegroups.com

On Mon, Dec 9, 2013 at 10:41 AM, Pascal Gelinas <pascal....@gmail.com> wrote:

On 12/09/2013 01:14 PM, Tatu Saloranta wrote:

Sounds good, yes. One other case would be arrays (or maybe you are counting those under Collections).

I don't have a specific unit test for arrays, I was under the impression it fell within the realm of CollectionDeserializer. I'll have to add a unit test for that.

There is some shared functionality, but since arrays and Collections can not really be handled generically, code paths differ. It should not be difficult to support arrays, just some code duplication at times; if Collections can be made to work, so can arrays.

The basic idea is to store the missing information for #2 in DeserializationContext. I've started with storing this in an attribute like you first suggested, but it was getting out of hand, so I've created a separate property. This property is, for all intent and purpose, a Stack where extra information can be passed to down-processing without altering the signature of methods. When an unresolved id is encountered, the stack of information is peeked for the most recent "Referring" which then register itself for an id. When the id is resolved, the "Referring" is then called to insert the resolved object at the proper location (a Bean property, in a collection or in a map).

Note that there's some drawback to this solution: some bean properties (and map entries) will be set twice; first with null, because unresolved, then with the proper resolved value. Also, the current way to resolve Id in Collection incurs a copy of an "accumulating" collection into the result and that each time an unresolved id is resolved.

So, that's the basis of what I've been working on for this issue. I'll give a spin to the Exception idea though, as that would fix some of the drawback of my current solution.

I like your approach. And I think you have same concerns wrt requiring changes to existing API; ideally improvements should be minimally intrusive.

The only concern I have is that cost of tracking "current POJO" should be minimal for cases where it is not needed (I don't mind some extra cost for cases where Object Id is known to be needed -- pay for what you need etc). Adding an explicit property is probably cheaper than use of general-purpose attributes, so it is probably good call.

Tracking of "current POJO" is only done in case where Object Id is known to be needed. For now, that means either the property itself has a @JsonIdentityInfo or that the value deserializer (property, collection or map values) has an ObjectIdReader. Without those, no information for Object Id resolution is added to the context.

Perfect.

One thing I forgot to mention is that for property based deserialization to work properly I had to create a new subclass to SettableBeanProperty, ObjectIdReferenceProperty, which is responsible for adding information in the context. Only when "deserializeAndSet" and "deserializeSetAndReturn" are called though, because otherwise the current POJO being deserialized isn't known. They are created during the "resolve" phase of a deserializer, only when required (again, either ObjectIdInfo is known or value deser has ObjectIdReader) and that was based on the work for Managed properties (ie, it wraps another property).

Makes sense.

One side-effect this has is that AnySetter properties are not part of the common resolution loop, so I'll have to add some custom processing for that too.

Right, AnySetter is yet another variation, and has in some cases lagged behind (that is, some expected features not working; I fixed polymorphic type handling for it in 2.3 for example).