Deserialize old XML after field removal

Emilian Bold

Jun 17, 2021, 4:33:22 PM
to XStream User
Hello,

I see the docs don't give many options on the subject:

> If a field is removed from the class, deserializing an old version that contains the field will cause an exception. Leaving the field in place but declaring it as transient will avoid the exception, but XStream will not try to deserialize it.

... but I can't believe it.

So, I have class Parent which contains a list of class Children which
contain a class Toy.

It *may* seem that I can use ignoreUnknownElements or omitField but
this basically only helps with serialization.

During de-serialization of old data, both methods will basically skip
the whole XML subtree!

What this means is that if I have a reference to a 'Toy' object
that was serialized as part of a now ignored element/subtree, that
reference is never re-created. Later on, I will get an 'Invalid
Reference' error when the object is referenced from an acceptable
place.

So... is there really no way to just remove a field? Must one go back,
re-add them and make them transient? And keep them forever?

Ideally .omitField() would not serialize/deserialize that field. But
it could (maybe with yet another flag) instantiate all the object
subtree so that later references are valid.

--emi

Emilian Bold

Jun 17, 2021, 4:36:49 PM
to XStream User
PS: Actually, transient does not work either. Marking a relevant field
with transient will also skip deserializing the whole object sub-tree
so a further valid reference will get an 'Invalid Reference' error
from XStream!

--emi

Jörg Schaible

Jun 17, 2021, 6:55:53 PM
to XStream User

On Thursday, 17 June 2021, 22:32:44 CEST, Emilian Bold wrote:

> Hello,
>
> I see the docs don't give many options on the subject:
> > If a field is removed from the class, deserializing an old version that
> > contains the field will cause an exception. Leaving the field in place
> > but declaring it as transient will avoid the exception, but XStream will
> > not try to deserialize it.
> ... but I can't believe it.
>
> So, I have class Parent which contains a list of class Children which
> contain a class Toy.
>
> It *may* seem that I can use ignoreUnknownElements or omitField but
> this basically only helps with serialization.
>
> During de-serialization of old data, both methods will basically skip
> the whole XML subtree!

If you omit a member that contains a list, then yes, the list and any stuff it includes will be omitted.

> What this means is that if I have a reference to a 'Toy' object
> that was serialized as part of a now ignored element/subtree, that
> reference is never re-created. Later on, I will get an 'Invalid
> Reference' error when the object is referenced from an acceptable
> place.

Yes. What else? XStream no longer has any idea what the omitted field originally contained. Was it a String, a list or another complex type? There's no information left.

> So... is there really no way to just remove a field? Must one go back,
> re-add them and make them transient? And keep them forever?

You can write a custom converter and handle the tag yourself. Let XStream unmarshal it into a list and ignore the result.


> Ideally .omitField() would not serialize/deserialize that field. But
> it could (maybe with yet another flag) instantiate all the object
> subtree so that later references are valid.

As said, XStream ignores the member field of a type. What that field originally contained is no longer relevant. An enhancement to XStream would be possible though, if you could pass the deserialization type together with the field to omit as a parameter. However, the implementation would do the same as what you could do with that custom converter.


Regards,

Jörg

Jörg Schaible

Jun 17, 2021, 7:00:38 PM
to XStream User, Emilian Bold
On Thursday, 17. June 2021, 22:36:11 CEST Emilian Bold wrote:
> PS: Actually, transient does not work either. Marking a relevant field
> with transient will also skip deserializing the whole object sub-tree.

Yes, this is how it works. A transient field will no longer be part of any
serialization/deserialization operation.

> so a further valid reference will get an 'Invalid Reference' error
> from XStream!

XStream was designed in the first place to support Java to XML and back. Here
"back" no longer works (out of the box), because you modified the Java part.

Regards,
Jörg


Emilian Bold

Jun 17, 2021, 7:20:08 PM
to Jörg Schaible, XStream User
Can a custom convertor work on a given field? As far as I can tell a
convertor works on a given type.

Since I'm getting 'Invalid reference' I haven't dropped that type
entirely from the object graph.

So... must the convertor now take care of all the instances of that
type but conditionally (based on some info from UnmarshallingContext?)
see it's actually the "removed" field and store it elsewhere? How does
storing it elsewhere help since I have `Object unmarshal()` which
means XStream expects an instantiated value and will use that value
somehow (supposedly, to set it on the removed field?).

OR, do I actually: keep the field, mark it as transient, write the
convertor (so that XStream is forced to visit the subtree), then have
the convertor return a dummy value (such as an empty list)? Seems like
a very roundabout way.

I somehow assumed that it's quite common for the Java side to change
and to have old XStream serializations around. Otherwise it means
XStream itself is designed for transient serializations. But transient
serializations can be accomplished with many other things starting
with basic Serializable.

--emi

Jörg Schaible

Jun 17, 2021, 8:50:59 PM
to XStream User
Hi,

On Friday, 18. June 2021, 01:19:30 CEST Emilian Bold wrote:
> Can a custom convertor work on a given field? As far as I can tell a
> convertor works on a given type.

Yes.

> Since I'm getting 'Invalid reference' I haven't dropped that type
> entirely from the object graph.
>
> So... must the convertor now take care of all the instances of that
> type but conditionally (based on some info from UnmarshallingContext?)
> see it's actually the "removed" field

Yes. However, the idea is to derive from the ReflectionConverter - or
does your Parent implement the Serializable interface?

> and store it elsewhere?

No. The UnmarshallingContext keeps track of the deserialized objects.

> How does
> storing it elsewhere help since I have `Object unmarshal()` which
> means XStream expects an instantiated value and will use that value
> somehow (supposedly, to set it on the removed field?).

The assumption does not apply.

> OR, do I actually: keep the field, mark it as transient, write the
> convertor (so that XStream is forced to visit the subtree), then have
> the convertor return a dummy value (such as an empty list)? Seems like
> a very roundabout way.

Something similar. Originally I thought you could override some method to
simply unmarshal a list when it tries to look up the unknown element, but
there's no public hook to do that.

However, you can use a different workaround. You will need a helper class for
the migration. Use a static inner class of the custom converter that is
either derived from Parent or one that contains the same member fields
(including the inherited ones) in the same sequence (inherited first). Which
variant you need depends on whether the fields declared after the list contain
references to objects that are in the list:

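=========== %< ============

import java.util.List;
import com.thoughtworks.xstream.converters.UnmarshallingContext;
import com.thoughtworks.xstream.converters.reflection.ReflectionConverter;
import com.thoughtworks.xstream.converters.reflection.ReflectionProvider;
import com.thoughtworks.xstream.io.HierarchicalStreamReader;
import com.thoughtworks.xstream.mapper.Mapper;

class ParentConverter extends ReflectionConverter {
    // either: a helper derived from Parent that re-declares the removed list
    static class MigrationParent extends Parent {
        List list; // the removed field, so its subtree is still unmarshalled

        private Object readResolve() {
            Parent parent = new Parent(...);
            // setup parent with values of this instance
            // it might be necessary to use the reflectionProvider
            return parent;
        }
    }

    // or: a standalone helper that mirrors the old field layout of Parent
    // static class MigrationParent {
    //     // inherited serialized fields
    //     // serialized fields declared before the list
    //     List list;
    //     // serialized fields declared after the list
    //     // add same readResolve method as above
    // }

    ParentConverter(Mapper mapper, ReflectionProvider reflectionProvider) {
        super(mapper, reflectionProvider, Parent.class);
    }

    @Override
    public Object unmarshal(HierarchicalStreamReader reader, UnmarshallingContext context) {
        MigrationParent mp = new MigrationParent();
        // or reflectionProvider.newInstance(MigrationParent.class);
        mp = (MigrationParent)doUnmarshal(mp, reader, context);
        return serializationMembers.callReadResolve(mp);
    }
}

xstream.registerConverter(new ParentConverter(xstream.getMapper(),
    xstream.getReflectionProvider()));

=========== %< ============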

In the readResolve method of the MigrationParent you can create the correct
Parent instance with all the data that is collected by the helper class. If
you have to use the reflectionProvider of the converter, provide it as ctor
argument to the MigrationParent and keep it in a transient field.

> I somehow assumed that it's quite common for the Java side to change
> and to have old XStream serializations around.

Yes. However, you're the first in more than 15 years, who omits stuff that is
referenced later on.

> Otherwise it means
> XStream itself is designed for transient serializations. But transient
> serializations can be accomplished with many other things starting
> with basic Serializable.

XStream mimics Java serialization as closely as possible, apart from the fact
that it also handles non-serializable types.
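
For readers unfamiliar with the Java serialization behaviour being mimicked here: a shared object is written once and later occurrences become back-references, so object identity survives a round trip. A minimal JDK-only sketch of that behaviour follows. The names `SharedReferenceDemo`, `A` and `Item` are made up for illustration, and nothing in it is XStream code.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

final class SharedReferenceDemo {
    // Hypothetical element type, referenced from two places.
    static class Item implements Serializable {
        private static final long serialVersionUID = 1L;
        final String name;

        Item(String name) {
            this.name = name;
        }
    }

    // Hypothetical holder: the list and the extra field can share one Item.
    static class A implements Serializable {
        private static final long serialVersionUID = 1L;
        List<Item> elements = new ArrayList<>();
        Item lastElement;
    }

    /** Serializes the object to a byte array and reads it back with plain Java serialization. */
    static A roundTrip(A original) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (A) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

If the same `Item` is placed in `elements` and in `lastElement`, the copy returned by `roundTrip` again holds one shared instance in both places.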

Regards,
Jörg


Jörg Schaible

Jun 17, 2021, 8:54:32 PM
to XStream User
You actually don't have to use readResolve at all. Simply create the
Parent in unmarshal and set up the instance there.
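
Setting up the instance mostly means copying the values collected by the migration helper into a fresh Parent. A minimal sketch of such a copy with plain `java.lang.reflect` follows. `MigrationCopy`, `OldShape` and `CurrentShape` are hypothetical names standing in for the helper and the current Parent, and the sketch assumes that fields with the same name have assignable types. It does not use XStream's ReflectionProvider, which may be the better fit inside a real converter.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.List;

final class MigrationCopy {
    // Hypothetical stand-in for the migration helper: still has the removed list.
    static class OldShape {
        String name;
        List<String> children;
    }

    // Hypothetical stand-in for the current Parent: the list is gone.
    static class CurrentShape {
        String name;
    }

    /** Copies each non-static, non-transient field of helper into the same-named field of target, if there is one. */
    static <T> T copyKnownFields(Object helper, T target) {
        for (Class<?> c = helper.getClass(); c != null && c != Object.class; c = c.getSuperclass()) {
            for (Field source : c.getDeclaredFields()) {
                int mods = source.getModifiers();
                if (Modifier.isStatic(mods) || Modifier.isTransient(mods) || source.isSynthetic()) {
                    continue;
                }
                Field dest = findField(target.getClass(), source.getName());
                if (dest == null || Modifier.isStatic(dest.getModifiers())) {
                    continue; // e.g. the removed list has no counterpart on the target
                }
                try {
                    source.setAccessible(true);
                    dest.setAccessible(true);
                    dest.set(target, source.get(helper));
                } catch (IllegalAccessException e) {
                    throw new IllegalStateException("Cannot copy field " + source.getName(), e);
                }
            }
        }
        return target;
    }

    /** Looks up a declared field by name in type or its superclasses; returns null when absent. */
    static Field findField(Class<?> type, String name) {
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            try {
                return c.getDeclaredField(name);
            } catch (NoSuchFieldException e) {
                // not declared here, try the superclass
            }
        }
        return null;
    }
}
```

In a converter, a call such as `MigrationCopy.copyKnownFields(mp, parent)` would go in `unmarshal` after `doUnmarshal` has filled the helper.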

Emilian Bold

Jun 17, 2021, 10:21:04 PM
to XStream User
> Yes. However, you're the first in more than 15 years, who omits stuff that is 
> referenced later on.

Indirectly referenced.

I'd expect that XML serialization is depth first. So, a deep reference will first be serialized under the 1st top-level object reached. Any further reference will just use the reference identifier of the previous serialization.

So, I'm surprised this never came up unless people just topologically sort their object tree somehow. Even on something simple like

class A {
 List<Item> elements;
 Item lastElement;
}

getting rid of 'elements' will hit this problem since 'lastElement' will reference an object in the list.
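
A rough model of that effect in plain Java, not XStream code: the first occurrence of an object is written out in full, later occurrences only store the path of the first one, and a reader that skips an omitted field never registers the paths inside it. `ReferenceSketch` and its path format are invented for illustration, and every field is modelled as a list of items to keep the sketch short.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ReferenceSketch {
    /** One serialized entry: either a value, or a reference to the path of an earlier entry. */
    static final class Node {
        final String path;
        final String value;          // null when this entry is a reference
        final String referencePath;  // null when this entry holds the value

        Node(String path, String value, String referencePath) {
            this.path = path;
            this.value = value;
            this.referencePath = referencePath;
        }
    }

    /** Walks fields in order; the first occurrence of an object is written out, later ones become references. */
    static List<Node> marshal(Map<String, List<Object>> fields) {
        Map<Object, String> firstPath = new IdentityHashMap<>();
        List<Node> out = new ArrayList<>();
        for (Map.Entry<String, List<Object>> field : fields.entrySet()) {
            int index = 0;
            for (Object item : field.getValue()) {
                String path = "/A/" + field.getKey() + "[" + index++ + "]";
                String seen = firstPath.putIfAbsent(item, path);
                out.add(seen == null
                        ? new Node(path, String.valueOf(item), null)
                        : new Node(path, null, seen));
            }
        }
        return out;
    }

    /** Rebuilds path -> value, skipping omitted fields; a reference into a skipped field cannot be resolved. */
    static Map<String, String> unmarshal(List<Node> nodes, Set<String> omittedFields) {
        Map<String, String> byPath = new LinkedHashMap<>();
        for (Node node : nodes) {
            String fieldName = node.path.substring("/A/".length(), node.path.indexOf('['));
            if (omittedFields.contains(fieldName)) {
                continue; // the entry is skipped, so its path is never registered
            }
            if (node.referencePath == null) {
                byPath.put(node.path, node.value);
            } else if (byPath.containsKey(node.referencePath)) {
                byPath.put(node.path, byPath.get(node.referencePath));
            } else {
                throw new IllegalStateException("Invalid reference: " + node.referencePath);
            }
        }
        return byPath;
    }
}
```

With `elements` and `lastElement` holding the same item, omitting nothing resolves both entries, while omitting `elements` makes `unmarshal` fail on the entry for `lastElement`.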

Thank you for the code snippets. I will look over them and experiment.

--emi

