Why should value-based objects not be serialized?

4,240 views
Skip to first unread message

Michael Piefel

unread,
Aug 29, 2016, 5:20:03 AM8/29/16
to SonarQube
Hallo everyone,

I’m struggling to even understand the purpose of the rule squid:S3437 “Value-based objects should not be serialized”.

The rule’s description refers to the docs which state it’s not OK to distinguish references to value-based classes, which is fine. It then continues to just say it will flag serialization of such classes. To me, there is no connection at all between these. Serialization it not reference comparison.

So, what is the real point of this rule? It’s also very hard to comply with this rule; the compliant example just uses transient, which of course changes the semantics quite dramatically.

G. Ann Campbell

unread,
Aug 29, 2016, 8:47:05 AM8/29/16
to SonarQube
Hi Michael,

The experts advise not to serialize value-based objects. Here's where Brian Goetz weighs in on that. The rule was written to help enforce that, but of course you're free to disable it.


Ann

Michael Piefel

unread,
Aug 29, 2016, 10:54:16 AM8/29/16
to SonarQube
OK, I still don’t get it. In his Stack Overflow answer Brian Goetz writes – rather too briefly, in my opinion – that serialization uses identity hash code. That does not answer the question, even if it is the accepted answer there. Or rather, it answers the very specific questions at the end of the OP’s text instead of the more general question in the headline.

What would be the dangers here? What topology or circular dependencies await? Looking at the source code, the developers went quite a mile to perform sensible serialization and deserialization in the runtime library, using advanced writeReplace() and what-not. And BTW, note that LocalDate.now() returns a fresh instance on each call, so people are quite used to not relying on identity here.

You are right, Ann, in that I can disable the rule if it doesn’t seem fit. But it is actually enabled by default, and I would expect rules to be the default that actually help increase code quality. Replacing the serialization from the standard library with something self-made increases code size considerably and is unlikely to be actually any better (likely worse, in my experience).

Paul Wagland

unread,
Aug 29, 2016, 11:37:00 AM8/29/16
to SonarQube
Imagine that you have a class that has two references to a value. These might both be pointing to the same value object. What Brian was saying in his answer is that serialization/deserialization will not guarantee that you would end up with the same topology of the result. Specifically, your de-serialized object might point to two seperate value objects. This might happen, since identityHashMap won't (might not) behave as expected for value objects.

Now, imagine that you have two value objects, which point to each other in a loop, via intermediate mutable objects. In this scenario, there is no guarantee that serialisation will ever finish, since it may not be able to detect that it has already serialised this object previously, and so just continues going around the loop. Although the mutable object in the middle, which cannot be a value based object is probably enough to stop the infinite loop. However, you may still end up with A → Mᴬ → B → A₁ → Mᴬ → B → A₁ instead of the expected A → Mᴬ → B → A.

Michael Piefel

unread,
Aug 30, 2016, 4:13:54 AM8/30/16
to SonarQube
 Thank you, Paul, that’s really a helpful explanation. So we cannot rely on the serialization/deserialization to create exactly the same object graph.

However, if I understand the intent of value types in future Java versions correctly, we cannot rely on object graphs with value types in them anyway, since the VM is allowed to replace values with wrappers and vice versa as it sees fit. And that is quite OK, since we ourselves should not rely on object identity when handling value types. In fact, we shouldn’t rely much on object identity anyway, should we, this is what S1698 is all about.

Back to my original question: Why shouldn’t we serialize value-based types? The error is relying on object identity, not serializing the values. Specifically, S3437 disallows serialization of LocalDate and friends, so we only have to fear Paul’s first scenario, not the second.

And, as another question that nobody has addressed so far: What should be done instead?


PS: The JavaDoc fpor LocalDate states “This is a value-based class; use of identity-sensitive operations (including reference equality (==), identity hash code, or synchronization) on instances of LocalDate may have unpredictable results and should be avoided. The equals method should be used for comparisons.” – no mention of serialization at all.

Nicolas Peru

unread,
Sep 5, 2016, 2:23:57 AM9/5/16
to Michael Piefel, SonarQube
Hi, 

Michael I'm not really sure I get your main problem here. Your last PS states the problem : you should use equals for comparison of LocalDate and not ref equality BUT, as stated in the SO question refered and what is explained by Paul, serialization relies heavily on ref equality and as such you're in for trouble if you are serializing such fields.
This is why the rule detects when you are actually serializing value based class like LocalDate. 

What should be done ? this is not a question that can be solved in a general way as far as I understand : this is why we recommend you to not serialize this field by making it transient. If you really need to get that info serialized, up to you to come with a solution to that issue (custom write, using a type which is not value based,, etc.).

HTH,
Cheers, 


--
You received this message because you are subscribed to the Google Groups "SonarQube" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sonarqube+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/sonarqube/0345bb9b-6543-4fb9-8922-618adea1e678%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.
--
Nicolas PERU | SonarSource
Senior Developer
http://sonarsource.com

Michael Piefel

unread,
Sep 5, 2016, 4:32:33 AM9/5/16
to SonarQube, michael...@welldoo.com


Am Montag, 5. September 2016 08:23:57 UTC+2 schrieb Nicolas Peru:
[…] you're in for trouble if you are serializing such fields.
This is why the rule detects when you are actually serializing value based class like LocalDate. 

But which trouble? Really, I cannot see any. I asked around my colleagues, and they do not understand either. What is the trouble with serializing LocalDate? The answer “serialization relies heavily on identity” is not enough. OK, so it relies on identiy, but why is that a problem? In which scenario will serializing an object that contains a LocalDate and subsequently deserializing it cause trouble for the user of the deserialized object? Apart, of course, if the user of said object relies on object identity which they should never ever have done in the first place.

 
[…] we recommend you to not serialize this field by making it transient. If you really need to get that info serialized, up to you to come with a solution to that issue (custom write, using a type which is not value based,, etc.).

 I think that is bad advice. Adding transient is an easy fix which will immediately break any program using the deserialized object, except when carefully writing you own replacements. And using a type which is not value based would be going back to java.util.Date then? That’s not an improvement.


Paul Wagland

unread,
Sep 5, 2016, 5:11:00 AM9/5/16
to SonarQube, michael...@welldoo.com
The issue is more complex than that. The issue is that when you have two LocalDate objects, ld1 and ld2, then (ld1 == ld2) is undefined. That is, if it is perfectly permissible for (ld1.equals(ld2) == true) and (ld1 != ld2 == true). This would break serialisation horribly since two different objects could be mistakenly assumed to be the same. Or it might work. Or it might work sometimes, but not others. That's the great thing about undefined.

Realistically, you should be able to serialise these types of objects, however the spec, as currently written, does not allow you to do that safely.

Feel free to disable the rule if you are prepared to live with that risk. And you might well say that of course this will always work, but then you have to question why the docs explicitly deny that fact ;-)

Michael Piefel

unread,
Sep 5, 2016, 9:28:38 AM9/5/16
to SonarQube, michael...@welldoo.com


Am Montag, 5. September 2016 11:11:00 UTC+2 schrieb Paul Wagland:
The issue is more complex than that. The issue is that when you have two LocalDate objects, ld1 and ld2, then (ld1 == ld2) is undefined.


Rather, will be undefined in the future. For Java 8, there actually are no value-types. You might well consider not applying any value-base-type rule to code with language-level 8. On the other hand, it is of course great to be warned of possible issues long ahead.


 
That is, if it is perfectly permissible for (ld1.equals(ld2) == true) and (ld1 != ld2 == true). This would break serialisation horribly since two different objects could be mistakenly assumed to be the same. Or it might work. Or it might work sometimes, but not others. That's the great thing about undefined.


 I really feel stupid now – but how will it break serialization? Two different LocalDate objects that happen to represent the same day may end up unified during serialization or later during deserialization; what does it matter? Or, two references to the same instance may actually come out of the process as two different instances; but what does it matter?


Paul Wagland

unread,
Sep 5, 2016, 12:47:14 PM9/5/16
to SonarQube, michael...@welldoo.com


On Monday, September 5, 2016 at 3:28:38 PM UTC+2, Michael Piefel wrote:


Am Montag, 5. September 2016 11:11:00 UTC+2 schrieb Paul Wagland:
The issue is more complex than that. The issue is that when you have two LocalDate objects, ld1 and ld2, then (ld1 == ld2) is undefined.


Rather, will be undefined in the future. For Java 8, there actually are no value-types. You might well consider not applying any value-base-type rule to code with language-level 8. On the other hand, it is of course great to be warned of possible issues long ahead.

Indeed. This is a potential future problem, not an actual now problem.
  
That is, if it is perfectly permissible for (ld1.equals(ld2) == true) and (ld1 != ld2 == true). This would break serialisation horribly since two different objects could be mistakenly assumed to be the same. Or it might work. Or it might work sometimes, but not others. That's the great thing about undefined.

 I really feel stupid now – but how will it break serialization? Two different LocalDate objects that happen to represent the same day may end up unified during serialization or later during deserialization; what does it matter? Or, two references to the same instance may actually come out of the process as two different instances; but what does it matter?

So the practical result, as opposed to “undefined behaviour means that it is allowed to format your hard drive” arguments, is that if you have an Optional you might not be able to re-instantiate it to be of the right type. The 'its undefined behaviour, so this is possible but probably not likely' answer is that you could have two Optional values that represent very different things ending up getting combined, since Serialization might mistakenly think that they are the same. Imagine, for example, one Optional that was "null", and the other that held a real value. If these two got 'combined', you would almost certainly not be happy!

Cheers,
Paul 

Michael Piefel

unread,
Sep 6, 2016, 2:54:09 AM9/6/16
to SonarQube, michael...@welldoo.com
Hi Paul,


Am Montag, 5. September 2016 18:47:14 UTC+2 schrieb Paul Wagland:
[…] is that if you have an Optional you might not be able to re-instantiate it to be of the right type.


We were talking about LocalDate here, which is infinitely less dangerous, and also, as opposed to Optional, actually serializable.

 
[…] you could have two Optional values that represent very different things ending up getting combined, since Serialization might mistakenly think that they are the same. Imagine, for example, one Optional that was "null", and the other that held a real value. If these two got 'combined', you would almost certainly not be happy!

For what it’s worth, “undefined behaviour” does not mean “anything and all can happen”. Javadoc for value-based types rather talks of “unpredictable results”, which is closer to the truth, but really all it means here is what you yourself wrote earlier: It is perfectly permissible for ld1.equals(ld2) == true whether the references compare equal or not (which is actually not surpring at all), and also for ld1 != ld2 and ld1 == ld2 to hold at different times during execution, possibly flipping on each test or within different threads. It does of course not mean that it is allowed that ld3.equals(ld4) == false && ld3 == ld4.

The scenario you describe, however, cannot happen ever, since the two Optional values are neither the same nor are they even equal!

If serialization relied solely on equality, then a bunch of equal LocalDate instances might end up unified (so what?), and if it relied solely on identity, some references to the same LocalDate instance might afterwards point to different instances (so what?), or any combination of these cases.

Michael Piefel

unread,
Sep 20, 2016, 8:12:22 AM9/20/16
to SonarQube, michael...@welldoo.com
I’m going to raise an issue in the bug tracker then, since I’m still convinced that this borders on a false positive. At the very least this rule should not be enabled in the “Sonar Way”, since it actually promotes making your code less stable.

Nicolas Peru

unread,
Sep 23, 2016, 3:08:29 AM9/23/16
to Michael Piefel, SonarQube
Hi Michael, 

This is not a false positive. Serializing a value type _will_ (because JVM 8 won't do anything as of today) break serialization as it will break its contract. 
Have a look at this answer of B.Goetz on the valhalla ML : http://mail.openjdk.java.net/pipermail/valhalla-dev/2015-February/001043.html to get a better understanding of why. It completes its answer on stackoverflow : it all boils down to serialized graph topology.
I admit that as of today it is a bit of a theoretical problem, however you can't state it is a false positive when the rue actually enforce the documentation recommendation. 

Cheers, 

Le mar. 20 sept. 2016 à 14:12, Michael Piefel <michael...@welldoo.com> a écrit :
I’m going to raise an issue in the bug tracker then, since I’m still convinced that this borders on a false positive. At the very least this rule should not be enabled in the “Sonar Way”, since it actually promotes making your code less stable.

--
You received this message because you are subscribed to the Google Groups "SonarQube" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sonarqube+...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Michael Piefel

unread,
Sep 23, 2016, 4:41:54 AM9/23/16
to SonarQube, michael...@welldoo.com
I’m sorry, Nicolar, but I disagree on two accounts here.


Am Freitag, 23. September 2016 09:08:29 UTC+2 schrieb Nicolas Peru:
Hi Michael, 

This is not a false positive. Serializing a value type _will_ (because JVM 8 won't do anything as of today) break serialization as it will break its contract. 

Let’s assume it will cause breakage in Java 9 – it is still debatable whether it is a good idea to mark this as a problem, and a “critical” at that, when it cannot harm today. It is tagged as java8 – why? It works it Java 8, or if you argue that it is broken in Java 8, it’s broken in Java 7 as well. If it had a tag java9, that would be a different story, as then people could easily recognize forward-compatibility problems, or ignore them when their platform is guaranteed to stay at the current version for many years to come.

 
Have a look at this answer of B.Goetz on the valhalla ML : http://mail.openjdk.java.net/pipermail/valhalla-dev/2015-February/001043.html to get a better understanding of why. It completes its answer on stackoverflow : it all boils down to serialized graph topology.

I have to admit that even though I passed my OCP for Java 7, I was not even aware that serialization and deserialization had this contract of “will not be aliased” and “*will* be aliased”, that is, that it maintains the exact object topology at all costs. So I never relied on that. And I know nobody who did. But that, of course, is not proof that there might not be code actually relying on it somewhere.

You may argue that it is wrong to rely on the topology of an object graph that contains value-based types. Nobody should rely on that. Documentation even clearly states that the JVM may decide at any time to alias two equal instances, or even at different times to actually unalias two references to the same object. It can happen at any time. So even if the serialization completely maintains the topology of the object graph at the moment it is serialized, you have no guarantee whatsoever that this object was not changed internally just prior to serialization, and you have no guarantee that it will not change shortly after deserialization, and thus, it doesn’t matter at all whether (de-)serialization added it’s own aliasing.

 
I admit that as of today it is a bit of a theoretical problem, however you can't state it is a false positive when the rue actually enforce the documentation recommendation. 

I believe the documentation recommendation is wrong here. Also, even that does not say “don’t ever serialize a LocatDate”.

One more thing: You don’t even know for sure (because it’s not documented) whether LocalDate.now() doles out fresh instances or caches them somehow. And you shouldn’t ever care. Rule squid:S3437 (and that is what started this thread) specifically talks about all the different date and time classes. They are leaves in every object graph and no-one relies on their identity.

I even think that this rule does harm. The compliant solution just adds transient, which leads to NPEs if followed blindly. To correctly fix this, we will have to add our own serialization to the containing classes, resulting in duplicated code all over. It will even duplicate the code that is already present in the Java runtime, and is very likely to be loads worse, and have new bugs of its own. Almost guaranteed, it will also in the end produce random object graphs when used with Java 9, not improving the situation a bit.

Given the clear opinions stated here by SonarSource developers, however, I see no point raising a JIRA issue anymore.


tl;dr

Given that S3437 may actually cause harm; is only about a theoretical problem; is only about forward-compatibility and not an issue today; I think it should not be part of the default “Sonar Way” rule set. If at all, it should be non-default and tagged java9.


Regards,
     Michael

yohan...@eiosis.com

unread,
Sep 13, 2017, 7:56:22 AM9/13/17
to SonarQube
Hi everyone,

I know I'm digging the thread out, but it seems the problem is not solved : I'm a little bit frightened by the answers made to M. Piefel here.
A lot of people using SonarQube are probably analyzing code which use JPA. With entities containing java8 dates.

This includes myself, and I still can't figure what we are supposed to do with this problem ? Every soft using JPA will break in Java9 ?

Nicolas Peru

unread,
Sep 29, 2017, 11:40:57 AM9/29/17
to yohan...@eiosis.com, SonarQube
Hi, 

This conversation relates to a problem that _could_ only become one with value based object which is not part of Java9 but part of project valhala : http://openjdk.java.net/projects/valhalla/ which was thought about for java 10 (or whatever version number this one will have, likely 18.3).

Cheers, 

--
You received this message because you are subscribed to the Google Groups "SonarQube" group.
To unsubscribe from this group and stop receiving emails from it, send an email to sonarqube+...@googlegroups.com.

For more options, visit https://groups.google.com/d/optout.
--
Nicolas Peru | SonarSource
Reply all
Reply to author
Forward
0 new messages