alternative to ObjectOutputStream

669 views

Ted Yu

Sep 25, 2015, 12:36:53 PM
to guava-discuss
Hi,
Since ObjectOutputStream keeps strong references to everything it serializes, it is prone to OOME when serializing certain objects.

Is there an alternative to ObjectOutputStream that is more resilient to memory pressure?

Thanks

Louis Wasserman

Sep 25, 2015, 12:49:17 PM
to Ted Yu, guava-discuss
Guava certainly doesn't have anything along those lines, but I confess I would be surprised if anything other than ObjectOutputStream existed, at least aside from entirely different serialization libraries.  ObjectOutputStream is deeply baked into Java's built-in serialization mechanism, and I don't believe it's intended that other implementations could be swapped in.

--
guava-...@googlegroups.com
Project site: https://github.com/google/guava
This group: http://groups.google.com/group/guava-discuss
 
This list is for general discussion.
To report an issue: https://github.com/google/guava/issues/new
To get help: http://stackoverflow.com/questions/ask?tags=guava
---
You received this message because you are subscribed to the Google Groups "guava-discuss" group.
To unsubscribe from this group and stop receiving emails from it, send an email to guava-discus...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/guava-discuss/CALte62zd6wWVrLZ5-s52KipD%3DFdh3_BmZ_hCT8gFm0%2BhEhzaCQ%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Éamonn McManus

Sep 25, 2015, 2:59:41 PM
to guava-discuss
I second what Louis said about its being unlikely for there to be an alternative that is not a completely different serialization mechanism. ObjectOutputStream keeps a reference to every object X that it has serialized so that if the object is serialized again it can just say "and this is just X again". When you deserialize, just one X will be created and all references will point to it. Depending on what you are doing, you might be able to use reset() to discard the saved references. That does mean that if any object is referenced again after the reset(), it will be deserialized as a new copy.
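A minimal, self-contained sketch of this back-reference and reset() behavior (the list contents and stream setup are just for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.ArrayList;
import java.util.List;

public class ResetDemo {
    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(buf);

        List<String> x = new ArrayList<>(List.of("a", "b"));
        oos.writeObject(x);  // full serialization; oos now holds a strong ref to x
        oos.writeObject(x);  // written as "this is just x again" (a back-reference)
        oos.reset();         // discard the saved references
        oos.writeObject(x);  // serialized in full again, as a brand-new object
        oos.close();

        ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(buf.toByteArray()));
        Object a = ois.readObject();
        Object b = ois.readObject();
        Object c = ois.readObject();

        System.out.println(a == b);       // true: back-reference, same instance
        System.out.println(a == c);       // false: post-reset copy is a new object
        System.out.println(a.equals(c));  // true: equal contents, distinct identity
    }
}
```

ObjectInputStream handles the reset marker in the stream transparently; the reader just calls readObject() as usual.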

Luke Sandberg

Sep 25, 2015, 3:03:27 PM
to Éamonn McManus, guava-discuss
seems like ObjectOutputStream should hold weak references: if the GC clears one, there is no way the same object can be serialized again anyway.


Éamonn McManus

Sep 25, 2015, 3:09:06 PM
to guava-discuss, emcm...@google.com
That is an excellent point. It doesn't currently hold weak references but it seems as if a compatible implementation could. I'm pretty sure that would be out of scope for Guava, though.

Osvaldo Doederlein

Sep 25, 2015, 3:10:10 PM
to Éamonn McManus, guava-discuss
The GC won't clear a weak reference to an object that is still strongly reachable. I don't see the point of an ObjectOutputStream that used weak references; it would have higher overhead for no gain.





--
Osvaldo Doederlein | Software Engineer, DoubleClick Ad Exchange | opi...@google.com

Luke Sandberg

Sep 25, 2015, 3:11:39 PM
to Osvaldo Doederlein, Éamonn McManus, guava-discuss
the gain would be for exactly the case where OOS is causing the leak (it is the last strong reference), which is what the OP cared about.

but yeah, out of scope for guava

Louis Wasserman

Sep 25, 2015, 3:16:08 PM
to Luke Sandberg, Osvaldo Doederlein, Éamonn McManus, guava-discuss
Yup, I think I follow Luke's argument and agree -- ObjectOutputStream holds a strong reference to everything passed through it, so if nothing else is holding a reference to an object besides the ObjectOutputStream, that object should be eligible for GC, which is pretty much exactly how weak references work.

It sounds like a really nice upstream JDK patch, though.
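The weak-reference idea being discussed could be sketched as an identity-keyed handle table whose entries hold their objects weakly. This is purely illustrative, not the JDK's actual data structure; the class name and the simplified collision handling are assumptions for the sketch:

```java
import java.lang.ref.WeakReference;
import java.util.HashMap;
import java.util.Map;

/**
 * Illustrative sketch of an identity-keyed handle table that holds its
 * objects weakly: if the stream is the only thing referencing an object,
 * the GC may clear the entry, and the object would simply be serialized
 * in full if it were somehow written again. NOT the JDK's implementation.
 */
final class WeakHandleTable {
    // Maps identity hash -> (weak ref, handle). Hash collisions just
    // overwrite here for brevity; a real table would chain per bucket.
    private final Map<Integer, Entry> table = new HashMap<>();
    private int nextHandle = 0;

    private record Entry(WeakReference<Object> ref, int handle) {}

    /** Returns the existing handle for obj, or -1 after recording a new one. */
    int lookupOrAssign(Object obj) {
        int key = System.identityHashCode(obj);
        Entry e = table.get(key);
        if (e != null && e.ref().get() == obj) {
            return e.handle();  // seen before: emit a back-reference
        }
        table.put(key, new Entry(new WeakReference<>(obj), nextHandle++));
        return -1;              // caller serializes obj in full
    }
}
```

Because the keys are compared by identity (`==`), a cleared entry can never produce a false back-reference, which is exactly Luke's observation: once the GC clears it, the same instance can never be written again.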

Éamonn McManus

Sep 25, 2015, 3:16:40 PM
to guava-discuss, emcm...@google.com
My assumption was that the original poster's use case was one where objects can become unreachable during serialization. That could happen if the ObjectOutputStream is open for some time, for example if it is being used to communicate with a peer rather than to save a serialized object for later reconstruction. We had this issue with the JMXMP protocol, for example. We ended up calling reset() all the time, but we might not have had to if WeakReferences were used.

Osvaldo Doederlein

Sep 25, 2015, 6:07:35 PM
to Éamonn McManus, guava-discuss
I still don't get the problem: how would an OOS help produce an OOME? An OOS is a temporary object whose lifecycle is a subset of that of the "input" object being serialized; there's no point in avoiding extra references inside the OOS while the input is still pinned to the heap.

Well, of course you can imagine a scenario where the input uses weak references to hold some of its children, but that would mean the result of the serialization is not well-defined; some of the data may or may not be included in the stream depending on your luck with the garbage collector. This almost certainly means you are serializing things you shouldn't, such as internal caches, or lazily evaluated derived fields that can just be recreated after deserialization. I'd be very curious to see any real-world scenario that benefits from a WeakReference-enabled OOS.





Luke Sandberg

Sep 25, 2015, 6:12:24 PM
to Osvaldo Doederlein, Éamonn McManus, guava-discuss
it is for when you are serializing more than one object over a longer timespan to the same OOS.

e.g.
ObjectOutputStream oos = ...
while (someCondition()) {
  Object o1 = calculateSomeLargeObject();
  oos.writeObject(o1);
  oos.flush();
  // o1 should be collectible now, but the OOS still holds a strong reference to it
}


Louis Wasserman

Sep 25, 2015, 6:12:51 PM
to Osvaldo Doederlein, Éamonn McManus, guava-discuss
I think the scenario is

ObjectOutputStream out = new ObjectOutputStream(someUnderlyingStream);
for (int i = 0; i < 100; i++) {
    HeavyweightObject obj = getHeavyweightObject();
    out.writeObject(obj);
}

...where the ObjectOutputStream ends up holding strong references to 100 heavyweight objects, even though they won't be used again. ObjectOutputStream holds a strong reference to every distinct object it has output since it was created or since the last reset() call, because serialization dedupes objects by identity -- except that if an object has no strong references outside the ObjectOutputStream, there's no way it could match identities with a newly serialized object.
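Assuming the objects in such a loop are never written twice, the standard mitigation available today is to call reset() as you go. A hedged sketch (the class and method names are illustrative):

```java
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.io.Serializable;
import java.util.Iterator;

/**
 * Writes a sequence of independent objects to one stream, calling reset()
 * after each write so the stream's internal handle table does not pin
 * every object written so far. Trade-offs: identity sharing across writes
 * is lost, and each reset adds a small marker to the stream.
 */
public final class StreamingWriter {
    public static void writeAll(OutputStream sink,
                                Iterator<? extends Serializable> objects)
            throws IOException {
        ObjectOutputStream oos = new ObjectOutputStream(sink);
        while (objects.hasNext()) {
            oos.writeObject(objects.next());
            oos.reset();  // drop strong refs to everything written so far
        }
        oos.flush();
    }
}
```

On the reading side nothing changes: ObjectInputStream consumes the reset markers transparently, so the reader just calls readObject() in a loop.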

Osvaldo Doederlein

Sep 25, 2015, 7:30:43 PM
to Louis Wasserman, Éamonn McManus, guava-discuss
Oh, I see. But using a single serialization stream for lots of objects is a Bad Idea, even if you're not doing it for persistence. [If you are persisting that stream in any way, it's criminal :)] Optimizing OOS is fixing the symptom; if your stream of objects is big enough that retention by the OOS's canonicalization map creates memory pressure, that's a big red flag that you shouldn't be using serialization at all. The PL research community put automatic object serialization on the list of bad ideas some two decades ago, although new languages keep supporting the feature because it's so convenient. (It's actually not bad in JavaScript/JSON, because it avoids all the complication -- no effort wasted handling shared objects or cycles, no support for customization, and no attempt to produce an efficient binary stream, so people are not tempted to abuse the thing for anything bigger than RPC calls.)

Joachim Durchholz

Sep 26, 2015, 3:28:45 AM
to guava-...@googlegroups.com
On 2015-09-26 at 01:30, 'Osvaldo Doederlein' via guava-discuss wrote:
> if your stream of objects is big enough so retention of
> stuff by the OOS's canonicalization map impacts memory pressure, that's a
> big red flag that you shouldn't be using serialization at all.

There are scenarios where it's different.
E.g. serializing a stream of versions of a large copy-on-write object graph.
This could be used for backups, or for failover-to-standby scenarios, in
situations where a rare loss of work from a limited period of time is
preferable to regular loss of availability (say, game servers, or
compute servers that feed a continuous but interruptible process).

> The PL
> research community has put automatic object serialization in the list of
> bad ideas some two decades ago,

Is there a reference to that?
I've been loosely following that community, but that particular
conclusion slipped by me. I'd like to check what assumptions this
conclusion was bound to, and whether they still hold today.

Ted Yu

Sep 26, 2015, 4:09:27 PM
to Luke Sandberg, Éamonn McManus, guava-discuss
Thanks for the comments.

/* ObjectOutputStream.java -- Class used to write serialized objects
   Copyright (C) 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2006
   Free Software Foundation, Inc.
If I clone ObjectOutputStream.java, changing the strong references to weak references, would the license terms be compatible with the Apache License?

Cheers

Osvaldo Doederlein

Sep 27, 2015, 10:32:22 PM
to Joachim Durchholz, guava-discuss
On Sat, Sep 26, 2015 at 3:28 AM, Joachim Durchholz <j...@durchholz.org> wrote:
On 2015-09-26 at 01:30, 'Osvaldo Doederlein' via guava-discuss wrote:
if your stream of objects is big enough so retention of
stuff by the OOS's canonicalization map impacts memory pressure, that's a
big red flag that you shouldn't be using serialization at all.

There are scenarios where it's different.
E.g. serializing a stream of versions of a large copy-on-write object graph.
This could be used for backups, or for failover-to-standby scenarios, in situations where a rare loss of work from a limited period of time is preferable to regular loss of availability (say, game servers, or compute servers that feed a continuous but interruptible process).

Well, the best solution for that is usually protobuf :) But if you need to support non-proto objects that contain sharing and cycles, you can still customize Java serialization and often achieve significant advantages in both speed and stream size compared to default serialization. Avoiding the cost of sharing/canonicalization when you don't need it (or doing it manually) is one of many tricks that help optimize serialization. go/effectivejava has a whole chapter on Java serialization; check it if your use of serialization is heavy or sophisticated enough that you're running into problems like OOME and considering solutions like customizing ObjectOutputStream.
> The PL
research community has put automatic object serialization in the list of
bad ideas some two decades ago,

Is there a reference to that?
I've been loosely following that community, but that particular conclusion slipped by me. I'd like to check what assumptions this conclusion was bound to, and whether they still hold today.


That's a hard question. :) I remember reading some compendium about either OODBs or GC that lambasted serialization; that was in 1999~2000, while I was doing my MSc. There's a good paper specific to Java that I could recall and find: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.36.6548&rep=rep1&type=pdf. But the most important signal is that serialization is a dead research subject; nobody works on this stuff anymore, neither in academia nor in industry. Most products / frameworks / research that relied critically on serialization, like Jini, PJama, Prevayler, RMI, etc., went nowhere. (I think the single holdout is EJB, which is particularly blessed by RMI-IIOP...) Java's serialization API, and its implementation, have not changed in years; it's a dead-end technology, so nobody is interested in investing in it even though there are still some opportunities for improvement.

 



Joachim Durchholz

Sep 28, 2015, 6:29:26 AM
to guava-...@googlegroups.com
On 2015-09-28 at 04:32, 'Osvaldo Doederlein' via guava-discuss wrote:
> On Sat, Sep 26, 2015 at 3:28 AM, Joachim Durchholz <j...@durchholz.org> wrote:
>
>> On 2015-09-26 at 01:30, 'Osvaldo Doederlein' via guava-discuss wrote:
>>
>> There are scenarios where it's different.
>> E.g. serializing a stream of versions of a large copy-on-write object
>> graph.
>> This could be used for backups, or for failover-to-standby scenarios, in
>> situations where a rare loss of work from a limited period of time is
>> preferable to regular loss of availability (say, game servers, or compute
>> servers that feed a continuous but interruptible process).
>
> Well the best solution for that is usually protobuf :) But if you need to
> support non-proto objects which contain sharing and cycles, you can still
> customize Java serialization and often achieve significant advantages in
> both speed and stream size compared to default serialization.

Well, I have yet to hear anybody recommending doing that in anger.
Default serialization isn't good, and adapting it is awkward and
error-prone - there's a reason why people still build serialization
libraries for Java :-)

> Avoiding the
> cost of sharing/canonicalization when you don't need it (or doing it
> manually) is one of many tricks that help optimizing serialization.
> go/effectivejava has a whole chapter on Java Serialization, you need to
> check this if your use of serialization is sufficiently heavy or
> sophisticated that you're running into problems like OOME and considering
> solutions like customizing ObjectOutputStream.

Heh.
Well, the project that's sparking my interest in these issues is
more-or-less shelved, so I'm in no hurry to decide anything.

Kryo has been looking best to me: fast, bloat-avoiding, can deal with
sharing (and consequently cycles), and comes with ready-made options for
most of the optimizing tricks one would want to do.
I can't vouch for it since I haven't used it in anger yet, but at least
the project is getting its priorities right.

>>> The PL
>>> research community has put automatic object serialization in the list of
>>> bad ideas some two decades ago,
>>
>> Is there a reference to that?
>> I've been loosely following that community, but that particular conclusion
>> slipped by me. I'd like to check what assumptions this conclusion was bound
>> to, and whether they still hold today.
>
> That's a hard question. :) I remember reading some compendium about either
> OODBs or GC that lambasted serialization, that was in 1999~2000 while doing
> my MSc. There's a good paper specific to Java that I could recall and find,
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.36.6548&rep=rep1&type=pdf.

Ah, I see.
It goes through several issues, of which some exist at the JVM level,
some in Java, and some just in Serializable. Some points aren't actually
problems specific to serialization; e.g. section 8 is essentially a
rehash of the well-known fact that requiring read consistency requires
immutability, copy-on-write, or locking.

I think the issues with Serializable can be solved by using a different
library (Kryo and Protobuf have been mentioned).
The other issues need to be worked around, and that seriously limits the
usefulness of serialization in Java.

> But the most important signal is that Serialization is a dead research
> subject; nobody works on this stuff anymore, nether in academia nor in the
> industry.

Yeah, getting serialization right does require some serious thinking and
design work on the semantics.
Issues I have encountered involve:
* Dealing with once-per-JVM stuff: singletons, static data, file handles.
* Mutability. You can't serialize a mutable object to another JVM and
expect the semantics to remain unchanged. You need to serialize a proxy
instead.
* Class/data versioning. You'll need conversion code to upgrade
old-class data to new classes, and that code would need to be validated
because it's really easy to hide nasty bugs in that.
* Bytecode generation at runtime combines mutability and class/data
versioning issues.

Much of that is hard or impossible to do in Java as it is, so I agree
it's mostly dead in Java.
I'm somewhat surprised that programming language research has given up on
this. None of the points above is without a good solution, with the
exception of mutability when you have a proxy and an unreliable
authoritative object; even that is a bit surprising, since PL researchers
tend to dislike mutability anyway.

> Java's serialization API, and its implementation, have not changed in
> years, it's a dead-end technology so nobody is interested to invest on it
> even if there are still some opportunities for improvement.

Heh. I certainly wouldn't want to invest into Serializable myself.