Nonrecursive Kryo: Analysis

5 views
Skip to first unread message

Jo Durchholz

unread,
Apr 25, 2024, 4:44:35 AMApr 25
to kryo-...@googlegroups.com
Hi all,

continuing with the idea.

In this thread, I want to pursue aspects that (should) have an unanimous
answer, so everybody can be on the same boat.

Here's my analysis.
It likely contains mistakes and misunderstandings, so please let me know
which you see ;-)

Oh, and apologies for the amount of text.
I hope it's worth reading anyway.

SERIALIZERS

1) The recursion happens between Serializer#write(), #read(), and
#copy() on one hand, and the Kryo instance on the other hand.

2) #read() and #copy() should follow the same principles as #write(), so
it is enough to analyze #write().
This is not entirely true for #read() which cannot follow the object
graph but has a byte stream to parse, but Kryo's setup already handles that.

3) Serializer#write() is the single point through which each recursion
goes, that's why I am focusing on Serializer, to make analysis and
design easier and hence more reliable.

4) Serializer is an abstract class, with ~90 implementations provided by
Kryo, and an unknown number of implementations in applications.
Since applications can provide their own subclasses, the public and
protected functions in Serializer and how they interact are part of
Kryo's public API.

5) ~20 Kryo-provided Serializer subclasses are potentially recursive.

6) It is completely possible to migrate just some of the Serializer
subclasses to nonrecursion.
This offers a whole bunch of useful options:
- Migrate just a single subclass, e.g. for proofs of concept.
- Migrate Kryo's Serializers incrementally.
- Validate each migrated Serializer independently.
- Allow applications to keep their recursive Serializers if they do not
have to.
- Allow applications to migrate their recursive Serializers on their own
schedule, and to upgrade Kryo without upgrading their Serializers.

STACK

7) Eliminating recursion means moving parameters and local variables
from the call stack to an explicit stack (typically implemented using an
ArrayList).
There is a whole lot of options how to structure the data inside the
stack: What to store, dumb data vs. lambdas. There are too many viable
options here, so I'd like to leave these options to a later thread that
deals with trade-offs and proofs of concept; let's stick with things
that have (or should have!) unanimous answers.

8) An explicit stack tends to invert the processing order of subitems.
Rectifying this is highly desirable but will require extra memory and/or
CPU cycles, so some trade-offs are required.
This is not an issue if stream compatibility between Kryo versions can
be dropped; I suspect this requires a major version upgrade for Kryo but
I do not know.

NONFUNCTIONAL ASPECT

9) Eliminating recursion inevitably transforms endless-loop situations
from stack overflow to nontermination.
Some (many?) Kryo users would consider that inacceptable.
Fortunately, this can be handled using Kryo's existing depth tracking
mechanism and we don't have to add something new to Kryo.

END OF ANALYSIS

Question to all who made it to the end of this post: What do you people
think?

Regards,
Jo

Thomas Heigl

unread,
Apr 29, 2024, 9:15:29 AMApr 29
to kryo-...@googlegroups.com
Hi Jo,

In my opinion, the only serializer that really needs to be non-recursive is the "generic object serializer", i.e. FieldSerializer or one of its sub-classes.

Did you have a look at the implementation suggested here https://github.com/EsotericSoftware/kryo/issues/103?


Best,

Thomas


j...@durchholz.org

unread,
Apr 30, 2024, 5:26:45 AMApr 30
to kryo-...@googlegroups.com
Hi Thomas,

> In my opinion, the only serializer that really needs to be
> non-recursive is the "generic object serializer", i.e.
> FieldSerializer or one of its sub-classes.

There's also BeanSerializer and RecordSerializer, so I don't think that
unrecursification can be limited to FieldSerializer.

FieldSerializer is a great proof of concept though.
And since one major goal would be to allow recursive serializers to
coexist with nonrecursive ones, serializers could be unrecursified on an
as-needed basis, so I'm very much willing to leave the details for later.

For your amusement, here's a classification:

Serializers for classes with a potentially deep tree structure:
* BeanSerializer
* ClosureSerializer
* CompatibleFieldSerializer
* FieldSerializer
* RecordSerializer
* TaggedFieldSerializer
* VersionFieldSerializer

Serializers for container classes (tree structure possible but unlikely):
* CollectionSerializer
* MapSerializer
* ObjectArraySerializer

Serializers for classes with a potentially deep linear structure:
* AtomicReferenceSerializer

Serializers for one-element container classes (list structure possible
but unlikely):
* CollectionsSingletonListSerializer
* CollectionsSingletonMapSerializer
* CollectionsSingletonSetSerializer
* OptionalSerializer

Uncategorized serializers, some probably not even deeply recursive:
* BlowfishSerializer
* DeflateSerializer
* ExternalizableSerializer

It's quite possible that some serializers are misclassified, as detailed
analysis isn't high on my priority list right now.

Regards,
Jo

j...@durchholz.org

unread,
Apr 30, 2024, 7:51:38 AMApr 30
to kryo-...@googlegroups.com
Hi Thomas,

Separate post because it's a different train of thought:

> Did you have a look at the implementation suggested here
> https://github.com/EsotericSoftware/kryo/issues/103?

I did, but the patches were not migrated from Google Code to GitHub, so
I believed the code changes are lost.

As for the text, I wanted to do my own analysis first, so I'd get a
fresh overview of the design space.
Take-aways:
- I did arrive at a stack-of-lambdas approach, too, but I don't know if
I am really doing the same thing.
- Some alternatives change the serialization order. These probably have
best performance, but order changes are a pretty difficult sell for
project management and library users, so I'm not going to pursue these
unless somebody tells me it's okay to do so.
- Among the order-preserving alternatives, there are various approaches
that work with compiletime-created Runnables instead of runtime-created
lambdas. This removes some of the GC pressure.
- The stack could be preallocated, to further reduce GC pressure.
Relevant only where the same Kryo instance is reused, I don't know if
this is a relevant scenario for performance testing.

The patches on Google Code were not migrated to GitHub, unfortunately.

I did find
https://github.com/EsotericSoftware/kryo/compare/master...romix:kryo:kryo-2.23-continuations.
Might be the same or a closely related patch to what was on Google Code,
I couldn't find out.

Anyway, looking at the GitHub diff, I find it difficult to separate the
basic idea from the consequences of using Continuation-Passing Style.
I see warnings against using CPS when coding manually, specifically when
used in conjunction with trampolining, because maintainability and bug
visibility can suffer greatly. So I guess that diff is not going to be
super helpful - which is a shame actually.

> - https://github.com/MarcMil/NonRecursiveKryo

Ah, I missed that one.

I am seeing that it hardcodes the assumption that unrecursifying just
FieldSerializer is enough.
Even if this assumption were accurate, I do not think it should be
hardcoded at this time, so I am not taking a deeper look at it.

Regards,
Jo
Reply all
Reply to author
Forward
0 new messages