How to maximize deserialization performance?

Hans Blafoo

Dec 27, 2019, 4:05:29 AM
to protostuff
Hi,

I'm using protostuff 1.6.2 (the latest Maven release) to (de)serialize classifiers from the Weka framework (https://www.cs.waikato.ac.nz/ml/weka/). I'm working with several different classifiers that I've trained on my data and persisted with protostuff.

For a specific use case, I need to be able to deserialize an already loaded byte array as fast as possible.

What I'm currently doing:
- using RuntimeSchema to get the classifier's schema dynamically, since I don't have a .proto file and Weka's class hierarchy is quite deep. Naturally, I cache the schema, so building it isn't included in my benchmark times
- using GraphIOUtil as there can be cyclic references, e.g. a tree object having other tree objects as instance attributes
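The reason a graph-aware API is needed for cyclic trees can be shown with a small JDK-only sketch. It illustrates the general technique (remember each object already written in an identity map and write a back-reference on a repeat visit, instead of recursing forever around a cycle); it is not protostuff's actual wire format, and GraphNode and GraphSketch are hypothetical names used only for illustration.

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical tree node whose children may point back to nodes already in the graph. */
final class GraphNode {
    final String label;
    final List<GraphNode> children = new ArrayList<>();

    GraphNode(String label) {
        this.label = label;
    }
}

/**
 * Conceptual sketch of graph-aware encoding (not protostuff's wire format):
 * the first visit of an object assigns it an index and writes its contents,
 * every later visit writes only a back-reference to that index.
 */
final class GraphSketch {
    private GraphSketch() {}

    static List<String> encode(GraphNode root) {
        List<String> out = new ArrayList<>();
        encode(root, new IdentityHashMap<GraphNode, Integer>(), out);
        return out;
    }

    private static void encode(GraphNode node, Map<GraphNode, Integer> seen, List<String> out) {
        Integer index = seen.get(node);
        if (index != null) {
            // Already written: emit a reference instead of recursing (a cycle would never end).
            out.add("ref:" + index);
            return;
        }
        // First visit: identity (not equals) decides "same object"; assign the next index.
        seen.put(node, seen.size());
        out.add("node:" + node.label + ":" + node.children.size());
        for (GraphNode child : node.children) {
            encode(child, seen, out);
        }
    }
}
```

For two nodes a and b that point at each other, encoding a yields node:a:1, node:b:1, ref:0. In this sketch, the identity-map lookup per object is extra bookkeeping that a non-graph encoder would not do.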

Right now, deserialization takes about 200 ms on an i3-6006U with OpenJDK 13. Do you have any ideas how I can reduce the deserialization time significantly? Maybe I'm missing some configuration parameters.

Thanks and best regards

David Yu

Dec 27, 2019, 5:01:48 AM
to protostuff

Use ExplicitIdStrategy to write the type metadata as ints (serialization and deserialization will be faster, and the serialized size will be smaller).

Register your concrete classes at startup via ExplicitIdStrategy.Registry.
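To make "register your concrete classes at startup" concrete, here is a JDK-only sketch of the underlying idea: a fixed, two-way mapping between classes and small int ids, so the writer emits an int and the reader resolves it with a map lookup rather than from a class-name string. ClassIdRegistry and its methods are hypothetical and are not the API of protostuff's ExplicitIdStrategy.Registry; the real registration methods should be taken from that class.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for the idea behind an explicit-id registry:
 * every concrete class is registered once, at startup, under a fixed int id.
 * This is NOT protostuff's ExplicitIdStrategy.Registry API.
 */
final class ClassIdRegistry {
    private final Map<Class<?>, Integer> idByClass = new HashMap<>();
    private final Map<Integer, Class<?>> classById = new HashMap<>();

    /** Registers a class under an id; duplicates fail fast so ids stay stable across runs. */
    ClassIdRegistry register(Class<?> type, int id) {
        if (idByClass.containsKey(type) || classById.containsKey(id)) {
            throw new IllegalArgumentException("duplicate registration: " + type.getName() + " / " + id);
        }
        idByClass.put(type, id);
        classById.put(id, type);
        return this; // allows chained registration at startup
    }

    /** Writer side: class -> id. In this sketch an unregistered class is an error. */
    int idOf(Class<?> type) {
        Integer id = idByClass.get(type);
        if (id == null) {
            throw new IllegalStateException("unregistered class: " + type.getName());
        }
        return id;
    }

    /** Reader side: id -> class, a plain map lookup keyed by the int read from the stream. */
    Class<?> classOf(int id) {
        Class<?> type = classById.get(id);
        if (type == null) {
            throw new IllegalStateException("unknown id: " + id);
        }
        return type;
    }
}
```

Because both sides must agree on the ids, the registration code is typically shared by the writer and the reader and run once before any (de)serialization.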
 

Thanks and best regards

--
You received this message because you are subscribed to the Google Groups "protostuff" group.
To unsubscribe from this group and stop receiving emails from it, send an email to protostuff+...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/protostuff/5e67f071-bba6-40c0-a14d-c2efe9502283%40googlegroups.com.


--
When the cat is away, the mouse is alone.
dyuproject.com

Hans Blafoo

Dec 27, 2019, 7:03:02 PM
to protostuff
Thanks for your reply. I will look into it later, because I have discovered a problem where an object was not fully serialized by protostuff. Also, I haven't yet figured out how to use ExplicitIdStrategy.Registry correctly; it seems rather complicated.

Hans Blafoo

Dec 28, 2019, 6:17:42 PM
to protostuff
Hi,

I checked ExplicitIdStrategy and implemented it as shown in the linked Java file. However, I don't see any performance improvement, and the resulting file size seems to be a bit larger. Do I have to do anything other than implement IdStrategy.Factory, register the POJOs, collections, etc. with an ExplicitIdStrategy, and set it as the desired strategy via System.setProperty("protostuff.runtime.id_strategy_factory", "de.machinelearning.training.ClassifierPersistor$IdStrategyFactory")?

Thanks and best regards
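One thing worth ruling out when a strategy configured through a system property appears to have no effect is initialization order. Many libraries read such a property exactly once, in a static initializer; whether protostuff reads protostuff.runtime.id_strategy_factory this way is an assumption to verify against its documentation, not something stated in this thread. The sketch below uses a hypothetical StrategyConfig class and a hypothetical property name to show the effect: only the value present when the class is first initialized is ever seen.

```java
/**
 * Hypothetical stand-in for a library class that reads its configuration
 * from a system property once, when the class is first initialized.
 */
final class StrategyConfig {
    // Hypothetical property name, used only for this illustration.
    static final String PROPERTY = "example.runtime.id_strategy_factory";

    // Not a compile-time constant, so first access triggers class initialization,
    // and the property is read at that moment and never again.
    static final String FACTORY_CLASS = System.getProperty(PROPERTY, "default");

    private StrategyConfig() {}
}
```

If the assumption holds for protostuff, passing the property with -D on the java command line, or setting it as the very first statement of main before any protostuff runtime class is touched, avoids the problem.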

David Yu

Dec 30, 2019, 10:15:45 AM
to protostuff
On Sun, Dec 29, 2019 at 7:17 AM 'Hans Blafoo' via protostuff <proto...@googlegroups.com> wrote:
> Hi,
>
> I checked ExplicitIdStrategy and implemented it as shown in the linked java file. However, I do not see any performance improvement and the resulting file size seems to be a bit larger.
That's surprising. Do you have numbers to back that up? The file size should be smaller, since an integer is written out instead of the FQCN (fully qualified class name) string.
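A rough per-type-tag size estimate supports that expectation. This sketch assumes protobuf-style encoding (a base-128 varint for an int id; a varint length prefix plus UTF-8 bytes for a string); protostuff's exact layout may differ, and the field tag is left out because both variants need one. TypeTagSize is a hypothetical helper, and weka.classifiers.trees.J48 is only an example class name.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical helper estimating the encoded size of a type tag under protobuf-style rules. */
final class TypeTagSize {
    private TypeTagSize() {}

    /** Bytes needed for a non-negative int as a base-128 varint (7 payload bits per byte). */
    static int varintSize(int value) {
        if (value < 0) {
            throw new IllegalArgumentException("negative ids are not handled in this sketch");
        }
        int size = 1;
        while (value >= 0x80) {
            value >>>= 7;
            size++;
        }
        return size;
    }

    /** Bytes for a length-delimited UTF-8 string: varint length prefix plus the payload. */
    static int stringSize(String s) {
        int payload = s.getBytes(StandardCharsets.UTF_8).length;
        return varintSize(payload) + payload;
    }
}
```

Under these assumptions, the 26-character name weka.classifiers.trees.J48 costs 27 bytes each time it is written, while a small registered id such as 10 costs 1 byte.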

Hans Blafoo

Dec 30, 2019, 3:35:21 PM
to protostuff
I dug deeper into the serialized output and found that using ExplicitIdStrategy also serializes an attribute that wasn't serialized with the default strategy. That explains the larger file size. I don't know why it behaves differently, but the way it works now is fine for me.