I'm using HPPC at the heart of the Grph Java library.
http://www-sop.inria.fr/members/Luc.Hogie/grph/
While I was working on the I/O capabilities of the library, I noticed
that HPPC does not support serialization. This is a great pity since
most often, serialization support is achievable by adding "implements
Serializable" to your class.
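Here is a minimal sketch of what this looks like. It assumes a hypothetical `IntBag` class (not an HPPC type) and an illustrative `JdkSerialization` helper. Once the class declares `implements Serializable`, the JDK's `ObjectOutputStream` and `ObjectInputStream` can round-trip it without any further code:

```java
import java.io.*;
import java.util.Arrays;

// Hypothetical primitive list used only for illustration; it is not an HPPC class.
// Declaring "implements Serializable" lets the JDK write all non-transient fields.
class IntBag implements Serializable {
    private static final long serialVersionUID = 1L;

    int[] buffer = new int[4];
    int size;

    void add(int value) {
        if (size == buffer.length) {
            buffer = Arrays.copyOf(buffer, size * 2);  // grow the backing array
        }
        buffer[size++] = value;
    }

    int get(int index) {
        return buffer[index];
    }
}

class JdkSerialization {
    // Serializes an object to bytes with the default JDK mechanism.
    static byte[] toBytes(Serializable obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Reads an object back from bytes; the caller casts to the expected type.
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Default serialization writes every non-transient field as it is. This means the whole `buffer` array, including unused capacity, ends up in the stream.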
Considering the simplicity of it, is there a good reason for not
supporting serialization? This would dramatically improve HPPC.
Regards,
Luc.
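Serialization is not supported because just implementing
"Serializable" does not solve the issues of versioning (Java
serialization is quite terrible at that), and custom serialization does
require some additional work. Besides, for things like hash maps
(dictionaries), trivial serialization by adding "implements Serializable" will
serialize everything, including empty slots, which is a great waste of
space.
We could use MessagePack to add efficient (and compact) serialization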
to HPPC... but this would be an external dependency, so I'm not
entirely convinced it makes sense.
Anyway, if you'd like to add proper serialization support (see the
problem with dictionaries above), patches are welcome.
Dawid
> Serialization is not supported because just implementing
> "Serializable" does not solve the issues of versioning (java
> serialization is quite terrible at that) and custom serialization does
The use of serialVersionUID should be enough.
> require some additional work. Besides, for things like hash maps
> (dictionaries) trivial serialization by adding "implements Serializable" will
> serialize everything, including empty slots, which is a great waste of
> space.
Right, but the real benefit of using Java serialization is not to get
compact/efficient encoding: it is to have trivial and effective I/O
functionality. Actually, the performance of Java serialization is fine
for most applications.
Additionally, the waste of space can be easily solved by gzipping the
resulting byte stream.
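To make the gzip suggestion concrete, here is a sketch that assumes a hypothetical `SparseKeyTable`. Like an open-addressing hash container, it keeps a few keys in a large, mostly empty array. The sketch compares the default-serialized size with the same stream wrapped in `java.util.zip.GZIPOutputStream`:

```java
import java.io.*;
import java.util.zip.GZIPOutputStream;

// Hypothetical open-addressing key table, not HPPC's implementation. With only a
// few keys in 65,536 slots, default serialization writes mostly empty slots.
class SparseKeyTable implements Serializable {
    private static final long serialVersionUID = 1L;

    int[] keys = new int[1 << 16];
    boolean[] allocated = new boolean[1 << 16];
    int assigned;

    void put(int key) {
        int slot = (key * 0x9E3779B9) >>> 16;  // top 16 bits of a multiplicative hash
        while (allocated[slot] && keys[slot] != key) {
            slot = (slot + 1) & (keys.length - 1);  // linear probing
        }
        if (!allocated[slot]) {
            allocated[slot] = true;
            keys[slot] = key;
            assigned++;
        }
    }
}

class GzipComparison {
    // Default JDK serialization, optionally wrapped in a GZIP stream.
    static byte[] serialize(Serializable obj, boolean gzip) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try {
            OutputStream sink = gzip ? new GZIPOutputStream(bytes) : bytes;
            // Closing the ObjectOutputStream also closes (and finishes) the GZIP stream.
            try (ObjectOutputStream out = new ObjectOutputStream(sink)) {
                out.writeObject(obj);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }
}
```

The compressed stream is far smaller because the empty slots are long runs of zeros. However, the full arrays are still written before compression, and they must be allocated at full size again when the data is read back.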
> We could use MessagePack to add efficient (and compact) serialization
> to HPPC... but this would be an external dependency, so I'm not
> entirely convinced it makes sense.
Well, the JAR has to be kept small, I think. You're right to have doubts.
> Anyway, if you'd like to add proper serialization support (see the
> problem with dictionaries above), patches are welcome.
I just need the serialization functionality. :)
It is really a weakness of HPPC that it doesn't support Java
serialization. I really think you should simply make your classes implement
java.io.Serializable. In five minutes the weakness will have vanished. :)
Cheers,
Luc.
It's not enough for reading data saved with an incompatible (older) class
version; I'm talking about changes in the structure that should remain
backwards-compatible with already serialized data.
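One common way to handle the structural changes described here is to keep `serialVersionUID` fixed and write an explicit format version from a private `writeObject` hook, so that `readObject` can branch on it. The sketch below assumes a hypothetical `VersionedIntList`. Its "version 1" layout is invented purely to show the branch:

```java
import java.io.*;
import java.util.Arrays;

// Hypothetical list whose stream layout is versioned explicitly. serialVersionUID
// stays fixed; FORMAT_VERSION tells readObject which layout follows.
class VersionedIntList implements Serializable {
    private static final long serialVersionUID = 1L;
    private static final int FORMAT_VERSION = 2;

    private transient int[] buffer = new int[4];
    private transient int size;

    void add(int value) {
        if (size == buffer.length) {
            buffer = Arrays.copyOf(buffer, size * 2);  // grow the backing array
        }
        buffer[size++] = value;
    }

    int get(int index) { return buffer[index]; }

    int size() { return size; }

    // Current (version 2) layout: format version, element count, live elements only.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(FORMAT_VERSION);
        out.writeInt(size);
        for (int i = 0; i < size; i++) {
            out.writeInt(buffer[i]);
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int version = in.readInt();
        if (version == 1) {
            // Hypothetical older layout: whole backing array, then the element count.
            int[] old = (int[]) in.readObject();
            size = in.readInt();
            buffer = Arrays.copyOf(old, Math.max(size, 1));
        } else if (version == 2) {
            size = in.readInt();
            buffer = new int[Math.max(size, 1)];
            for (int i = 0; i < size; i++) {
                buffer[i] = in.readInt();
            }
        } else {
            throw new InvalidObjectException("Unsupported format version: " + version);
        }
        if (size < 0 || size > buffer.length) {
            throw new InvalidObjectException("Corrupt element count: " + size);
        }
    }

    // Round-trips a list through JDK serialization (helper for this example).
    static VersionedIntList roundTrip(VersionedIntList list) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(list);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (VersionedIntList) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Keeping the same `serialVersionUID` only stops the JDK from rejecting the stream outright. Interpreting an older layout correctly remains the class's job.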
> Right, but the real benefit of using Java serialization is not to get
> compact/efficient encoding: it is to have trivial and effective I/O
> functionality. Actually, the performance of Java serialization is fine for
> most applications.
HPPC is meant for in-memory use in critical loops, really.
Serialization and persistence are not part of its goal. fastutil does
implement this feature though, so you can always peek in there.
> Additionally, the waste of space can be easily solved by gzipping the
> resulting byte stream.
This is hardly a solution, sorry.
> I just need the serialization functionality. :)
Well, fork the project on github and add it -- that's why it's open
source. :) If there are more people wishing for serialization, I may
add it, but it's not really a priority (for the reasons above).
D.
> HPPC is meant for in-memory use in critical loops, really.
And it meets its objective very well! This goal doesn't prevent it from
supporting Java serialization (which is a standard). Efficient
serialization can come later, if needed.
To be honest (and without meaning to offend at all), it sounds like you refuse to let HPPC
support Java serialization because it's inefficient. Instead, it
should support it because it is a standard way to serialize objects.
> Serialization and persistence are not part of its goal. fastutil does
> implement this feature though, so you can always peek in there.
I'm not going to switch to fastutil (and give up HPPC, which I like
very much) just for the sake of serialization. :(
> Well, fork the project on github and add it -- that's why it's open
> source. :) If there are more people wishing for serialization, I may
I'm not willing to separate from the community, but if you really don't
want to add "implements Serializable" to your classes, then I'll end up
doing it myself. :)
By the way, using HPPC brought great performance to our graph library!
Regards,
Luc.
--
Luc Hogie - CNRS Research Engineer
COMRED Research Unit (I3S(CNRS-UNS) INRIA)
http://www-sop.inria.fr/members/Luc.Hogie/
luc....@inria.fr
+33 4 89 73 24 25 (office)
+33 6 80 91 40 71 (mobile)
Skype ID: luchogie
I would suggest that instead of trying to make Java serialization
(with its known issues) work, it might be worthwhile to work with
data-binding library authors to allow supporting, say, JSON
serialization of HPPC data structures.
For example, Jackson JSON processor can easily be extended with
library/datatype-specific modules, to allow seamless handling of all
kinds of types. And if that would be of interest, I could definitely
help.
Beyond JSON, Jackson actually supports many other data formats: Smile
(binary JSON serialization), XML (jackson-xml-databind), and BSON. And
more formats will be supported in the near future (CSV, perhaps Avro).
-+ Tatu +-
I'll check this.
Right, just something to consider (in fact, regardless of whether
native serialization was supported).
And JSON has one benefit over XML, in that it has native array and map
types, simplifying things a bit.
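The following stdlib-only illustration shows that natural mapping. It is not Jackson's API, and `PrimitiveJsonShapes` is a hypothetical name. A primitive list can be written as a JSON array and a primitive int-to-int map as a JSON object, with only live entries emitted:

```java
import java.util.StringJoiner;

// Hypothetical encoder for illustration only; it is not Jackson's API and does
// no string escaping because it only ever emits numbers.
class PrimitiveJsonShapes {
    // An int list (backing array plus live size) maps naturally to a JSON array.
    static String intListToJson(int[] buffer, int size) {
        StringJoiner json = new StringJoiner(",", "[", "]");
        for (int i = 0; i < size; i++) {
            json.add(Integer.toString(buffer[i]));
        }
        return json.toString();
    }

    // An int->int map maps to a JSON object; JSON object keys must be strings,
    // so the int keys are written quoted.
    static String intIntMapToJson(int[] keys, int[] values, int count) {
        StringJoiner json = new StringJoiner(",", "{", "}");
        for (int i = 0; i < count; i++) {
            json.add("\"" + keys[i] + "\":" + values[i]);
        }
        return json.toString();
    }
}
```

In an actual Jackson module, this logic would sit in serializers and deserializers registered with the module. The sketch only shows the resulting JSON.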
Anyway, if anyone is interested in this specifically with JSON, let me
know; it should be very easy to do.
I wrote the initial version of a module to support Guava
(ex-google-collections) types; this should be quite similar.
-+ Tatu +-
Java serialization is a nasty way to serialize objects and many people
will tell you that. Many people prefer json, avro or other ways to
serialize (I suggested MessagePack earlier on).
> I'm not going to switch to fastutil (and give up HPPC, which I like very
> much) just for the sake of serialization. :(
I'm glad to hear that, although you should pick the one that suits
your needs; no need to be emotional with respect to code ;)
> I'm not willing to separate from the community, but if you really don't want
> to add "implements Serializable" to your classes, then I'll end up doing it
> myself. :)
You didn't understand me -- if you fork on github and add
serialization, notify me and I'll take a peek at what you've done and
perhaps merge it back. Github is a great way to collaborate and
contribute.
> By the way, using HPPC brought great performance to our graph library!
Awesome, thanks.
Dawid
I ended up writing a custom solution, which is already functional and
wasn't a big deal to program.
It is next to impossible to write a serialization library that will
automatically work for all kinds of objects, including ones that are
not designed to play nicely with serialization (just in general), so I
am not surprised in this sense.
Collection types especially need some special handling to work
efficiently, more so than things that are exposed as POJOs.
Did you try contacting authors of any of those libraries? Which ones
did you try?
Even just notifying authors on issues encountered would help others in
knowing what works, what does not, and perhaps work on resolving (or
at least documenting) those issues.
> I ended up writing a custom solution, which is already functional and wasn't a
> big deal to program.
Alas, this does not help others that may have similar issues. They
will have to do the same.
This is why I suggested that working with others would be helpful and
allow sharing of some effort.
Anyway; as I said, if anyone else finds need to serialize/deserialize
HPPC types, I would be interested in collaborating.
My own use case is currently memory-only, without need for
serialization, but that may change in future.
This could also be one more feature to help HPPC get more adoption,
over earlier libraries.
-+ Tatu +-
Absolutely! You're welcome to contribute. Like I said -- either fork
on github, add your stuff and make a contrib request, or file an issue
to the project's JIRA here:
http://issues.carrot2.org/browse/HPPC
Dawid
Well, Java serialization DOES work nicely most of the time.
> Did you try contacting authors of any of those libraries? Which ones
> did you try?
Not all of them. Dunno if I will or not.
I tried Castor, JAXB, JSON...
>> I ended up writing a custom solution, which is already functional and wasn't a
>> big deal to program.
>
> Alas, this does not help others that may have similar issues. They
> will have to do the same.
This helps in the sense that it shows that ad hoc solutions should
never be neglected. Of course it doesn't help in serializing HPPC objects. :(
> This is why I suggested that working with others would be helpful and
> allow sharing of some effort.
I've started developing "yet another efficient serializer". We'll see
whether it can be useful to HPPC users in the future.
Don't duplicate effort. How about trying MessagePack first?
It seems to be able to serialize all public fields of an object
(without any annotations on the object itself). This would be
equivalent to adding "implements Serializable" (only more efficient, in fact). Try
it and let us know if this worked for you.
http://wiki.msgpack.org/display/MSGPACK/QuickStart+for+Java#QuickStartforJava-Withoutannotation
Dawid
I had a horrible experience trying to make MessagePack do anything too
-- please do not consider that to be representative of good java
serialization libraries. Xstream is better one, for example, as a
general purpose serializer (which msgpack is not).
As to Java serialization: even that requires that a class obey basic
rules, such as marking non-serializable fields as transient, marking
the objects themselves as Serializable, and not counting on a
constructor being called (since deserialization never calls any of the
class's constructors).
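These rules can be illustrated with a small sketch. It assumes a hypothetical `MaskedTable` class whose derived-state field is `transient`. On deserialization, the class's own constructor is not run (only the no-arg constructor of the first non-serializable superclass, here `Object`, is). For that reason, a private `readObject` hook rebuilds the transient state:

```java
import java.io.*;

// Hypothetical table for illustration. 'mask' is derived from the slot array,
// so it is marked transient and rebuilt in readObject rather than serialized.
class MaskedTable implements Serializable {
    private static final long serialVersionUID = 1L;
    static int constructorCalls = 0;  // static, so never serialized

    int[] slots;
    transient int mask;

    MaskedTable(int powerOfTwoCapacity) {
        constructorCalls++;
        slots = new int[powerOfTwoCapacity];
        mask = powerOfTwoCapacity - 1;
    }

    // Invoked by ObjectInputStream; no MaskedTable constructor runs on this path.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();   // restores 'slots'; 'mask' is still 0 here
        mask = slots.length - 1;  // rebuild the transient, derived state
    }

    // Round-trips a table through JDK serialization (helper for this example).
    static MaskedTable roundTrip(MaskedTable table) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(table);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (MaskedTable) in.readObject();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Without the `readObject` hook, the deserialized copy would silently carry `mask == 0`.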
This is what I mean when I say that classes absolutely must take the possibility of
serialization into account, even when using JDK serialization.
-+ Tatu +-
If anyone decides to try msgpack, just keep in mind its error
reporting is pretty bad, documentation sparse, so it's trial-and-error
all the way.
One thing I did learn was that anything that can be null MUST be
annotated with @Optional.
-+ Tatu +-
I think that if it is possible to get HPPC types serializable with
msgpack, with suitable field declarations (including adding some
transient markers perhaps), they would also work with Xstream, as well
as with standard JDK serialization.
It will probably also help with other serialization libraries that use
JDK-serialization-like approach of using only fields.
So that sounds like a good thing to do.
For what it's worth, I started this github project:
https://github.com/FasterXML/jackson-datatype-hppc
which adds module for Jackson (JSON processor) to make HPPC datatypes
automatically serialized/deserialized to/from JSON.
So far I have only added support for basic primitive containers (as a sort of
proof of concept), which was easy to do.
This effort should require no changes to HPPC (or Jackson) code or
class definitions.
-+ Tatu +-
Transient markers should be present in those fields that are really
transient, but at the moment I just don't think there are any (all the
fields have a purpose and need to be serialized). I've added an issue
to add Java serialization to HPPC. Again -- this shouldn't be too
difficult, but I don't give it much priority. If you can provide a
patch (and tests) that provide serialization, go ahead.
http://issues.carrot2.org/browse/HPPC-64
Dawid
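For HPPC-64, one possible shape of such a patch is sketched below under stated assumptions. It uses a hypothetical `CompactIntSet` (not HPPC's real class or field names) that marks its backing arrays transient. Private `writeObject`/`readObject` hooks then write only the occupied keys, which avoids the empty-slot waste raised earlier in the thread without changing the in-memory layout:

```java
import java.io.*;

// Hypothetical open-addressing int set, not HPPC's actual code. The backing
// arrays are transient: writeObject emits only occupied keys, so empty slots
// never reach the stream, and readObject re-inserts them into a new table.
class CompactIntSet implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient int[] keys;
    private transient boolean[] allocated;
    private transient int assigned;

    CompactIntSet(int expectedElements) {
        allocate(expectedElements);
    }

    // Power-of-two table strictly larger than expectedElements (no resizing here).
    private void allocate(int expectedElements) {
        int slots = Integer.highestOneBit(Math.max(2, expectedElements) * 2);
        keys = new int[slots];
        allocated = new boolean[slots];
        assigned = 0;
    }

    private int slotOf(int key) {
        int h = key * 0x9E3779B9;  // multiplicative hash, then mix high bits down
        return (h ^ (h >>> 16)) & (keys.length - 1);
    }

    boolean add(int key) {
        int slot = slotOf(key);
        while (allocated[slot]) {  // linear probing
            if (keys[slot] == key) return false;
            slot = (slot + 1) & (keys.length - 1);
        }
        if (assigned >= keys.length - 1) {  // always keep one empty slot
            throw new IllegalStateException("Table full; resizing is omitted in this sketch");
        }
        allocated[slot] = true;
        keys[slot] = key;
        assigned++;
        return true;
    }

    boolean contains(int key) {
        for (int slot = slotOf(key); allocated[slot]; slot = (slot + 1) & (keys.length - 1)) {
            if (keys[slot] == key) return true;
        }
        return false;
    }

    int size() { return assigned; }

    // Stream layout: element count, then each occupied key.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(assigned);
        for (int i = 0; i < keys.length; i++) {
            if (allocated[i]) out.writeInt(keys[i]);
        }
    }

    // Rebuilds a table sized for the stored elements only.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int count = in.readInt();
        if (count < 0) {
            throw new InvalidObjectException("Negative element count: " + count);
        }
        allocate(count);
        for (int i = 0; i < count; i++) add(in.readInt());
    }

    static byte[] toBytes(CompactIntSet set) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(set);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static CompactIntSet fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (CompactIntSet) in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Making the arrays transient trades simplicity for compactness. A plain field-by-field approach would instead write the arrays as they are, empty slots included.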
Understood.
Anyone who wants HPPC types to be serializable in certain way should
help in getting that done by suggesting necessary changes (scratch the
itch etc).
-+ Tatu +-