Release 0.4.0

Dawid Weiss

May 1, 2011, 5:47:04 PM
to java-high-performance-primitive-collections

Dear All,

HPPC version 0.4.0 has just been pushed to the Maven Central repository (it should be available in about an hour). For a list of issues and improvements, see this link:


Notable highlights:

** API-breaking changes:

HPPC-60: Cleaned up the code of all iterators (including some name/scope changes
         of iterator classes, so if you relied on these, things may break).

HPPC-59: keySet() renamed to keys() on associative containers. 

HPPC-46: toArray() on object types must return actual T[], not Object[]

HPPC-52: Dropped custom hash functions and comparators from associative containers
         for speed reasons.
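
HPPC-46 requires object containers to return a real T[] from toArray(). Because of type erasure, a generic container usually stores its elements in an Object[]. Casting that array to T[] compiles, with an unchecked warning, but fails with a ClassCastException as soon as the caller assigns the result to, for example, a String[]. Below is a minimal sketch of one common workaround: the caller passes the component type explicitly. SimpleObjectList is a hypothetical class, and this is not necessarily the signature HPPC adopted.

```java
import java.lang.reflect.Array;
import java.util.Arrays;

/** Hypothetical growable object list (not HPPC's API), backed by an Object[]. */
class SimpleObjectList<T> {
    private Object[] buffer = new Object[4];
    private int size;

    void add(T element) {
        if (size == buffer.length) {
            buffer = Arrays.copyOf(buffer, size * 2);
        }
        buffer[size++] = element;
    }

    /**
     * Returns a copy whose runtime type is componentType[], so it can be
     * safely assigned to T[]. Returning the raw buffer cast to T[] would
     * compile but throw ClassCastException when assigned to, say, String[].
     */
    T[] toArray(Class<T> componentType) {
        @SuppressWarnings("unchecked")
        T[] result = (T[]) Array.newInstance(componentType, size);
        System.arraycopy(buffer, 0, result, 0, size);
        return result;
    }
}
```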

** New features

HPPC-61: Cleaned up Maven structure: parent aggregator and submodules.

HPPC-57: Added a view of values to associative containers (values() method).

HPPC-49: Added support for XorShift random.

HPPC-34: Added support for Cloneable.

HPPC-51: Replaced double hashing in open hash maps/sets with linear probing and a good
         hashing function to ensure a random distribution of elements.

HPPC-47: Changed the implementation of MurmurHash to MurmurHash3, impl.
         borrowed from Sebastiano Vigna's fastutil library. [ASL]
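
Together, HPPC-51 and HPPC-47 describe a hashing scheme: linear probing instead of double hashing, with MurmurHash3 used to scramble keys. The sketch below illustrates that combination with a hypothetical LinearProbingIntSet class; it is not HPPC's actual implementation. It assumes a power-of-two table and a maximum load factor of 0.5. The mixing step is the standard MurmurHash3 32-bit finalizer (fmix32); whether HPPC uses exactly this step is an assumption here. Without mixing, keys that differ only in their high bits (multiples of 16, for example) would pile up in a small fraction of the slots, because only the low bits select the slot.

```java
/**
 * Hypothetical open-addressing int set (not HPPC's actual class) that uses
 * linear probing and scrambles keys with the MurmurHash3 32-bit finalizer.
 */
final class LinearProbingIntSet {
    private int[] keys;
    private boolean[] allocated; // marks which slots hold a key
    private int size;

    LinearProbingIntSet(int initialCapacity) {
        if (initialCapacity < 1 || Integer.bitCount(initialCapacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        keys = new int[initialCapacity];
        allocated = new boolean[initialCapacity];
    }

    /** MurmurHash3 fmix32: lets high-order key bits influence the low bits used as the slot index. */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Adds key; returns false if it was already present. */
    boolean add(int key) {
        if ((size + 1) * 2 > keys.length) {
            grow(); // keep the load factor at or below 0.5, so probing always finds a free slot
        }
        int mask = keys.length - 1;
        int slot = mix(key) & mask;
        while (allocated[slot]) {
            if (keys[slot] == key) {
                return false;
            }
            slot = (slot + 1) & mask; // linear probing: try the next slot, wrapping around
        }
        allocated[slot] = true;
        keys[slot] = key;
        size++;
        return true;
    }

    boolean contains(int key) {
        int mask = keys.length - 1;
        int slot = mix(key) & mask;
        while (allocated[slot]) {
            if (keys[slot] == key) {
                return true;
            }
            slot = (slot + 1) & mask;
        }
        return false; // reached an empty slot: the key is absent
    }

    int size() {
        return size;
    }

    /** Doubles the table and re-inserts every key at its new slot. */
    private void grow() {
        int[] oldKeys = keys;
        boolean[] oldAllocated = allocated;
        keys = new int[oldKeys.length * 2];
        allocated = new boolean[oldKeys.length * 2];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldAllocated[i]) {
                add(oldKeys[i]);
            }
        }
    }
}
```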

** Bug fixes

HPPC-46: toArray() on object types must return actual T[], not Object[]

** Other

HPPC-58: Better integration with Eclipse, new template->code generation.

Luc Hogie

Jun 9, 2011, 5:36:57 AM
to java-high-performance...@googlegroups.com

Hi,

I'm using HPPC at the heart of the Grph Java library.

http://www-sop.inria.fr/members/Luc.Hogie/grph/

While I was working on the I/O capabilities for the lib, I noticed
that HPPC does not support serialization. This is a great pity, since
in most cases serialization support can be achieved simply by adding
"implements Serializable" to your class.

Considering the simplicity of it, is there a good reason for not
supporting serialization? This would dramatically improve HPPC.

Regards,
Luc.

Dawid Weiss

Jun 9, 2011, 5:56:44 AM
to java-high-performance...@googlegroups.com
Serialization is not supported because just implementing
"Serializable" does not solve the issues of versioning (Java
serialization is quite terrible at that) and custom serialization does
require some additional work. Besides, for things like hash maps
(dictionaries) trivial serialization by implementing Serializable will
serialize everything, including empty slots, which is a great waste of
space.
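
To illustrate the point about empty slots: a custom writeObject/readObject pair can write only the assigned entries of an open-addressing table and rebuild the table when reading. The sketch below uses a hypothetical CompactIntIntMap (not an HPPC class) with deliberately simple hashing and linear probing. The slot arrays are marked transient so that default serialization skips them.

```java
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical open-addressing int->int map (not an HPPC class) that serializes only assigned entries. */
class CompactIntIntMap implements Serializable {
    private static final long serialVersionUID = 1L;

    // transient: excluded from default serialization; handled by writeObject/readObject.
    private transient int[] keys;
    private transient int[] values;
    private transient boolean[] allocated;
    private transient int size;

    CompactIntIntMap(int expectedSize) {
        allocate(expectedSize);
    }

    // Allocates an empty power-of-two table that is at most half full for expectedSize entries.
    private void allocate(int expectedSize) {
        int capacity = 4;
        while (capacity < expectedSize * 2) {
            capacity <<= 1;
        }
        keys = new int[capacity];
        values = new int[capacity];
        allocated = new boolean[capacity];
        size = 0;
    }

    private int slotFor(int key) {
        int h = key * 0x9E3779B9; // cheap bit mixing, good enough for a sketch
        return (h ^ (h >>> 16)) & (keys.length - 1);
    }

    void put(int key, int value) {
        if ((size + 1) * 2 > keys.length) {
            rehash(keys.length * 2);
        }
        int slot = slotFor(key);
        while (allocated[slot] && keys[slot] != key) {
            slot = (slot + 1) & (keys.length - 1); // linear probing
        }
        if (!allocated[slot]) {
            allocated[slot] = true;
            keys[slot] = key;
            size++;
        }
        values[slot] = value;
    }

    int get(int key, int defaultValue) {
        int slot = slotFor(key);
        while (allocated[slot]) {
            if (keys[slot] == key) {
                return values[slot];
            }
            slot = (slot + 1) & (keys.length - 1);
        }
        return defaultValue;
    }

    int size() {
        return size;
    }

    private void rehash(int newCapacity) {
        int[] oldKeys = keys;
        int[] oldValues = values;
        boolean[] oldAllocated = allocated;
        keys = new int[newCapacity];
        values = new int[newCapacity];
        allocated = new boolean[newCapacity];
        size = 0;
        for (int i = 0; i < oldKeys.length; i++) {
            if (oldAllocated[i]) {
                put(oldKeys[i], oldValues[i]);
            }
        }
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject(); // no non-transient fields here, but keeps the stream layout standard
        out.writeInt(size);
        for (int i = 0; i < keys.length; i++) {
            if (allocated[i]) { // empty slots are never written
                out.writeInt(keys[i]);
                out.writeInt(values[i]);
            }
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int count = in.readInt();
        allocate(count); // constructors do not run on deserialization, so build the table here
        for (int i = 0; i < count; i++) {
            int key = in.readInt();
            int value = in.readInt();
            put(key, value);
        }
    }
}
```

With this design, the size of the serialized form depends on the number of entries rather than on the table capacity.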

We could use MessagePack to add efficient (and compact) serialization
to HPPC... but this would be an external dependency, so I'm not
entirely convinced it makes sense.

Anyway, if you'd like to add proper serialization support (see the
problem with dictionaries above), patches are welcome.

Dawid

Luc Hogie

Jun 9, 2011, 6:06:27 AM
to java-high-performance...@googlegroups.com, Dawid Weiss

Hi Dawid, thanks for replying.

> Serialization is not supported because just implementing
> "Serializable" does not solve the issues of versioning (Java
> serialization is quite terrible at that) and custom serialization does

The use of serialVersionUID should be enough.

> require some additional work. Besides, for things like hash maps
> (dictionaries) trivial serialization by implementing Serializable will
> serialize everything, including empty slots, which is a great waste of
> space.

Right, but the real benefit of using Java serialization is not to get
compact/efficient encoding: it is to have trivial and effective I/O
functionality. Actually, the performance of Java serialization is fine
for most applications.

Additionally, the waste of space can be easily solved by gzipping the
resulting byte stream.
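
Here is a minimal sketch of the gzip approach suggested above. It uses only java.util.zip from the JDK; the GzipSerialization helper name is made up for illustration. Compression shrinks runs of empty slots in the byte stream, but the full table is still walked and written out before it is compressed, so this approach does not remove that cost.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

/** Hypothetical helper: Java serialization wrapped in gzip compression. */
final class GzipSerialization {
    private GzipSerialization() {}

    /** Java-serializes obj and gzips the resulting byte stream. */
    static byte[] toGzippedBytes(Serializable obj) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream also closes, and thus finishes, the GZIP stream.
        try (ObjectOutputStream out = new ObjectOutputStream(new GZIPOutputStream(bytes))) {
            out.writeObject(obj);
        }
        return bytes.toByteArray();
    }

    /** Reverses toGzippedBytes: gunzips, then deserializes. */
    static Object fromGzippedBytes(byte[] data) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(
                new GZIPInputStream(new ByteArrayInputStream(data)))) {
            return in.readObject();
        }
    }
}
```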

> We could use MessagePack to add efficient (and compact) serialization
> to HPPC... but this would be an external dependency, so I'm not
> entirely convinced it makes sense.

Well, the JAR has to be kept small, I think. You're right to have doubts.

> Anyway, if you'd like to add proper serialization support (see the
> problem with dictionaries above), patches are welcome.

I just need the serialization functionality. :)

It is really a weakness of HPPC that it doesn't support Java
serialization. I really think you should simply make your classes implement
java.io.Serializable. In 5 minutes the weakness will have vanished. :)

Cheers,
Luc.

Dawid Weiss

Jun 9, 2011, 6:10:46 AM
to Luc Hogie, java-high-performance...@googlegroups.com
> The use of serialVersionUID should be enough.

It's not enough for reading data saved with an incompatible (older) class
version; I'm talking about structural changes that should still be
backwards-compatible with previously serialized data.
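
One way to handle the versioning problem described here is to keep serialVersionUID fixed and write an explicit format number as part of a custom serialized form. readObject can then decode older layouts. The sketch below assumes a hypothetical VersionedIntList class. It also assumes a made-up earlier "format 1" that stored the whole backing array (unused capacity included) followed by the size. Neither comes from HPPC.

```java
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Arrays;

/**
 * Hypothetical growable int list (not an HPPC class). Its serialized form starts
 * with an explicit format number, so a newer class can still read older streams.
 */
class VersionedIntList implements Serializable {
    // Kept constant across releases; compatibility is handled by FORMAT instead.
    private static final long serialVersionUID = 1L;
    private static final int FORMAT = 2;

    private transient int[] buffer = new int[8]; // initializers do not run on deserialization
    private transient int size;

    void add(int value) {
        if (size == buffer.length) {
            buffer = Arrays.copyOf(buffer, size * 2);
        }
        buffer[size++] = value;
    }

    int get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index + ", size " + size);
        }
        return buffer[index];
    }

    int size() {
        return size;
    }

    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        out.writeInt(FORMAT); // current format: size followed by the used elements only
        out.writeInt(size);
        for (int i = 0; i < size; i++) {
            out.writeInt(buffer[i]);
        }
    }

    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        int format = in.readInt();
        switch (format) {
            case 1: { // hypothetical older format: whole backing array, then size
                int[] old = (int[]) in.readObject();
                size = in.readInt();
                buffer = Arrays.copyOf(old, Math.max(8, size));
                break;
            }
            case 2: {
                size = in.readInt();
                buffer = new int[Math.max(8, size)];
                for (int i = 0; i < size; i++) {
                    buffer[i] = in.readInt();
                }
                break;
            }
            default:
                throw new InvalidObjectException("unsupported format: " + format);
        }
    }
}
```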

> Right, but the real benefit of using Java serialization is not to get
> compact/efficient encoding: it is to have trivial and effective I/O
> functionality. Actually, the performance of Java serialization is fine for
> most applications.

HPPC is meant for in-memory use in critical loops, really.
Serialization and persistence are not part of its goals. fastutil does
implement this feature though, so you can always peek in there.

> Additionally, the waste of space can be easily solved by gzipping the
> resulting byte stream.

This is hardly a solution, sorry.

> I just need the serialization functionality. :)

Well, fork the project on github and add it -- that's why it's open
source. :) If there are more people wishing for serialization, I may
add it, but it's not really a priority (for the reasons above).

D.

Luc Hogie

Jun 9, 2011, 11:19:34 AM
to java-high-performance...@googlegroups.com, Dawid Weiss

Dawid,

> HPPC is meant for in-memory use in critical loops, really.

And it meets its objective very well! This goal doesn't prevent it from
supporting Java serialization (which is a standard). Efficient
serialization can come later, if needed.

To be honest (and with no offense intended), it sounds like you refuse to let
HPPC support Java serialization because it's inefficient. Instead, it
should support it because it is a standard way to serialize objects.

> Serialization and persistence are not part of its goals. fastutil does
> implement this feature though, so you can always peek in there.

I'm not going to switch to fastutil (and give up HPPC, which I like
very much) just for the sake of serialization. :(

> Well, fork the project on github and add it -- that's why it's open
> source. :) If there are more people wishing for serialization, I may

I'm not willing to separate from the community, but if you really don't
want to add "implements Serializable" to your classes, then I'll end up
doing it myself. :)

By the way, using HPPC brought great performance to our graph library!

Regards,
Luc.

--
Luc Hogie - CNRS Research Engineer
COMRED Research Unit (I3S(CNRS-UNS) INRIA)

http://www-sop.inria.fr/members/Luc.Hogie/
luc....@inria.fr
+33 4 89 73 24 25 (office)
+33 6 80 91 40 71 (mobile)
Skype ID: luchogie

Tatu Saloranta

Jun 9, 2011, 2:25:37 PM
to java-high-performance...@googlegroups.com

I would suggest that instead of trying to make Java serialization
(with its known issues) work, it might be worthwhile to work with
data-binding library authors to allow supporting, say, JSON
serialization of HPPC data structures.

For example, Jackson JSON processor can easily be extended with
library/datatype-specific modules, to allow seamless handling of all
kinds of types. And if that would be of interest, I could definitely
help.
Beyond JSON, Jackson actually supports many other data formats: Smile
(binary JSON serialization), XML (jackson-xml-databind), BSON. And
more formats will be supported in the near future (CSV, perhaps Avro).

-+ Tatu +-

Luc Hogie

Jun 9, 2011, 3:30:54 PM
to java-high-performance...@googlegroups.com

I didn't know about JSON, but indeed, maybe there are better technologies (as
easy to use but more efficient) than Java serialization that might do
the job.

I'll check this.

Tatu Saloranta

Jun 9, 2011, 4:41:24 PM
to java-high-performance...@googlegroups.com
On Thu, Jun 9, 2011 at 12:30 PM, Luc Hogie <luc....@laposte.net> wrote:
>
> I didn't know about JSON, but indeed, maybe there are better technologies (as easy
> to use but more efficient) than Java serialization that might do the job.

Right, just something to consider (in fact, regardless of whether
native serialization was supported).
And JSON has one benefit over XML, in that it has native array and map
types, simplifying things a bit.

Anyway, if anyone is interested in this specifically with JSON, let me
know; it should be very easy to do.
I wrote the initial version of the module to support Guava
(formerly google-collections) types; this one should be quite similar.

-+ Tatu +-

Dawid Weiss

Jun 10, 2011, 2:30:42 AM
to Luc Hogie, java-high-performance...@googlegroups.com
> To be honest (and with no offense intended), it sounds like you refuse to let
> HPPC support Java serialization because it's inefficient. Instead, it should
> support it because it is a standard way to serialize objects.

Java serialization is a nasty way to serialize objects and many people
will tell you that. Many people prefer JSON, Avro or other ways to
serialize (I suggested MessagePack earlier on).

> I'm not going to switch to fastutil (and give up HPPC, which I like very
> much) just for the sake of serialization. :(

I'm glad to hear that, although you should pick the one that suits
your needs; no need to be emotional with respect to code ;)

> I'm not willing to separate from the community, but if you really don't want
> to add "implements Serializable" to your classes, then I'll end up doing it
> myself. :)

You didn't understand me -- if you fork on github and add
serialization, notify me and I'll take a peek at what you've done and
perhaps merge it back. Github is a great way to collaborate and
contribute.

> By the way, using HPPC brought great performance to our graph library!

Awesome, thanks.

Dawid

Luc Hogie

Jun 10, 2011, 9:27:01 AM
to java-high-performance...@googlegroups.com

I found that none of the serialization libraries available on the market
were really satisfactory. Either they simply fail, or they
require modifying the code or painful configuration (and may
fail anyway :)).

I ended up writing a custom solution, which is already functional and
wasn't a big deal to program.

Tatu Saloranta

Jun 10, 2011, 12:55:00 PM
to java-high-performance...@googlegroups.com
On Fri, Jun 10, 2011 at 6:27 AM, Luc Hogie <luc....@laposte.net> wrote:
>
> I found that none of the serialization libraries available on the market
> were really satisfactory. Either they simply fail, or they
> require modifying the code or painful configuration (and may
> fail anyway :)).

It is next to impossible to write a serialization library that will
automatically work for all kinds of objects, including ones that are
not designed to play nicely with serialization (just in general), so I
am not surprised in this sense.
Collection types especially need some special handling to work
efficiently, more so than things that are exposed as POJOs.

Did you try contacting authors of any of those libraries? Which ones
did you try?
Even just notifying authors on issues encountered would help others in
knowing what works, what does not, and perhaps work on resolving (or
at least documenting) those issues.

> I ended up writing a custom solution, which is already functional and wasn't a
> big deal to program.

Alas, this does not help others that may have similar issues. They
will have to do the same.
This is why I suggested that working with others would be helpful and
allow sharing of some effort.

Anyway; as I said, if anyone else finds need to serialize/deserialize
HPPC types, I would be interested in collaborating.
My own use case is currently memory-only, without need for
serialization, but that may change in future.
This could also be one more feature to help HPPC get more adoption,
over earlier libraries.

-+ Tatu +-

Dawid Weiss

Jun 11, 2011, 2:19:15 AM
to java-high-performance...@googlegroups.com
> Anyway; as I said, if anyone else finds need to serialize/deserialize
> HPPC types, I would be interested in collaborating.

Absolutely! You're welcome to contribute. Like I said -- either fork
on github, add your stuff and make a contrib request, or file an issue
to the project's JIRA here:

http://issues.carrot2.org/browse/HPPC

Dawid

Luc Hogie

Jun 11, 2011, 1:00:41 PM
to java-high-performance...@googlegroups.com, Tatu Saloranta

> It is next to impossible to write a serialization library that will
> automatically work for all kinds of objects, including ones that are
> not designed to play nicely with serialization (just in general), so I
> am not surprised in this sense.

Well Java serialization DOES work nicely most of the time.

> Did you try contacting authors of any of those libraries? Which ones
> did you try?

Not all of them. Dunno if I will or not.
I tried Castor, JAXB, JSON...

>> I ended up writing a custom solution, which is already functional and wasn't a
>> big deal to program.
>
> Alas, this does not help others that may have similar issues. They
> will have to do the same.

This helps in the sense that it shows that ad hoc solutions should
never be neglected. Of course, it doesn't help in serializing HPPC objects. :(

> This is why I suggested that working with others would be helpful and
> allow sharing of some effort.

I started the development of a "yet new efficient serializer". I'll see
in the future whether it can be useful to HPPC users.

Dawid Weiss

Jun 11, 2011, 4:02:49 PM
to java-high-performance...@googlegroups.com
> I started the development of a "yet new efficient serializer". I'll see in
> the future whether it can be useful to HPPC users.

Don't duplicate effort. How about trying MessagePack first?
It seems to be able to serialize all public fields of an object
(without any annotations on the object itself). This would be
identical to implementing Serializable (only more efficient, in fact). Try
it and let us know if it worked for you.

http://wiki.msgpack.org/display/MSGPACK/QuickStart+for+Java#QuickStartforJava-Withoutannotation

Dawid

Luc Hogie

Jun 11, 2011, 4:46:19 PM
to java-high-performance...@googlegroups.com, Dawid Weiss

I tried MessagePack with my graph objects. It pitifully terminated with an
ugly exception. Since no solution was immediately apparent to me, I
looked elsewhere.

Dawid Weiss

Jun 12, 2011, 2:13:09 AM
to Luc Hogie, java-high-performance...@googlegroups.com
And the exception/stack trace was?... :) I'm guessing the MessagePack
people would be willing to hear about it (and so would I, out of
curiosity).

Dawid

Tatu Saloranta

Jun 12, 2011, 2:22:14 PM
to java-high-performance...@googlegroups.com
On Sat, Jun 11, 2011 at 1:46 PM, Luc Hogie <luc....@laposte.net> wrote:
>
> I tried MessagePack with my graph objects. It pitifully terminated with an
> ugly exception. Since no solution was immediately apparent to me, I looked
> elsewhere.

I had a horrible experience trying to make MessagePack do anything, too
-- please do not consider that to be representative of good Java
serialization libraries. XStream is a better one, for example, as a
general-purpose serializer (which msgpack is not).

As to Java serialization -- even that requires that a class obey basic
rules, such as marking non-serializable fields as transient, marking the
objects themselves as Serializable, and not counting on a constructor
being called (since deserialization never calls any of the class's own
constructors).
This is what I mean when I say that classes absolutely must take the
possibility of serialization into account, even when using JDK serialization.
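
The JDK rules mentioned above are easy to observe directly. In this sketch, the SerialDemoNode class and its static constructor counter are made up for illustration. A round trip through ObjectOutputStream/ObjectInputStream restores the final field and leaves the transient field null. It does not run SerialDemoNode's constructor; only the no-arg constructor of the first non-serializable superclass (here Object) is invoked.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

/** Hypothetical class used only to observe JDK serialization behavior. */
class SerialDemoNode implements Serializable {
    private static final long serialVersionUID = 1L;

    static int constructorCalls = 0; // static state is never serialized

    final String label;              // serialized and restored, even though it is final
    transient StringBuilder scratch; // not serialized; comes back as null

    SerialDemoNode(String label) {
        constructorCalls++;
        this.label = label;
        this.scratch = new StringBuilder();
    }

    /** Serializes obj to memory and reads it back. */
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }
}
```

If scratch were not transient, the round trip would still succeed, because StringBuilder implements Serializable. A non-transient, non-null field of a non-serializable type would instead make writeObject throw NotSerializableException.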

-+ Tatu +-

Tatu Saloranta

Jun 12, 2011, 2:24:43 PM
to java-high-performance...@googlegroups.com

If anyone decides to try msgpack, just keep in mind that its error
reporting is pretty bad and its documentation sparse, so it's trial-and-error
all the way.
One thing I did learn was that anything that can be null MUST be
annotated with @Optional.

-+ Tatu +-

Tatu Saloranta

Jun 13, 2011, 1:22:17 AM
to java-high-performance...@googlegroups.com
On Sat, Jun 11, 2011 at 1:02 PM, Dawid Weiss <dawid...@gmail.com> wrote:

I think that if it is possible to get HPPC types serializable with
msgpack, with suitable field declarations (including adding some
transient markers perhaps), they would also work with XStream. As well
as standard JDK serialization.
It will probably also help with other serialization libraries that use
JDK-serialization-like approach of using only fields.
So that sounds like a good thing to do.

For what it's worth, I started this github project:

https://github.com/FasterXML/jackson-datatype-hppc

which adds a module for Jackson (JSON processor) so that HPPC datatypes are
automatically serialized/deserialized to/from JSON.
So far I just added support for basic primitive containers (as sort of
proof of concept), which was easy to do.
This effort should require no changes to HPPC (or Jackson) code or
class definitions.

-+ Tatu +-

Dawid Weiss

Jun 13, 2011, 3:20:38 AM
to java-high-performance...@googlegroups.com
> I think that if it is possible to get HPPC types serializable with
> msgpack, with suitable field declarations (including adding some
> transient markers perhaps), they would also work with XStream. As well


Transient markers should be present in those fields that are really
transient, but at the moment I just don't think there are any (all the
fields have a purpose and need to be serialized). I've added an issue
to add Java serialization to HPPC. Again -- this shouldn't be too
difficult, but I don't give it much priority. If you can provide a
patch (and tests) that add serialization, go ahead.

http://issues.carrot2.org/browse/HPPC-64

Dawid

Tatu Saloranta

Jun 13, 2011, 2:53:51 PM
to java-high-performance...@googlegroups.com

Understood.
Anyone who wants HPPC types to be serializable in certain way should
help in getting that done by suggesting necessary changes (scratch the
itch etc).

-+ Tatu +-
