Re: serializing and deserializing the HLL+ object

Matt Abrams

Apr 27, 2013, 9:26:12 AM
to stream-...@googlegroups.com
Rashil -

Thanks! The estimators have a 'getBytes()' method that will provide a
serialized form of the object. You can use the Builder classes to
deserialize the bytes back into the Java object. Check out the
examples in the test classes, for example:

https://github.com/clearspring/stream-lib/blob/master/src/test/java/com/clearspring/analytics/stream/cardinality/TestHyperLogLogPlus.java

Matt
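
A minimal sketch of the getBytes()/Builder round trip described above. To stay self-contained it uses a hypothetical stand-in class, ToySketch, rather than the stream-lib estimators, so the names and byte layout here are assumptions and not the library's actual format. It also illustrates the original question: only the state needed to rebuild the object (log2m and the registers) is written, while derived fields are recomputed on load.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical stand-in for an estimator; NOT stream-lib's implementation.
final class ToySketch {
    private final int log2m;        // essential state: precision parameter
    private final byte[] registers; // essential state: one rank per bucket
    private final int bucketCount;  // derived from log2m, so it is never written out

    ToySketch(int log2m) {
        this(log2m, new byte[1 << log2m]);
    }

    private ToySketch(int log2m, byte[] registers) {
        if (log2m < 1 || log2m > 16 || registers.length != (1 << log2m)) {
            throw new IllegalArgumentException("inconsistent sketch state");
        }
        this.log2m = log2m;
        this.registers = registers;
        this.bucketCount = 1 << log2m;
    }

    // Record a 32-bit hash: the top log2m bits pick the bucket, the rest give the rank.
    void offerHashed(int hash) {
        int bucket = hash >>> (32 - log2m);
        int rank = Integer.numberOfLeadingZeros((hash << log2m) | (1 << (log2m - 1))) + 1;
        if (rank > registers[bucket]) {
            registers[bucket] = (byte) rank;
        }
    }

    int bucketCount() {
        return bucketCount;
    }

    boolean sameStateAs(ToySketch other) {
        return log2m == other.log2m && Arrays.equals(registers, other.registers);
    }

    // Serialized form: log2m, register count, then the raw registers.
    byte[] getBytes() {
        return ByteBuffer.allocate(8 + registers.length)
                .putInt(log2m)
                .putInt(registers.length)
                .put(registers)
                .array();
    }

    // Mirrors the "Builder deserializes the bytes" shape mentioned in the thread.
    static final class Builder {
        static ToySketch build(byte[] bytes) {
            ByteBuffer in = ByteBuffer.wrap(bytes);
            int log2m = in.getInt();
            byte[] registers = new byte[in.getInt()];
            in.get(registers);
            return new ToySketch(log2m, registers);
        }
    }
}

final class ToySketchDemo {
    public static void main(String[] args) {
        ToySketch original = new ToySketch(4);
        original.offerHashed(0x12345678);
        ToySketch restored = ToySketch.Builder.build(original.getBytes());
        System.out.println("round trip preserved state: " + restored.sameStateAs(original));
    }
}
```

A Hadoop Writable could follow the same idea by writing the length-prefixed bytes in write() and rebuilding from them in readFields(); that wiring is left out here because it depends on the Hadoop classes.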

On Fri, Apr 26, 2013 at 8:50 PM, <rushil...@gmail.com> wrote:
> Hi,
>
> First of all, kudos to you guys for implementing this fine estimation
> method. It's a huge boost to the analytics both in terms of space and time.
> I am going to write an HLL Writable, and for that I need to serialize and
> deserialize an HLL or LogLog object.
>
> My query is: do I need to write all the instance variables to the output
> stream object, or are there only some selected ones that I need to
> serialize?
>
> Thanks in Advance!

rxin

Apr 18, 2014, 2:14:11 AM
to stream-...@googlegroups.com, rgu...@tunein.com
Any opinion on making the HyperLogLog class serializable?

We are using stream-lib in the Apache Spark project, but we have to resort to a wrapper class that uses the builder for serialization. If you guys are OK with it, I can submit a patch to make HyperLogLog itself serializable.

https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/SerializableHyperLogLog.scala

Cheers.
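
For readers who have not seen this workaround, here is one way such a wrapper could look. It is a sketch under assumptions, not a copy of the Spark class linked above: PlainSketch is a hypothetical non-Serializable estimator that offers getBytes() and a static build(), and SerializableSketch holds it in a transient field and delegates to those two calls during Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical estimator that is NOT Serializable but can export and rebuild its bytes.
final class PlainSketch {
    private final byte[] registers;

    PlainSketch(int buckets) {
        this(new byte[buckets]);
    }

    private PlainSketch(byte[] registers) {
        this.registers = registers;
    }

    void setRegister(int index, int rank) {
        registers[index] = (byte) Math.max(registers[index], rank);
    }

    int register(int index) {
        return registers[index];
    }

    byte[] getBytes() {
        return registers.clone();
    }

    static PlainSketch build(byte[] bytes) {
        return new PlainSketch(bytes.clone());
    }
}

// Wrapper that makes the estimator usable wherever Java serialization is required.
final class SerializableSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private transient PlainSketch value; // skipped by default serialization

    SerializableSketch(PlainSketch value) {
        this.value = value;
    }

    PlainSketch value() {
        return value;
    }

    // Write the estimator as a length-prefixed byte array.
    private void writeObject(ObjectOutputStream out) throws IOException {
        out.defaultWriteObject();
        byte[] bytes = value.getBytes();
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Rebuild the estimator from the bytes written above.
    private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
        in.defaultReadObject();
        byte[] bytes = new byte[in.readInt()];
        in.readFully(bytes);
        value = PlainSketch.build(bytes);
    }

    // Helper for demonstration: serialize to memory and read the copy back.
    static SerializableSketch roundTrip(SerializableSketch original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (SerializableSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The cost of this approach is the extra wrapper type at every call site, which is what making the estimator itself serializable would remove.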

On Sunday, April 28, 2013 10:31:43 AM UTC-7, rgu...@tunein.com wrote:
Hey Thanks Matt!
The Builder classes are pretty cool for deserializing.

Thanks for the prompt response

Matt Abrams

Apr 22, 2014, 10:33:30 AM
to stream-...@googlegroups.com, rgu...@tunein.com
No objection to making the class Serializable. Glad you are using it
in Spark! I highly recommend using HyperLogLogPlus rather than
HyperLogLog. A patch is appreciated.

Matt

Reynold Xin

Apr 22, 2014, 10:27:03 PM
to stream-...@googlegroups.com, rgu...@tunein.com

Matt Abrams

Apr 23, 2014, 11:35:30 AM
to stream-...@googlegroups.com, rgu...@tunein.com
I think that was just an unlucky build where HLL fell outside of
tolerance in that test. It hasn't happened recently, and since HLL
will be deprecated soon I don't think it's a big deal. I am a little
concerned about changing the public constructors. We should probably
make that a major version number change. Did you see my note on using
HLLPlus instead?


matt

Reynold Xin

Apr 23, 2014, 2:19:41 PM
to stream-...@googlegroups.com, Rushil Gupta
Yup - if you like the current approach, I can change HLL+ too. Let me know. 

Ian Barfield

Apr 23, 2014, 5:59:20 PM
to stream-...@googlegroups.com, Rushil Gupta
From an alternate PR I created (https://github.com/addthis/stream-lib/pull/71):

This PR is intended to replace: #70

It supports serialization of HyperLogLog objects using the (relatively) terse Externalizable format, but
without any of the unpleasant requirements that would otherwise be imposed on it (a public no-arg
constructor, the inability to use final fields, and public read/write object methods that affect the private
state). Full disclosure: it does generate a holder class as extra garbage, but it is tiny and
I make up for it (see next paragraph). It also theoretically adds to the overhead of writing the class
name when Externalizable is invoked (the inner class has a longer fully qualified class name), but
people using Externalizable are presumably already aware of such risks.

I also fixed a rather unfortunate waste in both the existing and newly added deserialization options
where the byte array would make a nearly full copy of itself after reading only the 8 bytes for the log2m and
length prefix.
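
The holder idea described above can be sketched roughly as follows. This is an illustration under assumptions, not the code from the PR: FinalFieldSketch and Holder are hypothetical names, and the byte layout is made up. The outer class keeps its final fields and has no public no-arg constructor; a small Externalizable holder is swapped in through writeReplace() and swapped back out through readResolve(). The registers are read straight into a correctly sized array with readFully(), so no second copy is made after the 8-byte prefix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.InvalidObjectException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical estimator with final fields and no public no-arg constructor.
final class FinalFieldSketch implements Serializable {
    private static final long serialVersionUID = 1L;

    private final int log2m;
    private final byte[] registers;

    // Takes ownership of the array; callers must not modify it afterwards.
    FinalFieldSketch(int log2m, byte[] registers) {
        this.log2m = log2m;
        this.registers = registers;
    }

    int log2m() {
        return log2m;
    }

    byte[] registers() {
        return registers.clone();
    }

    // Serialization writes the holder instead of this object.
    private Object writeReplace() {
        return new Holder(this);
    }

    // Reject streams that try to bypass the holder.
    private void readObject(ObjectInputStream in) throws InvalidObjectException {
        throw new InvalidObjectException("holder required");
    }

    // Only the holder carries the Externalizable requirements.
    private static final class Holder implements Externalizable {
        private static final long serialVersionUID = 1L;

        private FinalFieldSketch sketch;

        public Holder() {
            // Public no-arg constructor required by Externalizable.
        }

        Holder(FinalFieldSketch sketch) {
            this.sketch = sketch;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeInt(sketch.log2m);
            out.writeInt(sketch.registers.length);
            out.write(sketch.registers);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            int log2m = in.readInt();
            byte[] registers = new byte[in.readInt()];
            in.readFully(registers); // fills the final array directly, no intermediate copy
            sketch = new FinalFieldSketch(log2m, registers);
        }

        // Deserialization hands back the real object, not the holder.
        private Object readResolve() {
            return sketch;
        }
    }

    // Helper for demonstration: serialize to memory and read the copy back.
    static FinalFieldSketch roundTrip(FinalFieldSketch original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (FinalFieldSketch) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

The holder is short-lived garbage on each write and read, which matches the trade-off mentioned in the PR description.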

