On Monday, April 29, 2013 at 2:31 AM, overcl...@gmail.com wrote:
Hi,

Thanks for the elaborate reply. It helped clear up a lot of the distinctions among these methods.

The dataset this code will address will have > 4 million unique values in some cases and < 100 values in others, since unique values have to be stored in multiple dimensions (grouped by each type). Hence it makes sense to use HLL+, as it adjusts its memory use accordingly while still sticking to the accuracy limits. However, the fact that I am not getting better than a 0.4% error rate is a bit troubling.

I am using the following constructor to declare an object:

AdaptiveCounting.Builder.obyCount(Integer.MAX_VALUE).build();

I checked; it is going into the LogLog constructor.

I am calculating this cardinality for a bunch of files, storing the results in an array of type ICardinality, and then merging them all into one big ICardinality. I do this for LogLog and HLL+ as well.

What I do not get is why I am getting different values when I construct my cardinality object as new LogLog(16) instead of AdaptiveCounting.Builder.obyCount(Integer.MAX_VALUE).build(), although Adaptive should use the same mechanism for the offer, merge and cardinality() functions (am I right in making this assumption?).

Also, I tried your suggestion of using (14, 25). The results are the same.

Thanks for looking into this.

On Sunday, April 28, 2013 10:29:37 AM UTC-7, overcl...@gmail.com wrote:

Hey,

Many thanks for implementing all this cool stuff. I have been working with the library for a while now, trying to tweak various parameters to find the most suitable cardinality mechanism to use.

So what I have got is:

TYPE       ErrorRate   (Size)
Adaptive   0.1%        (65 kB)
HLL+       0.4%        (~11 kB; p=14, sp=20)
LogLog     1.8%        (65 kB)

So is Adaptive counting expected to outperform HLL+ every time? I ask because I have tried HLL+ for various values {(12, 18), (15, 21), (16, 22)}, but it is (14, 20) that deviates least from the actual results.

Thanks
Charles Adam
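
For context on the error rates reported above, here is a minimal sketch that computes the textbook asymptotic standard errors for the two estimator families: roughly 1.04/sqrt(m) for HyperLogLog and roughly 1.30/sqrt(m) for LogLog, where m is the number of registers. It does not call the library at all. The class and method names are hypothetical, and it assumes that LogLog(16) means 2^16 registers (which matches the 65 kB size reported) and that HLL+ with p=14 behaves like plain HyperLogLog with 2^14 registers once it has left its sparse mode.

```java
public class SketchStandardError {

    // Asymptotic relative standard error of HyperLogLog with m = 2^p registers.
    // Assumption: HLL+ in its normal (non-sparse) mode follows the same bound.
    public static double hyperLogLog(int p) {
        return 1.04 / Math.sqrt((double) (1L << p));
    }

    // Asymptotic relative standard error of LogLog with m = 2^k registers.
    // Assumption: the constructor argument k is the log2 of the register count.
    public static double logLog(int k) {
        return 1.30 / Math.sqrt((double) (1L << k));
    }

    public static void main(String[] args) {
        // Print the expected one-standard-deviation errors as percentages.
        System.out.printf("HLL    p=14: %.4f%%%n", 100.0 * hyperLogLog(14));
        System.out.printf("LogLog k=16: %.4f%%%n", 100.0 * logLog(16));
    }
}
```

These figures are one-standard-deviation estimates for large cardinalities, so a single measured error can land above or below them; they are only a rough yardstick for judging whether an observed error rate is surprising.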