Hi all, I have recently been experimenting with MinHasher and HyperLogLog from Algebird. First of all, thank you for a great library; I really like it and appreciate the effort.
I have two questions regarding MinHasher:
1. Is there a way to get a MinHashSignature for the intersection of two sets? I mean: while estimateSimilarity() is counting the differing elements, could it also build a new signature from all the different hashes?
The problem is that I would like to calculate something like this:
(S1 ∩ S2) ∪ (S3 ∩ S4)
and because intersecting only gives me a similarity value, I cannot merge the results.
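To make the problem concrete, here is a minimal sketch on plain Int arrays rather than Algebird's MinHashSignature. The names SignatureSketch, union and similarity are my own stand-ins, not the library's API, and I am assuming a signature is simply one minimum hash per hash function. The union is again a signature, but the comparison collapses to a Double, so there is nothing left to merge afterwards:

```scala
object SignatureSketch {
  // Hypothetical stand-in for a MinHash signature:
  // one minimum hash value per hash function.
  type Sig = Array[Int]

  // Union of two sets: the elementwise minimum is itself a signature,
  // so it can be merged again with other signatures.
  def union(a: Sig, b: Sig): Sig =
    a.zip(b).map { case (x, y) => math.min(x, y) }

  // Estimated Jaccard similarity: the fraction of positions where the
  // two minima agree. The result is a Double, not a signature.
  def similarity(a: Sig, b: Sig): Double =
    a.zip(b).count { case (x, y) => x == y }.toDouble / a.length
}
```

With this shape, union(union(s1, s2), s3) composes, but union(similarity(s1, s2), similarity(s3, s4)) does not even type-check, which is the gap I am asking about.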
2. How can I estimate the similarity between multiple sets? I found that if I compare every pair of signatures, the minimum of the pairwise similarities seems to be the similarity between all N sets.
val xs = Seq(sig1, sig2, sig3)
val simN = xs.combinations(2).map( p => minHash.similarity(p.head, p.last)).min
Does this make sense? Could this be included in the official API?
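To make the second question concrete, here is a sketch on plain Int arrays (again not the Algebird API; NWaySketch, pairwiseMin and allAgree are hypothetical names) that puts the pairwise minimum next to the fraction of positions where all N signatures agree. Any position where all signatures agree also agrees for every pair, so the pairwise minimum is an upper bound on the all-agree fraction and the two can differ:

```scala
object NWaySketch {
  // Hypothetical stand-in for a MinHash signature.
  type Sig = Array[Int]

  // Fraction of positions where two signatures hold the same minimum.
  def similarity(a: Sig, b: Sig): Double =
    a.zip(b).count { case (x, y) => x == y }.toDouble / a.length

  // The approach from question 2: minimum over all pairs.
  // Needs at least two signatures, otherwise .min throws.
  def pairwiseMin(sigs: Seq[Sig]): Double =
    sigs.combinations(2).map(p => similarity(p.head, p.last)).min

  // Fraction of positions where every signature holds the same minimum.
  def allAgree(sigs: Seq[Sig]): Double = {
    val n = sigs.head.length
    (0 until n).count(i => sigs.forall(s => s(i) == sigs.head(i))).toDouble / n
  }
}
```

If the all-agree fraction is the right N-way estimate, then the pairwise minimum would only bound it from above, which is part of why I am asking.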
Thank you in advance.
Nicolae