MinHasher API

Nicolae Rosca

Apr 21, 2016, 8:52:04 AM
to algebird
Hi all, I have recently been experimenting with MinHasher and HyperLogLog from Algebird. First of all, thank you for a great library; I really like it and appreciate the effort.
I have two questions regarding MinHasher:
1. Is there a way to get a MinHashSignature for the intersection of two sets? I mean that, while counting the different elements in estimateSimilarity(), it could also create a new signature from all the different hashes.
The problem is that I would like to calculate something like this:

    (S1 ∩ S2) ∪ (S3 ∩ S4)

and because the intersection step only gives me a similarity value, not a signature, I cannot merge the results.

2. How can I estimate the similarity between multiple sets? I found that if I compute the similarity for every pair of signatures, the minimum seems to be the similarity between all N sets.

    val xs = Seq(sig1, sig2, sig3)
    val simN = xs.combinations(2).map(p => minHash.similarity(p.head, p.last)).min

Does this make sense? Could it be included in the official API?
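
For comparison, here is a minimal sketch of counting agreement across all N signatures directly, next to the pairwise minimum from the snippet above. It does not use Algebird: Signature, signature, similarityAll and minPairwise are hypothetical names, and a signature is assumed to be a plain vector holding one minimum hash per hash function, which is not Algebird's byte-array MinHashSignature.

```scala
import scala.util.hashing.MurmurHash3

// Hypothetical stand-in for a MinHash signature, NOT Algebird's MinHashSignature:
// one minimum hash value per seeded hash function.
object NWaySimilaritySketch {
  type Signature = Vector[Int]

  // Build a signature for a non-empty set using numHashes seeded hash functions.
  def signature(set: Set[String], numHashes: Int): Signature =
    Vector.tabulate(numHashes) { seed =>
      set.iterator.map(x => MurmurHash3.stringHash(x, seed)).min
    }

  // Fraction of positions at which every signature holds the same minimum.
  def similarityAll(sigs: Seq[Signature]): Double = {
    require(sigs.nonEmpty && sigs.forall(_.length == sigs.head.length))
    val n = sigs.head.length
    val agreeing = (0 until n).count(i => sigs.forall(_(i) == sigs.head(i)))
    agreeing.toDouble / n
  }

  // Minimum over all pairwise similarities (needs at least two signatures).
  def minPairwise(sigs: Seq[Signature]): Double =
    sigs.combinations(2).map(pair => similarityAll(pair)).min
}
```

A position where all N signatures agree is also a position where every pair agrees, so similarityAll can never exceed minPairwise. The two can differ when the sets overlap pairwise but share no common element, for example {a, b}, {b, c} and {a, c}.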

Thank you in advance.
Nicolae