First timer on DKPro: I need help understanding how similarity works

Paulo Avelar

Jul 30, 2015, 1:01:22 PM
to DKPro Similarity Users
Hello,
I'm just trying out DKPro and got the example to work, but I'm not sure I understand how getSimilarity() works.

Given:

    public static void main(String[] args) throws SimilarityException {
        TextSimilarityMeasure measure = new WordNGramJaccardMeasure(3);
        String[] lemmatize1 = "This be a great book".split(" ");
        String[] lemmatize2 = "This book be great".split(" ");

        double score = measure.getSimilarity(lemmatize1, lemmatize2);

        System.out.println("Similarity: " + score);
    }

This results in a similarity value of zero. Is this correct? Shouldn't these sentences be semantically equivalent?

Thank you!


Torsten Zesch

Jul 30, 2015, 2:40:14 PM
to Paulo Avelar, DKPro Similarity Users
Hi Paulo,

Your understanding is (almost) correct :)

However, because you initialize the measure as WordNGramJaccardMeasure(3),
it uses word 3-grams to compute the overlap.
The two token sequences share no 3-grams, so a similarity value of zero is correct.
Try WordNGramJaccardMeasure(1) to get results that are closer to your
intuition (but not necessarily more accurate in the long run).
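
To make the effect of the n-gram size concrete, here is a minimal plain-Java sketch of a Jaccard coefficient over word n-grams. It is not the DKPro implementation: the class NGramJaccardSketch and its methods ngrams() and jaccard() are hypothetical names introduced for illustration, and the library's own measure may differ in details such as normalization or how empty inputs are handled.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative only: a hand-rolled word n-gram Jaccard, not part of DKPro.
public class NGramJaccardSketch {

    // Collect the set of contiguous word n-grams in a token sequence.
    public static Set<List<String>> ngrams(String[] tokens, int n) {
        Set<List<String>> result = new HashSet<>();
        for (int i = 0; i + n <= tokens.length; i++) {
            result.add(new ArrayList<>(Arrays.asList(tokens).subList(i, i + n)));
        }
        return result;
    }

    // Jaccard coefficient |A ∩ B| / |A ∪ B| over the two n-gram sets.
    // Assumption: returns 0.0 when both sets are empty.
    public static double jaccard(String[] a, String[] b, int n) {
        Set<List<String>> setA = ngrams(a, n);
        Set<List<String>> setB = ngrams(b, n);

        Set<List<String>> union = new HashSet<>(setA);
        union.addAll(setB);
        if (union.isEmpty()) {
            return 0.0;
        }

        Set<List<String>> intersection = new HashSet<>(setA);
        intersection.retainAll(setB);

        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        String[] t1 = "This be a great book".split(" ");
        String[] t2 = "This book be great".split(" ");
        for (int n = 1; n <= 3; n++) {
            System.out.println("n=" + n + ": " + jaccard(t1, t2, n));
        }
    }
}
```

With the two token sequences from the question, this sketch gives 0.8 for n=1 (four shared words out of five distinct ones) and 0.0 for n=2 and n=3, because the different word order leaves no shared bigram or trigram.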

-Torsten

Paulo Avelar

Jul 30, 2015, 9:24:37 PM
to DKPro Similarity Users, phav...@gmail.com, torste...@gmail.com
Hi Torsten,

I'm experimenting with other types of measure, and I'm getting better results with CosineSimilarity.
Thank you for your response!

Paulo