First-timer on DKPro: I need help understanding how similarity works


Paulo Avelar

Jul 30, 2015, 1:01:22 PM
to DKPro Similarity Users
I'm just trying DKPro and got the example to work, but I'm not sure I understand how getSimilarity() works.


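    // imports assume the de.tudarmstadt.ukp package layout; adjust to your DKPro Similarity version
    import de.tudarmstadt.ukp.similarity.algorithms.api.SimilarityException;
    import de.tudarmstadt.ukp.similarity.algorithms.api.TextSimilarityMeasure;
    import de.tudarmstadt.ukp.similarity.algorithms.lexical.ngrams.WordNGramJaccardMeasure;

    public class SimilarityTest {
        public static void main(String[] args) throws SimilarityException {
            // Jaccard overlap of word trigrams (n = 3)
            TextSimilarityMeasure measure = new WordNGramJaccardMeasure(3);
            String[] lemmatize1 = "This be a great book".split(" ");
            String[] lemmatize2 = "This book be great".split(" ");
            double score = measure.getSimilarity(lemmatize1, lemmatize2);
            System.out.println("Similarity: " + score);
        }
    }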

This prints a similarity of zero. Is that correct? Shouldn't these sentences be semantically equivalent?

Thank you!

Torsten Zesch

Jul 30, 2015, 2:40:14 PM
to Paulo Avelar, DKPro Similarity Users
Hi Paulo,

your understanding is (almost) correct :)

However, because you initialize the measure as WordNGramJaccardMeasure(3),
it uses word 3-grams (trigrams) to compute the overlap.
As the two sentences share no trigrams, a similarity of 0 is correct.
Try WordNGramJaccardMeasure(1) to get results that are closer to your
intuition (but not necessarily more accurate in the long run).
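To see why trigrams give zero here, below is a minimal plain-Java sketch of word n-gram Jaccard similarity: collect the set of word n-grams of each sentence and divide the size of their intersection by the size of their union. It is an illustration only, not the source of DKPro's WordNGramJaccardMeasure; the class WordNGramJaccard and its methods are hypothetical, and DKPro may differ in details such as case handling or how it treats inputs shorter than n.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical illustration class, not part of DKPro Similarity.
public class WordNGramJaccard {

    // Collect the set of word n-grams; each n-gram is its tokens joined by a space.
    static Set<String> ngrams(String[] tokens, int n) {
        Set<String> result = new HashSet<>();
        for (int i = 0; i + n <= tokens.length; i++) {
            result.add(String.join(" ", Arrays.copyOfRange(tokens, i, i + n)));
        }
        return result;
    }

    // Jaccard coefficient: |intersection| / |union| of the two n-gram sets.
    static double similarity(String[] tokens1, String[] tokens2, int n) {
        Set<String> a = ngrams(tokens1, n);
        Set<String> b = ngrams(tokens2, n);
        Set<String> union = new HashSet<>(a);
        union.addAll(b);
        if (union.isEmpty()) {
            return 0.0; // assumed convention: no n-grams at all means no overlap
        }
        Set<String> intersection = new HashSet<>(a);
        intersection.retainAll(b);
        return (double) intersection.size() / union.size();
    }

    public static void main(String[] args) {
        String[] s1 = "This be a great book".split(" ");
        String[] s2 = "This book be great".split(" ");
        System.out.println("n=3: " + similarity(s1, s2, 3)); // 0 shared of 5 trigrams -> 0.0
        System.out.println("n=1: " + similarity(s1, s2, 1)); // 4 shared of 5 words -> 0.8
    }
}
```

With n = 3 the first sentence yields "This be a", "be a great" and "a great book", the second "This book be" and "book be great"; none are shared, so the score is 0.0. With n = 1, four of the five distinct words appear in both sentences, giving 4/5 = 0.8.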


Paulo Avelar

Jul 30, 2015, 9:24:37 PM
to DKPro Similarity Users
Hi Torsten,

I'm experimenting with other measures, and I'm getting better results with CosineSimilarity.
Thank you for your response!
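A cosine measure rates these two sentences much higher because a bag-of-words cosine ignores word order entirely. Below is a minimal sketch assuming raw term-frequency vectors; the class BagOfWordsCosine is hypothetical, and DKPro's CosineSimilarity may use different weighting or normalization, so its scores need not match this one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration class, not part of DKPro Similarity.
public class BagOfWordsCosine {

    // Count how often each token occurs (a term-frequency vector).
    static Map<String, Integer> counts(String[] tokens) {
        Map<String, Integer> tf = new HashMap<>();
        for (String t : tokens) {
            tf.merge(t, 1, Integer::sum);
        }
        return tf;
    }

    // Euclidean length of a term-frequency vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int c : v.values()) {
            sum += (double) c * c;
        }
        return Math.sqrt(sum);
    }

    // Cosine of the angle between the two term-frequency vectors.
    static double similarity(String[] tokens1, String[] tokens2) {
        Map<String, Integer> a = counts(tokens1);
        Map<String, Integer> b = counts(tokens2);
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0);
        }
        double normA = norm(a);
        double normB = norm(b);
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // assumed convention for empty input
        }
        return dot / (normA * normB);
    }

    public static void main(String[] args) {
        String[] s1 = "This be a great book".split(" ");
        String[] s2 = "This book be great".split(" ");
        // 4 shared words; norms sqrt(5) and 2, so roughly 0.894
        System.out.println("cosine: " + similarity(s1, s2));
    }
}
```

Here the dot product is 4 (one count for each shared word) and the vector lengths are sqrt(5) and 2, so the score is 2/sqrt(5), roughly 0.894, even though the two sentences share no word trigrams.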
