two other questions about similarity measure

Andrea Apicella

Mar 14, 2015, 1:09:33 PM

to wiki...@googlegroups.com

Hi again,
I'm still using the Similarity package and I have 3 questions (for now :) ):

1) It gives me the similarity scores between two terms, but during computation it prints:

Failed to load normalizer at /home/andrea/testwiki/./dat/sr/ensemble/simple/mostSimilarNormalizer. Setting it to be invalid.
mar 14, 2015 6:01:38 PM org.wikibrain.sr.BaseSRMetric configureBase

I checked, and the file exists in the right directory. Why does this happen? What problems can arise if this file is not loaded?

2) I noticed that when I ask for the similarity between identical phrases (e.g. Albert Einstein vs. Albert Einstein), I get a value not equal to 1.0. The code is simply:

System.out.println(sr.similarity("Albert Einstein", "Albert Einstein", true));

Is this normal behavior? Or could the file from my first question be the cause of this strange behavior?

3) The method LocalPageDao.getIdByTitle() returns a local id (I suppose). Is there a way to get the real Wikipedia id?

Thank you!

Shilad Sen

Mar 15, 2015, 2:50:43 PM

to wiki...@googlegroups.com

Hi Andrea! Good questions.

1) There are two modes for SR: similarity(X, Y) and mostSimilar(X). The latter returns the most similar Ys. By default we only train similarity because mostSimilar requires much more computational effort. The normalizer that failed to load is only the mostSimilar one, so the warning should be harmless as long as you only call similarity().

2) Hopefully the value is *close to* 1? Yes, this is an issue with normalization. Perhaps we should short-circuit normalization in the identity case.
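Until something like that exists in the library, the identity case can be short-circuited on the caller's side. What follows is only a minimal sketch under stated assumptions: the class `IdentityShortCircuit`, the method `withIdentityShortCircuit`, and the stub metric are hypothetical names made up for illustration, and nothing here is WikiBrain's actual implementation. It also treats two phrases as identical only when they match exactly after trimming whitespace.

```java
import java.util.function.ToDoubleBiFunction;

public class IdentityShortCircuit {

    // Wraps any phrase-similarity function so that identical phrases score exactly 1.0.
    // 'base' stands in for a real metric; it is an assumption, not a WikiBrain type.
    public static ToDoubleBiFunction<String, String> withIdentityShortCircuit(
            ToDoubleBiFunction<String, String> base) {
        return (a, b) -> a.trim().equals(b.trim())
                ? 1.0                        // identity case: skip the metric and its normalizer
                : base.applyAsDouble(a, b);  // otherwise defer to the wrapped metric
    }

    public static void main(String[] args) {
        // Hypothetical stand-in for a normalized metric that never reaches 1.0.
        ToDoubleBiFunction<String, String> stub = (a, b) -> 0.93;
        ToDoubleBiFunction<String, String> wrapped = withIdentityShortCircuit(stub);

        System.out.println(wrapped.applyAsDouble("Albert Einstein", "Albert Einstein"));
        System.out.println(wrapped.applyAsDouble("Albert Einstein", "Isaac Newton"));
    }
}
```

In real code the stub would be replaced by a lambda that delegates to the metric, for example the `sr.similarity(a, b, true)` call from the question, with whatever exception handling that call requires.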

3) A LocalId *is* the "real" Wikipedia Id. We call it "local" because it's language-specific.
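To illustrate why the language matters: a page id only identifies an article together with the language edition it came from. The sketch below builds the standard `?curid=` link that Wikipedia accepts for page ids. The class and method names are hypothetical, the id 12345 is a placeholder rather than a real lookup result, and the code assumes you already have the language code and the numeric id in hand (it does not call WikiBrain).

```java
public class WikipediaPageUrl {

    // Builds a link to the article with the given page id in the given language edition.
    // The same numeric id in a different edition generally refers to a different page.
    public static String pageUrl(String langCode, int pageId) {
        return "https://" + langCode + ".wikipedia.org/?curid=" + pageId;
    }

    public static void main(String[] args) {
        // 12345 is a placeholder id, not the id of any particular article.
        System.out.println(pageUrl("simple", 12345));
        System.out.println(pageUrl("en", 12345));
    }
}
```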

-Shilad

Shilad W. Sen
Associate Professor
Mathematics, Statistics, and Computer Science Dept.
Macalester College
ss...@macalester.edu

http://www.shilad.com

https://www.linkedin.com/in/shilad

651-696-6273
