How to normalize similarity measurements (lch, wup, path, res, lin, jcn) between [0,1]?


AM

Mar 2, 2016, 10:47:57 AM
to Python Programming for Autodesk Maya

I am trying to calculate the semantic similarity between two words. I am using WordNet-based similarity measures, i.e. the Resnik measure (RES), the Lin measure (LIN), the Jiang-Conrath measure (JCN), and the Leacock-Chodorow measure (LCH), but the similarity values are not all in [0, 1]. Some measures give values between 0 and 1, while others give values greater than 1, whether the words are similar or different, so I need to normalize the similarity values.

Code example:

from nltk.corpus import wordnet as wn

from nltk.corpus import wordnet_ic

brown_ic = wordnet_ic.ic('ic-brown.dat')

s1 = wn.synsets("car")

s2 = wn.synsets("car")

wn.wup_similarity(s1[0], s2[0])

1.0

wn.lch_similarity(s1[0], s2[0])

3.6375861597263857

wn.path_similarity(s1[0], s2[0])

1.0

wn.jcn_similarity(s1[0], s2[0], brown_ic)

1e+300

wn.res_similarity(s1[0], s2[0], brown_ic)

7.591401417609093

wn.lin_similarity(s1[0], s2[0], brown_ic)

1.0

Adam Mechtley

Mar 2, 2016, 11:01:14 AM
to python_in...@googlegroups.com

--
You received this message because you are subscribed to the Google Groups "Python Programming for Autodesk Maya" group.
To unsubscribe from this group and stop receiving emails from it, send an email to python_inside_m...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/python_inside_maya/d7a23f0b-6fb1-436d-b89a-6ce30517f733%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

AM

Mar 2, 2016, 11:16:55 AM
to Python Programming for Autodesk Maya
Yes, but I didn't find any response.


Kurian O.S

Mar 2, 2016, 12:24:05 PM
to python_in...@googlegroups.com
from: 'AM' via Python Programming for Autodesk Maya <python_in...@googlegroups.com>
to: Python Programming for Autodesk Maya <python_in...@googlegroups.com>
date: Wed, Mar 2, 2016 at 7:47 AM

Actually, what's going on?




--
--:: Kurian ::--

Justin Israel

Mar 2, 2016, 1:37:33 PM
to python_in...@googlegroups.com

That Stack Overflow question is years old and already answered. What is going on, indeed?


AM

Mar 2, 2016, 1:46:54 PM
to Python Programming for Autodesk Maya

Could you look at the Stack Overflow answer quoted below? I couldn't apply it to my code. If you could apply it to my example, I would be thankful.
I don't know what the words w and u are, or what the difference is between M(w, u) and M(w, w).
If I assume that M(w, w) is wn.res_similarity(s1[0], s2[0], brown_ic), then what is MN(w, u)? I really couldn't understand that.

thanks.


How to normalize a single measure

Let's consider a single arbitrary similarity measure M and take an arbitrary word w.

Define m = M(w, w). Then m is the maximum possible value of M.

Let's define MN as a normalized measure M.

For any two words w, u you can compute MN(w, u) = M(w, u) / m.

It's easy to see that if M takes non-negative values, then MN takes values in [0, 1].
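As a rough illustration of the quoted rule, here is a minimal Python sketch. The names `normalize` and `toy_measure` are hypothetical and do not come from the answer or from NLTK. In the terms of the question, M could be something like `lambda a, b: wn.res_similarity(a, b, brown_ic)`, with w = s1[0] and u = s2[0]. The sketch assumes, as the answer does, that M(w, w) is the largest value M can take and that it is positive.

```python
def normalize(measure, w, u):
    """Return MN(w, u) = M(w, u) / M(w, w) for a non-negative measure M."""
    m = measure(w, w)  # self-similarity, assumed to be the maximum of M
    if m <= 0:
        raise ValueError("self-similarity must be positive to normalize")
    return measure(w, u) / m


def toy_measure(a, b):
    """Hypothetical stand-in for M: number of shared leading characters."""
    count = 0
    for x, y in zip(a, b):
        if x != y:
            break
        count += 1
    return count


print(normalize(toy_measure, "car", "car"))  # self-similarity maps to 1.0
print(normalize(toy_measure, "car", "cat"))  # 2 of 3 leading characters shared
```

Note that for jcn the self-similarity shown in the question is 1e+300, so dividing by it would push every other pair to roughly 0. A measure like that probably needs a different rescaling.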

How to normalize a measure combined from many measures

In order to compute your own measure F, combined from k different measures m_1, m_2, ..., m_k, first normalize each m_i independently using the above method, and then define:

alpha_1, alpha_2, ..., alpha_k

such that alpha_i denotes the weight of i-th measure.

All alphas must sum up to 1, i.e:

alpha_1 + alpha_2 + ... + alpha_k = 1

Then, to compute your own measure for w and u from the normalized m_i, you do:

F(w, u) = alpha_1 * m_1(w, u) + alpha_2 * m_2(w, u) + ... + alpha_k * m_k(w, u)

Each normalized m_i lies in [0, 1] and the non-negative weights sum to 1, so F also takes values in [0, 1].
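The weighted combination can be sketched in Python as follows. The function `combine` and the two toy measures are hypothetical names used only for illustration. The sketch assumes every measure passed in is already normalized to [0, 1], and it additionally assumes the weights are non-negative, which the answer does not state but which the [0, 1] bound relies on.

```python
def combine(normalized_measures, weights, w, u):
    """Return F(w, u) = sum of alpha_i * m_i(w, u) over normalized measures."""
    if len(normalized_measures) != len(weights):
        raise ValueError("need exactly one weight per measure")
    if any(a < 0 for a in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to 1")
    return sum(a * m(w, u) for a, m in zip(weights, normalized_measures))


# Two toy measures that already return values in [0, 1] (assumptions).
def exact_match(a, b):
    return 1.0 if a == b else 0.0


def same_first_letter(a, b):
    return 1.0 if a[:1] == b[:1] else 0.0


score = combine([exact_match, same_first_letter], [0.75, 0.25], "car", "cat")
print(score)  # 0.75 * 0.0 + 0.25 * 1.0 = 0.25
```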

Kurian O.S

Mar 2, 2016, 1:53:56 PM
to python_in...@googlegroups.com
This group is mainly for Maya-based Python questions, not really for NLTK kind of stuff, but someone may have some idea. The real question is: how are you using this group as your ID?


from: 'AM' via Python Programming for Autodesk Maya <python_in...@googlegroups.com>

How did it become your from address?




--
--:: Kurian ::--

AM

Mar 2, 2016, 2:10:41 PM
to Python Programming for Autodesk Maya
What should I do then? There is an nltk-users group; maybe someone there can solve it!


Justin Israel

Mar 2, 2016, 2:24:00 PM
to Python Programming for Autodesk Maya

The comments on that Stack Overflow question suggest that you need to know the maximum range of each similarity measurement. If you know that, you can normalise the values. So it's up to you to review the docs for each one.
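A minimal sketch of that idea in Python, dividing by a known upper bound and clamping the result. The helper `scale_to_unit` is a hypothetical name, and the lch bound below is simply the self-similarity score shown earlier in the thread, assumed here to be the maximum for noun synsets. Check the docs for each measure before relying on any such bound.

```python
def scale_to_unit(value, upper_bound):
    """Scale a non-negative score into [0, 1] using a known upper bound."""
    if upper_bound <= 0:
        raise ValueError("upper bound must be positive")
    # Clamp in case the assumed bound turns out to be too small.
    return max(0.0, min(value / upper_bound, 1.0))


# Assumption: the lch self-similarity from the thread is the maximum score.
LCH_NOUN_UPPER_BOUND = 3.6375861597263857

print(scale_to_unit(3.6375861597263857, LCH_NOUN_UPPER_BOUND))  # 1.0
```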

