Hi,
Sandra, Maciej - Thanks for all the input!
derivationally_related_forms() is definitely what I was looking for.
I did note the thing about multiple lemmas.
What I have eventually resorted to doing is:
(1) Use morphy(my_adj, 'a') to "clean" my adjective first (I will call
the cleaned adjective 'my_adj' as well).
(2) For all adjective synsets returned by synsets(my_adj, 'a'),
collect the lemmas with their full names, e.g. 'pure' becomes
'pure.a.01.pure'.
(3) Use derivationally_related_forms() on these lemmas, and retain the
ones that are nouns. So if derv_lemma is a lemma returned by
derivationally_related_forms(), I check whether derv_lemma.synset.pos
== 'n'.
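For anyone who wants to try this, the three steps above might be sketched
roughly as follows. This is only a sketch under assumptions: it uses the
method-style API of NLTK 3.x (lemma.synset().pos() rather than the
attribute access lemma.synset.pos I used above), noun_forms is a made-up
helper name, and wn is expected to be nltk.corpus.wordnet passed in by the
caller.

```python
def noun_forms(adjective, wn):
    """Return noun lemmas derivationally related to `adjective`.

    `wn` is assumed to be nltk.corpus.wordnet (NLTK 3.x, method-style API).
    `noun_forms` is a hypothetical helper name, not part of NLTK.
    """
    # Step 1: reduce the adjective to its base form; fall back to the
    # input if morphy() finds nothing.
    base = wn.morphy(adjective, 'a') or adjective

    nouns = []
    for synset in wn.synsets(base, 'a'):
        # Step 2: keep only the lemma(s) of this synset that spell the
        # adjective. The comparison ignores case because some WordNet
        # lemma names are capitalized.
        for lemma in synset.lemmas():
            if lemma.name().lower() != base.lower():
                continue
            # Step 3: keep the derivationally related forms that are nouns.
            for related in lemma.derivationally_related_forms():
                if related.synset().pos() == 'n':
                    nouns.append(related)
    return nouns
```

With the WordNet corpus installed, it could be called as
noun_forms('happy', wn) after "from nltk.corpus import wordnet as wn".
The result may contain repeats and may be empty.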
As Maciej pointed out, there are often multiple noun lemmas that you
end up with, and in quite a few cases these noun lemmas come from
different noun synsets too. For example:
Adjective: happy
Adjective synsets:
happy.a.01
felicitous.s.02
glad.s.02
happy.s.04
Lemma(s) found: 1
happiness (Lemma('happiness.n.01.happiness'))
Sense(s) found: 2
happiness.n.01 : state of well-being characterized by emotions ranging
from contentment to intense joy
happiness.n.02 : emotions experienced when in a state of well-being
In the above example, the reason there are 2 synsets but only one lemma
is that I do a set() on the list of lemmas to enumerate only distinct
lemmas.
And quite surprisingly, it turns out that:
>>> set([wn.lemma('happiness.n.01.happiness'), wn.lemma('happiness.n.01.happiness')])
set([Lemma('happiness.n.01.happiness')])
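If the goal is to keep lemmas from different synsets apart, one option is
to de-duplicate on an explicit (synset name, lemma name) key instead of
relying on how Lemma objects compare inside a set(). A minimal sketch,
assuming NLTK 3.x method-style accessors; distinct_lemmas is a made-up
helper name, not something NLTK provides:

```python
def distinct_lemmas(lemmas):
    """Drop repeated lemmas while keeping first-seen order.

    Two lemmas count as the same only if both the synset name and the
    lemma name match. Assumes NLTK 3.x accessors (lemma.synset().name(),
    lemma.name()); `distinct_lemmas` is a hypothetical helper.
    """
    seen = set()
    unique = []
    for lemma in lemmas:
        key = (lemma.synset().name(), lemma.name())
        if key not in seen:
            seen.add(key)
            unique.append(lemma)
    return unique
```

This way the same word in two different noun synsets is kept as two
entries, whatever equality the Lemma class itself implements.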
Then there are adjectives (these are tagged as such in my corpus) that
have no nominalizations, e.g. 'five', 'most', 'sparkling'.
Some missing nominalizations are surprising. One example my dataset
turned up was 'intimate', which didn't have a derivationally related
form that could be linked to 'intimacy' (or any noun).
Step (2) ran into a slight problem too, as there seems to be some
inconsistency with respect to case (or maybe it's my understanding) in
how lemmas are named.
While the full name of a lemma 'x' is supposed to be
full_synset_name.x (see
http://nltk.googlecode.com/svn/trunk/doc/api/nltk.corpus.reader.wordnet.Lemma-class.html),
I found the following exception:
synset('union.a.01') in synsets('federal','a') ---> True
synset('union.a.01').lemmas : [Lemma('union.s.01.Union'),
Lemma('union.s.01.Federal')]
In case of multiple choices, I choose the first one. Seems to work out
well - here's a list of 10 adjectives and corresponding nouns:
commercial: commerce
new: newness
indistinguishable: indistinguishability
genuine: genuineness
absolute: absoluteness
necessary: necessity
young: young
distinctive: distinctiveness
important: importance
technical: technicality
Thanks, again!