Here is where I am so far.
I tracked down a free French wordnet, WOLF, at http://alpage.inria.fr/~sagot/wolf.html
It is a large XML file, with a few caveats (the SENSE elements record
sources, not sense numbers). Any tips on how to interface with or
transform an XML wordnet so that the standard NLTK functions can be used?
I'm still a bit fuzzy about how NLTK hooks into the zipped (?) English
database, and of course about how WOLF is structured.
Here are some remarks from the site:
============
The WOLF is in the XML format used in the BalkaNet project. For now,
SENSE elements are filled with information on the sources thanks to
which the lexeme was found, and not with sense numbers.
For now, the WOLF and the Lefff are not mapped. In the following
months, Lefff entries should receive WOLF (i.e. PWN) synset ids.
============
Here is an example of the contents of the database. First in English
(a search on emphysema anywhere in the entry, not just as headword),
then the same search in French (emphysème):
<SYNSET><ID>ENG20-00006000-v</ID><POS>v</POS><SYNONYM></SYNONYM><ILR><TYPE>hypernym</TYPE>ENG20-00005679-v</ILR><DEF>cough spasmodically</DEF><USAGE>The patient with emphysema is hacking all day</USAGE><DOMAIN>biology</DOMAIN><SUMO>Breathing<TYPE>+</TYPE></SUMO></SYNSET>
<SYNSET><ID>ENG20-02603316-n</ID><POS>n</POS><SYNONYM></SYNONYM><ILR><TYPE>hypernym</TYPE>ENG20-02802248-n</ILR><ILR><TYPE>usage_domain</TYPE>ENG20-06425540-n</ILR><DEF>a bronchodilator (trade names Ventolin or Proventil) used for asthma and emphysema and other lung conditions; available in oral or inhalant forms; side effects are tachycardia and shakiness</DEF><DOMAIN>pharmacy</DOMAIN><SUMO>BiologicallyActiveSubstance<TYPE>+</TYPE></SUMO></SYNSET>
<SYNSET><ID>ENG20-02856490-a</ID><POS>a</POS><SYNONYM></SYNONYM><ILR><TYPE>derived</TYPE>ENG20-13341586-n</ILR><DEF>relating to or resembling or being emphysema</DEF><DOMAIN>medicine</DOMAIN><SUMO>DiseaseOrSyndrome<TYPE>+</TYPE></SUMO></SYNSET>
<SYNSET><ID>ENG20-03612030-n</ID><POS>n</POS><SYNONYM></SYNONYM><ILR><TYPE>hypernym</TYPE>ENG20-02802248-n</ILR><ILR><TYPE>usage_domain</TYPE>ENG20-06425540-n</ILR><DEF>a bronchodilator (trade name Alupent) used to treat asthma and emphysema and other lung conditions; available in oral or inhalant forms; side effects include tachycardia and shakiness</DEF><DOMAIN>pharmacy</DOMAIN><SUMO>BiologicallyActiveSubstance<TYPE>+</TYPE></SUMO></SYNSET>
<SYNSET><ID>ENG20-13455243-n</ID><POS>n</POS><SYNONYM></SYNONYM><ILR><TYPE>hypernym</TYPE>ENG20-13448422-n</ILR><DEF>a chronic emphysema of the horse that causes difficult expiration and heaving of the flanks</DEF><DOMAIN>medicine</DOMAIN><SUMO>DiseaseOrSyndrome<TYPE>+</TYPE></SUMO></SYNSET>
<SYNSET><ID>ENG20-13556463-n</ID><POS>n</POS><SYNONYM></SYNONYM><ILR><TYPE>hypernym</TYPE>ENG20-13556330-n</ILR><DEF>form of dyspnea in which the person can breathe comfortably only when standing or sitting erect; associated with asthma and emphysema and angina pectoris</DEF></SYNSET>
The above are English entries; the following is a set of French entries:
<SYNSET><ID>ENG20-02802248-n</ID><POS>n</POS><SYNONYM><LITERAL>bronchodilatateur<SENSE>0/1:enwikipedia</SENSE></LITERAL></SYNONYM><ILR><TYPE>hypernym</TYPE>ENG20-03600430-n</ILR><DEF>médicament destiné à traiter ou à prévenir la bronchoconstriction ou bronchospasme, dans des maladies telles que l'asthme, mais aussi l'emphysème, la pneumonie et les bronchites</DEF><DOMAIN>pharmacy</DOMAIN><SUMO>BiologicallyActiveSubstance<TYPE>+</TYPE></SUMO></SYNSET>
<SYNSET><ID>ENG20-13341586-n</ID><POS>n</POS><SYNONYM><LITERAL>emphysème<SENSE>0/2:enwikipedia,frwiktionary</SENSE></LITERAL></SYNONYM><ILR><TYPE>hypernym</TYPE>ENG20-13339337-n</ILR><DEF>Au sens propre, l'emphysème est un terme d'anatomopathologie désignant la destruction des voies aériennes distales</DEF><DOMAIN>medicine</DOMAIN><SUMO>DiseaseOrSyndrome<TYPE>+</TYPE></SUMO></SYNSET>
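For what it's worth, here is a rough sketch of how entries like the ones above could be read with Python's standard xml.etree.ElementTree and turned into a lemma-to-synset lookup. It is only a starting point, not an NLTK corpus reader. Assumptions on my part: the root element name (WN) is a guess, the tags are intact (not broken by mail wrapping), and the helper names parse_synset, iter_synsets and build_lemma_index are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET


def parse_synset(elem):
    """Turn one <SYNSET> element into a plain dict."""
    # <LITERAL>word<SENSE>source info</SENSE></LITERAL>: the word is the
    # text that precedes the <SENSE> child.
    literals = [(lit.text or "").strip() for lit in elem.iter("LITERAL")]
    # <ILR><TYPE>hypernym</TYPE>ENG20-...</ILR>: the target id is the text
    # that follows the <TYPE> child, i.e. its tail.
    relations = []
    for ilr in elem.findall("ILR"):
        rel_type = ilr.find("TYPE")
        if rel_type is not None:
            relations.append((rel_type.text, (rel_type.tail or "").strip()))
    return {
        "id": elem.findtext("ID"),
        "pos": elem.findtext("POS"),
        "literals": literals,
        "relations": relations,
        "definition": elem.findtext("DEF"),
    }


def iter_synsets(source):
    """Yield synset dicts from a file name or a binary file object."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "SYNSET":
            yield parse_synset(elem)
            elem.clear()  # drop the element's children; the real file is large


def build_lemma_index(synsets):
    """Map each French literal to the list of synset ids that contain it."""
    index = {}
    for syn in synsets:
        for lemma in syn["literals"]:
            index.setdefault(lemma, []).append(syn["id"])
    return index


# One entry copied from the sample above; the <WN> root is an assumption.
SAMPLE = (
    "<WN><SYNSET><ID>ENG20-13341586-n</ID><POS>n</POS>"
    "<SYNONYM><LITERAL>emphysème"
    "<SENSE>0/2:enwikipedia,frwiktionary</SENSE></LITERAL></SYNONYM>"
    "<ILR><TYPE>hypernym</TYPE>ENG20-13339337-n</ILR>"
    "<DEF>Au sens propre, l'emphysème est un terme d'anatomopathologie "
    "désignant la destruction des voies aériennes distales</DEF>"
    "</SYNSET></WN>"
)

synsets = list(iter_synsets(io.BytesIO(SAMPLE.encode("utf-8"))))
index = build_lemma_index(synsets)
print(index["emphysème"])
print(synsets[0]["relations"])
```

For the real file, iter_synsets can be given the path to the downloaded XML instead of the BytesIO object. The ids look like Princeton WordNet 2.0 offsets plus a part-of-speech letter, but that is my reading of the sample, so check it before mapping them onto whatever WordNet version NLTK ships.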
Cheers,
Jordan
--
--------------------
Jordan Boyd-Graber
3155 AV Williams
University of Maryland
College Park, MD 20742
Voice: 920.524.9464
j...@umiacs.umd.edu
http://umiacs.umd.edu/~jbg
--------------------
"In theory, there is no difference between theory and practice. But,
in practice, there is."
- Jan L.A. van de Snepscheut
B
I assume it is http://nlpwww.nict.go.jp/wn-ja/index.en.html (which has an
interesting article on bootstrapping new wordnets from other-language
wordnets). I found nothing about how they designed their SQLite
database, though I did find their Python interface: http://gist.github.com/79057
Perhaps they simply used the format listed on the main wn site:
http://wnsql.sourceforge.net/ ?
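Failing any documentation, the layout can be read straight out of the database file. Below is a small sketch using Python's standard sqlite3 module to list every table and its columns. The describe_schema helper and the demo table are made up for illustration; with the real download you would pass its file path to sqlite3.connect instead of ":memory:".

```python
import sqlite3


def describe_schema(conn):
    """Return {table_name: [column names]} for every table in the database."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    schema = {}
    for table in tables:
        # PRAGMA arguments cannot be bound as parameters, so quote the name.
        quoted = '"' + table.replace('"', '""') + '"'
        # Each row is (cid, name, type, notnull, dflt_value, pk).
        cols = conn.execute(f"PRAGMA table_info({quoted})").fetchall()
        schema[table] = [col[1] for col in cols]
    return schema


# Demonstration on a throwaway in-memory database with a made-up table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE word (wordid INTEGER PRIMARY KEY, lemma TEXT)")
schema = describe_schema(conn)
print(schema)
```

Comparing that output with the tables described on the wnsql site would show quickly whether the two formats match.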
All that does give me more grist for my mill than I can probably
handle!
Thanks!
B