AWN from nltk


Meriem Khoudja

Jul 28, 2015, 5:49:38 PM
to nltk-users
Hello,

Is it possible to import the Arabic WordNet from NLTK? If yes, how?

Francis Bond

Jul 28, 2015, 7:59:37 PM
to nltk-...@googlegroups.com
For a recent NLTK (ver 3+) it is (using the Open Multilingual Wordnet
formatted version of the Arabic Wordnet, with code developed by
students at NTU):

>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('bank')[0].lemma_names('arb')
['ضِفَّة']
>>> wn.synsets('ضِفَّة', lang='arb')[0].hypernyms()[0].lemma_names(lang='arb')
['اِنْحِدَار', 'مَيْل', 'مُنْحَدَر']
>>> wn.synsets('ضِفَّة', lang='arb')[0].hypernyms()[0].lemma_names()
['slope', 'incline', 'side']

Just add lang='arb' to get Arabic.

I hope this helps,

Francis
http://compling.hss.ntu.edu.sg/omw/



--
Francis Bond <http://www3.ntu.edu.sg/home/fcbond/>
Division of Linguistics and Multilingual Studies
Nanyang Technological University

Francis Bond

Feb 14, 2016, 4:03:36 AM
to arz...@gmail.com, nltk-users, Mrini Khalil, yasserr...@research.emi.ac.ma
You could try stripping them using the araby module:
https://pythonhosted.org/PyArabic/pyarabic.araby-module.html

I am CCing some experts who may have better answers (Khalil and Yassir).
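
Stripping the query alone may not be enough, since the lemmas in the wordnet data are themselves vocalized. Here is a minimal sketch of one way around that, using only the standard library: strip the marks from every lemma once and build an index from the bare form back to the vocalized forms. The names `strip_tashkil`, `build_index` and `candidates` are hypothetical helpers for this illustration, not part of NLTK or PyArabic, and the range U+064B to U+0652 is my assumption about which marks to remove.

```python
from collections import defaultdict

# U+064B..U+0652: fathatan, dammatan, kasratan, fatha, damma, kasra,
# shadda, sukun. Assumption: these eight marks are what should be removed.
TASHKIL = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_tashkil(text):
    """Return text with the eight marks above removed."""
    return "".join(ch for ch in text if ch not in TASHKIL)


def build_index(vocalized_lemmas):
    """Map each bare (unvocalized) form to the set of vocalized lemmas."""
    index = defaultdict(set)
    for lemma in vocalized_lemmas:
        index[strip_tashkil(lemma)].add(lemma)
    return index


def candidates(query, index):
    """Return the vocalized lemmas whose bare form matches the query."""
    return sorted(index.get(strip_tashkil(query), ()))


# Two sample lemmas written as escapes so the exact code points are visible.
# In practice the list would come from the wordnet data itself (assumption).
lemmas = [
    "\u0636\u0650\u0641\u0651\u0629",  # dad, kasra, feh, shadda, teh marbuta
    "\u0645\u064a\u0652\u0644",        # meem, yeh, sukun, lam
]
index = build_index(lemmas)

for form in candidates("\u0636\u0641\u0629", index):  # bare query
    print(form)
```

Each vocalized form returned by `candidates` could then be passed to `wn.synsets(form, lang='arb')`. Several lemmas can share one bare form, so expect more than one candidate for some queries.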

On Sun, Feb 14, 2016 at 2:25 PM, <arz...@gmail.com> wrote:
> Hi there,
>
> Is it possible to get the synsets of a word that has no tashkil/harakat?
>
> Thanks a lot.
> ARZ

arz...@gmail.com

Feb 16, 2016, 8:55:30 AM
to nltk-users, bo...@ieee.org
Hi there,

Is it possible to get the synsets of a word that has no tashkil/harakat?

Thanks a lot.
ARZ

On Wednesday, July 29, 2015 at 4:29:37 AM UTC+4:30, Francis Bond wrote:

Waheeb Ahmed

Sep 5, 2016, 5:12:53 AM
to nltk-users, bo...@ieee.org, arz...@gmail.com
Hi,
I am using NLTK version 3.2.1. When I type this:
>>> wn.synsets('bank')[0].lemma_names('arb')
I get:
['ضِفَّة']
But when I type this:
>>> wn.synsets('ضِفَّة', lang='arb')[0].hypernyms()[0].lemma_names(lang='arb')
I get this error:
"IndexError: list index out of range"
It should give this:
['اِنْحِدَار', 'مَيْل', 'مُنْحَدَر']
So why do I get this error?

Francis Bond

Sep 8, 2016, 7:10:06 AM
to Waheeb Ahmed, nltk-users, arz...@gmail.com
G'day,

I think it is an encoding issue in the Python interpreter.

If we set:
>>> ba = wn.synsets('bank')[0].lemma_names('arb')[0]
>>> ba
'ضِفّة'
>>> wn.synsets(ba, lang='arb')[0].hypernyms()[0].lemma_names(lang='arb')
['اِنْحِدار', 'مُنْحدر', 'ميْل']
>>> wn.synsets('ضِفّة', lang='arb')[0].hypernyms()[0].lemma_names(lang='arb')
['اِنْحِدار', 'مُنْحدر', 'ميْل']


But if I cut and paste the word from the earlier message, I get:
>>> wn.synsets('ضِفَّة', lang='arb')
[]

Even though it looks the same, it is not: the second character has an
extra line on top (sorry, I don't know the right name for it). So
somewhere the cutting and pasting is causing a problem. I don't think
this is an NLTK issue.
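
A quick way to see where two look-alike strings differ is to print the code points of each. The sketch below uses only the standard library. The two escape sequences are my assumption about what the stored and the pasted spellings contain (the pasted one carrying an extra FATHA on the second letter), so substitute the strings from your own session. `describe` is a hypothetical helper name.

```python
import unicodedata


def describe(text):
    """Return one (code point, Unicode name) pair per character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN")) for ch in text]


# Assumed contents of the two spellings; replace with your own strings.
stored = "\u0636\u0650\u0641\u0651\u0629"        # what the lookup returned
pasted = "\u0636\u0650\u0641\u064e\u0651\u0629"  # what was cut and pasted

for label, word in (("stored", stored), ("pasted", pasted)):
    print(label, "length:", len(word))
    for code_point, name in describe(word):
        print("   ", code_point, name)

# Characters that occur only in the pasted string.
extra = [ch for ch in pasted if ch not in stored]
print("only in pasted:", [unicodedata.name(ch) for ch in extra])
```

If the lengths differ, the strings cannot compare equal, however similar they look on screen, and the lookup will come back empty.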