NLTK and Greek

anarchos78

May 5, 2015, 7:51:23 AM
to nltk-...@googlegroups.com
Hi,

I would like to know whether anyone has experience using NLTK with Greek corpora.
What I would like to do is extract specific names and places (named-entity recognition), based on custom rules.
Is NLTK the appropriate tool for such a task, or do I have to look elsewhere (OpenNLP)?
Are there any tutorials on how to train NLTK for Greek?

Thanks in advance

Alexis Dimitriadis

May 5, 2015, 9:46:44 AM
to nltk-...@googlegroups.com
NLTK is perfectly well suited to working with Greek. It doesn't come with any resources trained on Greek, but it provides interfaces for loading and using any resources you do have, and for using them to train your own tools.

You don't need Greek-specific tutorials: you can train NLTK tools for Greek just like you'd train them for any European language that contains accented characters. Specify the appropriate encoding when reading in the corpus, then proceed just as you would for English. Take a look at the NLTK book's materials on classification, and at the source for the named entity recognizer (which is statistical, not rule-based).
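
As a rough illustration of the two steps above (read the corpus with an explicit encoding, then train as you would for English), here is a minimal Python 3 sketch. Everything specific in it is an assumption made for the example: the three-sentence toy corpus, the `token/LABEL` file format, the ISO-8859-7 encoding, the `PERSON`/`PLACE`/`O` label set, and the helper names `token_features` and `read_tagged_sentences`. It uses a plain `nltk.NaiveBayesClassifier` over per-token features, which is not how NLTK's built-in named entity recognizer works; it only shows the general shape of the workflow.

```python
import os
import tempfile

import nltk
from nltk.tag import str2tuple
from nltk.tokenize import wordpunct_tokenize

# Hypothetical toy corpus in "token/LABEL" format, invented for this example.
# Real training data would need to be far larger.
TOY_CORPUS = (
    "Ο/O Γιάννης/PERSON ζει/O στην/O Αθήνα/PLACE ./O\n"
    "Η/O Μαρία/PERSON πήγε/O στην/O Πάτρα/PLACE ./O\n"
    "Ο/O Νίκος/PERSON μένει/O στην/O Λάρισα/PLACE ./O\n"
)


def token_features(tokens, i):
    """Illustrative features for the token at position i (not NLTK's own)."""
    word = tokens[i]
    return {
        "suffix2": word[-2:],
        "is_capitalized": word[:1].isupper(),
        "prev": tokens[i - 1].lower() if i > 0 else "<START>",
    }


def read_tagged_sentences(path, encoding):
    """Read one sentence per line, decoding with the given encoding."""
    with open(path, encoding=encoding) as handle:
        return [
            [str2tuple(item) for item in line.split()]
            for line in handle
            if line.strip()
        ]


# Write the toy corpus as ISO-8859-7 (a legacy Greek encoding), then read it
# back with the same encoding. For a UTF-8 corpus, pass "utf-8" instead.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_greek.txt")
    with open(path, "w", encoding="iso8859_7") as handle:
        handle.write(TOY_CORPUS)
    sentences = read_tagged_sentences(path, encoding="iso8859_7")

# Build (features, label) pairs, exactly as one would for English text.
train_set = []
for sentence in sentences:
    tokens = [tok for tok, _ in sentence]
    for i, (_, label) in enumerate(sentence):
        train_set.append((token_features(tokens, i), label))

classifier = nltk.NaiveBayesClassifier.train(train_set)

# Label the tokens of an unseen sentence. wordpunct_tokenize is regex-based
# and needs no downloaded models.
new_tokens = wordpunct_tokenize("Η Ελένη ζει στην Κόρινθο.")
predicted = [
    (tok, classifier.classify(token_features(new_tokens, i)))
    for i, tok in enumerate(new_tokens)
]
print(predicted)
```

With so little data the predictions are not meaningful; the point is that nothing in the pipeline is specific to English once the text has been decoded correctly.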

Alexis


Dr. Alexis Dimitriadis | Assistant Professor and Senior Research Fellow | Utrecht Institute of Linguistics OTS | Utrecht University | Trans 10, 3512 JK Utrecht, room 2.33 | +31 30 253 65 68 | a.dimi...@uu.nl | www.hum.uu.nl/medewerkers/a.dimitriadis

brother rain

Jun 2, 2015, 10:57:41 PM
to nltk-...@googlegroups.com, rigasath...@gmail.com
Hi anarchos78,

I want to do exactly the same thing with Vietnamese. Are you still working with NLTK? How far did you get?