Using NLTK for an Italian corpus

Paolo Eusebi

Feb 13, 2017, 2:49:52 PM
to nltk-users
Hi everybody,
is there a way to work with an Italian corpus in NLTK?
I mostly need to extract features such as tokens and part-of-speech tags.

Dimitriadis, A. (Alexis)

Feb 14, 2017, 4:44:28 AM
to nltk-...@googlegroups.com
Hi Paolo,

I believe punctuation conventions in Italian are close enough to English that you should be able to use the default NLTK tokenizers on Italian text as they are. Part-of-speech tagging is language-specific, though, so you will need either a third-party tagger for Italian or one you train yourself on a POS-tagged Italian corpus. NLTK provides numerous tagger and classifier classes that you can train on your own data; the NLTK book explains how.
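
To make the two steps concrete, here is a minimal sketch of how they might fit together, not a recommended pipeline. It assumes NLTK is installed, tokenizes with the regex-based wordpunct_tokenize (which needs no downloaded models), and trains a UnigramTagger with a DefaultTagger backoff. The training sentences, the tag names and the helper functions build_tagger and tag_text are invented for illustration; a real tagger would be trained on a proper POS-tagged Italian corpus.

```python
from nltk.tag import DefaultTagger, UnigramTagger
from nltk.tokenize import wordpunct_tokenize

# Hypothetical toy training data: each sentence is a list of (token, tag)
# pairs. The tag names are illustrative, not taken from any real tagset.
TRAIN_SENTS = [
    [("Il", "DET"), ("gatto", "NOUN"), ("dorme", "VERB"), (".", "PUNCT")],
    [("La", "DET"), ("ragazza", "NOUN"), ("legge", "VERB"),
     ("un", "DET"), ("libro", "NOUN"), (".", "PUNCT")],
]


def build_tagger(train_sents):
    """Train a unigram tagger that falls back to NOUN for unseen words."""
    backoff = DefaultTagger("NOUN")
    return UnigramTagger(train_sents, backoff=backoff)


def tag_text(tagger, text):
    """Split text into tokens with a language-neutral regex, then tag them."""
    tokens = wordpunct_tokenize(text)
    return tagger.tag(tokens)


tagger = build_tagger(TRAIN_SENTS)
tagged = tag_text(tagger, "Il gatto legge un libro.")
print(tagged)
```

With only two training sentences the tagger just memorizes the words it has seen and labels everything else NOUN, so the quality depends entirely on the corpus you train on. For sentence splitting, NLTK's Punkt data should also include an Italian model, selectable with sent_tokenize(text, language="italian") once the Punkt resources are downloaded.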

Alexis

Dr. Alexis Dimitriadis | Assistant Professor and Senior Research Fellow | Utrecht Institute of Linguistics OTS | Utrecht University | Trans 10, 3512 JK Utrecht, room 2.33 | +31 30 253 65 68 | a.dimi...@uu.nl | www.hum.uu.nl/medewerkers/a.dimitriadis
