Using NLTK for an Italian corpus

Paolo Eusebi

Feb 13, 2017, 2:49:52 PM

to nltk-users

Hi everybody,

Is there a way to work with an Italian corpus in NLTK?

I mostly need to extract features such as tokens and part-of-speech (POS) tags.

Dimitriadis, A. (Alexis)

Feb 14, 2017, 4:44:28 AM

to nltk-...@googlegroups.com

Hi Paolo,

I believe punctuation rules for Italian are close enough to English that you should be able to use the default NLTK tokenizers on Italian text as they are. Part-of-speech tagging is language-specific, so you will need either a third-party tagger for Italian or one that you train yourself on a POS-tagged Italian corpus. NLTK provides numerous tagger and classifier classes that you can train on your own data; the NLTK book explains how.

Alexis
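
A minimal sketch of the approach described above, assuming you already have POS-tagged Italian sentences to train on. The two training sentences and the tag labels (DET, NOUN, VERB, PUNCT) are made up for illustration and are not taken from any real corpus; in practice you would load a proper tagged corpus instead. `wordpunct_tokenize` is used here because it is regex-based and needs no downloaded data. `word_tokenize` also accepts a `language` argument, but it requires the Punkt models to be installed first.

```python
from nltk.tokenize import wordpunct_tokenize
from nltk.tag import DefaultTagger, UnigramTagger

# Hypothetical toy training data: a list of sentences, each a list of
# (token, tag) pairs. Replace with a real POS-tagged Italian corpus.
train_sents = [
    [("il", "DET"), ("gatto", "NOUN"), ("dorme", "VERB"), (".", "PUNCT")],
    [("la", "DET"), ("ragazza", "NOUN"), ("legge", "VERB"),
     ("il", "DET"), ("libro", "NOUN"), (".", "PUNCT")],
]

# Words never seen in training fall back to a default tag.
backoff = DefaultTagger("NOUN")
tagger = UnigramTagger(train_sents, backoff=backoff)

text = "Il gatto legge il libro."

# Regex-based tokenizer: splits on whitespace and separates punctuation.
tokens = wordpunct_tokenize(text)

# The toy training data is lowercase, so lowercase the tokens before tagging.
tagged = tagger.tag([t.lower() for t in tokens])

for word, tag in tagged:
    print(word, tag)
```

A unigram tagger only memorises the most frequent tag per word, so it is a starting point rather than a serious Italian tagger. Note also that `wordpunct_tokenize` splits elided forms such as "l'albero" at the apostrophe, which may or may not be what you want.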
