New Stemmer for German


Leonie Weissweiler

Jun 8, 2018, 8:03:57 PM
to nltk-dev

Hi everyone,

In Pull Request #2039, I've added a new stemmer for German, which I developed. You can read about it in this paper:

Leonie Weißweiler, Alexander Fraser (2017). Developing a Stemmer for German Based on a Comparative Analysis of Publicly Available Stemmers. In Proceedings of the German Society for Computational Linguistics and Language Technology (GSCL).

I'm new here, but as I understand it, this counts as a new feature, so I'm posting about it here.

The paper compares four existing German stemmers with CISTEM, the new one. We found that CISTEM achieved better stemming performance on two automatically compiled gold standards and had a dramatically shorter runtime. As the Snowball stemmer is the only German stemmer available through NLTK at the moment, I think CISTEM would be a valuable addition for programmers who want to stem German text. It also includes a segmenter, which behaves like the stemmer but additionally returns the stripped suffix.
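
To make the stemmer/segmenter relationship concrete, here is a minimal toy sketch. It is not the CISTEM algorithm and not the interface from the pull request: the suffix list, the minimum stem length, and the names `toy_segment` and `toy_stem` are all hypothetical, chosen only to show how a segmenter can return the stripped suffix while the stemmer returns just the stem.

```python
# Toy illustration only: these suffixes are a hypothetical, simplified list,
# not the actual CISTEM rules from the paper.
TOY_SUFFIXES = ("ern", "em", "en", "er", "es", "e", "s", "n")
MIN_STEM_LENGTH = 3  # assumed lower bound so short words are not over-stripped


def toy_segment(word):
    """Return (stem, stripped_suffix) for a lowercased copy of the word."""
    word = word.lower()
    for suffix in TOY_SUFFIXES:  # longer suffixes are listed first
        if word.endswith(suffix) and len(word) - len(suffix) >= MIN_STEM_LENGTH:
            return word[: -len(suffix)], suffix
    return word, ""  # nothing stripped


def toy_stem(word):
    """Return only the stem, discarding the suffix the segmenter reports."""
    return toy_segment(word)[0]


print(toy_segment("Kindern"))  # ('kind', 'ern')
print(toy_stem("Kindern"))     # kind
```

Defining the stemmer in terms of the segmenter keeps the two consistent: any change to the stripping rules shows up in both results.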
