mwetoolkit 1.0 is finally here!


carlinho...@gmail.com

Apr 17, 2015, 9:26:04 AM
to mweto...@googlegroups.com
Dear all,

We are very happy to announce a fresh new release of the mwetoolkit: version 1.0!

The mwetoolkit is a set of Python scripts for processing corpora and automatically extracting multiword expressions (MWEs). Even though it focuses on multiword expressions, the tool is quite complete and can be useful in any corpus-based study in computational linguistics. It is well suited to advanced searches, lexicon extraction and filtering on POS-tagged and/or dependency-parsed corpora, independently of language, domain, MWE type, etc.

After some months of instability caused by major modifications and improvements, you can finally download and use a much better version with new features such as:

- Full support for many file formats, including CoNLL, Moses, plain text, XML, HTML, CSV, ARFF, gzip, etc.
- Improved documentation, including a new online tutorial, descriptions of the accepted file types, etc.
- More powerful regexp extraction patterns, including negation, repetition, alternatives and several match modes (longest, shortest, overlap control)
- Fully functional token-based annotation, using either patterns or lexicon projection
- Numerous bug fixes and corrections

Thanks a lot to all the anonymous and known users who reported bugs and suggested improvements. By the way, we have just created a public group for questions and announcements; you can subscribe and share your questions and comments at: https://groups.google.com/d/forum/mwetoolkit

We have many more improvements and features under development right now; release 1.1 will be awesome!

Enjoy mwetoolkit 1.0 :-)

Carlos and Silvio