Dear all,
We are very happy to announce a new release of the mwetoolkit: version 1.0.
The mwetoolkit is a set of Python scripts for processing corpora and automatically extracting multiword expressions (MWEs). Although it focuses on multiword expressions, the tool is quite complete and can be useful in any corpus-based study in computational linguistics. It is well suited to advanced searches, lexicon extraction and filtering on POS-tagged and/or dependency-parsed corpora, independently of language, domain, MWE type, etc.
After some months of instability due to important modifications and improvements, you can finally download and use a much better version with additional features such as:
- full support for many file formats, including CONLL, Moses, plain text, XML, HTML, CSV, ARFF, gzip, etc.
- more powerful regexp extraction patterns including negation, repetition, alternatives and several match modes (longest, shortest, overlap control)
- fully functional token-based annotation based on patterns or on lexicon projection
- numerous bug fixes and corrections
Thanks a lot to all the anonymous and known users who reported bugs and suggested improvements. By the way, we have just created a public group for questions and announcements. You can subscribe and share your questions and comments here:
https://groups.google.com/d/forum/mwetoolkit
We have many more improvements and features in development right now; release 1.1 will be awesome!
Enjoy mwetoolkit 1.0 :-)
Carlos and Silvio