Support of other languages

behzadr...@gmail.com

Jun 9, 2016, 8:52:19 AM

to JATE2

Hi pals,

I am working with JATE1.1. I would like to know whether there is any way to make the program support keyword extraction for other languages, such as Persian or Arabic.

Thanks in advance!

Yours Sincerely,

Behzad

Jie Gao

Jun 10, 2016, 12:52:14 PM

to JATE2, behzadr...@gmail.com

Hi,

As Ziqi has already replied, you can try JATE2.0, which is built on the Solr framework and can therefore process large numbers of documents.

JATE2.0 is a language-independent tool, but it needs language-dependent components to work with your language, typically a tokeniser and a part-of-speech (PoS) tagger.

We implemented an OpenNLP tokeniser and PoS tagger that work with Solr as plugins. So you can either train a tokenisation model for Persian yourself (see the example at https://github.com/rfarahmand/PersianPoSTagger) or use a pre-trained one.

If you have more advanced knowledge of Solr, you can also develop your own tokeniser/PoS tagger Solr plugin (e.g., using the Stanford parser or a universal tagger) to work within JATE2.0.

For a language-independent candidate extraction method, you can try the n-gram based approach.
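
To illustrate the general idea of n-gram based candidate extraction, here is a minimal Python sketch. It is not JATE2.0's implementation, and the function names (`tokenize`, `ngram_candidates`) and parameters (`max_n`, `min_freq`) are hypothetical, chosen only for this example. It assumes whitespace-separated text, strips punctuation from token edges using Unicode categories (so it also handles Persian/Arabic punctuation), and keeps every n-gram that occurs at least `min_freq` times. N-grams that span sentence boundaries are not filtered out.

```python
import unicodedata
from collections import Counter


def tokenize(text):
    """Split on whitespace and strip punctuation from both ends of each token.

    No language-specific model is used; a character counts as punctuation
    if its Unicode general category starts with "P".
    """
    tokens = []
    for raw in text.split():
        start, end = 0, len(raw)
        while start < end and unicodedata.category(raw[start]).startswith("P"):
            start += 1
        while end > start and unicodedata.category(raw[end - 1]).startswith("P"):
            end -= 1
        if start < end:
            tokens.append(raw[start:end].lower())
    return tokens


def ngram_candidates(text, max_n=3, min_freq=2):
    """Return {ngram: count} for all 1..max_n word n-grams seen >= min_freq times."""
    tokens = tokenize(text)
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return {gram: c for gram, c in counts.items() if c >= min_freq}


if __name__ == "__main__":
    sample = "Term extraction is fun. Term extraction works."
    for gram, count in sorted(ngram_candidates(sample, max_n=2).items()):
        print(gram, count)
```

In practice the raw frequency filter would be replaced by a proper term-ranking score; the sketch only shows how candidates can be generated without a PoS tagger.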

You can have a look at our paper for an overview. The JATE2.0 wiki page also contains enough information for a quick start with JATE2.0. We are still working on a complete version of the wiki.

Thanks for your interest. Please feel free to ask if you need any help with the set-up.