DKPro WSD usage question

Shabnam Tafreshi

May 18, 2015, 6:07:26 AM

to dkpro-w...@googlegroups.com

Would it be possible to use your tool to disambiguate a number of word tokens in a blog dataset? My dataset is raw and has no WSD annotations. Below is an example of what I need:

Let's consider the following input:

Sentence: I would like to have a WSD tool with high accuracy.

token: like, pos: verb

Expected output:

pos: verb, sense: 02, synonym/meaning: wish

NLTK gives the following output:

Synset('wish.v.02')

If your tool is capable of producing the above output (or something similar), is there a tutorial that walks through the required steps?

I appreciate your help.

Best,

Shabnam

Tristan Miller

May 18, 2015, 6:16:43 AM

to dkpro-w...@googlegroups.com

Dear Shabnam,

Yes, DKPro WSD can do this. You'll need to write three things:

1) A small collection reader class to read in your raw text corpus and
apply WSDItem annotations to the words you want to be sense-annotated.

2) Another small class to print out the WSDResult annotations at the end
of the pipeline.

3) A pipeline which invokes your collection reader, then a POS tagger
and lemmatizer of your choice (several are included in DKPro Core), then
a WSD algorithm of your choice (several are included in DKPro WSD), and
then finally your output class.
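The roles of these three pieces can be illustrated with a small, library-free Java sketch. It is not DKPro WSD code and uses none of its API. The `Item` and `Result` records are hypothetical stand-ins for the WSDItem and WSDResult annotations. The POS is supplied by hand instead of by a tagger and lemmatizer. The two-sense inventory for "like" is invented (it is not WordNet data). The disambiguation step uses simplified Lesk (pick the sense whose gloss shares the most words with the sentence) purely as an example algorithm. In an actual DKPro pipeline each role would be a separate UIMA component.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

public class WsdPipelineSketch {

    // Hypothetical stand-in for a WSDItem annotation: a token to be disambiguated.
    record Item(String token, String pos) {}

    // Hypothetical stand-in for a WSDResult annotation: the sense chosen for an Item.
    record Result(Item item, String senseId, String gloss) {}

    // Invented toy sense inventory (NOT WordNet): sense id -> gloss, in priority order.
    static final Map<String, String> LIKE_VERB_SENSES = new LinkedHashMap<>();
    static {
        LIKE_VERB_SENSES.put("like#toy-1", "be fond of or find enjoyable");
        LIKE_VERB_SENSES.put("like#toy-2", "wish or want to have something");
    }

    // Naive tokenizer: lowercase, then split on anything that is not a letter.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^a-z]+"))
                .filter(s -> !s.isEmpty())
                .toList();
    }

    // Reader role: mark the requested token as an item to disambiguate.
    // (The tagger/lemmatizer role is skipped here; the POS is passed in by hand.)
    static Item markItem(String sentence, String target, String pos) {
        if (!tokenize(sentence).contains(target)) {
            throw new IllegalArgumentException("target not in sentence: " + target);
        }
        return new Item(target, pos);
    }

    // WSD algorithm role: simplified Lesk. Count gloss words that also occur in the
    // sentence context; the highest count wins, and ties go to the earlier sense.
    static Result simplifiedLesk(String sentence, Item item, Map<String, String> senses) {
        Set<String> context = new HashSet<>(tokenize(sentence));
        context.remove(item.token()); // ignore the target word itself
        String bestId = null;
        int bestOverlap = -1;
        for (Map.Entry<String, String> e : senses.entrySet()) {
            Set<String> overlap = new HashSet<>(tokenize(e.getValue()));
            overlap.retainAll(context);
            if (overlap.size() > bestOverlap) {
                bestOverlap = overlap.size();
                bestId = e.getKey();
            }
        }
        return new Result(item, bestId, senses.get(bestId));
    }

    // Output role: format the result in the style requested in the question.
    static String format(Result r) {
        return "token: " + r.item().token() + ", pos: " + r.item().pos()
                + ", sense: " + r.senseId() + ", gloss: " + r.gloss();
    }

    public static void main(String[] args) {
        String sentence = "I would like to have a WSD tool with high accuracy.";
        Item item = markItem(sentence, "like", "verb");
        Result result = simplifiedLesk(sentence, item, LIKE_VERB_SENSES);
        System.out.println(format(result));
    }
}
```

Running `main` prints `token: like, pos: verb, sense: like#toy-2, gloss: wish or want to have something`, because the second gloss shares "to" and "have" with the context while the first shares nothing. A more realistic setup would typically remove stop words and draw glosses from a full sense inventory such as WordNet.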

I'm afraid there's no tutorial at the moment, though the DKPro WSD
source code includes several example pipelines that you can examine and
learn from.

Regards,
Tristan

--
Tristan Miller, Research Scientist
Ubiquitous Knowledge Processing Lab (UKP-TUDA)
Department of Computer Science, Technische Universität Darmstadt
Tel: +49 6151 16 6166 | Web: http://www.ukp.tu-darmstadt.de/