Dear Marcel,
On 11/06/14 04:40 PM, Marcel Wlotzka wrote:
> Is there any easier example project that does not use the Semeval
> readers but a plain text?
I'm afraid not. We'd love to provide some more examples but
unfortunately haven't found the time to do so yet. I do agree, though,
that a plain text reader would be a good thing for us to implement.
> I already have downloaded and started the DKPro WSD GPL example project
> and changed the sense inventory to UBY. After running I can see in
> console that word senses are getting annotated. But I have not yet found
> out where these senses are stored in. If I use a simple Writer to output
> all annotations I cannot see any annotations that are giving me the word
> senses.
All word sense–related annotations are stored as UIMA annotations. A
DKPro WSD reader tags all words to be annotated in the input with a
WSDItem annotation. The disambiguation annotator iterates through all
of these WSDItem annotations and, if it can successfully disambiguate
the word, creates a new WSDResult annotation. The WSDResult contains
five features of interest:
1. wsdItem - this points to the WSDItem annotation the disambiguation
result applies to.
2. senseInventory - the name of the sense inventory used (e.g.,
"WordNet", "UBY")
3. disambiguationMethod - the name of the technique used to disambiguate
the word (e.g., "Lesk", "Random")
4. senses - an array of Sense annotations corresponding to the word
senses the disambiguator chose for this word
5. comment - an optional comment string
A Sense annotation consists of Strings representing the word sense ID
and (optionally) a sense description, and a Double representing the
disambiguator's confidence that this word sense is the correct one.
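
To make that structure a bit more concrete, here is a rough sketch in
plain Java. Please note that the classes below are simplified stand-ins
I made up for illustration; they are not the real DKPro WSD UIMA types,
and the Disambiguator interface and the annotate() method are
hypothetical. The sketch only mirrors the logic described above: loop
over the WSDItems and create a WSDResult only when disambiguation
succeeds.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Simplified stand-ins for the annotation types described above.
// These are NOT the real DKPro WSD classes, only an illustration.
public class WsdModelSketch {

    static final class WSDItem {
        final String subjectOfDisambiguation;
        WSDItem(String subjectOfDisambiguation) {
            this.subjectOfDisambiguation = subjectOfDisambiguation;
        }
    }

    static final class Sense {
        final String id;           // word sense ID
        final String description;  // optional, may be null
        final double confidence;   // disambiguator's confidence
        Sense(String id, String description, double confidence) {
            this.id = id;
            this.description = description;
            this.confidence = confidence;
        }
    }

    static final class WSDResult {
        final WSDItem wsdItem;              // item this result applies to
        final String senseInventory;        // e.g. "WordNet", "UBY"
        final String disambiguationMethod;  // e.g. "Lesk", "Random"
        final List<Sense> senses;           // senses chosen for the word
        final String comment;               // optional, may be null
        WSDResult(WSDItem wsdItem, String senseInventory,
                  String disambiguationMethod, List<Sense> senses,
                  String comment) {
            this.wsdItem = wsdItem;
            this.senseInventory = senseInventory;
            this.disambiguationMethod = disambiguationMethod;
            this.senses = senses;
            this.comment = comment;
        }
    }

    // Hypothetical interface: an empty list means "could not disambiguate".
    interface Disambiguator {
        List<Sense> disambiguate(WSDItem item);
    }

    // Iterate over all items; create a result only on success.
    static List<WSDResult> annotate(List<WSDItem> items, Disambiguator d,
                                    String inventory, String method) {
        List<WSDResult> results = new ArrayList<>();
        for (WSDItem item : items) {
            List<Sense> senses = d.disambiguate(item);
            if (!senses.isEmpty()) {
                results.add(new WSDResult(item, inventory, method, senses, null));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        List<WSDItem> items = Arrays.asList(new WSDItem("bank"), new WSDItem("xyzzy"));
        // Toy disambiguator that only knows the word "bank".
        Disambiguator toy = item -> {
            List<Sense> out = new ArrayList<>();
            if (item.subjectOfDisambiguation.equals("bank")) {
                out.add(new Sense("bank-sense-1", "financial institution", 0.75));
            }
            return out;
        };
        for (WSDResult r : annotate(items, toy, "UBY", "Lesk")) {
            System.out.println(r.wsdItem.subjectOfDisambiguation + " -> "
                    + r.senses.get(0).id + " (" + r.senses.get(0).confidence + ")");
        }
    }
}
```

In the real pipeline these would be UIMA annotations in the CAS rather
than plain Java objects, but the relationships between them are the
same.
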
The WSDWriter also loops over all the WSDItem annotations in the CAS,
and for each one it displays a human-readable version of all the
WSDResults associated with it. If it's not working for you, perhaps you
could post a minimal example along with the output?
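
In case it helps with debugging, the loop just described can be
sketched roughly as follows. This is not the actual WSDWriter code or
its output format; Item, Result, and report() are hypothetical names
using plain Java objects in place of the UIMA annotations.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustration only: plain-Java stand-ins, not the real WSDWriter.
public class WsdWriterSketch {

    static final class Item {
        final String text;
        Item(String text) { this.text = text; }
    }

    static final class Result {
        final Item wsdItem;                 // item this result applies to
        final String disambiguationMethod;
        final List<String> senseIds;
        Result(Item wsdItem, String disambiguationMethod, List<String> senseIds) {
            this.wsdItem = wsdItem;
            this.disambiguationMethod = disambiguationMethod;
            this.senseIds = senseIds;
        }
    }

    // For each item, list every result that points back to it.
    static List<String> report(List<Item> items, List<Result> results) {
        List<String> lines = new ArrayList<>();
        for (Item item : items) {
            int found = 0;
            for (Result r : results) {
                if (r.wsdItem == item) {
                    lines.add(item.text + " [" + r.disambiguationMethod + "]: " + r.senseIds);
                    found++;
                }
            }
            if (found == 0) {
                lines.add(item.text + ": no WSDResult");
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        Item a = new Item("bank");
        Item b = new Item("river");
        List<Result> results = Arrays.asList(
                new Result(a, "Lesk", Arrays.asList("bank-sense-1")));
        for (String line : report(Arrays.asList(a, b), results)) {
            System.out.println(line);
        }
    }
}
```

A check along these lines would distinguish two cases: if there are no
items at all, nothing upstream created WSDItem annotations; if there
are items but no results, the disambiguator ran but did not produce
any WSDResults.
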
> My second problem was that I now wanted to create my own reader for
> simple text documents. For that I created a new project and imported all
> what I needed to use UBY and the WSD features.
>
> My reader currently annotates every token with LexicalItemConstituent
> and WSDItem.
>
> The LexicalItemConstituent gets a unique numeric ID and is declared as head.
>
> The WSDItem is currently set as noun for every token and
> the SubjectOfDisambiguation is the token itself. Of course this is not
> the best but I wanted to create a simple pipeline and add the
> lemmatization and correct part of speech tags later. Even if for most of
> the words this information is not perfect DKPro WSD should be able to
> disambiguate a few tokens already.
>
> But: I do not get any outputs of DKPro in the WSD pipeline step. Do I
> miss anything?
>
> My pipeline does these steps: Reader => BreakIteratorSegmenter =>
> Annotator (LexicalItemConstituent and WSDItem annotations are added
> here) => simplifiedLesk => writer
Again, this seems like it should work. Can you post a minimal example? If
it's too large then you can send it to me off-list and I'll have a look.
Regards,
Tristan
--
Tristan Miller, Research Scientist
Ubiquitous Knowledge Processing Lab (UKP-TUDA)
Department of Computer Science, Technische Universität Darmstadt
Tel: +49 6151 16 6166 | Web: http://www.ukp.tu-darmstadt.de/