IMS and CentralityDegree algorithm

Luciano Del Corro

Jan 24, 2014, 10:13:22 AM
to dkpro-w...@googlegroups.com
Hi 

I am trying to test IMS. The idea is to write a test like the example you have in Semeval1EnCGAWExample.java. I want to replace the SimpleLesk algorithm with IMS.

If I am correct the resource description should be something like

ExternalResourceDescription imsResource =
        createExternalResourceDescription(ImsWsdDisambiguatorResource.class,
                ImsWsdDisambiguatorResource.SENSE_INVENTORY_RESOURCE, wordnet21);

However, I am having problems with the annotator. As far as I can tell from the code, the annotator should be WSDAnnotatorDocumentDependentBasic.java.

However I am getting

Caused by: java.lang.IllegalArgumentException: Can not set de.tudarmstadt.ukp.dkpro.wsd.algorithm.WSDAlgorithmDocumentDependentBasic field de.tudarmstadt.ukp.dkpro.wsd.annotator.WSDAnnotatorDocumentDependentBasic.wsdMethod to de.tudarmstadt.ukp.dkpro.wsd.supervised.ims.resource.ImsWsdDisambiguatorResource

Please can you provide an example of how to create both the ExternalResourceDescription and AnalysisEngineDescription for the IMS?

It would be also nice if you could provide the same for the CentralityDegree algorithm.

Thank you in advance
Luciano
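
For readers hitting the same exception: the wording "Can not set <field type> field <field> to <value class>" is what Java reflection reports when a value of an incompatible type is assigned to a field, which is consistent with the framework injecting the resource into the annotator's wsdMethod field. The sketch below reproduces the effect with plain JDK classes only; Algorithm, GoodResource, OtherResource and Annotator are hypothetical stand-ins, not DKPro WSD types.

```java
import java.lang.reflect.Field;

// Hypothetical stand-ins for an algorithm interface, two resources, and an annotator.
interface Algorithm { }

class GoodResource implements Algorithm { }

class OtherResource { }

class Annotator {
    Algorithm wsdMethod; // only accepts values that implement Algorithm
}

class InjectionDemo {
    // Tries to assign value to Annotator.wsdMethod via reflection; returns true on success.
    static boolean inject(Annotator target, Object value) {
        try {
            Field field = Annotator.class.getDeclaredField("wsdMethod");
            field.setAccessible(true);
            field.set(target, value);
            return true;
        } catch (IllegalArgumentException e) {
            // Raised when value is not assignable to the field's declared type.
            System.out.println(e.getMessage());
            return false;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(inject(new Annotator(), new GoodResource()));
        System.out.println(inject(new Annotator(), new OtherResource()));
    }
}
```

So the exception usually means the annotator class and the resource class do not belong together, rather than that a parameter value is wrong.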

Nicolai Erbs

Jan 24, 2014, 10:53:52 AM
to dkpro-w...@googlegroups.com
Hi Luciano,

you should try and use WSDAlgorithmDocumentBasic instead of WSDAlgorithmDocumentDependentBasic.

Regards,
Nico


Tristan Miller

Jan 24, 2014, 11:19:57 AM
to dkpro-w...@googlegroups.com
Greetings.

On 24/01/14 04:13 PM, Luciano Del Corro wrote:
> Hi
>
> I am trying to test IMS. The idea is to write a test as the example you
> have in Semeval1EnCGAWExample.java. I want to replace the SimpleLesk
> algorithm by IMS.
>
>
> It would be also nice if you could provide the same for the CentralityDegree algorithm.

To get Semeval1EnCGAWExample working with the degree centrality
algorithm, just make the following changes:

1. Create a new degree centrality annotator somewhere before you run
your pipeline. (You can optionally attach a GraphVisualizerResource if
you want to visualize the algorithm.)

ExternalResourceDescription graphVisualizer =
        createExternalResourceDescription(GraphVisualizerResource.class);

ExternalResourceDescription degreeCentralityResource =
        createExternalResourceDescription(
                WSDResourceDegreeCentrality.class,
                WSDResourceDegreeCentrality.SENSE_INVENTORY_RESOURCE, wordnet21,
                WSDResourceDegreeCentrality.PARAM_MINIMUM_DEGREE, "1",
                WSDResourceDegreeCentrality.PARAM_SEARCH_DEPTH, "4",
                WSDResourceDegreeCentrality.GRAPH_VISUALIZER_RESOURCE,
                graphVisualizer);

AnalysisEngineDescription degreeCentrality = createEngineDescription(
        WSDAnnotatorCollectivePOS.class,
        WSDAnnotatorCollectivePOS.WSD_ALGORITHM_RESOURCE,
        degreeCentralityResource,
        Sentence.class.getName(),
        WSDAnnotatorCollectivePOS.PARAM_MAXIMUM_ITEMS_TO_ATTEMPT,
        maxItemsToAttempt);


2. Change the evaluator so that it evaluates the degree centrality sense
assignments instead of the simplified Lesk ones:

AnalysisEngineDescription evaluator = createEngineDescription(
        SingleExactMatchEvaluatorHTML.class,
        SingleExactMatchEvaluatorHTML.PARAM_GOLD_STANDARD_ALGORITHM, answerkey,
        SingleExactMatchEvaluatorHTML.PARAM_TEST_ALGORITHM,
        DegreeCentralityWSD.class.getName(),
        SingleExactMatchEvaluatorHTML.PARAM_BACKOFF_ALGORITHM,
        MostFrequentSenseBaseline.class.getName(),
        SingleExactMatchEvaluatorHTML.PARAM_OPEN_IN_BROWSER, true,
        SingleExactMatchEvaluatorHTML.PARAM_OUTPUT_FILE,
        "/tmp/WSDWriterHTML_evaluator.html",
        SingleExactMatchEvaluatorHTML.PARAM_MAXIMUM_ITEMS_TO_ATTEMPT,
        maxItemsToAttempt);


3. Make the pipeline run degreeCentrality instead of simplifiedLesk:

SimplePipeline.runPipeline(reader, answerReader,
        convertSensevalToSensekey, mfsBaseline, degreeCentrality,
        writer, evaluator);


With the previous tip from Nico you can probably work out the
corresponding changes to make the example use IMS.
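
For orientation, the general idea behind degree centrality disambiguation can be sketched in plain Java: candidate senses are nodes in a graph, and the candidate with the most edges wins, provided it reaches a minimum degree. This is an illustration under assumptions only. The graph is supplied directly instead of being built by searching the sense inventory up to a search depth, the sense IDs are invented, ties go to the first candidate, and none of this is DKPro WSD's actual implementation.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

class DegreeCentralitySketch {
    // Returns the candidate with the highest degree, or empty if none reaches minimumDegree.
    static Optional<String> pick(Map<String, Set<String>> graph,
                                 List<String> candidates, int minimumDegree) {
        String best = null;
        int bestDegree = -1;
        for (String sense : candidates) {
            int degree = graph.getOrDefault(sense, Collections.emptySet()).size();
            if (degree >= minimumDegree && degree > bestDegree) {
                best = sense;
                bestDegree = degree;
            }
        }
        return Optional.ofNullable(best);
    }

    // Adds an undirected edge between two nodes.
    static void connect(Map<String, Set<String>> graph, String a, String b) {
        graph.computeIfAbsent(a, k -> new HashSet<>()).add(b);
        graph.computeIfAbsent(b, k -> new HashSet<>()).add(a);
    }

    public static void main(String[] args) {
        Map<String, Set<String>> graph = new HashMap<>();
        connect(graph, "bank_finance", "money_1");
        connect(graph, "bank_finance", "deposit_1");
        connect(graph, "bank_river", "water_1");
        System.out.println(pick(graph, Arrays.asList("bank_finance", "bank_river"), 1));
    }
}
```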

Regards,
Tristan

--
Tristan Miller, Research Scientist
Ubiquitous Knowledge Processing Lab (UKP-TUDA)
Department of Computer Science, Technische Universität Darmstadt
Tel: +49 6151 16 6166 | Web: http://www.ukp.tu-darmstadt.de/


Nicolai Erbs

Jan 24, 2014, 11:34:14 AM
to dkpro-w...@googlegroups.com
Hi Luciano,

I just tried to implement my tip but it resulted in the same exception.
Apparently, there is a bug in our re-implementation of IMS. I've already
opened a bug: https://code.google.com/p/dkpro-wsd/issues/detail?id=56

We will keep you posted about our updates and will send you a running
example for IMS as soon as it is tested.

Regards
Nico

Luciano Del Corro

Jan 24, 2014, 11:49:08 AM
to dkpro-w...@googlegroups.com, er...@ukp.informatik.tu-darmstadt.de
Tristan and Nico

Thanks a lot for the quick replies.

cheers
Luciano

Nicolai Erbs

Jan 25, 2014, 11:28:00 AM
to dkpro-w...@googlegroups.com
Hi,

we created an annotator for IMS and added a wrapper for IMS. The following code is an example:

        ExternalResourceDescription wordnet = createExternalResourceDescription(
                LsrSenseInventoryResource.class,
                LsrSenseInventoryResource.PARAM_RESOURCE_NAME, "wordnet",
                LsrSenseInventoryResource.PARAM_RESOURCE_LANGUAGE, "en");

        ExternalResourceDescription imsResource = createExternalResourceDescription(
                ImsWsdDisambiguatorResource.class,
                WSDResourceDocumentTextBasic.SENSE_INVENTORY_RESOURCE, wordnet,
                WSDResourceDocumentTextBasic.DISAMBIGUATION_METHOD,
                ImsWsdDisambiguator.class.getName());

        AnalysisEngineDescription imsAnnotator = createEngineDescription(
                ImsWSDAnnotator.class,
                ImsWSDAnnotator.WSD_ALGORITHM_RESOURCE, imsResource,
                WSDAnnotatorBase.PARAM_SET_SENSE_DESCRIPTIONS, false);
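
A note on the call shape used in this example: the uimaFIT factory methods take their configuration as a flat list of alternating parameter names and values after the class argument, so a missing name or value shifts every later pair. The sketch below mimics that convention in plain Java to show why an odd-length list is an error. toPairs is a hypothetical helper and the parameter names in main are invented; this is not uimaFIT code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class NameValuePairs {
    // Folds alternating name/value arguments into a map, rejecting odd-length input.
    static Map<String, Object> toPairs(Object... data) {
        if (data.length % 2 != 0) {
            throw new IllegalArgumentException(
                    "Expected name/value pairs but got " + data.length + " arguments");
        }
        Map<String, Object> pairs = new LinkedHashMap<>();
        for (int i = 0; i < data.length; i += 2) {
            if (!(data[i] instanceof String)) {
                throw new IllegalArgumentException(
                        "Argument " + i + " should be a parameter name");
            }
            pairs.put((String) data[i], data[i + 1]);
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Invented parameter names, for illustration only.
        System.out.println(toPairs("setSenseDescriptions", false, "maxItems", 10));
    }
}
```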

Regards,
Nico

Luciano Del Corro

Jan 25, 2014, 12:27:51 PM
to dkpro-w...@googlegroups.com, er...@ukp.informatik.tu-darmstadt.de
Great! Thank you very much!

Luciano Del Corro

Jan 26, 2014, 12:06:22 PM
to dkpro-w...@googlegroups.com, er...@ukp.informatik.tu-darmstadt.de
Hi Nicolai

Thanks a lot for the example. I have a couple of questions regarding the IMS implementation.

1.

I tried to run the example but I am getting the following error

 de.tudarmstadt.ukp.dkpro.lexsemresource.exception.ResourceLoaderException: Unable to locate configuration file [resources.xml]

It seems that the file is missing. The error comes from

lsr = ResourceFactory.getInstance().get(resource, language); in LsrSenseInventory.java


2.

I tried to construct the inventory as in the Coarse grained example


ExternalResourceDescription imsResource = createExternalResourceDescription(
               ImsWsdDisambiguatorResource.class,
               WSDResourceDocumentTextBasic.SENSE_INVENTORY_RESOURCE, wordnet21,
               WSDResourceDocumentTextBasic.DISAMBIGUATION_METHOD,
               ImsWsdDisambiguator.class.getName()); 


This seems to make the algorithm work, but before the evaluation step I get the error

Caused by: de.tudarmstadt.ukp.dkpro.wsd.si.SenseInventoryException: Mapping from IMS to WSDItems cannot be performed.

Basically, in ImsWSDAnnotator.java the number of words is bigger than the number of disambiguated words. I examined the arrays generated, and it seems that the array words contains many empty elements not present in senses.

3. I also tried to set the IMS algorithm to work with the coarse-grained data. I set the parameters according to the bash example provided with the original IMS code. Basically, I replaced the corpus class name in ImsWsdDisambiguator.java with setCorpusClassName("sg.edu.nus.comp.nlp.ims.corpus.CAllWordsCoarseTaskCorpus");

I get the error

Caused by: org.jdom.input.JDOMParseException: Error on line 1: Content is not allowed in prolog.
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:468)
    at org.jdom.input.SAXBuilder.build(SAXBuilder.java:851)
    at sg.edu.nus.comp.nlp.ims.corpus.CAllWordsCoarseTaskCorpus.load(CAllWordsCoarseTaskCorpus.java:76)
    at de.tudarmstadt.ukp.dkpro.wsd.supervised.ims.ImsWsdDisambiguator.test(ImsWsdDisambiguator.java:146)

It seems to me that the problem is that it is expecting XML format while the system is providing plain sentences.

Am I doing something wrong?

4. Finally a question regarding the coarse grained example (Semeval1EnCGExample.java)

Is it actually doing coarse grained disambiguation? I cannot see where the mapping between the senses and the clusters is happening. How can the clusters be incorporated?


Thanks a lot for your help
Luciano
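
On question 3 above: "Content is not allowed in prolog" is the message an XML parser typically gives when the input does not begin with XML markup, which would fit the guess that plain sentences are reaching a loader that expects an XML corpus. A minimal check using the JDK's built-in parser (not JDOM, and not the IMS corpus classes) might look like this; isWellFormedXml is a hypothetical helper name.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

class XmlInputCheck {
    // Returns true if text parses as well-formed XML, false if the parser rejects it.
    static boolean isWellFormedXml(String text) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (SAXException e) {
            // Plain text ends up here because it has no root element.
            System.out.println("Rejected: " + e.getMessage());
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(isWellFormedXml("<corpus><sentence>A test.</sentence></corpus>"));
        System.out.println(isWellFormedXml("A plain sentence, not XML."));
    }
}
```

Running a check like this on whatever is handed to the corpus loader would show whether the input format is the cause.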

Tristan Miller

Jan 27, 2014, 6:41:56 AM
to dkpro-w...@googlegroups.com
Greetings.

On 26/01/14 06:06 PM, Luciano Del Corro wrote:
> Hi Nicolai
>
> Thanks a lot for the example. I have a couple of question regarding the
> IMS implementation.
>
> 1.
>
> I tried to run the example but I am getting the following error
>
> de.tudarmstadt.ukp.dkpro.lexsemresource.exception.ResourceLoaderException:
> Unable to locate configuration file [resources.xml]
>
> It seems that the file is missing. The error comes from
>
> lsr = ResourceFactory.getInstance().get(resource, language); in
> LsrSenseInventory.java

If you want to use DKPro LSR as your sense inventory then you need to
configure it with a resources.xml file. Unfortunately, it seems the
DKPro LSR website at <https://code.google.com/p/dkpro-lsr/> doesn't
provide any instructions on how to do this. I've opened a bug report
there: <https://code.google.com/p/dkpro-lsr/issues/detail?id=1>

If you'd like, in the meantime I can provide a sample resources.xml file
and basic instructions on how to modify it and where to put it.
However, you may find it more convenient to just use the extJWNL-backed
WordNetSenseKeySenseInventoryResource as your sense inventory; you've
presumably already got this working from your SemEval1EnCGAWExample.


Your second and third questions maybe Nico can deal with as I think
they're specific to IMS.


> 4. Finally a question regarding the coarse grained example
> (Semeval1EnCGExample.java)
>
> Is it actually doing coarse grained disambiguation? I cannot see where
> the mapping between the senses and the clusters is happening. How can
> the clusters be incorporated?

The example does coarse-grained disambiguation only in the sense that
the answer key specifies multiple correct answers, so if the
highest-scoring sense assigned by the SimplifiedLesk annotator happens
to match any of these correct answers, it will be counted as correct.
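
Put differently, scoring against a multi-answer key reduces to a set-membership test. A minimal sketch, with invented sense IDs and a hypothetical isCorrect helper rather than the evaluator's real code:

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class AnyMatchScoring {
    // A prediction counts as correct if it equals any of the gold answers for the item.
    static boolean isCorrect(String predictedSense, Set<String> goldSenses) {
        return goldSenses.contains(predictedSense);
    }

    public static void main(String[] args) {
        // Invented sense IDs: the answer key lists two acceptable senses.
        Set<String> gold = new HashSet<>(Arrays.asList("bank_sense_a", "bank_sense_b"));
        System.out.println(isCorrect("bank_sense_b", gold));
        System.out.println(isCorrect("bank_sense_c", gold));
    }
}
```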

Of course, you are free to provide your own WSD annotator or evaluator
which reads in the cluster file distributed with the Semeval-1 CGAW data
set and makes more intelligent use of it. For example, for my COLING 2012
paper I wrote an evaluator which read in the cluster file and summed
SimplifiedLesk's confidence scores for every sense per cluster, and then
selected the highest-scoring cluster. (I don't think I've contributed
this code to DKPro WSD yet, as it was specific to a much older
development version.)
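
The cluster-based selection described here can be sketched as follows. This is a reconstruction from the description only, not the COLING 2012 code; the map-based inputs, the sense and cluster names, and the choice to skip senses missing from the cluster file are all assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

class ClusterScoring {
    // Sums per-sense confidence scores within each cluster and returns the top cluster.
    static Optional<String> bestCluster(Map<String, Double> senseScores,
                                        Map<String, String> senseToCluster) {
        Map<String, Double> clusterScores = new HashMap<>();
        for (Map.Entry<String, Double> e : senseScores.entrySet()) {
            String cluster = senseToCluster.get(e.getKey());
            if (cluster != null) { // assumption: senses without a cluster are ignored
                clusterScores.merge(cluster, e.getValue(), Double::sum);
            }
        }
        return clusterScores.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        Map<String, Double> scores = new HashMap<>();
        scores.put("sense_1", 0.5);
        scores.put("sense_2", 0.3);
        scores.put("sense_3", 0.3);
        Map<String, String> clusters = new HashMap<>();
        clusters.put("sense_1", "cluster_a");
        clusters.put("sense_2", "cluster_b");
        clusters.put("sense_3", "cluster_b");
        // sense_1 scores highest alone, but cluster_b has the larger summed score.
        System.out.println(bestCluster(scores, clusters));
    }
}
```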

Nicolai Erbs

Jan 27, 2014, 8:36:00 AM
to dkpro-w...@googlegroups.com, Luciano Del Corro
Hi Luciano,

thanks for the feedback.

Regarding the second question: This error happens when the output words of IMS cannot be mapped to the input words. I just added the test case ImsWSDAnnotatorTest. The mapping in ImsWSDAnnotator might not be correct for every text. In that case the mapping method should be improved. Which text are you trying to disambiguate?
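
One way such a mapping could be made more tolerant, going by the report that the words array contains empty elements with no counterpart in senses, is to drop empty entries before pairing the two arrays by position and to fail only if the counts still differ. The sketch below is hypothetical and does not reflect the actual ImsWSDAnnotator code; the class name, method name, and array contents are invented.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class WordSenseAligner {
    // Pairs each non-empty word with the sense at the same position among non-empty words.
    static List<Map.Entry<String, String>> align(String[] words, String[] senses) {
        List<String> kept = new ArrayList<>();
        for (String word : words) {
            if (word != null && !word.trim().isEmpty()) {
                kept.add(word);
            }
        }
        if (kept.size() != senses.length) {
            throw new IllegalStateException("Cannot align " + kept.size()
                    + " words with " + senses.length + " senses");
        }
        List<Map.Entry<String, String>> pairs = new ArrayList<>();
        for (int i = 0; i < senses.length; i++) {
            pairs.add(new SimpleEntry<>(kept.get(i), senses[i]));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String[] words = {"", "cat", " ", "sat"};
        String[] senses = {"cat_sense", "sit_sense"};
        System.out.println(align(words, senses));
    }
}
```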

Luciano Del Corro

Jan 27, 2014, 9:07:47 AM
to Nicolai Erbs, dkpro-w...@googlegroups.com
Hi Nicolai

thanks a lot. I am just trying to run the example for Semeval 2007 as in the SemEval1EnCGAWExample but instead of SimpleLesk I am using IMS.


        ExternalResourceDescription imsResource = createExternalResourceDescription(
                ImsWsdDisambiguatorResource.class,
                WSDResourceDocumentTextBasic.SENSE_INVENTORY_RESOURCE, wordnet21,
                WSDResourceDocumentTextBasic.DISAMBIGUATION_METHOD,
                ImsWsdDisambiguator.class.getName());
       
        AnalysisEngineDescription imsAnnotator = createEngineDescription(
                ImsWSDAnnotator.class,
                ImsWSDAnnotator.WSD_ALGORITHM_RESOURCE, imsResource,
                WSDAnnotatorBase.PARAM_SET_SENSE_DESCRIPTIONS, false);


Any clue on the third question?

Luciano

Nicolai Erbs

Jan 27, 2014, 10:01:56 AM
to dkpro-w...@googlegroups.com, Luciano Del Corro
Hi Luciano,

do you know which sentence/document is causing this problem? It should be straightforward to extend the current test and see why this problem occurs.

Unfortunately, I don’t have any experience running IMS with coarse grained data.

Regards,
Nicolai

Tristan Miller

Jan 27, 2014, 10:59:05 AM
to dkpro-w...@googlegroups.com
Greetings.

On 27/01/14 12:41 PM, Tristan Miller wrote:
>> 1.
>>
>> I tried to run the example but I am getting the following error
>>
>> de.tudarmstadt.ukp.dkpro.lexsemresource.exception.ResourceLoaderException:
>> Unable to locate configuration file [resources.xml]
>>
>> It seems that the file is missing. The error comes from
>>
>> lsr = ResourceFactory.getInstance().get(resource, language); in
>> LsrSenseInventory.java
>
> If you want to use DKPro LSR as your sense inventory then you need to
> configure it with a resources.xml file. Unfortunately, it seems the
> DKPro LSR website at <https://code.google.com/p/dkpro-lsr/> doesn't
> provide any instructions on how to do this. I've opened a bug report
> there: <https://code.google.com/p/dkpro-lsr/issues/detail?id=1>

One of the DKPro LSR developers just posted documentation for the
configuration file at
<https://code.google.com/p/dkpro-core-asl/wiki/Configuration>.

Luciano Del Corro

Jan 29, 2014, 11:01:31 AM
to Nicolai Erbs, dkpro-w...@googlegroups.com
It is the SemEval 2007 corpus, the same corpus specified in SemEval1EnCGAWExample. If you run in debug mode you can spot the mismatch between words and senses right at the beginning of the arrays, in the first sentence.

thanks
Luciano

Luciano Del Corro

Feb 22, 2014, 6:02:27 PM
to dkpro-w...@googlegroups.com
I just saw this, sorry; it did not come to my mailbox. Sorry for the late reply, and thank you very much.