using SemEval 2013 baseline

Anna Kazantseva

Dec 10, 2013, 2:15:38 PM

to dkpro-simil...@googlegroups.com

Hi everybody,

I have yet another question about DKPro Similarity. I would like to use the model specified as the SemEval 2013 baseline to measure similarity between pairs of sentences on the fly. What is the best way to go about it?

I managed to get dkpro.similarity.experiments.sts2013baseline.Pipeline to train/test and it seems to work. I now want to load the model and repeatedly run it on pairs of sentences. Is creating another UIMA pipeline the best way to go about it? Or is there another possibility?

Thanks in advance,

Anna

Torsten Zesch

Dec 10, 2013, 4:28:22 PM

to Anna Kazantseva, DKPro Similarity Users

Hey Anna.

If train/test already works, the easiest way would be to just add another dataset to be tested that contains your text pairs.

In case you want to do real "on-the-fly" computation, I have added a new class to the sts-2013-baseline module that shows the general principle.
However, there still seems to be a bug that I cannot find right now.

I also want to note that there is an implementation of the STS task using the new and more flexible DKPro Text Classification framework:

https://github.com/zesch/semeval

-Torsten

--
You received this message because you are subscribed to the Google Groups "DKPro Similarity Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to dkpro-similarity-...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Torsten Zesch

Dec 10, 2013, 4:42:08 PM

to DKPro Similarity Users

OK, found and fixed the bug.
The OnTheFlyComputation pipeline should now work.

-Torsten

Anna Kazantseva

Dec 13, 2013, 3:41:03 PM

to dkpro-simil...@googlegroups.com

Hi Torsten,

Thank you so much for that class and for all your help. I will try to see if I can use it right away.

In the meantime, I was trying to use the first way that you suggested, i.e., creating a dataset. But for some reason my dkpro.similarity.experiments.sts2013baseline.Pipeline no longer works in test mode (either -T or -S). It still works in train mode (-D). The error I get in test mode is the following (it occurs after the features and the ARFF files have been generated):

[.....]
Generating ARFF file
- done
Generating ARFF file
- done
log4j:WARN No appenders could be found for logger (org.springframework.core.io.support.PathMatchingResourcePatternResolver).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
Exception in thread "main" org.apache.uima.resource.ResourceInitializationException: Unexpected Exception thrown when initializing Custom Resource "dkpro.similarity.uima.resource.ml.LinearRegressionResource" from descriptor "<unknown>".
    at org.apache.uima.impl.CustomResourceFactory_impl.produceResource(CustomResourceFactory_impl.java:94)
    at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:243)
    at org.apache.uima.resource.impl.ResourceManager_impl.registerResource(ResourceManager_impl.java:573)
    at org.apache.uima.resource.impl.ResourceManager_impl.initializeExternalResources(ResourceManager_impl.java:450)
    at org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBase.java:182)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:157)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:123)
    at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
    at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:314)
    at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:425)
    at org.apache.uima.fit.factory.AnalysisEngineFactory.createEngine(AnalysisEngineFactory.java:204)
    at org.apache.uima.fit.factory.AnalysisEngineFactory.createEngine(AnalysisEngineFactory.java:278)
    at dkpro.similarity.experiments.sts2013baseline.util.Evaluator.runLinearRegression(Evaluator.java:90)
    at dkpro.similarity.experiments.sts2013baseline.Pipeline.runTest(Pipeline.java:150)
    at dkpro.similarity.experiments.sts2013baseline.Pipeline.main(Pipeline.java:96)
Caused by: java.lang.IllegalArgumentException: Errors initializing [class dkpro.similarity.uima.resource.ml.LinearRegressionResource]
Field 'logFilter' is required
    at org.apache.uima.fit.component.initialize.ConfigurationParameterInitializer.initialize(ConfigurationParameterInitializer.java:178)
    at org.apache.uima.fit.component.initialize.ConfigurationParameterInitializer.initialize(ConfigurationParameterInitializer.java:200)
    at org.apache.uima.fit.component.initialize.ConfigurationParameterInitializer.initialize(ConfigurationParameterInitializer.java:214)
    at org.apache.uima.fit.component.Resource_ImplBase.initialize(Resource_ImplBase.java:57)
    at dkpro.similarity.uima.resource.ml.LinearRegressionResource.initialize(LinearRegressionResource.java:35)
    at org.apache.uima.impl.CustomResourceFactory_impl.produceResource(CustomResourceFactory_impl.java:92)
    ... 18 more

Thanks for any ideas!

Anna

Anna Kazantseva

Dec 13, 2013, 3:45:29 PM

to dkpro-simil...@googlegroups.com

Hello again Torsten,

It seems that OnTheFlyComputation gives me the same error in much the same place (after generating an ARFF file). I must be doing something wrong...

*********************

Exception in thread "main" org.apache.uima.resource.ResourceInitializationException: Unexpected Exception thrown when initializing Custom Resource "dkpro.similarity.uima.resource.ml.LinearRegressionResource" from descriptor "<unknown>".
    at org.apache.uima.impl.CustomResourceFactory_impl.produceResource(CustomResourceFactory_impl.java:94)
    at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:243)
    at org.apache.uima.resource.impl.ResourceManager_impl.registerResource(ResourceManager_impl.java:573)
    at org.apache.uima.resource.impl.ResourceManager_impl.initializeExternalResources(ResourceManager_impl.java:450)
    at org.apache.uima.resource.Resource_ImplBase.initialize(Resource_ImplBase.java:182)
    at org.apache.uima.analysis_engine.impl.AnalysisEngineImplBase.initialize(AnalysisEngineImplBase.java:157)
    at org.apache.uima.analysis_engine.impl.PrimitiveAnalysisEngine_impl.initialize(PrimitiveAnalysisEngine_impl.java:123)
    at org.apache.uima.impl.AnalysisEngineFactory_impl.produceResource(AnalysisEngineFactory_impl.java:94)
    at org.apache.uima.impl.CompositeResourceFactory_impl.produceResource(CompositeResourceFactory_impl.java:62)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:269)
    at org.apache.uima.UIMAFramework.produceResource(UIMAFramework.java:314)
    at org.apache.uima.UIMAFramework.produceAnalysisEngine(UIMAFramework.java:425)
    at org.apache.uima.fit.factory.AnalysisEngineFactory.createEngine(AnalysisEngineFactory.java:204)
    at org.apache.uima.fit.factory.AnalysisEngineFactory.createEngine(AnalysisEngineFactory.java:278)
    at dkpro.similarity.experiments.sts2013baseline.util.Evaluator.runLinearRegression(Evaluator.java:90)
    at dkpro.similarity.experiments.sts2013baseline.OnTheFlyComputation.main(OnTheFlyComputation.java:72)
Caused by: java.lang.IllegalArgumentException: Errors initializing [class dkpro.similarity.uima.resource.ml.LinearRegressionResource]
Field 'logFilter' is required
    at org.apache.uima.fit.component.initialize.ConfigurationParameterInitializer.initialize(ConfigurationParameterInitializer.java:178)
    at org.apache.uima.fit.component.initialize.ConfigurationParameterInitializer.initialize(ConfigurationParameterInitializer.java:200)
    at org.apache.uima.fit.component.initialize.ConfigurationParameterInitializer.initialize(ConfigurationParameterInitializer.java:214)
    at org.apache.uima.fit.component.Resource_ImplBase.initialize(Resource_ImplBase.java:57)
    at dkpro.similarity.uima.resource.ml.LinearRegressionResource.initialize(LinearRegressionResource.java:35)
    at org.apache.uima.impl.CustomResourceFactory_impl.produceResource(CustomResourceFactory_impl.java:92)
    ... 17 more
********************

Torsten Zesch

Dec 13, 2013, 3:56:00 PM

to DKPro Similarity Users

Hi Anna,

This is related to a bug that I recently found.
I already fixed it in the latest snapshot version.
Since you found OnTheFlyComputation, I am assuming that you have
checked out the latest snapshot.
Please update from SVN to get the fix.
It should work then. If not, just get back to the list :)

-Torsten

Anna Kazantseva

Dec 15, 2013, 3:35:25 PM

to dkpro-simil...@googlegroups.com

Hi Torsten,

Thanks so much for that example. After I updated all packages, it works fine. This will make my life so much easier!

Could I please ask you for some details about the upper and lower bounds of that text similarity metric (the one used in the UKP SemEval STS system in 2012)? For the examples that you provide in your code I get 2.1834 ('This is an example.' vs. 'I need an example.') and 1.4733 ('Example this is.' vs. 'Colorless green ideas sleep furiously.'). If I try two identical sentences, I get something around 2.95. So I guess the result is not normalized to lie between 0 and 1. What are the bounds? Do they depend on the length of the text? I am asking because I would like to combine these results with those of my own, very different, metric, so the two numbers must somehow be normalized to the same range.

I tried to find the answer in the paper 'UKP: Computing Semantic Textual Similarity by Combining Multiple Content Similarity Measures', but it does not seem to be there. Any pointers?

Thanks in advance!

Anna

Torsten Zesch

Dec 15, 2013, 5:17:27 PM

to DKPro Similarity Users

The STS data has been annotated in the range from 0 to 5, and the
system learns to output a number in the same range.
Single measures mostly use [0,1] intervals.

-Torsten
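
Given the ranges mentioned above, one possible way to bring the baseline's output onto the same [0,1] scale as a single measure is a linear rescaling by the annotation range. The following is only a minimal sketch under that assumption: the class and method names are hypothetical and not part of DKPro Similarity, and clamping out-of-range predictions is a design choice of this sketch, not something the thread prescribes.

```java
import java.util.Locale;

/** Hypothetical helper (not part of DKPro Similarity): rescales STS-style scores. */
final class StsScoreNormalizer {

    // The STS gold annotations range from 0 to 5, as stated in the thread.
    static final double STS_MIN = 0.0;
    static final double STS_MAX = 5.0;

    /**
     * Maps a score from the [0, 5] STS range to [0, 1].
     * A learned regression model may predict values slightly outside
     * the annotated range, so the input is clamped first (an assumption
     * of this sketch).
     */
    static double toUnitInterval(double stsScore) {
        double clamped = Math.max(STS_MIN, Math.min(STS_MAX, stsScore));
        return (clamped - STS_MIN) / (STS_MAX - STS_MIN);
    }

    public static void main(String[] args) {
        // Scores reported earlier in the thread.
        double[] scores = {2.1834, 1.4733, 2.95};
        for (double s : scores) {
            System.out.printf(Locale.ROOT, "%.4f -> %.4f%n", s, toUnitInterval(s));
        }
    }
}
```

Note that identical sentences scored only around 2.95 in the experiment reported above, so the observed outputs do not necessarily cover the whole annotated range. Rescaling by the minimum and maximum actually observed on one's own data would be an alternative.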

Anna Kazantseva

Dec 16, 2013, 2:54:03 PM

to dkpro-simil...@googlegroups.com

Thanks a lot, that's what I needed to know!

Anna

Anna Kazantseva

Dec 29, 2013, 9:44:34 PM

to dkpro-simil...@googlegroups.com

Hi again Torsten and everybody,

I hope I can ask another question about the STS baseline. Using dkpro.similarity.experiments.sts2013baseline.OnTheFlyComputation, I managed to measure similarity between arbitrary pairs of sentences on the fly, just as I needed. The problem is that, because an ARFF file is written out for each pair, this is really quite slow.

Torsten suggested that a better way may be to create another dataset containing my pairs of sentences and then classify those. However, I am still mostly interested in the similarity between individual pairs of sentences, not the overall result for a dataset. How can I extract the individual judgements after running the system? What classes should I be looking at?

Thanks for any help!

Anna

Torsten Zesch

Jan 4, 2014, 8:02:48 AM

to DKPro Similarity Users

Hi Anna,

Happy New Year.

If you just want to output the computed scores for a set of sentence
pairs, you can have a look at the "Evaluator" class.

It should be quite easy to simply output the computed scores.

-Torsten
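
To illustrate the idea of writing out one score per sentence pair, here is a minimal sketch. It does not use the Evaluator class or any DKPro Similarity API. It simply assumes that, after one batch run over a dataset, the predicted scores are available as an array in the same order as the input pairs. The class name, the method name, and the tab-separated output layout are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: lines up sentence pairs with their predicted scores. */
final class PairScoreReport {

    /**
     * Builds one tab-separated row per pair: score, sentence A, sentence B.
     * Assumes scores[i] belongs to pairs.get(i), i.e. the scores were
     * produced in input order.
     */
    static List<String> format(List<String[]> pairs, double[] scores) {
        if (pairs.size() != scores.length) {
            throw new IllegalArgumentException(
                    "Got " + pairs.size() + " pairs but " + scores.length + " scores");
        }
        List<String> rows = new ArrayList<>();
        for (int i = 0; i < scores.length; i++) {
            String[] pair = pairs.get(i);
            rows.add(String.format(Locale.ROOT, "%.4f\t%s\t%s", scores[i], pair[0], pair[1]));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Example pairs and scores quoted earlier in the thread.
        List<String[]> pairs = new ArrayList<>();
        pairs.add(new String[] {"This is an example.", "I need an example."});
        pairs.add(new String[] {"Example this is.", "Colorless green ideas sleep furiously."});
        double[] scores = {2.1834, 1.4733};

        for (String row : format(pairs, scores)) {
            System.out.println(row);
        }
    }
}
```

Scoring all pairs in one batch like this would also avoid generating a separate ARFF file per pair, which was the reported bottleneck.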

shashi parashar

Mar 6, 2017, 12:48:28 AM

to DKPro Similarity Users

Hi Zesch,

I have to train a model based on the following training data set attributes.

pair_ID sentence_A sentence_B relatedness_score(0-5) entailment_judgment

I want to use dkpro-similarity-experiments-sts-2013-baseline-gpl to train the model. Do I need to tailor my training data set to use this pipeline?

Could you please point me to the Java files I need to achieve this?

Best,

Shashi

Torsten Zesch

Mar 6, 2017, 2:47:44 PM

to shashi parashar, DKPro Similarity Users

You could have a look here:

https://github.com/dkpro/dkpro-similarity/blob/master/dkpro-similarity-experiments-sts-2013-baseline-gpl/src/main/java/org/dkpro/similarity/experiments/sts2013baseline/OnTheFlyComputation.java

All you need is sentence A and B from your file and feed that into the code.

-Torsten
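
Extracting sentence A and B from a file with the columns listed above could look roughly like the following. This is a sketch under assumptions: the file is taken to be tab-separated with a header row starting with pair_ID, which the thread does not state, and the class names are hypothetical. It covers only the parsing step. How the pairs are then passed to OnTheFlyComputation depends on that class and is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper: pulls sentence_A and sentence_B out of tab-separated rows. */
final class SentencePairReader {

    /** One sentence pair; the score and entailment columns are ignored here. */
    static final class SentencePair {
        final String id;
        final String sentenceA;
        final String sentenceB;

        SentencePair(String id, String sentenceA, String sentenceB) {
            this.id = id;
            this.sentenceA = sentenceA;
            this.sentenceB = sentenceB;
        }
    }

    /**
     * Parses rows of the form
     * pair_ID TAB sentence_A TAB sentence_B TAB relatedness_score TAB entailment_judgment.
     * Blank lines and the header row are skipped.
     */
    static List<SentencePair> parse(List<String> lines) {
        List<SentencePair> pairs = new ArrayList<>();
        for (String line : lines) {
            if (line.trim().isEmpty() || line.startsWith("pair_ID")) {
                continue;
            }
            String[] cols = line.split("\t", -1);
            if (cols.length < 3) {
                throw new IllegalArgumentException(
                        "Expected at least 3 tab-separated columns: " + line);
            }
            pairs.add(new SentencePair(cols[0], cols[1], cols[2]));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Made-up rows for illustration only; in practice, read the lines from
        // the training file, e.g. with java.nio.file.Files.readAllLines.
        List<String> lines = Arrays.asList(
                "pair_ID\tsentence_A\tsentence_B\trelatedness_score\tentailment_judgment",
                "1\tA dog runs.\tA dog is running.\t4.5\tENTAILMENT");
        for (SentencePair p : parse(lines)) {
            System.out.println(p.id + ": " + p.sentenceA + " | " + p.sentenceB);
        }
    }
}
```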

shashi

Mar 6, 2017, 3:07:15 PM

to Torsten Zesch, DKPro Similarity Users

Thank you Mr. Zesch.

I tried to set up the Maven project using the POM at https://github.com/dkpro/dkpro-similarity/blob/master/dkpro-similarity-experiments-sts-2013-baseline-gpl/pom.xml

But it gives me the following error:

Project build error: Non-resolvable parent POM : Could not find artifact org.dkpro.similarity:dkpro-similarity-experiments:pom:2.2.0 in ukp-oss-releases (http://zoidberg.ukp.informatik.tu-darmstadt.de/artifactory/public-releases/) and 'parent.relativePath' points at wrong local POM pom.xml /com.shashi.test

line 3 Maven pom Loading Problem

Is there an updated version of this pom?

Have a nice day!

Best,

Shashi

Torsten Zesch

Mar 6, 2017, 3:18:43 PM

to shashi, DKPro Similarity Users

That is hard to resolve from here.

Are you using the latest snapshot?

I think you need to check out at least all the experiment projects, including the experiments-parent, in order to make this work.

-Torsten
