Similarity features

Ulrike Pado

Sep 25, 2015, 3:50:06 AM

to dkpro-tc-users

Dear all,

I've been playing with the similarity features for document pairs. I'm using version 0.7.0-SNAPSHOT of dkpro-tc-features-pair-similarity (from the build server as described at https://dkpro.github.io/dkpro-core/pages/setup-maven.html) and have added the features (e.g. CosineFeatureExtractor or GreedyStringTilingFeatureExtractor) to the feature list before training.

Preprocessing is going well, but the FeatureExtraction step runs for hours (more than 8 hours for 5000 pairs) without producing output. Has anyone encountered this, or does anyone have an idea why this might happen? The problem is the same for all the pair similarity features, including the SimilarityPairFeatureExtractor with different similarity resources.

I assume I just forgot to do something or did something wrong, but I can't spot the problem right now.

Best,

Ulrike

Hochschule für Technik Stuttgart
Fakultät Vermessung, Informatik und Mathematik

Prof. Dr. Ulrike Pado
Professor of Computer Science, Computational Linguistics

Office: Building 2, Room 449

Schellingstr. 24
70174 Stuttgart
www.hft-stuttgart.de

T +49 (0)711 8926 2811
F +49 (0)711 8926 2553
ulrik...@hft-stuttgart.de

Johannes Daxenberger

Sep 25, 2015, 6:35:02 AM

to Ulrike Pado, dkpro-tc-users

Hi Ulrike,

By raising the log level of your logger (e.g. by setting
log4j.rootLogger=info, file, stdout in log4j.properties), you should get
information on whether the experiment hangs somewhere while initializing
the task, or whether processing one (or all) of the instances takes that
long. I suspect that you have one or more instances with really long
texts, and that the comparison for them hangs forever. If that is the
case, you can add a check that discards long texts before calling
measure.getSimilarity(text1, text2). If the experiment hangs before
anything is processed, there is probably a problem with the
instantiation of one of your feature extractors.
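
A minimal sketch of such a length check, in case it helps. This is
not the actual DKPro TC code: the Measure interface is only a
hypothetical stand-in for whatever similarity measure your feature
extractor calls, and the MAX_CHARS cut-off and the similarityOrSkip
helper are assumptions you would adapt to your own setup.

```java
import java.util.OptionalDouble;

final class GuardedSimilarity {

    /** Hypothetical stand-in for the similarity measure used by a pair feature extractor. */
    public interface Measure {
        double getSimilarity(String text1, String text2);
    }

    /** Assumed cut-off in characters; pick a value that fits your data. */
    public static final int MAX_CHARS = 10_000;

    /**
     * Returns the similarity of the two texts, or an empty result if either
     * text is longer than MAX_CHARS, so that the expensive comparison is
     * never started for very long texts.
     */
    public static OptionalDouble similarityOrSkip(Measure measure, String text1, String text2) {
        if (text1.length() > MAX_CHARS || text2.length() > MAX_CHARS) {
            return OptionalDouble.empty();
        }
        return OptionalDouble.of(measure.getSimilarity(text1, text2));
    }

    public static void main(String[] args) {
        // Toy measure for demonstration only: 1.0 for identical texts, 0.0 otherwise.
        Measure exactMatch = (a, b) -> a.equals(b) ? 1.0 : 0.0;

        String tooLong = new String(new char[MAX_CHARS + 1]);

        System.out.println(similarityOrSkip(exactMatch, "a short text", "a short text").isPresent());
        System.out.println(similarityOrSkip(exactMatch, tooLong, "a short text").isPresent());
    }
}
```

What to do with a skipped pair (drop the instance, or emit a default
feature value) depends on your experiment, so the sketch leaves that
decision to the caller.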

Best,
Johannes

On 25.09.15 09:50, "Ulrike Pado" wrote via
<dkpro-t...@googlegroups.com on behalf of
ulrik...@hft-stuttgart.de>:

