Reusing output of previous tasks

gla...@gmail.com
Feb 14, 2017, 8:43:31 AM
to dkpro-tc-users
Hi all,

I'm trying to reuse the output of previous task executions by setting the execution policy:

batch.setExecutionPolicy(ExecutionPolicy.USE_EXISTING);

I noticed that I need to set unique feature extractor names for this to work, e.g.:

Dimension.create(Constants.DIM_FEATURE_SET,
        new TcFeatureSet(
                TcFeatureFactory.create("f1", NrOfTokensPerSentence.class),
                TcFeatureFactory.create("f2", LuceneCharacterNGram.class,
                        LuceneCharacterNGram.PARAM_NGRAM_USE_TOP_K, 20,
                        LuceneCharacterNGram.PARAM_NGRAM_MIN_N, 2,
                        LuceneCharacterNGram.PARAM_NGRAM_MAX_N, 5)));

At first this seemed to work: only the evaluation task was run again. However, after I changed the feature extractors and left the preprocessing alone, the init task was run again as well. The reason seems to be that the feature set is also a discriminator of the init task, so changing the feature set means that the previous execution of the init task cannot be matched.

Is this intended behavior? It would be much more helpful if I could change the extracted features while reusing the preprocessing output, since preprocessing can take a long time on large corpora (and changing the feature set should have no influence on the output of the init task, right?).

Cheers,
Günter

Johannes Daxenberger
Feb 15, 2017, 6:03:45 AM
to gla...@gmail.com, dkpro-tc-users
Hi,

I'm afraid this is due to InitTask, which declares the feature set dimension as a discriminator:

@Discriminator(name = DIM_FEATURE_SET)
private TcFeatureSet featureExtractors;

Thus, whenever the feature set changes, InitTask has to be re-run, no matter what. I assume the only reason for this dimension is the ValidityCheckConnector, which checks the validity of the experiment, including whether the feature extractors used are valid for the feature mode requested by the experiment.

The only way to avoid this would probably be to manually remove this dimension from InitTask (along with all references to featureExtractors).
If anybody knows a better solution, let me know.
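The matching rule described above can be modeled in a few lines. This is a hypothetical sketch, not DKPro Lab's implementation: `ReuseSketch`, `run`, `key`, and the string-valued discriminators are invented for illustration. It assumes a stored run is found by task name plus all discriminator values, so a change to any one discriminator (here the feature set) yields a new key and forces a re-run.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of discriminator-based reuse; this is NOT DKPro Lab's code.
public class ReuseSketch {

    // Previously stored task outputs, keyed by task name plus discriminator values.
    private final Map<String, String> store = new HashMap<>();

    // Builds the lookup key from the task name and its discriminators, sorted by
    // name so that the key does not depend on map iteration order.
    public static String key(String task, Map<String, String> discriminators) {
        return task + new TreeMap<>(discriminators);
    }

    // Mimics a USE_EXISTING-style policy: reuse a stored run whose discriminators
    // match exactly, otherwise "execute" the task and store its output.
    public String run(String task, Map<String, String> discriminators) {
        String k = key(task, discriminators);
        if (store.containsKey(k)) {
            return "reused";
        }
        store.put(k, "output of " + k);
        return "executed";
    }

    public static void main(String[] args) {
        ReuseSketch lab = new ReuseSketch();
        // In this model, both the reader and the feature set are InitTask discriminators.
        Map<String, String> first = Map.of("readers", "corpusA", "featureSet", "[f1, f2]");
        Map<String, String> changed = Map.of("readers", "corpusA", "featureSet", "[f1, f3]");

        System.out.println(lab.run("InitTask", first));   // executed
        System.out.println(lab.run("InitTask", first));   // reused
        System.out.println(lab.run("InitTask", changed)); // executed: only featureSet differs
    }
}
```

Even though the preprocessing inputs (`readers`) are identical in the third call, the changed feature set alone is enough to miss the stored run.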

Best,
Johannes

On 14.02.17, 14:43, "dkpro-t...@googlegroups.com on behalf of gla...@gmail.com" <dkpro-t...@googlegroups.com on behalf of gla...@gmail.com> wrote:

Richard Eckart de Castilho
Feb 15, 2017, 9:37:37 AM
to Johannes Daxenberger, gla...@gmail.com, dkpro-tc-users
Hi,

isn't it possible to add another task for the pre-processing *before* the init task?

Cheers,

-- Richard

Johannes Daxenberger
Feb 18, 2017, 9:51:39 AM
to Richard Eckart de Castilho, gla...@gmail.com, dkpro-tc-users
Hi,

the whole reason for the checks run by the InitTask is to make sure that experiments with invalid settings are stopped early (i.e. *before* preprocessing).
The best way to achieve what we need is probably to sneak the parameter featureExtractors (set as a dimension) into InitTask without using the discriminator annotation. In theory, that should be a simple modification to InitTask, but I haven't thought much about potential side effects.
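The proposal can be illustrated with a small reflection-based sketch, assuming that a run's identity is computed only from annotated fields. `@Discr` is a hypothetical stand-in for DKPro Lab's `@Discriminator`; `InitTaskCurrent`, `InitTaskProposed`, and `identity` are invented names, and the feature set is modeled as a plain string instead of a `TcFeatureSet`. Whether DKPro Lab actually injects a dimension value into a field that lacks the annotation is not confirmed in this thread, so the sketch only models the matching side.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration; @Discr stands in for DKPro Lab's @Discriminator.
public class DiscriminatorSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    public @interface Discr {
        String name();
    }

    // Behavior described in this thread: the feature set is part of the task identity.
    public static class InitTaskCurrent {
        @Discr(name = "readers") String readers;
        @Discr(name = "featureSet") String featureExtractors;

        public InitTaskCurrent(String readers, String featureExtractors) {
            this.readers = readers;
            this.featureExtractors = featureExtractors;
        }
    }

    // Proposed variant: the feature set is still available to a validity check,
    // but it is not annotated and therefore does not change the identity.
    public static class InitTaskProposed {
        @Discr(name = "readers") String readers;
        String featureExtractors;

        public InitTaskProposed(String readers, String featureExtractors) {
            this.readers = readers;
            this.featureExtractors = featureExtractors;
        }
    }

    // Collects the values of all @Discr-annotated fields, keyed and sorted by name.
    public static Map<String, Object> identity(Object task) {
        Map<String, Object> id = new TreeMap<>();
        for (Field f : task.getClass().getDeclaredFields()) {
            Discr d = f.getAnnotation(Discr.class);
            if (d == null) {
                continue; // plain parameters are ignored when matching runs
            }
            try {
                f.setAccessible(true);
                id.put(d.name(), f.get(task));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException(e);
            }
        }
        return id;
    }

    public static void main(String[] args) {
        boolean currentMatches = identity(new InitTaskCurrent("corpusA", "[f1, f2]"))
                .equals(identity(new InitTaskCurrent("corpusA", "[f1, f3]")));
        boolean proposedMatches = identity(new InitTaskProposed("corpusA", "[f1, f2]"))
                .equals(identity(new InitTaskProposed("corpusA", "[f1, f3]")));
        System.out.println("current InitTask reusable: " + currentMatches);   // false
        System.out.println("proposed InitTask reusable: " + proposedMatches); // true
    }
}
```

In this model, the trade-off is that a stored InitTask run no longer records which feature set it was validated against.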

Best,
Johannes

On 15.02.17, 15:37, "dkpro-t...@googlegroups.com on behalf of Richard Eckart de Castilho" <dkpro-t...@googlegroups.com on behalf of richard...@gmail.com> wrote: