TaskContextNotFoundException - Pre-executed tasks not being imported


Pedro Santos

Oct 13, 2014, 12:10:16 PM
to dkpro-t...@googlegroups.com
Hi,

I am trying to run a pipeline using DKPro Lab 0.11 in Groovy, and unfortunately I am getting a TaskContextNotFoundException.
I extend BatchTask and initialize it as follows:

        if ((experimentName == null) || (preprocessingPipeline == null))
        {
            throw new IllegalStateException(
                    "You must set Experiment Name, DataWriter and Preprocessing pipeline.");
        }

        // check the validity of the experiment setup first
        checkTask = new ValidityCheckTask();

        // preprocessing on training data
        preprocessTask = new PreprocessTask();
        preprocessTask.setPreprocessingPipeline(preprocessingPipeline);
        preprocessTask.setTesting(false);
        preprocessTask.setType(preprocessTask.getType() + "-" + experimentName);

        // get some meta data depending on the whole document collection that we need for training
        metaTask = new MetaInfoTask();
        metaTask.setType(metaTask.getType() + "-" + experimentName);
        metaTask.addImport(preprocessTask, PreprocessTask.OUTPUT_KEY_TRAIN,
                MetaInfoTask.INPUT_KEY);

        // feature extraction on training data
        featuresTask = new ExtractFeaturesTask();
        featuresTask.setType(featuresTask.getType() + "-" + experimentName);
        featuresTask.addImport(metaTask, MetaInfoTask.META_KEY);
        featuresTask.addImport(preprocessTask, PreprocessTask.OUTPUT_KEY_TRAIN,
                ExtractFeaturesTask.INPUT_KEY);

        // test task operating on the models of the feature extraction train and test tasks
        clusteringTask = new ClusteringTask();
        clusteringTask.setType(clusteringTask.getType() + "-" + experimentName);

        if (innerReports != null) {
            for (Class<? extends Report> report : innerReports) {
             clusteringTask.addReport(report);
            }
        }

        clusteringTask.addImport(featuresTask, ExtractFeaturesTask.OUTPUT_KEY,
                ClusteringTask.CLUSTERING_TASK_INPUT_KEY);

        // DKPro Lab issue 38: must be added as *first* task
        addTask(checkTask);
        addTask(preprocessTask);
        addTask(metaTask);
        addTask(featuresTask);
        addTask(clusteringTask);

However, even though the preprocessing task executes without any error, it looks like the tasks that should import its output, metaTask and featuresTask, cannot find it. This is the stack trace of the exception that is thrown:


Exception in thread "main" de.tudarmstadt.ukp.dkpro.lab.storage.UnresolvedImportException: 
 -Unable to resolve import of task [de.tudarmstadt.ukp.dkpro.tc.core.task.MetaInfoTask-20NewsGroupsClustering-Groovy] pointing to [task-latest://de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask-20NewsGroupsClustering-Groovy/preprocessorOutputTrain]; nested exception is de.tudarmstadt.ukp.dkpro.lab.storage.TaskContextNotFoundException: Task [de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask-20NewsGroupsClustering-Groovy] has never been executed.
 -Unable to resolve import of task [de.tudarmstadt.ukp.dkpro.tc.core.task.ExtractFeaturesTask-20NewsGroupsClustering-Groovy] pointing to [task-latest://de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask-20NewsGroupsClustering-Groovy/preprocessorOutputTrain]; nested exception is de.tudarmstadt.ukp.dkpro.lab.storage.TaskContextNotFoundException: Task [de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask-20NewsGroupsClustering-Groovy] has never been executed.
 -Unable to resolve import of task [de.tudarmstadt.ukp.dkpro.tc.experiments.usefulcomments.tasks.ClusteringTask-20NewsGroupsClustering-Groovy] pointing to [task-latest://de.tudarmstadt.ukp.dkpro.tc.core.task.ExtractFeaturesTask-20NewsGroupsClustering-Groovy/output]; nested exception is de.tudarmstadt.ukp.dkpro.lab.storage.TaskContextNotFoundException: Task [de.tudarmstadt.ukp.dkpro.tc.core.task.ExtractFeaturesTask-20NewsGroupsClustering-Groovy] has never been executed.; nested exception is de.tudarmstadt.ukp.dkpro.lab.storage.UnresolvedImportException: Unable to resolve import of task [de.tudarmstadt.ukp.dkpro.tc.core.task.MetaInfoTask-20NewsGroupsClustering-Groovy] pointing to [task-latest://de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask-20NewsGroupsClustering-Groovy/preprocessorOutputTrain]; nested exception is de.tudarmstadt.ukp.dkpro.lab.storage.TaskContextNotFoundException: Task [de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask-20NewsGroupsClustering-Groovy] has never been executed.
at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask.executeConfiguration(BatchTask.java:282)
at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask.execute(BatchTask.java:185)
at de.tudarmstadt.ukp.dkpro.tc.experiments.usefulcomments.tasks.BatchTaskClustering.execute(BatchTaskClustering.java:78)
at de.tudarmstadt.ukp.dkpro.lab.engine.impl.ExecutableTaskEngine.run(ExecutableTaskEngine.java:55)
at de.tudarmstadt.ukp.dkpro.lab.engine.impl.DefaultTaskExecutionService.run(DefaultTaskExecutionService.java:48)
at de.tudarmstadt.ukp.dkpro.lab.Lab.run(Lab.java:97)
at de.tudarmstadt.ukp.dkpro.lab.Lab$run.call(Unknown Source)
at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:45)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:108)
at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:116)
at de.tudarmstadt.ukp.dkpro.tc.experiments.usefulcomments.scripts.ClusteringPipeline.main(ClusteringPipeline.groovy:98)
Caused by: de.tudarmstadt.ukp.dkpro.lab.storage.UnresolvedImportException: Unable to resolve import of task [de.tudarmstadt.ukp.dkpro.tc.core.task.MetaInfoTask-20NewsGroupsClustering-Groovy] pointing to [task-latest://de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask-20NewsGroupsClustering-Groovy/preprocessorOutputTrain]; nested exception is de.tudarmstadt.ukp.dkpro.lab.storage.TaskContextNotFoundException: Task [de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask-20NewsGroupsClustering-Groovy] has never been executed.
at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask$ScopedTaskContext.resolve(BatchTask.java:549)
at de.tudarmstadt.ukp.dkpro.lab.engine.impl.DefaultTaskContextFactory.resolveImports(DefaultTaskContextFactory.java:142)
at de.tudarmstadt.ukp.dkpro.lab.engine.impl.DefaultTaskContextFactory.createContext(DefaultTaskContextFactory.java:98)
at de.tudarmstadt.ukp.dkpro.lab.uima.engine.simple.SimpleExecutionEngine.run(SimpleExecutionEngine.java:80)
at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask.runNewExecution(BatchTask.java:350)
at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask.executeConfiguration(BatchTask.java:255)
... 10 more
Caused by: de.tudarmstadt.ukp.dkpro.lab.storage.TaskContextNotFoundException: Task [de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask-20NewsGroupsClustering-Groovy] has never been executed.
at de.tudarmstadt.ukp.dkpro.lab.engine.impl.ImportUtil.createContextNotFoundException(ImportUtil.java:125)
at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask.getLatestExecution(BatchTask.java:327)
at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask.access$000(BatchTask.java:70)
at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask$ScopedTaskContext.resolve(BatchTask.java:546)
... 15 more


If someone has a clue, I would be glad to hear it. I might be missing something really silly here.

Regards,
Pedro

Richard Eckart de Castilho

Oct 14, 2014, 10:59:21 AM
to Pedro Santos, dkpro-t...@googlegroups.com
Hi,

try turning on "debug" level logging for these classes

de.tudarmstadt.ukp.dkpro.lab.engine.impl.ImportUtil
de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask

Then you should see what discriminators do not match when it searches for
the existing context.
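
If you prefer to do that programmatically, here is a minimal sketch, assuming a log4j 1.x backend is on the classpath (an equivalent entry in log4j.properties works just as well):

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class LabDebugLogging {
    // Call this once before running the experiment, so that ImportUtil and
    // BatchTask report which discriminators do not match.
    public static void enable() {
        Logger.getLogger("de.tudarmstadt.ukp.dkpro.lab.engine.impl.ImportUtil").setLevel(Level.DEBUG);
        Logger.getLogger("de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask").setLevel(Level.DEBUG);
    }
}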

Cheers,

-- Richard

Martin Wunderlich

Jun 5, 2015, 1:42:42 PM
to dkpro-t...@googlegroups.com
Hi Pedro,

Did you ever get this problem resolved? I am having the same issue running a single-label unit classification task in train-test mode.
Below is an example of the logging output that I am getting and the exception that is thrown in the end.

Just to provide some background: I have a longish text which my data reader annotates with unit-level annotations on a segment level, based on certain exact string matches. To create training and test data, I have split this text in the middle to get two files: one for training and one for testing. I am using OpenNlpSegmenter for pre-processing and LuceneNGramUFE to get the top 100 unigrams as features (for starters, there will be more). However, the experiment never gets as far as feature extraction because of this error.

Cheers,

Martin

2015-06-05 19:29:14  INFO [main] (BatchTask) - Executing task [de.tudarmstadt.ukp.dkpro.tc.core.task.ExtractFeaturesTask-Test-AIFdbClassificationDemoTrainTest]
2015-06-05 19:29:14 DEBUG [main] (ImportUtil) - No value match: [de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask|readerTestParams] [^[language, en, sourceLocation, src/main/resources/data/aifdb/araucaria/test, , src/main/resources/data/aifdb/araucaria/nodesets, patterns, [Ljava.lang.String;@2947bde8]$] [[language, en, sourceLocation, src/main/resources/data/aifdb/araucaria/test, , src/main/resources/data/aifdb/araucaria/nodesets, patterns, [+]*concatenated.txt]]
2015-06-05 19:29:14 DEBUG [main] (ImportUtil) - No value match: [de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask|readerTestParams] [^[language, en, sourceLocation, src/main/resources/data/aifdb/araucaria/test, , src/main/resources/data/aifdb/araucaria/nodesets, patterns, [Ljava.lang.String;@2947bde8]$] [[language, en, sourceLocation, src/main/resources/data/aifdb/araucaria/test, , src/main/resources/data/aifdb/araucaria/nodesets, patterns, [+]*concatenated.txt]]
2015-06-05 19:29:14 DEBUG [main] (BatchTask) - Deferring execution of task [de.tudarmstadt.ukp.dkpro.tc.core.task.ExtractFeaturesTask-Test-AIFdbClassificationDemoTrainTest]: Unable to resolve import of task [de.tudarmstadt.ukp.dkpro.tc.core.task.ExtractFeaturesTask-Test-AIFdbClassificationDemoTrainTest] pointing to [task-latest://de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask-Test-AIFdbClassificationDemoTrainTest/preprocessorOutputTest]; nested exception is de.tudarmstadt.ukp.dkpro.lab.storage.TaskContextNotFoundException: Task [de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask-Test-AIFdbClassificationDemoTrainTest] has never been executed.



TaskContextNotFoundException: Task [de.tudarmstadt.ukp.dkpro.tc.core.task.ExtractFeaturesTask-Train-AIFdbClassificationDemoTrainTest] has never been executed.; nested exception is de.tudarmstadt.ukp.dkpro.lab.storage.UnresolvedImportException: Unable to resolve import of task [de.tudarmstadt.ukp.dkpro.tc.core.task.MetaInfoTask-AIFdbClassificationDemoTrainTest] pointing to [task-latest://de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask-Train-AIFdbClassificationDemoTrainTest/preprocessorOutputTrain]; nested exception is de.tudarmstadt.ukp.dkpro.lab.storage.TaskContextNotFoundException: Task [de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask-Train-AIFdbClassificationDemoTrainTest] has never been executed.
    at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask.executeConfiguration(BatchTask.java:282)
    at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask.execute(BatchTask.java:185)
    at de.tudarmstadt.ukp.dkpro.tc.ml.ExperimentTrainTest.execute(ExperimentTrainTest.java:91)
    at de.tudarmstadt.ukp.dkpro.lab.engine.impl.ExecutableTaskEngine.run(ExecutableTaskEngine.java:55)
    at de.tudarmstadt.ukp.dkpro.lab.engine.impl.DefaultTaskExecutionService.run(DefaultTaskExecutionService.java:48)
    at de.tudarmstadt.ukp.dkpro.lab.Lab.run(Lab.java:97)
    at com.martinwunderlich.nlp.arg.test.AIFdbClassificationDemo.runTrainTest(AIFdbClassificationDemo.java:78)
    at com.martinwunderlich.nlp.arg.test.AIFdbClassificationDemo.main(AIFdbClassificationDemo.java:64)
Caused by: de.tudarmstadt.ukp.dkpro.lab.storage.UnresolvedImportException: Unable to resolve import of task [de.tudarmstadt.ukp.dkpro.tc.core.task.MetaInfoTask-AIFdbClassificationDemoTrainTest] pointing to [task-latest://de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask-Train-AIFdbClassificationDemoTrainTest/preprocessorOutputTrain]; nested exception is de.tudarmstadt.ukp.dkpro.lab.storage.TaskContextNotFoundException: Task [de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask-Train-AIFdbClassificationDemoTrainTest] has never been executed.
    at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask$ScopedTaskContext.resolve(BatchTask.java:549)
    at de.tudarmstadt.ukp.dkpro.lab.engine.impl.DefaultTaskContextFactory.resolveImports(DefaultTaskContextFactory.java:142)
    at de.tudarmstadt.ukp.dkpro.lab.engine.impl.DefaultTaskContextFactory.createContext(DefaultTaskContextFactory.java:98)
    at de.tudarmstadt.ukp.dkpro.lab.uima.engine.simple.SimpleExecutionEngine.run(SimpleExecutionEngine.java:80)
    at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask.runNewExecution(BatchTask.java:350)
    at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask.executeConfiguration(BatchTask.java:255)
    ... 7 more
Caused by: de.tudarmstadt.ukp.dkpro.lab.storage.TaskContextNotFoundException: Task [de.tudarmstadt.ukp.dkpro.tc.core.task.PreprocessTask-Train-AIFdbClassificationDemoTrainTest] has never been executed.
    at de.tudarmstadt.ukp.dkpro.lab.engine.impl.ImportUtil.createContextNotFoundException(ImportUtil.java:125)
    at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask.getLatestExecution(BatchTask.java:327)
    at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask.access$000(BatchTask.java:70)
    at de.tudarmstadt.ukp.dkpro.lab.task.impl.BatchTask$ScopedTaskContext.resolve(BatchTask.java:546)
    ... 12 more



Martin Wunderlich

Jun 5, 2015, 1:52:21 PM
to dkpro-t...@googlegroups.com
PS: I have noticed something else that seems strange and may or may not be related: during pre-processing, one binary CAS file gets created per _instance_ of my annotations. So, say the data reader has created 1000 annotation instances, I will end up with 1000 binary files in the output directory, each containing the full text and annotations. This is independent of the kind of pre-processing applied (OpenNlpSegmenter, NoOp etc.). Seems a bit odd...

Richard Eckart de Castilho

Jun 5, 2015, 3:02:43 PM
to Martin Wunderlich, dkpro-t...@googlegroups.com
Hi Martin,

DKPro TC uses the "readerTestParams" parameter (which is a list) to pass parameters on to the pipeline. In there, you have set the "patterns" parameter to an array value, so we have a list containing a nested array. DKPro Lab (used under the hood by TC) does not handle such nested arrays well.

You can try three things:

1) use a list instead of an array for PARAM_PATTERNS
2) not use a list/array at all if PARAM_PATTERNS is a single value
3) try turning a multi-valued PARAM_PATTERNS into a single value, e.g. if you use PARAM_PATTERNS with a list of all ".txt" files in some directory, you could just as well use "*.txt"
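
For illustration, here is a rough sketch of options 1) and 2) in the style of the usual reader parameter lists (reader class, constants and variables are placeholders, not necessarily the ones in your setup):

// Option 1: a List instead of a String[] for the patterns
List<Object> readerTestParamsList = Arrays.asList(
        MyReader.PARAM_LANGUAGE, "en",
        MyReader.PARAM_SOURCE_LOCATION, corpusFilePathTest,
        MyReader.PARAM_PATTERNS, Arrays.asList(INCLUDE_PREFIX + "*.txt"));

// Option 2: no list/array at all, since there is only a single pattern value
List<Object> readerTestParamsSingle = Arrays.asList(
        MyReader.PARAM_LANGUAGE, "en",
        MyReader.PARAM_SOURCE_LOCATION, corpusFilePathTest,
        MyReader.PARAM_PATTERNS, INCLUDE_PREFIX + "*.txt");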

I hope that fixes your problem.

Cheers,

-- Richard

Johannes Daxenberger

Jun 5, 2015, 6:58:11 PM
to Martin Wunderlich, dkpro-t...@googlegroups.com
Hi Martin,

this behavior is expected; to be able to access the context of the units, all annotations of a CAS (after preprocessing) need to be kept. In Unit-Mode, after preprocessing, each CAS is multiplied by the number of units it contains. This way, the remaining part of the experiment can be run in the same manner as in Document-Mode.

Best,
Johannes

Martin Wunderlich

Jun 5, 2015, 7:31:26 PM
to Richard Eckart de Castilho, dkpro-t...@googlegroups.com
Thanks a lot, Richard. That seems to have fixed the problem. At least, I get to the meta task stage now, before hitting a Java heap space error. 

I tried to backtrack and figure out why I would have been using an array in the first place, since I only had a single string. I guess this crept in by copy&paste from the Brown corpus example, where we have the following lines: 

Arrays.asList(new Object[] { BrownCorpusReader.PARAM_LANGUAGE, "en",
                        BrownCorpusReader.PARAM_SOURCE_LOCATION, corpusFilePathTrain,
                        BrownCorpusReader.PARAM_PATTERNS,
                        new String[] { INCLUDE_PREFIX + "*.xml", INCLUDE_PREFIX + "*.xml.gz" } }));

In fact, I did have several patterns before, but now I am using only one, so it is no problem to use a single string.

Cheers, 

Martin 

Martin Wunderlich

Jun 5, 2015, 7:33:36 PM
to Johannes Daxenberger, dkpro-t...@googlegroups.com
Hi Johannes, 

Thanks for clarifying, but I am not sure I understand the connection here between Unit-Mode and Document-Mode. What exactly is the difference between the different instances of the binary CASes?

Cheers, 

Martin 

Johannes Daxenberger

Jun 6, 2015, 7:10:13 AM
to Martin Wunderlich, dkpro-t...@googlegroups.com
Hi Martin,

some clarification on the differences between Unit-Mode and Document-Mode:

  • in Unit-Mode, you need to annotate all Units and their respective Outcomes in a document in the reader (whereas in Document-Mode, you have just one Outcome for the entire document and no units)
  • preprocessing is run once over all documents/CASes, both in Unit-Mode and in Document-Mode
  • at the end of preprocessing, in Unit-Mode a CasMultiplier copies each CAS times the number of units in it and sets a different focus in each copy, such that each unit is in focus in exactly one of the resulting CASes
  • subsequent FeatureExtractors will see all units in their own CAS, but they extract information only from the unit (and its context) which is in focus in a particular CAS
  • learning and evaluation are the same for Unit- and Document-Mode, either on unit level or on document level (which technically makes no difference at this stage of the pipeline)
I hope this helps to better understand the concept behind DKPro TC’s Unit-Mode; a rough sketch of the multiplication step follows below.
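
To make the multiplication step a bit more concrete, a simplified conceptual sketch (not the actual TC implementation; copyOfCas() and emit() are placeholders for the CAS copy and the CasMultiplier output, and the unit/focus types are the TC annotations mentioned above):

// Conceptual sketch of Unit-Mode CAS multiplication, simplified.
for (TextClassificationUnit unit : JCasUtil.select(originalCas, TextClassificationUnit.class)) {
    JCas copy = copyOfCas(originalCas);    // placeholder: full text plus all annotations
    // mark exactly this unit as the one to classify in the copied CAS
    TextClassificationFocus focus = new TextClassificationFocus(copy, unit.getBegin(), unit.getEnd());
    focus.addToIndexes();
    emit(copy);                            // placeholder: one output CAS per unit
}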

Martin Wunderlich

Jun 6, 2015, 1:49:26 PM
to Johannes Daxenberger, dkpro-t...@googlegroups.com
This helps, indeed. Thank you, Johannes. 

Cheers, 

Martin 

Martin Wunderlich

Jun 7, 2015, 3:08:20 PM
to dkpro-t...@googlegroups.com
Just in case anyone else runs into this same problem: I have managed to identify the actual root cause of the issue. There were two mistakes, one somewhat deliberate, the other accidental. In my custom reader class, I was adding two types of TC annotations: the first was something like relevant vs. not relevant; the second was a further level of classification for the relevant bits. There were two problems: first, I was creating TC units for both levels of classification; second, I wasn't checking for overlapping ranges (the ranges were derived from simple string matching, as mentioned above). As a result, for a duplicate entry in my test file of strings to identify, four overlapping TC annotations would get created (two each for the base classification and for the second-level classification).

Apparently, the solution is to create separate readers for the different levels, or at least to have a flag specifying which level is being annotated. At least, this works OK now. I will also try to run a multi-class classification on the second level only and add the class label "non-relevant".
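
For reference, a minimal sketch of the overlap check that was missing in my reader (plain Java; LabeledString is a stand-in for however the labeled strings are represented):

// Keep the spans that already got a TC unit and skip any match that intersects one.
List<int[]> usedSpans = new ArrayList<>();
for (LabeledString s : labeledStrings) {
    int begin = documentText.indexOf(s.getText());
    if (begin < 0) {
        continue;                          // string not present in this document
    }
    int end = begin + s.getText().length();
    boolean overlaps = usedSpans.stream().anyMatch(r -> begin < r[1] && r[0] < end);
    if (overlaps) {
        continue;                          // avoid stacked, overlapping TC units
    }
    usedSpans.add(new int[] { begin, end });
    // ...create exactly one TC unit and outcome for [begin, end) here...
}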

Cheers,
Martin


Martin Wunderlich

Mar 15, 2016, 4:28:10 AM
to Johannes Daxenberger, dkpro-t...@googlegroups.com
Hi Johannes, 

I had another closer look at the results of the pre-processing to understand how the different focus would be reflected in the CASes, so I ran a diff on the XMI representations. The strange thing is that the CASes don't actually differ much: the only differences I found were in the document metadata (documentId and the like). The token and sentence annotations from the preprocessing stage are there all right, but none of the TC outcomes that were added by the reader. I still seem to be lacking some understanding on a conceptual level, so let me describe in detail what I am trying to achieve here:

- I have two inputs: 
1) a text consisting of several sentences and 
2) a second text file with labeled strings (most of them being sentences), where most of the strings will occur in the first text
- The goal is to train a classifier on the labeled strings and apply the classifier to sentences. 
- In order to form a train/test experiment, I have split this text in the middle to create training and test data.
- I have created two readers (for train and test) extending from ResourceCollectionReaderBase:
- Both readers read the respective text (train or test) and the labeled strings. 
- Annotations are added on a unit level based on a simple string match: for each string from the set of labeled strings, get the matching range in the text and create a TC unit for that range (roughly as in the sketch after this list). Note that some units might have more than one label here.
- The first reader (for train) sets the string's label as the TC outcome; the second reader (for test) sets the label to "Unknown".
- For preprocessing, I am using the OpenNlpSegmenter to add token and sentence annotations. 
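
For context, the unit-annotation step in both readers looks roughly like this (a simplified sketch, not my actual code; LabeledString is a stand-in for however the labeled strings are read, and the TC type constructors are assumed to follow the usual JCasGen pattern):

// Inside the reader, after the document text has been set on the JCas:
String text = jcas.getDocumentText();
for (LabeledString s : labeledStrings) {
    int begin = text.indexOf(s.getText());
    if (begin < 0) {
        continue;                          // string does not occur in this half of the text
    }
    int end = begin + s.getText().length();
    TextClassificationUnit unit = new TextClassificationUnit(jcas, begin, end);
    unit.addToIndexes();
    TextClassificationOutcome outcome = new TextClassificationOutcome(jcas, begin, end);
    outcome.setOutcome(isTrainingReader ? s.getLabel() : "Unknown");
    outcome.addToIndexes();
}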

Johannes Daxenberger

Mar 17, 2016, 5:21:12 AM
to Martin Wunderlich, dkpro-t...@googlegroups.com
Hi Martin,

welcome back :)

> …so I ran a diff on the XMI representations…

Which CASes have you been comparing? As explained in my earlier message, in Unit-Mode the input documents are split up, one CAS for each TextClassificationUnit you created in the reader. The resulting CASes have the same content as the original one, but they additionally carry a TextClassificationFocus annotation which indicates the current unit to be processed during feature extraction. If you don't see this when comparing the resulting CASes, something is going wrong in your setup, I guess.
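
One quick way to verify this programmatically rather than by diffing XMI files is to look for the focus annotation with uimaFIT; a minimal sketch, assuming the TC type names above and a JCas loaded from each pre-processed output file:

// For each CAS produced by the preprocessing/multiplication step:
Collection<TextClassificationFocus> foci = JCasUtil.select(jcas, TextClassificationFocus.class);
if (foci.size() != 1) {
    System.out.println("Expected exactly one focus per CAS in Unit-Mode, but found " + foci.size());
}
for (TextClassificationFocus focus : foci) {
    System.out.println("Focus covers: " + focus.getCoveredText());
}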


In general, it is hard to guess what's happening without knowing the details of your setup. Could you post the relevant lines of code from the experiment setup?

Martin Wunderlich

Mar 17, 2016, 3:24:01 PM
to Johannes Daxenberger, dkpro-t...@googlegroups.com
Hi Johannes,

This message came as a surprise. I have no idea why my email only reached you now; it might have been sitting in a drafts folder for quite a long time.
In any case, the question is not relevant anymore, so I would like to apologize for the time it has cost you to reply. Unfortunately, I am not doing any work with DKPro right now, hence my long silence, but I hope this will change again.

Cheers,

Martin