WG: [dkpro-tc-dev] Question on SaveModelWekaBatchTask

Johannes Daxenberger

Jul 23, 2015, 6:39:04 AM

to dkpro-l...@googlegroups.com, Martin Wunderlich (martin@wunderlich.com)

Forwarding this to DKPro Lab, as the problem seems to be occurring there.

- Johannes

From: dkpro-...@googlegroups.com [mailto:dkpro-...@googlegroups.com] On behalf of Martin Wunderlich
Sent: Thursday, July 23, 2015 10:03
To: Tobias Horsmann
Cc: dkpro-tc-dev
Subject: Re: [dkpro-tc-dev] Question on SaveModelWekaBatchTask

Thanks a lot, Tobias, for the comments. I tried to implement 4) yesterday (getting the tasks from the results of getTasks()), but at that stage getTasks() returns an empty list, strangely enough.
Here is the code:

String experimentName = "MyExperiment";
ExperimentTrainTest batch = new ExperimentTrainTest(experimentName,
WekaClassificationAdapter.class, getPreprocessing());
batch.setParameterSpace(pSpace);
batch.setExecutionPolicy(policy);
batch.addReport(BatchTrainTestReport.class);
batch.addReport(BatchRuntimeReport.class);

Set<Task> tasks = batch.getTasks();
MetaInfoTask meta = getMetaTask(tasks);
ExtractFeaturesTask fe = getFeatureExtractionTask(tasks);
ModelSerializationTask saveModelTask = new ModelSerializationTask();
File outputDir = (new File(outputfolder + experimentName + "/")).getAbsoluteFile();
String type = saveModelTask.getType() + "-" + experimentName;
saveModelTask.setType(type);
saveModelTask.setOutputFolder(outputDir);

// Use the variables declared above (meta, fe) as the import sources
saveModelTask.addImport(meta, MetaInfoTask.META_KEY);
saveModelTask.addImport(fe, ExtractFeaturesTask.OUTPUT_KEY, Constants.TEST_TASK_INPUT_KEY_TRAINING_DATA);

batch.addTask(saveModelTask);

...

// Returns the first ExtractFeaturesTask in the set, or null if there is none
public static ExtractFeaturesTask getFeatureExtractionTask(Set<Task> tasks) {
    for (Task task : tasks) {
        if (task instanceof ExtractFeaturesTask) {
            return (ExtractFeaturesTask) task;
        }
    }
    return null;
}

// Returns the first MetaInfoTask in the set, or null if there is none
public static MetaInfoTask getMetaTask(Set<Task> tasks) {
    for (Task task : tasks) {
        if (task instanceof MetaInfoTask) {
            return (MetaInfoTask) task;
        }
    }
    return null;
}

I am not sure why the tasks list would be empty at that stage.

Cheers,

Martin

On 23.07.2015, at 07:27, Tobias Horsmann <tobias....@gmail.com> wrote:

I would tend to take option 1

1)
I probably wouldn't create an abstract class, but rather keep the current "without" version as the base class and then just add a subclass like "ExperimentTrainTestWithStore" (choose a name you like, just an idea).
With some minor changes to the TrainTest and CrossVal tasks, it should be possible to make a subclass that overrides a single method of the "without storing" base class and does the model saving there.
This would avoid forcing model storing on people who only want train-test.

Option 4 is possible, but I think BatchTask is in DKPro Lab and not in TC, and I would want to keep the changes local.
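
Option 1 could look roughly like the sketch below. It is only an illustration of the "override a single method" idea and not DKPro TC code: TrainTest, TrainTestWithStore, and the afterTraining() hook are hypothetical names, and the strings stand in for real tasks.

```java
import java.util.ArrayList;
import java.util.List;

public class StoreSubclassSketch {

    /** Hypothetical stand-in for the "without storing" base experiment. */
    static class TrainTest {
        protected final List<String> steps = new ArrayList<>();

        /** Fixed workflow; subclasses customize it only through the hook. */
        final List<String> run() {
            steps.add("train");
            afterTraining();
            steps.add("test");
            return steps;
        }

        /** Hook method (assumed name): does nothing in the base class. */
        protected void afterTraining() {
        }
    }

    /** Subclass that adds model storing by overriding the single hook. */
    static class TrainTestWithStore extends TrainTest {
        @Override
        protected void afterTraining() {
            steps.add("store model");
        }
    }

    public static void main(String[] args) {
        System.out.println(new TrainTest().run());
        System.out.println(new TrainTestWithStore().run());
    }
}
```

Users who only want train-test keep using the base class unchanged; storing is opt-in through the subclass.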


Richard Eckart de Castilho

Jul 24, 2015, 8:43:30 AM

to dkpro-l...@googlegroups.com, Martin Wunderlich (martin@wunderlich.com)

It is not a Lab issue. You add the subtasks in ExperimentTrainTest.init(), which is called by ExperimentTrainTest.execute() only when the experiment is actually run. At the point where you want to query the subtasks, init() has not been called yet, so getTasks() is still empty.

If you want to do what you try to do here, you could:

* make init() a protected method
* create an anonymous subclass of ExperimentTrainTest in your code below
* override init() in this subclass
* first call super.init()
* then inject your additional task
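
The steps above can be sketched with plain Java stand-ins. This is a minimal sketch and not the actual DKPro API: Task and Experiment are hypothetical classes that only mimic the behaviour described here (subtasks registered in init(), which execute() calls), and the task names are placeholder strings.

```java
import java.util.LinkedHashSet;
import java.util.Set;

public class LazyInitSketch {

    /** Hypothetical stand-in for a Lab task; it only carries a name. */
    static class Task {
        final String name;

        Task(String name) {
            this.name = name;
        }
    }

    /** Hypothetical stand-in for ExperimentTrainTest. */
    static class Experiment {
        private final Set<Task> tasks = new LinkedHashSet<>();

        Set<Task> getTasks() {
            return tasks;
        }

        void addTask(Task task) {
            tasks.add(task);
        }

        /** Registers the default subtasks; protected so subclasses can extend it. */
        protected void init() {
            addTask(new Task("MetaInfoTask"));
            addTask(new Task("ExtractFeaturesTask"));
        }

        /** init() runs only here, so getTasks() is empty before this call. */
        void execute() {
            init();
            // Running the registered tasks is omitted in this sketch.
        }
    }

    /** Anonymous subclass: call super.init() first, then inject the extra task. */
    static Experiment withModelSaving() {
        return new Experiment() {
            @Override
            protected void init() {
                super.init();
                addTask(new Task("ModelSerializationTask"));
            }
        };
    }

    public static void main(String[] args) {
        Experiment batch = withModelSaving();
        System.out.println("before execute(): " + batch.getTasks().size()); // prints 0
        batch.execute();
        System.out.println("after execute(): " + batch.getTasks().size());  // prints 3
    }
}
```

Inside the overridden init(), the default subtasks already exist after super.init() returns, so that is the place where lookups such as getMetaTask(getTasks()) would find them.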

Cheers,

-- Richard

Martin Wunderlich

Jul 24, 2015, 10:01:03 AM

to Richard Eckart de Castilho, dkpro-l...@googlegroups.com

OK, thanks a lot, Richard, for the analysis and the explanation of how I could get this to work. I will see if I can get it working that way, by subclassing ExperimentTrainTest and adding the SaveModel task in the init() method of this subclass.

Cheers,

Martin

Martin Wunderlich

Jul 25, 2015, 4:47:58 AM

to Richard Eckart de Castilho, dkpro-l...@googlegroups.com

Hi guys,

I have now gone down a slightly different route than the one suggested here below and implemented the original suggestion #1 - create a subclass of ExperimentTrainTest with the classifier storage capability. For this to work, I had to modify the visibility of init() and experimentName in the super class (as suggested by Richard) in my local fork of DKPro lab. So, in essence this creates a new batch task class, which is a merge from the existing ExperimentTrainTest and SaveModelWekaBatchTask.

I have just run a test and it works fine. I am attaching the class file here, in case anyone else is interested, and I could also submit a pull request for this addition, if people find it useful.

Cheers,

Martin

ExperimentTrainTestStore.java

Richard Eckart de Castilho

Jul 25, 2015, 4:55:48 AM

to Martin Wunderlich, dkpro-l...@googlegroups.com

Hi Martin,

On 25.07.2015, at 10:47, Martin Wunderlich <mar...@wunderlich.com> wrote:

> Hi guys,
>
> I have now gone down a slightly different route than the one suggested here below and implemented the original suggestion #1 - create a subclass of ExperimentTrainTest with the classifier storage capability. For this to work, I had to modify the visibility of init() and experimentName in the super class (as suggested by Richard) in my local fork of DKPro lab.

Did you really have to modify DKPro Lab?

> So, in essence this creates a new batch task class, which is a merge from the existing ExperimentTrainTest and SaveModelWekaBatchTask.

My idea when I implemented DKPro Lab was that people would do a lot of subclassing and use anonymous classes. This was inspired by the way Apache Wicket works. A lot of functionality there is customized not through getters/setters but through inheritance and overriding/implementing methods. So in my mind, people would in most cases take abstract base classes from DKPro Lab or some extension of it (like DKPro TC) and derive classes from them in their experiment setup code. The purpose of such derived "Task" classes would largely be to serve as adapters that take values from the parameter space of the experiment (ParameterSpace) and inject them into the underlying code (e.g. UIMA pipelines), and basically nothing more.

> I have just run a test and it works fine. I am attaching the class file here, in case anyone else is interested, and I could also submit a pull request for this addition, if people find it useful.

Cheers,

-- Richard

Martin Wunderlich

Jul 25, 2015, 5:28:11 AM

to Richard Eckart de Castilho, dkpro-l...@googlegroups.com

> On 25.07.2015, at 10:55, Richard Eckart de Castilho <richard...@gmail.com> wrote:
>
> Hi Martin,
>
> On 25.07.2015, at 10:47, Martin Wunderlich <mar...@wunderlich.com> wrote:
>
>> Hi guys,
>>
>> I have now gone down a slightly different route than the one suggested here below and implemented the original suggestion #1 - create a subclass of ExperimentTrainTest with the classifier storage capability. For this to work, I had to modify the visibility of init() and experimentName in the super class (as suggested by Richard) in my local fork of DKPro lab.
>
> Did you really have to modify DKPro Lab?

Sorry, my bad. I meant DKPro TC ML (which is where ExperimentTrainTest is located), not Lab.

>
>> So, in essence this creates a new batch task class, which is a merge from the existing ExperimentTrainTest and SaveModelWekaBatchTask.
>
> My idea when I implemented DKPro Lab was that people would do a lot of subclassing and use anonymous classes. This was inspired by the way Apache Wicket works. A lot of functionality there is customized not through getters/setters but through inheritance and overriding/implementing methods. So in my mind, people would in most cases take abstract base classes from DKPro Lab or some extension of it (like DKPro TC) and derive classes from them in their experiment setup code. The purpose of such derived "Task" classes would largely be to serve as adapters that take values from the parameter space of the experiment (ParameterSpace) and inject them into the underlying code (e.g. UIMA pipelines), and basically nothing more.

The alternative (without modifying TC) would have been to create yet another batch task class for the experiment, one that does training, testing, and storing of the trained classifier. But since this would have led to a lot of unnecessary code duplication, I preferred to subclass the existing experiment class instead. To make that work, though, I had to change the visibility of init() and experimentName as described.

Cheers,

Martin
