missing headline in id2outcome.txt

michael....@googlemail.com

Oct 29, 2015, 9:10:26 AM

to dkpro-tc-users

Hi everybody,

when adding ml.report.BatchCrossValidationReport as a report in a cross-validation, it produces an id2outcome.txt in the crossvalidation folder. I guess it is similar to the files that WekaOutcomeIDReport generates as an inner report.
However, the file is missing the header line needed to interpret the columns (#ID=PREDICTION;GOLDSTANDARD ?).

Or am I misinterpreting the file?

cheers,
Michael

Johannes Daxenberger

Nov 16, 2015, 3:50:44 AM

to michael....@googlemail.com, dkpro-tc-users

Hi Michael,

sorry for the late reply.
ml.report.BatchCrossValidationReport itself does not create aggregated Id2Outcome files in a cross-validation setup. Which setup and report did you use?

In any case, the report which picks up the results (in terms of Id2Outcome files) from various CV runs needs to make sure that the label-index mappings (in case of multiclass or multilabel classification) are compatible when merging various files. The new evaluation module takes care of this. To see how to use it, please have a look at e.g. TwentyNewsgroupsUsingTCEvaluationDemo in the examples module.
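To illustrate why the mappings matter, here is a minimal Python sketch of the merging problem. It is not the code of the evaluation module; the `harmonize` function and the tuple layout are hypothetical and only show the idea of remapping fold-local indices onto one shared label index before concatenating the results.

```python
def harmonize(folds):
    """Remap fold-local label indices onto one shared index.

    `folds` is a list of (labels, rows) pairs (a hypothetical layout):
    `labels` maps a fold-local index to a label name, and `rows` is a
    list of (instance_id, predicted_index, gold_index) tuples.
    """
    # Collect every label seen in any fold; sort for a deterministic order.
    all_labels = sorted({name for labels, _ in folds for name in labels.values()})
    global_index = {name: i for i, name in enumerate(all_labels)}

    merged = []
    for labels, rows in folds:
        for instance_id, pred, gold in rows:
            # Translate local index -> label name -> shared index.
            merged.append((instance_id,
                           global_index[labels[pred]],
                           global_index[labels[gold]]))
    return all_labels, merged


# Index 0 means "A" in the first fold but "B" in the second one.
fold1 = ({0: "A", 1: "B"}, [("doc1", 0, 1)])
fold2 = ({0: "B", 1: "C"}, [("doc2", 0, 1)])
labels, merged = harmonize([fold1, fold2])
```

Simply concatenating the two folds would treat index 0 as the same class in both, although it stands for different labels.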

Hope that helps,
Johannes

michael....@googlemail.com

Nov 16, 2015, 4:37:43 AM

to dkpro-tc-users, michael....@googlemail.com

Hi Johannes,

thank you for the reply.

I am running a single-label classification in document mode. The setting is a CV (3 folds).
I use WekaOutcomeIDReport and WekaFeatureValuesReport as inner reports and BatchCrossValidationReport as an outer report.

The Id2Outcome.txt file I am referring to is generated in the folder "ExperimentCrossValidation$[...]".

In each "WekaTestTask" folder I get an Id2Outcome.txt file that starts with #ID=PREDICTION;GOLDSTANDARD. However, I'm interested in the mapping for all instances (I want to stack classifiers).
This mapping can (at least I assume) be found in the file in the ExperimentCrossValidation$[...] folder. However, this file does not have this header line...

thanks again for your help!
Michael

Johannes Daxenberger

Nov 16, 2015, 4:56:02 AM

to michael....@googlemail.com, dkpro-tc-users

Hi Michael,

OK, I can now see where the problem arises. Your experiment uses the WekaClassificationAdapter as machine learning adapter, and this adapter specifies BatchTrainTestReport as inner report for the cross-validation task (which indeed writes the faulty id2outcome.txt). This behavior is actually wrong: as explained in my previous email, the mapping between labels and indices can differ across CV runs under certain circumstances. I'll open an issue to fix this behavior.

To arrive at a correct aggregated id2outcome file, please use WekaClassificationUsingTCEvaluationAdapter in your experiment. This will create a file "id2harmonizedOutcome.txt" in your ExperimentCrossValidation$[...] folder. Please let us know if you have any further problems with this adapter.

Best,

Johannes

michael....@googlemail.com

Nov 16, 2015, 5:27:08 AM

to dkpro-tc-users, michael....@googlemail.com

Hi Johannes,

thanks for the reply. I now get the id2harmonizedOutcome.txt.
A quick question regarding the notation:

#ID=PREDICTION;GOLDSTANDARD;THRESHOLD
#labels 0=A 1=B 2=C

191.xml=1,0,0;0,0,1;-1.0

I guess that 1,0,0 refers to class A?

Further, the adapter causes a problem when using BatchCrossValidationReport (everything is fine when using BatchCrossValidationUsingTCEvaluationReport):
the IOException says that '..../evaluation_results.csv' exists but is a directory.

Johannes Daxenberger

Nov 16, 2015, 5:47:06 AM

to michael....@googlemail.com, dkpro-tc-users

Hi Michael,

> I guess that 1,0,0 refers to class A?

Yes.
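
For anyone who wants to read the file programmatically (e.g. for stacking), here is a rough Python sketch that decodes the sample shown above. The function names are made up for this example, and it assumes the single-label case from this thread, where each position in the vector is a plain 0 or 1 and label names contain no spaces; other setups may need different handling.

```python
def parse_labels(header):
    """Parse a header such as '#labels 0=A 1=B 2=C' into {0: 'A', 1: 'B', 2: 'C'}.

    Assumes label names contain no whitespace.
    """
    pairs = header.split()[1:]  # drop the leading '#labels' token
    return {int(idx): name for idx, name in (p.split("=", 1) for p in pairs)}


def parse_row(line, labels):
    """Split 'ID=PREDICTION;GOLDSTANDARD;THRESHOLD' into its decoded parts."""
    # Split on the last '=' so that an ID containing '=' stays intact.
    instance_id, rest = line.rsplit("=", 1)
    pred, gold, threshold = rest.split(";")

    def decode(vector):
        # Position i of the vector corresponds to label index i.
        return [labels[i] for i, bit in enumerate(vector.split(",")) if bit.strip() == "1"]

    return instance_id, decode(pred), decode(gold), float(threshold)


labels = parse_labels("#labels 0=A 1=B 2=C")
row = parse_row("191.xml=1,0,0;0,0,1;-1.0", labels)
```

For the sample line, the prediction vector 1,0,0 decodes to A and the gold vector 0,0,1 decodes to C.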

With the new adapter, you also need to use a new top-level report (BatchCrossValidationUsingTCEvaluationReport, as you already saw).

Best,
Johannes
