Cross-validation with custom performance measures using data from the reader


Elias

Jul 27, 2017, 10:40:43 AM
to dkpro-tc-users
Hi,

I want to do cross-validation with custom measures that use data provided by a reader I have written. At the moment, I am struggling to find a suitable approach.
My setting is document-mode, single-label text classification with a TFxIDF-weighted bag-of-words feature model and Liblinear as the learner, though any other SVM suitable for this feature model would also do.

The instance-level data required by the measures are the labels assigned during classification, the corresponding gold-standard labels, and two or more float or String values per instance.
The cross-validation should include my custom measures alongside the standard measures (Accuracy, F1, etc.), and each final measure value should be the average over the folds, as usual.

What would be the best way to achieve this?
My current approach, on which I am stuck, is to write a file similar to Id2Outcome that contains the (instanceId, data0, ..., dataN) tuples and is read by the individual measures in addition to Id2Outcome (a hypothetical sketch of such a file is below). If that is the way to go, where can I get the instance IDs from, what is the best place and method to store the file, and which classes/modules need to be used/forked/created?
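For illustration, a hypothetical layout for such a file (tab-separated; the ids and values are just placeholders):

instanceId    data0    data1
0_doc1        0.42     someValue
1_doc2        0.17     anotherValue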

Best regards,
Elias

Johannes Daxenberger

Jul 29, 2017, 5:21:10 AM
to Elias, dkpro-tc-users
Hi Elias,

the IDs of instances in DKPro TC are created from the JCasIds:
JCasUtil.selectSingle(jcas, JCasId.class)
and, if you're dealing with unit or sequence classification, from the sequence id/suffix (the relevant code is in dkpro-tc-core: org.dkpro.tc.core.feature.InstanceIdFeature).
This information is added as a feature to the feature store and then transformed by the respective DataWriter into a feature for the ML framework (e.g. an attribute in Weka). The name of the feature/attribute is Constants.ID_FEATURE_NAME.
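A minimal sketch of pulling that id out of a CAS (assuming the JCasId type from the DKPro TC API; package paths may differ in your version):

import org.apache.uima.fit.util.JCasUtil;
import org.apache.uima.jcas.JCas;
import org.dkpro.tc.api.type.JCasId;

public class InstanceIdUtil {
    // Every CAS processed by DKPro TC carries exactly one JCasId annotation.
    // In unit/sequence mode you would additionally append the unit/sequence suffix.
    public static String getInstanceId(JCas jcas) {
        JCasId id = JCasUtil.selectSingle(jcas, JCasId.class);
        return String.valueOf(id.getId());
    }
}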

Evaluation files are typically created by reports that are executed after the TestTask (prediction) has finished. For reference, see e.g. in dkpro-tc-ml-liblinear: org.dkpro.tc.ml.liblinear.LiblinearOutcomeIdReport
Reports on the TestTask level need to be plugged in for cross-validation setups using
cvExperiment.addInnerReport(Class<? extends Report> innerReport)

Files created by the report are stored within the context of the Task.
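As a rough sketch (base class from DKPro Lab; the storage key and the parsing details are assumptions and may differ across TC versions), such a report could look like:

import java.io.File;
import org.dkpro.lab.reporting.ReportBase;
import org.dkpro.lab.storage.StorageService.AccessMode;
import org.dkpro.tc.core.Constants;

public class MyCustomMeasureReport extends ReportBase {
    @Override
    public void execute() throws Exception {
        // The id2outcome file written after prediction lives in the context
        // of the TestTask; ID_OUTCOME_KEY is the assumed storage key here.
        File id2outcome = getContext().getFile(Constants.ID_OUTCOME_KEY, AccessMode.READONLY);
        // Parse the (instanceId, prediction, gold) lines, join them with your
        // own per-instance data file via the instance id, compute the custom
        // measures, and store the result in this task's context so that the
        // batch report can average it over the folds.
    }
}

You would then plug it in with cvExperiment.addInnerReport(MyCustomMeasureReport.class).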

Best,
Johannes
