I want to run cross-validation with custom measures that use data provided by a reader I have written. At the moment, I am struggling to find a suitable approach.
My setting is document-mode, single-label text classification with a TF-IDF-weighted bag-of-words feature model and Liblinear as the learner, though any other SVM suitable for this feature model would also do.
The instance-level data the measures require are the labels assigned during classification, the gold-standard labels, and two or more float or String values per instance.
The cross-validation should report my custom measures alongside the standard ones (Accuracy, F1, etc.), and, as usual, each final measure value should be the average over the folds.
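For clarity, the fold averaging I mean is just the unweighted mean of the per-fold values. Below is a minimal sketch under my own assumptions: `FoldAverage`, `Outcome` and the `measure` parameter are hypothetical names, not DKPro TC API, and accuracy merely stands in for any per-fold measure, including a custom one.

```java
import java.util.List;
import java.util.function.ToDoubleFunction;

/** Hypothetical helper (not DKPro TC API): averages a per-fold measure over all folds. */
public class FoldAverage {

    /** One classified instance: gold-standard label and the label assigned by the classifier. */
    public record Outcome(String gold, String predicted) {}

    /** Example per-fold measure: share of instances whose assigned label equals the gold label. */
    public static double accuracy(List<Outcome> fold) {
        if (fold.isEmpty()) {
            throw new IllegalArgumentException("empty fold");
        }
        long correct = fold.stream().filter(o -> o.gold().equals(o.predicted())).count();
        return (double) correct / fold.size();
    }

    /** Unweighted mean of the given measure over all folds (each fold counts equally). */
    public static double averageOverFolds(List<List<Outcome>> folds,
                                          ToDoubleFunction<List<Outcome>> measure) {
        return folds.stream()
                .mapToDouble(measure)
                .average()
                .orElseThrow(() -> new IllegalArgumentException("no folds"));
    }

    public static void main(String[] args) {
        List<List<Outcome>> folds = List.of(
                List.of(new Outcome("pos", "pos"), new Outcome("neg", "pos")),   // fold accuracy 0.5
                List.of(new Outcome("pos", "pos"), new Outcome("neg", "neg")));  // fold accuracy 1.0
        System.out.println(averageOverFolds(folds, FoldAverage::accuracy));     // prints 0.75
    }
}
```

A custom measure would plug in as another `ToDoubleFunction<List<Outcome>>`, with `Outcome` extended by the extra per-instance values.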
Which would be the best way to achieve that?
My current approach, which I am stuck investigating, is to write a file similar to Id2Outcome that contains the (instanceId, data0, ..., dataN) tuples and is read by the individual measures in addition to Id2Outcome. If that is the way to go: where can I get the instance IDs from, and what is the best place and method to store the file? Which classes/modules need to be used, forked, or created?
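To make the proposed side file concrete, here is a minimal sketch of one possible format: one tab-separated line per instance, the instanceId followed by data0 ... dataN. The class and method names are hypothetical, this is not how DKPro TC stores Id2Outcome, and it assumes that instance IDs and values contain no tabs or line breaks.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical side file next to Id2Outcome: "instanceId<TAB>data0<TAB>...<TAB>dataN" per line. */
public class InstanceDataFile {

    /** Formats each (instanceId, data0, ..., dataN) tuple as one tab-separated line. */
    public static List<String> format(Map<String, List<String>> data) {
        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : data.entrySet()) {
            lines.add(e.getKey() + "\t" + String.join("\t", e.getValue()));
        }
        return lines;
    }

    /** Parses lines back into a map keyed by instance ID, preserving line order. */
    public static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> data = new LinkedHashMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // tolerate blank lines, e.g. a trailing newline
            }
            String[] parts = line.split("\t", -1); // -1 keeps empty trailing values
            data.put(parts[0], List.of(parts).subList(1, parts.length));
        }
        return data;
    }

    /** Writes the tuples to the given file as UTF-8 (the storage location is still my open question). */
    public static void write(Path file, Map<String, List<String>> data) throws IOException {
        Files.write(file, format(data), StandardCharsets.UTF_8);
    }

    /** Reads the tuples back so a measure can join them with Id2Outcome by instance ID. */
    public static Map<String, List<String>> read(Path file) throws IOException {
        return parse(Files.readAllLines(file, StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        Map<String, List<String>> data = new LinkedHashMap<>();
        data.put("doc_1", List.of("0.42", "sports"));
        data.put("doc_2", List.of("1.7", "politics"));
        Path file = Files.createTempFile("instanceData", ".txt");
        write(file, data);
        System.out.println(read(file)); // {doc_1=[0.42, sports], doc_2=[1.7, politics]}
        Files.delete(file);
    }
}
```

A measure would then look up each instance ID from Id2Outcome in the map returned by `read` to get its extra values.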
Best regards,
Elias