Hi,
Maybe this should be an issue, but I’ll start by asking: is there a reason why NameNumber.value is ignored by CrfSuiteStringOutcomeDataWriter when encoding, even though the CRFSuite implementation should be able to handle real-valued features? See the following links:
http://fnl.es/tag/nlp.html#feature-modeling
http://python-crfsuite.readthedocs.org/en/latest/pycrfsuite.html#api-reference
    @Override
    public void writeEncoded(List<NameNumber> features, String outcome) {
        this.trainingDataWriter.print(outcome);
        for (NameNumber nameNumber : features) {
            this.trainingDataWriter.print(featureSeparator);
            this.trainingDataWriter.print(nameNumber.name);
        }
        this.trainingDataWriter.println();
    }
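To make the question concrete, here is a rough sketch of what I would have expected instead. It assumes that CRFsuite's text format accepts an attribute written as name:value (with ":" and "\" inside the name escaped by a backslash), which is how I read the CRFsuite documentation. The NameNumber class below is only a stand-in for the ClearTK one, and ValueAwareCrfSuiteEncoding, escape and encodeLine are names I made up for illustration; this is not the actual ClearTK code.

```java
import java.util.Arrays;
import java.util.List;

public class ValueAwareCrfSuiteEncoding {

    // Hypothetical stand-in for ClearTK's NameNumber: a feature name plus a numeric value.
    static final class NameNumber {
        final String name;
        final Number value;

        NameNumber(String name, Number value) {
            this.name = name;
            this.value = value;
        }
    }

    // Assumption: CRFsuite reads ':' as the name/value delimiter, so a literal
    // ':' or '\' inside a feature name has to be backslash-escaped.
    static String escape(String name) {
        return name.replace("\\", "\\\\").replace(":", "\\:");
    }

    // Builds one training line: the outcome followed by the separated features.
    // A value of 1.0 is left implicit, so purely binary features look the same
    // as in the current writer (apart from the escaping).
    static String encodeLine(List<NameNumber> features, String outcome, String featureSeparator) {
        StringBuilder line = new StringBuilder(outcome);
        for (NameNumber feature : features) {
            line.append(featureSeparator).append(escape(feature.name));
            double value = feature.value.doubleValue();
            if (value != 1.0) {
                line.append(':').append(value);
            }
        }
        return line.toString();
    }

    public static void main(String[] args) {
        List<NameNumber> features = Arrays.asList(
                new NameNumber("word=the", 1.0),
                new NameNumber("length", 12.0));
        System.out.println(encodeLine(features, "B-NP", "\t"));
    }
}
```

Whether the classifier side of the wrapper would then also need to pass the values through at tagging time is something I have not checked.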
With writeEncoded writing only the feature names, what would be the best way of encoding a non-binary feature, e.g. the length of the covered text?
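The only workaround I can think of is to discretize the number into a handful of binary features, so that the bucket ends up in the feature name. A minimal sketch of that idea follows; the bucket boundaries 3, 6 and 10 are arbitrary values I picked for illustration, and LengthBinning and lengthBucket are made-up names, not anything from ClearTK.

```java
public class LengthBinning {

    // Arbitrary, illustrative upper bounds for the length buckets.
    static final int[] UPPER_BOUNDS = {3, 6, 10};

    // Maps a text length to the name of the first bucket it fits into,
    // so the numeric value becomes part of a binary feature's name.
    static String lengthBucket(int length) {
        for (int upper : UPPER_BOUNDS) {
            if (length <= upper) {
                return "length<=" + upper;
            }
        }
        return "length>" + UPPER_BOUNDS[UPPER_BOUNDS.length - 1];
    }

    public static void main(String[] args) {
        String coveredText = "classification";
        // The resulting string would be used as the feature name (or value)
        // in the feature extractor instead of the raw length.
        System.out.println(lengthBucket(coveredText.length()));
    }
}
```

This loses the ordering between buckets and makes the result depend on where the boundaries are set, which is why I would prefer passing the real value through if CRFSuite supports it.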
There is a similar question regarding the MalletCRFStringOutcomeDataWriter:
https://groups.google.com/forum/#!topic/cleartk-users/16YBkiQ_Sk4