MalletCRFStringOutcomeDataWriter and feature encoding

Guy Dumais

Sep 9, 2014, 9:01:26 AM
to cleart...@googlegroups.com
Hi all,
Is there any reason why MalletCRFStringOutcomeDataWriter does not write the feature values? I am referring to this piece of code in ClearTK's MalletCRFStringOutcomeDataWriter:

  public void writeEncoded(List<NameNumber> features, String outcome) {
    for (NameNumber nameNumber : features) {
      this.trainingDataWriter.print(nameNumber.name);
      this.trainingDataWriter.print(" ");
    }
    this.trainingDataWriter.print(outcome);
  }

This is not obvious from the code alone, but for Features of String type the nameNumber.name field contains the feature name together with the encoded value, whereas for any other type (e.g. Boolean, Number) nameNumber.name contains only the feature name and not the value. Since only nameNumber.name is printed, the values of non-String features never reach the training data.
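To make the effect concrete, here is a minimal, self-contained sketch. It does not use ClearTK itself: the NameNumber class below is a hypothetical stand-in, and the "name_value" joining of String features is an assumption made for illustration, not the library's confirmed encoding. Only the writeEncoded loop follows the code quoted above.

```java
import java.util.ArrayList;
import java.util.List;

public class EncodingSketch {
  // Hypothetical stand-in for ClearTK's NameNumber: a name plus a numeric value.
  static final class NameNumber {
    final String name;
    final Number number;

    NameNumber(String name, Number number) {
      this.name = name;
      this.number = number;
    }
  }

  // Assumed encoding: a String value is folded into the name,
  // while any other value leaves the name bare.
  static NameNumber encode(String featureName, Object value) {
    if (value instanceof String) {
      return new NameNumber(featureName + "_" + value, 1);
    }
    Number number = (value instanceof Number) ? (Number) value : 1;
    return new NameNumber(featureName, number);
  }

  // Same loop as the quoted writeEncoded: names only, then the outcome.
  static String writeEncoded(List<NameNumber> features, String outcome) {
    StringBuilder line = new StringBuilder();
    for (NameNumber nameNumber : features) {
      line.append(nameNumber.name).append(' ');
    }
    return line.append(outcome).toString();
  }

  public static void main(String[] args) {
    List<NameNumber> features = new ArrayList<>();
    features.add(encode("word", "dog")); // String feature: value survives in the name
    features.add(encode("length", 3));   // numeric feature: the 3 is dropped on output
    System.out.println(writeEncoded(features, "NOUN")); // prints "word_dog length NOUN"
  }
}
```

In this sketch the numeric value 3 is still held in the number field, but the writer never prints it, which is the behaviour described above.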

This seems like a bug to me, unless I am missing something.

Thanks in advance,
Guy Dumais 

Steven Bethard

Sep 9, 2014, 11:56:52 PM
to cleart...@googlegroups.com
I believe the issue is that the Mallet CRF format doesn't support anything other than strings:


Please still file an issue though. If nothing else, MalletCRFStringOutcomeDataWriter should throw an exception to inform the user that non-String values aren't supported. An alternative would be to convert numbers into Strings and pass them on to Mallet, but I'm not confident that would do the sensible thing for, say, doubles.

Steve 


Guy Dumais

Sep 10, 2014, 5:02:46 PM
to cleart...@googlegroups.com
I followed the approach you mentioned: I now convert integers to strings when creating a feature. I will file an issue anyway.
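For anyone hitting the same problem, the workaround amounts to something like the sketch below. The Feature class here is a hypothetical stand-in with just a name and a value (it is not ClearTK's class), and stringFeature is an illustrative helper name. As Steve notes, this is reasonable for integers but may not do anything sensible for doubles, since each distinct string becomes its own feature.

```java
public class StringFeatureSketch {
  // Hypothetical stand-in for a feature: a name and an arbitrary value.
  static final class Feature {
    final String name;
    final Object value;

    Feature(String name, Object value) {
      this.name = name;
      this.value = value;
    }
  }

  // Convert the integer to a String up front, so that a writer which only
  // handles String values still sees it.
  static Feature stringFeature(String name, int value) {
    return new Feature(name, String.valueOf(value));
  }

  public static void main(String[] args) {
    Feature length = stringFeature("length", 3);
    // The value is now the String "3" rather than the Integer 3.
    System.out.println(length.name + " -> " + length.value.getClass().getSimpleName());
  }
}
```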

Thanks for your help,
Guy.