Tim Miller
May 21, 2013, 3:39:11 PM
to cleart...@googlegroups.com
It seems that ClearTK token features are normally case-sensitive,
but for some learning tasks case-insensitive versions may increase
statistical strength (e.g., in clinical text the phrases "NO TUMOR",
"No tumor", and "no tumor" should all be treated as equally certain
negations). Is there a built-in mechanism or best practice for doing
this? I was thinking of just going through all the extracted features
and lower-casing them, but that seems very hacky.
Tim
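
For concreteness, the "lower-case everything after extraction" idea from the post might look roughly like the sketch below. It is only an illustration: the `Feature` class here is a hypothetical stand-in with a name and a value, not ClearTK's own feature type, and the `LowerCase_` name prefix is likewise an assumption made so the normalized features stay distinguishable from the originals.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class LowerCaseFeatures {

    // Hypothetical stand-in for a toolkit feature: a name plus a value.
    static final class Feature {
        final String name;
        final Object value;

        Feature(String name, Object value) {
            this.name = name;
            this.value = value;
        }
    }

    // Returns a new list in which every String-valued feature is replaced
    // by a lower-cased copy; non-String features are passed through as-is.
    static List<Feature> lowerCased(List<Feature> features) {
        List<Feature> result = new ArrayList<>();
        for (Feature f : features) {
            if (f.value instanceof String) {
                // Locale.ROOT avoids locale-specific surprises (e.g. Turkish dotless i).
                String lowered = ((String) f.value).toLowerCase(Locale.ROOT);
                result.add(new Feature("LowerCase_" + f.name, lowered));
            } else {
                result.add(f);
            }
        }
        return result;
    }
}
```

With this, features built from "NO", "No", and "no" all end up with the value "no". Appending the lower-cased copies to the original list, rather than replacing them, would keep the case-sensitive versions available to the learner as well.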