Positive unlabeled learning in ClearTK

Miller, Timothy

Aug 22, 2014, 12:55:31 PM
to cleartk-d...@googlegroups.com
I've been running into a few problems recently where we have, or suspect
we have, partially annotated corpora (only positive examples are labeled).
This is called Positive Unlabeled Learning (PUL; thanks to Steve for the
pointer). One promising method is to weight the unlabeled instances
according to a classifier trained on all examples:

http://users.csc.tntech.edu/~weberle/Fall2008/CSC6910/Papers/posonly.pdf
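
A minimal Java sketch of one such weighting scheme, assuming the linked paper is Elkan and Noto's (2008) positive-only approach: a classifier g(x) is trained to separate labeled (s = 1) from unlabeled (s = 0) examples, c = p(s = 1 | y = 1) is estimated as the mean of g(x) over held-out labeled positives, and each unlabeled example is then used twice, as a positive with weight w(x) = ((1 - c) / c) * g(x) / (1 - g(x)) and as a negative with weight 1 - w(x). The class and method names (PuWeights, estimateC, unlabeledPositiveWeight) are hypothetical, and the clamp to [0, 1] is an added safeguard, not part of the described method.

```java
import java.util.Arrays;

/**
 * Hypothetical helper for positive-unlabeled (PU) instance weighting.
 * g is the probability output of a classifier trained to separate
 * labeled (s = 1) from unlabeled (s = 0) examples.
 */
public class PuWeights {

  /** Estimates c = p(s = 1 | y = 1) as the mean of g over held-out labeled positives. */
  public static double estimateC(double[] gOnHeldOutPositives) {
    return Arrays.stream(gOnHeldOutPositives).average().getAsDouble();
  }

  /**
   * Weight for using an unlabeled example (score g) as a positive:
   * w = ((1 - c) / c) * (g / (1 - g)). Its negative copy gets weight 1 - w.
   */
  public static double unlabeledPositiveWeight(double g, double c) {
    if (c <= 0.0 || c > 1.0) {
      throw new IllegalArgumentException("c must be in (0, 1]");
    }
    if (g >= 1.0) {
      return 1.0; // avoid dividing by zero; treat as certainly positive
    }
    double w = ((1.0 - c) / c) * (g / (1.0 - g));
    // Added safeguard (not from the paper): w exceeds 1 when g > c, e.g. from estimation error.
    return Math.max(0.0, Math.min(1.0, w));
  }

  public static void main(String[] args) {
    double c = estimateC(new double[] {0.6, 0.5, 0.7});
    double w = unlabeledPositiveWeight(0.3, c);
    System.out.printf("c=%.3f positive=%.3f negative=%.3f%n", c, w, 1.0 - w);
  }
}
```

With held-out scores 0.6, 0.5 and 0.7, c is 0.6, and an unlabeled example scoring 0.3 gets a positive weight of about 0.286 (exactly 2/7) and a negative weight of about 0.714.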

From what I can tell, it is not possible to give individual instances
weights in ClearTK, even though I think it's possible in libsvm
(http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/#weights_for_data_instances)
and in Weka.

Would this be as simple as adding a weight field to the Instance class
(and changing the appropriate data writers)? Or are there any mechanisms
for doing PUL in ClearTK that I'm not aware of?
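
For the data-writer side, a rough sketch of writing weighted instances for libsvm's instance-weight variant, assuming that variant reads weights from a separate file with one weight per line, in the same order as the training lines (the README on the libsvmtools page linked above is the authority on the exact format). WeightedInstance and WeightedLibsvmWriter are hypothetical classes, not part of ClearTK. The main method writes one unlabeled example twice, as a positive and as a negative, the way the PU weighting would.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical writer emitting libsvm-style lines plus a parallel weight file. */
public class WeightedLibsvmWriter {

  /** Hypothetical weighted instance; not ClearTK's Instance class. */
  public static final class WeightedInstance {
    final int label; // e.g. +1 or -1
    final SortedMap<Integer, Double> features; // feature index -> value, kept in ascending order
    final double weight;

    public WeightedInstance(int label, Map<Integer, Double> features, double weight) {
      this.label = label;
      this.features = new TreeMap<>(features);
      this.weight = weight;
    }
  }

  private final Writer dataOut;
  private final Writer weightOut;

  public WeightedLibsvmWriter(Writer dataOut, Writer weightOut) {
    this.dataOut = dataOut;
    this.weightOut = weightOut;
  }

  /** Writes "label index:value ..." to the data file and the weight to the weight file. */
  public void write(WeightedInstance instance) throws IOException {
    StringBuilder line = new StringBuilder(Integer.toString(instance.label));
    for (Map.Entry<Integer, Double> feature : instance.features.entrySet()) {
      line.append(' ').append(feature.getKey()).append(':').append(feature.getValue());
    }
    dataOut.write(line.append('\n').toString());
    weightOut.write(instance.weight + "\n");
  }

  public static void main(String[] args) throws IOException {
    StringWriter data = new StringWriter();
    StringWriter weights = new StringWriter();
    WeightedLibsvmWriter writer = new WeightedLibsvmWriter(data, weights);
    Map<Integer, Double> features = Map.of(3, 1.0, 1, 0.5);
    // One unlabeled example, written as a positive (weight w) and a negative (weight 1 - w).
    writer.write(new WeightedInstance(1, features, 0.25));
    writer.write(new WeightedInstance(-1, features, 0.75));
    System.out.print(data);
    System.out.print(weights);
  }
}
```

Running main writes the data lines "1 1:0.5 3:1.0" and "-1 1:0.5 3:1.0" and the weight lines "0.25" and "0.75". Keeping weights in a parallel stream leaves the data file in plain libsvm format, so unweighted tools can still read it.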


--
Tim Miller
Instructor
Boston Children's Hospital and Harvard Medical School
timothy...@childrens.harvard.edu
617-919-1223

Steven Bethard

Aug 26, 2014, 10:28:40 AM
to cleartk-d...@googlegroups.com
Yes, that's the approach I would take, and no, there's nothing in
ClearTK for PUL yet.

Adding a weight field to Instance would not be binary compatible, but
it should be source compatible, so we could introduce this in the next
2.X release.
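
To illustrate what source compatible can mean here, a minimal sketch, assuming the existing two-argument constructor is kept and simply delegates with a default weight of 1.0, so existing calls such as new HypotheticalInstance<>(outcome, features) keep compiling and unweighted data is treated as before. HypotheticalInstance and its String features are simplifications for illustration, not ClearTK's actual Instance class.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical stand-in for an instance class that gains a per-instance weight. */
public class HypotheticalInstance<OUTCOME> {
  private final OUTCOME outcome;
  private final List<String> features; // simplified; real features would be richer objects
  private final double weight;

  /** Pre-existing signature: kept so current callers still compile; defaults to weight 1.0. */
  public HypotheticalInstance(OUTCOME outcome, List<String> features) {
    this(outcome, features, 1.0);
  }

  /** New signature for weighted training data (e.g. PU weights). */
  public HypotheticalInstance(OUTCOME outcome, List<String> features, double weight) {
    if (weight < 0.0) {
      throw new IllegalArgumentException("weight must be non-negative");
    }
    this.outcome = outcome;
    this.features = Collections.unmodifiableList(new ArrayList<>(features));
    this.weight = weight;
  }

  public OUTCOME getOutcome() { return outcome; }
  public List<String> getFeatures() { return features; }
  public double getWeight() { return weight; }
}
```

Data writers for formats without per-instance weights could then either ignore getWeight() or reject any weight other than 1.0.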

Steve