online training of ChainCRF

James Jensen

Jun 19, 2015, 9:52:31 AM
to pyst...@googlegroups.com
Hello,

A ChainCRF would work well for my application, and the data will accumulate one sample at a time and could potentially be quite large, so I'd like to do online learning. I see there are several learners to choose from, at least two of which (StructuredPerceptron and FrankWolfeSSVM) appear to have options for online learning. I intend to use the FrankWolfeSSVM if possible.

I'm unclear on how to use these learners in a truly online way. Following the scikit-learn convention, I would expect there to be a partial_fit() method. Repeated calls to fit() with new data do not incrementally update the model, but rather completely re-fit it, discarding what was learned from previous data. Is there some option I'm missing? Or is partial_fit() a feature that will be added in the future?
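
To make that concrete, here is roughly what I was hoping to be able to write, by analogy with scikit-learn's out-of-core estimators (the partial_fit call is hypothetical, and stream_of_samples() is just a stand-in for my data source):

from pystruct.models import ChainCRF
from pystruct.learners import FrankWolfeSSVM

ssvm = FrankWolfeSSVM(model=ChainCRF(), C=1.0, batch_mode=False)

for x_new, y_new in stream_of_samples():   # one labeled chain at a time
    # hoped-for incremental update; this method does not exist today
    ssvm.partial_fit([x_new], [y_new])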

Thanks for the help,

James

Andy

Jun 19, 2015, 10:14:25 AM
to pyst...@googlegroups.com
Hi James.

You are right: it would be possible to implement partial_fit for FrankWolfeSSVM, but I haven't done so yet.
I'm not sure if I'll have time to work on this any time soon.
My datasets were usually small enough to fit in memory. It shouldn't be that hard to add, though, so if you
decide to work on it I'd be happy to help.

Cheers,
Andy

James Jensen

Jul 13, 2015, 4:02:20 PM
to pyst...@googlegroups.com
Hi, Andy,

Sorry to be slow in responding.

If I understand correctly, the online feature could be useful for any setting in which the data accrues incrementally, even when it is not too large to fit in memory. The cost of updating the model using partial_fit() should be less than for refitting from scratch on the full dataset.

Given what you've told me, it might make sense to change the documentation of the FrankWolfeSSVM learner to reflect the fact that there is not yet a true online option. It says "With batch_mode=False, this implements the online (block-coordinate) version of the algorithm (BCFW)," which may lead users like me to expect it to be online. And documentation aside, I am a bit confused because from the BCFW paper I gather that the algorithm is inherently online. Does the current implementation use the online algorithm as though it were a batch algorithm, so as to offer some of the other benefits (optimal step-size, computable duality gap guarantee)?
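
For context, this is the kind of call I mean. As far as I can tell, even with batch_mode=False, fit() still makes multiple passes over whatever training set it is handed (the toy data below is only illustrative):

import numpy as np
from pystruct.models import ChainCRF
from pystruct.learners import FrankWolfeSSVM

# toy data: 20 chains of length 5, with 3 features per node and 2 labels
rng = np.random.RandomState(0)
X = [rng.rand(5, 3) for _ in range(20)]
Y = [rng.randint(2, size=5) for _ in range(20)]

ssvm = FrankWolfeSSVM(model=ChainCRF(), C=1.0, max_iter=100, batch_mode=False)
ssvm.fit(X, Y)   # BCFW visits samples one at a time, but only within this
                 # fixed training set; it is not a streaming interface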

I've looked a bit into the code and into the partial_fit methods of some of the scikit-learn online estimators (the usage pattern I have in mind is sketched below). I'd be happy to try to help with the implementation, but I'm afraid I lack the expertise to be of much assistance; it could take me a long time and a lot of coaching.
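
For reference, the pattern I have in mind is the one scikit-learn's out-of-core estimators use, for example (batches() is just a placeholder for a mini-batch source):

from sklearn.linear_model import SGDClassifier

clf = SGDClassifier(loss="hinge")
all_classes = [0, 1]                   # every class must be declared up front
for X_batch, y_batch in batches():     # each call updates the existing model
    clf.partial_fit(X_batch, y_batch, classes=all_classes)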

Andy

Jul 17, 2015, 7:38:19 AM
to pyst...@googlegroups.com
Hi James.
The meaning of "online" is not always straightforward.
By "online" I think the authors of the BCFW paper mean "updating after each sample". I don't think the authors tried a setting where data arrives sequentially.
Usually you want to do multiple passes over the data.
If you want to keep training on arriving data, you might want to do "warm starts", that is, not starting from scratch.
You don't want to use the new arriving sample only once, and you probably don't want to forget about the examples that you saw earlier.
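
Untested sketch of the kind of loop I mean (stream() is just a placeholder for sequentially arriving data; the missing piece is a real warm start, i.e. reusing the previous w and dual information instead of restarting the optimization every time):

from pystruct.models import ChainCRF
from pystruct.learners import FrankWolfeSSVM

ssvm = FrankWolfeSSVM(model=ChainCRF(), C=1.0, batch_mode=False, max_iter=10)
X_seen, Y_seen = [], []

for x_new, y_new in stream():     # keep every sample seen so far,
    X_seen.append(x_new)          # so earlier examples are not forgotten
    Y_seen.append(y_new)
    ssvm.fit(X_seen, Y_seen)      # multiple BCFW passes over all data seen so
                                  # far, but currently starting from scratch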

Btw, the algorithm and implementation are in fact inherently online, but there is no interface to do this with pystruct.
Doing it is just a small change. I'm travelling right now and don't have that much time, though.

Cheers,
Andy

Kevad

Apr 27, 2016, 9:01:02 AM
to pystruct
Hello Andreas,

    Thanks for the awesome library. Is there any update on this post? May we expect 'partial_fit' anytime soon?

Thanks a lot,
Kevad.

Andreas Mueller

Apr 27, 2016, 3:25:28 PM
to pyst...@googlegroups.com
Hi Kevad.
I'm not currently working on it, and I don't know if I'll have time within the next year.
It shouldn't be that hard to do, though.

Cheers,
Andy