Incrementally updating Word2Vec models with simple vocabulary expansion
(as implemented in Gensim) does work, at least for some specific tasks.
See this paper:
http://aclanthology.info/papers/D17-1194/d17-1194
Of course, this is kind of a dirty hack, considering all the issues with
the learning rate. But in this particular setting it performed better
than training from scratch on all the data. So this feature is useful.
By the way, there is another paper which proposes a more sophisticated
approach to incremental skip-gram models with online vocabulary
expansion. But it would have to be implemented from scratch, as the
authors' code is proprietary:
http://aclanthology.info/papers/D17-1037/d17-1037
On 09/14/2017 06:01 PM, Gordon Mohr wrote:
> Issue #1019 looks to me like the report of one specific seg-fault crash
> when trying to use Word2Vec vocabulary-expansion in Doc2Vec.
>
> Even if that crash is fixed, these users' desire for a fully
> online/incremental training for Doc2Vec (or even Word2Vec) won't
> necessarily be met. There are really no write-ups on whether or how this
> might be done effectively – the steps and parameters to use – and the
> naive things people try may mostly drive a model 'sideways' in its
> desirable properties. Making it work, even in just a few well-defined
> situations, would be a research project, and then a documentation
> effort, beyond any simple crash fixes.
>
> - Gordon
>
> On Thursday, September 14, 2017 at 1:14:42 AM UTC-7, Ivan Menshikh wrote:
>
> Yep, issue #1019 <https://github.com/RaRe-Technologies/gensim/issues/1019>
>
>
> --
> You received this message because you are subscribed to the Google
> Groups "gensim" group.
--
Solve et coagula!
Andrey