Re. the incremental tf-idf, when adding extra documents:
1) can be done by simply incrementing the document counts. This is
basically what already happens inside `initialize`, just factored out
into a separate function. Difficulty: very easy.
2) is not necessarily a good idea. If your original collection was
large enough, the inverse document frequency weights are likely
already reasonable. Re-adjusting the weights (or adding new weights
for new vocabulary) has the unpleasant effect that downstream methods
like online LSI will need to be retrained, because they cannot cope
with input features that change dynamically, as I explained in the
previous post.
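
To illustrate 1), here is a minimal standalone sketch of what "incrementing document counts" could look like. This is not the actual gensim code: the class `IncrementalTfidf` and its methods `add_documents`, `idf` and `transform` are hypothetical names made up for this example, and I'm assuming documents come in bag-of-words form, as lists of `(term_id, term_frequency)` pairs, with a plain log2(N / df) weighting.

```python
import math
from collections import Counter


class IncrementalTfidf:
    """Hypothetical helper for illustration only; not a library API."""

    def __init__(self):
        self.num_docs = 0     # total number of documents seen so far
        self.dfs = Counter()  # term_id -> number of documents containing it

    def add_documents(self, corpus):
        """Update the counts from an iterable of bag-of-words documents."""
        for bow in corpus:
            self.num_docs += 1
            # Count each term at most once per document.
            for term_id in {term_id for term_id, _ in bow}:
                self.dfs[term_id] += 1

    def idf(self, term_id):
        """Inverse document frequency, assumed here to be log2(N / df)."""
        return math.log2(self.num_docs / self.dfs[term_id])

    def transform(self, bow):
        """Weight a document by tf * idf, skipping terms never seen before."""
        return [(term_id, tf * self.idf(term_id))
                for term_id, tf in bow if self.dfs.get(term_id, 0) > 0]


model = IncrementalTfidf()
model.add_documents([[(0, 1), (1, 2)], [(0, 3)]])  # initial collection
before = model.idf(1)
model.add_documents([[(1, 1), (2, 1)]])            # extra document added later
after = model.idf(1)
```

Note that `before` and `after` differ: the weight of term 1 shifts once the extra document is counted, and term 2 gets a brand new weight. That is exactly the effect described in 2), so anything trained on the old weights would see changed input features.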
Best,
Radim