On 30 Oct 2013, at 4:48 PM, Valentin Tablan <v.ta...@gmail.com> wrote:
> lot of time writing code, I thought I'd check with the experts whether
> my current plan makes sense, or if there's a better solution.
Sigh. That's been on the todo list since my last visit to Sheffield.
> - implement some form of in-memory Index
As we discussed, the posting-list representation currently used in memory during indexing is, in fact, fully searchable. It is not really optimized (e.g., no skips), but for a small collection it is fine. It's just a matter of correctly orchestrating the order in which the internal variables are updated, so that each posting list is never in an inconsistent state.
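To illustrate the ordering point, here is a minimal sketch in Python. It is not MG4J code: the class and method names are hypothetical, and it assumes a single writer thread. The writer stores the posting data first and publishes the new length last, so a reader that snapshots the length never sees a half-written posting.

```python
class InMemoryPostingList:
    """Append-only posting list (hypothetical, not the MG4J representation)."""

    def __init__(self):
        self._docs = []      # document pointers, in increasing order
        self._counts = []    # within-document counts, parallel to _docs
        self._visible = 0    # number of postings that readers may see

    def append(self, doc, count):
        # Single writer: fill in the data first...
        self._docs.append(doc)
        self._counts.append(count)
        # ...and publish the new length last.
        self._visible = len(self._docs)

    def postings(self):
        # Readers snapshot the published length once and read only up to it.
        n = self._visible
        return list(zip(self._docs[:n], self._counts[:n]))
```

A reader calling postings() between the two appends in append() still gets only the previously published postings, because the visible length has not moved yet.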
> - create a documental cluster containing the main on-disk index and the
> in-memory one
> - when new documents are added to the index, they go directly into the
> in-memory index
> - when the memory is full (and also at regular time intervals):
>   - dump the in-memory index to an on-disk batch
>   - add the new on-disk batch to the documental cluster
>   - start a new empty in-memory index, and add it to the cluster
Yes, that's how I would do it, too.
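The steps quoted above can be sketched roughly as follows. This is a toy model in Python under loud assumptions: LiveCluster and all its members are hypothetical names, the "on-disk" batches are just dictionaries kept in a list, and the time-based flush is left out (a timer would simply call flush()).

```python
class LiveCluster:
    """Hypothetical documental cluster: flushed batches plus one in-memory index."""

    def __init__(self, max_docs_in_memory):
        self.max_docs_in_memory = max_docs_in_memory
        self.batches = []        # stand-ins for on-disk batches, oldest first
        self.memory = {}         # term -> list of document ids
        self.docs_in_memory = 0
        self.next_doc = 0

    def add_document(self, terms):
        # New documents go directly into the in-memory index.
        doc = self.next_doc
        self.next_doc += 1
        for term in sorted(set(terms)):
            self.memory.setdefault(term, []).append(doc)
        self.docs_in_memory += 1
        if self.docs_in_memory >= self.max_docs_in_memory:
            self.flush()
        return doc

    def flush(self):
        # Dump the in-memory index as a new batch, then start an empty one.
        if self.docs_in_memory:
            self.batches.append(self.memory)
            self.memory = {}
            self.docs_in_memory = 0

    def search(self, term):
        # Each sub-index covers a disjoint, increasing range of document ids,
        # so visiting them oldest first yields a sorted result.
        result = []
        for index in self.batches + [self.memory]:
            result.extend(index.get(term, []))
        return result
```

With max_docs_in_memory set to 2, adding three documents leaves one flushed batch and one document in memory, and search() sees all three.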
> - at regular (longer) time intervals, append all the on-disk batches to
> the main index.
There are of course classes, like Combine/Concatenate/Merge/Paste, that can combine the on-disk batches into larger batches or into the main index.
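Just to fix ideas about what combining batches amounts to, here is a small Python sketch. It is not what those classes do internally: concatenate is a hypothetical function, and it assumes each batch is a plain dictionary from term to a sorted list of document ids, with the batches covering disjoint, increasing ranges of ids.

```python
def concatenate(indices):
    """Combine batches whose document-id ranges are disjoint and increasing."""
    combined = {}
    for index in indices:  # oldest batch first
        for term, docs in index.items():
            # Appending keeps each posting list sorted, given the assumption
            # that later batches only contain larger document ids.
            combined.setdefault(term, []).extend(docs)
    return combined


batches = [{"a": [0], "b": [0, 1]}, {"a": [2]}]
main_index = concatenate(batches)
```

Under that assumption no real merge is needed: the posting lists of the same term are simply laid end to end.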
But how would you expose the in-memory index? A server?
Ciao,
seba