On 20 Dec 2013, at 3:16 AM, Valentin Tablan <v.ta...@gmail.com> wrote:
> Hi,
>
>
> My interpretation of DocumentalConcatenatedCluster is that it expects
> its sub-indexes to return global document pointers, i.e.
>
> - member-0 has documents 0..N_0
> - member-1 has documents N_0+1 .. N_1,
Nonono.
* <p>This class assumes that the global document pointers returned by each index will be increasing.
* Using this assumption, no merge is performed; simply, when an index iterator is exhausted we look
* into the next one.
*Global* means the documents returned from the strategy. *Local* is the numbering in each index.
The idea is that you use a ContiguousDocumentalStrategy. The strategy gives you the cutpoints of the global space, essentially, 0, ndoc0, ndoc0+ndoc1, ndoc0+ndoc1+ndoc2 etc.
Each local index is a standard index whose document pointers start at 0. The strategy will turn the local pointers of each index into global pointers.
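To make the arithmetic concrete, here is a minimal sketch of the cutpoint mapping in plain Java. It does not use the MG4J classes: CutpointSketch and its methods are made-up names for illustration only, and the real strategy may well be implemented differently.

```java
import java.util.Arrays;

// Illustrative only: hypothetical names, not MG4J's API.
class CutpointSketch {
    /** Builds the cutpoints 0, n0, n0+n1, ... from the per-index document counts. */
    static long[] cutPoints(long[] sizes) {
        long[] cut = new long[sizes.length + 1];
        for (int i = 0; i < sizes.length; i++) cut[i + 1] = cut[i] + sizes[i];
        return cut;
    }

    /** Local (zero-based) pointer in a given local index -> global pointer. */
    static long globalPointer(long[] cut, int localIndex, long localPointer) {
        return cut[localIndex] + localPointer;
    }

    /** Global pointer -> the local index that contains it (linear scan for clarity). */
    static int localIndex(long[] cut, long globalPointer) {
        int i = 0;
        // Advance while the next index still starts at or before the pointer.
        while (i + 1 < cut.length - 1 && cut[i + 1] <= globalPointer) i++;
        return i;
    }

    /** Global pointer -> zero-based pointer inside its local index. */
    static long localPointer(long[] cut, long globalPointer) {
        return globalPointer - cut[localIndex(cut, globalPointer)];
    }

    public static void main(String[] args) {
        // Three local indices with 3, 2 and 4 documents, each numbered from 0.
        long[] cut = cutPoints(new long[] {3, 2, 4});
        System.out.println("cutpoints: " + Arrays.toString(cut));
        for (long g = 0; g < cut[cut.length - 1]; g++)
            System.out.println("global " + g + " = index " + localIndex(cut, g)
                + ", local " + localPointer(cut, g));
    }
}
```

Since each local index returns increasing local pointers and the cutpoints are increasing, the global pointers come out increasing as you move from one index to the next, which is why no merge is needed.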
> If I'm reading the javadocs correctly, I should be able to use
> zero-based sub-indexes if I put them all into a merged cluster. However,
> that seems to have to do more work than is strictly necessary, as it
> needs to remap document pointers.
You just need a DocumentalConcatenatedCluster and a strategy.
> Am I correct in assuming I should be able to write arbitrary document
> pointers into an index?
You can, but the numberOfDocuments parameter must be a strict upper bound on the document pointers you put in.
In principle, you could write actual global pointers in sub-indices and use something like an IdentityDocumentalStrategy, but compression would be horrible.
If you have a look at Scan's usage of ContiguousDocumentalStrategy, things should be clearer.
And before you tell me, yes, the whole process is underdocumented. Actually, everything is documented carefully in the Javadocs, but there's nothing global guiding you through the process. :(
Ciao,
seba