Hi,
I'm using some of the MG4J tools as the first steps in my processing pipeline, which is more or less like this:
1. start from the actual documents;
2. mg4j.tool.IndexBuilder;
3. mg4j.tool.PartitionDocumentally (or PartitionLexically);
4. a couple of further steps using ad hoc tools that transform the resulting index (these should not matter in this context).
The big picture is that I'm building a distributed IR framework for testing purposes, so I want to be able to query the partitions using term ids from the monolithic index. However, after the third step I obtain completely independent indexes whose relationship with the original index is lost.
I think this problem is mentioned in [1], but the solution is not clear to me. So far I have only used the tools bundled with MG4J; I haven't really dug into the library yet.
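To make the goal concrete, here is a minimal sketch of the kind of mapping I have in mind. It does not use any MG4J API. It assumes (and this is only my assumption) that for both the monolithic index and each partition I can get the list of terms in term-id order, so that a term's position in the list is its id. The class name TermIdMapper and the method globalToLocal are hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of MG4J: relates term ids of the
// monolithic (global) index to term ids of one partition (local index).
class TermIdMapper {

    /**
     * Assumption: both lists hold the terms in term-id order,
     * i.e. the term at position i has id i in its index.
     *
     * Returns an array m where m[globalId] is the id of the same term
     * in the partition, or -1 if the term does not occur there.
     */
    static int[] globalToLocal(List<String> globalTerms, List<String> localTerms) {
        // Look-up table from term string to its id in the partition.
        Map<String, Integer> localId = new HashMap<>();
        for (int i = 0; i < localTerms.size(); i++) {
            localId.put(localTerms.get(i), i);
        }

        // Translate every global id through the term string.
        int[] map = new int[globalTerms.size()];
        for (int g = 0; g < globalTerms.size(); g++) {
            map[g] = localId.getOrDefault(globalTerms.get(g), -1);
        }
        return map;
    }
}
```

With such a table per partition, a query expressed in global term ids could be rewritten into local ids before being sent to that partition. What I'm asking is whether the partitioning tools already keep this information somewhere, so that I don't have to rebuild it from the term strings.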
Thanks in advance,