Hello Iain,
I don't think this is directly possible in lucene, though lucene
experts may know of tips that I am not aware of. Of course, if
there are specific cases that you want to filter out, you could use a
NOT clause. For example: "homo sapiens" NOT "XXX". But I doubt this is
what you are after.
I can also think of a few (not so great) solutions:
1) it's possible to create separate indexes. For example, you could
create a separate text index for each species.
2) you could create a post-lucene filter. For example, query lucene
for matches, and then run these matches through a non-lucene species
specific filter.
The problem with 1 is that it means lots of indexes, and the number
grows as we add more species. The problem with 2 is performance, since
every match has to be checked again outside lucene.
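
To make option 2 a bit more concrete, here is a rough sketch of what such
a post-lucene filter might look like. It is only an illustration: the hit
structure (a dict with an "id" and a "species" list holding one entry per
interactor) and the function name single_species_hits are made up for
this example and are not part of lucene or of our code. The lucene query
itself is assumed to have already run and produced the hits.

```python
from typing import Iterable, Iterator, Mapping


def single_species_hits(hits: Iterable[Mapping], species: str) -> Iterator[Mapping]:
    """Yield only the hits whose interactors all belong to `species`.

    Each hit is assumed (hypothetically) to carry a "species" list with
    one entry per interactor in the interaction.
    """
    wanted = species.strip().lower()
    for hit in hits:
        # Collapse the per-interactor species into a set of distinct names.
        found = {name.strip().lower() for name in hit["species"]}
        # Keep the hit only if the one distinct species is the wanted one.
        if found == {wanted}:
            yield hit


# Stand-in for the matches returned by a lucene query on "homo sapiens".
hits = [
    {"id": "int-1", "species": ["Homo sapiens", "Homo sapiens"]},
    {"id": "int-2", "species": ["Homo sapiens", "Mus musculus", "Rattus norvegicus"]},
    {"id": "int-3", "species": ["Mus musculus"]},
]

kept = [hit["id"] for hit in single_species_hits(hits, "homo sapiens")]
print(kept)
```

Only the first interaction survives here, because it is the only one where
every interactor is human. The cost is the one mentioned above: each match
is examined a second time, so a broad query pays for every hit it returns.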
Do you need this for batch processing or general purpose web viewing?
Ethan
iain.d...@gsk.com wrote:
>
> Hi Guys,
>
> I have a follow-up question about querying index terms. Please correct
> any flawed assumptions I have made.
>
> I get the impression that the indexing process reviews one interaction
> at a time; it collects terms from the interactors involved in the
> interaction as well as terms from the interaction itself. These terms,
> in addition to general content, then make up a lucene record which can
> be queried against.
>
> My problem lies in querying on distinct interactor information. For
> example, querying on species will bring back records which match that
> species, so if an interaction contains 3 interactors from 3 different
> species and one of them matches, the interaction will be returned.
>
> We currently have data from a number of species and will want to query
> for interactions which involve exclusively one species. I get the
> impression that this currently can't be done?
>
> A possible solution would be if Lucene allowed searching for records
> which contain only one value for a term; in my example, records which
> have only one species. Do you know if this is possible using Lucene? Or
> should I pose it to a Lucene message board?
>
> Thanks for the continued support.
>
> Iain K
>
>
>
> *"Ethan Cerami" <cer...@cbio.mskcc.org>*