Hello everyone,
After some complaints from our customers, my colleague and I ran a few quick trials and were a bit surprised to find that XTF does not appear to support true AND searches on full text. If you search for two terms that must both appear, it only returns results where the two terms fall within the maximum proximity range, which by default is only 20 words (roughly one line).
From http://xtf.cdlib.org/documentation/under-the-hood/#QueryOperations, in the paragraphs "AND Query on Text" and "NOT Clause on Text", it appears that this behavior is by design:
"... Thus XTF interprets AND queries on the full text as NEAR queries instead, with the slop factor set to the maximum for that index. ..."
The problem is that users (and probably even developers) do not know about this, so they do not find documents they expect to find. For example, if "Amsterdam" appears on one page and "Africa" a few sentences later, XTF will NOT find this document when you search for "amsterdam" and "africa". The document is found only when both terms are within the maximum proximity range.
In the metadata fields there appears to be no such proximity limit on AND searches; the limit applies only to full text.
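To make the difference concrete, here is a minimal sketch in plain Python. It is not XTF or Lucene code: the helper names (positions, matches_and, matches_near) are hypothetical, and measuring proximity as a simple difference in word positions is an assumption that simplifies how a slop factor is really computed.

```python
import re

MAX_PROXIMITY = 20  # the default range mentioned above; assumed to count word positions


def positions(text, term):
    """Return the word positions at which `term` occurs (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    return [i for i, w in enumerate(words) if w == term.lower()]


def matches_and(text, a, b):
    """True AND: both terms occur somewhere in the text."""
    return bool(positions(text, a)) and bool(positions(text, b))


def matches_near(text, a, b, slop=MAX_PROXIMITY):
    """NEAR: some occurrence of `a` lies within `slop` words of some occurrence of `b`."""
    return any(abs(i - j) <= slop
               for i in positions(text, a)
               for j in positions(text, b))


# A document with 50 filler words between the two terms.
filler = " ".join(["word"] * 50)
doc = f"Ships left Amsterdam in spring. {filler} They reached Africa months later."

print(matches_and(doc, "amsterdam", "africa"))   # True: both terms are present
print(matches_near(doc, "amsterdam", "africa"))  # False: more than 20 words apart
```

Under these assumptions the document satisfies a true AND search but is dropped by the NEAR interpretation, which is the situation our customers ran into.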
Before digging deeper, I would like to know whether there is a simple way to change this behavior. I did not see one when looking at the XSLT, but I did not look very closely, so I may have missed something. Or do I have to go into the Java code to change this? Also, would changing this mechanism have any side effects (e.g. on ranking)?
If there is no easy way to change this, I think it would be better to change the user interface so that it is clear that two terms that must both appear cannot be found when they are separated by more than the maximum proximity range. That way users will at least understand what they are really searching for, and hopefully understand why they are not finding documents they know contain both terms.
Also, on the results page, XTF suggests (Search: "amsterdam" and "africa" in ...) that an AND search is performed. I think this is a bit misleading and it would be better to change this to e.g. Search: "amsterdam" near "africa" in ...
I understand that for large collections NEAR queries may give more accurate results than AND queries, and I think it is also useful to use proximity to improve ranking. However, I do not think it is a good idea to simply discard results where the terms are not near each other, especially when XTF is used in scientific research. Ideally, using NEAR instead of AND would be just a suggestion, or perhaps even the default, but it should also be possible to perform real AND searches.
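As an illustration of using proximity for ranking rather than for filtering, here is a rough Python sketch. The scoring formula and the names (positions, score) are made up for this example and are not how XTF or Lucene actually score documents.

```python
import re


def positions(text, term):
    """Return the word positions at which `term` occurs (case-insensitive)."""
    words = re.findall(r"\w+", text.lower())
    return [i for i, w in enumerate(words) if w == term.lower()]


def score(text, a, b, slop=20):
    """Hypothetical score: 0.0 if either term is missing; otherwise 1.0 plus a
    bonus of up to 1.0 that shrinks as the closest pair of occurrences moves apart."""
    pa, pb = positions(text, a), positions(text, b)
    if not pa or not pb:
        return 0.0
    closest = min(abs(i - j) for i in pa for j in pb)
    return 1.0 + max(0.0, 1.0 - closest / slop)


docs = {
    "near": "Trade between Amsterdam and Africa grew.",
    "far": "Amsterdam was busy. " + "word " * 50 + "Africa was distant.",
    "missing": "Only Amsterdam is mentioned here.",
}

# Keep every document that contains both terms, ordered by proximity-boosted score.
scores = {name: score(text, "amsterdam", "africa") for name, text in docs.items()}
ranked = sorted((n for n in scores if scores[n] > 0), key=scores.get, reverse=True)
print(ranked)  # ['near', 'far']
```

In this sketch the document with the terms far apart is still returned, only ranked lower, while the document missing one term is excluded as a real AND search requires.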
Regards,
Jasper