Re: Default Boolean Search

Peter Van Garderen

Apr 13, 2011, 11:58:13 AM
to sup...@artefactual.com, qubit-dev

On 04/13/2011 07:27 AM, MJ Suhonos wrote:
> Hi Jessica,
>
> Your analysis is absolutely right -- the default boolean combiner for Zend Search Lucene (what Qubit uses) is OR. I have no idea why, but this is definitely inconsistent with search behaviour like Google and others (Solr, ElasticSearch). I came across the same issue in LAC testing, and it's a one-liner in the code to change the default to AND. I was going to implement this as an enhancement in 1.2.

Yes, let's change this default in 1.2
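
To make the OR-versus-AND difference concrete, here is a minimal Python sketch. It is not Qubit or Zend code: the sample documents and the `matches` and `search` names are made up for illustration, and matching is reduced to whole-word lookup. In Zend Search Lucene itself the manual appears to expose the setting as `Zend_Search_Lucene_Search_QueryParser::setDefaultOperator()` with the `B_AND` constant, which is presumably the one-liner mentioned above.

```python
# Toy index: document id -> text (hypothetical sample data).
DOCS = {
    1: "correspondence and reports of the board",
    2: "annual reports",
    3: "personal correspondence",
}


def matches(text, terms, default_operator="OR"):
    """Return True if the text satisfies the terms under the given operator."""
    words = set(text.lower().split())
    hits = [term.lower() in words for term in terms]
    # AND requires every term to be present; OR requires at least one.
    return all(hits) if default_operator == "AND" else any(hits)


def search(query, default_operator="OR"):
    """Return the sorted ids of documents matching a whitespace-separated query."""
    terms = query.split()
    return sorted(
        doc_id
        for doc_id, text in DOCS.items()
        if matches(text, terms, default_operator)
    )


if __name__ == "__main__":
    print(search("correspondence reports", "OR"))
    print(search("correspondence reports", "AND"))
```

With OR as the default, a two-word query returns every document containing either word; with AND, only the document containing both words is returned, which is closer to what users expect from a web search box.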

>
> As for ranking, I'm not entirely certain how ZSL orders results, or how ranking is calculated exactly -- only that there are some Lucene-like options to control ranking. Unfortunately the best documentation is probably that from Zend, which is pretty technical:
>
> http://framework.zend.com/manual/en/zend.search.lucene.html

I am not sure how the ZSL ranking algorithm works but I can note that we add
boosting to certain fields (e.g. title) to increase their ranking score.
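
As a rough illustration of what field boosting does to ordering, here is a small Python sketch. It is not the ZSL scoring formula, which this thread does not describe; the boost values, field names, and the `score` and `rank` helpers are all assumptions chosen only to show that a match in a boosted field outranks the same match elsewhere.

```python
# Hypothetical boosts: a hit in the title counts twice as much as one in the description.
FIELD_BOOSTS = {"title": 2.0, "description": 1.0}


def score(record, term):
    """Sum boost * occurrence count of the term over the boosted fields."""
    total = 0.0
    for field, boost in FIELD_BOOSTS.items():
        words = record.get(field, "").lower().split()
        total += boost * words.count(term.lower())
    return total


def rank(records, term):
    """Return ids of matching records, highest score first."""
    scored = [(score(record, term), record["id"]) for record in records]
    ordered = sorted(scored, key=lambda pair: -pair[0])
    return [record_id for value, record_id in ordered if value > 0]


if __name__ == "__main__":
    sample = [
        {"id": 1, "title": "Minutes", "description": "includes reports"},
        {"id": 2, "title": "Annual reports", "description": "bound volumes"},
    ]
    print(rank(sample, "reports"))
```

Here the record with the term in its title is listed before the record that only mentions it in the description.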

>
> I'd really like to dig into our search implementation in June and make some significant improvements to both performance and output -- so if you have requirements or features related to search, we definitely want to file them in the issue tracker.

Yes, please create an issue for the 'AND' default.

Cheers,

--peter

Tim Hutchinson

Apr 14, 2011, 10:52:34 AM
to qubi...@googlegroups.com, Peter Van Garderen, sup...@artefactual.com
Hi all,

I'm not sure how this came up, but I also noticed this behaviour in the
context of searches limited to a repository. In the current release, it
causes what I'd call a bug in the holdings search. E.g., on the demo site,
choose any repository and then do a holdings search for "correspondence
reports" (no quotes) - you get more than just the current institution's
holdings.

In our development of the filtered browse by repository, we addressed
that by adding parentheses around the search terms.
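
The effect of the parentheses can be sketched in Python as follows. The thread does not show the actual query string Qubit builds, so this assumes the repository restriction ends up bound to only the first search term when the terms are left ungrouped; the records and the `ungrouped`, `grouped`, and `ids` names are hypothetical.

```python
# Hypothetical holdings spread across two repositories.
RECORDS = [
    {"id": 1, "repository": "A", "text": "correspondence files"},
    {"id": 2, "repository": "A", "text": "annual reports"},
    {"id": 3, "repository": "B", "text": "financial reports"},
    {"id": 4, "repository": "B", "text": "photographs"},
]


def has(record, term):
    """Whole-word match of a term against the record text."""
    return term in record["text"].split()


def ungrouped(record, repo, terms):
    """Assumed reading without parentheses: (repo AND first term) OR other terms."""
    if not terms:
        return record["repository"] == repo
    first, rest = terms[0], terms[1:]
    in_repo_with_first = record["repository"] == repo and has(record, first)
    return in_repo_with_first or any(has(record, term) for term in rest)


def grouped(record, repo, terms):
    """Reading with parentheses: repo AND (term OR term OR ...)."""
    return record["repository"] == repo and any(has(record, term) for term in terms)


def ids(predicate, repo, query):
    """Ids of the records accepted by the given predicate."""
    terms = query.split()
    return [record["id"] for record in RECORDS if predicate(record, repo, terms)]


if __name__ == "__main__":
    print(ids(ungrouped, "A", "correspondence reports"))
    print(ids(grouped, "A", "correspondence reports"))
```

Under that assumption the ungrouped form leaks a record from repository B into a search scoped to repository A, while the grouped form keeps the results inside the selected repository.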

Tim


--
Tim Hutchinson
University of Saskatchewan Archives
301 Main Library, 3 Campus Drive
Saskatoon, SK S7N 5A4
tel: (306) 966-6028
fax: (306) 966-6040
e-mail: tim.hut...@usask.ca
web: http://www.usask.ca/archives/

Peter Van Garderen

Apr 14, 2011, 12:17:02 PM
to qubi...@googlegroups.com
Interesting. Can someone please file this as an issue? Thanks, --peter

Peter Van Garderen
President/Systems Archivist
Artefactual Systems Inc.
--
email: pe...@artefactual.com
web: http://artefactual.com
phone: (+1) 604.527.2056
--

Tim Hutchinson

Apr 14, 2011, 12:30:02 PM
to qubi...@googlegroups.com
I can do that. Note that it should be resolved (via grouping) as part of
issue 1948.

Tim

Tim Hutchinson

Apr 14, 2011, 7:34:30 PM
to qubi...@googlegroups.com
I've filed issue 1977 re the default operator, and issue 1978 re the
resulting error in a holdings search.

Tim

MJ Suhonos

Apr 15, 2011, 9:12:42 AM
to qubi...@googlegroups.com
Thanks, Tim. I had wanted to reply to your email yesterday, but ran out of time. This is a dead-simple fix, but will be a good point to hinge an overhaul of search functionality upon.

MJ
