Search in default_collection minus a specific collection


Mirac

Feb 26, 2015, 5:27:09 PM
to Google-Search-...@googlegroups.com
In our GSA index of 500K documents, half of the documents come from an internal bug tracking system.
We have been hearing some power users complain that results from the bug tracking system push down useful results from many other sources.
We discussed using result biasing to lower the importance of bug tracking documents, but I am not very keen on this approach, as I believe we should let the GSA do its magic and decide on the relevancy of the results.
Instead, what I want to offer users is a UI option (a checkbox for each collection) where they can pick which collections they want to search.

My non-default collections do not include everything that is under the default_collection. So when a user checks each and every checkbox, they may think that covers everything in the index when it does not.
Because of this, I want the checkboxes to behave as exclude rather than include (i.e., check to exclude this collection).

Finally, my question: Is there a way to use the default collection but filter out results that belong to a specific collection (the bug tracking collection)?
When you want to use multiple collections you do &site=col1|col2|col3.
What I am after is something like &site=default_collection-col1 (that's a minus in between).
Is there a way to do this?
Any alternative approaches to this problem?
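
For readers following along, here is a minimal sketch of the multi-collection form described above, where the checked collections are joined with a pipe in the site parameter. The host name, the function name build_search_url, and the sample query are assumptions for illustration only; a real GSA request usually carries further parameters (such as client and output) that are left out here.

```python
from urllib.parse import urlencode


def build_search_url(base_url, query, collections):
    """Return a search URL restricted to the given collections.

    The collection names are joined with '|', which the thread describes
    as the way to search several collections at once.
    """
    if not collections:
        raise ValueError("at least one collection is required")
    params = {
        "q": query,
        "site": "|".join(collections),  # pipe = OR across collections
    }
    # urlencode percent-encodes the pipe as %7C and spaces as '+'.
    return base_url + "?" + urlencode(params)


# Hypothetical host and query, for illustration only.
url = build_search_url(
    "http://gsa.example.com/search", "login timeout", ["col1", "col2", "col3"]
)
print(url)
# prints http://gsa.example.com/search?q=login+timeout&site=col1%7Ccol2%7Ccol3
```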

Chang Ahn

Feb 26, 2015, 8:24:27 PM
to Google-Search-...@googlegroups.com
You can still use a biasing policy with your default frontend. There's a request parameter, entsp, where you can specify a biasing policy. That, together with a selection option to choose the biasing policy, should do the trick. I haven't tried it, but according to the GSA docs, you can exclude sites.

Dave Watts

Feb 26, 2015, 11:20:04 PM
to Google-Search-...@googlegroups.com
> We discussed about using result biasing to lower the importance of bug
> tracking documents but I am not very keen on this approach as I believe
> we should let GSA do its magic and decide on the relevancy of the results.

I generally agree with you about biasing - I try to avoid using it if
I can. But you could specify a biasing policy on the fly, using a form
field like a checkbox to apply the policy, using the entsp request
parameter mentioned by Chang Ahn. You'd have to customize your XSLT
manually to do this, but it wouldn't be very difficult.
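
As a rough sketch of the checkbox idea, the request parameters might be assembled like this, adding entsp only when the box is checked. The policy name demote_bugs is hypothetical, and the a__ prefix on the value is an assumption about the expected format, so check the Search Protocol Reference for your GSA version before relying on it. The XSLT changes needed to render the checkbox are not shown.

```python
from urllib.parse import urlencode


def search_params(query, demote_bugs, policy_value="a__demote_bugs"):
    """Build a query string, adding entsp only when the checkbox is checked.

    policy_value is an assumed format: 'demote_bugs' is a made-up policy
    name and the 'a__' prefix should be verified against the GSA docs.
    """
    params = {"q": query, "site": "default_collection"}
    if demote_bugs:
        params["entsp"] = policy_value
    return urlencode(params)


print(search_params("crash on save", demote_bugs=True))
print(search_params("crash on save", demote_bugs=False))
```

Leaving entsp out when the box is unchecked keeps whatever policy the frontend already uses.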

> Finally my question: Is there a way to use the default collection but filter
> out results that belong to a specific collection (bug tracking collection).
> When you want to use multiple collections you do &site=col1|col2|col3..
> What I am after is something like &site=default_collection-col1 (that's a
> minus in between).
> Is there a way to do this?
> Any alternative approaches to this problem?

No, you can't use one collection to exclude results from another
collection. But you can solve this pretty easily by creating another
collection that contains everything except the specific URL patterns
in the bug tracking collection. This is a pretty common approach for
"everything but x" searches. Whatever you do, don't delete or modify
the default collection or you won't have a way to see everything
(either in searches or in Index Diagnostics and other reports).
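
A minimal sketch of how an "exclude bug tracker" checkbox could map onto this approach: the checkbox simply switches the site parameter between the default collection and the extra collection. The name everything_but_bugs is hypothetical; it stands for a collection you would create yourself in the Admin Console with the bug tracker's URL patterns left out.

```python
from urllib.parse import urlencode

ALL_CONTENT = "default_collection"
# Hypothetical collection: everything except the bug tracker's URL patterns.
WITHOUT_BUGS = "everything_but_bugs"


def site_for(exclude_bugs):
    """Pick the collection to search based on the exclude checkbox."""
    return WITHOUT_BUGS if exclude_bugs else ALL_CONTENT


def query_string(query, exclude_bugs):
    """Build the query string for a search with or without bug results."""
    return urlencode({"q": query, "site": site_for(exclude_bugs)})


print(query_string("release notes", exclude_bugs=True))
# prints q=release+notes&site=everything_but_bugs
print(query_string("release notes", exclude_bugs=False))
# prints q=release+notes&site=default_collection
```

With more than one exclude checkbox you would need a collection for each combination you want to offer, so this works best when only one or two sources need excluding.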

Dave Watts, CTO, Fig Leaf Software
1-202-527-9569
http://www.figleaf.com/
http://training.figleaf.com/

Fig Leaf Software is a Service-Disabled Veteran-Owned Small Business
(SDVOSB) on GSA Schedule, and provides the highest caliber vendor-
authorized instruction at our training centers, online, or onsite.