Re: "fielded" searching, solr_helper,


Jamie Orchard-Hays

Apr 6, 2009, 2:58:36 PM
to blacklight-...@googlegroups.com
All good points. I don't think it's KISS vs DRY, but I get your point. (When I look at a large file and don't know that 4 request handlers are mostly copy-n-paste, I just see complexity.)

Really, the question is where we want the complexity to reside.

For now, since this task is on my plate, I'm going to go the easiest first-pass route, which is to use multiple Request Handlers. Then we can visit this again. Myself, I don't have a clear opinion on the best way yet; I just see pluses and minuses to both approaches.

I've posted this to the (new) group list for posterity.

Jamie

On Apr 6, 2009, at 2:45 PM, Naomi Dushay wrote:

Okay, now I'm getting a little uncomfortable.  When we get a "meta handler" and a database and other stuff to manage our solr requests and solr configuration, then I feel like we've gotten away from KISS.   We've got a sort of DRY vs. KISS problem.

my_regular_catalog_search:
   everything search
   title search
   author search
   standard number search

      Things that will stay the same across solr request handlers:  
          fl  (fields returned)
          facet fields
      What will differ:
          qf (which fields are searched, and the boosting formula)
          pf (phrase fields)
          highlighting fields

music "view" of catalog:
   everything search
   composer search
   performer search
   instrumentation search
   title search
      Things that will remain same (but will differ from regular catalog search)
          fl 
          facets
      Differ:    qf, pf, highlighting fields

I would say that you do the configuration as close to the thing-to-be-configured as possible.    Doesn't it make more sense to twiddle your boosting formula and your query fields in solrconfig, because it's more related to solr?  Stanford has about 40 or 50 fields that will probably end up in the regular catalog "everything" search.   For me, I'm more comfortable putting that in solrconfig.xml than having to manipulate a crazy big, complicated list of parameters and values.

Also, I want to twiddle relevance ranking.  Do I want to create a temporary request handler in solrconfig, or do I want to create something in blacklight?  The former lets me do raw solr requests;  the latter will require me to set up a testing form so I can do posts because the solr URL will be way too long.

I'm not saying we make it impossible to do it another way ... other sites will have simpler request handlers.   Also, when I'm a great programmer who grasps all 47 technologies in the stack and thinks a "meta-layer" and an "auto-config" don't obscure what I'm trying to figure out ... then I'll probably be on board.

Of course, I have to argue the other side as well.  I've been hearing that we need to do the following:
  relevancy should reflect proximity of words, left anchoring of words (prefix searching)
  users should be able to use boolean, to do phrase searches.
This stuff won't work with dismax, as Erik has pointed out to me.  So I've already got a lovely scenario for sticking this complicated formula into the blacklight code somewhere.  (With the option to POST to SOLR rather than GET:  my URL will be too long.)


KISS over DRY?

Am I alone on this?

One other data point:  I've been *the* early adopter for Bob Haschart's 2.0 refactoring of solrmarc.  The more he tries to automate the installation, the more resistant I become.  I don't *want* something to say "here's your build.properties file for ant".  Likewise, I probably don't want something to say "here's your solrconfig".  I want to have the flexibility to use my existing stuff.

Don't forget:  RoR is daunting to a lot of the potential library blacklight adopters.  One reason VuFind is appealing: PHP seems easier than RoR.  It's not a quest for market share ... but it is a quest to make the tool low barrier for its users.  Having the blacklight <--> solr communication harder to find, harder to put together conceptually ... I think that's raising the barrier.

This is also why I'm trying to keep the stack of technologies small.  I can see git is inevitable;  I'll be ready for it in a little while.  But we're not all savvy RoR people.  We're not all edge folks.  Our institutions aren't all ready for leading edge.   

- Naomi






On Apr 6, 2009, at 11:00 AM, Matt Mitchell wrote:

Sorry, I only sent this to Jamie by accident...

---------- Forwarded message ----------
From: Matt Mitchell <good...@gmail.com>
Date: Mon, Apr 6, 2009 at 1:59 PM
Subject: Re: "fielded" searching, solr_helper,
To: Jamie Orchard-Hays <jami...@mac.com>


Yeah I see. Good points you guys. There is a trade-off for sure and that discussion makes it clear.

What if we just provide a hook in the code somehow? We need a place that has access to the request context and returns solr request params (along with the request handler path). This was originally done by a single method in the controller (not the best place) called map_solr_params or something. All you had to do was override it and return a new hash based on the request context. That method was literally a switch statement that inspected the request and returned solr params accordingly. I think it's a pretty simple solution to this; we could just add that method back in.

The nice thing about BL is that it's ruby. So to override that behavior in your own app you could open up SolrHelper in an initializer file:

config/initializers/blacklight.rb

module Stanford
  module SolrHelper
    def map_solr_params(request)
      if request.params[:field] == 'title'
        # ... return the title-specific solr params here
      else
        # fall back to the default mapping
        super(request)
      end
    end
  end
end

# then mixin...
Blacklight::SolrHelper.send(:include, Stanford::SolrHelper)

Done!
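
A note on the mixin order, as a minimal sketch rather than Blacklight's actual code: in plain Ruby, a module included into another module lands behind it in the method lookup chain, so for `super` to reach the default mapping, the override has to be mixed into the class that uses the helper, after the default module. Every name below (`FakeRequest`, `DefaultSolrParams`, `TitleSearchOverride`, `CatalogSearch`, the `title_t` field) is a hypothetical stand-in.

```ruby
# Stand-in for the request object; it only carries a params hash.
FakeRequest = Struct.new(:params)

# Stand-in for the default helper: returns the base solr params.
module DefaultSolrParams
  def map_solr_params(request)
    { :qt => :search, :q => request.params[:q] }
  end
end

# Site-specific override: special-cases a "title" search and
# defers to the default mapping for everything else.
module TitleSearchOverride
  def map_solr_params(request)
    if request.params[:field] == 'title'
      super(request).merge(:qf => 'title_t')
    else
      super(request)
    end
  end
end

class CatalogSearch
  include DefaultSolrParams
  include TitleSearchOverride # included last, so it is consulted first
end

search = CatalogSearch.new
title_params   = search.map_solr_params(FakeRequest.new({ :field => 'title', :q => 'hamlet' }))
default_params = search.map_solr_params(FakeRequest.new({ :q => 'hamlet' }))
```

Here `title_params` carries the extra `:qf` entry, while `default_params` is exactly what the default module returns.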


On Mon, Apr 6, 2009 at 1:42 PM, Jamie Orchard-Hays <jami...@mac.com> wrote:
Matt and I and then Erik and I had chats about this. Here's the chat transcript between Erik and me. The upside of using separate Request Handlers is that it makes the Rails side of things simpler at the expense of DRYness on solrconfig.xml. On the other hand, using one Request Handler makes things more complex on the Rails side. Comments please! :-)

1:17:42 PM Erik Hatcher: i think there needs to be some abstraction here... a filter.... of sorts
1:18:09 PM Erik Hatcher: i think its not quite right what she suggests
1:18:15 PM Erik Hatcher: where the drop down maps directly to qt

1:18:24 PM jamieorc: why's that?

1:18:55 PM Erik Hatcher: i think it's more general than that... that the http params (session, et al) goes through something that returns a solr connection kinda, or at least the default params, where qt is only one
1:19:09 PM Erik Hatcher: the drop-down could then map to just twiddling qf directly
1:19:23 PM Erik Hatcher: rather than requiring more solrconfig
1:19:29 PM Erik Hatcher: make sense?

1:19:40 PM jamieorc: yeah

1:19:56 PM Erik Hatcher: mapping just to qt is too limiting
1:20:16 PM Erik Hatcher: you will very likely want blacklight side control of all fiddly solr params stuff

1:21:08 PM jamieorc: so how would I pass in a modified qf?

1:21:31 PM Erik Hatcher: based on the state of a drop-down, as pseudo-code case statement.....
1:21:55 PM Erik Hatcher: solr_params = {:qt => :search, :q => params[:q]}
1:22:05 PM Erik Hatcher: switch drop-down-value
1:22:24 PM Erik Hatcher:    case "title": solr_params[:qf] = "title"
1:22:43 PM Erik Hatcher:    case "author": solr_params[:qf] = "author"
1:23:08 PM Erik Hatcher:   else solr_params[:qf] = "title^2 author^4 description"
1:23:13 PM Erik Hatcher: # or something like that
1:23:25 PM Erik Hatcher: toggle qf instead of qt

1:23:30 PM jamieorc: yeah

1:23:41 PM Erik Hatcher: or parameterize it with all that nested query mojo in yoniks blog entry on our site

1:24:28 PM jamieorc: what I like about this is you don't end up with 4 or 5 gigantic RequestHandlers that are mostly copy-pasted info
1:24:33 PM jamieorc: which is brittle

1:24:37 PM Erik Hatcher: exactly

1:24:57 PM jamieorc: "oops, I forgot to update the title_search RH when I added some field"

1:25:08 PM Erik Hatcher: and of course, you could envision those param mappings being in a db and ui tweakable on the blacklight admin side ;)

1:25:16 PM jamieorc: sure

1:25:23 PM Erik Hatcher: that's why i'm thinking of this as a mapping
1:25:27 PM Erik Hatcher: i mean a filter

1:25:48 PM jamieorc: sure, well I added the blacklight.rb initializer the other day which allows this rather easily

1:25:58 PM Erik Hatcher: some overridable method can look up base solr params for the request
1:26:26 PM Erik Hatcher: cause there is good reason to use different solr request handler configurations too

1:26:28 PM jamieorc: what about pf ?

1:26:32 PM Erik Hatcher: the balance in what to modify where
1:26:39 PM Erik Hatcher: pf?  what about it?

1:27:00 PM jamieorc: that can just remain the same, right?

1:27:11 PM Erik Hatcher: not necessarily
1:27:19 PM Erik Hatcher: it'd likely be similar to the qf

1:28:54 PM jamieorc: ok, so now we have both qf and pf to modify
1:28:58 PM jamieorc: anything else?

1:29:40 PM Erik Hatcher: well, lots of things potentially....
1:29:41 PM Erik Hatcher: bf
1:29:44 PM Erik Hatcher: bq
1:29:54 PM Erik Hatcher: lots of fiddly relevancy related tweaks that can be made

1:30:21 PM jamieorc: ok, once you start having that many things to change, then just having a new RH looks appealing

1:30:34 PM Erik Hatcher: that's why i think it just needs to be some method that takes the current request context and returns the solr request
1:30:45 PM Erik Hatcher: well, there's still the copy-paste argument
1:30:57 PM Erik Hatcher: this is where "generator" sounds appealing ;)
1:31:01 PM Erik Hatcher: from a ruby dsl of sorts
1:32:05 PM Erik Hatcher: so a ruby model that can say main catalog, music, semester-at-sea.... map that to separate request handlers, but still parameterized at request time with blacklight-side tweaks to deep solr params
1:32:14 PM Erik Hatcher: dig?

1:32:38 PM jamieorc: barely. I'm not up to speed on the solr side of this

1:33:32 PM Erik Hatcher: you got it figured out pretty well.... you're right about the copy/paste
1:33:49 PM Erik Hatcher: things like this cry out for a metalevel
1:34:10 PM Erik Hatcher: i still think you need some kind of ActiveSolr-like infrastructure

1:34:25 PM jamieorc: yeah, possibly

1:34:28 PM Erik Hatcher: not something heavy
1:34:31 PM Erik Hatcher: definitely not
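
For reference, the pseudo-code case statement from the transcript translates to Ruby roughly as follows. This is only a sketch: the method name `solr_params_for` and its arguments are made up here, and the qf values are the placeholders typed in the chat, not real index field names.

```ruby
# Build solr params from the user's query and the drop-down value.
# The qt value and qf strings are copied from the chat pseudo-code.
def solr_params_for(query, search_field)
  solr_params = { :qt => :search, :q => query }
  solr_params[:qf] =
    case search_field
    when 'title'  then 'title'
    when 'author' then 'author'
    else 'title^2 author^4 description'
    end
  solr_params
end

title_search   = solr_params_for('moby dick', 'title')
default_search = solr_params_for('moby dick', nil)
```

The request handler (qt) stays fixed and only qf changes with the drop-down, which is the "toggle qf instead of qt" idea.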

On Apr 2, 2009, at 5:21 PM, Naomi Dushay wrote:

http://blacklightopac.org:8080/jira/browse/CODEBASE-16    is a request for a pull down box in the plugin UI so end users can do "title" "author" etc. searching in addition to a plain (or "everything" or "default") search.

The way the "default" search works:

1.  the user enters a query in the search box.  

2.  catalog controller index action starts

    @response = get_search_results(params[:q], params[:f], params[:per_page], params[:page])

where q is the user query, and f represents the facets selected.

3.  In solr_helper, get_search_results does what is expected.   The qt parameter that is part of the solr_params is hardcoded to "search" as you can see.   What this actually means is that the "search" SOLR requestHandler, as defined in solrconfig.xml,  is designated to process the query.


To do other types of searches ("title", "author" ...) effectively, we need to configure SOLR to have some different SOLR document fields it's matching against (e.g. in a "title" search, it might look for a match in title_t, but not in author_t), and different boosting.   As you might guess, the MARC format allows 27 different flavors of titles, of authors, of subjects, of everything.   The best way to handle the magic search algorithms for "title" and "author" and other types of searches is to set up additional RequestHandlers in the solrconfig.xml.   See the attached solrconfig.xml for examples.

So, given a RequestHandler named "title_search",  solr_helper should have a way to do something like this:

    solr_params = mapper.map({
      :q=>user_query,
      :phrase_filters => facets,
      :qt=>"title_search",
      :per_page=>num_per_page,
      :page=>page
    })

instead of what is done for the default:

    solr_params = mapper.map({
      :q=>user_query,
      :phrase_filters => facets,
      :qt=>:search,
      :per_page=>num_per_page,
      :page=>page
    })

The way I see it, the selected value in the pull down box sends what will become the "qt" parameter to catalog_controller, and catalog_controller will pass the qt value to a solr_helper method that doesn't have it hardcoded.

Please note:  we need a way for sites to set up their own values in the pull down box and map them to their own RequestHandler names in their solrconfig, and possibly to opt out of fielded searching altogether.   And of course, the default chosen for the pulldown box is an "everything" search.
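
One way the pull-down-to-qt mapping could look, as a sketch under stated assumptions: the constant and method names are hypothetical, `author_search` is an invented handler name, and only `search` and `title_search` come from the message above. A site would edit the hash to match the RequestHandler names in its own solrconfig.xml, or leave it empty to opt out of fielded searching.

```ruby
# Hypothetical site-level configuration: pull-down value => request handler name.
# An empty hash would mean the site has opted out of fielded searching.
SEARCH_FIELD_HANDLERS = {
  'everything' => 'search',
  'title'      => 'title_search',
  'author'     => 'author_search'
}

DEFAULT_HANDLER = 'search'

# Returns the qt value for the selected pull-down entry, falling back to
# the default handler when nothing (or something unknown) was selected.
def qt_for(selected_field, handlers = SEARCH_FIELD_HANDLERS)
  handlers.fetch(selected_field, DEFAULT_HANDLER)
end

# Whether the UI should render the pull-down at all.
def fielded_search_enabled?(handlers = SEARCH_FIELD_HANDLERS)
  !handlers.empty?
end
```

Falling back to the default handler for unknown values also keeps a hand-edited URL from selecting an arbitrary request handler.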

There might well be a better way to do this, but this way should work, and it's all I can think of at the moment.

- Naomi

<solrconfig.xml>










Jonathan Rochkind

Apr 6, 2009, 3:10:56 PM
to blacklight-...@googlegroups.com
I know what you're saying Naomi, but in my opinion difficulty of
configuration can also be a barrier.

In my experience/observation, I feel like something saying "Here is your
config file" -- or better yet, not requiring a config file at all for
the simple default case -- is a much lower barrier to entry for non-edge
people than "Here's five different giant config files with lots of
config you don't understand in it, but which you need to fill out
anyway, and some of it is the same config in two different places that
better match or it's all going to break."

You know?

Naomi Dushay

Apr 6, 2009, 4:08:59 PM
to blacklight-...@googlegroups.com
Jamie,

I think your plan is a good idea.  Not because it's mine, but because it's agile:  let's meet the current need expeditiously and refactor when more requirements appear.

- Naomi

Jamie Orchard-Hays

Apr 6, 2009, 4:19:41 PM
to blacklight-...@googlegroups.com
Yeah, that's where I was coming from: get the immediate requirement
done and then revisit.

Erik Hatcher

Apr 6, 2009, 4:19:54 PM
to blacklight-...@googlegroups.com

On Apr 6, 2009, at 4:08 PM, Naomi Dushay wrote:
> I think your plan is a good idea. Not because it's mine, but
> because it's agile: let's meet the current need expeditiously and
> refactor when more requirements appear.

As always. Don't read into any "grand vision" I happen to toss out as
meaning anything other than that, actually.

Toggling a qt based off a drop-down is for sure a worthy start to
this. I think you'll find that the copy/paste brittleness to change
of request handler defaults/invariants will be a bit of a pain with
that long term, but with only a few variations it's still not too
unpleasantly tractable.

I think you'll want to evolve it to being able to have various
parameter setting groups that correspond to those drop down values
though, such that instead of toggling qt, you're using the same qt but
twiddling qf/pf/bq/bf/defType/etc from the drop-down value. That
gives you the ability to leverage rather than constrain Blacklight-
side Solr configurability.

It's a balance: different request handler mappings for some things,
and Blacklight-side configurability on those request handlers too.
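
The "parameter setting groups" idea might be sketched like this: one shared set of params per catalog view, plus a small group of overrides per drop-down value, merged at request time. This is an illustration only; the constant names, the method name, and every field name and boost value are invented, while qt, fl, facet.field, qf and pf are the Solr parameters discussed in the thread.

```ruby
# Params shared by every search in this view (field names are hypothetical).
SHARED_PARAMS = {
  :qt => 'search',
  :fl => 'id,title_display,author_display',
  :'facet.field' => ['format', 'language_facet']
}

# Per-drop-down groups: only the params that differ between searches.
PARAM_GROUPS = {
  'title'  => { :qf => 'title_t^10 subtitle_t', :pf => 'title_t^20' },
  'author' => { :qf => 'author_t^10',           :pf => 'author_t^20' }
}

# Merge the shared params, the selected group (if any), and the user query.
# Hash#merge returns a new hash, so the constants are never modified.
def solr_request_for(query, search_field)
  overrides = PARAM_GROUPS.fetch(search_field, {})
  SHARED_PARAMS.merge(overrides).merge(:q => query)
end

title_request   = solr_request_for('hamlet', 'title')
default_request = solr_request_for('hamlet', nil)
```

With no group selected, the request carries no qf or pf at all, so whatever defaults the request handler defines in solrconfig.xml would still apply.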

Erik
