Hi David,
1. sort_by
sort_by sounds tempting, but it usually doesn't work. The problem is this:
Say you have 100,000 search results: sorting all of them takes a long time.
By default, Picky returns only the top 20 results. Sorting just those 20 would be quick. However, to sort the results correctly you have to go through all of them (e.g. 100K), sort them, and only then return the top 20.
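To illustrate the difference, here is a minimal sketch in plain Ruby (no Picky involved). The people and their ages are made-up toy data, and the first 20 entries of a shuffled list stand in for a default page of results:

```ruby
# Toy data: 100 people with distinct ages 1..100, in a fixed shuffled order.
people = (1..100).map { |i| { id: i, age: i } }.shuffle(random: Random.new(42))

# Sorting only the page that was already cut off: quick, but it only
# reorders whatever happened to land in the first 20.
page_then_sort = people.first(20).sort_by { |person| person[:age] }.map { |person| person[:age] }

# Sorting everything first, then taking the page: correct, but it has to
# touch every single result.
sort_then_page = people.sort_by { |person| person[:age] }.first(20).map { |person| person[:age] }

puts page_then_sort.inspect
puts sort_then_page.inspect
```

Only the second variant is guaranteed to return the 20 youngest people. With 100 entries that costs nothing, with 100K on every request it does.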
So I suggest something akin to the following:
# Build an index for each sort order.
#
# Also create a separate search interface for each index.
#
# Also: Create a Sinatra action for each search interface.
#
[:name, :surname, :age].each do |order_attribute|
  # One index per sort order, named after the attribute it is ordered by.
  index = Picky::Index.new order_attribute do
    source { Person.order(order_attribute) }
    category :name
    category :surname
    category :age
  end
  finder = Picky::Search.new index ...
  get "/#{order_attribute}" do
    results = finder.search params[:query], params[:ids] || 20, params[:offset] || 0
    results.to_json
  end
end
This is just an example; you can structure it however you like. For example, you could install a single Sinatra action:
# searches is a hash:
# { 'order' => search_instance, ... }
#
get '/search' do
  order = params[:order]
  results = searches[order].search params[:query], params[:ids] || 20, params[:offset] || 0
  results.to_json
end
# Search using:
# curl 'localhost:3000/search?query=blah&order=surname'
#
# And the order param will define the sort order of the results.
#
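One thing to watch with the single action: searches[order] is nil when the order param is missing or unknown. A minimal sketch of a fallback in plain Ruby follows. The lambdas are hypothetical stand-ins for the real search instances, and the default of 'name' is my assumption, not something Picky defines:

```ruby
# Stand-ins for real search objects; each just labels its output with its order.
# (Hypothetical: a real entry would be a search built on the matching index.)
make_search = ->(order) { ->(query) { "#{query} sorted by #{order}" } }

searches = {
  'name'    => make_search.call('name'),
  'surname' => make_search.call('surname'),
  'age'     => make_search.call('age')
}

default_order = 'name' # assumed default for this sketch

# Look up the search for the requested order, falling back to the default
# when the param is missing (nil) or not a known order.
pick_search = ->(order) { searches.fetch(order.to_s, searches[default_order]) }

puts pick_search.call('surname').call('blah')
puts pick_search.call(nil).call('blah')
```

In the Sinatra action you would then call pick_search with params[:order] instead of indexing the hash directly.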
As usual, you are completely free in how you proceed. However, Picky just doesn't yet do any ordering work for you. I could encapsulate the whole order stuff but haven't had the time yet – sorry!
2. Matching relevancy
I hope this already helps a bit:
Picky groups results into allocations, where each search token is allocated to one of the index's categories. For example, "David Lowenfels" contains two tokens, "David" and "Lowenfels", which can be allocated to the categories :first_name and :last_name, but also to :first_name and :first_name. Picky judges the first allocation to be more important than the second one, and so assigns it a higher score.
Please note that all results in that allocation get the same score. And since Picky is not a full text search engine, some person could be called "David David Miller", with "David David" as first name. That person would not get preference over you, even though there's a repetition of "David". For Picky, word frequency in a single category is not important. It is much more interested in allocating search tokens as well as possible to categories (imho much more important, especially if you have multiple categories – which you usually do, as opposed to a standard full text case, where you have exactly one category, namely "text").
See the above link on the Picky result to find the score.
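As a toy model of the idea (plain Ruby, not Picky's internals; the scores and ids are invented for illustration), every result inherits the score of the allocation it sits in:

```ruby
# Hypothetical scores per allocation (combination of categories).
allocation_scores = {
  [:first_name, :last_name]  => 5.0,
  [:first_name, :first_name] => 2.0
}

# Hypothetical result ids found for each allocation.
allocation_ids = {
  [:first_name, :last_name]  => [1, 7],
  [:first_name, :first_name] => [3]
}

# Rank allocations by score, highest first; every id in an allocation
# carries that allocation's score, with no per-result adjustment.
ranked = allocation_scores.sort_by { |_combination, score| -score }.flat_map do |combination, score|
  allocation_ids[combination].map { |id| { id: id, allocation: combination, score: score } }
end

ranked.each { |result| puts result.inspect }
```

Results 1 and 7 share the same score because they share an allocation. How often "David" occurs inside a single category plays no role in this model.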
Does this help? If not, please tell me what you meant.
3. Boost
Picky assigns each allocation (group of results) a score based on the index data. Sometimes you want to influence this. Perhaps you noticed that a certain combination, e.g. [:surname, :firstname], is never used in searches. You can then tell Picky that this combination is unlikely by assigning it a "boost" of -3. Picky will regard that combination as relatively unlikely, even if the index data suggests otherwise.
Or you found that people usually search for [:zipcode, :city]. You'd then boost exactly that combination, so that if somebody searched for "1234 Seattle", Picky would not think that this is the weight of a truck in Seattle (if you had a database with trucks and their locations ;) ).
Note that you can only boost a given allocation [:something, :something_else], not a single attribute. I hope this is helpful to you.
So, in closing, to answer your question: Boost is a number added to the score generated from the index data for that search result:
Final Score = Score calculated from the index data for this search + Boost for this specific allocation.
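Spelled out as a small sketch in plain Ruby (the boost values and index scores are made up for illustration; this only mirrors the formula, it is not Picky's code):

```ruby
# Hypothetical boosts, keyed by the exact combination of categories.
boosts = {
  [:zipcode, :city]       => 3,
  [:surname, :firstname]  => -3
}

# Final score = score from the index data + boost for this allocation.
# Allocations without a configured boost get 0 added.
final_score = lambda do |index_score, combination|
  index_score + boosts.fetch(combination, 0)
end

puts final_score.call(2.5, [:zipcode, :city])      # boosted up
puts final_score.call(2.5, [:surname, :firstname]) # boosted down
puts final_score.call(2.5, [:name])                # unchanged
```

Note that the lookup key is the whole combination: [:zipcode, :city] and [:city, :zipcode] are different allocations.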
Does that help?
If you want to, you can describe the kind of data you have and what you usually look for in it. I can probably help better if I know.
Cheers,
Florian