sort order / boost


David Lowenfels

Jul 18, 2012, 4:29:19 PM
to picky...@googlegroups.com
I figure I can simply use #sort_by on my own after the query.
Is there a way I can get the matching relevancy from picky? And how does this relate to boost?

Thanks,
David

Florian wrote:
> One thing: You'd like to change sort order on the fly. Now, in Picky I made the conscious design decision to not include a sorter in the process (sorting is time-consuming, so leaving it out is faster; also, in most cases, developers only need a single sort order). That means the order in which the items are in the index will be the order they are returned as results, so the order of items per index is fixed. You'd have to have multiple indexes, each with a different sort order, and call them depending on which sort order you want. This is very fast, of course, but the price to pay is space. If you still decide to go ahead with it, don't hesitate to ask the list for help regarding how to choose the right sort order, i.e. index. I suggest just trying it :) Also, remember that you can create the indexes in a nice [:order1, :order2, :order3].each do ... end iteration, and don't have to copy and paste.
>

Picky / Florian Hanke

Jul 18, 2012, 11:49:58 PM
to picky...@googlegroups.com
Hi David,

1. sort_by

The sort_by sounds tempting but doesn't usually work. The problem is this:
Assuming you have 100'000 search results, sorting them will take an extremely long time.
What you get by default from Picky are the top 20 results. Sorting them would be quick. However, to correctly sort the results you have to go through all of them (eg. 100K), sort them, and return the top 20 results.
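To make the problem concrete, here is a small plain-Ruby sketch (no Picky involved). The Person struct and the generated ages are made up for illustration; the point is only that sorting the returned page is not the same as sorting all hits and then taking the top 20.

```ruby
# Hypothetical data: 1,000 hits in index order, which is NOT sorted by age.
Person = Struct.new(:id, :age)
people = (1..1000).map { |id| Person.new(id, (id * 37) % 100) }

# What a search hands back by default: the first 20 hits in index order.
first_page = people.first(20)

# Sorting only that page is cheap, but it merely reorders 20 arbitrary hits.
page_sorted = first_page.sort_by(&:age)

# A correct "youngest 20" has to look at every single hit.
truly_youngest = people.min_by(20, &:age)

puts "youngest on sorted page: #{page_sorted.first.age}"
puts "oldest among true top 20: #{truly_youngest.map(&:age).max}"
```

With 300 hits the full sort is harmless; with 100K hits per request it is the part that hurts.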

So I suggest something akin to the following:

# Build an index for each sort order.
#
# Also create a separate search interface for each index.
# Also: Create a Sinatra action for each search interface.
#
[:name, :surname, :age].each do |order_attribute|
   index = Picky::Index.new order_attribute do
      source { Person.order(order_attribute) }
      category :name
      category :surname
      category :age
   end

   finder = Picky::Search.new index ...

   get "/#{order_attribute}" do
      results = finder.search params[:query], params[:ids] || 20, params[:offset] || 0
      results.to_json
   end
end

This is just an example. You are completely free on how to do this. For example, you could install a single Sinatra action:

# searches is a hash:
# { 'order' => search_instance, ... }
#
get '/search' do
   order = params[:order]
   results = searches[order].search params[:query], params[:ids] || 20, params[:offset] || 0
   results.to_json
end

# Search using:
# curl 'localhost:3000/search?query=blah&order=surname'
#
# And the order param will define the sort order of the results.
#
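One thing the single-action variant has to deal with is a missing or unknown order param, since searches[order] would then be nil. A minimal sketch of one way to handle that, using a stand-in object instead of a real Picky::Search (FakeSearch, ORDERS and search_for are hypothetical names, not part of Picky):

```ruby
# Stand-in for a Picky::Search instance; only the #search call is mimicked.
FakeSearch = Struct.new(:order) do
  def search(query, ids = 20, offset = 0)
    { order: order, query: query, ids: ids, offset: offset }
  end
end

ORDERS = %w[name surname age].freeze

# { 'name' => search_instance, 'surname' => search_instance, ... }
searches = ORDERS.to_h { |order| [order, FakeSearch.new(order)] }

# Hypothetical helper: pick the search for the requested order and fall
# back to a default when the param is missing or not a known order.
def search_for(searches, order, default: 'name')
  searches.fetch(order.to_s) { searches.fetch(default) }
end

p search_for(searches, 'surname').search('blah')
p search_for(searches, nil).search('blah')
```

Whether to fall back silently or to answer with a 400 is a design choice; the fallback just keeps the example short.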

As usual, you are completely free in how you proceed. However, Picky just doesn't yet do any ordering work for you. I could encapsulate the whole order stuff but haven't had the time yet – sorry!

2. Matching relevancy

I hope this helps already a bit:

Picky groups results in allocations, where search tokens are allocated to an index's categories. So for example, "David Lowenfels" contains two tokens, "David" and "Lowenfels", which are allocated to categories :first_name and :last_name, but also :first_name and :first_name. The first allocation is judged to be more important than the second one by Picky, and so it assigns that allocation a higher score.

Please note that all results in that allocation get the same score. And since Picky is not a full text search engine, some person could be called "David David Miller", with "David David" as first name. That person would not get preference over you, even though there's a repetition of "David". For Picky, word frequency in a single category is not important. It is much more interested in allocating search tokens as well as possible to categories (imho much more important, especially if you have multiple categories – which you usually do, as opposed to a standard full text case, where you have exactly one category, namely "text").
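As a rough illustration of what "allocation" means, the candidate space for a two-token query over two categories can be listed in plain Ruby. This is only a sketch of the idea, not Picky's actual algorithm; presumably only allocations where the tokens are really found in the categories are kept.

```ruby
tokens     = %w[david lowenfels]
categories = %i[first_name last_name]

# Every way of assigning one category to each token, in token order.
allocations = categories.repeated_permutation(tokens.size).to_a

allocations.each do |allocation|
  pairs = tokens.zip(allocation).map { |token, category| "#{token} -> #{category}" }
  puts pairs.join(', ')
end
```

The list contains [:first_name, :last_name] as well as [:first_name, :first_name], and every result inside one allocation shares that allocation's score.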

See the above link on the Picky result to find the score.

Does this help? If not, please tell me what you meant.

3. Boost

To each allocation (group of results) Picky assigns a score based on the index data. Sometimes you want to influence this – perhaps you noticed that a certain combination, eg. [:surname, :firstname] is never used in searches, so you can tell Picky that that combination is unlikely by assigning it a "boost" of -3. Picky will then regard that combination as relatively unlikely, even if the index data suggests otherwise.
Or, you found that people usually search for [:zipcode, :city]. You'd then boost exactly that combination, so that if somebody searched for "1234 Seattle", Picky would not think that this is the weight of a truck in Seattle (if you had a database with trucks and their locations ;) ).

Note that you can only boost a given allocation [:something, :something_else], not a single attribute. I hope this is helpful to you.

So, in closing, to answer your question: Boost is a number added to the score generated from the index data for that search result:
Final Score = Score calculated from the index data for this search + Boost for this specific allocation.
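The formula can be played through in a few lines of plain Ruby. All numbers here are invented, and :weight is a hypothetical category name for the truck example; only the addition itself comes from the formula above.

```ruby
# Invented scores, standing in for what is derived from the index data.
index_scores = {
  [:zipcode, :city] => 2.5,
  [:weight,  :city] => 3.0
}

# Boosts are keyed by the exact allocation; unlisted allocations get 0.
boosts = { [:zipcode, :city] => 3 }

final_scores = index_scores.to_h do |allocation, score|
  [allocation, score + boosts.fetch(allocation, 0)]
end

best = final_scores.max_by { |_allocation, score| score }.first

final_scores.each { |allocation, score| puts "#{allocation.inspect}: #{score}" }
puts "best allocation: #{best.inspect}"
```

Without the boost the truck-weight reading would win; with it, the zipcode reading comes out on top.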

Does that help?

If you want to, you can describe the kind of data you have, and what you are looking for in it usually. I'm probably better able to help if I know.

Cheers,
   Florian

David Lowenfels

Jul 19, 2012, 12:52:43 PM
to picky...@googlegroups.com
I'm only expecting around 300 results, so sorting in ruby shouldn't be too heinous. Actually, can I tell picky to limit the results?

My search data is sporting goods products, very similar to the site I showed earlier for facets by brand, etc.

products_index = Picky::Index.new :products do
source { Product.all }
category :product_name
category :brand_name
category :keywords
category :short_description
category :variants
end

If the query has matches in keywords, I want to boost its relevancy. If it matches in product_name I want to boost it even more.
I suppose I need to do this with index weights rather than boost?

Thanks,
-D

David Lowenfels

Jul 19, 2012, 4:51:37 PM
to picky...@googlegroups.com
okay, I just clarified something that I was confused about...
For some reason I thought boost could be used only for category combinations.

from the features page:
> Weighing not only categories, but combinations! { [:title, :author] => +3, [:isbn, :author] => -5 }
this kind of makes it seem like only combinations can be used. it would be good to show an example.

also, it would be nice if we could provide an argument without the array,
e.g. { :foo => 1, :bar => 2 , [:some, :combo] => -1 }
is this already possible but just not documented? I'm not sure where to look in the code to see for myself.

Thanks,
-David

Picky / Florian Hanke

Jul 19, 2012, 10:45:24 PM
to picky...@googlegroups.com, da...@internautdesign.com
Hi David,

You can tell Picky to limit the results using the second parameter. In the example I gave it used 20 as a default:
some_search.search params[:query], params[:ids] || 20, params[:offset] || 0
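Since request params arrive as strings in Sinatra, it may be worth turning them into integers and capping the limit before handing them to the search. A minimal sketch in plain Ruby; the paging helper and the cap of 300 are assumptions for illustration, not part of Picky:

```ruby
# Hypothetical helper: turn raw request params into safe integers.
def paging(params, default_ids: 20, max_ids: 300)
  ids    = Integer(params[:ids] || default_ids, exception: false) || default_ids
  offset = Integer(params[:offset] || 0, exception: false) || 0

  # Cap the limit and never allow a negative offset.
  [ids.clamp(0, max_ids), [offset, 0].max]
end

p paging({ ids: '50' })
p paging({})
p paging({ ids: 'abc', offset: '-5' })
```

The two values can then be passed as the second and third argument of the search call shown above.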

Weights instead of boosts: Very good assumption! If you'd like to boost a single category globally, you boost it using weights in the index. It's defined there since this automatically tells you it is inherently heavier for all searches that use this index.

I could add the option of:
weight: +6
That sounds like a good addition. Thoughts?

For now you could use:
weight: Weights::Constant.new(6)
to get a really heavy weight for your keywords. Or, of course define your own instance of an object that responds to #weight_for(amount_of_ids_for_token) and returns a float.
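A self-defined weight object could look roughly like the sketch below. The only thing taken from the above is the interface, #weight_for(amount_of_ids_for_token) returning a float; the class name, the logarithm and the cap are illustrative assumptions.

```ruby
# Sketch of a custom weight: logarithmic in the number of ids,
# shifted by a fixed bonus and capped at a maximum.
class CappedLogWeight
  def initialize(bonus = 0.0, cap = 10.0)
    @bonus = bonus
    @cap   = cap
  end

  # Picky is expected to call this with the number of ids for a token.
  def weight_for(amount_of_ids_for_token)
    return @bonus.to_f if amount_of_ids_for_token < 1

    [Math.log(amount_of_ids_for_token) + @bonus, @cap].min.to_f
  end
end

weight = CappedLogWeight.new(6)
puts weight.weight_for(1)      # log(1) is 0, so only the bonus remains
puts weight.weight_for(10**9)  # runs into the cap
```

An instance would presumably be passed the same way as the constant one, i.e. via the weight: option on the category.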

Does that help?

Cheers,
   Florian

Picky / Florian Hanke

Jul 19, 2012, 10:48:13 PM
to picky...@googlegroups.com, da...@internautdesign.com
Hi David,

I'm sorry. Your first assumption was correct.
Picky makes a bit of a point in that people should think about their searches and how people search and thus only allows combinations to be boosted.
However, as described in the last mail you can give more weight to your categories.

Cheers,
   Florian

David Lowenfels

Jul 20, 2012, 2:12:16 AM
to picky...@googlegroups.com
this is why I was confused…

https://github.com/floere/picky/wiki/Searches-Configuration
> # weights option
> books = Search.new books_index do
> boost [:author] => 6,
> [:title, :author] => 5,
> [:author, :year] => 2
> end

> Giving [:author] => 6 means that if results are found where Picky thinks that one or all search terms are in the title, it is weighed by 6 (a lot) higher.

looks like this page needs to be updated?
I think it makes logical sense to put linear weighting here (where it is all together in one place), but I guess you have it on the index now rather than the search.

-D

David Lowenfels

Jul 20, 2012, 2:21:11 AM
to picky...@googlegroups.com
Yes, that would be nice. Picky::Weights::Constant.new is quite a mouthful.
And I'm assuming that if I change the weights in the index definition, then I need to rebuild the index?

It seems to take quite a long time to reindex, the more categories there are.
i.e. first category is indexed quickly, then each one subsequently takes almost exponentially longer :(
I currently have about 33k products (and that's only for test-driving a prototype, I plan to add maybe 1 million)

-D

Picky / Florian Hanke

Jul 20, 2012, 2:40:37 AM
to picky...@googlegroups.com, da...@internautdesign.com
See below…


On Friday, 20 July 2012 16:12:16 UTC+10, David Lowenfels wrote:
> this is why I was confused…
>
> https://github.com/floere/picky/wiki/Searches-Configuration
> > # weights option
> > books = Search.new books_index do
> >   boost [:author]          => 6,
> >             [:title, :author]  => 5,
> >             [:author, :year]   => 2
> > end
>
> > Giving [:author] => 6 means that if results are found where Picky thinks that one or all search terms are in the title, it is weighed by 6 (a lot) higher.
>
> looks like this page needs to be updated?

It is correct. If you search using a single word, then if it is found as an author, it will get +6.

However, I should clarify it, you are quite right. Thank you!
 
> I think it makes logical sense to put linear weighting here (where it is all together in one place), but I guess you have it on the index now rather than the search.
 
It does make sense. However, the semantics need to be slightly different: If defined on the search, it is only valid for that search. If defined on the index, it is valid for all searches using the index.

If this was defined:
[:name, :surname] => +6,
:name => +3
Would the name surname combination then get +9? I am wondering how the semantics should be.
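To make the open question tangible, here are the two readings side by side in plain Ruby. Neither is claimed to be what Picky does; the single-category boost is written as [:name] so that all keys have the same shape.

```ruby
boosts     = { [:name, :surname] => 6, [:name] => 3 }
allocation = [:name, :surname]

# Reading A: only the boost for the exact combination counts.
exact = boosts.fetch(allocation, 0)

# Reading B: the combination boost plus the single-category boosts
# of every category that occurs in the allocation.
additive = exact + allocation.sum { |category| boosts.fetch([category], 0) }

puts "exact only: +#{exact}"
puts "additive:   +#{additive}"
```

Under reading A the combination gets +6, under reading B it gets +9.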

Cheers,
   Florian

Picky / Florian Hanke

Jul 20, 2012, 2:50:17 AM
to picky...@googlegroups.com, da...@internautdesign.com
On Friday, 20 July 2012 16:21:11 UTC+10, David Lowenfels wrote:
> Yes, that would be nice. Picky::Weights::Constant.new is quite a mouthful.

Alright, it's in 4.5.7 :)

Also, the Constant one sets a constant value as the weight. The one I just introduced (weight: +7) will instantiate a Picky::Weights::Logarithmic.new(+7). It uses the default logarithmic measure, but adds +7 to the result, as boosting does.
 
> And I'm assuming that if I change the weights in the index definition, then I need to rebuild the index?

As it stands, yes.
 
> It seems to take quite a long time to reindex, the more categories there are.
> i.e. first category is indexed quickly, then each one subsequently takes almost exponentially longer :(

That sounds quite strange and I haven't seen behaviour such as that one ever in all the Picky projects I've personally seen (around 10-12). The separate categories do not interact, so they should not slow each other down. It could be that all subsequent categories get more complex to index, but it still shouldn't be that slow.
 
> I currently have about 33k products (and that's only for a test-driving a prototype, plan to add maybe 1 million)

Also, 33k products should be very quick to index.
I am guessing you should be able to index about 2-5K objects per second.

How long does it take in total?
I think it would help if you could post your Picky configuration (app.rb) somewhere that we can look at it (assuming it's ok to publish). Maybe there is an obvious problem – probably not, but who knows?
Also, if you could describe the data (or even send the prototype data), we could try it ourselves and advise.

Cheers,
   Florian

David Lowenfels

Jul 20, 2012, 5:46:18 PM
to picky...@googlegroups.com
indexing took on the order of 20 minutes or so… I didn't time it but it was long enough that I had to leave my computer and come back later. I just sent you my data and configuration off-list.

On further thought the slowness could very likely be that some of the categories are virtual attributes… two of which get parsed from XML (colors, sizes). The third one, gender is parsed from a regex on the product_name. I attempted a migration to store these virtual attributes in the database, but aborted in the middle as I didn't think it was a high priority. I guess I will try this again and see how it improves things.

-D

David Lowenfels

Jul 20, 2012, 6:08:03 PM
to picky...@googlegroups.com
okay, silly me. it was those virtual attributes mucking things up.
I just ran it and it took 141s.
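For anyone hitting the same thing, the idea is roughly the following, sketched in plain Ruby. The Product struct, the stored_gender field and the regex are all made up for illustration; the real model parses colors and sizes from XML as well.

```ruby
# Illustrative pattern only; the real regex is not shown in this thread.
GENDER_PATTERN = /\b(men|women|kids)'?s\b/i

# Hypothetical product: gender is a virtual attribute that is re-derived
# on every call unless a stored value is present.
Product = Struct.new(:product_name, :stored_gender) do
  def gender
    stored_gender || product_name[GENDER_PATTERN, 1]&.downcase
  end
end

products = [
  Product.new("Women's Climbing Jacket"),
  Product.new("Mens Trail Shoe"),
  Product.new("Chalk Bag")
]

# One-off "migration": derive once and store, so that indexing only reads.
products.each { |product| product.stored_gender ||= product.gender }

products.each { |product| puts "#{product.product_name}: #{product.gender.inspect}" }
```

The cost per call is small here, but it is paid once per object for each such category on every reindex, which adds up once XML parsing is involved.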

Florian R. Hanke

Jul 20, 2012, 8:00:23 PM
to picky...@googlegroups.com
Hi David,

Good to hear! So, are all speed issues gone, or does the facets one persist? (I'm baffled by the low performance, and will look into it)

Cheers,
   Florian

David Lowenfels

Jul 20, 2012, 8:26:36 PM
to picky...@googlegroups.com
The facets subquery takes an average of 1.5 seconds.

ruby-1.9.3-p194 :032 > Benchmark.realtime{ Product.facet(:brand_name,"climbing jacket") }
=> 1.690054

Can you tell me if it's possible to index a category without splitting on whitespace?

Thanks,
David

Florian Hanke

Jul 20, 2012, 10:28:32 PM
to picky...@googlegroups.com
Something is very wrong here - I'm out in the country until tomorrow or the evening. Hope to investigate it soon.

Yes, use the :indexing => { hash with options as if it was the indexing method on an index } option on category:
category :something, indexing: { split... etc. } or pass in a tokenizer.

Picky / Florian Hanke

Jul 21, 2012, 9:23:00 AM
to picky...@googlegroups.com
Hi David,

See the other post for an answer on this. Also, I'll send you an email regarding your specific case.

Cheers,
   Florian