facets?


David Lowenfels

Jul 17, 2012, 10:26:47 PM
to picky...@googlegroups.com
Hello Florian / Picky-world,
I am wondering if picky can do facets? Something like the refinement links with numbers, on the left hand side (brand, gender, etc)
http://www.trailspace.com/gear/boots/midweight/

Thanks,
-David

Picky / Florian Hanke

Jul 17, 2012, 10:54:10 PM
to picky...@googlegroups.com
Hi David,

Facets aren't in Picky … yet. That is, I haven't had anybody request them, until now :) So I haven't thought about an API yet.

Would you mind trying some code and telling me whether this is what you need?

See this example:
https://gist.github.com/3133802

I hope it is :)

Cheers,
   Florian

Picky / Florian Hanke

Jul 17, 2012, 10:56:13 PM
to picky...@googlegroups.com
P.S: I spoke too soon – not only would you like the links with numbers, but also a refined query. That can also be done :)

Picky / Florian Hanke

Jul 17, 2012, 10:59:07 PM
to picky...@googlegroups.com
Regarding the refinements:

Assuming you got the facets from the category :brand (see code in other mail):
{ 'salomon' => 19, 'raichle' => 72, etc. }
you can then put together queries that look like:
"brand:#{key_of_above_hash} #{original_query}"
ie. "brand:salomon my original query"
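A minimal sketch of how the facet hash could be turned into refinement links. `refinement_queries` is a hypothetical helper, not part of Picky; it only builds the query strings described above.

```ruby
# Hypothetical helper (not part of Picky): turns the facets hash of one
# category into [label, refined_query] pairs that can be rendered as links.
def refinement_queries(category, facets, original_query)
  facets.map do |token, count|
    ["#{token} (#{count})", "#{category}:#{token} #{original_query}"]
  end
end

facets = { 'salomon' => 19, 'raichle' => 72 }
refinement_queries(:brand, facets, 'my original query').each do |label, query|
  puts "#{label} -> #{query}"
end
```

Each pair carries the text for the link ("salomon (19)") and the query to run when it is clicked.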

Does that help or would you like some real world example?

Cheers,
   Florian

David Lowenfels

Jul 18, 2012, 3:53:03 PM
to picky...@googlegroups.com
yes, this is just what I was looking for, thanks!

Florian R. Hanke

Jul 18, 2012, 11:02:14 PM
to picky...@googlegroups.com
Great! It is possible that I will add an actual API soon that will help with generating the queries and the facets (it's easy enough, but still).

If you run into trouble or have questions, just ask again.

Cheers,
   Florian

"Quand tu veux construire un bateau, ne commence pas par rassembler du bois, couper des planches et distribuer du travail, mais reveille au sein des hommes le desir de la mer grande et large." -- Antoine de Saint-Exupery

David Lowenfels

Jul 20, 2012, 2:05:15 AM
to picky...@googlegroups.com
It is the facets that I want to use to refine the query. For instance, search for skis, then refine by men's or women's, or by brand. How could it be possible to run facets only on a subset of the index?

-D

Picky / Florian Hanke

Jul 20, 2012, 2:28:06 AM
to picky...@googlegroups.com
Hi David,

Until I have actually built and optimised facets into the system, the following solution might work quite well for you:

Let's assume you have :name and :surname.
Let's also assume you have filtered on name "david".

So you take all the unfiltered facets (see the gist I sent earlier, finder is a Picky::Search):

filtered_facets_with_totals = index.facets(:surname).map do |token, size|
   result = finder.search "name:david surname:#{token}", 0, 0 # name:david represents all the filters that have already been selected
   [token, result.total]
end.select { |_, total| total > 0 }

In words: Take all the unfiltered facets for a certain category. For each unfiltered facet, run a query that includes the previously selected filter(s) to determine whether the facet needs to be shown. It needs to be shown if it has at least one result.
Note that I wrote this off the top of my head.
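The same idea can be tried without Picky. The sketch below is only an illustration: `PEOPLE`, `facets`, `total_for` and `filtered_facets` are hypothetical stand-ins for the index, `index.facets` and `finder.search(...).total`, and the matching is a plain equality check.

```ruby
# Stand-in data so the idea runs without Picky (all names are hypothetical).
PEOPLE = [
  { name: 'david',   surname: 'lowenfels' },
  { name: 'david',   surname: 'smith' },
  { name: 'florian', surname: 'hanke' }
].freeze

# Unfiltered facets: token => number of records carrying that token.
def facets(category)
  PEOPLE.each_with_object(Hash.new(0)) do |person, counts|
    counts[person[category]] += 1
  end
end

# Counts the records that match every "category:token" pair in the query.
def total_for(query)
  conditions = query.split.map { |pair| pair.split(':', 2) }
  PEOPLE.count do |person|
    conditions.all? { |category, token| person[category.to_sym] == token }
  end
end

# One sub-query per unfiltered facet; keep only facets with at least one hit.
def filtered_facets(category, filter)
  facets(category)
    .map { |token, _size| [token, total_for("#{filter} #{category}:#{token}")] }
    .select { |_token, total| total > 0 }
end

p filtered_facets(:surname, 'name:david')
```

The cost is one query per unfiltered facet, so the approach stays cheap only while the category has few distinct tokens.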

This might look very pedestrian, but it won't be slow – Picky breaks off prematurely if there isn't a successful result in sight.
Does that help?

Cheers,
   Florian

David Lowenfels

Jul 20, 2012, 4:54:05 PM
to picky...@googlegroups.com
okay, here's what I have so far… it works great but is very slow… 5 seconds!

ruby-1.9.3-p194 :001 > Product.facet(:brand_name,"climbing jacket")
=> [["mountain", 3], ["hardwear", 3], ["outdoor", 2], ["arc'teryx", 16], ["the", 2], ["north", 2], ["face", 2], ["marmot", 2], ["patagonia", 2], ["research", 2], ["norraana", 1], ["norrona", 1], ["rab", 1]]

this takes a whopping 5 seconds to process!!


class Product < ActiveRecord::Base
  ...
  def self.facet category, query=nil
    facet = @@products_index.facets(category)
    return facet unless query
    facet.map do |token, size|
      [token, query("#{query} #{category}:#{token.inspect}", 1000).total ] # I changed this to 1000 because the zeros that you had nulled everything out
    end.select { |_, total| total > 0 }
  end
end


Also, for the indexing of the :brand_name category I don't want the text to be split, just sucked in verbatim (and case-insensitive, I suppose). Can I configure splits_text_on per category?

Thanks,
David

Picky / Florian Hanke

Jul 21, 2012, 9:14:07 AM
to picky...@googlegroups.com, da...@internautdesign.com
Hi David,

I looked at your data – the problem is that the brand name, for example, contains many unique words and therefore produces a very large number of facets. As far as I know, other search engines filter out rare words. (I'll send you a separate email with my ideas etc.) This filtering is done using the more_than parameter, see below.
Another idea is to clean up the data that is used for facets in a preprocessing step by normalizing data or removing non-standard keywords, for example.

Picky 4.5.8 (I'll release it soon) will contain two new experimental methods:
Index#facets :brand_name
Index#facets :brand_name, more_than: 0 # More than a certain weight. This example would not include brand names only occurring once.
Search#facets :brand_name, filter: 'climbing jacket', more_than: 2 # Contains only relevant (ie. weight > 2) brand names filtered with climbing jacket. Remember that if you add eg. +4 weight, more_than needs to be +4 higher, so 6 in this case.
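To illustrate what a weight threshold does, here is a small sketch that does not use Picky. It assumes that a facet's weight is the natural logarithm of its number of ids, rounded to three decimals; that is an assumption of this example, and `weight` and `facets_more_than` are hypothetical helpers.

```ruby
# Assumption: weight = natural log of the number of ids, rounded to 3 places.
def weight(count)
  Math.log(count).round(3)
end

# Keeps only the facets whose weight is strictly above the threshold.
def facets_more_than(counts, threshold)
  counts.map { |token, count| [token, weight(count)] }
        .select { |_token, w| w > threshold }
        .to_h
end

counts = { 'arcteryx' => 16, 'marmot' => 2, 'rab' => 1 }
p facets_more_than(counts, 0) # drops 'rab': a single occurrence has weight 0
p facets_more_than(counts, 2) # keeps only the frequent brand
```

Under this assumption a token that occurs once has weight 0, which is why a threshold of 0 already removes one-off brand names.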

Also, yes, you can configure splits_text_on on a category:
data = Index.new :products do
  source { Product.order(:brand_name) }
  indexing splits_text_on: /[\s,]/,
           stopwords: /\b(and|the|of|it|in|for)\b/i
  category :product_name, weight: +5
  category :brand_name, indexing: { splits_text_on: /@/ } # <==== See here.
end
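The effect of the two split patterns can be seen with plain Ruby string splitting. This is only a sketch of the tokenizing idea, not Picky's indexer: `tokens` is a hypothetical helper, and the lowercasing is an assumption added for the case-insensitivity asked about.

```ruby
# Hypothetical helper: lowercases the text and splits it on the given pattern.
def tokens(text, splits_text_on)
  text.downcase.split(splits_text_on)
end

brand = 'The North Face'

# Splitting on whitespace and commas yields one token per word.
p tokens(brand, /[\s,]/)

# Splitting on a character that never occurs keeps the brand in one piece.
p tokens(brand, /@/)
```

With the second pattern the whole brand name becomes a single facet instead of three unrelated ones ("the", "north", "face").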

I hope that helps,
   Florian

Picky / Florian Hanke

Jul 21, 2012, 9:46:06 AM
to picky...@googlegroups.com, da...@internautdesign.com
P.S: 4.5.9 is released including the mentioned experimental methods, also see https://github.com/floere/picky/blob/master/history.textile#version-458--459-%E2%80%9Cruby-and-its-many-facets%E2%80%9D.

David Lowenfels

Jul 23, 2012, 7:26:50 PM
to picky...@googlegroups.com
Now I'm getting weird results with the latest version (4.5.10)

> Product.class_eval("@@products_index").facets(:gender)
=> {"women"=>8.899, "men"=>8.76}

where are the decimals coming from??


def Product.facet category, query=nil
  facet = @@products_index.facets(category)
  return facet unless query
  facet.map do |token, size|
    [token, query("#{query} #{category}:#{token.inspect}", 1000).total ]
  end.select { |_, total| total > 0 }
end
> Product.facet(:gender,"climbing jacket")
=> [["women", 5], ["men", 25]]

these facets total up to 30. (The query is much faster now with the new version, even before I reindexed. )
but this says 33:

Product.query("climbing jacket",1000).ids.count
=> 33


and may I suggest that uniq be incorporated into results#ids ?
ruby-1.9.3-p194 :051 > Product.query("climbing jacket",1000).ids.uniq.count
=> 22


-David


Picky / Florian Hanke

Jul 23, 2012, 8:31:51 PM
to picky...@googlegroups.com, da...@internautdesign.com
Hi David,

On Tuesday, 24 July 2012 09:26:50 UTC+10, David Lowenfels wrote:
Now I'm getting weird results with the latest version (4.5.10)

> Product.class_eval("@@products_index").facets(:gender)
 => {"women"=>8.899, "men"=>8.76}

where are the decimals coming from??

Very sorry about not telling you about this change. Facets (as I added them in 4.5.9+) use the indexed weights instead of the ids sizes.
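The decimals are consistent with a logarithmic weight. The sketch below assumes the weight is the natural logarithm of the id count, rounded to three decimals (an assumption, not confirmed in this thread), and uses the counts 7324 and 6372 that are reported for the same :gender category later in the thread.

```ruby
# Counts reported later in this thread for the :gender category.
counts = { 'women' => 7324, 'men' => 6372 }

# Assumption: weight = natural log of the count, rounded to three decimals.
weights = counts.transform_values { |count| Math.log(count).round(3) }
p weights

# Inverting a rounded weight only approximates the original count.
approx = weights.transform_values { |weight| Math.exp(weight).round }
p approx
```

Because of the rounding, `Math.exp` gives back a value close to the original count but not necessarily the exact number, so weights are not a reliable source for the numbers shown to an end user.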
 
def Product.facet category, query=nil
    facet = @@products_index.facets(category)
    return facet unless query
    facet.map do |token, size|
      [token, query("#{query} #{category}:#{token.inspect}", 1000).total ]
    end.select { |_, total| total > 0 }
end
> Product.facet(:gender,"climbing jacket")      => [["women", 5], ["men", 25]]

these facets total up to 30.

Ok, got it. Please note that in the latest Picky version you can use the facet method on Picky::Search:
your_search_instance.facets :gender, filter: 'climbing jacket'
 
(The query is much faster now with the new version, even before I reindexed.)

Yes, I implemented the first optimisation as described in:
 
but this says 33:

Product.query("climbing jacket",1000).ids.count
 => 33

Yes. The difference between queries:
1. 'climbing jacket gender:unisex'
2. 'climbing jacket'
is that Picky searches the last token as if it was searched for partially (it's what's expected in 90% of the cases). So it actually searches:
1. 'climbing jacket gender:unisex*' (I added a non-partial searching " in the version, iirc)
2. 'climbing jacket*'

I assume that it finds 3 more results partially on "jacket" (in query 2).
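A toy model shows how a partial match on the last token can add results. `DOCS` and `matching_ids` are hypothetical and much simpler than Picky's real matching; the sketch only treats the last token as a prefix.

```ruby
# Hypothetical stand-in for an index: id => indexed tokens.
DOCS = {
  1 => %w[climbing jacket],
  2 => %w[climbing jackets],
  3 => %w[climbing jacket unisex]
}.freeze

# Every token must match exactly, except the last one, which may match
# as a prefix when partial_last is true (mimicking "jacket*").
def matching_ids(query, partial_last: true)
  *exact, last = query.split
  DOCS.select do |_id, tokens|
    exact.all? { |token| tokens.include?(token) } &&
      tokens.any? { |token| partial_last ? token.start_with?(last) : token == last }
  end.keys
end

p matching_ids('climbing jacket')                      # includes "jackets"
p matching_ids('climbing jacket', partial_last: false) # exact matches only
```

In this model the document containing "jackets" is found only by the partial search, which is the kind of difference that could explain a few extra results.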

Currently, Picky still searches partially on the "gender:unisex" part – I will change this for the next version. Sorry for being so wobbly on this feature – it's still experimental, and needs to be refined iteratively. Thanks for helping me.
 
and may I suggest that uniq be incorporated into results#ids ?
ruby-1.9.3-p194 :051 > Product.query("climbing jacket",1000).ids.uniq.count
 => 22

Picky::Search#search has the signature:
def search text, ids = 20, offset = 0, options = {}
If you pass in unique: true in the options, it will return unique results. It will be unique top down. That is, if an id has been used for one allocation of categories, eg. [:name, :surname], it will not be used anymore in a following allocation.
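"Unique top down" can be pictured as follows. This is a sketch of the described behaviour, not Picky's implementation; the allocations and the `unique_top_down` helper are made up for illustration.

```ruby
# Hypothetical allocations, best first: category combination => matching ids.
allocations = [
  [[:name, :surname], [1, 2, 3]],
  [[:surname, :name], [2, 4]],
  [[:name],           [1, 5]]
]

# Walks the allocations top down and drops ids that were already used
# by an earlier (higher ranked) allocation.
def unique_top_down(allocations)
  seen = {}
  allocations.map do |categories, ids|
    fresh = ids.reject { |id| seen[id] }
    fresh.each { |id| seen[id] = true }
    [categories, fresh]
  end
end

unique_top_down(allocations).each { |categories, ids| p [categories, ids] }
```

Ids 2 and 1 stay in the first allocation and disappear from the later ones, so each id is counted once, in its best-ranked allocation.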

However, this is not yet passed into the queries when filtering facets (which I think, is not the way to go, as one wants a true result count for each facet).

I hope this helps.

Cheers,
   Florian

David Lowenfels

Jul 23, 2012, 9:44:42 PM
to picky...@googlegroups.com
David L wrote:
> > Product.facet(:gender,"climbing jacket") => [["women", 5], ["men", 25]]
> these facets total up to 30.
>
> but this says 33:
>
> Product.query("climbing jacket",1000).ids.count
> => 33

Ah okay I realize the extra three must not have a gender! (at least I hope that's the case because it is an easy explanation)

On Jul 23, 2012, at 5:31 PM, Picky / Florian Hanke wrote:
> Facets (as I added them in 4.5.9+) use the indexed weights instead of the ids sizes.
> Also see https://github.com/floere/picky/blob/master/history.textile (4.5.9) for details.

> Product.class_eval("@@products_search").facets(:gender)
=> {"women"=>8.899, "men"=>8.76}

how do I convert this from weights to numbers? Weights don't seem to be very useful to me as an end user…
or do I need to use the code I was using previously to do the sub filter? (see bottom of message)

> Product.class_eval("@@products_search").facets(:gender, filter:"climbing jacket")
=> {}
why am I getting an empty hash for this? I was expecting this would give me the same as my previous code.


> Picky::Search#search has the signature:
> def search text, ids = 20, offset = 0, options = {}
> If you pass in unique: true in the options, it will return unique results. It will be unique top down. That is, if an id has been used for one allocation of categories, eg. [:name, :surname], it will not be used anymore in a following allocation.
okay. I never saw this before because I didn't dig into the code.
shouldn't this method be explained on http://florianhanke.com/picky/documentation.html ?
even in the rdoc there is no mention of options[:unique]


this is the code I refer to above:

Picky / Florian Hanke

Jul 23, 2012, 9:56:24 PM
to picky...@googlegroups.com, da...@internautdesign.com
On Tuesday, 24 July 2012 11:44:42 UTC+10, David Lowenfels wrote:
On Jul 23, 2012, at 5:31 PM, Picky / Florian Hanke wrote:
> Facets (as I added them in 4.5.9+) use the indexed weights instead of the ids sizes.
> Also see https://github.com/floere/picky/blob/master/history.textile (4.5.9) for details.

> Product.class_eval("@@products_search").facets(:gender)
 => {"women"=>8.899, "men"=>8.76}

how do I convert this from weights to numbers? Weights don't seem to be very useful to me as an end user…
or do I need to use the code I was using previously to do the sub filter? (see bottom of message)

You are right. (Yes, for now, please use the old code to override the Picky code).

I'm currently unsure how to proceed. Using the size is important for the end user. Weight however is important to sort.

Perhaps it should use the size – it's clearer to the end user. One big problem: Returning the correct size in a filtered query is far more complicated than what we're doing already. We're talking adding new indexes to Picky, and I'm not sure I want to go there. Yet.

I need to think more deeply about this – sorry about that.
 
> Product.class_eval("@@products_search").facets(:gender, filter:"climbing jacket")
 => {}
why am I getting an empty hash for this? I was expecting this would give me the same as my previous code.

I'll have to look into it tomorrow morning – my time is very limited at the moment.

Perhaps, for now, it is best to go back to your code until the Picky code gets more refined?
 
> Picky::Search#search has the signature:
> def search text, ids = 20, offset = 0, options = {}
> If you pass in unique: true in the options, it will return unique results. It will be unique top down. That is, if an id has been used for one allocation of categories, eg. [:name, :surname], it will not be used anymore in a following allocation.
okay. I never saw this before because I didn't dig into the code.
shouldn't this method be explained on http://florianhanke.com/picky/documentation.html ?
even in the rdoc there is no mention of options[:unique]

Good point – I updated the documentation. Thanks!

Sorry I can't be more helpful at the moment. I hope you can understand. If all fails, perhaps you could switch to a more advanced (in years/development) search engine?

Cheers,
   Florian

David Lowenfels

Jul 23, 2012, 10:15:47 PM
to picky...@googlegroups.com
you are being plenty helpful, thanks! Far more than I expected :)
My need is not urgent, so I am not in a big rush to get this feature working.

-D

Picky / Florian Hanke

Jul 23, 2012, 11:40:03 PM
to picky...@googlegroups.com, da...@internautdesign.com
My pleasure. Glad to hear it – it's turned into a surprisingly complex feature, but I think I have thought of a good path towards solving it.

Picky / Florian Hanke

Jul 24, 2012, 4:34:20 AM
to picky...@googlegroups.com, da...@internautdesign.com
Alright. If you would like to try with 4.5.12:

On index:
index.facets :surname # => { 'text' => 12, 'other' => 3 }
index.facets :surname, counts: false # => ['text', 'other']
index.facets :surname, at_least: 5 # => { 'text' => 12 }
index.facets :surname, counts: false, at_least: 5 # => ['text']

On search:
search.facets :surname, filter: 'name:david' # => { 'other' => 1 } # Notice: It's not 3 since it is filtered!
search.facets :surname, filter: 'name:david', counts: false # => ['other']
search.facets :surname, filter: 'name:david', at_least: 5 # => {}
search.facets :surname, filter: 'name:david', counts: false, at_least: 5 # => []

I lost some performance because, unlike the weights, Picky does not have a readily available "counts" index. The impact should not be too drastic, however.
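To make the option semantics concrete, here is a Picky-free sketch that mimics them on a plain hash. `SURNAMES` and this `facets` method are hypothetical; only the `counts:` and `at_least:` option names and the sample values come from the examples above.

```ruby
# Hypothetical in-memory counts standing in for an indexed :surname category.
SURNAMES = { 'text' => 12, 'other' => 3 }.freeze

# Mimics the options shown above: at_least drops rare facets,
# counts: false returns only the tokens.
def facets(all, counts: true, at_least: nil)
  kept = at_least ? all.select { |_token, count| count >= at_least } : all
  counts ? kept : kept.keys
end

p facets(SURNAMES)
p facets(SURNAMES, counts: false)
p facets(SURNAMES, at_least: 5)
p facets(SURNAMES, counts: false, at_least: 5)
```

The filtered variant on a search would apply the same two options, but to counts computed only over the results matching the filter.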

Cheers & as always: Have fun,
   Florian

David Lowenfels

Jul 25, 2012, 3:14:06 AM
to picky...@googlegroups.com
> Product.facets(:gender)
=> {"women"=>7324, "men"=>6372}
> Product.facets(:gender, filter:"climbing jacket")
=> {}
not sure why I'm getting an empty hash here??

> Product.facets(:gender, filter:"keywords:climbing jacket")
=> {"women"=>4, "men"=>17}
it only works when I specify the category to filter, but there are other matches which come on a more global query when not scoped by category.
i.e. this is the result of my previous code which works by subqueries:
> Product.facet(:gender, "climbing jacket")
=> [["women", 9], ["men", 19]]

Thanks,
David

def self.facets category, opts={}
  @@products_search.facets category, opts
end

def self.facet category, query=nil
  facet = @@products_search.facets(category)
  return facet unless query
  facet.map do |token, size|
    [token, query("#{query} #{category}:#{token.inspect}", 1000).total ]
  end.select { |_, total| total > 0 }
end

Florian R. Hanke

Jul 25, 2012, 3:23:33 AM
to picky...@googlegroups.com
Hi David,

Thanks for trying! Which version were you using?

On Wednesday, 25. July 2012 at 17:14, David Lowenfels wrote:
Product.facets(:gender)
=> {"women"=>7324, "men"=>6372}
Product.facets(:gender, filter:"climbing jacket")
=> {}
not sure why I'm getting an empty hash here??
Not sure either. Maybe because I've rewritten the code to not search partially on the last token.

That is:
Instead of searching (implicitly) for "climbing jacket*" I now search for "climbing jacket". The assumption here is that facets are usually used on an enumeration of "subcategories".

But atm I can only guess. Hope to look into it soon.
Product.facets(:gender, filter:"keywords:climbing jacket")
=> {"women"=>4, "men"=>17}
it only works when I specify the category to filter, but there are other matches which come on a more global query when not scoped by category.
That is curious.

So Product.facets(:gender, filter: 'keywords:climbing jacket') returns a result, but Product.facets(:gender, filter: 'climbing jacket') does not? (Is Product.facets your code or Picky's?)
i.e. this is the result of my previous code which works by subqueries:
Product.facet(:gender, "climbing jacket")
=> [["women", 9], ["men", 19]]
Thanks!

Cheers,
   Florian 

Florian R. Hanke

Jul 25, 2012, 3:26:46 AM
to picky...@googlegroups.com
One idea would be to verify that these 28 results are actually what you need in the results – if it doesn't use up too much time of yours.

Florian R. Hanke

Jul 25, 2012, 3:43:40 AM
to picky...@googlegroups.com
Hi David,

I just tried to reproduce your results.

Running

p products.facets :gender
p products.facets :gender, filter: 'keywords:climbing jacket'
p products.facets :gender, filter: 'climbing jacket'

with your data yields me

{"women"=>7324, "men"=>6372}
{"women"=>10, "men"=>47}
{"women"=>10, "men"=>47}

I'm using the latest Picky::Search#facets code.

Does that help/look ok somehow?

Cheers,
   Florian

David Lowenfels

Jul 25, 2012, 11:53:04 AM
to picky...@googlegroups.com
On Jul 25, 2012, at 12:23 AM, Florian R. Hanke wrote:
> Thanks for trying! Which version were you using?

4.5.12

David Lowenfels

Jul 25, 2012, 12:00:45 PM
to picky...@googlegroups.com
Here are my results with 4.5.12, which look different than yours :-/

products = Product.class_eval("@@products_index")
p products.facets :gender
p products.facets :gender, filter: 'keywords:climbing jacket'
p products.facets :gender, filter: 'climbing jacket'
{"women"=>7324, "men"=>6372}
{"women"=>7324, "men"=>6372}
{"women"=>7324, "men"=>6372}

products = Product.class_eval("@@products_search")
p products.facets :gender
p products.facets :gender, filter: 'keywords:climbing jacket'
p products.facets :gender, filter: 'climbing jacket'

{"women"=>7324, "men"=>6372}
{"women"=>4, "men"=>17}
{}


and my configuration:

@@products_index = Picky::Index.new :products do
  source { Product.all }
  category :name, weight: 5
  category :brand, indexing: { splits_text_on: /$/ }
  category :keywords, weight: 4
  category :short_description, weight: 3
  category :colors, weight: 2
  category :sizes
  category :gender
end

@@products_search = Picky::Search.new @@products_index do
  ignore :colors
  ignore_unassigned_tokens
  boost [:brand_name, :product_name] => 2
  terminate_early
  max_allocations 4
  # searching max_words: 5
end

David Lowenfels

Jul 25, 2012, 12:11:24 PM
to picky...@googlegroups.com
Yes, the 28 results appear to be good. FYI I am using the search object, not the index.

> Product.facet(:gender,"climbing jacket")
=> [["women", 9], ["men", 19]]

> Product.search("climbing jacket gender:women").map(&:name)
=> ["Arc'teryx Solano Jacket - Women's", "The North Face Sanction Fleece Jacket - Women's", "Norrna Falketind Gore-Tex Pro Shell Jacket - Women's", "Arc'teryx Alpha SL Jacket - Women's", "Patagonia Simple Guide Softshell Jacket - Women's", "Arc'teryx Delta LT Fleece Jacket - Women's", "Outdoor Research Ferrosi Hooded Softshell Jacket - Women's", "Arc'teryx Fission SV Jacket - Women's", "The North Face Verto Jacket - Women's"]

> Product.search("climbing jacket gender:women").map(&:name).size
=> 9

> Product.search("climbing jacket gender:men").map(&:name)
=> ["Arc'teryx Theta AR Jacket - Men's", "Montane eVent Super-Fly XT Jacket - Men's", "Arc'teryx Alpha LT Jacket - Men's", "Arc'teryx Beta AR Jacket - Men's", "Mountain Hardwear Compressor Insulated Hooded Jacket - Men's", "Outdoor Research Transcendent Hooded Down Jacket - Men's", "The North Face Sentinel Windstopper Softshell Jacket - Men's", "Norrna Trollveggen Dri3 Jacket - Men's", "Arc'teryx Theta SV Jacket - Men's", "Arc'teryx Gamma AR Softshell Jacket - Men's", "Marmot Vars Hooded Fleece Jacket - Men's", "Marmot Greenland Baffled Down Jacket - Men's", "Mountain Hardwear G50 Softshell Jacket - Men's", "Arc'teryx Epsilon AR Jacket - Men's", "Arc'teryx Alpha SV Jacket - Men's", "Arc'teryx Delta LT Fleece Jacket - Men's", "Marmot Zeus Down Jacket - Men's", "Outdoor Research Ferrosi Hooded Jacket - Men's", "Rab Latok Jacket - Men's"]

> Product.search("climbing jacket gender:men").map(&:name).size
=> 19

def Product.search query, limit=1000
  obj = find @@products_search.search(query, limit, 0, unique: true).ids
end


BTW I wrote all the virtual attributes to hard database columns for faster indexing… if you want an updated SQL dataset I can email you off-list.

Picky / Florian Hanke

Jul 25, 2012, 11:24:31 PM
to picky...@googlegroups.com, da...@internautdesign.com
Hi David,

We didn't use the exact same configuration – or data, which explains our differences. So yes, please, send me an updated set.
Or even better – open an (open/closed) source project online?

> Product.facets(:gender, filter:"climbing jacket")

Do you know why you are getting an empty hash here? One thing I noticed is that you do not remove any special characters (might have changed by now) or split only on /\s/, so that some terms are indexed strangely.

For example:
If keywords are indexed using the default Picky indexer (splitting only on /\s/), it results in eg. this being indexed:
12694,wooljacketwindproofjacketwindproofjacketwinterjacketcasualjacket
which would explain why "jacket" is not found.

I've made it easy in Picky to look at how the data is loaded into the indexes. Open the index/development/products folder and look at the *.txt files, for example.
This will give you the indexed terms, in the order they will be shown in the results.

I hope that helps :)

Cheers,
   Florian
