Re: [picky:90] facets?

Picky / Florian Hanke Jul 21, 2012 6:14 AM
Posted in group: Picky-Ruby
Hi David,

I looked at your data. The problem is that the brand name, for example, contains many unique words, which results in a very large number of facets. As far as I know, other search engines filter out rare words. (I'll send you a separate email with my ideas etc.) In Picky, this filtering is done using the more_than parameter, see below.
Another idea is to clean up the data that is used for facets in a preprocessing step by normalizing data or removing non-standard keywords, for example.
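A minimal sketch of such a preprocessing step, in plain Ruby and independent of Picky. The helper names (normalize_brand, frequent_brands) and the alias table are made up for illustration; real data would need its own rules.

```ruby
# Hypothetical alias table: maps known variants to one canonical spelling.
BRAND_ALIASES = {
  "norraana" => "norrona",
  "arcteryx" => "arc'teryx"
}.freeze

# Normalizes a raw brand name: trims it, downcases it, collapses runs of
# whitespace, then applies the alias table.
def normalize_brand(raw)
  cleaned = raw.to_s.strip.downcase.gsub(/\s+/, " ")
  BRAND_ALIASES.fetch(cleaned, cleaned)
end

# Counts normalized brand names and keeps only those that occur
# more than min_count times.
def frequent_brands(raw_names, min_count: 1)
  counts = raw_names.each_with_object(Hash.new(0)) do |name, acc|
    acc[normalize_brand(name)] += 1
  end
  counts.select { |_, count| count > min_count }
end
```

The normalized names could then be what the index source hands to the :brand_name category, so that variants collapse into a single facet.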

Picky 4.5.8 (I'll release it soon) will contain two new experimental methods:
Index#facets :brand_name
Index#facets :brand_name, more_than: 0 # More than a certain weight. This example would not include brand names only occurring once.
Search#facets :brand_name, filter: 'climbing jacket', more_than: 2 # Contains only relevant (i.e. weight > 2) brand names, filtered with climbing jacket. Remember that if you add e.g. +4 weight to the category, more_than needs to be 4 higher, so 6 in this case.
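To make the more_than arithmetic concrete, here is a rough sketch in plain Ruby. It does not call Picky. It assumes that a token's weight is the natural logarithm of the number of ids it occurs in (so a token occurring once has weight 0) and that a category boost is simply added on top. The helper names and the counts are made up for illustration.

```ruby
# Assumption: weight = natural log of the occurrence count,
# plus any per-category boost (e.g. weight: +4).
def weight_for(count, boost = 0)
  Math.log(count) + boost
end

# Keeps only the tokens whose weight is strictly greater than the threshold.
def relevant_facets(counts, more_than, boost = 0)
  counts.each_with_object({}) do |(token, count), result|
    weight = weight_for(count, boost)
    result[token] = weight if weight > more_than
  end
end

# Illustrative occurrence counts, not real index data.
counts = { "arc'teryx" => 16, "marmot" => 2, "rab" => 1 }

once_removed = relevant_facets(counts, 0).keys     # "rab" is dropped: log(1) = 0
relevant     = relevant_facets(counts, 2).keys     # only "arc'teryx": log(16) is about 2.77
boosted      = relevant_facets(counts, 6, 4).keys  # same tokens with a +4 boost and threshold 6
```

Under these assumptions, raising the threshold by the same amount as the boost selects exactly the same tokens as before.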

Also, yes, you can configure splits_text_on on a category:
data = Picky::Index.new :products do
  source { Product.order(:brand_name) }
  indexing splits_text_on: /[\s,]/,
           stopwords: /\b(and|the|of|it|in|for)\b/i
  category :product_name, weight: +5
  category :brand_name, indexing: { splits_text_on: /@/ } # <==== See here.
end
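
To see why the category-level splits_text_on matters for facets, here is a small sketch using plain String#split rather than Picky's tokenizer (which does more than splitting, e.g. stopword removal). The sample brand name is only illustrative.

```ruby
brand = "The North Face"

# Index-wide pattern: splits on whitespace and commas, so every word
# becomes its own token (and would show up as its own facet).
split_words = brand.downcase.split(/[\s,]/)  # => ["the", "north", "face"]

# Category-level pattern: "@" does not occur in the name,
# so the whole name stays a single token.
whole_name = brand.downcase.split(/@/)       # => ["the north face"]
```

Any separator that never appears in the brand names would have the same effect as /@/ here.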

I hope that helps,

On Saturday, 21 July 2012 06:54:05 UTC+10, David Lowenfels wrote:
okay, here's what I have so far… it works great but is very slow… 5 seconds!

ruby-1.9.3-p194 :001 > Product.facet(:brand_name,"climbing jacket")
 => [["mountain", 3], ["hardwear", 3], ["outdoor", 2], ["arc'teryx", 16], ["the", 2], ["north", 2], ["face", 2], ["marmot", 2], ["patagonia", 2], ["research", 2], ["norraana", 1], ["norrona", 1], ["rab", 1]]

this takes a whopping 5 seconds to process!!

class Product < ActiveRecord::Base
  def self.facet category, query=nil
    facet = @@products_index.facets(category)
    return facet unless query
    facet.map do |token, size|
      # I changed this to 1000 because the zeros that you had nulled everything out
      [token, query("#{query} #{category}:#{token.inspect}", 1000).total]
    end.select { |_, total| total > 0 }
  end
end

also, for the indexing of the :brand_name category I don't want the text to be split, just sucked in verbatim (and case insensitive, I suppose). Can I configure splits_text_on per-category??