How to index association fields properly


Tiago Cardoso

Feb 26, 2013, 9:59:52 AM
to picky...@googlegroups.com
Hi, I've been researching full-text search solutions for a Rails project of mine, and since I'm handling mostly categorized data, I decided to look into your gem. I just couldn't find in the documentation how to add a field of an association to the index. Let's say I handle cars: cars are branded and have many wheels, which are also branded:

class Car < ActiveRecord::Base
  # brand varchar field
  has_many :wheels
end
class Wheel < ActiveRecord::Base
  # brand varchar field
  belongs_to :car
end

So this is (more or less) what I'm trying to achieve:

Picky::Index.new :cars do
  source { Car.all }
  category :brand
  association :wheels do # hypothetical: this option does not exist
    category :brand
  end
end

But this association option does not exist. Is there any option available that achieves this behaviour?

I'm using version 4.13.0 (I believe it is the most recent).

Picky / Florian Hanke

Feb 26, 2013, 5:20:18 PM
to picky...@googlegroups.com
Hi Tiago,

Sadly I don't have too much time right now – but let me try :)

Picky is very agnostic when it comes to ORMs and just uses methods to get the data.

Two examples:
category :brand # Calls method #brand and uses the return value as data
category :brand, :from => :all_brands # Calls method #all_brands and uses the return value as data.
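
To make this method-calling behaviour concrete, here is a plain-Ruby sketch of how a category could pull its data from an object. It does not use Picky: the helper `category_data` and its options hash are hypothetical stand-ins for the two `category` forms above, and a Struct replaces the ActiveRecord model so the snippet runs on its own.

```ruby
# Stand-in for the Car model (a Struct, so this runs without Rails).
Car = Struct.new(:id, :brand, :wheel_brand) do
  # Illustrative extra method that a :from option could point at.
  def all_brands
    [brand, wheel_brand].join(' ')
  end
end

# Hypothetical helper mimicking the two forms described above:
#   category :brand                       -> calls #brand
#   category :brand, :from => :all_brands -> calls #all_brands
def category_data(object, name, options = {})
  # Call the :from method if one is given, otherwise the category name itself.
  object.public_send(options[:from] || name)
end

car = Car.new(1, 'Audi', 'Michelin')
puts category_data(car, :brand)                        # prints Audi
puts category_data(car, :brand, :from => :all_brands)  # prints Audi Michelin
```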

In your case it's probably a good idea to create a new method – off the top of my head:

# In class Car:
def brands
  [brand, *wheels.map(&:brand)].join(' ')
end

(Or similar – I'm sure there are more elegant ways to generate a string with all brands. The string is necessary as Picky does not yet accept arrays as data, but there's an issue on GitHub dedicated to that.)
Then, in the index definition:

category :brand, :from => :brands

This all assumes you want to find a certain car when "brand:wheel_brand" is entered into Picky, where "wheel_brand" is the brand of one of its wheels.
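
A self-contained version of the #brands method above, runnable without Rails. The Wheel and Car Structs are assumptions standing in for the ActiveRecord models from the question; in the real app the method would live in class Car and be wired up with `category :brand, :from => :brands`.

```ruby
# Plain Struct stand-ins for the ActiveRecord models (assumption: no Rails here).
Wheel = Struct.new(:brand)

Car = Struct.new(:id, :brand, :wheels) do
  # The car's own brand followed by all wheel brands, as one
  # space-separated string (Picky indexes strings, not arrays).
  def brands
    [brand, *wheels.map(&:brand)].join(' ')
  end
end

car = Car.new(1, 'Audi', [Wheel.new('Michelin'), Wheel.new('Pirelli')])
puts car.brands  # prints Audi Michelin Pirelli
```

A car without wheels simply yields its own brand, so no special case is needed.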

I hope this helps even though it's short – all the best,
   Florian

P.S.: I might be convinced to make the :from option more complex, such as category :brand, :from => [:wheels, :brand], or :from => ->(car) { car.wheels.map(&:brand).join(' ') } :)
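
A sketch of how the richer :from values floated in the P.S. could be resolved, written as a standalone Ruby method. `resolve_from` is a hypothetical name, and neither the Array nor the Proc form existed in Picky 4.13 as described in this thread; Structs again stand in for the models.

```ruby
# Hypothetical resolver for a richer :from option. Only the Symbol form
# corresponds to what Picky 4.13 does; the Array and Proc forms are the
# ideas from the P.S., not an existing Picky API.
def resolve_from(object, from)
  case from
  when Symbol
    object.public_send(from)
  when Array
    # e.g. [:wheels, :brand]: call #wheels, then #brand on each element.
    association, attribute = from
    object.public_send(association).map { |item| item.public_send(attribute) }.join(' ')
  when Proc
    from.call(object)
  else
    raise ArgumentError, "unsupported :from value: #{from.inspect}"
  end
end

Wheel = Struct.new(:brand)
Car   = Struct.new(:brand, :wheels)

car = Car.new('Audi', [Wheel.new('Michelin'), Wheel.new('Pirelli')])
p resolve_from(car, :brand)                                     # "Audi"
p resolve_from(car, [:wheels, :brand])                          # "Michelin Pirelli"
p resolve_from(car, ->(c) { c.wheels.map(&:brand).join(' ') })  # "Michelin Pirelli"
```

The Proc form is the most flexible, since it covers the Array form and anything else; the Array form only reads more declaratively for the common one-level association case.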

Tiago Cardoso

Feb 27, 2013, 4:36:01 AM
to picky...@googlegroups.com
Thanks for the explanation.

So basically, for such cases I need a category that lists all brands separated by whitespace. What if the brands themselves contain whitespace (like 'American Express')? Will that influence the results in some way?

A :from option that accepts a proc would indeed be very helpful. Being able to nest associations would also be cool, but that might mean a lot more work on your side, plus supporting a new DSL. I'd be fine with the proc option, though.

Cheers



--
You received this message because you are subscribed to the Google Groups "Picky-Ruby" group.
To unsubscribe from this group and stop receiving emails from it, send an email to picky-ruby+...@googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.
 
 

Tiago Cardoso

Feb 27, 2013, 5:49:20 AM
to picky...@googlegroups.com
Hi Florian

I managed to set it up following your leads. I just had an issue with a certain character. Let's say I index a title field whose value for one instance is "title1-". If I search for "title", it is found. If I search for "title1", it's still there. Once I search for "title1-", it is gone. How exactly does Picky handle special characters?

Picky / Florian Hanke

Feb 28, 2013, 4:21:12 AM
to picky...@googlegroups.com
Hi Tiago


On Tuesday, 26 February 2013 23:36:01 UTC-10, Tiago Cardoso wrote:
> So basically for such cases I need to have a category that lists all brands separated by white space. What if the brands are themselves whitespaced (like 'American Express')? Will that influence the results in some way?
 
Good question. Normally not, but it depends. If you don't want to split the text data on /\s/ (whitespace, the default), but instead want whitespace to be indexed and, for example, the incoming text split into tokens on commas, then you would have to define a different indexing:

Index.new ... do
  indexing splits_text_on: /\,/
end

This would tokenize the text like this:
"the input text, another token" -> ['the input text', ' another token']
instead of the default:
"the input text, another token" -> ['the', 'input', 'text', 'another', 'token']

Also see http://pickyrb.com/documentation.html#tokenizing-options for more info on tokenizer/indexing options.
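
Applied to the 'American Express' question, the splitting step alone can be approximated with plain String#split. This is only an illustration: Picky's tokenizer may also normalise or remove characters, which is not modelled here, and the idea of joining the combined brands with commas instead of spaces is an assumption that only makes sense together with `splits_text_on: /,/`.

```ruby
brands = ['American Express', 'Ford']

# Joined with spaces and split on whitespace (like the default /\s/):
# the two-word brand falls apart into two tokens.
space_tokens = brands.join(' ').split(/\s/)

# Joined with commas and split on commas (like splits_text_on: /,/):
# each brand stays a single token.
comma_tokens = brands.join(',').split(/,/)

p space_tokens  # ["American", "Express", "Ford"]
p comma_tokens  # ["American Express", "Ford"]
```

Whatever separator the index splits on, the method that builds the combined string should join with that same separator.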

> The :from option answering to a proc would be indeed very helpful. Having the possibility of nesting from associations would also be cool, but maybe that represents a lot more work from your side and also supporting a new DSL. I'd be alright with the proc option, though.

I think I prefer the less-funky-code-is-more approach. With category :bla, Picky simply calls method #bla on the object. When you define a source on the index, Picky simply calls #each on the source to get each object to index.

Example:
Index.new :bla do
  source { Car.all }
  category :name
end

This simply calls #each on Car.all to get each car, then calls #name on each car to get the name category data.
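
The two steps just described (#each on the source, then one method call per category on each object) can be sketched as a tiny in-memory inverted index. This is a toy for illustration only: `build_index` and its nested-hash layout are assumptions and say nothing about how Picky actually stores its indexes.

```ruby
Car = Struct.new(:id, :name)

# Toy indexer: iterate the source with #each, call each category method
# on every object, split the returned string on whitespace, and record
# which ids contain each token.
def build_index(source, categories)
  index = Hash.new { |hash, category| hash[category] = Hash.new { |h, token| h[token] = [] } }
  source.each do |object|
    categories.each do |category|
      object.public_send(category).to_s.split(/\s/).each do |token|
        index[category][token] << object.id
      end
    end
  end
  index
end

cars  = [Car.new(1, 'Golf GTI'), Car.new(2, 'Polo GTI')]
index = build_index(cars, [:name])
p index[:name]['GTI']   # [1, 2]
p index[:name]['Polo']  # [2]
```

Because the indexer only relies on #each and on the category methods, any object with those methods works as a source, which is what makes wrapper classes a viable alternative to new DSL options.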

As an idea, feel free to make a SearchableCar class that wraps Car and its wheels and offers a #brand method that returns the combined brands.
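
A minimal sketch of such a wrapper, assuming Structs in place of the Car and Wheel models. The class only exposes the methods the index needs (#id and #brand), so the Car model itself stays free of search concerns.

```ruby
Wheel = Struct.new(:brand)
Car   = Struct.new(:id, :brand, :wheels)

# Hypothetical wrapper exposing exactly the methods the index categories call.
class SearchableCar
  def initialize(car)
    @car = car
  end

  def id
    @car.id
  end

  # Car brand plus all wheel brands, as one space-separated string.
  def brand
    [@car.brand, *@car.wheels.map(&:brand)].join(' ')
  end
end

cars       = [Car.new(1, 'Audi', [Wheel.new('Michelin')])]
searchable = cars.map { |car| SearchableCar.new(car) }
p searchable.first.brand  # "Audi Michelin"
```

In the Rails app, the index source could then be something like `source { Car.all.map { |car| SearchableCar.new(car) } }` (untested assumption), since the resulting Array responds to #each and each element responds to the category methods.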

But yeah, I'll definitely be thinking about the proc idea – or feel free to create a new issue on GitHub! :)
   Florian

Picky / Florian Hanke

Feb 28, 2013, 4:23:07 AM
to picky...@googlegroups.com
P.S.: Depending on how you have set up the Picky project, there might be a few Rake tasks that help show how data is tokenized. Try, for example: rake try[hello]

Picky / Florian Hanke

Feb 28, 2013, 4:38:04 AM
to picky...@googlegroups.com
With the exception of a few reserved characters (e.g. '~'), Picky does not treat special characters any differently from normal characters. But let's find out what is happening. Can you show us your index definition, if possible?

Tiago Cardoso

Feb 28, 2013, 4:54:54 AM
to picky...@googlegroups.com
Gladly. My index is as simple as it gets:

Picky::Index.new :cars do
  category :brand
  category :wheel_brands, :from => :get_wheel_brands
end

One of the car brands is called 'title-' (it is defined as such in one of my specs). All the other brands (called /title\d+/) are found, except for that one.

I haven't made any assumptions yet concerning tokenizing; as I said, I'm still only experimenting with it and seeing the possibilities. Perhaps I'd need to tune that.



Picky / Florian Hanke

Feb 28, 2013, 4:23:39 PM
to picky...@googlegroups.com
Hi Tiago,

Thanks for the information. You might just have chanced upon a bug!

Run the code between the --- markers below to see that it doesn't work with a dash at the end, but does with a cedilla.

Sorry about that! (There are 2.2x as many lines of test code as library code, but apparently that still isn't quite enough.) I'll look into it tonight. Cheers & thanks,
   Florian

---
# encoding: utf-8
require 'picky'

thing = Struct.new :id, :title

# Case 1: default tokenizing, title with a trailing dash.
data = Picky::Index.new :test do
  category :title
end

data.add thing.new(1, 'title1-')

things = Picky::Search.new data

p things.search 'title'
p things.search 'title1'
p things.search 'title1-'  # the reported bug: no result here

# Case 2: explicit tokenizing options, title with a trailing cedilla.
data = Picky::Index.new :test do
  indexing splits_text_on: /\s/,
           removes_characters: /'[^\w\-]'/
  category :title
end

data.add thing.new(1, 'title1ç')

things = Picky::Search.new data

p things.search 'title'
p things.search 'title1'
p things.search 'title1ç'  # found
---

Tiago Cardoso

Feb 28, 2013, 5:25:34 PM
to picky...@googlegroups.com
Hey Florian,

No problemo, it's all at a very early stage for now, and I wanted to experiment with a pure-Ruby full-text search solution.

Just another aside: I'm also experimenting with Sidekiq for queue processing. When I run Sidekiq with Picky, I get 'stack level too deep' exceptions when I kick off workers: Sidekiq marshals a Hash of attributes that gets sent to Redis, which (if I'm not mistaken) relies on Hash#to_json, which Picky overrides. It works if I remove Picky from the equation. I didn't test it with Resque or delayed_job, though.

Cheers and have a nice weekend,
Tiago



Florian Hanke

Feb 28, 2013, 5:38:22 PM
to picky...@googlegroups.com
Just quickly - can you provide a backtrace?

Tiago Cardoso

Mar 1, 2013, 3:06:05 AM
to picky...@googlegroups.com
/Users/tiagocardoso/Projects/project1/vendor/bundler_gems/ruby/1.9.1/gems/multi_json-1.6.1/lib/multi_json/adapters/json_common.rb:22

The stack trace is not very helpful in this case, but there's an easy way to reproduce it, provided you have a Rails application with both gems in the Gemfile:

{a: 1}.to_json

With only Sidekiq, this returns "{\"a\":1}". With both gems, it raises 'stack level too deep'. The only thing I noticed is that Picky redefines the to_json method.

This one is quite hard to track down: if I require both gems in the console, to_json works. It seems to be something about how the Rails application loads...
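
To narrow this down to a single failing script, one could start from a sketch like the following (an assumption about how to isolate the problem, not a confirmed reproduction). It calls Hash#to_json and reports a SystemStackError instead of crashing. With only the json standard library loaded it prints {"a":1}; the commented-out requires mark where the suspected gems would go. Since requiring them in a plain console did not trigger the error, the Rails boot sequence may need to be part of the script as well.

```ruby
require 'json'
# Add the suspected gems here to test whether loading them triggers the error
# (commented out so the script runs with only the standard library):
# require 'picky'
# require 'sidekiq'

# Returns the JSON string, or a marker if Hash#to_json recurses without end.
def check_to_json(hash)
  hash.to_json
rescue SystemStackError => e
  "SystemStackError: #{e.message}"
end

puts check_to_json({a: 1})  # with only 'json' loaded: {"a":1}
```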



Picky / Florian Hanke

Mar 1, 2013, 4:36:08 AM
to picky...@googlegroups.com
Hi Tiago,

Thanks a lot! I'll try to remove the to_json method.
I've also reopened this issue: https://github.com/floere/picky/issues/97
Sadly, I currently don't have the time/nerves to find out what is happening. Simply requiring Rails does not trigger the issue. What I'd love to see is a minimal failing case (as in: a single script that fails).

Let's try to get this fixed. Cheers and thanks for all your help so far!
   Florian

Tiago Cardoso

Mar 1, 2013, 5:13:26 AM
to picky...@googlegroups.com
OK, I'll try to look into this further over the weekend; it might be that Sidekiq is not the dependency causing it. I'll let you know.
Cheers



Florian Hanke

Mar 1, 2013, 5:15:04 AM
to picky...@googlegroups.com
Great, thanks Tiago!

Picky / Florian Hanke

Mar 1, 2013, 7:19:38 AM
to picky...@googlegroups.com
I have removed the to_json method on Hash in Picky 4.13.1 – can you please try again? Cheers



Tiago Cardoso

Mar 1, 2013, 9:41:16 AM
to picky...@googlegroups.com
Hi,

yup, it does seem to work, thx!

Cheers



Picky / Florian Hanke

Mar 1, 2013, 10:14:26 PM
to picky...@googlegroups.com
Awesome! Let me know if you decide to use it :)

Cheers,
   Florian