Indexing and searching against related objects

Joel

May 24, 2011, 11:13:49 AM

to picky...@googlegroups.com

Hi everyone -

This morning I took my first pass at integrating picky with an app and ran into one difficult spot. After reading through the docs I see that having a source that responds to .each can be indexed without a whole lot of pain. What about if you have a couple of objects tied together through relationships - has and belongs to type relationships? Has there been a case where you had to index multiple tables or objects and return the parent object as a search result? Right now it looks like it's searching against a somewhat flat data-set - like a single table and its properties, or the sample CSV file.

Example use-case for my purposes - Blog posts and their comments, or messageboard topics and the posts.

The app I'm working with is also backed by mongo (with mongoid) so I'm guessing I might be adding to my own complexity. I imagine, however, if the picky API supported specifying the child objects and their properties then it would be somewhat ORM or database agnostic.

Anyway, Florian thanks again for the great work on Picky!

- Joel

Picky

May 25, 2011, 1:17:43 AM

to picky...@googlegroups.com

Hi Joel and everyone,

First off, thank you for the praise!

Great question – there are a few ways to answer this. I have used two of them myself. See below.

There might be other ideas others could contribute.

> What about if you have a couple of objects tied together through relationships - has and belongs to type relationships? Has there been a case where you had to index multiple tables or objects and return the parent object as a search result?

The first application, the one Picky originated from, actually had exactly this case.

For me, there are three ways of resolving this:

- Aggregating the data in the query (the backend).

- Aggregating the data in the model.

- Aggregating the data anywhere, then exporting into a CSV.

Aggregating the data in the query:

The first way is the one I used in the first app, whose data came from a MySQL DB.

Apart from just using each to iterate over the objects to index, Picky also provides explicit Sources. You've seen one of them, Sources::CSV, in the example app; another is Sources::DB.

In its simplest form, it gathers data from the DB with a plain SELECT, like this:

Sources::DB.new('SELECT id, title, author, year FROM books', file: 'app/db.yml')

You'd pass this to the source method in the index definition:

books_index = Index::Memory.new(:books) do
  source Sources::DB.new('SELECT id, title, author, year FROM books', file: 'app/db.yml')
  category :title
  # ...
end

But you can use SELECT statements as complex as you like (here we JOIN two tables and get the data ordered as we want it to be):

"SELECT a.*, p.phone1 AS home_phone, p.phone2 AS mobile_phone FROM addresses a LEFT OUTER JOIN phone_numbers p ON p.address_id = a.id WHERE a.company = 1 AND a.deleted_at IS NULL ORDER BY LTRIM(CONCAT_WS(' ', a.last_name, a.first_name))"

On this you could define the categories "id", "first_name", "last_name" (and potentially more from a.*), plus "home_phone" and "mobile_phone".

So you put together a combined address-phone record on the fly in the DB, and that record is what gets indexed.

I don't know if this is doable with Mongo. http://github.com/ClintKrollwood actually started on a Sources::Mongo, but he is now – afaik – using the #each method.
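If the data store has no JOIN, the same combining step could probably be done in Ruby before indexing. Here is a rough sketch of what the SELECT above does, using plain hashes. The constants ADDRESSES and PHONE_NUMBERS and the method combined_address_records are made up for illustration and are not part of Picky. It also assumes at most one phone row per address.

```ruby
# Hypothetical rows standing in for the addresses and phone_numbers tables.
ADDRESSES = [
  { id: 1, first_name: "James", last_name: "Bond", company: 1, deleted_at: nil },
  { id: 2, first_name: "Auric", last_name: "Goldfinger", company: 1, deleted_at: nil },
  { id: 3, first_name: "Old", last_name: "Entry", company: 1, deleted_at: "2011-01-01" }
].freeze

PHONE_NUMBERS = [
  { address_id: 1, phone1: "555-0100", phone2: "555-0199" }
].freeze

# Builds one flat record per non-deleted address of the given company.
# Phone fields stay nil when no phone row exists (LEFT OUTER JOIN behaviour).
# Assumption: at most one phone row per address.
def combined_address_records(addresses, phone_numbers, company: 1)
  phones_by_address = phone_numbers.to_h { |phone| [phone[:address_id], phone] }

  live = addresses.select { |a| a[:company] == company && a[:deleted_at].nil? }

  records = live.map do |a|
    phone = phones_by_address.fetch(a[:id], {})
    a.merge(home_phone: phone[:phone1], mobile_phone: phone[:phone2])
  end

  # Same ordering idea as LTRIM(CONCAT_WS(' ', last_name, first_name)).
  records.sort_by { |r| [r[:last_name], r[:first_name]].compact.join(" ").lstrip }
end
```

Each resulting hash then carries the address fields plus home_phone and mobile_phone, which is the shape the categories above expect.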

And with that we come to the second of my options.

Aggregating the data in the model:

First off, I have to admit that my experience with MongoDB and especially its Ruby adapter is very limited.

It seems that Mongoid also has a notion of proxies (called criteria) and kickers (called finders).

So you'd find e.g. a famous spy using

Person.where(last_name: "Bond").and(first_name: /^J(\.|ames)?$/).asc(:last_name, :first_name).all

Of course, we all know that it's even easier finding a famous spy by employing a beautiful female spy as a honeypot, after buying a remote volcanic island and a few sharks, but yeah…

The thing here is that Mongoid is geared toward the idea of a single object, such as Person or Address, and not so much toward compound objects.

Just from the (snazzy) Mongoid documentation I'd do it this way. If I wanted to index a Post object together with its Comment-s, I'd create another class, IndexedPost, which describes the parent object together with the associated objects:

class IndexedPost
  include Mongoid::Document

  store_in :posts # To denote where the data for the parent object comes from.

  embeds_many :comments # Relationship to the comments (see the P.S. at the end of this thread).

  # Accessor for the text of the comments.
  #
  def comment_texts
    comments.map(&:text).join(' ')
  end

  # Accessor for the authors of the comments.
  #
  def comment_authors
    comments.map(&:author).join(' ')
  end
end

Then, in Picky, use it e.g. as follows:

posts_index = Index::Memory.new :posts do
  key_format :to_s # To tell Picky to use String keys.
  source IndexedPost.asc(:title).all # Might work without .all, not sure.

  category :title,
           qualifiers: [:t, :title],
           partial: Partial::Substring.new(:from => 1),
           similarity: Similarity::DoubleMetaphone.new(2)
  category :author, partial: Partial::Substring.new(:from => -2)
  # ...
  category :comment_texts, partial: Partial::None.new
  category :comment_authors, partial: Partial::Substring.new(:from => -2)
end

This gets all indexed posts in ascending order by title and indexes title, author, comment_texts, and comment_authors.

But this is all from reading the documentation.
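The same idea can be sketched without any ORM, which might help if the parent and child objects do not live in one Mongoid document. As Joel mentioned, a source only has to respond to each, so a small wrapper can yield one flat record per post. Everything named here (Comment, Post, FlatPost, FlatPostSource) is a hypothetical stand-in, and I have not run this against Picky itself:

```ruby
# Hypothetical plain-Ruby stand-ins for the app's models.
Comment = Struct.new(:author, :text)
Post    = Struct.new(:id, :title, :author, :comments)

# One flat record per post; the readers match the category names used above.
FlatPost = Struct.new(:id, :title, :author, :comment_texts, :comment_authors)

# Wraps any collection of posts and yields flattened records from #each.
class FlatPostSource
  include Enumerable

  def initialize(posts)
    @posts = posts
  end

  def each
    return enum_for(:each) unless block_given?

    @posts.each do |post|
      yield FlatPost.new(
        post.id,
        post.title,
        post.author,
        post.comments.map(&:text).join(' '),
        post.comments.map(&:author).join(' ')
      )
    end
  end
end
```

An instance, e.g. FlatPostSource.new(all_posts), would then be what you pass to source. The search result is still the parent post's id, even when the match came from a comment.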

Aggregating the data anywhere, then exporting into a CSV:

The third way is to use the export / aggregation options of whatever data source you have, save the prepared data into a CSV, and index from there.

This is sometimes the way to go when the data needs to be heavily preprocessed using sed or Ruby.
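As a minimal sketch of the Ruby variant, using only the standard csv library: the method name posts_to_csv, the hash keys and the column order (id, title, author, comment_texts, comment_authors) are my assumptions, so adjust them to whatever your CSV source definition expects.

```ruby
require "csv"

# Turns an array of post hashes into CSV text, one row per post.
# Assumed column order: id, title, author, comment_texts, comment_authors.
def posts_to_csv(posts)
  CSV.generate do |csv|
    posts.each do |post|
      comments = post.fetch(:comments, [])
      csv << [
        post[:id],
        post[:title],
        post[:author],
        comments.map { |c| c[:text] }.join(" "),
        comments.map { |c| c[:author] }.join(" ")
      ]
    end
  end
end
```

Writing the result out is then a single File.write call with a path of your choosing, and the CSV library takes care of quoting titles or comments that contain commas.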

Ok, I hope that answers part of your question(s). If anyone more versed in MongoDB has ideas, please chime in!

Picky has a bit of a way to go regarding data sources, and questions like that really help! So thanks and all the best

Florian

Picky

May 25, 2011, 1:19:11 AM

to picky...@googlegroups.com

P.S: I forgot to define the relationship with the comments in class IndexedPost. So just imagine an embeds_many :comments there.
