Hi Joel and everyone,
First off, thank you for the praise!
Great question – there are a few ways to answer this. I've personally used two of them myself; see below.
There might be other ideas others could contribute.
> What about if you have a couple of objects tied together through relationships - has and belongs to type relationships? Has there been a case where you had to index multiple tables or objects and return the parent object as a search result?
The first application, the one Picky originated from, actually had exactly this case.
For me, there are three ways of resolving this:
- Aggregating the data in the query (the backend).
- Aggregating the data in the model.
- Aggregating the data anywhere, then exporting into a CSV.
Aggregating the data in the query:
The first way is the one I used in the first app, whose data came from a MySQL DB.
Apart from just using #each to iterate over the objects to index, Picky also provides explicit Sources. One of these Sources – besides Sources::CSV, which you've seen in the example app – is Sources::DB.
In its simplest form, it gathers data from the DB using a plain SELECT, like this:
Sources::DB.new('SELECT id, title, author, year FROM books', file: 'app/db.yml')
You'd pass this to the source method in the index definition:
books_index = Index::Memory.new(:books) do
source Sources::DB.new('SELECT id, title, author, year FROM books', file: 'app/db.yml')
category :title
# ...
end
But you can use SELECT statements as complex as you'd like (here we JOIN two tables and get the data ordered the way we want it):
"SELECT a.*, p.phone1 AS home_phone, p.phone2 AS mobile_phone FROM addresses a LEFT OUTER JOIN phone_numbers p ON p.address_id = a.id WHERE a.company = 1 AND a.deleted_at IS NULL ORDER BY LTRIM(CONCAT_WS(' ', a.last_name, a.first_name))"
On this you could define the categories "id", "first_name", "last_name" (from a.*), and potentially more, plus "home_phone" and "mobile_phone".
So what you do is assemble a combined address-phone record on the DB, on the fly, which is then indexed.
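To make the "combined record" idea concrete, here is a plain-Ruby sketch (with hypothetical column values) of what one joined row boils down to before Picky sees it:

```ruby
# Hypothetical values, only to illustrate the shape of one combined
# address-phone record as the JOIN above would hand it to the indexer.
address = { 'id' => 42, 'first_name' => 'James', 'last_name' => 'Bond' }
phones  = { 'phone1' => '555-0100', 'phone2' => '555-0199' }

# The aliased phone columns simply become two more attributes of the record.
record = address.merge(
  'home_phone'   => phones['phone1'],
  'mobile_phone' => phones['phone2']
)

record.keys
# => ["id", "first_name", "last_name", "home_phone", "mobile_phone"]
```

Each key on that record is then available as a category in the index definition.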
I don't know if this is doable with Mongo.
http://github.com/ClintKrollwood actually started on a Sources::Mongo, but he is now – afaik – using the #each method.
And with that we come to the second of my options.
Aggregating the data in the model:
First off, I have to admit that my experience with MongoDB and especially its Ruby adapter is very limited.
It seems that Mongoid also has a notion of proxies (called criteria) and kickers (called finders).
So you'd find, e.g., a famous spy using
Person.where(last_name: "Bond").and(first_name: /^J(\.|ames)?$/).asc(:last_name, :first_name).all
Of course, we all know that it's even easier finding a famous spy by employing a beautiful female spy as a honeypot, after buying a remote volcanic island and a few sharks, but yeah…
The thing here is that Mongoid is all geared toward the idea of the object, so Person, or Address etc. and not so much around having compound objects.
Just from the (snazzy) Mongoid documentation, I'd do it this way: if I wanted to index a Post object together with its Comments, I'd create another class, IndexedPost, which describes the parent object together with the associated objects:
class IndexedPost
  include Mongoid::Document
  store_in :posts # Denotes where the data for the parent object comes from.

  embeds_many :comments # Assuming comments are embedded in each post document.

  # Accessor for the text of the comments.
  #
  def comment_texts
    comments.map(&:text).join(' ')
  end

  # Accessor for the authors of the comments.
  #
  def comment_authors
    comments.map(&:author).join(' ')
  end
end
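The aggregation those accessors perform is plain Ruby and easy to sketch without Mongoid at all (Comment here is a hypothetical stand-in for the real document class):

```ruby
# Stand-in for the Mongoid comment documents, only for illustration.
Comment = Struct.new(:text, :author)

comments = [
  Comment.new('Nice post!', 'alice'),
  Comment.new('Thanks for sharing.', 'bob')
]

# Exactly what comment_texts / comment_authors do above: collapse the
# associated objects into one indexable string per category.
comment_texts   = comments.map(&:text).join(' ')
comment_authors = comments.map(&:author).join(' ')

comment_texts   # => "Nice post! Thanks for sharing."
comment_authors # => "alice bob"
```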
Then, in Picky, use e.g. as follows:
posts_index = Index::Memory.new :posts do
  key_format :to_s # To tell Picky to use String keys.
  source     IndexedPost.asc(:title).all # Might work without .all, not sure.
  category   :title,
             qualifiers: [:t, :title],
             partial: Partial::Substring.new(from: 1),
             similarity: Similarity::DoubleMetaphone.new(2)
  category   :author, partial: Partial::Substring.new(from: -2)
  # ...
  category   :comment_texts, partial: Partial::None.new
  category   :comment_authors, partial: Partial::Substring.new(from: -2)
end
This gets all indexed posts in ascending order and indexes title, author, comment_texts, and comment_authors.
But this is all from reading the documentation.
The third way is simply using the export/aggregation options of whatever data source you're using, saving the prepared data into a CSV, and then indexing from there.
This is sometimes the way to go when the data needs to be heavily preprocessed, e.g. using sed or Ruby.
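A minimal sketch of this third way, assuming a books.csv file and the Sources::CSV style you've seen in the example app (the data values are hypothetical):

```ruby
require 'csv'

# Hypothetical aggregated rows: the parent's attributes plus whatever was
# pulled in from associated tables during preprocessing.
rows = [
  [1, 'Casino Royale', 'Ian Fleming', 1953],
  [2, 'Goldfinger',    'Ian Fleming', 1959]
]

# Write the prepared data out; the id comes first.
CSV.open('books.csv', 'w') do |csv|
  csv << %w[id title author year]
  rows.each { |row| csv << row }
end

# Then index from the file, roughly as in the example app:
#
#   books_index = Index::Memory.new(:books) do
#     source Sources::CSV.new(:title, :author, :year, file: 'books.csv')
#     category :title
#     # ...
#   end
```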
Ok, I hope that answers part of your question(s). If anyone more versed in MongoDB has ideas, please chime in!
Picky has a bit of a way to go regarding data sources, and questions like this really help! So thanks, and all the best
Florian