Obtain results limited to ids from other results


Luis Lavena

Jul 26, 2012, 4:20:51 PM
to thinkin...@googlegroups.com
Hello,

I've been trying to make Sphinx solve a problem for me: obtain a list
of tags, ordered by usage (@count), while excluding tags that would
produce an empty result when combined.

Given the models:

class Post < ActiveRecord::Base
  define_index do
    indexes :title, :description
    indexes tags(:name)
  end
end

class Tagging < ActiveRecord::Base
  belongs_to :post
  belongs_to :tag

  delegate :name, :slug, :to => :tag

  define_index do
    indexes tag(:name)

    has post(:id), :as => :post_id
  end
end

I want the list of tags to include only the ones used in the matching
posts, so if I do:

Post.search "foo"

I would like the tags to also be limited to those used by the matching posts, and not include other tags.

Right now, I do this by limiting taggings to a list of Post IDs:

total_posts = Post.search_count(keywords)
post_ids = Post.search_for_ids keywords, :limit => total_posts

taggings = Tagging.search keywords, :with => { :post_id => post_ids }

As you can see, this does not scale, since post_ids can contain 10K records.

I can't think of a way to retrieve a list of tags that only apply to
the posts involved in the results, so that combining keywords does
not end in an empty result.

Think of this in a combinatory tag cloud or something like that :P
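
To make the goal concrete, here is a minimal pure-Ruby sketch of what such a "combinatory" tag list could compute. It reads the request as: given the tags already selected, offer only the tags that still co-occur with them in at least one post, ordered by usage. The POST_TAGS data and the combinable_tags helper are hypothetical names for illustration only; this is not Thinking Sphinx code and it does not address the scaling problem.

```ruby
# Hypothetical in-memory data: post id => tag names (not the real models).
POST_TAGS = {
  1 => %w[ruby rails],
  2 => %w[ruby sphinx],
  3 => %w[rails],
  4 => %w[python]
}

# Return [tag, count] pairs for tags that can still be added to `selected`
# without emptying the result set, ordered by usage within that set.
def combinable_tags(selected, post_tags)
  counts = Hash.new(0)
  post_tags.each_value do |tags|
    next unless (selected - tags).empty? # post must carry every selected tag
    (tags - selected).each { |tag| counts[tag] += 1 }
  end
  # Highest count first; tag name breaks ties so the order is stable.
  counts.sort_by { |tag, count| [-count, tag] }
end

combinable_tags(%w[ruby], POST_TAGS)
# => [["rails", 1], ["sphinx", 1]]
```

With "ruby" selected, "python" is never offered, because no post carries both tags.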

Perhaps (most likely) I'm doing it wrong, so I would appreciate any
comments and suggestions that help me think outside the box.

Thank you.
--
Luis Lavena
AREA 17
-
Perfection in design is achieved not when there is nothing more to add,
but rather when there is nothing more to take away.
Antoine de Saint-Exupéry

Pat Allan

Aug 4, 2012, 12:24:39 PM
to thinkin...@googlegroups.com
Hi Luis

Sorry for not responding sooner. Unfortunately, I can't think of any way to do this - there's no way to refer to other indices or have subqueries within queries. Not saying what you want to do is impossible… but certainly, I've no idea how to do it.

Cheers

--
Pat

> --
> You received this message because you are subscribed to the Google Groups "Thinking Sphinx" group.
> To post to this group, send email to thinkin...@googlegroups.com.
> To unsubscribe from this group, send email to thinking-sphi...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/thinking-sphinx?hl=en.
>



Luis Lavena

Aug 6, 2012, 12:36:53 PM
to thinkin...@googlegroups.com
On Saturday, August 4, 2012 1:24:39 PM UTC-3, Pat Allan wrote:
> Hi Luis
>
> Sorry for not responding sooner. Unfortunately, I can't think of any way to do this - there's no way to refer to other indices or have subqueries within queries. Not saying what you want to do is impossible… but certainly, I've no idea how to do it.


Thank you Pat,

I figured out a workaround by using facets on tags(:id) in the Post index:

class Post
  define_index do
    # ...
    has tags(:id), :as => :tag_ids, :type => :multi
  end
end

And combined with some scopes, I was able to retrieve the accumulated tag_ids from it.

But while I got the results from Sphinx pretty fast, iterating over the search bundle and collecting the counters takes a considerable amount of time, leaving me in the same situation as in the first scenario.
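
As a rough sketch of the "iterate and collect the counters" step described above, the loop below tallies tag ids from a list of matches and then paginates the ordered tally. The shape of each match (a hash with an :attributes entry holding a 'tag_ids' array) is an assumption made for illustration, and tag_usage and paginate are hypothetical helper names, not part of Thinking Sphinx or Riddle.

```ruby
# Hypothetical matches; each one carries its multi-value tag_ids attribute.
matches = [
  { :doc => 1, :attributes => { 'tag_ids' => [10, 20] } },
  { :doc => 2, :attributes => { 'tag_ids' => [10, 30] } },
  { :doc => 3, :attributes => { 'tag_ids' => [10, 20] } }
]

# Tally how many matches carry each tag id, most used first.
def tag_usage(matches)
  counts = Hash.new(0)
  matches.each do |match|
    match[:attributes]['tag_ids'].each { |tag_id| counts[tag_id] += 1 }
  end
  counts.sort_by { |tag_id, count| [-count, tag_id] }
end

# Slice one page (1-based) out of the ordered rows.
def paginate(rows, page, per_page)
  rows[(page - 1) * per_page, per_page] || []
end

usage = tag_usage(matches)
paginate(usage, 1, 2) # => [[10, 3], [20, 2]]
paginate(usage, 2, 2) # => [[30, 1]]
```

The work grows with the total number of tag ids across all matches, which is consistent with the slowdown reported for large result sets.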

I'm thinking of using a C extension to iterate over the search results and extract the Sphinx attributes, but I haven't figured out all the details yet.

Perhaps there is a way to collect these Sphinx attributes without looping through the results? Maybe use Riddle directly instead?

While looking into this, some colleagues mentioned that ElasticSearch gives you indexed terms ordered by usage when you run a query, but those terms can't be paginated (which is something I need too).

I'll keep investigating, and if I find an alternative I will post it here for others.

Thank you again for your time,
--
Luis Lavena

Pat Allan

Aug 6, 2012, 12:56:11 PM
to thinkin...@googlegroups.com
Hi Luis

It wouldn't be too tricky to use Riddle to query Sphinx directly - AR models are only queried for the sake of string facets (and that's something I'm looking at working around when I implement facets in my rewrite of Thinking Sphinx).

Feel free to give it a shot, but if you get stuck let me know and I'll piece some code together to help.

Cheers

--
Pat

> To view this discussion on the web visit https://groups.google.com/d/msg/thinking-sphinx/-/dyfIrSkUj60J.

Luis Lavena

Sep 8, 2012, 10:32:17 AM
to thinkin...@googlegroups.com
On Mon, Aug 6, 2012 at 1:56 PM, Pat Allan <p...@freelancing-gods.com> wrote:
> Hi Luis
>
> It wouldn't be too tricky to use Riddle to query Sphinx directly - AR models are only queried for the sake of string facets (and that's something I'm looking at working around when I implement facets in my rewrite of Thinking Sphinx).
>
> Feel free to give it a shot, but if you get stuck let me know and I'll piece some code together to help.
>

Hello Pat,

I found that querying Sphinx directly still gives me a fast response;
the main issue was the Ruby processing of the response.

After isolating the query I wanted, I wrote some benchmarks and
started to profile Riddle responses and noticed 14K calls to
Riddle::Client::Response#next_int which took 26% (0.32 sec) of the
time of every call.

That, together with Riddle::Client#attribute_from_type, which took another
0.58 secs, and the later iteration over attributes to place the results
into an array of hashes, resulted in the poor performance I was
experiencing.

I decided to take a simple approach and rewrote the Riddle::Client::Response
class entirely in C, no longer using String#pack/unpack to read bytes
from the string, and instead looked at libsphinxclient for how to unpack
these values.

After doing that I went from 0.8 secs per query to 0.3 secs, although
Riddle::Client#attribute_from_type, Array#each, Hash#[] and Hash#[]=
still take a lot of time when parsing such responses (inside the
Riddle::Client#run command).

I'm still testing this out (not in production yet), but once we have it,
I will look for approval to open-source it.

Regards,

Pat Allan

Sep 14, 2012, 6:57:03 PM
to thinkin...@googlegroups.com
Hi Luis

Wow, sounds like you've been working hard! Would love to see a patch for the C bindings... but of course, only when you have the time.

Cheers

--
Pat
