Hi there,
I've run into scaling problems with the way the indexer handles
long lists of multi-valued attributes (MVAs). In the worst case I have
items with over 25000 attributes attached. Indexing these through a
left join with group_concat took a long time and put considerable
load on the database.
Reading the Sphinx documentation, I found that multi-valued
attributes can also be indexed through a separate query that simply
retrieves all the <document, attribute> pairs. A quick test showed
that this speeds up the indexing tremendously.
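To illustrate the difference between the two shapes of data, here is a
minimal Ruby sketch. It only simulates in memory what each approach
hands to the indexer; the method names and the sample pairs are made up
for illustration and are not part of Sphinx or thinking-sphinx.

```ruby
# Hypothetical illustration only: these helpers are not Sphinx or
# thinking-sphinx API, and the sample data below is invented.

# Pairs approach: the database returns flat <document, attribute> rows
# and the grouping per document happens outside the database.
def group_pairs(pairs)
  pairs.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |(doc_id, value), acc|
    acc[doc_id] << value
  end
end

# group_concat approach: the database itself has to build one
# comma-separated string per document as part of the main query.
def concat_style(pairs)
  group_pairs(pairs).map { |doc_id, values| [doc_id, values.join(",")] }.to_h
end

sample_pairs = [[1, 10], [1, 11], [2, 10], [3, 12]]
p group_pairs(sample_pairs)
p concat_style(sample_pairs)
```

With tens of thousands of attributes per item, the second shape means
very long concatenated strings built inside a joined, grouped query,
which is presumably where the database load came from.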
This feature isn't supported by thinking-sphinx, so I took a stab at it
in my fork at
http://github.com/menno/thinking-sphinx/commits/mva
It's been tested in production for my use case, which is along the
lines of Item.has_many :tags, :through => :taggings. For that
association it can run "select item_id, tag_id from taggings" to get
all the pairs. There are specs and code for other has_many
associations, but those, and other cases, haven't been thoroughly
tested.
Another point of concern is that I needed access to the unique-id
expression used in the select query to match up the ids. I've moved
this logic to ThinkingSphinx.unique_id_expression(offset), but I still
had to pass the offset around a lot more than I'd like.
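As a rough sketch of why the offset matters: the document id in the
pairs query has to be computed the same way as in the main select
query, otherwise the attribute values attach to the wrong documents.
The Ruby below assumes the id is "primary key * number of indexed
models + offset"; that formula, the helper names and the sample numbers
are assumptions for illustration, not the code in the fork. Only the
"sql_attr_multi = uint NAME from query; SQL" line follows the Sphinx
configuration syntax.

```ruby
# Hypothetical helpers; none of these are thinking-sphinx methods.

# Assumed id formula: primary key * number of indexed models + offset.
def sketch_unique_id_expression(column, model_count, offset)
  "#{column} * #{model_count} + #{offset}"
end

# Builds the <document, attribute> pairs query for a join table,
# applying the same id expression as the main query would.
def mva_pairs_query(join_table:, document_key:, value_key:, model_count:, offset:)
  doc_id = sketch_unique_id_expression("#{join_table}.#{document_key}", model_count, offset)
  "SELECT #{doc_id}, #{join_table}.#{value_key} FROM #{join_table}"
end

# Formats the source setting that tells Sphinx to read an MVA from a
# separate query.
def sql_attr_multi_line(attribute, query)
  "sql_attr_multi = uint #{attribute} from query; #{query}"
end

query = mva_pairs_query(
  join_table: "taggings", document_key: "item_id", value_key: "tag_id",
  model_count: 3, offset: 1
)
puts sql_attr_multi_line("tag_ids", query)
```

Because the offset is baked into the SQL string, every place that
generates such a query needs to know it, which is why it ends up being
passed around so much.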
I hope this is of use to someone. Feel free to comment on the
implementation and tests, as it's my first encounter with the
internals of thinking-sphinx, cucumber and rspec ;)
Cheers,
Menno van der Sman