MVA parsing (and facets)

286 views
Skip to first unread message

pascal

unread,
Aug 18, 2009, 7:51:54 AM8/18/09
to Thinking Sphinx
Hi

Some questions on MVA, hope there is someone around who can help me...

How does Sphinx/TS parse MVA attributes? i could not find a
description of what is acceptable.
I found something that said it has to be CSV (comma separated values)
while it looks like space and dot are also ok?

I've got some troubles getting MVA and facets working. Is it
supported?
rake ts:in generates
WARNING: attribute 'path_ids_facet' not found - IGNORING
and the path_id attribute is not indexed.

the attributes appear in the ThinkingSphinx.facets hash but 'path_ids'
is always empty.


Thanks for your help and time.

Pascal


Christian Aust

unread,
Aug 18, 2009, 9:04:35 AM8/18/09
to Thinking Sphinx
Hi Pascal,

an attribute is multi-valued if Sphinx sees more than one value for
it. (Well, that's not too surprising.) Since Thinking Sphinx generates
a configuration for the Sphinx indexer that tells it how to get data
from the SQL database, MVA are limited to everything you can do in
SQL. In particular, you can't have a method of your model that returns
a Ruby array of some sort and directly convert this into a MVA,
because the indexer doesn't speak to your model but only to the
database.

OK, so MVA are related to ActiveRecord associations. Like, class
Book :has_many :chapters or else. In the index definitions of that
class "Book" you could write:

indexes chapters.title, :as=>'chapter_title', :facet=>true

Depending on whether your activate facets or not (using :facet=>true)
Sphinx will either see all values of chapter.title concatenated by
spaces, or (if it's a facet) as a list of CRC32 checksums of all
chapter titles, separated by commas. Only the latter is a MVA from
Sphinx' perspective, the first is just one long string.

There's currently an open issue with facets of MVA that leads to
unexpected results, but it is about to be resolved. Regards,

Christian

pascal

unread,
Aug 19, 2009, 8:26:09 AM8/19/09
to Thinking Sphinx
Christian thanks for your explanations.

I am playing around with the "acts as tree with dotted ids" from
xavier (http://github.com/xavier/acts-as-tree-with-dotted-ids/tree/
master) which basically stores the "path" from a tree node in dotted
form, i.e. "3.12.34.65" indicates that the parent of node with ID 65
is the node with ID 34 and so on.

I indexed this dotted id as an attribute:

has geographic_feature.dotted_ids, :as => 'path', :type => :multi

and this works for filtering (though i'm not sure it should work;
since i thought sphinx only supports integers for attributes?). This
is nice since i do not need an aditional query or queries (as would be
required with acts as tree or nested set).

But unfortuanetly it does not work for facets. The facet hash for this
attribute is always empty.

The sentence
"....in the second one the field will be parsed for integer values..."
from http://www.sphinxsearch.com/docs/current.html#mva made me think
that sphinx is able to extract integers from a column.

Since i don't see how i can find the parents of a node in pure SQL
(MySQL does not support splitting of strings) i might be moving back
to nested set (where this should be a simple query with left/right in
the condition).

As said, i'm only playing around to see how the dotted ids version
plays with sphinx... of course i'd still be happy if someone could
give me a hint how i can create a facetted MVA for a dotted id string.


BTW: I really appreciate that the people on this list are helpful and
have a positive attitude. Thank you very much!

On Aug 18, 3:04 pm, Christian Aust <datenimpera...@googlemail.com>
wrote:

Christian Aust

unread,
Aug 19, 2009, 8:44:40 AM8/19/09
to Thinking Sphinx
Just a quick thought;

has 'REPLACE(geographic_feature.dotted_ids, '.',
',')', :as=>'path', :type=>:multi, :facet=>true

or something along that line should make the attribute :path usable
for a facet query, since Sphinx sees multiple integers separated by
comma. Regards,

Christian

On 19 Aug., 14:26, pascal <pascal.b...@gmail.com> wrote:
> Christian thanks for your explanations.
>
> I am playing around with the "acts as tree with dotted ids" from
> xavier (http://github.com/xavier/acts-as-tree-with-dotted-ids/tree/
> master) which basically stores the "path" from a tree node in dotted
> form, i.e.  "3.12.34.65" indicates that the parent of node with ID 65
> is the node with ID 34 and so on.
>
> I indexed this dotted id as an attribute:
>
> has geographic_feature.dotted_ids, :as => 'path', :type => :multi
>
> and this works for filtering (though i'm not sure it should work;
> since i thought sphinx only supports integers for attributes?). This
> is nice since i do not need an aditional query or queries (as would be
> required with acts as tree or nested set).
>
> But unfortuanetly it does not work for facets. The facet hash for this
> attribute is always empty.
>
> The sentence
> "....in the second one the field will be parsed for integer values..."
> fromhttp://www.sphinxsearch.com/docs/current.html#mvamade me think

pascal

unread,
Aug 20, 2009, 5:10:00 AM8/20/09
to Thinking Sphinx
@Christian: thanks again for the reply.

I tried the suggested version before and it does not seem to work:
1. ThinkingSphinx.facets does not show a "path" facet but a
"path_facet" facet in the result hash (also see script/console sample
below). Why the "path_facet" name?
2. The hash for this facet is always empty

My playground example is: Wine has_one GeographicFeature,
GeographicFeture is a tree (with the world as root and the deeper the
tree, the more specific the origin of the wine : world -> europe ->
italy -> tuscany)

So i created a query which returns wines.id, geographic_features.id
pairs like this:
+-----------+-----------+
| id | id |
+-----------+-----------+
| 51711515 | 140325849 |
| 51711515 | 566630862 |
| 51711515 | 574838058 |
| 51711515 | 930672604 |
| 887636893 | 27016372 |
| 887636893 | 140325849 |
| 887636893 | 574838058 |
| 887636893 | 930672604 |
+-----------+-----------+

(there are two wines, both from world -> europe -> italy but one is
from tuscany region while the other is from piedmont region)

and mapped it with

has SQL, :as => 'path', :source => :query, :facet => true

in wine.rb


rake ts:stop && rake ts:in && rake ts:start

script/console

Wine.facets
Querying Sphinx:
Wine Load (0.3ms) SELECT * FROM `wines` WHERE (`wines`.`id` IN
(887636893,51711515))
Querying Sphinx:
Querying Sphinx:
=> {:feature_name_facet=>{}, :path_facet=>{}, :name=>{"chianti"=>1,
"Asti Spumante"=>1}}

As you can see the path_name facet is empty.

I guess i'm doing a "facet noob" thing but can't see whats wrong. Any
idea?

As a second question: Why is there a SELECT * FROM wines? Why does
sphinx load the models when queried for facets?


Greetings from switzerland

Pascal








On Aug 19, 2:44 pm, Christian Aust <datenimpera...@googlemail.com>
wrote:
> Just a quick thought;
>
> has 'REPLACE(geographic_feature.dotted_ids, '.',
> ',')', :as=>'path', :type=>:multi, :facet=>true
>
> or something along that line should make the attribute :path usable
> for a facet query, since Sphinx sees multiple integers separated by
> comma. Regards,
>
> Christian
>
> On 19 Aug., 14:26, pascal <pascal.b...@gmail.com> wrote:
>
>
>
> > Christian thanks for your explanations.
>
> > I am playing around with the "acts as tree with dotted ids" from
> > xavier (http://github.com/xavier/acts-as-tree-with-dotted-ids/tree/
> > master) which basically stores the "path" from a tree node in dotted
> > form, i.e.  "3.12.34.65" indicates that the parent of node with ID 65
> > is the node with ID 34 and so on.
>
> > I indexed this dotted id as an attribute:
>
> > has geographic_feature.dotted_ids, :as => 'path', :type => :multi
>
> > and this works for filtering (though i'm not sure it should work;
> > since i thought sphinx only supports integers for attributes?). This
> > is nice since i do not need an aditional query or queries (as would be
> > required with acts as tree or nested set).
>
> > But unfortuanetly it does not work for facets. The facet hash for this
> > attribute is always empty.
>
> > The sentence
> > "....in the second one the field will be parsed for integer values..."
> > fromhttp://www.sphinxsearch.com/docs/current.html#mvamademe think

mix

unread,
Aug 25, 2009, 8:21:19 PM8/25/09
to Thinking Sphinx
Hi Pascal, i'm in a situation like yours.
I've a nested set for the category model, which has a structure like
yours:
world -> europe -> italy -> tuscany etc
I've saved all the ids of the categories in a column of each article
row (like "1, 5, 47, 102")
And then indexed them with
has all_categories, :type => :multi, :as => :all_categories, :facet
=> true

Actually i'm using the column instead of get them directly during the
indexing with a query like "lft <= #{category.lft} and rgt >=
{category.rgt}" because after 2 days i've found any solution on how do
that query, with an association in the category model like
has_many :all_categories, :class_name => 'Category', :finder_sql =>
'SELECT * FROM categories c WHERE c.lft <= #{lft} and c.rgt >= #{rgt}
ORDER BY c.lft ASC'
it doesn't work with sphinx, though it works using the console :
( (if you have any hint about this it would be greatly
appreciated ;) )
btw..
when i search with Article.facets :with => {:all_categories => 12345}
i get the result with all the data, like "1, 456, 678, 12345, 423545".
Actually i'm thinking to use that string to get the children
categories, with something like:
all_categories = Article.facets(:with => {:all_categories =>
12345}, :facets => :all_categories)[:all_categories]
index = all_categories.first[0].split(' , ').index('12345')+1
and then use the index on the all_categories, with
ids = {}
all_categories.each do |a|
ids[a[0].split(' , ')[index]] = a[1]
end

And finally get all the categories using the ids.keys ( Category.find
(:all, :conditions => {:id => ids.keys}) ) and in the view show the
name with near how many articles has that category and subcategories.

My big worry is that it would be quite cpu consuming all this
calculations done for each request. But for now i've found anything
else to solve this problem.
I hope that this could help you, and eventually find a better solution
together ;)
Reply all
Reply to author
Forward
0 new messages