Faceting problem with Solr backend


Jonathan S

Nov 4, 2009, 10:52:11 AM
to django-haystack
Hi Daniel,

First of all, thanks for your work on Haystack.
I have some trouble with faceting, but couldn't find a newsgroup or
forum specific to Haystack, so I'm mailing you directly.
I'm not sure whether the problem lies in your framework or in the Solr
backend.


Most things work fine. The database consists of records, all having a
'group' field.

In my search_sites.py:

class NewsIndex(BaseIndex):
    text = indexes.CharField(document=True, use_template=True)
    group = indexes.CharField(model_attr='group__url_name',
                              indexed=False)

When I do a search for objects matching 'creek' with a facet on
'group', I get the following output:

>>> SearchQuerySet().facet('group').filter(content='creek').facet_counts()['fields']
{u'group': [(u'citi', 1), (u'cityl', 1), (u'live', 1), ... (u'aa', 0),
(u'aatest', 0), (u'age', 0), (u'mobil', 0), (u'mobilevik', 0), ...
(u'vike', 0), (u'vld', 0), (u'vriendengroep', 0), ... ]}


The group names have been split into several facet entries.
(The counts are right, by the way.) We have a group called "cityllive",
no group called "cityl" or "live". A group "mobillevikings", no
"mobil" or "mobilvik".
I browsed the source of the Solr backend but couldn't find the cause.
Probably the values have already been split up in the raw result from
the Solr server.

>>> SearchQuerySet().facet('group').filter(content='creek')[0].group
u'city-live'

Do you have any idea what might solve this problem? Also, as you can
see above, the real group name contains a hyphen. I wonder why it
doesn't appear in the facet fields.


I really hope you can help us with this. You did a great job with the
Haystack framework, but trouble like this is a little frustrating.

Thanks!

Jonathan

Daniel Lindsley

Nov 4, 2009, 11:30:23 AM
to django-...@googlegroups.com
Jonathan,


Good catch. This is indeed a problem with the Solr backend. What's
happening is that, while those fields aren't being indexed, they are
still being assigned the 'text' field type, which means the data gets
post-processed (lowercased, tokenized, stemmed, etc.). That produces
the fragments you're seeing. I just committed a fix
(http://github.com/toastdriven/django-haystack/commit/689ec28a0822519843f1cf88b9dcc8ae7959b310)
which ought to solve the problem for you. You'll need to re-run
`./manage.py build_solr_schema` (and put the resulting schema in
place), then run `./manage.py reindex`.
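
To illustrate the effect described above, here is a minimal Python sketch. It is not Solr's analyzer chain: `toy_text_analyzer` and `toy_string_analyzer` are hypothetical helpers that only mimic lowercasing and splitting on non-alphanumeric characters (no stemming), versus keeping the value as one term.

```python
import re


def toy_text_analyzer(value):
    """Rough stand-in for a tokenized 'text' field: lowercase the value,
    then split it on any run of non-alphanumeric characters."""
    return [tok for tok in re.split(r"[^a-z0-9]+", value.lower()) if tok]


def toy_string_analyzer(value):
    """Rough stand-in for a 'string' field: the value stays a single term."""
    return [value]


if __name__ == "__main__":
    print(toy_text_analyzer("city-live"))    # ['city', 'live']
    print(toy_string_analyzer("city-live"))  # ['city-live']
```

Facets are counted per indexed term, so a tokenized field would yield one facet entry per fragment, while an untokenized field would yield one entry for the whole group name.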


Daniel

Jonathan S

Nov 5, 2009, 3:16:44 AM
to django-haystack
Thanks Daniel!

It works now, but to get faceting to work I also had to change
`indexed` from "false" to "true" after applying your patch.

So in the end, the only thing that should have been changed
was type="string".

<field name="group" type="string" indexed="true" stored="true"
multiValued="false" />


Jonathan

Daniel Lindsley

Nov 5, 2009, 3:55:45 AM
to django-...@googlegroups.com
Jonathan,


Just so you're aware, by indexing that field and using the string
type, only exact matches will get picked up in the search. Adding
faceting as a field kwarg is a planned improvement before 1.0. It's a
pain point for many people, myself included. Sorry.
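
The trade-off mentioned here can be sketched in plain Python. This is an illustration only, not how Solr matches queries: both search functions are hypothetical, and "example-group" is a made-up value added next to the "city-live" name from this thread.

```python
import re


def tokens(value):
    """Lowercase a value and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", value.lower()) if t]


def search_string_field(query, values):
    """Untokenized field: a hit only when the query equals the whole value."""
    return [v for v in values if v == query]


def search_text_field(query, values):
    """Tokenized field: a hit when the query equals any token of the value."""
    return [v for v in values if query.lower() in tokens(v)]


if __name__ == "__main__":
    groups = ["city-live", "example-group"]  # second value is hypothetical
    print(search_string_field("city", groups))       # []
    print(search_string_field("city-live", groups))  # ['city-live']
    print(search_text_field("city", groups))         # ['city-live']
```

With the untokenized variant, a partial term such as "city" finds nothing, which is acceptable when the field is only used for faceting or exact filtering.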


Daniel

Jonathan S

Nov 5, 2009, 11:37:34 AM
to django-haystack
That's no problem. In this case, all I needed were exact matches. :)
It's working great now.

Bogdan Licar

Nov 26, 2009, 4:01:45 PM
to django-haystack
Hi.

I'm facing similar problems. I have a tags = indexes.MultiValueField()
and
<field name="tags" type="string" indexed="true" stored="true"
multiValued="true" />

If some tags consist of multiple words (e.g. 'this is a tag'), the
faceting splits the tag name on spaces, so I get
{'this': 1, 'is': 1, 'a': 1, 'tag': 1}.

Moreover, an odd thing happens:
if a tag name ends with another tag name, e.g. if in addition
to the tag above I have a tag 'soma',
I get this facet count: {'this': 1, 'is': 1, 'a': 1, 'tag': 1,
'som': 1}

since, as far as it is concerned, there is another tag 'a'.

So
1. It splits field values on spaces.
2. It truncates the end of field values that end with another existing value.
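
The first symptom above can be reproduced with a small Python sketch, assuming facets are counted once per document for each indexed term. `facet_counts` is a hypothetical helper, not part of Haystack or Solr, and the sketch does not attempt to reproduce the 'som' truncation.

```python
from collections import Counter


def facet_counts(documents, analyzer):
    """Count, for each term, how many documents contain it.

    `documents` is a list of tag lists (one list per document);
    `analyzer` turns a single tag into a list of indexed terms.
    """
    counts = Counter()
    for tags in documents:
        terms = set()
        for tag in tags:
            terms.update(analyzer(tag))
        counts.update(terms)  # each term counts once per document
    return dict(counts)


if __name__ == "__main__":
    docs = [["this is a tag", "soma"]]
    # Untokenized values: one facet entry per whole tag.
    print(facet_counts(docs, lambda value: [value]))
    # Whitespace-tokenized values: one facet entry per word.
    print(facet_counts(docs, str.split))
```

Keeping each tag as a single term gives one facet entry per tag, which is the result expected from a field that is not tokenized.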

P.S. I tried the latest svn version.


Bogdan Licar

Nov 26, 2009, 6:35:49 PM
to django-haystack
Never mind, with the latest version I no longer have these problems.