multivalue tags facet


LarryEitel

Jan 27, 2010, 3:09:28 PM
to django-haystack
I am exploring a number of Django Solr apps such as Solango and
http://code.google.com/p/kochief/, as well as rolling my own. Can anyone
suggest a best-practice approach to indexing a variable number of tags
maintained per record? For example:

Id: 1
Name: Joe Smith
Tags: plumber, hiker, reader

Id: 2
Name: Mary Johnson
Tags: chef, reader, real estate

Facets: Tags

I want to see Facets on each multivalue tag:
chef(1)
hiker(1)
plumber(1)
reader(2) <------
real estate(1)
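
For reference, the counts being asked for can be sketched in plain Python, with no search engine involved. This is only an illustration of the expected output: the `records` list and the `tag_facets` helper are hypothetical names made up for this example, not part of any of the apps mentioned.

```python
from collections import Counter

# Sample records from the question; each carries a variable number of tags.
records = [
    {"id": 1, "name": "Joe Smith", "tags": ["plumber", "hiker", "reader"]},
    {"id": 2, "name": "Mary Johnson", "tags": ["chef", "reader", "real estate"]},
]


def tag_facets(rows):
    """Count how many records carry each tag (hypothetical helper)."""
    counts = Counter()
    for row in rows:
        # set() so a tag repeated within one record is only counted once.
        counts.update(set(row["tags"]))
    # Sort alphabetically to match the listing in the question.
    return sorted(counts.items())


facets = tag_facets(records)
for tag, count in facets:
    print(f"{tag}({count})")
```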

I am considering the Django tagging app and if/how to have it play
nice with an index.

Thank you 

Daniel Lindsley

Jan 29, 2010, 2:15:25 AM
to django-...@googlegroups.com
Larry,


Haystack (and the Solr/Xapian backends) can indeed handle this. The
common practice using Haystack is to use the ``MultiValueField``. For
example:

====
from haystack.indexes import *
from haystack import site
from myapp.models import MyModel


class MyModelSearchIndex(SearchIndex):
    text = CharField(document=True, use_template=True)
    # Perhaps more fields here for filtering purposes...
    # Then the tags.
    # If you're faceting on them, you should mark them as ``indexed=False`` as
    # this prevents them from being post-processed (such as stemming), leaving
    # the original data intact for the facets.
    tags = MultiValueField(indexed=False, stored=True)

    # In the case of ``django-tagging``:
    def prepare_tags(self, obj):
        return [tag.name for tag in obj.tags]


site.register(MyModel, MyModelSearchIndex)
====

Once you've reindexed your data, you can then pull these facets by
using ``SearchQuerySet().facet('tags').facet_counts()``. You'd get
back something like:

====
{
    'dates': {},
    'fields': {
        'tags': [
            ('reader', 2),
            ('plumber', 1),
            ('chef', 1),
            # .. snipping more...
        ],
    },
    'queries': {}
}
====

There's some documentation on this at
http://haystacksearch.org/docs/faceting.html. Hope that helps.
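
Once a result shaped like the one above comes back, the list of ``(value, count)`` pairs is easy to turn into a lookup table. A minimal sketch, assuming the dictionary layout shown in this post; ``facet_dict`` is a hypothetical helper name and ``sample`` is hand-written data, not real backend output:

```python
def facet_dict(facet_counts, field):
    """Return {value: count} for one faceted field, or {} if it is absent.

    Hypothetical helper; assumes the 'fields' -> list of (value, count)
    layout shown in the example output above.
    """
    return dict(facet_counts.get("fields", {}).get(field, []))


# Hand-written stand-in for the result of facet_counts().
sample = {
    "dates": {},
    "fields": {"tags": [("reader", 2), ("plumber", 1), ("chef", 1)]},
    "queries": {},
}

tag_counts = facet_dict(sample, "tags")
print(tag_counts["reader"])
```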


Daniel

LarryEitel

Jan 29, 2010, 9:34:10 AM
to django-haystack
Daniel,
Thank you very much for your timely reply. I will now proceed with
haystack/solr.
Larry

LarryEitel

Jan 29, 2010, 12:30:37 PM
to django-haystack
Daniel,

I have made some progress. One thing, though: I changed the following:

def prepare_tags(self, obj):
    if (obj.tags):
        return obj.tags.split(',')

    return None
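
One thing to watch with a plain ``split(',')``: if the stored string has spaces after the commas (as in "chef, reader, real estate"), each tag after the first keeps a leading space and would show up as a separate facet value. A minimal sketch of a more forgiving split, assuming tags are stored as one comma-separated string; ``split_tags`` is a hypothetical helper, and returning an empty list rather than ``None`` is a choice made for this example, not something the thread establishes as required:

```python
def split_tags(raw):
    """Split a comma-separated tag string into clean tag names.

    Hypothetical helper: strips surrounding whitespace and drops empty
    entries, so ' reader' and 'reader' do not become distinct values.
    """
    if not raw:
        return []
    return [part.strip() for part in raw.split(",") if part.strip()]


print(split_tags("chef, reader, real estate"))
```

Inside the index class this could be called from ``prepare_tags`` as ``return split_tags(obj.tags)``.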

I can see some records in the index using Luke. However, I cannot seem
to successfully run:

SearchQuerySet().facet('tags').facet_counts()
returns: {'dates': {}, 'fields': {u'tags': []}, 'queries': {}}

SearchQuerySet().facet('cats')[0].tags
returns: [u'one', u'two', u'three']


I have not installed django-tagging as of yet. Perhaps your example
assumes this.

Any suggestions are appreciated.

Daniel Lindsley

Jan 29, 2010, 6:58:19 PM
to django-...@googlegroups.com
Larry,


I know we got this sorted out over IRC, but for posterity, I'm
reposting the solution here.

I was incorrect in suggesting the "indexed=False" bit. You do need
"indexed=True". You then need to tweak your Solr schema slightly and
set the data type on the "tags" field to "string". This prevents Solr
from post-processing/mangling the data, giving you proper facets.
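
To illustrate why the field type matters, here is a rough simulation in plain Python. It is not how Solr works internally: the "text" case below is only a crude stand-in (lowercase, split on whitespace) for an analyzed field, and the function names are made up for this sketch. The point is that a multi-word tag such as "real estate" survives as one facet value only when it is kept as an untouched string.

```python
from collections import Counter

# Tags per document, taken from the example earlier in the thread.
docs = [
    ["plumber", "hiker", "reader"],
    ["chef", "reader", "real estate"],
]


def facet_as_string(documents):
    """Facet on the exact values, as an unanalyzed string field would."""
    counts = Counter()
    for tags in documents:
        counts.update(set(tags))
    return counts


def facet_as_text(documents):
    """Facet on tokens: a simplified stand-in for an analyzed text field."""
    counts = Counter()
    for tags in documents:
        tokens = {tok for tag in tags for tok in tag.lower().split()}
        counts.update(tokens)
    return counts


string_facets = facet_as_string(docs)
text_facets = facet_as_text(docs)
print(sorted(string_facets.items()))
print(sorted(text_facets.items()))
```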

I'm planning on further improving faceting for the 1.1 release and
have an issue out there for it.


Daniel

LarryEitel

Jan 30, 2010, 4:04:57 PM
to django-haystack
Daniel, I SO appreciate catching you on IRC. I was at a point of
deciding which approach to take for integrating Solr into my project.
Your assistance with resolving this sealed my choice. I look forward
to watching this app grow and improve. I especially look forward to
your enhancements to facets support.
Thank you :)

LarryEitel

Mar 6, 2010, 8:36:05 PM
to django-haystack
I am revisiting this again. Although I can run a direct query and see
facet counts, the seemingly same haystack query fails to return
expected results.


----- the following returns facet tag counts
http://localhost:8983/solr/select/?q=*%3A*&start=0&rows=1&indent=on&fq=nb_tags:user&facet=on&facet.field=nb_tags


----- the following haystack query does not
>>> from haystack.query import SearchQuerySet
>>> SearchQuerySet().facet('nb_tags').facet_counts()
{'fields': {u'nb_tags': []}, 'dates': {}, 'queries': {}}
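
For anyone comparing the two, the direct Solr request above can be rebuilt from Python so that its parameters are easier to read side by side with the Haystack call. A minimal sketch using only the standard library; ``facet_query_url`` is a hypothetical helper, and the host, port and field name are simply the ones from the URL above. Note that the direct URL carries a filter (``fq=nb_tags:user``) that the Haystack call does not.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL copied from the direct query above.
SOLR_SELECT = "http://localhost:8983/solr/select/"


def facet_query_url(field, filter_value, base=SOLR_SELECT):
    """Build a Solr select URL that facets on one field (hypothetical helper)."""
    params = [
        ("q", "*:*"),                         # match every document
        ("start", 0),
        ("rows", 1),
        ("indent", "on"),
        ("fq", f"{field}:{filter_value}"),    # filter query, as in the URL above
        ("facet", "on"),
        ("facet.field", field),
    ]
    return base + "?" + urlencode(params)


url = facet_query_url("nb_tags", "user")
params = parse_qs(urlsplit(url).query)
print(url)
```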


----- here are the relevant details: http://dpaste.com/168851/

Thank you for having a look.
