I'm new to Django, but I've been writing web applications for years in
Perl and PHP. A popular feature these days, are of course tagging
things.
Tagging things is easy. You have a relational table, containing your
entry ID and a tag associated with it. This could easily be done with a
ManyToManyField() in Django. There is one problem, though.
I have a search functionality on my page, which ideally allows people
to write a list of tags, and retrieve the entries that has those
associated. Maybe I'm overlooking the obvious here--I hope I am--but
this has become increasingly difficult to implement. It's not as simple
as writing WHERE tag = 'foo' OR tag = 'bar'. That assumes that both
values are met in a single row/join.
I'd like some feedback on how this could be implemented, preferably in
native Django models. It's been the source of my frustration for days
now, and I sincerely hope this is not as heavy/daunting as it seems.
Thanks in advance.
PS: I heard there were some discussions on this list with regards to
the subject, but I've been unable to dig these up.
--
Jesper Noehr <jno...@gmail.com>
Developer, Opera Software ASA
>
> Hi list,
>
> I have a search functionality on my page, which ideally allows people
> to write a list of tags, and retrieve the entries that has those
> associated. Maybe I'm overlooking the obvious here--I hope I am--but
> this has become increasingly difficult to implement. It's not as
> simple
> as writing WHERE tag = 'foo' OR tag = 'bar'. That assumes that both
> values are met in a single row/join.
>
Hi Jnoehr.
I've done something similar in zyons.com -- http://svn.zilbo.com/svn/
django/magic-removal/common/tag/models.py
it implements what you are trying to do .. I call it a tag union/
intersection.
it's a work in progress.. but it will give you a tip.
Thanks for your quick reply. You have indeed implemented what I'm
looking for, unfortunately, it is not as ellegant as I hoped for (no
offense!).
I'm perhaps mistaken, but it seems like you're brute-checking per-entry
if all the tags exists. What I'd love to do is simply apply a .filter()
that will pull these entries out. It must be possible, alas with a
database structure allowing so.
The issue with building a dynamic filter (dict adressed as **), is that
you cannot define several identical keys (by design, of course). This
means that lists of values per key will be interpreted as 'IN
(foo,bar)', which, as the concept suggests, does not limit the result
to both, but either.
The endeavour is far from exotic, I'd be as bold as to suggest. Your
code is far too big for me to comprehend by skimming it, and of course
it also implements a whole lot of extra things. Including syntax I've
never seen before.
As I said, though, I really appreciate you digging this up for me. Can
anyone spare a minute explaining how this could/would be done in a
simple scale?
I think I'm understanding what you are saying here, but maybe not. So
let me make my assumptions clear up front...
Your problem with the above WHERE clause is just that the tags might be
in a different table, right? As they would be if implemented with
Django's ManyToManyField?
Django handles the attribute access between models like this reasonably
transparently. To wit:
class Tag(models.Model):
name = models.CharField(maxlength = 50)
class Entry(models.Model):
title = models.CharField(maxlength = 50)
tags = models.ManyToManyField(Tag)
# Get all entries corresponding to tags in the list tag_list
entries = Entry.objects.filter(tags__name__in = tag_list)
As per normal SQL, if one or more of the entries in tag_list have no
corresponding entry, then that is no problem.
I suspect I am missing something here, since what I have shown here is
more or less the same as the database API documentatiokn explains
(http://www.djangoproject.com/documentation/db_api/#many-to-many-relationships ). So why doesn't this do what you want?
Regards,
Malcolm
I was helped by Malcom on IRC, so the only decent thing to do, is to
post the solution here.
I'll try to explain as well as I understand it.
First some code:
class Tags(models.Model):
tag = models.CharField(maxlength = 50)
def __str__(self):
return self.tag
class TagManager(models.Manager):
def from_tags(self, tag_list):
id_list = [int(t.id) for t in Tags.objects.filter(tag__in = tag_list)]
all = Contract.objects.extra(tables = ['contracts_tags'], where =
[('contracts_tags.contract_id = contracts.id'), ('(SELECT COUNT(*)
FROM contracts_tags AS t WHERE t.tags_id IN (%s) AND t.contract_id =
contracts.id) = %d' % (','.join([str(tag) for tag in id_list]),
len(id_list)))]).distinct()
return all
class Contract(models.Model):
# Basic things
file = models.FileField()
name = models.CharField(maxlength = 200)
update = models.DateTimeField('date uploaded')
description = models.TextField()
objects = TagManager()
# These are used for indexing
excerpt = models.TextField()
tags = models.ManyToManyField(Tags)
You can call it like this:
tags = ["foo","bar"]
all = Contract.objects.from_tags(tags)
What happens is that you count how many tags match on each contract
ID. Since you only pick out the relevant tags, the count will equal
len(tags), if there's a match. This allows you to search for records
where there's only specific relations, when data is divided across
tables, in this case with a ManyToManyField().
It works fairly well and fast on my somewhat limited set of set, but I
think this would scale fine on a larger scale as well.
Thanks malcom!
--
Jesper Nøhr, Information Systems Developer, Opera Software
tel: +47-24164348 / cell: +47-46056753 / jes...@opera.com
[...]
> You can call it like this:
>
> tags = ["foo","bar"]
>
> all = Contract.objects.from_tags(tags)
>
> What happens is that you count how many tags match on each contract
> ID. Since you only pick out the relevant tags, the count will equal
> len(tags), if there's a match. This allows you to search for records
> where there's only specific relations, when data is divided across
> tables, in this case with a ManyToManyField().
>
> It works fairly well and fast on my somewhat limited set of set, but I
> think this would scale fine on a larger scale as well.
For the archives: the problem Jesper was trying to solve was to find all
contracts that have both "foo" _and_ "bar" tags (and possibly others,
but at least that set), not just contracts that had either "foo" or
"bar".
Malcolm