I'm new to Django, but I've been writing web applications for years in Perl and PHP. A popular feature these days, are of course tagging things.
Tagging things is easy. You have a relational table, containing your entry ID and a tag associated with it. This could easily be done with a ManyToManyField() in Django. There is one problem, though.
I have a search functionality on my page, which ideally allows people to write a list of tags, and retrieve the entries that has those associated. Maybe I'm overlooking the obvious here--I hope I am--but this has become increasingly difficult to implement. It's not as simple as writing WHERE tag = 'foo' OR tag = 'bar'. That assumes that both values are met in a single row/join.
I'd like some feedback on how this could be implemented, preferably in native Django models. It's been the source of my frustration for days now, and I sincerely hope this is not as heavy/daunting as it seems.
Thanks in advance.
PS: I heard there were some discussions on this list with regards to the subject, but I've been unable to dig these up.
-- Jesper Noehr <jno...@gmail.com> Developer, Opera Software ASA
On 18/05/2006, at 5:56 AM, jno...@gmail.com wrote:
> Hi list,
> I have a search functionality on my page, which ideally allows people > to write a list of tags, and retrieve the entries that has those > associated. Maybe I'm overlooking the obvious here--I hope I am--but > this has become increasingly difficult to implement. It's not as > simple > as writing WHERE tag = 'foo' OR tag = 'bar'. That assumes that both > values are met in a single row/join.
Hi Jnoehr. I've done something similar in zyons.com -- http://svn.zilbo.com/svn/ django/magic-removal/common/tag/models.py it implements what you are trying to do .. I call it a tag union/ intersection. it's a work in progress.. but it will give you a tip.
> I'd like some feedback on how this could be implemented, preferably in > native Django models. It's been the source of my frustration for days > now, and I sincerely hope this is not as heavy/daunting as it seems.
> Thanks in advance.
> PS: I heard there were some discussions on this list with regards to > the subject, but I've been unable to dig these up.
Thanks for your quick reply. You have indeed implemented what I'm looking for, unfortunately, it is not as ellegant as I hoped for (no offense!).
I'm perhaps mistaken, but it seems like you're brute-checking per-entry if all the tags exists. What I'd love to do is simply apply a .filter() that will pull these entries out. It must be possible, alas with a database structure allowing so.
The issue with building a dynamic filter (dict adressed as **), is that you cannot define several identical keys (by design, of course). This means that lists of values per key will be interpreted as 'IN (foo,bar)', which, as the concept suggests, does not limit the result to both, but either.
The endeavour is far from exotic, I'd be as bold as to suggest. Your code is far too big for me to comprehend by skimming it, and of course it also implements a whole lot of extra things. Including syntax I've never seen before.
As I said, though, I really appreciate you digging this up for me. Can anyone spare a minute explaining how this could/would be done in a simple scale?
On Wed, 2006-05-17 at 12:56 -0700, jno...@gmail.com wrote: > Hi list,
> I'm new to Django, but I've been writing web applications for years in > Perl and PHP. A popular feature these days, are of course tagging > things.
> Tagging things is easy. You have a relational table, containing your > entry ID and a tag associated with it. This could easily be done with a > ManyToManyField() in Django. There is one problem, though.
> I have a search functionality on my page, which ideally allows people > to write a list of tags, and retrieve the entries that has those > associated. Maybe I'm overlooking the obvious here--I hope I am--but > this has become increasingly difficult to implement. It's not as simple > as writing WHERE tag = 'foo' OR tag = 'bar'. That assumes that both > values are met in a single row/join.
I think I'm understanding what you are saying here, but maybe not. So let me make my assumptions clear up front...
Your problem with the above WHERE clause is just that the tags might be in a different table, right? As they would be if implemented with Django's ManyToManyField?
Django handles the attribute access between models like this reasonably transparently. To wit:
class Tag(models.Model): name = models.CharField(maxlength = 50)
class Entry(models.Model): title = models.CharField(maxlength = 50) tags = models.ManyToManyField(Tag)
# Get all entries corresponding to tags in the list tag_list entries = Entry.objects.filter(tags__name__in = tag_list)
As per normal SQL, if one or more of the entries in tag_list have no corresponding entry, then that is no problem.
On 5/18/06, Malcolm Tredinnick <malc...@pointy-stick.com> wrote: [...]
I was helped by Malcom on IRC, so the only decent thing to do, is to post the solution here.
I'll try to explain as well as I understand it. First some code:
class Tags(models.Model):
tag = models.CharField(maxlength = 50)
def __str__(self):
return self.tag
class TagManager(models.Manager):
def from_tags(self, tag_list):
id_list = [int(t.id) for t in Tags.objects.filter(tag__in = tag_list)]
all = Contract.objects.extra(tables = ['contracts_tags'], where = [('contracts_tags.contract_id = contracts.id'), ('(SELECT COUNT(*) FROM contracts_tags AS t WHERE t.tags_id IN (%s) AND t.contract_id = contracts.id) = %d' % (','.join([str(tag) for tag in id_list]), len(id_list)))]).distinct()
What happens is that you count how many tags match on each contract ID. Since you only pick out the relevant tags, the count will equal len(tags), if there's a match. This allows you to search for records where there's only specific relations, when data is divided across tables, in this case with a ManyToManyField().
It works fairly well and fast on my somewhat limited set of set, but I think this would scale fine on a larger scale as well.
Thanks malcom!
-- Jesper Nøhr, Information Systems Developer, Opera Software tel: +47-24164348 / cell: +47-46056753 / jesp...@opera.com
On Thu, 2006-05-18 at 12:48 +0200, Jesper Nøhr wrote:
[...]
> I was helped by Malcom on IRC, so the only decent thing to do, is to > post the solution here.
[...]
> You can call it like this:
> tags = ["foo","bar"]
> all = Contract.objects.from_tags(tags)
> What happens is that you count how many tags match on each contract > ID. Since you only pick out the relevant tags, the count will equal > len(tags), if there's a match. This allows you to search for records > where there's only specific relations, when data is divided across > tables, in this case with a ManyToManyField().
> It works fairly well and fast on my somewhat limited set of set, but I > think this would scale fine on a larger scale as well.
For the archives: the problem Jesper was trying to solve was to find all contracts that have both "foo" _and_ "bar" tags (and possibly others, but at least that set), not just contracts that had either "foo" or "bar".