Finding entries with multiple tags


joeto...@gmail.com

Aug 26, 2006, 7:02:08 PM
to Django users
Hello -

I'm playing around with a blog application and I want it to have a
tagging feature. I have the usual ManyToMany relationship set up with
Post and Tag (I'm not going to paste my model since I think everyone is
familiar with this kind of setup).

I have three Posts tagged as follows:

Post1: articles, python, django
Post2: articles, python
Post3: articles, music

Essentially, I would like to be able to go to a URL like:

http://blog/tag/articles/python/django

And get Post1 as a result.

Likewise, I'd like to go to:

http://blog/tag/articles/python

And get Post1 and Post2 as a result.

I have a URL entry set up as follows:

(r'^blog/tag/(.*)', 'ssrc.blog.views.tags'),

I have my tags view set up like so:

def tags(request, url=""):
    if url:
        posts = {}
        for post in Post.objects.all():
            for tag in [part for part in url.split("/") if part != ""]:
                try:
                    post.tags.get(name__exact=tag)
                    posts[post.id] = post
                except:
                    posts[post.id] = None
        results = [post for post in posts.values() if post != None]
        if len(results) == 0:
            raise Http404
        else:
            return render_to_response("blog/tags.html", {
                'posts': results,
            })

This works. My question is just if there's a better way to accomplish
this. I realize that it'd be a lot cleaner if I just used hierarchical
categories. However, I'd like to keep this as flexible as possible by
using simple tags.

I apologize if this might have been covered elsewhere -- I did my best
to search the site and archives ahead of time, but I could have missed
something.

Thanks!
-Joe

Gary Wilson

Aug 28, 2006, 1:19:09 AM
to Django users
joeto...@gmail.com wrote:
> Hello -
>
> I'm playing around with a blog application and I want it to have a
> tagging feature. I have the usual ManyToMany relationship set up with
> Post and Tag (I'm not going to paste my model since I think everyone is
> familiar with this kind of setup).
>
> I have three Posts tagged as follows:
>
> Post1: articles, python, django
> Post2: articles, python
> Post3: articles, music
>
> Essentially, I would like to be able to go to a URL like:
>
> http://blog/tag/articles/python/django
>
> And get Post1 as a result.
>
> Likewise, I'd like to go to:
>
> http://blog/tag/articles/python
>
> And get Post1 and Post2 as a result.
>
> I have a URL entry set up as follows:
>
> (r'^blog/tag/(.*)', 'ssrc.blog.views.tags'),

Using .* in your URL regexes is not recommended; you only want to match
what you absolutely need to match. Something like the following would
be better:

(r'^tags/(?P<url>([a-zA-Z0-9-]+/)+)$', 'ssrc.blog.views.tags')

This would only allow one or more tags with each tag consisting of one
or more of the characters a-z, A-Z, 0-9, and/or hyphens. The (?P<url>
snip ) part creates a named group, so a parameter named url will be
passed to your view.
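
For what it's worth, the pattern can be tried outside Django with Python's standard re module. This is only a sketch: the sample paths and the helper name captured_tags are made up for illustration, and it assumes Django matches the pattern against the request path with the leading slash removed.

```python
import re

# The pattern from the URLconf entry above, compiled on its own.
TAG_URL = re.compile(r'^tags/(?P<url>([a-zA-Z0-9-]+/)+)$')

def captured_tags(path):
    """Return the list of tags in `path`, or None if the pattern rejects it."""
    match = TAG_URL.match(path)
    if match is None:
        return None
    # The named group always ends in "/", so the last split item is empty.
    return match.group('url').split('/')[:-1]

print(captured_tags('tags/articles/python/'))   # ['articles', 'python']
print(captured_tags('tags/articles/py thon/'))  # None (a space is not allowed)
print(captured_tags('tags/'))                   # None (at least one tag is required)
```

Note that the pattern only matches when the path ends in a slash, which is why the last item of the split is always empty.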

> I have my tags view set up like so:
>
> def tags(request, url=""):
>     if url:
>         posts = {}
>         for post in Post.objects.all():
>             for tag in [part for part in url.split("/") if part != ""]:
>                 try:
>                     post.tags.get(name__exact=tag)
>                     posts[post.id] = post
>                 except:
>                     posts[post.id] = None
>         results = [post for post in posts.values() if post != None]
>         if len(results) == 0:
>             raise Http404
>         else:
>             return render_to_response("blog/tags.html", {
>                 'posts': results,
>             })
>
> This works. My question is just if there's a better way to accomplish
> this. I realize that it'd be a lot cleaner if I just used hierarchical
> categories. However, I'd like to keep this as flexible as possible by
> using simple tags.

This could be very much simplified now, and with the regex I mentioned
above, the view would turn into something like:

def tags(request, url):
    # Don't need the last item in the list since it will
    # always be an empty string since Django will append
    # a slash character to the end of URLs by default.
    tags = url.split('/')[:-1]
    posts = Post.objects.filter(tags__name__in=tags)
    return render_to_response("blog/tags.html", {'posts': posts})

joeto...@gmail.com

Aug 28, 2006, 8:21:35 AM
to Django users
> Using .* in your URL regexes is not recommended, you only want to match
> what you absolutely need to match. Something like the following would
> be better:
>
> (r'^tags/(?P<url>([a-zA-Z0-9-]+/)+)$', 'ssrc.blog.views.tags')

OK, cool. Thanks!

> This could be very much simplified now, and with the regex I mentioned
> above, the view would turn into something like:
>
> def tags(request, url):
>     # Don't need the last item in the list since it will
>     # always be an empty string since Django will append
>     # a slash character to the end of URLs by default.
>     tags = url.split('/')[:-1]
>     posts = Post.objects.filter(tags__name__in=tags)
>     return render_to_response("blog/tags.html", {'posts': posts})

If I'm not mistaken, __in will return results that aren't tagged by all
tags. So using the original example:

Post1: articles, python, django
Post2: articles, python
Post3: articles, music

and tags has [articles, python, django], all 3 posts will be returned
since IN just OR's the values together, correct? That's why I came up
with that mess of a loop.
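
To make the difference concrete, here is a small plain-Python sketch (no Django involved; the dictionary simply restates the example above, and the function names are made up) contrasting "any tag" matching with "all tags" matching:

```python
# Tag sets copied from the example in this thread.
POSTS = {
    'Post1': {'articles', 'python', 'django'},
    'Post2': {'articles', 'python'},
    'Post3': {'articles', 'music'},
}

def match_any(wanted):
    """Posts carrying at least one wanted tag (what an IN lookup gives)."""
    return sorted(name for name, tags in POSTS.items() if tags & set(wanted))

def match_all(wanted):
    """Posts carrying every wanted tag (what the URL scheme calls for)."""
    return sorted(name for name, tags in POSTS.items() if set(wanted) <= tags)

wanted = ['articles', 'python', 'django']
print(match_any(wanted))  # ['Post1', 'Post2', 'Post3']
print(match_all(wanted))  # ['Post1']
```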

Gary Wilson

Aug 28, 2006, 11:59:20 PM
to Django users
joeto...@gmail.com wrote:
> > def tags(request, url):
> >     # Don't need the last item in the list since it will
> >     # always be an empty string since Django will append
> >     # a slash character to the end of URLs by default.
> >     tags = url.split('/')[:-1]
> >     posts = Post.objects.filter(tags__name__in=tags)
> >     return render_to_response("blog/tags.html", {'posts': posts})
>
> If I'm not mistaken, __in will return results that aren't tagged by all
> tags. So using the original example:
>
> Post1: articles, python, django
> Post2: articles, python
> Post3: articles, music
>
> and tags has [articles, python, django], all 3 posts will be returned
> since IN just OR's the values together, correct? That's why I came up
> with that mess of a loop.

Yes, you are right. I was not thinking straight. Anyone know what the
best method for performing this in SQL would be? Select all posts for
each tag and use intersect?
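
One way to read the INTERSECT idea, sketched with Python's built-in sqlite3 module rather than Django. The table and column names (blog_post, blog_post_tags, post_id, tag_id) and the helper name are assumptions made for the illustration, and the tag name is used directly as the key for brevity; a real schema will differ.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE blog_post (id INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE blog_post_tags (post_id INTEGER, tag_id TEXT);
    INSERT INTO blog_post VALUES (1, 'Post1'), (2, 'Post2'), (3, 'Post3');
    INSERT INTO blog_post_tags VALUES
        (1, 'articles'), (1, 'python'), (1, 'django'),
        (2, 'articles'), (2, 'python'),
        (3, 'articles'), (3, 'music');
""")

def posts_with_all_tags(tags):
    """Intersect one SELECT per tag; only posts present in every one survive."""
    if not tags:
        return []
    per_tag = "SELECT post_id FROM blog_post_tags WHERE tag_id = ?"
    sql = (
        "SELECT title FROM blog_post WHERE id IN ("
        + " INTERSECT ".join([per_tag] * len(tags))
        + ") ORDER BY id"
    )
    return [row[0] for row in conn.execute(sql, tags)]

print(posts_with_all_tags(['articles', 'python']))            # ['Post1', 'Post2']
print(posts_with_all_tags(['articles', 'python', 'django']))  # ['Post1']
```

The query grows by one SELECT per tag, which is fine for a handful of tags in a URL but may not be the most efficient form on every database.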

DavidA

Aug 29, 2006, 9:24:42 AM
to Django users

With ManyToMany relationships, you have to think of chasing the
relationship backwards. Instead of finding the posts with a given tag,
start with the tag and find the related posts:

>>> from danet.blog.models import Post, Tag
>>> t = Tag.objects.get(pk='programming')
>>> t.post_set.all()
[<Post: ASP.NET 2.0>, <Post: Code Highlighting>]
>>>

My models look like this:

class Tag(models.Model):
    ...
class Post(models.Model):
    tags = models.ManyToManyField(Tag)
    ...

Thus Post.tags is the set of tags for a given post and the reverse
relationship is created automatically as Tag.post_set (you can override
this with the 'related_name' arg to something like 'posts' if you
like).

-Dave

joeto...@gmail.com

Aug 29, 2006, 10:05:40 AM
to Django users
> > Yes, you are right. I was not thinking straight.

Not a problem. Help is always appreciated!

> > Anyone know what the
> > best method for performing this in SQL would be? Select all posts for
> > each tag and use intersect?
>
> With ManyToMany relationships, you have to think of chasing the
> relationship backwards. Instead of finding the posts with a given tag,
> start with the tag and find the related posts:
>
> >>> from danet.blog.models import Post, Tag
> >>> t = Tag.objects.get(pk='programming')
> >>> t.post_set.all()
> [<Post: ASP.NET 2.0>, <Post: Code Highlighting>]
> >>>

That works easily when you're just looking up one Tag. What I'm trying
to figure out is the best way to search for multiple tags and return
only the Posts common to all of those tags:

>>> from danet.blog.models import Post, Tag
>>> t = Tag.objects.get(pk='programming')
>>> t2 = Tag.objects.get(pk='ASP')
>>> t.post_set.all()
[<Post: ASP.NET 2.0>, <Post: Code Highlighting>]
>>> t2.post_set.all()
[<Post: ASP.NET 2.0>]

So I guess now the question is how to generate a list that contains
items only contained in each of the tag lists. I'm sure some simple
python can take care of that.
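
The "simple Python" could look something like the sketch below. It works on lists of hashable items; with Django the items would need to be something comparable such as post ids, which is an assumption here, and the sample ids and the function name are invented for illustration.

```python
def common_items(lists):
    """Return the items present in every list, in the order of the first list."""
    if not lists:
        return []
    shared = set(lists[0]).intersection(*lists[1:])
    return [item for item in lists[0] if item in shared]

# Hypothetical post ids returned for two tags.
programming_ids = [1, 6]
asp_ids = [1]
print(common_items([programming_ids, asp_ids]))  # [1]
```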

DavidA

Aug 29, 2006, 1:45:38 PM
to Django users
joeto...@gmail.com wrote:
> That works easily when you're just looking up one Tag. What I'm trying
> to figure out is the best way to search for multiple tags and return
> only the Posts common to all of those tags:

Joe,

My bad. I misunderstood your question. I think the only way to do this
(in SQL) is with a subselect or with multiple joins to the same table,
neither of which is possible in the direct DB API. But you can use the
extra() method to get around some of the constraints.

To solve your specific problem, I started with the custom SQL:

select slug from blog_post
where id in (select post_id from blog_post_tags
             where tag_id in ('programming', 'finance'))

This can be implemented in Django as:

>>> from danet.blog.models import Post, Tag
>>> Post.objects.all().extra(where=["blog_post.id in (select post_id from blog_post_tags where tag_id in ('programming', 'finance'))"])
[<Post: Probability and Expectation>, <Post: ASP.NET 2.0>, <Post: Code Highlighting>]
>>>

So I've just copied the where-clause from the SQL (and explicitly
qualified the id column with 'blog_post.id').

It's debatable whether this is an improvement over direct SQL. You could
just as easily do:

>>> from django.db import connection
>>> cursor = connection.cursor()
>>> cursor.execute("select post_id from blog_post_tags where tag_id in ('programming', 'finance')")
3L
>>> Post.objects.in_bulk([row[0] for row in cursor.fetchall()])
{16L: <Post: Probability and Expectation>, 1L: <Post: ASP.NET 2.0>, 6L: <Post: Code Highlighting>}
>>>

In either case you've got about the same amount of custom SQL in your
Python code so it's really a six-of-one situation.
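
Note that the IN subquery above matches posts carrying either tag. A variant that requires every tag is to group the join table by post and count the matches. The helper below is a hypothetical sketch (the name all_tags_where is not part of Django) that builds such a where-clause with %s placeholders, in the form that extra(where=[...], params=[...]) takes. It assumes each (post_id, tag_id) pair occurs at most once in blog_post_tags.

```python
def all_tags_where(tags):
    """Build (where, params) selecting posts that carry every tag in `tags`."""
    # Drop duplicate tags (keeping order) so the count comparison stays valid.
    tags = list(dict.fromkeys(tags))
    if not tags:
        raise ValueError("need at least one tag")
    placeholders = ", ".join(["%s"] * len(tags))
    where = (
        "blog_post.id in (select post_id from blog_post_tags "
        "where tag_id in (%s) "
        "group by post_id having count(*) = %d)" % (placeholders, len(tags))
    )
    return where, tags

where, params = all_tags_where(['programming', 'finance'])
print(where)
print(params)
# Intended use (untested against a live project):
#   Post.objects.extra(where=[where], params=params)
```

Passing the tag names as params rather than pasting them into the SQL string also leaves the quoting to the database backend.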
-Dave

Malcolm Tredinnick

Aug 29, 2006, 2:38:51 PM
to django...@googlegroups.com
On Tue, 2006-08-29 at 14:05 +0000, joeto...@gmail.com wrote:
> > > Yes, you are right. I was not thinking straight.
>
> Not a problem. Help is always appreciated!
>
> > > Anyone know what the
> > > best method for performing this in SQL would be? Select all posts for
> > > each tag and use intersect?
> >
> > With ManyToMany relationships, you have to think of chasing the
> > relationship backwards. Instead of finding the posts with a given tag,
> > start with the tag and find the related posts:
> >
> > >>> from danet.blog.models import Post, Tag
> > >>> t = Tag.objects.get(pk='programming')
> > >>> t.post_set.all()
> > [<Post: ASP.NET 2.0>, <Post: Code Highlighting>]
> > >>>
>
> That works easily when you're just looking up one Tag. What I'm trying
> to figure out is the best way to search for multiple tags and return
> only the Posts common to all of those tags:

I am really close to finishing the rewrite work necessary to make this
easy. It's a bug that it doesn't work already. You should be able to
filter using

Post.objects.filter(tag = 'django').filter(tag = 'python')

and have it return Post instances that have both 'django' and
'python' (and possibly other tags) associated with them. It sounds like
this is what you are after. Right now, like I said, it's a bug that this
doesn't already work (since we say that concatenating filters should act
like "and"-ing them together). I'm going to get back to doing some
Django core dev work this week and this is top of my list.

In the interim, you might like to try this solution, which also works,
but is a little fiddlier:
http://www.pointy-stick.com/blog/2006/06/14/custom-sql-django/

(There have been other solutions to the same problem posted on this
list, too).

Cheers,
Malcolm


Chris Kelly

Aug 29, 2006, 3:22:22 PM
to django...@googlegroups.com
agreed with Malcolm on this one:

you should be splitting the url string at the top (removing the last blank entry) to get the array of tags.

from there you can just iterate over the tags array until you hit the end in a for loop.

then in that for loop, just call a filter on the queryset, using the current tag as the param in the filter call.

(note, I am not near a django install at the moment, so this is all untested, ymmv)

tags = url.split("/")[:-1]

posts = Post.objects.all()

for tag in tags:
    # filter() returns a new queryset, so keep the result
    posts = posts.filter(posttag = tag)

return render_to_response("blog/tags.html", {'posts': posts})

if the filter call isn't working correctly, that's another story :)

good luck!

-C

joeto...@gmail.com

Aug 29, 2006, 3:52:05 PM
to Django users
> I am really close to finishing the rewrite work necessary to make this
> easy. It's a bug that it doesn't work already. You should be able to
> filter using
>
> Post.objects.filter(tag = 'django').filter(tag = 'python')

Yes! This is what I tried before but noticed it did not work (the
second filter would always return nothing). It's good to know that this
is a bug and that my thinking was correct. Thanks for clarifying this.

> (There have been other solutions to the same problem posted on this
> list, too).

I apologize for not noticing the other solutions. I searched the
archive with keywords that I thought would turn something up, but
couldn't find anything.

And everyone else, thank you for your input, too. My solution for now
will be to keep the current Python code (as mentioned by Dave, it's
a tradeoff between looping in Python and custom SQL) until filter is fixed.

I appreciate everyone's input!

robbie

Sep 5, 2006, 9:27:52 PM
to Django users
Malcolm Tredinnick wrote:
> I am really close to finishing the rewrite work necessary to make this
> easy. It's a bug that it doesn't work already.

Is there a Django ticket for this bug? I think it's also in the way for
something I'm trying to do, so it would be good to track its
progress...

(I promise I looked first, but I didn't find anything that looked
pertinent)

robbie

Malcolm Tredinnick

Sep 5, 2006, 9:52:35 PM
to django...@googlegroups.com

Not explicitly, no. There are about half a dozen tickets that the
rewrite impacts. I have a list of them on another machine so that we can
close them when everything's done.

The progress is really blocking on me not sucking quite so much and
finding more time. When I'm finished and the changes have been reviewed,
we'll make an announcement here for sure, since it affects a bunch of
stuff (in a positive way).

Cheers,
Malcolm
