single queryset from multiple tables

576 views
Skip to first unread message

veearrsix

unread,
Apr 10, 2009, 4:40:38 PM4/10/09
to Django users
This question seems to have been asked a few times, but never answered
fully.

I am looking to query more than one table, but return the results to a
template as one queryset ordered by date.

The reason i'm doing this is to create a single queryset including
information from a blog table, twitter table and delicious table to be
displayed as a sort of log of my online activity sorted by date.

the blog post model:

class Post(models.Model):
slug = models.SlugField(unique=True)
tags = TagField()
title = models.CharField(max_length=80)
pubdate = models.DateTimeField(default=datetime.datetime.now)
modifieddate = models.DateTimeField(auto_now=True)
body = models.TextField()
draft = models.BooleanField(default=True)
closed = models.BooleanField()

the delicious model (taken from syncr app)

class Bookmark(models.Model):
# description, href, tags, extended, dt
description = models.CharField(max_length=250, blank=True)
url = models.URLField(unique=True)
tags = TagField()
extended_info = models.TextField(blank=True)
post_hash = models.CharField(max_length=100)
saved_date = models.DateTimeField()

the twitter model (taken from syncr app)

class Tweet(models.Model):
pub_time = models.DateTimeField()
twitter_id = models.PositiveIntegerField()
text = models.TextField()
user = models.ForeignKey('TwitterUser')


I'd like to get these into one queryset, possibly including a field
that indicates what type of listing is being displayed. Obviously I
dont need all of the fields from each of the models, just enough to
display the title and a possible link.

Thanks in adance, hope someone can help me out, I don't know where to
start with this.

Alex Gaynor

unread,
Apr 10, 2009, 4:44:08 PM4/10/09
to django...@googlegroups.com
There is no way to get a single QuerySet with results from multiple models.  What you want to do is best approximated like this:

from itertools import chain

sorted(chain(Model1.obejcts.all(), Model2.objects.all()), key=lambda o: o.pub_date)


Alex
--
"I disapprove of what you say, but I will defend to the death your right to say it." --Voltaire
"The people's good is the highest law."--Cicero

Malcolm Tredinnick

unread,
Apr 10, 2009, 7:40:59 PM4/10/09
to django...@googlegroups.com
On Fri, 2009-04-10 at 16:44 -0400, Alex Gaynor wrote:
>
>
> On Fri, Apr 10, 2009 at 4:40 PM, veearrsix <stu...@googlemail.com>
> wrote:
>
> This question seems to have been asked a few times, but never
> answered
> fully.

Not quite true. It's been answered fully a lot of times: querysets are
iterators, so you have multiple sorted iterators and combining them into
a single sorted iterator is just a merge sort.

Creating a single queryset is almost always not the real problem people
are trying to solve. Which is fortunate, since it doesn't make sense on
a definitional level: a queryset is the result of filtering a set of
model results, not combining arbitrary SQL. Combining multiple querysets
into a single result that can be iterated through is the typical problem
trying to be solved.

[...]


>
> There is no way to get a single QuerySet with results from multiple
> models. What you want to do is best approximated like this:
>
> from itertools import chain
>
> sorted(chain(Model1.obejcts.all(), Model2.objects.all()), key=lambda
> o: o.pub_date)

That's a bit crufty, since it pulls everything into memory immediately.
Not cool with a few tens of thousands items in the result set (creating
all those Python objects is noticeable).

Merge or heap sort is your friend here. Sure, us old guys have the
advantage here, because that stuff was very standard back in the age of
dinosaurs when RAM was scarcer and hard- and tape-drives were slower, so
random access wasn't always an option.

Particularly with iterators, storing the (next head item, rest of
iterator) pair in a heap leads to a very neat and fast way to create a
combined iterator. Here's one professional-quality implementation that I
would recommend: http://code.activestate.com/recipes/491285/

Regards,
Malcolm

Alex Gaynor

unread,
Apr 10, 2009, 7:44:34 PM4/10/09
to django...@googlegroups.com

It has one unfortunate failing(which is the fault of the heapq library), it doesn't take a key/cmp function like sorted does.  It is otherwise completely excellent though.
 

Regards,
Malcolm



Malcolm Tredinnick

unread,
Apr 10, 2009, 7:52:31 PM4/10/09
to django...@googlegroups.com
On Fri, 2009-04-10 at 19:44 -0400, Alex Gaynor wrote:
>
>
> On Fri, Apr 10, 2009 at 7:40 PM, Malcolm Tredinnick
> <mal...@pointy-stick.com> wrote:
[...]

>
> Particularly with iterators, storing the (next head item, rest
> of
> iterator) pair in a heap leads to a very neat and fast way to
> create a
> combined iterator. Here's one professional-quality
> implementation that I
> would recommend: http://code.activestate.com/recipes/491285/
>
> It has one unfortunate failing(which is the fault of the heapq
> library), it doesn't take a key/cmp function like sorted does. It is
> otherwise completely excellent though.

It's Python and so is the heap module! Modifying things is easy. I've
used the same solution with custom comparison functions one a number of
occasions. Even writing a small heap implementaiton with a custom
comparison is only about two dozen lines. Come on, think a little bit
outside the box!

Regards,
Malcolm

Alex Gaynor

unread,
Apr 10, 2009, 7:55:40 PM4/10/09
to django...@googlegroups.com

I'm not!  A heap sort is a great solution to this problem(but you already knew that), it's just that the implementation in the python stdlib doesn't work perfectly for this usecase(indeed I filed a ticket about this upsteam several months ago, for reference it was rejected).  I did the same thing you suggested for a blog post on this topic a number of months ago, however it wasn't nearly as optimial as this(I believe you even commented to that effect on the post).
 

Regards,
Malcolm





veearrsix

unread,
Apr 14, 2009, 10:50:18 AM4/14/09
to Django users
Thanks for the help so far guys, I've used that recipe suggested by
Malcolm, where by I pass a number of querysets into the method. It
works as expected I guess, BUT, I would like to be able to sort by the
datetime field from each queryset, however the fieldnames for each of
the date time fields from each queryset are different. Is there a way
around this, should I actually be looking at a custom sql query?

On Apr 11, 12:55 am, Alex Gaynor <alex.gay...@gmail.com> wrote:
> On Fri, Apr 10, 2009 at 7:52 PM, Malcolm Tredinnick <

Alex Gaynor

unread,
Apr 14, 2009, 10:53:52 AM4/14/09
to django...@googlegroups.com
Well, if you use some sorting mechanism that takes a key function(like the one sorted takes) you could write one up that's something like:

DATE_FIELD_MAPPING = {
    Model1: 'date',
    Model2: 'pubdate',
}

def my_key_func(obj):
    return getattr(obj, DATE_FIELD_MAPPING[type(obj)])

And then sorted(chain(Model1.objects.all(), Model2.objects.all()), key=my_key_func)

Or something similar using the heap sort method malcolm and I discussed.

veearrsix

unread,
Apr 15, 2009, 8:49:08 AM4/15/09
to Django users
Thanks guys that was perfect

On Apr 14, 3:53 pm, Alex Gaynor <alex.gay...@gmail.com> wrote:

DG

unread,
Sep 15, 2017, 1:09:50 PM9/15/17
to Django users
After reading this thread and answers on SO (e.g. How to combine 2 or more querysets in a Django view?) I'm still not sure if this code is fully lazy with Django 1.11+ or if it is going to load everything into memory:

qs1 = Model_A.objects.filter(foofield='foo').order_by('created_at')
qs2
= Model_B.objects.filter(barfield='bar').order_by('created_at')

paginator
= Paginator(sorted(chain(qs1, qs2),
                             
key=lambda i:i.created_at
                           
),
                     
50)


If the querysets each referred to 500k results, would this blow up?

Michal Petrucha

unread,
Sep 18, 2017, 9:10:23 AM9/18/17
to django...@googlegroups.com
On Fri, Sep 15, 2017 at 07:26:27AM -0700, DG wrote:
> After reading this thread and answers on SO (e.g. How to combine 2 or more
> querysets in a Django view?
> <https://stackoverflow.com/questions/431628/how-to-combine-2-or-more-querysets-in-a-django-view?rq=1>)
> I'm still not sure if this code is fully lazy with Django 1.11+ or if it is
> going to load everything into memory:
>
> qs1 = Model_A.objects.filter(foofield='foo').order_by('created_at')
> qs2 = Model_B.objects.filter(barfield='bar').order_by('created_at')
>
> paginator = Paginator(sorted(chain(qs1, qs2),
> key=lambda i:i.created_at
> ),
> 50)
>
>
> If the querysets each referred to 500k results, would this blow up?

Yes.*

* Unless you have a very powerful application server that can hold a
copy of your entire databse in memory, and efficiently process it.

What would happen is that the sorted() builtin function consumes the
entire iterator you pass in, which pulls all the matching results from
qs1, and then qs2, which will build model instances for all of them,
and sorts the combined list in memory. Then you'd be taking the result
of that sorting routine, and pass that to the paginator, which would
discard all but 50 items from the potentially huge list of model
instances.

I don't think there's a way to not do this with the sorting that is
built-in in Python, because it's always eager.

Cheers,

Michal
signature.asc

Matthew Pava

unread,
Sep 18, 2017, 9:27:28 AM9/18/17
to django...@googlegroups.com
I'm assuming that Model_A and Model_B have some similar columns if you are going to be showing them as an index view. It would seem to me that you need to reconsider your model design. Perhaps you can pull out what is common between them and create another model that they both inherit from or create a ForeignKey or OneToOne to, Model_C. Then you can use Model_C in your paginator design.
--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/20170918130933.GP8762%40koniiiik.org.
For more options, visit https://groups.google.com/d/optout.
Reply all
Reply to author
Forward
0 new messages