How to get SQL "NOT IN" query from filter()

Michal Ludvig

unread,

Dec 16, 2007, 1:16:10 PM12/16/07

to django...@googlegroups.com

Hi all,

I have a simple model with 'id' as a primary key and I have a list of
IDs that were already processed in variable 'processed_ids'.

How can I retrieve all objects that are *not* in that list?

Something like "Model.objects.filter(id__not_in = processed_ids)".
Unlike filter(id__in=...) which works just fine the __not_in modifier
apparently isn't understood by django.

I hope I won't have to fetch everything and then filter manually in a loop.

Any ideas?

Thanks!

Michal

Tim Chase

unread,

Dec 16, 2007, 1:18:58 PM12/16/07

to django...@googlegroups.com

> How can I retrieve all objects that are *not* in that list?
>
> Something like "Model.objects.filter(id__not_in = processed_ids)".
> Unlike filter(id__in=...) which works just fine the __not_in modifier
> apparently isn't understood by django.

As commented recently on the list, you want "filter()"'s evil
twin "exclude()":

exclude(id_in=...)

http://www.djangoproject.com/documentation/db-api/#exclude-kwargs

-tim

Michal Ludvig

unread,

Dec 16, 2007, 1:34:26 PM12/16/07

to django...@googlegroups.com

Tim Chase wrote:
>> How can I retrieve all objects that are *not* in that list?
>>
>> Something like "Model.objects.filter(id__not_in = processed_ids)".
>> Unlike filter(id__in=...) which works just fine the __not_in modifier
>> apparently isn't understood by django.
>
>
> As commented recently on the list, you want "filter()"'s evil
> twin "exclude()":
>
> exclude(id_in=...)

That's it. Thanks.

Now with the above example if I don't have a list of IDs but a list of
the processed objects I'm not able to pass these as an argument to the
__in operator.

This: Model.objects.filter(...).exclude(id__in = processed)
gives me: Truncated incorrect DOUBLE value: 'Model object'

I have to use id__in=[p.id for p in processed] instead.

Am I doing something wrong again or is it expected behaviour?

Thanks

Michal

Alex Koshelev

unread,

Dec 16, 2007, 2:36:29 PM12/16/07

to Django users

Your objects aren't ids, so you must get ids explicitly.

Tim Chase

unread,

Dec 17, 2007, 1:56:03 PM12/17/07

to django...@googlegroups.com

>> This: Model.objects.filter(...).exclude(id__in = processed)
>> gives me: Truncated incorrect DOUBLE value: 'Model object'

I've seen the above request on the ML several times now.

Is there any hope that, in the QuerySet refactor, the "__in" form
can sniff the datatype of the parameter for other querysets?

Thus, you could have something like

users = Users.objects.filter(...)
things = Foo.objects.filter(owner_id__in=users)

it seems that this would just translate to an IN/EXISTS query,
using the query definition from "users" to build the sub-query.
It's far more efficient to do that all on the server and just
return the correct results, rather than

things = Foo.objects.filter(owner_id__in=[
user.id for user in users])

which would force the entire set of users to be brought back and
translated into a list, and then ship off a potentially-long list
of IDs back in a 2nd query. (Imagine len(users) > REALLY_BIG_NUM)

The pseudo-code as I'd imagine it would be something like

if query_type = "in":
if issubclass(paramenter, QuerySet):
where_clause = 'IN (SELECT ID FROM %s WHERE %s)' % (
param.from_clause(), param.where_clause()
)
else:
where_clause = do_whatever_it_currently_does()

I haven't explored the QS refactor branch, but am looking forward
to its arrival so I can get started on tinkering with the
aggregate code.

-tkc

Malcolm Tredinnick

unread,

Dec 19, 2007, 2:01:03 AM12/19/07

to django...@googlegroups.com

On Mon, 2007-12-17 at 07:56 -0600, Tim Chase wrote:
> >> This: Model.objects.filter(...).exclude(id__in = processed)
> >> gives me: Truncated incorrect DOUBLE value: 'Model object'
>
> I've seen the above request on the ML several times now.
>
> Is there any hope that, in the QuerySet refactor, the "__in" form

No, since I'm not adding every single enhancement possible in that
branch. You're talking about a change that requires nested sub-queries,
which will be easier/possible with that code, but not something I'm
going to implement immediately.

It's also not a showstopper user-case in practice. Even if you do pull
back 10,0000 rows, extract their id values (which you do with
a .values() query the first time) and sent a second query, it isn't that
bad. Databases are fast here in the 21st century. Sometimes it will be a
problem, but for the majority of cases, not so much. Nice behaviour to
have one day, but less important than a number of other issues.