ForeignKey with null=True

Roald de Vries

unread,

Dec 16, 2010, 4:56:39 AM12/16/10

to django-d...@googlegroups.com

Dear all,

I have a model with a foreign key with null=True:

class Many(Model):
ForeignKey(One, null=True)

class One(Model):
pass

Now if I do:

one = One()

... then:

one.many_set.all()

... returns all Many's with 'self.one_id == None'.

From an implementational (SQL) standpoint, this seems logical, but
from an OO standpoint, it surely doesn't. I would expect an empty
queryset as a result, or maybe even some exception. Is the current
behavior a (conscious) choice/feature, or a bug?

Cheers, Roald

Luke Plant

unread,

Dec 16, 2010, 9:47:45 AM12/16/10

to django-d...@googlegroups.com

On Thu, 2010-12-16 at 10:56 +0100, Roald de Vries wrote:

> Now if I do:
>
> one = One()
>
> ... then:
>
> one.many_set.all()
>
> ... returns all Many's with 'self.one_id == None'.
>
> From an implementational (SQL) standpoint, this seems logical, but
> from an OO standpoint, it surely doesn't. I would expect an empty
> queryset as a result, or maybe even some exception. Is the current
> behavior a (conscious) choice/feature, or a bug?

This is related to http://code.djangoproject.com/ticket/14615 - see
Gabriel's comments.

It is basically a mistake to attempt to do 'one.many_set.all()' when
'one' hasn't been saved. We should either leave it as it is (it is the
developers fault), or raise an exception.

For the latter, I think we should use instance._state.adding to reliably
detect whether a PK has been set or not. This could cause some
incompatibilities: in the unusual case of creating an object, setting a
primary key manually to some value that you know to be in the database,
not saving the object, and then using the object to do queries, you will
get exceptions where you didn't before. I think we can regard that
workflow as broken, because:

1) there are other places where using model instances in similar ways
will currently break - the places where we rely on
instance._state.adding to be correct.

2) there are more obvious ways to do queries.

3) the current behaviour is inconsistent with ManyToMany fields -
they already do the checks that ForeignKey lookups are not doing.

There is in fact one test in the test suite that breaks if we change
this - it does Poll().choice_set.all() and expects an empty set. The
reason it does this is to implicitly test null queries, but that sounds
like a broken test to me (testing implicit behaviour rather than
explicit).

I've made a patch and attached to the ticket. It required one fix to the
existing tests in the test suite (in the null_queries tests), the rest
are additions.

While I was there, I fixed _get_next_or_previous_by_FIELD to use
self._state.adding, rather than 'not self.pk', for the similar check
that it does. I also fixed OneToOne fields, and updated ManyToMany
lookups to check _state.adding rather than check the PK field.

If people agree that the existing behaviour is a bug and that workflows
that rely on it are broken, I'll go ahead and commit. Russell closed the
bug as WONTFIX, but I think based on a misunderstanding (I misunderstood
the original report in the same way).

Luke

--
"Outside of a dog, a book is a man's best friend... inside of a
dog, it's too dark to read."

Luke Plant || http://lukeplant.me.uk/

tonnzor

unread,

Dec 16, 2010, 1:10:38 PM12/16/10

to Django developers

My idea was to fix this in a very straight way - change ForeignKey
behavior (the piece of code that generates many_set method) - so it
doesn't take into account null.

Let me explain a bit more. If foreign key is null - then there's no
link between Many and One object. So it many_set should return empty
set disregarding DB state.

Using object state (saved/not saved) makes this thing much more
complicated and have unpredicted not covered areas. If user changed
object PK and didn't save that into DB - that's his decision, we
should not keep the state.

Another edge case is null PK (user used custom field as PK) - that is
simply covered by initial code, but not by object state.

On 16 дек, 17:47, Luke Plant <L.Plant...@cantab.net> wrote:
> On Thu, 2010-12-16 at 10:56 +0100, Roald de Vries wrote:
> > Now if I do:
>
> > one = One()
>
> > ... then:
>
> > one.many_set.all()
>
> > ... returns all Many's with 'self.one_id == None'.
>
> > From an implementational (SQL) standpoint, this seems logical, but
> > from an OO standpoint, it surely doesn't. I would expect an empty
> > queryset as a result, or maybe even some exception. Is the current
> > behavior a (conscious) choice/feature, or a bug?
>

> This is related tohttp://code.djangoproject.com/ticket/14615- see

Luke Plant

unread,

Dec 16, 2010, 2:14:57 PM12/16/10

to django-d...@googlegroups.com

On Thu, 2010-12-16 at 10:10 -0800, tonnzor wrote:
> My idea was to fix this in a very straight way - change ForeignKey
> behavior (the piece of code that generates many_set method) - so it
> doesn't take into account null.
>
> Let me explain a bit more. If foreign key is null - then there's no
> link between Many and One object. So it many_set should return empty
> set disregarding DB state.

This isn't true if the field pointed to (i.e. primary key by default)
allows NULL values - in that case a ForeignKey field with a NULL value
can and should return a non-empty set of values when the related objects
lookup is done.

> Using object state (saved/not saved) makes this thing much more
> complicated and have unpredicted not covered areas. If user changed
> object PK and didn't save that into DB - that's his decision, we
> should not keep the state.

Your point about changing the PK is something I hadn't thought of. It is
much more broadly relevant, actually, since a ForeignKey can point to
*any* field, not just a primary key. (Although in practice I doubt that
happens much).

This means that any time you use a ForeignRelatedObjectsDescriptor on an
object, and the relevant fields that might be pointed to by other data
in the database have not been saved to the database, you can get
incorrect results. So, the 'adding' flag is not enough, since that only
covers the PK.

So, my patch covers probably 99% or more of cases, in that it refuses to
do a query using a PK value of an object that has not yet been added to
the DB. But it doesn't catch the other cases - other fields that have
changed, and the primary key being changed.

The general solution requires tracking all the fields to see whether
they've been saved to the DB or not, which I don't think is worth it for
this edge case

We can either say:

- with the provided patch we catch most of the errors, so let's go
with that.

- leave the code as it is, and just be clear in documentation that
retrieving related objects from the DB when the related information
isn't actually saved to the DB is obviously not going to work.

Christophe Pettus

unread,

Dec 16, 2010, 2:30:32 PM12/16/10

to django-d...@googlegroups.com

On Dec 16, 2010, at 11:14 AM, Luke Plant wrote:
> This isn't true if the field pointed to (i.e. primary key by default)
> allows NULL values - in that case a ForeignKey field with a NULL value
> can and should return a non-empty set of values when the related objects
> lookup is done.

If I'm understanding your point, this isn't true; NULL does not match NULL on a join.

--
-- Christophe Pettus
x...@thebuild.com

Ian Clelland

unread,

Dec 16, 2010, 2:34:27 PM12/16/10

to django-d...@googlegroups.com

I've been bitten by this behaviour before, with code that looked like

an_object.related_object_set.delete()

where the foreign key could be null.

On Thu, Dec 16, 2010 at 11:14 AM, Luke Plant <L.Pla...@cantab.net> wrote:
> On Thu, 2010-12-16 at 10:10 -0800, tonnzor wrote:
>> My idea was to fix this in a very straight way - change ForeignKey
>> behavior (the piece of code that generates many_set method) - so it
>> doesn't take into account null.
>>
>> Let me explain a bit more. If foreign key is null - then there's no
>> link between Many and One object. So it many_set should return empty
>> set disregarding DB state.
>
> This isn't true if the field pointed to (i.e. primary key by default)
> allows NULL values - in that case a ForeignKey field with a NULL value
> can and should return a non-empty set of values when the related objects
> lookup is done.

Why is this necessarily true? NULL is explicitly defined as the
absence of a meaningful value. NULL values are not supposed to compare
as equal to each other (this is why you need to test for "IS NULL" in
SQL, rather than "= NULL")

As well, in SQL, a foreign key join like

select * from A join B on (A.fk = B.pk)

will not join rows where A.fk and B.pk are both NULL. I would expect
to be able to safely perform a query like

delete from A where A.fk = NULL

knowing that no rows would be deleted, and I expected (before I
discovered this behaviour) the Django ORM to do the same.

--
Regards,
Ian Clelland
<clel...@gmail.com>

Gabriel Hurley

unread,

Dec 16, 2010, 4:00:29 PM12/16/10

to django-d...@googlegroups.com

I'm in favor of the patch you've provided, Luke. Raising an exception seems like the best option here to me.

While it *is* a logical error to get the related objects of an unsaved instance, the current behavior seems buggy and (as Ian Clelland points out) it can be potentially harmful.

FWIW, I'm almost certain Russ's original wontfix was based on the same understanding we all had upon first reading the ticket. It was only when I tested it myself that I saw the behavior the OP was really trying to describe.

All the best,

- Gabriel

Luke Plant

unread,

Dec 16, 2010, 5:31:04 PM12/16/10

to django-d...@googlegroups.com

With the SQL currently generated (which uses LEFT OUTER JOIN and an IS
NULL comparison), with appropriate models and values matching the case I
described, you can indeed get objects returned from the manager created
by the ForeignRelatedObjectsDescriptor when you have a foreign key value
to a nullable primary key.

But having said that I'm not sure if its a case we need to worry about
though, as it is pretty rare, and even the reverse descriptor
ReverseSingleRelatedObjectDescriptor does not handle this case - the
code explicitly assumes NULL in a FK value (None in the Python value)
means 'no related object'.

That being so, there is a case for arguing that
ForeignRelatedObjectsDescriptor should not retrieve objects where the
field pointed to is NULL - for consistency with the inverse operation.

So I'm now in two minds. And both minds are tired, so I'm going to bed.

Christophe Pettus

unread,

Dec 16, 2010, 7:20:09 PM12/16/10

to django-d...@googlegroups.com

On Dec 16, 2010, at 2:31 PM, Luke Plant wrote:
> That being so, there is a case for arguing that
> ForeignRelatedObjectsDescriptor should not retrieve objects where the
> field pointed to is NULL - for consistency with the inverse operation.

I agree with this. If the FK field is NULL, it should never return related objects; ditto for the reverse situation.

Carl Meyer

unread,

Dec 17, 2010, 9:21:25 AM12/17/10

to Django developers

Hi Luke,

On Dec 16, 9:47 am, Luke Plant <L.Plant...@cantab.net> wrote:
> For the latter, I think we should use instance._state.adding to reliably
> detect whether a PK has been set or not. This could cause some

[snip]

> While I was there, I fixed _get_next_or_previous_by_FIELD to use
> self._state.adding, rather than 'not self.pk', for the similar check
> that it does. I also fixed OneToOne fields, and updated ManyToMany
> lookups to check _state.adding rather than check the PK field.
>
> If people agree that the existing behaviour is a bug and that workflows
> that rely on it are broken, I'll go ahead and commit. Russell closed the
> bug as WONTFIX, but I think based on a misunderstanding (I misunderstood
> the original report in the same way).

This makes me a little bit uneasy. Before we haphazardly sneak
instance._state.adding into more and more places, I think we need to
have a broader discussion about the desired semantics of insert vs
update in the ORM.

Historically, Django has avoided tracking any add-vs-update state on
model instances. Insert-vs-update was (and is) decided at save-time,
by checking whether the PK exists in the DB. This works fine, although
it requires an extra SELECT query at every save.

Other code (in Django, or third-party) that needed to know whether an
instance was "new" would check for "self.pk is None" (including
ManyToManyField, as you note). This idiom is broken with CharField PKs
whose value is set explicitly on new instances before saving, but
apparently that's never been enough of a problem to motivate anyone to
fix it.

Instance._state.adding is new in 1.2 (in 1.2 it was instance._adding
and only ever monkeypatched onto a model instance by ModelForm; I
changed it to instance._state.adding and encapsulated it inside the
ORM just a few weeks ago in r14612). It was added as part of model-
validation, in order to allow uniqueness-validation of PKs to work
correctly in the very same edge cases (explicitly-set CharField PK)
that generally break the "pk is None" idiom. Currently it is used only
by uniqueness validation checks, nowhere else in the ORM.

It bothers me that we now have two parallel systems in the ORM for
deciding whether an instance is "new," and no real rhyme or reason to
when one is used vs the other. I don't want to exacerbate that
situation.

In working on r14612 I spent hours looking for a way to get rid of
instance._adding entirely, but concluded that it was not possible
without breaking uniqueness validation on explicit char PKs.

ISTM that we should either keep the scope of instance._state.adding as
limited as possible (i.e. avoid introducing new uses of it), if we're
satisfied with the status quo and the limitations of "pk is None," or
we should consider fully embracing instance._state.adding as the
endorsed way to determine whether a model instance is new or saved,
and banishing the "pk is None" idiom from Django's codebase. The
latter option could potentially include even changing the behavior of
Model.save() to use instance._state.adding rather than performing the
extra SELECT query, though the backwards-compatibility implications in
edge cases there might be prohibitive.

Instance._state.adding is set to True by default on new model
instances, and set to False anywhere the ORM generates a model
instance from a database query. Using instance._state.adding implies
that we do not support manually creating a model instance and setting
its PK to a value known to exist in the database, as a way to force an
update of that row (doing this already will break uniqueness
validation, unless you also manually tweak instance._state.adding to
False). On the upside, using instance._state.adding provides better
support for explicitly-set PKs than "pk is None," because it doesn't
overload the meaning of the PK field with new/not-new information.

On the whole, I think instance._state.adding has some real advantages
over "pk is None," and I'm even intrigued by the possibility of using
it to avoid the extra SELECT query at save-time. But if we're going to
make this change, let's be fully clear about the change we're making,
and make it consistently across the codebase.

Carl

Carl Meyer

unread,

Dec 17, 2010, 9:27:40 AM12/17/10

to Django developers

On Dec 17, 9:21 am, Carl Meyer <carl.j.me...@gmail.com> wrote:
> Instance._state.adding is new in 1.2 (in 1.2 it was instance._adding
> and only ever monkeypatched onto a model instance by ModelForm; I
> changed it to instance._state.adding and encapsulated it inside the
> ORM just a few weeks ago in r14612).

Oh - and r14613.

Carl

Tobias McNulty

unread,

Dec 17, 2010, 9:43:03 AM12/17/10

to django-d...@googlegroups.com

On Dec 16, 2010 7:20 PM, "Christophe Pettus" <x...@thebuild.com> wrote:
>
>
> On Dec 16, 2010, at 2:31 PM, Luke Plant wrote:
> > That being so, there is a case for arguing that
> > ForeignRelatedObjectsDescriptor should not retrieve objects where the
> > field pointed to is NULL - for consistency with the inverse operation.
>
> I agree with this. If the FK field is NULL, it should never return related objects; ditto for the reverse situation.