changing the on_delete=CASCADE default

Showing 1-12 of 12 messages
changing the on_delete=CASCADE default Carl Meyer 9/26/13 11:16 AM
Hi all,

I filed a ticket last week suggesting that we change the default from
on_delete=CASCADE to on_delete=SET_NULL for nullable ForeignKeys.[1]

There's been some lively discussion on the ticket, and Michael Manfre
has suggested that we should instead transition to making on_delete a
required argument with no default. This forces the developer to think
about what deletion behavior they want, and means that there will never
be cascading data loss of any kind unless explicitly requested.

I think either of these changes, but particularly the latter, is
significant enough that it deserves a mention here before a decision is
made.

Any thoughts or arguments that haven't already been presented on the ticket?

Carl


[1] https://code.djangoproject.com/ticket/21127
Re: changing the on_delete=CASCADE default Xof 9/26/13 2:17 PM

On Sep 26, 2013, at 11:16 AM, Carl Meyer <ca...@oddbird.net> wrote:
> I think either of these changes, but particularly the latter, is
> significant enough that it deserves a mention here before a decision is
> made.

It's a breaking change, so that's going to be a significant amount of upgrade work for existing applications.

I also think we *really* need to push execution of this functionality into the database rather than having the Django core do it, if we're going to be making more use of on_delete.

--
-- Christophe Pettus
   x...@thebuild.com

Re: changing the on_delete=CASCADE default Carl Meyer 9/26/13 2:32 PM
On 09/26/2013 03:17 PM, Christophe Pettus wrote:
> It's a breaking change, so that's going to be a significant amount of
> upgrade work for existing applications.

Yes, it would certainly require a deprecation path, so people would have
the course of several Django versions to update their code.

> I also think we *really* need to push execution of this functionality
> into the database rather than having the Django core do it, if we're
> going to be making more use of on_delete.

We already provide the on_delete=DO_NOTHING option for people who want
to push cascade handling to the database. Making on_delete a required
argument might encourage more people to consider this option, since
they'll have to actually consider the issue in the first place.

I don't think it's feasible to push all on_delete handling to the
database, for several reasons:

1) The on_delete API allows you to execute arbitrary Python code at
deletion time, via the SET() option, and to access the Python-level
field default (which can also be callable), via the SET_DEFAULT option.
These can't be pushed to the database, so we'd have to remove
functionality to do this.

2) Even if we wanted to just switch the CASCADE option to be executed in
the database by default, this would have significant
backwards-compatibility impacts in terms of signals not being fired for
related objects, related objects in memory not being updated, etc.
You're welcome to take a stab at this, but my feeling is that it's not
worth it; better for those people who actually need the efficiency of
db-level cascades to opt into those changes explicitly via
on_delete=DO_NOTHING.

Carl
Re: changing the on_delete=CASCADE default Anssi Kääriäinen 9/26/13 2:38 PM
On Friday, September 27, 2013 12:17:26 AM UTC+3, Xof wrote:
I also think we *really* need to push execution of this functionality into the database rather than having the Django core do it, if we're going to be making more use of on_delete.

Do you mean that cascading deletes should be handled by the DB, or just modifying the DB constraints to match on_delete (that is, use foreign key on delete cascade/set null/...)?

Doing cascades automatically in the DB will not work due to how signals are handled. However, adding a new option to on_delete, something like on_delete=DB_CASCADE could work. Just document that you wont get delete signals for the cascades in this case. But you get a lot faster deletion code, and automatically get constraints created in the DB with on delete cascade option. Maybe there could even be safety check for signals, Django could spot if you have delete signals registered for possible cascades.

 - Anssi
Re: changing the on_delete=CASCADE default charettes 9/26/13 3:02 PM
I think it would make a lot of sense to require an explicit `on_delete` when
a ForeignKey is nullable because of the ambiguity concerns raised on Trac.

However I'm not convinced this should be required for non-nullable ones
since, IMHO, CASCADE is the most sensible default in this case.
Re: changing the on_delete=CASCADE default Xof 9/26/13 3:28 PM

On Sep 26, 2013, at 2:32 PM, Carl Meyer <ca...@oddbird.net> wrote:
> We already provide the on_delete=DO_NOTHING option for people who want
> to push cascade handling to the database.

It's better than the previous situation, but the steps required to make this work make it a non-starter for any but the most trivial of projects.  I do, however, accept that we're painted into a corner with the current API.

I would strongly advocate for a way of doing this push, however: It's much more efficient for cascading without exotic additions such as signals.  The current way one has to do it has several problems:

1. You are, in essence, lying in your model about what is going to happen, by saying on_delete=DO_NOTHING and then doing something in the database itself.
2. Since Django creates the foreign key constraints and gives them unpredictable names, you have to write a very tedious, error-prone South migration to install the appropriate foreign key constraints, something that Django could very easily do.

Perhaps a CASCADE_DB and SET_NULL_DB options on on_delete?
Re: changing the on_delete=CASCADE default Xof 9/26/13 3:29 PM

On Sep 26, 2013, at 3:28 PM, Christophe Pettus <x...@thebuild.com> wrote:
> Perhaps a CASCADE_DB and SET_NULL_DB options on on_delete?

And, to be clear, I *am* volunteering to take a go at this code, not just whine. :)
Re: changing the on_delete=CASCADE default Carl Meyer 9/26/13 3:39 PM
Hi Christophe,
Agreed on all counts. Presuming that the documentation is clear about
the tradeoffs, I think these could be very nice additions to save some
of that tedium and make in-db cascade easier to achieve.

Carl
Re: changing the on_delete=CASCADE default Anssi Kääriäinen 9/27/13 2:56 PM
On Friday, September 27, 2013 1:29:29 AM UTC+3, Xof wrote:

On Sep 26, 2013, at 3:28 PM, Christophe Pettus <x...@thebuild.com> wrote:
> Perhaps a CASCADE_DB and SET_NULL_DB options on on_delete?

And, to be clear, I *am* volunteering to take a go at this code, not just whine. :)

Some things to consider:
  1. What to do if given DB doesn't support cascades in DB (sqlite at least, no idea of MySQL)? Initial feeling is that Django should do the cascades in Python code in these cases.
  2. What to do if you have delete signals + db cascades set for given model? Options are to do nothing at all, give a warning (manage.py check might be able to do so) or raise an error in model validation.
  3. A model definition like A -- db cascade -> B -- cascade in python -> C is another problematic case. a_obj.delete() will cascade to B, but then that deletion will fail because of C constraint not cascading. Again possibilities are do nothing/warn/error
  4. A slight variation of above - generic foreign key cascades - here it will be impossible to handle the cascades in DB (unless we want to write custom triggers for this). And, the inconsistent state left behind will not be spotted by the DB either as there aren't any constraints in the DB for generic foreign keys. So, this is slightly worse than #3.
  5. Parent cascades: If you have model Child(Parent), then there will be foreign key from child to parent, but not from parent to child. This means that DB can't cascade child model deletion to the parent model. So, there is again possibility for inconsistent state. So, if you have Child -- db cascade -> SomeModel, and you delete somemodel instance then what to do to get the Child's parent table data deleted?

Numbers #4 and #5 seem hardest. Especially the parent cascade case seems hard, for generic foreign keys just documenting "don't do that" seems good enough.


For reference, the problematic data model is this:

class Parent(Model):
    id = AutoField()

class SomeModel(Model):
    pass

class Child(Parent):
    parent_ptr = models.OneToOneField(Parent, on_delete=DB_CASCADE, parent_link=True)
    somemodel = models.ForeignKey(SomeModel, on_delete=DB_CASCADE)

It generates schema like this:

create table parent(
    id serial primary key
);
create table child(
    parent_ptr_id integer primary key references parent(id) on delete cascade,
    somemodel_id integer not null references somemodel(id) on delete cascade
);

create table somemodel(
    id serial primary key
);

Now, parent data deletion will correctly cascade to child in the DB, but child data deletion will not cascade to parent. Django's interpretation is that when you delete a child instance, you delete also all parent data at the same time. When somemodel delete cascades to child in the DB then associated parent data will not be deleted. Adding nullable foreign key from parent to child might work, that is add a SymmetricOneToOneField into Django. Data model would be:

create table parent(
    id serial primary key,
    child_id integer references child(parent_ptr_id) on delete cascade -- this of course also needs deferrable initially deferred qualifier to work...
);
create table child(
    parent_ptr_id integer primary key references parent(id) on delete cascade,
    ...
);

 - Anssi
Re: changing the on_delete=CASCADE default Xof 9/27/13 6:31 PM

On Sep 27, 2013, at 2:56 PM, Anssi Kääriäinen <anssi.ka...@thl.fi> wrote:

>   1. What to do if given DB doesn't support cascades in DB (sqlite at least, no idea of MySQL)? Initial feeling is that Django should do the cascades in Python code in these cases.

It would behave like the standard version, then, yes.

>   2. What to do if you have delete signals + db cascades set for given model? Options are to do nothing at all, give a warning (manage.py check might be able to do so) or raise an error in model validation.

If we document that the _DB variation doesn't fire signals, I believe that's sufficient.

>   3. A model definition like A -- db cascade -> B -- cascade in python -> C is another problematic case. a_obj.delete() will cascade to B, but then that deletion will fail because of C constraint not cascading. Again possibilities are do nothing/warn/error

Interesting question.  I believe we can just document that it won't work properly, because in those DBs that support proper cascading behavior, what you get in the B -> C cascade will be an error.

>   4. A slight variation of above - generic foreign key cascades - here it will be impossible to handle the cascades in DB (unless we want to write custom triggers for this). And, the inconsistent state left behind will not be spotted by the DB either as there aren't any constraints in the DB for generic foreign keys. So, this is slightly worse than #3.

We can, of course, just disallow using the _DB variations for generic foreign keys.

>   5. Parent cascades: If you have model Child(Parent), then there will be foreign key from child to parent, but not from parent to child. This means that DB can't cascade child model deletion to the parent model. So, there is again possibility for inconsistent state. So, if you have Child -- db cascade -> SomeModel, and you delete somemodel instance then what to do to get the Child's parent table data deleted?

Either:

(a) You disallow that.
(b) You allow it, but warn that if you delete the child, the parent is not cleaned up.

I lean towards (a).

--

The _DB variations should be considered something like .update and .raw; they're for performance benefits where you know you are doing.  They don't need to solve every edge case.
Re: changing the on_delete=CASCADE default Anssi Kääriäinen 9/28/13 10:16 AM


On Saturday, September 28, 2013 4:31:18 AM UTC+3, Xof wrote:

On Sep 27, 2013, at 2:56 PM, Anssi Kääriäinen <anssi.ka...@thl.fi> wrote:

>   1. What to do if given DB doesn't support cascades in DB (sqlite at least, no idea of MySQL)? Initial feeling is that Django should do the cascades in Python code in these cases.

It would behave like the standard version, then, yes.

>   2. What to do if you have delete signals + db cascades set for given model? Options are to do nothing at all, give a warning (manage.py check might be able to do so) or raise an error in model validation.

If we document that the _DB variation doesn't fire signals, I believe that's sufficient.

>   3. A model definition like A -- db cascade -> B -- cascade in python -> C is another problematic case. a_obj.delete() will cascade to B, but then that deletion will fail because of C constraint not cascading. Again possibilities are do nothing/warn/error

Interesting question.  I believe we can just document that it won't work properly, because in those DBs that support proper cascading behavior, what you get in the B -> C cascade will be an error.

>   4. A slight variation of above - generic foreign key cascades - here it will be impossible to handle the cascades in DB (unless we want to write custom triggers for this). And, the inconsistent state left behind will not be spotted by the DB either as there aren't any constraints in the DB for generic foreign keys. So, this is slightly worse than #3.

We can, of course, just disallow using the _DB variations for generic foreign keys.

>   5. Parent cascades: If you have model Child(Parent), then there will be foreign key from child to parent, but not from parent to child. This means that DB can't cascade child model deletion to the parent model. So, there is again possibility for inconsistent state. So, if you have Child -- db cascade -> SomeModel, and you delete somemodel instance then what to do to get the Child's parent table data deleted?

Either:

(a) You disallow that.
(b) You allow it, but warn that if you delete the child, the parent is not cleaned up.

I lean towards (a).

Yes, I think we need to disallow  #4 and #5. It will be too easy to miss these edge cases, as things will seem to work correctly.

The data model in #4 is this:

class SomeModel(models.Model):
    fk = models.ForeignKey(SomeOtherModel, on_delete=DB_CASCADE)
    gen_rel = GenericRelation(GFKModel)
 
This is quite an edge case, but it would be nice to detect & prevent this. I am not sure if GenericRelation actually respects to_delete currently at all.

For multitable inheritance it will be easiest to prevent db-cascades in all foreign keys, both from parent models and child models. That is likely overly restrictive, the only really problematic case seems to be db cascade foreign key in child models. But it will be possible to improve multitable cascades later on, so lets just get something working implemented first.

Probably time to move this into Trac... You can open a ticket there and assign it to yourself.

 - Anssi
Re: changing the on_delete=CASCADE default Xof 2/5/14 11:44 PM
After far too long, this ticket has been created:

        https://code.djangoproject.com/ticket/21961

If there's general consensus that this feature is worth working on, I'll see about a 1.7-targeted patch for it.
> --
> You received this message because you are subscribed to the Google Groups "Django developers" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
> To post to this group, send email to django-d...@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-developers.
> For more options, visit https://groups.google.com/groups/opt_out.