Support for database level on delete cascade 21961

248 views
Skip to first unread message

Nick Stefan

unread,
Jul 24, 2017, 10:34:33 PM7/24/17
to Django developers (Contributions to Django itself)
Hi All,

I've taken interest to implimenting built in django support of ON DELETE CASCADE. As a general concept, the ticket was accepted 3 years ago, and there was some discussion around that time: https://code.djangoproject.com/ticket/21961 . I've put together a small PR with some but not all of the proposed design decisions, https://github.com/django/django/pull/8661. The original ticket proposed adding something like models.DB_CASCADE in addition to models.CASCADE, models.SET_NULL. That is the tact that I took in the PR.


At the time of the ticket there were some concerns:

1. Q: What do we do with models.DB_CASCADE when using a DB that doesn't support it? 
A: MySQL, PostgreSQL, SQLite3 and Oracle all now support this functionality.

2. Q: Won't it be confusing that signals will not fire? 
A: We can document that the application code mimics models.DO_NOTHING, no signals will fire, and that all the deletion will be handled by SQL.

3. Q: Won't it be confusing that models.DB_CASCADE won't trigger a foreign model's models.CASCADE?
A: Is it any different than having a models.DO_NOTHING table pointing to a models.CASCADE table? We can document that on the application side, models.DB_CASCADE works like models.DO_NOTHING.

4. Q: generic foreign keys?
A: We could forbid models.DB_CASCADE on these types of relations?

5. Q: implicit foreign keys between child to parent in model inheritance?
A: We could forbid models.DB_CASCADE on these types of relations?


Additionally there was a proposal by Carl Meyer to split out database based delete into a separate kwarg. To summarize: ForeignKey(on_delete=models.CASCADE) versus ForeignKey(on_delete_db=models.DB_CASCADE). This would help avoid having to implement a magical fallback mode when the DB didn't support models.DB_CASCADE. It also would help eliminate two very different code paths overloading the same kwarg.

My thoughts on the above splitting of the kwargs:
All of the django 2.0 databases can support ON DELETE CASCADE. So there's no longer any need for a magical fall back mode. There's also the added burden on explaining that we can only pass one of the two kwargs, and not use both at the same time. I think that favors overloading a single kwarg.

Simon Charrette made a suggestion to make the models.CASCADE, models.SET_NULL, models.DB_CASCADE, etc all inherit from an OnDelete class. This allows the old application callables to still be simple callables, new SQL based operations such as DB_CASCADE to access an as_sql method, and yet both be instances of the same class for migration serialization. 


Seeking feedback,
Nick

Anssi Kääriäinen

unread,
Jul 27, 2017, 4:19:19 AM7/27/17
to Django developers (Contributions to Django itself)
On Tuesday, July 25, 2017 at 5:34:33 AM UTC+3, Nick Stefan wrote:
5. Q: implicit foreign keys between child to parent in model inheritance?
A: We could forbid models.DB_CASCADE on these types of relations?

There's actually a variation of this that seems a bit hairy to solve properly. For models:

class Category:
    pass

class Child:
    pass

class Parent(Child):
    category = models.ForeignKey(Category, on_delete=DB_CASCADE)

Then, when you delete a category instance, the associated parent instance will get deleted, but the child instance won't (as there's no foreign key from child to parent, we can't make this work in the DB without using triggers).

Now, the problem here is that this actually leaves the database in an inconsistent state. I guess the solution might be to document this, add a warning to checks done on startup, and let users deal with the inconsistency if they so decide.

We could solve these problems by including an O2O from both parent to child and from child to parent, and making them both DB_CASCADE. That's not a good idea for other reasons (complexity of saving a new instance to the database for one). If we want to make these cases work seamlessly directly in the database, it's likely better to add in triggers. It wouldn't be horribly complex code to write. The hardest part would be making sure the triggers are kept in sync when doing migrations.

For the other parts your proposal does make a lot of sense to me. Don't fall in the trap of complexities in parent -> child cascade. Just add warnings and that's it.

 - Anssi

Nick Stefan

unread,
Sep 5, 2017, 1:12:44 PM9/5/17
to Django developers (Contributions to Django itself)
Thanks for the feedback Anssi.

The original ticket talked about making DB_CASCADE work where it can, and adding warnings to most other places. I've updated the PR with checks warnings for the following cases (including tests for said checks cases):

  • GenericForeignKey + DB_CASCADE (#4 in the Original Post)
  • concrete model inheritance + DB_CASCADE (e.g. the example Anssi gave above, and #5 in the OP)
  • model A DB_CASCADE --> model B CASCADE (wont get triggered) --> model C (#3 in the OP)
If anyone wants to take a look: https://github.com/django/django/pull/8661

Nick

Dylan Young

unread,
Dec 11, 2017, 1:56:31 PM12/11/17
to Django developers (Contributions to Django itself)
There's also the added burden on explaining that we can only pass one of the two kwargs, and not use both at the same time. I think that favors overloading a single kwarg.

I'm not clear *why* you wouldn't be able to use both at the same time?   In fact, this seems like the primary benefit of using two separate kwargs.  Can you explain?

I'm also curious why you wouldn't put the contraints directly on the DB table instead of injecting them into the SQL commands.

Can you explain the rationale behind these decisions/constraints?


Best,

Dylan

Nick Stefan

unread,
Dec 11, 2017, 7:28:53 PM12/11/17
to Django developers (Contributions to Django itself)

My understanding of why you wouldn't want to support using both kwargs at the same time is that it would negate any performance gains from using the DB's delete.

DB_CASCADE has the advantage of not making django run extra queries against every model relation to determine which other tables need cascading delete queries. For example, pretend we've got an app with one particular "shard key", if you will, that relates to about 20 different tables. When I delete that record from the root shard key table, django will run a query against every one of those 20 related tables looking for that key. DB_CASCADE means the DB could just do that for me after deleting one record in the root shard key table with a single query between the server and the database.

If we use both, then we have to first run all of those extra queries against the other tables. Its true that they'd be SELECTs versus DELETE, but still queries. Then we'd delete that one record and let the DB do its thing.

It's possible that someone would want that, I guess, to allow DB constraints and signals to work together, but it seems rather inefficient. I guess it is possible though.

> I'm also curious why you wouldn't put the contraints directly on the DB table instead of injecting them into the SQL commands.

In the PR, I actually do use the existing SQL commands that define table columns, but add a little extra constraints (ON DELETE CASCADE). So I believe it is running against the DB table directly? Did I understand you correctly?

Nick
Reply all
Reply to author
Forward
0 new messages