SQL Referential Integrity (WAS: Bulk delete? on django-users)

176 views
Skip to first unread message

Russell Keith-Magee

unread,
Jan 19, 2006, 9:10:48 AM1/19/06
to django-d...@googlegroups.com
Hi all,

I've been looking at the code for normal object deletion in an attempt
to get the same behaviour for bulk delete. It seems like there is a
lot of logic dedicated to maintaining referential integrity that the
database could be doing (and would probably do more efficiently).

Is there any particular reason that Django has its own implementation
of referential integrity rather than using the referential integrity
capabilities of SQL?

It seems like most of the internal reference walking logic in the
object.delete() call would be eliminated if the foreign key fields
were declared ON DELETE CASCADE or ON DELETE SET NULL (as
appropriate).

Similarly, if an ON UPDATE CASCADE/SET NULL was added to the field
definition, it would eliminate the logic required to keep keys in sync
during updates.

So - is there a reason we don't use SQL referential integrity?

Thanks,
Russ Magee %-)

Julio Nobrega

unread,
Jan 19, 2006, 9:22:43 AM1/19/06
to django-d...@googlegroups.com
I think it's because some DBs don't implement it. SQLite afaik
doesn't have anything like that, and Mysql, only on InnoDB tables
(default from 4.0+, but still).

Plus, it's not always about the data living on tables, running each
delete() allows developers to exploit this, doing something in case an
object is removed (delete files? email? log?).

Adrian Holovaty

unread,
Jan 19, 2006, 9:44:23 AM1/19/06
to django-d...@googlegroups.com
On 1/19/06, Russell Keith-Magee <freakb...@gmail.com> wrote:
> I've been looking at the code for normal object deletion in an attempt
> to get the same behaviour for bulk delete. It seems like there is a
> lot of logic dedicated to maintaining referential integrity that the
> database could be doing (and would probably do more efficiently).
>
> Is there any particular reason that Django has its own implementation
> of referential integrity rather than using the referential integrity
> capabilities of SQL?

The database would no doubt be more efficient, but SQLite and MySQL
(some versions) don't support referential integrity. We opted to do it
at the code level.

> It seems like most of the internal reference walking logic in the
> object.delete() call would be eliminated if the foreign key fields
> were declared ON DELETE CASCADE or ON DELETE SET NULL (as
> appropriate).

This is exactly what Django used to do, actually, back when it was
Postgres-only. Such are the compromises of having to support other
DBs. :-/

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com | chicagocrime.org

Russell Keith-Magee

unread,
Jan 19, 2006, 10:17:58 AM1/19/06
to django-d...@googlegroups.com

Hrrm... /me breaks into verse of "get a real database"... :-)

Having a quick poking around the net, it looks like mysql5 supports
referential integrity. I also found this page:

http://www.justatheory.com/computers/databases/sqlite/foreign_key_triggers.html

that talks about a workaround implementation for SQLite 2.5+ using
triggers. Nothing in that implementation seems SQLite specific - it
might be able to be adapted for MySQL<5

Are you opposed to going back to an in-database model of referential
integrity if it turns out this trigger mechanism will work? What do
you consider minimum versions that must be supported? Alternatively,
are you opposed to a scheme that uses referential integrity if it is
available, but falls back on the Python-side implementation if it is
not?

It just seems a waste to give up all the useful, efficient
characteristics of a database like PostgreSQL for the sake of
supporting less featured databases. Any in-code implementation of
something like referential integrity will always be slower and more
bug-prone. Writing an application that avoids using referential
integrity in order to maintain compatibility with less featured
databases seems like asking for trouble, either when a bug in the
referential integrity code bites us, or when the server load becomes
an issue.

Russ Magee %-)

Amit Upadhyay

unread,
Jan 19, 2006, 10:58:46 AM1/19/06
to django-d...@googlegroups.com
On 1/19/06, Russell Keith-Magee <freakb...@gmail.com> wrote:

On 1/19/06, Adrian Holovaty <holo...@gmail.com> wrote:
>
> On 1/19/06, Russell Keith-Magee <freakb...@gmail.com > wrote:
> > It seems like most of the internal reference walking logic in the
> > object.delete() call would be eliminated if the foreign key fields
> > were declared ON DELETE CASCADE or ON DELETE SET NULL (as
> > appropriate).
>
> This is exactly what Django used to do, actually, back when it was
> Postgres-only. Such are the compromises of having to support other
> DBs. :-/

Hrrm... /me breaks into verse of "get a real database"... :-)
<snip>

It just seems a waste to give up all the useful, efficient
characteristics of a database like PostgreSQL for the sake of
supporting less featured databases. Any in-code implementation of
something like referential integrity will always be slower and more
bug-prone. Writing an application that avoids using referential
integrity in order to maintain compatibility with less featured
databases seems like asking for trouble, either when a bug in the
referential integrity code bites us, or when the server load becomes
an issue.

What would happen to _pre_delete and _post_delete? If I have "real database", and I after profiling I found a few bulk_delete too ineffecient for my application, what is stopping me from taking the database cursor, and executing the sql myself?

--
Amit Upadhyay
Blog: http://www.rootshell.be/~upadhyay
+91-9867-359-701

hugo

unread,
Jan 19, 2006, 12:55:37 PM1/19/06
to Django developers
>I've been looking at the code for normal object deletion in an attempt
>to get the same behaviour for bulk delete. It seems like there is a
>lot of logic dedicated to maintaining referential integrity that the
>database could be doing (and would probably do more efficiently).

It's not just the referential integrety - even though that's
complicated enough, with ISPs not allways installing newest versions of
databases - but the .delete() overloading, too. If your class overloads
.delete() to do specific stuff, that code will not be called when you
do bulk deletes.

Actually I am not really sure we can do fully symmetric (in the way
that they do exactly the same things as single object deletes) deletes.

BTW: the bulk delete stuff can be made more complete database-wise by
using the delete sql clause to construct deletes against related tables
(and doing updates against related tables).

If you have a maste rand a slave table and have a bulk delete like:

Master.objects.delete(name__contains='foo')

this will become the primary delete statement:

delete from master where name like '%foo%'

and this would be turned into the following statement for the related
table:

delete from slave where slave.master_id in (select id from master where
name like '%foo%')

Same with updates (turning around the master/slave relation in that we
now don't delete related objets, but do a set null):

update slave set master_id = NULL where master_id in (select id from
master where name like '%foo%')

Something along these lines (first collecting all related tables, then
building the needed update and delete statements and last doing the
actual table data delete) might be a way to at least get bulk-delete up
to par with multiple object deletes, while not losing too much
performance. It's still far from perfect, as you actually run the inner
query for the master table multiple times, though. And we still have
the clash in semantics with regard to the overloaded .delete() method.

bye, Georg

hugo

unread,
Jan 19, 2006, 12:58:03 PM1/19/06
to Django developers
>This is exactly what Django used to do, actually, back when it was
>Postgres-only. Such are the compromises of having to support other
>DBs. :-/

Maybe it would be a good idea to add stuff to the Django backend
modules so that real databases can make use of referential integrity
and cascaded deletes and stuff like that - while providing hooks to add
manual stuff for those databases, that don't support this?

Something like a before_bulk_delete(list_of_related_tables) hook that
is run before the actual delete and would have to construct the needed
statements for related tables to delete or update related rows, while
in real databases this is just a dummy that doesn't do anything,
because the database already does everything right?

bye, Georg

oggie rob

unread,
Jan 19, 2006, 3:31:45 PM1/19/06
to Django developers
> It's not just the referential integrety - even though that's
complicated enough, with ISPs not allways installing newest versions of
databases - but the .delete() overloading, too. If your class overloads
.delete() to do specific stuff, that code will not be called when you
do bulk deletes.

Doesn't the delete function also checks permissions on all the
referenced rows? I don't think you can implement this in SQL.

Frankly, the bulk delete seems to have some problems with usability,
too. I've deleted objects that have somewhere around 20 or 30
references and the screen it filled. If I was trying to delete in bulk,
it would be unusable.

I think it would be better to treat a "bulk" operation differently from
the regular one, somehow. For example, I would suggest an extra
"can_bulk_delete_model" permission, and instead of showing all the
related objects in the confirm delete screen, just show the parent
objects themselves. This would avoid confusion with "custom" delete
functions because those functions would never be called.

-rob

oggie rob

unread,
Jan 19, 2006, 6:44:36 PM1/19/06
to Django developers
I said:
> Doesn't the delete function also check permissions on all the referenced rows?

I tested this and found a bug (#1250), but I confirmed that you cannot
delete items through the admin interface if you don't have delete
permissions to ALL related objects.

-rob

hugo

unread,
Jan 20, 2006, 3:16:19 AM1/20/06
to Django developers
>Doesn't the delete function also checks permissions on all the
>referenced rows? I don't think you can implement this in SQL.

You are talking about the admin interface - we are talking about the
database API. The database API doesn't check any permissions, as it's a
Python call done by the programmer, and we still trust the programmer
;-)

bye, Georg

oggie rob

unread,
Jan 20, 2006, 2:01:20 PM1/20/06
to Django developers
> You are talking about the admin interface - we are talking about the database API. The database API doesn't check any permissions, as it's a Python call done by the programmer, and we still trust the programmer
;-)

Yep, I realize that, and wasn't trying to ignore the db changes to
support this. I just wanted to consider that cascading deletes would
not necessarily work through the admin interface with the current
permissions and lack of "confirm" behaviour.

-rob

Russell Keith-Magee

unread,
Jan 20, 2006, 10:08:26 PM1/20/06
to django-d...@googlegroups.com
On 1/20/06, hugo <g...@hugo.westfalen.de> wrote:

> BTW: the bulk delete stuff can be made more complete database-wise by
> using the delete sql clause to construct deletes against related tables
> (and doing updates against related tables).

I have no difficulty in seeing how it _can_ be done. However, I am
still wondering if it _should_ be done. It certainly can't be done
efficiently - at least, nowhere near as efficiently as a single line
SQL query would handle it.

The more I look at the problem, the worse things get. There are all
sorts of edge cases that the existing implementation doesn't handle.
For example, if you delete an object that is referenced from a
ForeignKey field, the ForeignKey is set to NULL. But what if that
field is marked NOT NULL? In the current implementation, the stale key
remains in the referencing field.

Referential integrity is supported by PostgreSQL, MSSQL, Oracle, and
MySQL 5. This referential integrity is debugged, tested, efficient,
and flexible. It might be possible to fake referential integrity in
SQLite2.5+ using triggers. At the moment, the FAQ doesn't set any
minimum version numbers on any database backend. I would suggest that
it should.

Yes, this means that people with MySQL 3/4 databases will need to
upgrade or choose another database. But why should we spend so much
time and effort writing, testing, and debugging a reimplementation of
referential integrity in Django simply to support those who come to
Django asking "Hey, I've got this nail; how do I use this bratwurst as
a hammer?". Django is a complex enough without complicating the
problem by reimplementing half the features of a database to support
those in the community that can't/won't upgrade.

Russ Magee %-)

Amit Upadhyay

unread,
Jan 21, 2006, 6:48:49 AM1/21/06
to django-d...@googlegroups.com
On 1/21/06, Russell Keith-Magee <freakb...@gmail.com> wrote:
Yes, this means that people with MySQL 3/4 databases will need to
upgrade or choose another database. But why should we spend so much
time and effort writing, testing, and debugging a reimplementation of
referential integrity in Django simply to support those who come to
Django asking "Hey, I've got this nail; how do I use this bratwurst as
a hammer?". Django is a complex enough without complicating the
problem by reimplementing half the features of a database to support
those in the community that can't/won't upgrade.


Hi, I appreciate your work but if it works by break it? I am thinking a good percentage of users will be using shared hosting etc to deploy their application, and they have very little control over the version of SQL server. [ object.delete() for object in get_list()] works, if its too slow, use custom sql [work on making it easy if it is not]. Delete is an edge case anyways.
Reply all
Reply to author
Forward
0 new messages