SQL VIEWs as Django Models for aggregation --- delete() issues with django.db.models.query: delete_objects

31 views

Skip to first unread message

JeffH

unread,

Jul 13, 2007, 11:02:19 AM7/13/07

to Django users

This post relates to two separate Django issues that intertwined for
me:
1. Lack of aggregation support in Django's ORM
(I am delighted to see http://groups.google.com/group/django-developers/browse_thread/thread/f433edf7b0ebdbcb/
)
2. Django does not use delete() methods for related objects
(as noted previously in:
http://groups.google.com/group/django-users/browse_thread/thread/3e37575e42de86c9/
and
http://groups.google.com/group/django-users/browse_thread/thread/2d1dc42cea162c4b/
)

I would like to hear from others how they are dealing with #1 and what
their thoughts are about #2.

Here goes:
Experimenting with ways to add aggregated database information into my
Django models, I came across a blog entry several weeks ago where the
author (my apologies for not having the reference to cite) said they
used VIEWs as Django Models, simply putting in the SQL code to drop
the table and create a like-named VIEW in sql/model_name as discussed
at http://www.djangoproject.com/documentation/0.96/model-api/#providing-initial-sql-data
for providing initial data. I chose to make my VIEW models one-to-one
with the actual Django model that uses the aggregation data, so I can
say event.event_view.capacity_remaining for instance and the
calculation for the number of seats remaining at an event (taking into
account reserved tables, etc) is quietly done by the SQL VIEW, the
Django query just looks up the result. I added empty delete() and
save() methods to the view models to prevent such activities (or so I
thought!) on a non-updatable view.

This approach, as opposed to coding an aggregation query as a method
of a Model, has the advantage of allowing the use of Django's ORM
elegance to do things like EventView.objects.filter(whatever__lookup)
using one database query rather than doing list comprehensions of
method calls and having separate queries for each instance.

Here the fun begins:
When I tried to delete an inline object in the Django Admin by
clearing all the core=True fields, I got an OperationError 1228, VIEW
not UPDATEable. Argh! I had just discovered that Django does not use
the delete() methods of Models to delete related objects.

My fix for this specific problem was to patch
django.db.models.query.delete_objects() to see if the related model
overrides the delete() method, and if so, use it instead of the
existing batch SQL DELETE approach:

Starting at line 1059:
# Now delete the actual data
for cls in ordered_classes:
seen_objs[cls].reverse()
pk_list = [pk for pk,instance in seen_objs[cls]]
# if a related object's class overrides the
django.db.models.Model delete method - use it
if filter(lambda o: o.__dict__.has_key('delete'),cls.__mro__)
[0].__module__ != 'django.db.models.base':
map(lambda o: o.delete(),
cls._default_manager.filter(pk__in=pk_list))
else:
for offset in range(0, len(pk_list),
GET_ITERATOR_CHUNK_SIZE):
cursor.execute("DELETE FROM %s WHERE %s IN (%s)" % \
(qn(cls._meta.db_table), qn(cls._meta.pk.column),
','.join(['%s' for pk in pk_list[offset:offset
+GET_ITERATOR_CHUNK_SIZE]])),
pk_list[offset:offset+GET_ITERATOR_CHUNK_SIZE])

It's 4 lines of code - the patch starts with the second comment - plus
indenting within my else clause the original code used for the delete.

Most likely this patch does not cover cases like ManyToManyFields,
etc, I only had time to make the original error go away.

Thoughts? Comments?

Russell Keith-Magee

unread,

Jul 15, 2007, 1:24:39 AM7/15/07

to django...@googlegroups.com

On 7/13/07, JeffH <jeff...@gmail.com> wrote:
>
> This post relates to two separate Django issues that intertwined for
> me:
> 1. Lack of aggregation support in Django's ORM
> (I am delighted to see http://groups.google.com/group/django-developers/browse_thread/thread/f433edf7b0ebdbcb/
> )

For the record, I'm still keen to get aggregates supported in Django.
After a few distractions along the way (the unicode and oracle branch
merges), I believe Malcolm is currently working on the Query rewrite
that is the prerequisite for aggregates. Hopefully I will get a chance
to get some aggregates happening in the near future (but I've said
that before... :-)

> I would like to hear from others how they are dealing with #1 and what
> their thoughts are about #2.

When I need aggregates at the moment, I'm generally falling back into
raw SQL, or occasionally using the extra() clause on a queryset. Using
the 'extra' clause can get you where you want (annotated rows in a
database) by adding items to the select clause, which will appear as
extra attributes on objects from the query set.

The difficulty comes in getting GROUP BY and HAVING clauses into your
query. This is because Django doesn't currently have support for these
clauses inside Q objects. You can cheat a little by exploiting the
fact that WHERE clauses are the last in the chain, but it's not
particularly robust approach to the problem.

> Here goes:
> Experimenting with ways to add aggregated database information into my
> Django models, I came across a blog entry several weeks ago where the
> author (my apologies for not having the reference to cite) said they
> used VIEWs as Django Models, simply putting in the SQL code to drop
> the table and create a like-named VIEW in sql/model_name as discussed
> at http://www.djangoproject.com/documentation/0.96/model-api/#providing-initial-sql-data
> for providing initial data.

This is certainly a novel approach.

On the general topic - adding Django support for DB-level views is
something I have had banging around in my brain for a while. Its an
obvious blind spot in the existing functionality of Django, and it
should be a relatively clean extension of the existing Model/metamodel
framework.

However, I haven't looked at the problem in detail, and probably won't
have time to for a while. If someone else were to have a go at
implementing support for Views, I'm sure it would be looked upon
favorably.

> My fix for this specific problem was to patch
> django.db.models.query.delete_objects() to see if the related model
> overrides the delete() method, and if so, use it instead of the
> existing batch SQL DELETE approach:

I'm not sure I'm wild about this as a solution to the problem.
Deleting via a queryset is a 'bulk delete' operation, which is handled
quite separately from a normal delete. Django deliberately doesn't
call delete on individual objects in this case. If you are deleting a
lot of objects, calling delete on each object to be deleted can easily
be a very expensive operation, especially when you look at the
signalling and processing overhead.

Yours,
Russ Magee %-)

JeffH

unread,

Jul 16, 2007, 5:32:50 PM7/16/07

to Django users

Russ: Thanks for your reply!

1. I don't consider my views-as-models approach elegant at the
implementation level, but:
* it is dead simple to use once set up (it's just the Django model
and database API)
* it allows for limited aggregation support (views are created at
install time, so real ad hoc reporting would have to be done with
custom model methods)
* it works with querysets since the aggregated columns are defined
as Django model fields, so presenting a table of aggregated data with
an arbitrary selection of rows only requires one actual database query
(versus many queries when iterating over instances invoking a model
method)
* SQL code stays in SQL script files that are part of Django's
design (sqlinitialdata) and stays out of my python code
Ideally Django classes for the table and view functionality would
share a common ancestor so they share an API, and table-specific
behavior like delete_objects would be defined above the common (i.e.
SELECT) functionality. Realistically, such a refactoring is unlikely
before my next few projects are due (no sarcasm intended).

2. With hindsight, delete_objects() behavior of Django appears to be
specifically desired by the developers so I'll just change my working
copy to just check for a class attribute like "not_updatable", and
abort batch deletes for related object classes with that attribute.
Given the specific nature of my need (prevent cascading deletes of
views) it will be much more efficient than calling an empty delete()
method for each object.

3. That said, my delete_objects() hack was an attempt to solve the
more general problem of using an overridden delete() without any
special flags while allowing for the possibility of model subclassing
in the future, because it seemed that respecting an overridden
delete() was an important end in itself from an object integrity
perspective. I had hoped that the developers might be intrigued by the
concept behind the hack (checking once per related class whether a
user has overridden delete() and defaulting to the usual batch delete)
because of the inherent object-functionality-versus-database-
efficiency tension in an ORM. My hack offers options - the user can
choose to give up database speed for object-level functionality - and
options are always good as long as they are documented: "WARNING: Your
database may slow to a crawl if you override delete()!" Checking if
delete() was overridden is not a performance hit, so you don't lose
anything by offering the option. For my application, aggregate
reporting is far more common than deletion, so it needs to be
efficient and convenient to code.

4. Regarding documentation, I will put in a ticket for the
documentation on the relevant section
http://www.djangoproject.com/documentation/model-api/#overriding-default-model-methods
of the model-api page. That section invites people to override save()
and delete() but only discusses save() and the implication (by
omission) that overriding delete() is analogous is incorrect.

5. A big ongoing thanks to all of Django's contributors. You rock!

Thanks for reading this.

Cheers,
Jeff

On Jul 15, 1:24 am, "Russell Keith-Magee" <freakboy3...@gmail.com>
wrote:

> On 7/13/07, JeffH <jeff4...@gmail.com> wrote:
>
>
>
> > This post relates to two separate Django issues that intertwined for
> > me:
> > 1. Lack of aggregation support in Django's ORM

> > (I am delighted to seehttp://groups.google.com/group/django-developers/browse_thread/thread...

> > athttp://www.djangoproject.com/documentation/0.96/model-api/#providing-...

Russell Keith-Magee

unread,

Jul 16, 2007, 10:23:52 PM7/16/07

to django...@googlegroups.com

On 7/17/07, JeffH <jeff...@gmail.com> wrote:
>
> Russ: Thanks for your reply!
>
> 1. I don't consider my views-as-models approach elegant at the
> implementation level, but:

Its a bit of a nifty hack. However, I was talking more about views as
a general tool - it would be nice to support them. The success that
you have had mocking them up demonstrates that it is possible, we just
need to finess them.

> 3. That said, my delete_objects() hack was an attempt to solve the
> more general problem of using an overridden delete() without any
> special flags while allowing for the possibility of model subclassing
> in the future, because it seemed that respecting an overridden
> delete() was an important end in itself from an object integrity
> perspective.

To my mind, the better approach here would be to have proper support
for a 'read-only' query set that disables (or doesn't have) the delete
operation, rather than significantly changing the behaviour of bulk
deletes. Read-only objects/querysets would be an essential part of any
view support in Django.

> For my application, aggregate
> reporting is far more common than deletion, so it needs to be
> efficient and convenient to code.

I suppose this is the sticking point. I can see how this is a neat
hack for your specific situation, but I'm not convinced its a general
problem.

> 4. Regarding documentation, I will put in a ticket for the
> documentation on the relevant section
> http://www.djangoproject.com/documentation/model-api/#overriding-default-model-methods
> of the model-api page. That section invites people to override save()
> and delete() but only discusses save() and the implication (by
> omission) that overriding delete() is analogous is incorrect.

Well, it is correct if you consider bulk-delete to be different to
delete. There is no bulk-save analog. Bulk delete (delete on a
queryset) is quite a different operation, that doesn't integrate with
the individual object delete.

However, the distinction obviously isn't obvious to newcomers.
Suggestions on documentation improvements are always welcome. My
initial reaction is that if there is a need for improvement, its here:

http://www.djangoproject.com/documentation/db-api/#deleting-objects

> 5. A big ongoing thanks to all of Django's contributors. You rock!

No - we're more into the Gypsy jazz than the rock. But thanks anyway :-)

Yours,
Russ Magee %-)

JeffH

unread,

Jul 17, 2007, 12:08:59 PM7/17/07

to Django users

> > 3. That said, my delete_objects() hack was an attempt to solve the
> > more general problem of using an overridden delete() without any
> > special flags while allowing for the possibility of model subclassing
> > in the future, because it seemed that respecting an overridden
> > delete() was an important end in itself from an object integrity
> > perspective.
>
> To my mind, the better approach here would be to have proper support
> for a 'read-only' query set that disables (or doesn't have) the delete
> operation, rather than significantly changing the behaviour of bulk
> deletes. Read-only objects/querysets would be an essential part of any
> view support in Django.

Hmmm. I seem to have overgeneralized my problem and its solution. My
real problem with using views right now is that Django implements (and
therefore enforces) cascading deletes internally. I could easily
override database-level delete cascade without having to hack on
Django itself. I'm not concerned about intentional Manager-level bulk
deletes. My specific error occurred when I blanked out all the core
fields for an inline object in the Admin and saved the primary object.
The inline object also has a one-to-one related view object that is
caught up in the cascading delete code. I don't think the Manager-
level delete() for related objects is invoked in situations like this
--- though I'm speaking well beyond my knowledge level here.

Regardless of how it is implemented, offering the user the ability to
prevent cascading deletes of an object class either at the class
definition or the field definition level would allow lots of powerful
stuff without having to hack the Django core.

In terms of the bigger picture, I think aggregation support should be
handled at the queryset level, but view support would be much more
powerful and easy to use if views are treated as model-like objects
themselves because of their flexibility (for instance an updatable
view onto a table in another database). Views are an advanced feature
that not everybody needs - why not keep the changes minimal and put
the onus on the user (let the DBA code the create view SQL file, and
the application developer code the python)? In the short term, I would
suggest the actual view creation be handled as raw user SQL in place
of the usual model table creation (and that view creation be optional
in case the view already exists on the database), and the user can
specify that the view object is updatable, otherwise it defaults to
[no saves/no deletes/no cascading deletes].

> Well, it is correct if you consider bulk-delete to be different to
> delete. There is no bulk-save analog. Bulk delete (delete on a
> queryset) is quite a different operation, that doesn't integrate with
> the individual object delete.
>
> However, the distinction obviously isn't obvious to newcomers.
> Suggestions on documentation improvements are always welcome. My
> initial reaction is that if there is a need for improvement, its here:
>
> http://www.djangoproject.com/documentation/db-api/#deleting-objects
>

Good point. I'd say it should be explained there, and specifically
mentioned in the model-api section on overriding. I'll put in a
ticket.