Bulk delete - performance / object collection

735 views
Skip to first unread message

George Lund

unread,
Jan 2, 2013, 6:29:18 AM1/2/13
to django...@googlegroups.com
I'm trying to bulk-delete several million rows from my database.

The docs for Django 1.3 say "this will, whenever possible, be executed purely in SQL". A pure-SQL delete is what I want in this case, so that's fine.

However, the code is never getting as far as running any SQL.

Interrupting the script shows that the delete method on the QuerySet is trying to use a "Collector" to construct model instances for each row I'm trying to delete. This is going to take too long (and may in fact consume all the memory available) -- I don't think it's practical to wait in this case. (I've tried waiting over half an hour!)

(I'm looking at django.db.models.query.QuerySet.delete and django.db.models.deletion.Collector.collect / Collector.add.)

What's the point in doing the delete "purely in SQL" if all of the objects are getting constructed anyway? Why do they need to be "collected" before the SQL DELETE is run? The model instance in this case has no child rows, to which the delete might need to be cascaded.

Meanwhile I can construct the SQL by hand easily enough, but I feel this isn't doing things the right way.

thanks for any help

George

Cal Leeming [Simplicity Media Ltd]

unread,
Jan 2, 2013, 6:50:54 AM1/2/13
to django...@googlegroups.com
Hi George,

This is one area I spent quite a lot of time in personally, see;

Bulk operations using the ORM isn't always the right thing to do - and it entirely depends on what you consider bulk and acceptable performance.

You might want to look at the source code for this, to see how they handle bulk operations (they implemented the same bulk_update approach mentioned in the above threads)

Although bypassing the ORM might feel wrong at first, sometimes it is completely acceptable - you just need to make sure you don't abuse/misuse it unnecessarily.

Cal


George

--
You received this message because you are subscribed to the Google Groups "Django users" group.
To view this discussion on the web visit https://groups.google.com/d/msg/django-users/-/W4LqKzcnlaYJ.
To post to this group, send email to django...@googlegroups.com.
To unsubscribe from this group, send email to django-users...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-users?hl=en.

Larry Martell

unread,
Jan 2, 2013, 7:01:35 AM1/2/13
to django...@googlegroups.com
I've had this discussion with other developers. Many feel they need to
slavishly adhere to the ORM and if you don't you're some type of evil
entity. I could not disagree more. Django and the ORM are tools, and
should be used only when they are the right tools for the job. I'd
just go ahead and do the delete in SQL.

Cal Leeming [Simplicity Media Ltd]

unread,
Jan 2, 2013, 7:07:03 AM1/2/13
to django...@googlegroups.com
That's an interesting perspective actually..

"Don't be a slave of your mind (or ORM)"

Cal

--
You received this message because you are subscribed to the Google Groups "Django users" group.

George Lund

unread,
Jan 2, 2013, 7:15:54 AM1/2/13
to django...@googlegroups.com
> Meanwhile I can construct the SQL by hand easily enough, but I feel this 
> isn't doing things the right way.

I've had this discussion with other developers. Many feel they need to
slavishly adhere to the ORM and if you don't you're some type of evil
entity. I could not disagree more. Django and the ORM are tools, and
should be used only when they are the right tools for the job. I'd
just go ahead and do the delete in SQL.

I don't think SQL is evil, but there are lots of reasons why mixing programming languages (and abstraction levels) to solve a problem isn't really ideal.

From a very personal point of view I'd like to see archaic languages like SQL go away for ever one day, in the meantime I completely agree that if the ORM doesn't do what's needed then the right choice is to go with whatever tools will :-)

thanks
George

George Lund

unread,
Jan 2, 2013, 7:17:13 AM1/2/13
to django...@googlegroups.com
Thank you very much for this. I'll catch up with those threads and read more about DSE, which looks really interesting work.

regards

George

akaariai

unread,
Jan 2, 2013, 9:13:57 AM1/2/13
to Django users
A note about the upcoming 1.5: The delete() was optimized so that it
doesn't fetch the objects if it doesn't have to. The "doesn't have to"
translates to:
- no delete signals defined for the model
- no cascades

That being said using DSE might be the right choice for you.

- Anssi

On 2 tammi, 14:17, George Lund <gl...@mintel.com> wrote:
> Thank you very much for this. I'll catch up with those threads and read
> more about DSE, which looks really interesting work.
>
> regards
>
> George
>
> On Wednesday, 2 January 2013 11:50:54 UTC, Cal Leeming [Simplicity Media
>
>
>
>
>
>
>
> Ltd] wrote:
>
> > Hi George,
>
> > This is one area I spent quite a lot of time in personally, see;
>
> >https://groups.google.com/forum/?fromgroups=#!msg/django-users/iRhWD0...
> >https://groups.google.com/forum/#!topic/django-users/hgLrwMoFLII
>
> > Bulk operations using the ORM isn't always the right thing to do - and it
> > entirely depends on what you consider bulk and acceptable performance.
>
> > You might want to look at the source code for this, to see how they handle
> > bulk operations (they implemented the same bulk_update approach mentioned
> > in the above threads)
> >http://pypi.python.org/pypi/dse
>
> > Although bypassing the ORM might feel wrong at first, sometimes it is
> > completely acceptable - you just need to make sure you don't abuse/misuse
> > it unnecessarily.
>
> > Cal
>
> > On Wed, Jan 2, 2013 at 11:29 AM, George Lund <gl...@mintel.com<javascript:>
> > > wrote:
>
> >> I'm trying to bulk-delete several million rows from my database.
>
> >> The docs for Django 1.3<https://docs.djangoproject.com/en/1.3/topics/db/queries/#deleting-obj...>say "this will, whenever possible, be executed purely in SQL". A pure-SQL
> >> delete is what I want in this case, so that's fine.
>
> >> However, the code is never getting as far as running any SQL.
>
> >> Interrupting the script shows that the delete method on the QuerySet is
> >> trying to use a "Collector" to construct model instances for each row I'm
> >> trying to delete. This is going to take too long (and may in fact consume
> >> all the memory available) -- I don't think it's practical to wait in this
> >> case. (I've tried waiting over half an hour!)
>
> >> (I'm looking at django.db.models.query.QuerySet.delete and django.db.models.deletion.Collector.collect
> >> / Collector.add.)
>
> >> What's the point in doing the delete "purely in SQL" if all of the
> >> objects are getting constructed anyway? Why do they need to be "collected"
> >> before the SQL DELETE is run? The model instance in this case has no
> >> child rows, to which the delete might need to be cascaded.
>
> >> Meanwhile I can construct the SQL by hand easily enough, but I feel this
> >> isn't doing things the right way.
>
> >> thanks for any help
>
> >> George
>
> >> --
> >> You received this message because you are subscribed to the Google Groups
> >> "Django users" group.
> >> To view this discussion on the web visit
> >>https://groups.google.com/d/msg/django-users/-/W4LqKzcnlaYJ.
> >> To post to this group, send email to django...@googlegroups.com<javascript:>
> >> .
> >> To unsubscribe from this group, send email to
> >> django-users...@googlegroups.com <javascript:>.
Reply all
Reply to author
Forward
0 new messages