Update on Ticket #16891: Delete/update should return number of rows modified

80 views
Skip to first unread message

Steven Cummings

unread,
Oct 11, 2011, 9:35:44 PM10/11/11
to django-d...@googlegroups.com
Finally got around to these simple changes:


If the core devs approve of these changes I can get to work on creating the proper patch for SVN and add doc changes.

--
Steven

Florian Apolloner

unread,
Oct 12, 2011, 5:28:53 AM10/12/11
to django-d...@googlegroups.com
Uhm question -- Does the count for Model.save() make any sense? Isn't that always one?

Cheers,
Florian

Johannes Dollinger

unread,
Oct 12, 2011, 6:37:05 AM10/12/11
to django-d...@googlegroups.com

Ticket #12086 is a duplicate. I don't think QuerySet.delete() should return the number of all collected objects, but just the number of objects deleted from QuerySet.model.
And Model.save()/Model.delete() should really only return 1, as *one object* has been saved/deleted, even if more than one database row was touched. Thus, both methods don't need the return value.
Thanks for working on this!
__
Johannes

Anssi Kääriäinen

unread,
Oct 12, 2011, 9:07:53 AM10/12/11
to Django developers
On Oct 12, 1:37 pm, Johannes Dollinger <emulb...@googlemail.com>
wrote:
> Ticket #12086 is a duplicate. I don't think QuerySet.delete() should return the number of all collected objects, but just the number of objects deleted from QuerySet.model.

Agree. A way to get the counts per object type would be useful
(logging for starters). It should not be the default return value,
though...

> And Model.save()/Model.delete() should really only return 1, as *one object* has been saved/deleted, even if more than one database row was touched. Thus, both methods don't need the return value.

Model.delete() can easily return 0. The easiest way to get 0 is doing
the delete two times in a row. Or having a concurrent delete. The
return value costs us nothing, except if for some reason we would want
to return something else from .delete() in the future. I don't see
that happening.

Model.save() should return models.UPDATED or models.INSERTED. That is
more helpful than the 1/0 return value. For example you might want to
do different things if the model was inserted or updated in
application code. Or if loading data from a file, you might want to
know how many objects already existed in the DB and how many were
inserted. Of course if you could cheaply get models.UNCHANGED, that
would make the feature perfect...

I agree that model.save() should never return 0/None/models.NOACTION,
that should be an exception instead of a return value. After .save()
the model should exists in the database. If the object for some reason
isn't persisted, it is worth an exception instead of easy to miss
return value.

- Anssi Kääriäinen

Steven Cummings

unread,
Oct 12, 2011, 9:33:39 AM10/12/11
to django-d...@googlegroups.com
It is for now, but the ticket I'm building up to [1] is that it may actually be 0 if a precondition is not met. This is how your code can know if the current request really got to do a save or delete if two or more are trying to update the object at the same time.

[1] https://code.djangoproject.com/ticket/16549
--
Steven


--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To view this discussion on the web visit https://groups.google.com/d/msg/django-developers/-/rpkcIupLI9gJ.
To post to this group, send email to django-d...@googlegroups.com.
To unsubscribe from this group, send email to django-develop...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/django-developers?hl=en.

Steven Cummings

unread,
Oct 12, 2011, 9:42:22 AM10/12/11
to django-d...@googlegroups.com
--
Steven


On Wed, Oct 12, 2011 at 5:37 AM, Johannes Dollinger <emul...@googlemail.com> wrote:

Am 12.10.2011 um 03:35 schrieb Steven Cummings:
 
Ticket #12086 is a duplicate. I don't think QuerySet.delete() should return the number of all collected objects, but just the number of objects deleted from QuerySet.model.

Is the idea that you should just assume related objects are deleted, and so the count should just reflect the current set of top-level objects in the QuerySet? I ask because I'm actually not sure about this one myself.

And Model.save()/Model.delete() should really only return 1, as *one object* has been saved/deleted, even if more than one database row was touched. Thus, both methods don't need the return value.

See my response to Florian. The result may not always be one after further enhancements.
 
Thanks for working on this!
__
Johannes
--
You received this message because you are subscribed to the Google Groups "Django developers" group.

Steven Cummings

unread,
Oct 12, 2011, 9:47:52 AM10/12/11
to django-d...@googlegroups.com
--
Steven


On Wed, Oct 12, 2011 at 8:07 AM, Anssi Kääriäinen <anssi.ka...@thl.fi> wrote:

Model.save() should return models.UPDATED or models.INSERTED. That is
more helpful than the 1/0 return value. For example you might want to
do different things if the model was inserted or updated in
application code. Or if loading data from a file, you might want to
know how many objects already existed in the DB and how many were
inserted. Of course if you could cheaply get models.UNCHANGED, that
would make the feature perfect...

If you're ensuring a set of data from some other source, what's the use of knowing what was new? Are there other concrete cases where this might be useful?
 

I agree that model.save() should never return 0/None/models.NOACTION,
that should be an exception instead of a return value. After .save()
the model should exists in the database. If the object for some reason
isn't persisted, it is worth an exception instead of easy to miss
return value.

Possibly, but an exception cannot be added until the programmer is passing a precondition and can reasonable expect the condition of nothing being saved. That's definitely not on this ticket.
 

 - Anssi Kääriäinen

Anssi Kääriäinen

unread,
Oct 12, 2011, 10:35:45 AM10/12/11
to Django developers
On Oct 12, 4:47 pm, Steven Cummings <estebis...@gmail.com> wrote:
> If you're ensuring a set of data from some other source, what's the use of
> knowing what was new? Are there other concrete cases where this might be
> useful?

The standard use when synchronizing data is to make sure there aren't
too many changes. This is to protect for corrupted data. For example
if more than 10% of the data changed, then rollback the action. Send
an email to administrator who needs to validate the changes by had.
Granted, without models.UNCHANGED this use case has limited value.

Another use case is overriding model.save(), and based on the action
you might want to create an entry in another table. Something like
when saving User, if the user was created, then save a Profile for
him. Or just displaying "The object was successfully created"/"The
object was successfully updated" to the user. This would be more
useful even in manage.py shell:
>>> someobj.save()
INSERTED
>>> someobj.save()
UPDATED
vs.
>>> someobj.save()
1
>>> someobj.save()
1

Then there is the possibility that some day we could support the
UNCHANGED return value. This would be actually cheap to do if Django
would support update of only changed fields. We couldn't support that
return value if the only choices would be just 1/0 return values.

One way to see that there might be some uses for this, is that
post_save does have a created argument. And then there is the question
that what does this cost Django? The only cost I can see is that you
can't do save_count += obj.save() but you need to do save_count +=
obj.save() and 1 or 0.

> > I agree that model.save() should never return 0/None/models.NOACTION,
> > that should be an exception instead of a return value. After .save()
> > the model should exists in the database. If the object for some reason
> > isn't persisted, it is worth an exception instead of easy to miss
> > return value.
>
> Possibly, but an exception cannot be added until the programmer is passing a
> precondition and can reasonable expect the condition of nothing being saved.
> That's definitely not on this ticket.

Agreed. The only question is: If we don't want to support the
models.UPDATED / models.INSERTED return values, and we decide that 0
should not be ever returned, then why return the 1? That will be a
return value of no information. If you are not going to do the
models.UPDATED/INSERTED return value, then using up the .save() return
value should not be done at this time. We can't change the return
value at will, so we should not commit to something which doesn't have
any use now, and might not have any use ever.

- Anssi

Steven Cummings

unread,
Oct 12, 2011, 10:40:35 PM10/12/11
to django-d...@googlegroups.com
On Wed, Oct 12, 2011 at 9:35 AM, Anssi Kääriäinen <anssi.ka...@thl.fi> wrote:
On Oct 12, 4:47 pm, Steven Cummings <estebis...@gmail.com> wrote:
Then there is the possibility that some day we could support the
UNCHANGED return value. This would be actually cheap to do if Django
would support update of only changed fields. We couldn't support that
return value if the only choices would be just 1/0 return values.

Fair enough. That is an interesting possibility.

> > I agree that model.save() should never return 0/None/models.NOACTION,
> > that should be an exception instead of a return value. After .save()
> > the model should exists in the database. If the object for some reason
> > isn't persisted, it is worth an exception instead of easy to miss
> > return value.
>
> Possibly, but an exception cannot be added until the programmer is passing a
> precondition and can reasonable expect the condition of nothing being saved.
> That's definitely not on this ticket.

Agreed. The only question is: If we don't want to support the
models.UPDATED / models.INSERTED return values, and we decide that 0
should not be ever returned, then why return the 1? That will be a
return value of no information. If you are not going to do the
models.UPDATED/INSERTED return value, then using up the .save() return
value should not be done at this time. We can't change the return
value at will, so we should not commit to something which doesn't have
any use now, and might not have any use ever.

Well, obviously I hope it does have use in the original ticket that spawned this one. I guess what I might need some help with here is: what are some of the directions of the ORM that might affect this decision.

The reasons I stuck with counts across the board in this change were:
1. A uniform interface: object counts all the time for create/updated/delete.
2. Ready support for cascading updates if Model.save() were to ever trigger save() on related models that were dirty.

Admittedly #2 is speculative, but so are the examples above of supporting update of only fields that have changed.

So, is there a good sense of "sure" or "probable" enhancements to models that could help us with the return API of Model.save, and whether QuerySet.delete should return counts that include related objects? If "coming enhancements" or directions for Models can concretely inform, that would be great.

Thoughts?

--
Steven

Anssi Kääriäinen

unread,
Oct 13, 2011, 12:51:49 AM10/13/11
to Django developers
On Oct 13, 5:40 am, Steven Cummings <estebis...@gmail.com> wrote:
[About the 1/0 return value of .save()]
> Well, obviously I hope it does have use in the original ticket that spawned
> this one. I guess what I might need some help with here is: what are some of
> the directions of the ORM that might affect this decision.

My point is that we should never just return 0 if there is a
concurrent modification or some other reason for not being able to
persist the object state. Why? When you request obj.save() you are
requesting the object state to be persisted. If that can't be done,
it's an exception. delete() does not have the same problem: if you do
a delete, and the object is already deleted, then there is no problem.
The DB state is still synchronized with the delete.

The biggest problem of just returning 0 from the .save() on concurrent
modification is that if the developer forgets to do if obj.save(): ...
else: "show error" and instead does only a obj.save(), the application
thinks the data was saved when in fact it was not. In the best case
you will get an angry call from the user. Users don't like it when an
application claims the data was saved but in fact it was not. In the
worst case, you are doing other saves in the same request, and the
state is half-persisted. Which is a data corruption bug right there.
If instead .save() throws an exception, and the developer forgets to
try-catch it, the user gets a 500 page. This is better than claiming
the data was saved when it wasn't, and much better than doing a half-
save.

Also, if you just return 0, you will lose the information WHY the
object couldn't be saved, you just know it wasn't saved for some
reason.

Another way to think about this: when there is a unique constraint
violation, we raise an exception (the one from the DB backend). To be
consistent with returning 0 when the object can't be saved due to some
other reason, shouldn't we just catch that exception and return 0? Bad
idea, right?

> The reasons I stuck with counts across the board in this change were:
> 1. A uniform interface: object counts all the time for
> create/updated/delete.
> 2. Ready support for cascading updates if Model.save() were to ever trigger
> save() on related models that were dirty.
>
> Admittedly #2 is speculative, but so are the examples above of supporting
> update of only fields that have changed.

Agreed. Re point 1: An uniform interface would also be returning
models.DELETED / models.ALREADY_DELETED(?) from .delete(). Granted, in
the #1 case 0/1 makes more sense. I do see the problem with
model.delete() returning a number, while model.save() returns a
constant.

> So, is there a good sense of "sure" or "probable" enhancements to models
> that could help us with the return API of Model.save, and whether
> QuerySet.delete should return counts that include related objects? If
> "coming enhancements" or directions for Models can concretely inform, that
> would be great.

I don't think there is. For that reason I would just postpone the
model.save() return value decision. I feel pretty strongly that
model.save() should never return 0 in the meaning of "didn't save your
changes" (note that this is different from "there were no changes").
It is just too easy to miss that return value and have half-done
updates because of that.

- Anssi Kääriäinen

Steven Cummings

unread,
Oct 13, 2011, 1:33:47 AM10/13/11
to django-d...@googlegroups.com
On Wed, Oct 12, 2011 at 11:51 PM, Anssi Kääriäinen <anssi.ka...@thl.fi> wrote:

My point is that we should never just return 0 if there is a
concurrent modification or some other reason for not being able to
persist the object state. Why? When you request obj.save() you are
requesting the object state to be persisted. If that can't be done,
it's an exception. delete() does not have the same problem: if you do
a delete, and the object is already deleted, then there is no problem.
The DB state is still synchronized with the delete.

I'm not sure anybody said 0 would be returned for an *inability* to save data. I would agree that would be an exception situation, but it hasn't been discussed here yet.
 

The biggest problem of just returning 0 from the .save() on concurrent
modification is that if the developer forgets to do if obj.save(): ...


So the situation I expect this to occur in is the context of very deliberate coding by the programmer, e.g.:

count = obj.save(where={'version': 1})

If the programmer forgets to check the count then they probably didn't understand the feature being used to being with. I don't think this is going to result in the accidents you're describing, because until there is a specific feature context, save will return 1 (or whatever else we decide). I've no intention of introducing implicit features or pitfalls.

Also, if you just return 0, you will lose the information WHY the
object couldn't be saved, you just know it wasn't saved for some
reason.

Another way to think about this: when there is a unique constraint
violation, we raise an exception (the one from the DB backend). To be
consistent with returning 0 when the object can't be saved due to some
other reason, shouldn't we just catch that exception and return 0? Bad
idea, right?

I generally agree with this, but as I stated above I've yet to discuss a situation involving the inability to save. In the example so far the reason for not saving is the condition given by the programmer. I won't be swallowing any exceptions and returning zero, and if a coder chooses to avoid saving on a version for optimistic concurrency, that is a decision, not an inability.
 
> So, is there a good sense of "sure" or "probable" enhancements to models
> that could help us with the return API of Model.save, and whether
> QuerySet.delete should return counts that include related objects? If
> "coming enhancements" or directions for Models can concretely inform, that
> would be great.

I don't think there is. For that reason I would just postpone the
model.save() return value decision. I feel pretty strongly that
model.save() should never return 0 in the meaning of "didn't save your
changes" (note that this is different from "there were no changes").
It is just too easy to miss that return value and have half-done
updates because of that.

I think a reasonable option to discuss might be leaving the save() API as it is and rolling this enhancement back to the internal code (i.e., UpdateQuery, DeleteQuery) returning counts to support the prospective enhancements I've alluded to, and/or overrides of save(). Until there are any changes to save(), I agree it is not going to be useful info. However for delete it seems immediately usable (and then we're left with the debate of counting immediate-only or including related objects).



Anssi Kääriäinen

unread,
Oct 13, 2011, 2:06:16 AM10/13/11
to Django developers
On Oct 13, 8:33 am, Steven Cummings <estebis...@gmail.com> wrote:
> So the situation I expect this to occur in is the context of very deliberate
> coding by the programmer, e.g.:
>
> count = obj.save(where={'version': 1})
>
> If the programmer forgets to check the count then they probably didn't
> understand the feature being used to being with. I don't think this is going
> to result in the accidents you're describing, because until there is a
> specific feature context, save will return 1 (or whatever else we decide).
> I've no intention of introducing implicit features or pitfalls.

Ok, so if this is going to be low level API only, then this isn't as
dangerous as I though. Still, even in this case, an exception wouldn't
be any worse IMHO. You could always do try: obj.save() except
ConcurrentModification: pass.

Now I have the feeling that I have gone through this exact same
discussion before, and have had the exact same misunderstanding, too,
before. So, sorry for that...

> I think a reasonable option to discuss might be leaving the save() API as it
> is and rolling this enhancement back to the internal code (i.e.,
> UpdateQuery, DeleteQuery) returning counts to support the prospective
> enhancements I've alluded to, and/or overrides of save(). Until there are
> any changes to save(), I agree it is not going to be useful info. However
> for delete it seems immediately usable (and then we're left with the debate
> of counting immediate-only or including related objects).

I would go with immediate only, with the ability to get the counts for
cascaded deletes per object type as a kwarg option. The kwarg option
could be left for later implementation. One reason for immediate only
is that at least PostgreSQL and MySQL does it that way for ON DELETE
CASCADE foreign keys. So, if you are getting the value from the cursor
and using ON DELETE CASCADE instead of doing the cascade in Django,
you will not get the cascade counts anyways. And even if you do the
cascade in Django, then it would be consistent with what a SQL
database would report.

- Anssi Kääriäinen

Steven Cummings

unread,
Oct 14, 2011, 12:48:12 AM10/14/11
to django-d...@googlegroups.com
On Thu, Oct 13, 2011 at 1:06 AM, Anssi Kääriäinen <anssi.ka...@thl.fi> wrote:
Now I have the feeling that I have gone through this exact same
discussion before, and have had the exact same misunderstanding, too,
before. So, sorry for that...

It's cool. Better to make sure we're all clear here on the different opinions and options.
 
> I think a reasonable option to discuss might be leaving the save() API as it
> is and rolling this enhancement back to the internal code (i.e.,
> UpdateQuery, DeleteQuery) returning counts to support the prospective
> enhancements I've alluded to, and/or overrides of save(). Until there are
> any changes to save(), I agree it is not going to be useful info. However
> for delete it seems immediately usable (and then we're left with the debate
> of counting immediate-only or including related objects).

I would go with immediate only, with the ability to get the counts for
cascaded deletes per object type as a kwarg option. The kwarg option
could be left for later implementation. One reason for immediate only
is that at least PostgreSQL and MySQL does it that way for ON DELETE
CASCADE foreign keys. So, if you are getting the value from the cursor
and using ON DELETE CASCADE instead of doing the cascade in Django,
you will not get the cascade counts anyways. And even if you do the
cascade in Django, then it would be consistent with what a SQL
database would report.

Well, by those conventions and all of the feedback I've gotten so far, counting only immediate objects in a delete seems like the best way.

Are there other opinions on this issue or the return value of save? Do any core devs have insight into any potential changes in the ORM that would be affected by this decision? 

Thanks.

Steven Cummings

unread,
Oct 16, 2011, 5:48:19 PM10/16/11
to django-d...@googlegroups.com
Any opinions from core devs before I go back and make adjustments, or are some waiting to see the patch before weighing in?

--
Steven

Steven Cummings

unread,
Oct 17, 2011, 11:02:50 PM10/17/11
to django-d...@googlegroups.com
Well, per discussion so far I've rolled the changes back [1] to simply ensuring that the low-level query objects return usable counts. There was discussion of whether delete should return counts for immediate objects only or related as well, but I further decided to simply impose no API changes on base Model for now.

Steven Cummings

unread,
Nov 26, 2011, 3:42:38 PM11/26/11
to django-d...@googlegroups.com
I'm assuming there's some more work to do here, but I was hoping to get opinions from core devs on what's been discussed so far. Some of the feedback above was that because save() can only return a rows-modified count of 1 right now, that it's silly to even change. So the tentative decision is simply to have no changes to the Model and QuerySet APIs at all right now (a generally safe decision, all other things aside).

So, what are core-dev thoughts on going on ahead and returning row-modified counts from Model.save() and QuerySet.update/delete? If not, are there any comments on the code changes to date (which are fairly minimal), or is this something that everybody would be generally okay with me wrapping up very soon? Is this generally something that there is low interest in because my ultimate use-case (avoiding optimistic concurrency problems) is a non-issue for some majority of Django developers?

Thanks!

--
Steven

Florian Apolloner

unread,
Nov 26, 2011, 4:37:09 PM11/26/11
to django-d...@googlegroups.com
Hi,


On Saturday, November 26, 2011 9:42:38 PM UTC+1, Steven Cummings wrote:
So, what are core-dev thoughts on going on ahead and returning row-modified counts from Model.save() and QuerySet.update/delete?

QuerySet.update already returns the modified row count.

Cheers,
Florian

Steven Cummings

unread,
Nov 26, 2011, 4:41:59 PM11/26/11
to django-d...@googlegroups.com
Right, let me rephrase that as just Model.save(), Model.delete(), & QuerySet.delete()
--
Steven


--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To view this discussion on the web visit https://groups.google.com/d/msg/django-developers/-/cubG4FAps0oJ.

Florian Apolloner

unread,
Nov 26, 2011, 4:58:16 PM11/26/11
to django-d...@googlegroups.com
I think it would be great if delete would return the deleted rows (but only of the base table obviously), -1 on model.save() returning a rowcount.

Cheers,
Florian

Steven Cummings

unread,
Nov 26, 2011, 5:09:48 PM11/26/11
to django-d...@googlegroups.com
Okay, that's good stuff. So make the QuerySet APIs consistent (delete/update should return the same sort of info) but leave Model alone entirely.
--
Steven




Cheers,
Florian

--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To view this discussion on the web visit https://groups.google.com/d/msg/django-developers/-/ziNdv-aRePQJ.
Reply all
Reply to author
Forward
0 new messages