API question for model saving

12 views
Skip to first unread message

Malcolm Tredinnick

unread,
Apr 27, 2008, 7:53:49 PM4/27/08
to django-d...@googlegroups.com
One of the minor cleanup items I've got on my list now that the query
stuff is mostly done is fixing model saving so that in the odd cases
where finer control is necessary, we have it.

Sometimes when calling save(), we know (for business reasons) that it
must create a new object or raise an error. Similarly, in other cases,
it either must update an existing object or raise an error.

For example: creating a new employee in the a business system. The
primary key is their tax file number (unique per person). After entering
all the details, we hit save. This *must* create a new entry. If there
is an existing entry with the same tax file number, it's an error. Under
no circumstances should the existing entry be updated.

Conversely, we're now giving this person a raise. We enter the new
salary and the tax file number and hit save. It must update an existing
record. If a record with that primary key doesn't exist, it's an error
and no new record should be created.

These situations don't arise all the time, but in some situations they
can be fairly frequent and when you need them, it's important.

There are at least three possibilities for the API here. In all cases
the default behaviour is as it is now.

(1) Model.save(must_create=None) where "must_create" is True (must
create), False (must not create) or None (try an update; if that fails,
insert).

Pros: one parameter only (shorter; easier to subclass). Uses
existing, builtin constant values.

Cons: "False" tends to read most naturally as "not must_create",
rather than "must_not_create", so it does slightly fail the
reading test.

(2) Model.save(method=DEFAULT) and this time "method" takes one of three
constant values (making up the names a bit): "DEFAULT", "MUST_CREATE",
"MUST_UPDATE".

Pros: very clear what's going on when you use something other
than DEFAULT.

Cons: now you have to import the constant you want to use.
Alternatively, we put the constant on the Model class and now
you have to write Model.MUST_CREATE, which divides opinions a
bit as to readability (not everybody likes model-level
constants, so they're not used to reading them).

(3) Model.save(must_create=False, must_update=False).

Pros: Uses existing builtins.

Cons: Two binary parameters means four possibilities. The (True,
True) case is invalid. It's also two things you have to specify
when subclassing save().

If it was just me, I'd use (1). But I'm not typical. For our audience,
I'd probably favour (2) and make the constants have string values of
"create" and "update" so you could use them in literal form and they're
still readable -- a kind of halfway alternative to having to import the
constants if you don't want to.

Thoughts?

(The implementation of all this is pretty much trivial. Getting the API
right is the hard bit here.)

Regards,
Malcolm

--
For every action there is an equal and opposite criticism.
http://www.pointy-stick.com/blog/

Karen Tracey

unread,
Apr 27, 2008, 8:44:16 PM4/27/08
to django-d...@googlegroups.com
On Sun, Apr 27, 2008 at 7:53 PM, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
[snipped problem description]


(1) Model.save(must_create=None) where "must_create" is True (must
create), False (must not create) or None (try an update; if that fails,
insert).

       Pros: one parameter only (shorter; easier to subclass). Uses
       existing, builtin constant values.

       Cons: "False" tends to read most naturally as "not must_create",
       rather than "must_not_create", so it does slightly fail the
       reading test.

I like this option except for the name must_create, which makes the False setting fail readability.  Couldn't we think of a different name that didn't have this problem?  Some options:

create
new_instance
create_new_instance (ugh, too long)

To me, 'create' all by itself seems fine.

(2) Model.save(method=DEFAULT) and this time "method" takes one of three
constant values (making up the names a bit): "DEFAULT", "MUST_CREATE",
"MUST_UPDATE".

       Pros: very clear what's going on when you use something other
       than DEFAULT.

       Cons: now you have to import the constant you want to use.
       Alternatively, we put the constant on the Model class and now
       you have to write Model.MUST_CREATE, which divides opinions a
       bit as to readability (not everybody likes model-level
       constants, so they're not used to reading them).

I don't like this one so much, mostly because #1 just seems more elegant to me, not out of any strong dislike.

(3) Model.save(must_create=False, must_update=False).

       Pros: Uses existing builtins.

       Cons: Two binary parameters means four possibilities. The (True,
       True) case is invalid. It's also two things you have to specify
       when subclassing save().

I really dislike this one, specifically that it allows a caller to pass in an invalid set of values.   

JMO,
Karen

flo...@gmail.com

unread,
Apr 27, 2008, 8:52:42 PM4/27/08
to Django developers
I'd like to point out that in #2705, the patch creators have put the
API on the side of the get(), as for_update(), so that the decision
about whether the save should update or create is saved as state on
the model instances themselves. I don't really like that syntax, but
it is probably worth noting.

Malcolm Tredinnick

unread,
Apr 27, 2008, 8:58:34 PM4/27/08
to django-d...@googlegroups.com

I know about that, but it's an entirely different problem (and more
complex). That ticket is doing row-level locking -- pre-emptively
locking some data until an update is done on it. This issue is simply
about saving and the cases when new record creation needs control.
Row-level locking isn't the solution for many "must create" or "must
update" situations.

Malcolm

--
Plan to be spontaneous - tomorrow.
http://www.pointy-stick.com/blog/

Marty Alchin

unread,
Apr 27, 2008, 8:59:25 PM4/27/08
to django-d...@googlegroups.com
I'm a bit on the other side of the situation here. To me, save() seems
adequately ambiguous, either inserting or updating depending on the
situation. As an alternative, I'd think having separate methods, like
create() and update() would make things clearer still.

In addition to avoiding the whole argument-naming issue, it seems like
this would make subclassing even better. If you create a subclass and
override create(), that would be used in the explicit create() case as
well as the create side of the save() case.

I don't know, my mind's not really on this topic enough yet, so I'm
sure this makes less sense than it seems to me. I'll think it over
some more and see.

-Gul

Malcolm Tredinnick

unread,
Apr 27, 2008, 9:08:15 PM4/27/08
to django-d...@googlegroups.com

On Sun, 2008-04-27 at 20:59 -0400, Marty Alchin wrote:
> I'm a bit on the other side of the situation here. To me, save() seems
> adequately ambiguous, either inserting or updating depending on the
> situation. As an alternative, I'd think having separate methods, like
> create() and update() would make things clearer still.
>
> In addition to avoiding the whole argument-naming issue, it seems like
> this would make subclassing even better. If you create a subclass and
> override create(), that would be used in the explicit create() case as
> well as the create side of the save() case.

Why do people want to add a whole extra function call when a simple
parameter will do? All the main logic is going to be in one place anyway
(unless you like having masses of duplicated code). With a parameter
it's a simple if-test to pick the right branch. With a whole extra
method, you end up calling something that takes the parameter anyway
that does the whole extra if-test. So all you've done is introduce a
function call under some dubious justification that people might want to
screw up create() and update() independently. They can already do that;
they don't need an officially blessed way.

-1 on adding extra methods. This situation just isn't needed that often.
You're mostly just going to call save() with no parameters; it's the
right thing. For the rare cases, we give you micro-control (we need this
internally sometimes).

Malcolm

--
I intend to live forever - so far so good.
http://www.pointy-stick.com/blog/

Adrian Holovaty

unread,
Apr 27, 2008, 9:14:56 PM4/27/08
to django-d...@googlegroups.com
On Sun, Apr 27, 2008 at 6:53 PM, Malcolm Tredinnick
<mal...@pointy-stick.com> wrote:
> Sometimes when calling save(), we know (for business reasons) that it
> must create a new object or raise an error. Similarly, in other cases,
> it either must update an existing object or raise an error.

So I don't quite follow this, but...this basically is a way to
manually override our "if the primary key is set to None, then it's an
INSERT; otherwise it's an UPDATE" logic, right? Would you be able to
provide a more specific example of why one would need this
functionality?

Name-wise, I suggest the use of force. ;-)

force_create=True
force_update=True

Adrian

--
Adrian Holovaty
holovaty.com | everyblock.com | djangoproject.com

Malcolm Tredinnick

unread,
Apr 27, 2008, 9:25:26 PM4/27/08
to django-d...@googlegroups.com

On Sun, 2008-04-27 at 20:14 -0500, Adrian Holovaty wrote:
> On Sun, Apr 27, 2008 at 6:53 PM, Malcolm Tredinnick
> <mal...@pointy-stick.com> wrote:
> > Sometimes when calling save(), we know (for business reasons) that it
> > must create a new object or raise an error. Similarly, in other cases,
> > it either must update an existing object or raise an error.
>
> So I don't quite follow this, but...this basically is a way to
> manually override our "if the primary key is set to None, then it's an
> INSERT; otherwise it's an UPDATE" logic, right?

No. You can have inserts with the primary key being non-None in cases
where you're using semantic primary keys instead of Django's
auto-generated (surrogate) keys.

The example in my first post of a (unique) tax file number as the
primary key is a case in point.

> Would you be able to
> provide a more specific example of why one would need this
> functionality?

The employee record creation and update examples in the first post of
this thread (paragraphs three and four) are appropriate cases. That type
of business logic appears regularly in some domains. The idea here, by
the way, isn't to avoid update races (that's what "select for update" is
used for and that's appropriate for slightly different circumstances;
this is orthogonal). It's to avoid inadvertent updating or creation in
cases where that matters.

Internally, there's a case for doing this with sessions, for example: a
dirty session should only be saved with update(). That way, if the
session is deleted() before it is saved() by an alternate thread of
execution, the "save" can be thrown away when we receive a "does not
exist" error (it's been deleted for good reason, possibly because the
person logged out; so recreating it is arguably not the right approach).
For normal saving this clearly isn't appropriate, but for something
transient like sessions, which have a clear lifecycle, it could be the
right thing. This isn't something I'm going to initially, but it's a
possibility to consider.

Malcolm

--
If you think nobody cares, try missing a couple of payments.
http://www.pointy-stick.com/blog/

Grégoire Cachet

unread,
Apr 27, 2008, 10:17:18 PM4/27/08
to django-d...@googlegroups.com
Hey!

I have also been thinking to the problem lately.

If you're doing it because it's related to your business logic for a
specific model, I would rather see this feature in the model definition.

Why could you not include a parameter in the Meta class, like
no_update = True ?
By default, this parameter would be False.

We you call save(), the Model checks the value of no_update, and
creates a new entry if it's set to True instead of updating the
current entry.

I see some advantages to this approach:
- you will never forget to add the parameter when calling the save()
method.
- the property is written once in the code, so you don't repeat yourself
- it is easier to manage it for the admin interface: if it calls
save(), it will create a new record transparently

Btw, thank you very much for your work on queryset-refactor :)

Regards,
--
Grégoire Cachet

Le 27 avr. 08 à 19:53, Malcolm Tredinnick a écrit :

Marty Alchin

unread,
Apr 27, 2008, 10:37:23 PM4/27/08
to django-d...@googlegroups.com
On Sun, Apr 27, 2008 at 10:17 PM, Grégoire Cachet
<gregoir...@gmail.com> wrote:
> Why could you not include a parameter in the Meta class, like
> no_update = True ?
> By default, this parameter would be False.

The problem is, in Malcolm's example of an employee database, you
might well sometimes need to make changes to a specific employee's
records. For example, you might need to update their salary, the
department they work for, maybe even their name or date of birth if
they were entered incorrectly (or for legitimate reasons, such as a
woman getting married). Simply supplying no_update doesn't give the
application the chance to decide when an update is allowed, and when
it's not.

-Gul

Grégoire Cachet

unread,
Apr 27, 2008, 10:50:31 PM4/27/08
to django-d...@googlegroups.com
Le 27 avr. 08 à 22:37, Marty Alchin a écrit :

Ok, I misunderstood the exact context.
But still that would be a useful feature :)

Regards,
--
Grégoire Cachet

Ken Arnold

unread,
Apr 27, 2008, 11:18:30 PM4/27/08
to Django developers
On Apr 27, 8:58 pm, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:
> On Sun, 2008-04-27 at 17:52 -0700, flo...@gmail.com wrote:
> > I'd like to point out that in #2705, the patch creators have put the
> > API on the side of the get(), as for_update(), so that the decision
> > about whether the save should update or create is saved as state on
> > the model instances themselves. I don't really like that syntax, but
> > it is probably worth noting.
>
> I know about that, but it's an entirely different problem (and more
> complex).


I think floguy has a point: how you get at the model object specifies
the saving behavior that you want.

If you did `Model.objects.get()`, then saving the resulting object
clearly means updating that specific object (and probably you want to
error if it no longer exists)

If you did `Model.objects.create()`, then you just created it anyway.
But that currently creates an object with the constructor and calls
`save()` on it. This is Malcolm's problem case: that `save` is
semantically identical to an update, though it shouldn't be.

Possible fix: define more clearly whether Django thinks that the model
is in the database or not. Currently that's (IIRC) based on whether
the model has a primary key. But as Malcolm points out, a semantic
primary key breaks this assumption.

Maybe a model can have some generic "database pointer" attribute. It's
`None` for a newly-created model, so the `save` in `create` does the
right thing: it always tries to make a new object, even if the primary
key was specified in the arguments. A 'get' from the database sets it
to the primary key, so any `save` tries to update that object (error-
ing? if that object no longer exists).

A corner case is where you change the primary key of an existing
object. Should this create a new object, or change the primary key of
the old one? Without thinking too hard about this, I'd think we'd want
to create a new object; updating all the foreign keys to the old
object would be quite a pain. Maybe the primary key should just be
immutable; if you want to change it, `copy(pk=new_pk)` the object,
then optionally `delete()` the original.

The 'database pointer' attribute might also support the row-level
locking case, if desired.

Usual disclaimers of seems-right-to-me-but-Malcolm-is-smarter-than-me
apply.

-Ken

Justin Fagnani

unread,
Apr 28, 2008, 12:36:02 AM4/28/08
to django-d...@googlegroups.com
On Sun, Apr 27, 2008 at 8:18 PM, Ken Arnold <kenneth...@gmail.com> wrote:
 
Possible fix: define more clearly whether Django thinks that the model
is in the database or not. Currently that's (IIRC) based on whether
the model has a primary key. But as Malcolm points out, a semantic
primary key breaks this assumption.

This would be extremely useful for me. I often need to make sure either an object is never updated, that only some fields are updated, or that an object can only be updated in certain cases. The object itself can determine when updates are allowed as long as it can tell whether it already exists in the DB and get the current instance if it does. Currently I use a combination of checking the primary key and querying the DB if it's set.

 

If there were model methods like exists() and saved_instance(), then to force an update you could just use an if:


if object.exists():

    object.save()


And in my cases, I could override save():


def save(self):

    c = self.saved_instance()

    if self.exists:

        c = self.saved_instance()

        if c.immutable_field != self.immutable_field:

            raise SomeError("immutable_field cannot be changed")

        if c.locked:

            raise SomeOtherError("Object %s is locked" % self.id)

    super(Model, self).save()


This could be more complex than what Malcolm's looking for though. Usual disclaimers as well.


-Justin

ludvig.ericson

unread,
Apr 28, 2008, 3:28:20 AM4/28/08
to Django developers
> (1) Model.save(must_create=None) where "must_create" is True (must
> create), False (must not create) or None (try an update; if that fails,
> insert).
>
> (2) Model.save(method=DEFAULT) and this time "method" takes one of three
> constant values (making up the names a bit): "DEFAULT", "MUST_CREATE",
> "MUST_UPDATE".
>
> (3) Model.save(must_create=False, must_update=False).

-1 on all three. They all break existing code very, very much. As you
noted, suppose:

class MyModel(models.Model):
def save(self):
if not self.id:
self.some_date = datetime.now()
super(MyModel, self).save()

Tada, keyword argument invalid and not passed. This pattern is *very*
common as far as I've seen, and yes, perhaps people should have
written it more cooperatively like "def save(self, *a, **kw)" and then
".save(*a, **kw)", but they didn't.

The best way I can think of is adding new saving functions, and let
save do what it does: update or insert. Maybe "model.update()" and
"model.insert()"? Should be pretty clear.

My 2c have been posted,
Ludvig.

David Danier

unread,
Apr 28, 2008, 3:33:25 AM4/28/08
to django-d...@googlegroups.com
> Sometimes when calling save(), we know (for business reasons) that it
> must create a new object or raise an error. Similarly, in other cases,
> it either must update an existing object or raise an error.

I think discussion about this issue has been on this list before, last
time someone suggested adding create() and update()...and keeping save()
as an method calling these two, like:
---------8<----------------------------------------------------
class Model(...):
def save(self, ...):
if self.has_pk() and self.pk_exists():
self.update()
else:
self.create()
def update(): ...
def create(): ...
---------------------------------------------------->8---------

So what is the big advantage of having an new parameter in save() like
you suggested? With having create()/update() you can do similar things.

Additionally it would be possible to override only parts of the
save()-logic in classes extending Model (for example when needing to
calculate field-values on create, but not on insert...which is currently
difficult).

And, of course, you would have no problems with naming the new parameter
and difficulties in creating self-explaining possible values ("not
must_create", rather than "must_not_create").

Just my 2 cents, David Danier

Malcolm Tredinnick

unread,
Apr 28, 2008, 4:21:36 AM4/28/08
to django-d...@googlegroups.com

On Mon, 2008-04-28 at 00:28 -0700, ludvig.ericson wrote:
> > (1) Model.save(must_create=None) where "must_create" is True (must
> > create), False (must not create) or None (try an update; if that fails,
> > insert).
> >
> > (2) Model.save(method=DEFAULT) and this time "method" takes one of three
> > constant values (making up the names a bit): "DEFAULT", "MUST_CREATE",
> > "MUST_UPDATE".
> >
> > (3) Model.save(must_create=False, must_update=False).
>
> -1 on all three. They all break existing code very, very much. As you
> noted, suppose:
>
> class MyModel(models.Model):
> def save(self):
> if not self.id:
> self.some_date = datetime.now()
> super(MyModel, self).save()
>
> Tada, keyword argument invalid and not passed.

Nonsense. That's why they have default values. If you don't pass
anything the default is applied.

>>> def f(x=3):
... print x
...
>>> f()
3

Malcolm

--
Quantum mechanics: the dreams stuff is made of.
http://www.pointy-stick.com/blog/

Johannes Beigel

unread,
Apr 28, 2008, 4:43:40 AM4/28/08
to django-d...@googlegroups.com
Am 28.04.2008 um 10:21 schrieb Malcolm Tredinnick:
> On Mon, 2008-04-28 at 00:28 -0700, ludvig.ericson wrote:
>>
>> suppose:
>>
>> class MyModel(models.Model):
>> def save(self):
>> if not self.id:
>> self.some_date = datetime.now()
>> super(MyModel, self).save()
>>
>> Tada, keyword argument invalid and not passed.
>
> Nonsense. That's why they have default values. If you don't pass
> anything the default is applied.

I don't think that's what ludgiv.ericson meant. It fails when you
*pass* that kwarg but your overloaded save() method doesn't accept it.
I don't think that this is bad though: A user has to change your her
code to reach this point, so it's her responsibility to change her
code of the overloaded save(), too.

Malcolm Tredinnick

unread,
Apr 28, 2008, 4:55:28 AM4/28/08
to django-d...@googlegroups.com

Yes, it's effectively a non-issue in that case. If somebody wants to
support the micro-control in the overridden save method, they can add
it. But they don't need to.

Let's keep in mind here that the need for this micro-control is very
much the minority case. It's not a non-existent case, but pretty much
all code won't care or need to change. After all, code for which this is
currently a requirement cannot be written using Django's save, by
definition. All current code accepts the current behaviour as correct
and, if they don't change a single thing, they'll continue to do that.

Malcolm

--
A conclusion is the place where you got tired of thinking.
http://www.pointy-stick.com/blog/

David Larlet

unread,
Apr 28, 2008, 6:21:59 AM4/28/08
to django-d...@googlegroups.com

Le 28 avr. 08 à 09:33, David Danier a écrit :

I'm +1 on this solution, which turns the well known pattern:

def save(self):
if not self.id:
self.some_date = datetime.now()

super(Model, self).save()

to:

def create(self):
self.some_date = datetime.now()
super(Model, self).create()

BTW, create()/update() sounds more explicit to me than save().

David

Malcolm Tredinnick

unread,
Apr 28, 2008, 6:51:54 AM4/28/08
to django-d...@googlegroups.com

On Sun, 2008-04-27 at 20:18 -0700, Ken Arnold wrote:
> On Apr 27, 8:58 pm, Malcolm Tredinnick <malc...@pointy-stick.com>
> wrote:
> > On Sun, 2008-04-27 at 17:52 -0700, flo...@gmail.com wrote:
> > > I'd like to point out that in #2705, the patch creators have put the
> > > API on the side of the get(), as for_update(), so that the decision
> > > about whether the save should update or create is saved as state on
> > > the model instances themselves. I don't really like that syntax, but
> > > it is probably worth noting.
> >
> > I know about that, but it's an entirely different problem (and more
> > complex).
>
>
> I think floguy has a point: how you get at the model object specifies
> the saving behavior that you want.

Only sometimes. And "select for update" is very heavyweight method of
doing so. If you crash or hang and don't release the lock correctly, you
mess up everybody. So now your code has another place where it has to
trust everything else (the current case of that is manual transaction
management). An unlikely failure mode, but very serious when it happens
(trying to track down problems caused by incorrectly unreleased locks
can be unnecessarily time consuming on a busy system).

> If you did `Model.objects.get()`, then saving the resulting object
> clearly means updating that specific object (and probably you want to
> error if it no longer exists)

Probably/possibly, but not always. Making it invariable would be an
error.

> If you did `Model.objects.create()`, then you just created it anyway

> But that currently creates an object with the constructor and calls
> `save()` on it. This is Malcolm's problem case: that `save` is
> semantically identical to an update, though it shouldn't be.
>
> Possible fix: define more clearly whether Django thinks that the model
> is in the database or not. Currently that's (IIRC) based on whether
> the model has a primary key. But as Malcolm points out, a semantic
> primary key breaks this assumption.

>
> Maybe a model can have some generic "database pointer" attribute. It's
> `None` for a newly-created model, so the `save` in `create` does the
> right thing: it always tries to make a new object, even if the primary
> key was specified in the arguments.

We still need a way to specify that the initial save should be an update
because you don't always need to retrieve before creating something to
update. I've got one application in production that updates a database
table based on information coming from an RSS feed. The identifiers it
gets from the feed may exist or they may not (in the database). I don't
care: I enter the latest value in any case because it's correct. So
Django's current save() behaviour is ideal there.

This isn't to say the "default" action idea is bad. It's something I was
thinking of as the next step here (there are a few next steps). First we
need a way to control saving one way or the other, then we need can add
things that might take advantage of it automatically.

I mentioned session management in my earlier reply to Adrian; that's a
case where "only update" is a reasonable default save behaviour. I
suspect only update is a pretty good default for most stuff retrieved
from the database, but it needs a way to be controlled. And I'd like
save() to still be usable as a single call. Having to do

obj.set_must_create()
obj.save()

rather than

obj.save(force_create=True)

means there are two lines that must stay together. Not horrible, but one
is less than two and it's not really less readable.

Now, save() inside could very well make its default behaviour vary based
on whether the model was loaded from the database or not, but this must
be controllable.

So I think this feature is worth having, but it doesn't remove the
initial API requirement, whether it be at the save() or save_base()
level and I can't see a reason not to expose it via save().

Regards,
Malcolm

--
Everything is _not_ based on faith... take my word for it.
http://www.pointy-stick.com/blog/

Malcolm Tredinnick

unread,
Apr 28, 2008, 6:51:55 AM4/28/08
to django-d...@googlegroups.com
Hey David,

On Mon, 2008-04-28 at 12:21 +0200, David Larlet wrote:
[...]


> def save(self):
> if not self.id:
> self.some_date = datetime.now()
> super(Model, self).save()
>
> to:
>
> def create(self):
> self.some_date = datetime.now()
> super(Model, self).create()

For this particular case it saves a whole line. One concern I have is
that if there's more complex logic in your overridden save method, some
of it is going to be useful in both cases and now you have to create
extra sub-functions for the common bits and remember to call them both
times. It leads to duplication. If you look around at sample code on
django-users and other places, you can see people doing a number of
pieces of auxilliary processing as a result of save happening on the
instance, so this isn't a trivial issue.

> BTW, create()/update() sounds more explicit to me than save().

Which immediately leads to one of the problems with it. Suppose I'm
writing a function that accepts objects, does something to them and then
wants to save them. Do I call create() or update()? There's no way to
tell. Currently, I call save() with no ambiguity problem.

Also, this presents an unnecessary backwards-incompatibility. Every
single piece of code now has to change to use one or other of these
methods. Every save() call. Currently and with the parameter approach,
*zero* existing code has to change initially. If you want to support the
must insert vs. must update difference, you can add a parameter (or two,
depending on which approach we take) and it's still backwards
compatible.

Finally, there's a namespacing problem. The current create() method,
which is really just a shortcut for __init__() + save() lives on the
model manager. An update() method (and presumably your version of
create()) would live on the class instance. "Update" is a very common
word and there are a number of non-database uses for it. We're impinging
into the third-party developer's namespace at that point (there is, by
the way, an update() method on Querysets, and hence available via model
managers, but it's for bulk updates, not instance updates). Django is
pretty careful not to steal from the class instance's namespace. I think
that's a good principle to follow.


Regards,
Malcolm

--
Despite the cost of living, have you noticed how popular it remains?
http://www.pointy-stick.com/blog/

Steve Holden

unread,
Apr 28, 2008, 6:53:20 AM4/28/08
to django-d...@googlegroups.com
You could argue that the separate-method solution has better cohesion.
Malcolm's point that this is very much a minority use-case, and that
mostly the two methods will anyway simply call a modified save with
alternative values of the flag (whose names are a little smelly, by the
way) is reasonable, I suppose.

But it does seem to me that create() and update() are more explicit than
save(must_create=True) and/or save(must_update=True). The separate
method solution also removes the possibility of the illegal fourth case
(must_update==must_create==True), which means there would be no need to
test for it.

In summary, we are arguing around the ragged edges of an overall very
neat system.

regards
Steve

--
Steve Holden +1 571 484 6266 +1 800 494 3119
Holden Web LLC http://www.holdenweb.com/

Steve Holden

unread,
Apr 28, 2008, 7:11:50 AM4/28/08
to django-d...@googlegroups.com
I s'pose that's the killer to the separate-methods argument.

> Also, this presents an unnecessary backwards-incompatibility. Every
> single piece of code now has to change to use one or other of these
> methods. Every save() call. Currently and with the parameter approach,
> *zero* existing code has to change initially. If you want to support the
> must insert vs. must update difference, you can add a parameter (or two,
> depending on which approach we take) and it's still backwards
> compatible.
>
How do these parameters get set for the above "function that accepts
objects and wants to save them"? Presumably it too has a default
"must_create"?

David Danier

unread,
Apr 28, 2008, 7:49:46 AM4/28/08
to django-d...@googlegroups.com
> For this particular case it saves a whole line. One concern I have is
> that if there's more complex logic in your overridden save method, some
> of it is going to be useful in both cases and now you have to create
> extra sub-functions for the common bits and remember to call them both
> times. It leads to duplication. If you look around at sample code on
> django-users and other places, you can see people doing a number of
> pieces of auxilliary processing as a result of save happening on the
> instance, so this isn't a trivial issue.

No, it does not duplicate code, as you still could use save() for common
code.

>> BTW, create()/update() sounds more explicit to me than save().
>
> Which immediately leads to one of the problems with it. Suppose I'm
> writing a function that accepts objects, does something to them and then
> wants to save them. Do I call create() or update()? There's no way to
> tell. Currently, I call save() with no ambiguity problem.
>
> Also, this presents an unnecessary backwards-incompatibility. Every
> single piece of code now has to change to use one or other of these
> methods. Every save() call. Currently and with the parameter approach,
> *zero* existing code has to change initially. If you want to support the
> must insert vs. must update difference, you can add a parameter (or two,
> depending on which approach we take) and it's still backwards
> compatible.

Sorry, but this sounds like you did not read my email at all (to which
David Larlet sent a reply). I proposed still having save(), but
implementing it like this:


---------8<----------------------------------------------------
class Model(...):
def save(self, ...):
if self.has_pk() and self.pk_exists():
self.update()
else:
self.create()

def update(...): ...
def create(...): ...
---------------------------------------------------->8---------

I don't think this will break _any_ code using the old version of save().

> Finally, there's a namespacing problem. The current create() method,
> which is really just a shortcut for __init__() + save() lives on the
> model manager. An update() method (and presumably your version of
> create()) would live on the class instance. "Update" is a very common
> word and there are a number of non-database uses for it.

You don't have to stick to this names. I just used them, as I think they
are pretty self-explainig.

Greetings, David Danier

Malcolm Tredinnick

unread,
Apr 28, 2008, 8:02:13 AM4/28/08
to django-d...@googlegroups.com

On Mon, 2008-04-28 at 13:49 +0200, David Danier wrote:
> > For this particular case it saves a whole line. One concern I have is
> > that if there's more complex logic in your overridden save method, some
> > of it is going to be useful in both cases and now you have to create
> > extra sub-functions for the common bits and remember to call them both
> > times. It leads to duplication. If you look around at sample code on
> > django-users and other places, you can see people doing a number of
> > pieces of auxilliary processing as a result of save happening on the
> > instance, so this isn't a trivial issue.
>
> No, it does not duplicate code, as you still could use save() for common
> code.

Common stuff *must* to be called from both create() and update() in your
case because you're proposing that people should be allowed to call them
directly.

That means save() is only ever going to be a placeholder for backwards
compatibility. Not worth it.

>
> >> BTW, create()/update() sounds more explicit to me than save().
> >
> > Which immediately leads to one of the problems with it. Suppose I'm
> > writing a function that accepts objects, does something to them and then
> > wants to save them. Do I call create() or update()? There's no way to
> > tell. Currently, I call save() with no ambiguity problem.
> >
> > Also, this presents an unnecessary backwards-incompatibility. Every
> > single piece of code now has to change to use one or other of these
> > methods. Every save() call. Currently and with the parameter approach,
> > *zero* existing code has to change initially. If you want to support the
> > must insert vs. must update difference, you can add a parameter (or two,
> > depending on which approach we take) and it's still backwards
> > compatible.
>
> Sorry, but this sounds like you did not read my email at all (to which
> David Larlet sent a reply). I proposed still having save(), but
> implementing it like this:

I did read your mail. Then, whilst writing my reply, I forgot that you
had proposed this plan because I was noting the problem with the common
code. save() becomes a dead method here unless people are forced to call
it. If an individual wants to split their save() handling into a create
and an update path in separate functions, that's fine. They can do that.
But Django's core doesn't need to do that; there just isn't the amount
of code to require it.

I didn't mean to dismiss your code fragment, David. I apologise. I was
trying to collapse multiple responses into one.

[...]


> You don't have to stick to this names. I just used them, as I think they
> are pretty self-explainig.

The point is that any names have the potential for a clash. We're very
miserly about taking names form the common namespace for that reason:
you have to pick something easily understandable and not likely to
collide. Not introducing any new names is safest of all.

Regards,
Malcolm

--
Save the whales. Collect the whole set.
http://www.pointy-stick.com/blog/

Martin Diers

unread,
Apr 28, 2008, 1:10:04 PM4/28/08
to django-d...@googlegroups.com
On Apr 27, 2008, at 7:44 PM, Karen Tracey wrote:

On Sun, Apr 27, 2008 at 7:53 PM, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
[snipped problem description]

(1) Model.save(must_create=None) where "must_create" is True (must
create), False (must not create) or None (try an update; if that fails,
insert).

       Pros: one parameter only (shorter; easier to subclass). Uses
       existing, builtin constant values.

       Cons: "False" tends to read most naturally as "not must_create",
       rather than "must_not_create", so it does slightly fail the
       reading test.

I like this option except for the name must_create, which makes the False setting fail readability.  Couldn't we think of a different name that didn't have this problem?  Some options:

create
new_instance
create_new_instance (ugh, too long)

For the record, I think that using:

new_instance = True / False

is the best solution. It solves the readability test, keeps backward compatibility, and prevents needless ambiguity which would ultimately lead to a dead function.

The cases where this is needed are rare enough that it makes no sense to change the semantics of the model, or risk breaking backward compatibility, particularly when a simple parameter can solve the problem.

--
Martin Diers

Mike Axiak

unread,
Apr 28, 2008, 1:15:04 PM4/28/08
to Django developers
On Apr 28, 1:10 pm, Martin Diers <mar...@diers.us> wrote:
> [...]
>
> new_instance = True / False
>
> is the best solution. It solves the readability test, keeps backward  
> compatibility, and prevents needless ambiguity which would ultimately  
> lead to a dead function.

Does it?

myinstance.save(new_instance=True)

Will this fail when it cannot create a new instance? Does it always
create a new instance?

myinstance.save(new_instance=False)

Does this fail when it has to create a new instance?

Which one of these will behave like the original save() method?
new_instance=None?

I believe we want save to be able to:

1. Behave like it does currently.
2. Have the ability to fail when it has to create a new instance.
3. Have the ability to fail when it has to update a current
instance.

I don't see how your syntax makes that clean.

Cheers,
Mike

Karen Tracey

unread,
Apr 28, 2008, 3:06:41 PM4/28/08
to django-d...@googlegroups.com
On Mon, Apr 28, 2008 at 1:15 PM, Mike Axiak <mca...@gmail.com> wrote:

On Apr 28, 1:10 pm, Martin Diers <mar...@diers.us> wrote:
> [...]
>
> new_instance = True / False
>
> is the best solution. It solves the readability test, keeps backward  
> compatibility, and prevents needless ambiguity which would ultimately  
> lead to a dead function.

Does it?

   myinstance.save(new_instance=True)

Will this fail when it cannot create a new instance? Does it always
create a new instance?

Yes and yes.  If this parameter (whatever it is named) is True, then save must create a new instance and it is an error if the database already contains a record with the specified primary key.  This is Malcolm's "create a new employee with primary key equal to tax identifier" case.
 
   myinstance.save(new_instance=False)

Does this fail when it has to create a new instance?

Yes.  If this parameter (whatever it is named) is False, then save must update an existing instance, it is an error if the database does not already contain a record with the specified primary key.  This is Malcolm's "update employee salary (again where employees are identified by a primary key of their tax identifier)" case.
 
Which one of these will behave like the original save() method?
new_instance=None?

Yes.   This is also the default, so existing code does not have to change and works the same as always.
 
I believe we want save to be able to:

  1. Behave like it does currently.
  2. Have the ability to fail when it has to create a new instance.
  3. Have the ability to fail when it has to update a current
instance.

I don't see how your syntax makes that clean.

I thought the behavior of this option was clear from Malcolm's initial description.  It does support the 1,2,3 you specify.  Which alternative would you prefer? 

Karen

didier Belot

unread,
Apr 28, 2008, 3:09:17 PM4/28/08
to django-d...@googlegroups.com
Mike Axiak a écrit :
> [...]

> Does it?
>
> myinstance.save(new_instance=True)
>
> Will this fail when it cannot create a new instance? Does it always
> create a new instance?
>
> myinstance.save(new_instance=False)
>
> Does this fail when it has to create a new instance?
>
> Which one of these will behave like the original save() method?
> new_instance=None?
>
> I believe we want save to be able to:
>
> 1. Behave like it does currently.
> 2. Have the ability to fail when it has to create a new instance.
> 3. Have the ability to fail when it has to update a current
> instance
Why not reverse the condition using must_exist ?

1. must_exist=None (default behavior)
2. must_exist=False (create or fail)
3. must_exist=True (update or fail)


My 2c ;-)

--
didier

Mike Axiak

unread,
Apr 28, 2008, 3:11:41 PM4/28/08
to Django developers
On Apr 28, 3:06 pm, "Karen Tracey" <kmtra...@gmail.com> wrote:
> thought the behavior of this option was clear from Malcolm's initial
> description.  It does support the 1,2,3 you specify.  Which alternative
> would you prefer?
Maybe it's just me, but I just don't like a parameter that seems
boolean having different behavior for None than for False.

I would rather see two different parameters, e.g.::
.save([force_update=True|False, force_create=True|False])

That both default to False, and raise a ValueError when both are True.

Of course that's just me.

Cheers,
Mike

Mike Axiak

unread,
Apr 28, 2008, 3:16:13 PM4/28/08
to Django developers
On Apr 28, 3:11 pm, Mike Axiak <mcax...@gmail.com> wrote:
> Of course that's just me.
I should also note that this is precisely the example used in FreeBSD
bikeshed argument [1]. I don't think it matters what parameters we use
that much, but I think it we can all agree that Django should support
it.

Cheers,
Mike

1: http://blue.bikeshed.com/

Karen Tracey

unread,
Apr 28, 2008, 3:38:04 PM4/28/08
to django-d...@googlegroups.com

Because this, to me, like must_create, fails the readability test for the False case.  must_exist=False on casual reading looks to me like it is saying "this does not have to already exist, but it might", instead of of "this object must not already exist".  Anything with "must" in the name I believe will have this problem for one of the two cases.

And, to follow-up on Mike's subsequent notes without generating yet more traffic on an overlong thread, yes, this is a bikeshed issue.  But Malcolm did solicit opinions, so here we are discussing them.  Malcolm will make whatever decision he feels is best, giving whatever weight he thinks appropriate to opinions expressed here.  I doubt anyone who has posted will take their marbles and go home if their pet syntax isn't the chosen one.  So far it sounds to me like Malcolm's initial thought that #2 was the right option for Django might be correct, in that I haven't seen anyone (that I recall) express strong dislike of that one.

Cheers,
Karen

Justin Fagnani

unread,
Apr 28, 2008, 3:44:35 PM4/28/08
to django-d...@googlegroups.com
On Mon, Apr 28, 2008 at 12:11 PM, Mike Axiak <mca...@gmail.com> wrote:
I would rather see two different parameters, e.g.::
   .save([force_update=True|False, force_create=True|False])

That both default to False, and raise a ValueError when both are True.

I think this is by far the most understandable variation, even though there's an invalid combination.

-Justin

David Cramer

unread,
Apr 29, 2008, 12:40:09 PM4/29/08
to Django developers
I don't really like the idea of having extra options on save (mostly
because I have a lot of save calls that already. Maybe I don't quite
see the point. I proposed an internal flag a while back which would
determine if something was actually being created or updated. I don't
think it was accepted but it added _is_stored to models. This seems a
lot cleaner and is more standard through DB implementations.

On Apr 28, 5:02 am, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:

Niels

unread,
Apr 30, 2008, 7:48:16 AM4/30/08
to Django developers
Bikeshedding or not, what about something like

class _default_save_behaviour(object):
pass
...

def save(new_instance=_default_save_behaviour, ...):
if new_instance == _default_save_behaviour:
<do default save behaviour>
elif bool(new_instance):
<fail if pk exists>
else:
<fail if pk does not exists>

Regards
Niels


On Apr 28, 9:44 pm, "Justin Fagnani" <justin.fagn...@gmail.com> wrote:

mrts

unread,
Apr 30, 2008, 11:22:04 AM4/30/08
to Django developers
Looks like enums would be the most natural solution.
As Python doesn't have enums, class attributes should be used
(module level globals would clutter the namespace).

Adding to Malcolm's initial proposal:

# django/db/models.py
class CreateStrategy:
MUST_CREATE = True
MUST_NOT_CREATE = False
DEFAULT = None

# myproject/myapp/foo.py

from django.db.models import CreateStrategy
...
foo = Foo(params)
foo.save(create_strategy=CreateStrategy.MUST_CREATE)
...

Patryk Zawadzki

unread,
Apr 30, 2008, 11:50:21 AM4/30/08
to django-d...@googlegroups.com
On Wed, Apr 30, 2008 at 5:22 PM, mrts <mr...@mrts.pri.ee> wrote:
> Looks like enums would be the most natural solution.
> As Python doesn't have enums, class attributes should be used
> (module level globals would clutter the namespace).
>
> Adding to Malcolm's initial proposal:
>
> # django/db/models.py
> class CreateStrategy:
> MUST_CREATE = True
> MUST_NOT_CREATE = False
> DEFAULT = None
>
> # myproject/myapp/foo.py
>
> from django.db.models import CreateStrategy
> ...
> foo = Foo(params)
> foo.save(create_strategy=CreateStrategy.MUST_CREATE)
> ...

That's all great and everything but I fail to see why we need such
functionality in django core? Each of the above easily solvable with
one abstract model class and as a bonus you get to pick any of the
proposed notations.

--
Patryk Zawadzki
PLD Linux Distribution

Tai Lee

unread,
May 1, 2008, 5:24:40 AM5/1/08
to Django developers
I'd vote for create() and update() methods which seem the least
confusing, but that doesn't seem very popular. If it has to be done
with arguments on save(), I'd say force_create=False,
force_update=False is the easiest to read and understand.

I'd prefer either of those options over a null/true/false argument
that isn't quite readable without double checking the documentation. I
don't think you could find any argument name that would make that
functionality clear with a single argument.

David Cramer

unread,
May 1, 2008, 10:31:11 AM5/1/08
to django-d...@googlegroups.com
I'm still not quite sure why we need any additional methods, or flags, or anything. Can someone explain to me where the underlying API is having issues?
--
David Cramer
Director of Technology
iBegin
http://www.ibegin.com/

Karen Tracey

unread,
May 1, 2008, 10:49:37 AM5/1/08
to django-d...@googlegroups.com
On Thu, May 1, 2008 at 10:31 AM, David Cramer <dcr...@gmail.com> wrote:
I'm still not quite sure why we need any additional methods, or flags, or anything. Can someone explain to me where the underlying API is having issues?

Malcolm's initial note creating this thread described an example scenario where additional control over save() behavior is desired:

Sometimes when calling save(), we know (for business reasons) that it
must create a new object or raise an error. Similarly, in other cases,
it either must update an existing object or raise an error.

For example: creating a new employee in the a business system. The
primary key is their tax file number (unique per person). After entering
all the details, we hit save. This *must* create a new entry. If there
is an existing entry with the same tax file number, it's an error. Under
no circumstances should the existing entry be updated.

Conversely, we're now giving this person a raise. We enter the new
salary and the tax file number and hit save. It must update an existing
record. If a record with that primary key doesn't exist, it's an error
and no new record should be created.

Karen

Marty Alchin

unread,
May 1, 2008, 11:01:39 AM5/1/08
to django-d...@googlegroups.com
On Thu, May 1, 2008 at 10:49 AM, Karen Tracey <kmtr...@gmail.com> wrote:
> On Thu, May 1, 2008 at 10:31 AM, David Cramer <dcr...@gmail.com> wrote:
>
> > I'm still not quite sure why we need any additional methods, or flags, or
> anything. Can someone explain to me where the underlying API is having
> issues?
>
> Malcolm's initial note creating this thread described an example scenario
> where additional control over save() behavior is desired:

Maybe I'm smoking something, but Patryk is making the most to me,
anyway. Wouldn't it be possible to just override save() if you really
need this for a particular model? Then you can split it into
create()/update() if you want, or add any-colored argument you like,
or whatever. And if you need it on more than one, you can just put
that on an abstract base class and inherit from that instead of
models.Model.

Or am I missing something glaringly obvious? I can understand the
desire to allow this option in trunk directly, but is there anything
that's actively preventing it from being possible now? And since
Django generally caters to the common case, which this clearly isn't,
does trunk really need to have it at all?

-Gul

David Cramer

unread,
May 1, 2008, 4:18:52 PM5/1/08
to django-d...@googlegroups.com
Are you talking about cloning objects that exist in the database? To where you'd pull it out, unset the ID and then resave it?

James Bennett

unread,
May 4, 2008, 7:02:08 PM5/4/08
to django-d...@googlegroups.com
OK, so, we appear to have two camps:

1. People who think this should be handled by introducing one or more
new methods to allow easy differentiation of the cases.

2. People who think this should be handled by adding new keyword
arguments to save().

And, techncially, a third camp consisting of David, who thinks an
attribute on the model ought to control the behavior. But I'm going to
toss that out immediately because I don't want this to turn into
action-at-a-distance. Whatever mechanism we decide on should be
explicit and should be clearly visible from the point in the code
where you actually write data to the DB, which means one of the above
two mechanisms or a combination thereof.

Now, a couple of design goals for this API:

* The common case -- where you simply don't care and just want to save
with no fuss -- needs to stay easy and so, in all likelihood, needs
to be the default.

* The amount of new API to be introduced should be minimized, in order
to maintain a simple interface.

* If there's anything we already have that can be leveraged to support
this, we should do so.

So, the big trick is that people are approaching this from the
perspective of three-valued logic: either you force an INSERT or you
force an UPDATE or you don't care. To support this, we've had
proposals for two new methods (bringing the options up to three to
cover them all), two new keyword arguments (to cover the two
additional cases) or one new keyword argument with three possible
values (to cover all cases).

I think all of these are barking up the wrong tree, because this can
actually be accomplished by two minor tweaks.

The first is to look at the update() method that already exists on every
QuerySet and every default manager: this method *already* does one of
the things we want, namely force an UPDATE query to run. So, without
any changes whatsoever, a proposed API of, say::

e = Employee(drone_id=1234567)
e.salary = 1000000000
e.save(force_update=True)

Can become::

Employee.objects.filter(drone_id=1234567).update(salary=1000000000)

Now, the complaint people will have with this is that you can
accidentally leave off the filtering, or do it wrong, and make
everybody in the company a billionaire. Personally I think this is one
of those bits of SQL that people need to learn the hard way (and some
DBs have modes that won't let you run UPDATE or DELETE without a WHERE
clause), but as a concession I could see adding a tiny bit of new
behavior.

Currently, update() doesn't return anything; changing it to return the
number of rows affected would be a help. For bonus API-complication
points, it could sprout a kwarg that lets you assert a certain number
of affected rows, and then raise an exception and roll back the
transaction if that assertion fails.

That deals with one of the three cases (force UPDATE), which means
that save() itself only needs to deal with two cases: "force INSERT"
and "I don't care".

From there, save() simply sprouts a "force_insert" kwarg, defaulting
to False, and you handle that when you care about it.

Also, the bikeshed should be blue.

--
"Bureaucrat Conrad, you are technically correct -- the best kind of correct."

Benjamin Slavin

unread,
May 4, 2008, 9:06:21 PM5/4/08
to Django developers
On Sun, May 4, 2008 at 7:02 PM, James Bennett <ubern...@gmail.com> wrote:
> The first is to look at the update() method that already exists on every
> QuerySet and every default manager: this method *already* does one of
> the things we want, namely force an UPDATE query to run. So, without
> any changes whatsoever, a proposed API of, say::
>
> e = Employee(drone_id=1234567)
> e.salary = 1000000000
> e.save(force_update=True)
>
> Can become::
>
> Employee.objects.filter(drone_id=1234567).update(salary=1000000000)

This is an interesting attempt to change the underlying debate, but
I'm -1 for the reasons listed below.


> Now, the complaint people will have with this is that you can
> accidentally leave off the filtering, or do it wrong, and make
> everybody in the company a billionaire.

Actually, my first thought is one of DRY, but here are the concerns
that jump to mind in no particular order:

1) Using the PK filter in every update is not respecting DRY. (I also
have scary visions of `type(e).objects.filter(**{e._meta.pk.name:
e.pk})` ending up in code designed to handle multiple models.)

2) The accidental omission of filter has highly destructive
side-effects. Sure, this is programmer error, but need we make it so
easy to shoot oneself in the foot?

3) The update mechanism is no longer parallel to the API for other
per-instance persistence operations.

4) This breaks the way in which custom DB fields must be handled. This
means that custom DB fields are no longer opaque to the programmer,
and implementations of update are tightly coupled to the specific
implementation of the field. [0]

5) This doesn't provide a consistent way to implement pre-save
functionality on the model (I mean the things that are traditionally
done by overriding `save`, but #4 above might be an extension of
this).


> That deals with one of the three cases (force UPDATE), which means
> that save() itself only needs to deal with two cases: "force INSERT"
> and "I don't care".
>
> From there, save() simply sprouts a "force_insert" kwarg, defaulting
> to False, and you handle that when you care about it.

I like the line of thought that lead to this, but find myself in the
insert/create and update camp. Internally, we've been using some
monkeypatching to add .insert and .update methods to all models (we
needed this functionality on models that were in contrib).


> Also, the bikeshed should be blue.

Market research suggests that it should be a garish orange.


Regards,
- Ben


[0] A quick peek at the internals of UpdateQuery makes me think this
is correct, though Malcolm may know better than me.

David Cramer

unread,
May 5, 2008, 2:00:34 AM5/5/08
to django-d...@googlegroups.com
Let's not quote me if you don't understand what I'm saying.

The patch we applied at Curse did the following (and solved some extra queries):

- Added an _is_stored attribute on instances, default value was None
- _is_stored became True upon an instance retrieval via a QuerySet
- _is_stored became False upon using create()
- If _is_stored was True it always performed an update
- If _is_stored was False it always performed an insert
- if _is_stored was not set it'd use the default behavior.

This may not directly solve the problems that are being referred to, but it was a bug fix none the less, and I believe it is relative to the issue at hand.

James Bennett

unread,
May 5, 2008, 2:11:23 AM5/5/08
to django-d...@googlegroups.com
On Mon, May 5, 2008 at 1:00 AM, David Cramer <dcr...@gmail.com> wrote:
> This may not directly solve the problems that are being referred to, but it
> was a bug fix none the less, and I believe it is relative to the issue at
> hand.

I never said it wasn't a *possible* solution, I just said it wasn't a
*good* solution.

David Cramer

unread,
May 5, 2008, 2:19:58 AM5/5/08
to django-d...@googlegroups.com
Let me also bring up once again, that this is what all the other major DB layers that I've looked at do. So it doesn't seem like it's a bad solution at all.

David Cramer

unread,
May 6, 2008, 8:16:42 PM5/6/08
to Django developers
Here is the patch i was talking about:

http://code.djangoproject.com/ticket/5309

On Apr 28, 5:02 am, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:

einfallsreich

unread,
Jun 1, 2008, 8:55:24 AM6/1/08
to Django developers
I'd prefer multiple methods (save(), insert(), update()) to make
subclassing easier. If that's not an option, I'd suggest

Model.save(allow_update=True, allow_insert=True)

Then False,False would obviously be a no-op.

___
Johannes

Richard Davies

unread,
Jul 24, 2008, 6:34:02 AM7/24/08
to Django developers
I was caught by a race situation in get_or_create() this morning which
this feature would fix - I'm definitely keen to have one of these APIs
implemented for 1.0, and for get() and get_or_create() to both force
the create/prohibit the update on their calls to save() - as also
discussed recently in this group in a thread on sessions.


In terms of API,

I'd also quite like the multiple methods (save(), insert() and
update()), though I do understand the point about using up name space.

If that's out, then I'd personally go with Malcolm's original option 2
or 3 as second choice. Definitely shouldn't be option 1, since that
doesn't have the possibility of forcing an update.


Another thought on how to improve efficiency / reduce races in save()
itself:

I notice that the implementation of get_or_create() has been improved
from get/save ("get, save if fails") to get/save/get ("get, save if
fails, get again if that fails due to unique").

http://groups.google.com/group/django-developers/browse_thread/thread/9730f4005bf08b67
http://code.djangoproject.com/ticket/6641
http://code.djangoproject.com/changeset/7289

I believe that a similar change to save() itself from filter/update-or-
insert to update/create/update would increase efficiency and make
success more likely. In pseudo-code, we currently have:

def save():
if pk_set:
if filter_for_record:
update()
else:
record_exists = False

if not pk_set or not record_exists:
insert()


The alternative would be:

def save():
if not pk_set:
insert()
return

try:
update()
except DoesNotExist:
try:
insert()
except Exists:
update()

The advantage is:
- Efficiency: we have one fewer database call in the common case of an
update to an existing object, and the same number on saving a new
object
- Better handling of races: Removed the race when filter found the
object but it was then deleted before update; makes a second attempt
(like get_or_create) in the case that insert() fails due to the object
being created elsewhere immediately before


Cheers,

Richard.

Malcolm Tredinnick

unread,
Jul 24, 2008, 12:17:01 PM7/24/08
to django-d...@googlegroups.com

On Thu, 2008-07-24 at 03:34 -0700, Richard Davies wrote:
> I was caught by a race situation in get_or_create() this morning which
> this feature would fix - I'm definitely keen to have one of these APIs
> implemented for 1.0, and for get() and get_or_create() to both force
> the create/prohibit the update on their calls to save() - as also
> discussed recently in this group in a thread on sessions.
>
>
> In terms of API,
>
> I'd also quite like the multiple methods (save(), insert() and
> update()), though I do understand the point about using up name space.

I'm pretty close to committing this, but have been stabilising some of
the immediate post-alpha-release bugs first. It'll be in before the beta
(as parameters to save() for all the reasons described in the thread).

The disadvantage is that it uses an error path (an exception raising
path) for a very common case: insert. The current code does not force
the database to raise an error, but instead only does "correct"
operations against the database. Since even the current get_or_create()
implementation has side-effects that people forget about or don't
appreciate, I'd be a little reluctant to make a change like this at this
time. It would also kick off another thread about how MySQL should use
replace, etc, etc.

I'd rather fix this problem properly by us working a proper "select for
update" API both internally, and then later externally, post 1.0. That's
the more natural db-oriented solution here.

Regards,
Malcolm


Collin Grady

unread,
Jul 24, 2008, 3:31:02 PM7/24/08
to django-d...@googlegroups.com
Malcolm Tredinnick said the following:

> I'd rather fix this problem properly by us working a proper "select for
> update" API both internally, and then later externally, post 1.0. That's
> the more natural db-oriented solution here.

http://code.djangoproject.com/ticket/2705 :)

--
Collin Grady

With/Without - and who'll deny it's what the fighting's all about?
-- Pink Floyd

Malcolm Tredinnick

unread,
Jul 24, 2008, 3:34:20 PM7/24/08
to django-d...@googlegroups.com

On Thu, 2008-07-24 at 12:31 -0700, Collin Grady wrote:
> Malcolm Tredinnick said the following:
> > I'd rather fix this problem properly by us working a proper "select for
> > update" API both internally, and then later externally, post 1.0. That's
> > the more natural db-oriented solution here.
>
> http://code.djangoproject.com/ticket/2705 :)

Yes, I'm aware of the ticket.

Malcolm

Sebastian Bauer

unread,
Jul 25, 2008, 2:51:51 AM7/25/08
to django-d...@googlegroups.com
What stop You from commit 2705 ticket, because im very interested of this ? :)

Malcolm Tredinnick pisze:

Richard Davies

unread,
Aug 12, 2008, 12:51:05 PM8/12/08
to Django developers
Malcolm Tredinnick wrote:
> I'm pretty close to committing this

...and I see that the save() flags capability is now in, as of
changeset 8267 - thanks Malcolm!


The most important use of save() flags will be to force_insert in both
create() and get_or_create(), so that these calls are guaranteed to do
what their names mean (currently create() will also incorrectly update
an existing object with the same primary key, and get_or_create()
suffers from race conditions which can lead to similar updates).

It's worth getting that patch included early, since it may find some
bugs in the rest of the code base - particularly incorrect uses of
create(). The patch is as below, and I have been testing with it to
find these bugs...

--- a/django/db/models/query.py 2008-08-12 13:09:52.000000000 +0100
+++ b/django/db/models/query.py 2008-08-12 13:39:22.000000000 +0100
@@ -307,7 +307,7 @@
and returning the created object.
"""
obj = self.model(**kwargs)
- obj.save()
+ obj.save(force_insert=True)
return obj

def get_or_create(self, **kwargs):
@@ -327,7 +327,7 @@
params.update(defaults)
obj = self.model(**params)
sid = transaction.savepoint()
- obj.save()
+ obj.save(force_insert=True)
transaction.savepoint_commit(sid)
return obj, True
except IntegrityError, e:


So far, this has exposed one bug elsewhere. It's in the sessions code,
where the database backend SessionStore.save() is calling create()
both to create the object and later to update the session data. Here's
a patch which changes that call from a create() into a save(), which
gives the behavior that the code actually uses.

--- a/django/contrib/sessions/backends/db.py 2008-08-12
13:10:52.000000000 +0100
+++ b/django/contrib/sessions/backends/db.py 2008-08-12
16:54:49.000000000 +0100
@@ -37,11 +37,12 @@
return True

def save(self):
- Session.objects.create(
+ s = Session(
session_key = self.session_key,
session_data = self.encode(self._session),
expire_date = self.get_expiry_date()
)
+ s.save()

def delete(self, session_key):
try:


[Clearly there is a wider issue of session key collisions which can
also be solved using force_insert, but there are existing threads,
tickets and patches for that, so I'm not going to tackle it here - see
http://groups.google.com/group/django-developers/browse_thread/thread/8cb4edee0db52197
and http://code.djangoproject.com/ticket/1180]


With both of the patches above, using force_insert in create() and
get_or_create() works for me, and I'd encourage the core team to
include these into trunk as soon as possible.

Cheers,

Richard.

Malcolm Tredinnick

unread,
Aug 12, 2008, 2:07:00 PM8/12/08
to django-d...@googlegroups.com

On Tue, 2008-08-12 at 09:51 -0700, Richard Davies wrote:
> Malcolm Tredinnick wrote:
> > I'm pretty close to committing this
>
> ...and I see that the save() flags capability is now in, as of
> changeset 8267 - thanks Malcolm!
>
>
> The most important use of save() flags will be to force_insert in both
> create() and get_or_create(), so that these calls are guaranteed to do
> what their names mean (currently create() will also incorrectly update
> an existing object with the same primary key,

You're assuming that's incorrect, but that's not necessarily so. Have a
look at the documentation for create(). It says it's a shortcut for two
other lines of code that also have the possibility of updating. I'm
uninclined to change it at this point.

Malcolm

Richard Davies

unread,
Aug 12, 2008, 2:36:35 PM8/12/08
to Django developers
> > The most important use of save() flags will be to force_insert in both
> > create() and get_or_create(), so that these calls are guaranteed to do
> > what their names mean (currently create() will also incorrectly update
> > an existing object with the same primary key,
>
> You're assuming that's incorrect, but that's not necessarily so. Have a
> look at the documentation for create(). It says it's a shortcut for two
> other lines of code that also have the possibility of updating. I'm
> uninclined to change it at this point.

For get_or_create(), using the force_insert flag should be
uncontroversial -
the documentation says that it either returns an existing object or
creates a new one - updating an existing object due to a race is
definitely
not an expected behaviour, and will cause people problems.


For create(), updating an existing object is such an unnatural meaning
of "create" that I hadn't even considered the possibility! But I agree
that
the documentation currently implicitly allows this extra meaning. I'd
urge you to remove this extra meaning from create() - especially given
that it's implicit, not obvious, from the documentation, and my
testing
only found one place where it was actually being used.

Alternatively, if you want to keep both meanings of create() as they
currently are, then please can you propagate the force_insert and
force_update flags as optional parameters to the create() call as
well?

Thanks,

Richard.

David Cramer

unread,
Aug 12, 2008, 3:44:38 PM8/12/08
to django-d...@googlegroups.com
I'm not going to bitch because I haven't had the issue. But If i were calling .create and it ended up doing an update, I would be screaming bloody murder in this thread...

(I probably was screaming sometime last year about this already though :P)
Reply all
Reply to author
Forward
0 new messages