Proposal: deprecated Model.__init__(*args)

Jacob Kaplan-Moss

Jan 29, 2008, 15:13:13

to django-d...@googlegroups.com

Howdy folks --

The short version:

I'd like to deprecate initializing models using positional arguments
(i.e. ``p = Person(1, 'Joe Somebody')``) in favor of only allowing
keyword-argument initialization (i.e. ``p = Person(id=1, name='Joe
Somebody')``).

I'd make the ``*args`` style start issuing DeprecationWarnings
immediately, and remove support entirely when we wipe deprecated
features in the run-up to 1.0. I'd make this change on the
queryset-refactor branch.

Long version, with explanation:

This week I've been starting to help Malcolm out on the queryset-refactor
branch. To get my feet wet I've been playing with Adrian's deferred
fields proposal (http://code.djangoproject.com/ticket/5420).

Turns out it's trickier than I'd thought, but mostly because of the
fact that QuerySets initialize models using ``Model.__init__(*args)``
instead of using kwargs. I'm almost certainly going to have to change
the internal behavior to get deferred fields working OK.

As I started down that road, though, I realized that positional
initialization of models is something I've only seen done internally
to Django. Further, using this "feature" can lead to all sorts of
nasty bugs: if you change the order of fields in models.py all of a
sudden fields start getting the "wrong" values from positional
initialization. On top of that, removing ``*args`` support from
``Model.__init__`` would make the code cleaner and a bit faster.

Thoughts?

Jacob

Ivan Illarionov

Jan 29, 2008, 15:37:26

to Django developers

If you deprecate this functionality please provide an alternative like
a Model.fromargs() classmethod. It is extremely useful when you need to
create Model objects from custom queries. Like here:
http://code.djangoproject.com/browser/django/trunk/tests/modeltests/custom_methods/models.py
in Article.articles_from_same_day_2(). So `self.__class__(*row)` will
become `self.__class__.fromargs(*row)`. Or, even better,
`self.__class__.fromtuple(row)`.

Marty Alchin

Jan 29, 2008, 15:39:40

to django-d...@googlegroups.com

On Jan 29, 2008 3:13 PM, Jacob Kaplan-Moss <jacob.ka...@gmail.com> wrote:
> Thoughts?

I haven't run into any problems with this as of yet, but I'd like to
wholeheartedly support this move. I was sincerely amazed when I
noticed that models could be instantiated either way, and I couldn't
think of a real reason why it was so. I just assumed somebody else
knew better than I did, so I didn't question it.

As a side note, reordering fields in models.py doesn't seem to cause
any problems, because _get_sql_clause generates a SELECT clause with
each field named explicitly, ordered according to _meta.fields. And
since Model.__init__ lines up *args with _meta.fields, it all works
out. At least, this has been true as far as my local tests go, anyway.
The only way I can see it causing a problem is if _meta.fields got
reordered *after* the SQL was generated, but before the model is
instantiated. Not a very likely scenario.

More realistically though, there's another aspect where this could
cause confusion. If anyone's listening for the pre_init signal, their
code would also have to do the whole args/kwargs check, since the
model can be instantiated either way. Doing away with *args will make
pre_init listeners much simpler, and more resilient to change, since
they don't have to also verify the order against that of _meta.fields.
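To make the pre_init point concrete, here is a minimal sketch of what a listener has to do while both calling styles are allowed. It assumes the listener receives ``sender``, ``args`` and ``kwargs`` (as in the dispatcher calls quoted later in this thread); ``FIELD_NAMES`` and ``normalize_init_args`` are hypothetical stand-ins, not Django API.

```python
# Hypothetical field order standing in for Model._meta.fields.
FIELD_NAMES = ["id", "first", "last"]


def normalize_init_args(args, kwargs, field_names=FIELD_NAMES):
    """Merge positional and keyword values into one dict keyed by field name."""
    if len(args) > len(field_names):
        raise IndexError("Number of args exceeds number of fields")
    # Positional values are matched to fields purely by position.
    values = dict(zip(field_names, args))
    duplicates = set(values) & set(kwargs)
    if duplicates:
        raise TypeError("Got multiple values for: %s" % ", ".join(sorted(duplicates)))
    values.update(kwargs)
    return values


def on_pre_init(sender, args, kwargs):
    """A listener that wants the value of ``first`` however the model was built."""
    return normalize_init_args(args, kwargs).get("first")


positional = on_pre_init(sender=None, args=(1, "Joe"), kwargs={})
keyword = on_pre_init(sender=None, args=(), kwargs={"id": 1, "first": "Joe"})
mixed = on_pre_init(sender=None, args=(1,), kwargs={"first": "Joe"})
```

With keyword-only initialization the listener could read ``kwargs`` directly and the whole normalization step would disappear.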

For me, though, it's not a matter of any real-world problems
encountered. It just seems like there should be One Way to do it, and
keyword arguments make tremendously more sense (to me) as that One
Way. So, consider it philosophical support, but support all the same.

-Gul

Jacob Kaplan-Moss

Jan 29, 2008, 15:41:53

to django-d...@googlegroups.com

On 1/29/08, Ivan Illarionov <ivan.il...@gmail.com> wrote:
> If you deprecate this functionality please provide an alternative like
> Model.fromargs() classmethod. It is extremely useful when you need to
> create Model objects from custom queries.

Good point.

One of the other things planned as part of qs-rf is the ability to use
custom queries to initialize models -- something like
``Model.objects.custom_query("SELECT ...")`` -- so that you wouldn't
really need *args initialization anyway.

If that feature is the penance I need to go through to atone for
killing Model(*args), I'll happily do so :)

Jacob

Ivan Illarionov

Jan 29, 2008, 15:50:04

to Django developers

> One of the other things planned as part of qs-rf is the ability to use
> custom queries to initialize models -- something like
> ``Model.objects.custom_query("SELECT ...")``

That's not enough - I need this feature to initialize models from the
result of stored procedures, not just simple queries. One may also
want to initialize their models from some external source like an XML
file, an RSS feed or anything - there should be a way to do it easily.

Ivan Illarionov

Jan 29, 2008, 15:52:10

to Django developers

Ok I really can use Model.objects.custom_query("EXECUTE
PROCEDURE ...") but someone may need more options.

Marty Alchin

Jan 29, 2008, 15:54:06

to django-d...@googlegroups.com

In the worst case, generating a kwargs dictionary from an args tuple
isn't really all that difficult.

kwargs = dict([(cls._meta.fields[i].attname, v) for (i, v) in enumerate(args)])

Seems like any code which explicitly needs to handle tuple
instantiation (likely the minority) can supply a helper method using
something similar to the above.
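A helper along those lines might look like the sketch below. It uses plain stand-in classes rather than real Django models so it runs on its own; ``Field``, ``Meta`` and ``from_row`` are hypothetical names, and only the ``_meta.fields[i].attname`` lookup comes from the one-liner above.

```python
class Field:
    """Stand-in for a Django field; only ``attname`` matters here."""

    def __init__(self, attname):
        self.attname = attname


class Meta:
    fields = [Field("id"), Field("first"), Field("last")]


class Person:
    _meta = Meta

    def __init__(self, **kwargs):
        # Keyword-only initialization: every value is matched by name.
        for field in self._meta.fields:
            setattr(self, field.attname, kwargs.pop(field.attname, None))
        if kwargs:
            raise TypeError("Unexpected keyword arguments: %s" % sorted(kwargs))

    @classmethod
    def from_row(cls, row):
        """Hypothetical helper: map a row tuple onto field names, then use kwargs."""
        kwargs = dict((cls._meta.fields[i].attname, v) for i, v in enumerate(row))
        return cls(**kwargs)


p = Person.from_row((1, "Joe", "Somebody"))
```

The positional coupling still exists, but it is confined to one clearly named method instead of living in ``__init__``.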

-Gul

Ivan Illarionov

Jan 29, 2008, 16:04:45

to Django developers

> kwargs = dict([(cls._meta.fields[i].attname, v) for (i, v) in enumerate(args)])
>
> Seems like any code which explicitly needs to handle tuple
> instantiation (likely the minority) can supply a helper method using
> something similar to the above.

Yes, but why not have fromtuple classmethod optimized for this use-
case? And I believe this way of initialization could be really faster
than keyword initialization.

Tom Tobin

Jan 29, 2008, 16:08:42

to django-d...@googlegroups.com

On 1/29/08, Jacob Kaplan-Moss <jacob.ka...@gmail.com> wrote:
> I'd make the ``*args`` style start issuing DeprecationWarnings
> immediately, and remove support entirely when we wipe deprecated
> features in the run-up to 1.0. I'd make this change on the
> queryset-refactor branch.

I'm inclined to like *anything* that helps queryset-refactor along.
^_^ I've never been in the habit of using positional arguments for
model instantiation (too confusing), so a +1 for axing it. The
fromargs() alternative seems like a decent compromise.

Ivan Illarionov

Jan 29, 2008, 16:29:18

to Django developers

I agree that positional arguments instantiation is confusing and
mixing it with keyword instantiation is even more confusing - it would
be great if default Model.__init__ will be keyword argument only. But
I think we still need a faster alternative. Comment in Model.__init__
states that 'instantiation for iteration is 33% faster' - so any user
who wants to optimize model instantiation should be able to use
alternative method.

Jacob Kaplan-Moss

Jan 29, 2008, 16:43:22

to django-d...@googlegroups.com

On 1/29/08, Ivan Illarionov <ivan.il...@gmail.com> wrote:

> Yes, but why not have fromtuple classmethod optimized for this use-
> case? And I believe this way of initialization could be really faster
> than keyword initialization.

I'm pretty strongly opposed to a fromtuple (or whatever) method: it's
brittle and too tightly couples code to the arbitrary order of defined
fields.

Speed also XXXX

As it stands now (in QSRF r7049), args are indeed faster than kwargs::

    >>> t1 = timeit.Timer("Person(1, 'First', 'Last')",
    ...                   "from blah.models import Person")
    >>> t2 = timeit.Timer("Person(id=1, first='First', last='Last')",
    ...                   "from blah.models import Person")
    >>> t1.timeit()
    25.09495210647583
    >>> t2.timeit()
    36.52219820022583

However, much of that extra time is spent dealing with the args/kwargs
confusion; chopping out the code that handles initialization from args
gives better timing:

    >>> t = timeit.Timer("Person(id=1, first='First', last='Last')",
    ...                  "from blah.models import Person")
    >>> t.timeit()
    29.819494962692261

So that's about a 15% speed improvement over the current
__init__(**kwargs) at the cost of losing that same 15% since you can't
do *args.

[Note, however, that this speed argument is a bit silly when __init__
fires two extremely slow signals. Improving signal performance --
which is very high on my todo list -- will probably make this whole
performance debate a bit silly]
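For anyone checking the percentages, the arithmetic can be redone from the three timings quoted above. This is only a sketch of the calculation; the variable names are made up and the numbers are copied from the timeit output in this message.

```python
args_current = 25.09495210647583      # Person(1, 'First', 'Last')
kwargs_current = 36.52219820022583    # Person(id=1, ...) with args support still present
kwargs_stripped = 29.819494962692261  # Person(id=1, ...) with the args code removed

# Fraction of time saved by kwargs-only __init__ relative to today's kwargs path.
gain_over_current_kwargs = (kwargs_current - kwargs_stripped) / kwargs_current

# Fraction of extra time kwargs-only __init__ costs relative to today's args path.
loss_versus_args = (kwargs_stripped - args_current) / args_current

print("kwargs-only vs. current kwargs: %.1f%% faster" % (100 * gain_over_current_kwargs))
print("kwargs-only vs. current args:   %.1f%% slower" % (100 * loss_versus_args))
```

Both ratios come out a little under 20%, which is in the same ballpark as the rough "about 15%" figure.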

Jacob

David Cramer

Jan 29, 2008, 17:50:19

to Django developers

I'm with Ivan. If QSRF supports everything we do + everything we want,
there's no need for args (as there's no need for custom queries), but
until it does, *args is very valuable, or at least a method which can
do the same.

On Jan 29, 1:43 pm, "Jacob Kaplan-Moss" <jacob.kaplanm...@gmail.com>
wrote:

Ivan Illarionov

Jan 29, 2008, 18:29:46

to Django developers

Speed:

    >>> t1.timeit()  # Model.__init__ from django trunk with args stuff stripped
    179.84739959966981
    >>> t2.timeit()  # Model.fromargs()
    139.67626695571641

Implementation:

    # Intended as methods on Model; izip is itertools.izip.
    def fromtuple(cls, values):
        dispatcher.send(signal=signals.pre_init, sender=cls,
                        args=values, kwargs={})
        new_instance = object.__new__(cls)
        if len(values) > len(new_instance._meta.fields):
            raise IndexError("Number of args exceeds number of fields")
        fields_iter = iter(new_instance._meta.fields)
        # Assign the supplied values to the leading fields, in order.
        for val, field in izip(values, fields_iter):
            setattr(new_instance, field.attname, val)
        # Any remaining fields fall back to their defaults.
        for field in fields_iter:
            setattr(new_instance, field.attname, field.get_default())
        dispatcher.send(signal=signals.post_init, sender=cls,
                        instance=new_instance)
        return new_instance
    fromtuple = classmethod(fromtuple)

    def fromargs(cls, *args):
        return cls.fromtuple(args)
    fromargs = classmethod(fromargs)

Ivan Illarionov

Jan 29, 2008, 18:44:08

to Django developers

> I'm pretty strongly opposed to a fromtuple (or whatever) method: it's
> brittle and too tightly couples code to the arbitrary order of defined
> fields.

Jacob, why are you opposed to alternative instantiation methods?
Standard Python does it with dict.fromkeys(). Why not?

Jacob Kaplan-Moss

Jan 29, 2008, 18:57:23

to django-d...@googlegroups.com

On 1/29/08, Ivan Illarionov <ivan.il...@gmail.com> wrote:

> Jacob, why are you opposed to alternative instantiation methods?

>>> import this
...
There should be one-- and preferably only one --obvious way to do it.
...

On top of that, as I keep saying, it leads to brittle code -- I've
been bitten a number of times.

For example, say you've got this model::

    class Person(models.Model):
        first = models.CharField()
        last = models.CharField(blank=True)

And somewhere -- maybe in a data import script -- you've got::

    p = Person(None, first, last)

Now you decide you need to change your model a bit::

    class Person(models.Model):
        first = models.CharField()
        middle = models.CharField()
        last = models.CharField(blank=True)

Then you run the data import script and get a bunch of people with
last names stored under ``person.middle``.

Relying on the order of fields in the model definition is asking for a
heaping load of fail. Hence my desire to see it go away.
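The failure mode can be reproduced without a database. The sketch below uses a toy class factory (``make_model`` is hypothetical and is not Django's ``Model``) whose ``__init__`` lines positional values up against a field list, with ``id`` assumed to be the auto-inserted first field.

```python
def make_model(field_names):
    """Build a toy class whose __init__ accepts positional or keyword values."""

    class Person:
        fields = list(field_names)

        def __init__(self, *args, **kwargs):
            # Positional values are paired with fields by position only.
            values = dict(zip(self.fields, args))
            values.update(kwargs)
            for name in self.fields:
                setattr(self, name, values.get(name, ""))

    return Person


first, last = "Joe", "Somebody"

PersonV1 = make_model(["id", "first", "last"])
PersonV2 = make_model(["id", "first", "middle", "last"])

old = PersonV1(None, first, last)        # works as intended
broken = PersonV2(None, first, last)     # same call after adding ``middle``
safe = PersonV2(first=first, last=last)  # keyword call is unaffected
```

After the model change, ``broken.middle`` holds the last name and ``broken.last`` is empty, while the keyword call still puts every value where it belongs.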

Jacob

Ivan Illarionov

Jan 29, 2008, 19:11:46

to Django developers

Then it makes sense to check for number of arguments (len(args) ==
len(obj._meta.fields)) and raise an error if it's not equal to number
of fields. The *args instantiation is useful and fast. QSRF
improvements may make it less useful but **kwargs instantiation will just
always be slower due to the nature of Python.
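A minimal sketch of such a length check, assuming a plain list of field names in place of ``_meta.fields`` (``strict_kwargs_from_row`` is a hypothetical helper name):

```python
def strict_kwargs_from_row(field_names, row):
    """Pair field names with row values, refusing rows of the wrong length."""
    if len(row) != len(field_names):
        raise ValueError(
            "Expected %d values, got %d" % (len(field_names), len(row))
        )
    return dict(zip(field_names, row))


fields_v2 = ["id", "first", "middle", "last"]

# A three-value row written for the old model is now rejected outright.
try:
    strict_kwargs_from_row(fields_v2, (None, "Joe", "Somebody"))
    error = None
except ValueError as exc:
    error = str(exc)

ok = strict_kwargs_from_row(fields_v2, (None, "Joe", "Q", "Somebody"))
```

This catches fields being added or removed. It would not catch two fields swapping places, since the count stays the same.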

import this
...
Although practicality beats purity.
...

On 30 Jan, 02:57, "Jacob Kaplan-Moss" <jacob.kaplanm...@gmail.com>
wrote:

Ivan Illarionov

Jan 29, 2008, 19:31:58

to Django developers

http://pastebin.com/m62466a6b

Russell Keith-Magee

Jan 29, 2008, 21:01:59

to django-d...@googlegroups.com

On Jan 30, 2008 8:57 AM, Jacob Kaplan-Moss <jacob.ka...@gmail.com> wrote:
>
> There should be one-- and preferably only one --obvious way to do it.
> ...
>
> On top of that, as I keep saying, it leads to brittle code -- I've
> been bitten a number of times.

...

> Relying on the order of fields in the model definition is asking for a
> heaping load of fail. Hence my desire to see it go away.

I'm +1 on removing __init__(*args), for exactly the same reasons. I
can see the use case for handling user instantiation of models from
custom queries, but I agree with Jacob that the right solution is to
make a clean API interface for this use case, not to open an API entry
point that is inherently prone to bugs.

Yours,
Russ Magee %-)

alex....@gmail.com

Jan 30, 2008, 0:32:27

to Django developers

The use of *args also seems like it will be a barrier/just messy if/when
schema evolution ever gets (or with 3rd party apps), I'm +1 on
this.

On Jan 29, 8:01 pm, "Russell Keith-Magee" <freakboy3...@gmail.com>
wrote:

Nicola Larosa

Jan 30, 2008, 2:34:57

to django-d...@googlegroups.com

Jacob Kaplan-Moss wrote:
> I'd like to deprecate initializing models using positional arguments
> (i.e. ``p = Person(1, 'Joe Somebody')``) in favor of only allowing
> keyword-argument initialization (i.e. ``p = Person(id=1, name='Joe
> Somebody')``).

Huh?

+1 to that.

No, wait, make that +1 to erasing the ruttin' posargs "feature" from the
gorram *language*. Dong ma?

I'll be in my bunk.

--
Nicola Larosa - http://www.tekNico.net/

I'm a leaf on the wind. Watch how I soar.

Ned Batchelder

Jan 30, 2008, 8:09:55

to django-d...@googlegroups.com

I would have thought Adrian's deferred fields proposal would fall
squarely in the post-1.0 bucket. It clearly has no backward
compatibility issues, so it can be added after 1.0. Maybe you aren't
proposing to include it in 1.0; it wasn't clear from your message.

I'm not arguing against removing positional argument support from model
constructors, just wondering about the 1.0 focus.

--Ned.

--
Ned Batchelder, http://nedbatchelder.com

Jacob Kaplan-Moss

Jan 30, 2008, 10:47:45

to django-d...@googlegroups.com

On 1/30/08, Ned Batchelder <n...@nedbatchelder.com> wrote:
> I'm not arguing against removing positional argument support from model
> constructors, just wondering about the 1.0 focus.

Yeah, you're totally right that the feature is a perfect post-1.0
candidate. I'm just using it as a vehicle for getting up to speed on
the changes in the queryset-refactor branch so I can help Malcolm wrap
it up.

The *args thing, though, is certainly something I'd rather do pre-1.0
-- mostly because it makes the implementation of Model.__init__ much
cleaner -- so I wanted to get the discussion started.

Jacob

Jacob Kaplan-Moss

Jan 30, 2008, 10:48:17

to django-d...@googlegroups.com

On 1/30/08, Nicola Larosa <nicola...@gmail.com> wrote:
> No, wait, make that +1 to erasing the ruttin' posargs "feature" from the
> gorram *language*. Dong ma?

Whoa, now, don't go quoting Firefly on me or we'll be here all day :)

Jacob

Justin Bronn

Jan 30, 2008, 12:41:26

to Django developers

I'm +1 for deprecating positional arguments from __init__().

> Relying on the order of fields in the model definition is asking for a
> heaping load of fail. Hence my desire to see it go away.

While I agree this argument applies for having positional args in
__init__(), those instantiating using a fromargs/fromtuple method (as
Ivan suggested) are more likely to be cognizant of the "heaping load
of fail" implications.

Are there other reasons that such a class method should be excluded?
Advanced users such as Ivan and David would benefit from its utility
and have given some valid use cases. If it really is as simple as
Ivan has posted and doesn't cause implementation/maintenance headaches
in other portions of the code, then why not?

-Justin

Empty

Jan 30, 2008, 12:47:46

to django-d...@googlegroups.com

> I'd like to deprecate initializing models using positional arguments
> (i.e. ``p = Person(1, 'Joe Somebody')``) in favor of only allowing
> keyword-argument initialization (i.e. ``p = Person(id=1, name='Joe
> Somebody')``).

+1 from me. I've been doing some interesting model stuff lately and
the positional arguments have caused a lot of headaches. It hasn't
bit me yet, but does make it difficult to do some tasks. I'm all in
favor of standardizing this stuff.

Michael Trier
blog.michaeltrier.com

Malcolm Tredinnick

Feb 1, 2008, 21:21:05

to django-d...@googlegroups.com

Supporting every single conceivable way to create an instance out of the
box isn't really a good goal: the overhead that everybody ends up paying
to support the fringe cases is unfair. The phrase you're after for those
remaining cases is "custom __init__ method".

Malcolm

--
Depression is merely anger without enthusiasm.
http://www.pointy-stick.com/blog/

Malcolm Tredinnick

Feb 1, 2008, 21:23:40

to django-d...@googlegroups.com

On Tue, 2008-01-29 at 14:13 -0600, Jacob Kaplan-Moss wrote:
> Howdy folks --
>
> The short version:
>

> I'd like to deprecate initializing models using positional arguments
> (i.e. ``p = Person(1, 'Joe Somebody')``) in favor of only allowing
> keyword-argument initialization (i.e. ``p = Person(id=1, name='Joe
> Somebody')``).
>

> I'd make the ``*args`` style start issuing DeprecationWarnings
> immediately, and remove support entirely when we wipe deprecated
> features in the run-up to 1.0. I'd make this change on the
> queryset-refactor branch.

Good idea. Not sure why you're running into this with the work you're
doing, but it is fragile because of the auto-inserted primary keys. Also
it's going to be handy to insert more auto-generated fields up front
there, too, just to keep things neat and it's currently backwards
incompatible to do that.

Malcolm

--
Success always occurs in private and failure in full view.
http://www.pointy-stick.com/blog/

Adrian Holovaty

Feb 2, 2008, 19:06:14

to django-d...@googlegroups.com

On Jan 29, 2008 2:13 PM, Jacob Kaplan-Moss <jacob.ka...@gmail.com> wrote:
> I'd like to deprecate initializing models using positional arguments
> (i.e. ``p = Person(1, 'Joe Somebody')``) in favor of only allowing
> keyword-argument initialization (i.e. ``p = Person(id=1, name='Joe
> Somebody')``).

I'm late to the party!

I'm +1 on this, despite the fact that this is a backwards
incompatibility that we haven't mentioned in our discussions of final
1.0 features.

Like some others have pointed out, this would make custom SQL queries
a bit more painful, but if we introduce a custom_query() method as
Jacob suggested, that would solve it. Personally I tend to use the
*args syntax in custom queries, but it's not a huge change to rewrite
code to use custom_query().

Adrian

--
Adrian Holovaty
holovaty.com | everyblock.com | djangoproject.com

David Cramer

Feb 3, 2008, 14:57:45

to Django developers

I somehow can't see the custom_query solving our issues with needing
custom queries. How is it going to handle JOINs?

On Feb 2, 4:06 pm, "Adrian Holovaty" <holov...@gmail.com> wrote:

Brian Harring

Feb 4, 2008, 7:29:49

to django-d...@googlegroups.com

On Tue, Jan 29, 2008 at 03:43:22PM -0600, Jacob Kaplan-Moss wrote:
> On 1/29/08, Ivan Illarionov <ivan.il...@gmail.com> wrote:
> > Yes, but why not have fromtuple classmethod optimized for this use-
> > case? And I believe this way of initialization could be really faster
> > than keyword initialization.
>
> I'm pretty strongly opposed to a fromtuple (or whatever) method: it's
> brittle and too tightly couples code to the arbitrary order of defined
> fields.

Agreed- if the dev is avoiding repeating themselves, either it'll
end up requiring people to convert args into kwargs, or vice versa,
and then invoking a custom method w/ that object- in the end slowing
it down while making it rather fugly.

> Speed also XXXX
>
> As it stands now (in QSRF r7049), args are indeed faster than kwargs::
>
>     >>> t1 = timeit.Timer("Person(1, 'First', 'Last')",
>     ...                   "from blah.models import Person")
>     >>> t2 = timeit.Timer("Person(id=1, first='First', last='Last')",
>     ...                   "from blah.models import Person")
>     >>> t1.timeit()
>     25.09495210647583
>     >>> t2.timeit()
>     36.52219820022583
>
> However, much of that extra time is spent dealing with the args/kwargs
> confusion; chopping out the code that handles initialization from args
> gives better timing:

In hindsight, the pops can be optimized out via caching a set of the
allowed field names, and doing a differencing of that set against
kwargs- would avoid the semi-costly exception throwing (at least
KeyError) in addition.

Additional upshot of doing that conversion is that you would get a
listing of *all* invalid kwargs provided, instead of one of
potentially many kwargs that didn't match a field name.
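A rough sketch of that set-differencing idea, using a toy class rather than the real ``Model.__init__`` (the cached ``_field_names`` attribute is a hypothetical name):

```python
class Person:
    # Hypothetical cached set of allowed names, built once per class.
    _field_names = frozenset(["id", "first", "last"])

    def __init__(self, **kwargs):
        # One set difference replaces per-key pops and KeyError handling,
        # and it collects every bad name instead of stopping at the first.
        unknown = set(kwargs) - self._field_names
        if unknown:
            raise TypeError(
                "Invalid keyword argument(s): %s" % ", ".join(sorted(unknown))
            )
        for name in self._field_names:
            setattr(self, name, kwargs.get(name))


try:
    Person(id=1, frist="Joe", lst="Somebody")
    message = None
except TypeError as exc:
    message = str(exc)
```

Here both misspelled names are reported in a single error. Whether this is actually faster than the existing pops would need to be measured.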

Might want to take a stab at that, since that likely will reduce the
runtime a bit thus reducing the gap.

>     >>> t = timeit.Timer("Person(id=1, first='First', last='Last')",
>     ...                  "from blah.models import Person")
>     >>> t.timeit()
>     29.819494962692261
>
> So that's about a 15% speed improvement over the current
> __init__(**kwargs) at the cost of losing that same 15% since you can't
> do *args.

Honestly, I dislike from_tuple instantiation, but I dislike chucking
out *args support w/out making kwargs processing far closer in
processing speed. Reasoning is pretty simple- most people's implicit
usage of Model.__init__ is instantiation of db returned results, row
results in other words. I'm sure there are people out there who have
far heavier writes than reads db wise, but I'd expect the majority of
django consumers are doing reads, which in my own profiling still was
one of the most costly slices of django CPU runtime.

So, if row based instantiations are the majority usage, and DBAPI
(last I read) doesn't mandate a default way to fetch a dict
instead of a row, might want to take a closer look at the cost of row
conversion for Model.__init__. It'll be more than a 15% slowdown of db
access processing wise and should be measured- at the very least, to
know fully what is gained/lost in the transition.

> [Note, however, that this speed argument is a bit silly when __init__
> fires two extremely slow signals. Improving signal performance --
> which is very high on my todo list -- will probably make this whole
> performance debate a bit silly]

Kindly tag ticket 4561 in then- signaling speed gains can occur, but
nothing will match the speed of *not* signaling, instead signaling
only when something is listening (which that ticket specifically adds
for __init__, yielding ~25% reduction).
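The "only signal when something is listening" idea can be sketched as below. This is a toy ``Signal`` class written for illustration, not Django's dispatcher and not the patch attached to that ticket; the ``receivers`` list and ``sends`` counter are assumptions of the sketch.

```python
class Signal:
    """Toy signal; not Django's dispatcher."""

    def __init__(self):
        self.receivers = []
        self.sends = 0  # counts dispatches so the effect is observable

    def connect(self, receiver):
        self.receivers.append(receiver)

    def send(self, **named):
        self.sends += 1
        return [receiver(**named) for receiver in self.receivers]


pre_init = Signal()


class Person:
    def __init__(self, **kwargs):
        # Skip the dispatch entirely when no receiver is registered.
        if pre_init.receivers:
            pre_init.send(sender=self.__class__, args=(), kwargs=kwargs)
        self.__dict__.update(kwargs)


Person(id=1)
sends_without_listener = pre_init.sends

seen = []
pre_init.connect(lambda sender, args, kwargs: seen.append(kwargs["id"]))
Person(id=2)
sends_with_listener = pre_init.sends
```

The first instantiation never touches the dispatch path; only once a receiver is connected does ``__init__`` pay for the signal.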

Just to be clear, I understand the intent of making the code/api
simpler- in the same instant, we're talking about one of the most
critical chunks of django speed wise in my experience. Minor losses
in performance can occur, but I'd rather not see a 20% overhead
addition for it.

Aside from that, if you can post the patch somewhere for what you've
tweaked, I'd be curious.

Thanks,
~brian
