The short version:
I'd like to deprecate initializing models using positional arguments
(i.e. ``p = Person(1, 'Joe Somebody')``) in favor of only allowing
keyword-argument initialization (i.e. ``p = Person(id=1, name='Joe
Somebody')``).
I'd make the ``*args`` style start issuing DeprecationWarnings
immediately, and remove support entirely when we wipe deprecated
features in the run-up to 1.0. I'd make this change on the
queryset-refactor branch.
Long version, with explanation:
This week I've been starting to help Malcolm out the queryset-refactor
branch. To get my feet wet I've been playing with Adrian's deferred
fields proposal (http://code.djangoproject.com/ticket/5420).
Turns out it's trickier that I'd though, but mostly because of the
fact that QuerySets initialize models using ``Model.__init__(*args)``
instead of using kwargs. I'm almost certainly going to have to change
the internal behavior to get deferred fields working OK.
As I started down that road, though, I realized that positional
initialization of models is something I've only seen done internally
to Django. Further, using this "feature" can lead to all sorts of
nasty bugs: if you change the order of fields in models.py all of a
sudden fields start getting the "wrong" values from positional
initialization. On top of that, removing ``*args`` support from
``Model.__init__`` would make the code cleaner and a bit faster.
Thoughts?
Jacob
I haven't run into any problems with this as of yet, but I'd like to
wholeheartedly support this move. I was sincerely amazed when I
noticed that models could be instantiated either way, and I couldn't
think of a real reason why it was so. I just assumed somebody else
knew better than I did, so I didn't question it.
As a side note, reordering fields in models.py doesn't seem to cause
any problems, because _get_sql_clause generates a SELECT clause with
each field named explicitly, ordered according to _meta.fields. And
since Model.__init__ lines up *args with _meta.fields, it all works
out. At least, this has been true as far as my local tests go, anyway.
The only way I can see if causing a problem is if _meta.fields got
reordered *after* the SQL was generated, but before the model is
instantiated. Not a very likely scenario.
More realistically though, there's another aspect where this could
cause confusion. If anyone's listening for the pre_init signal, their
code would also have to do the whole args/kwargs check, since the
model can be instantiated either way. Doing away with *args will make
pre_init listeners much simpler, and more resilient to change, since
they don't have to also verify the order against that of _meta.fields.
For me, though, it's not a matter of any real-world problems
encountered. It just seems like there should be One Way to do it, and
keyword arguments make tremendously more sense (to me) as that One
Way. So, consider it philosophical support, but support all the same.
-Gul
Good point.
One of the other things planned as part of qs-rf is the ability to use
custom queries to initialize models -- something like
``Model.objects.custom_query("SELECT ...")`` -- so that you wouldn't
really need *args initialization anyway.
If that feature is the penance I need to go through to atone for
killing Model(*args), I'll happily do so :)
Jacob
kwargs = dict([(cls._meta.fields[i].attname, v) for (i, v) in enumerate(args)])
Seems like any code which explicitly needs to handle tuple
instantiation (likely the minority) can supply a helper method using
something similar to the above.
-Gul
I'm inclined to like *anything* that helps queryset-refactor along.
^_^ I've never been in the habit of using positional arguments for
model instantiation (too confusing), so a +1 for axing it. The
fromargs() alternative seems like a decent compromise.
I'm pretty strongly opposed to a fromtuple (or whatever) method: it's
brittle and to tightly couples code to the arbitrary order of defined
fields.
Speed also XXXX
As it stands now (in QSRF r7049), args are indeed faster than kwargs::
>>> t1 = timeit.Timer("Person(1, 'First', 'Last')", "from blah.models
import Person")
>>> t2 = timeit.Timer("Person(id=1, first='First', last='Last')",
"from blah.models import Person")
>>> t1.timeit()
25.09495210647583
>>> t2.timeit()
36.52219820022583
However, much of that extra time is spent dealing with the args/kwargs
confusion; chopping out the code that handles initialization from args
gives better timing:
>>> t = timeit.Timer("Person(id=1, first='First', last='Last')", "from
blah.models import Person")
>>> t.timeit()
29.819494962692261
So that's about a 15% speed improvement over the current
__init__(**kwargs) at the cost of losing that same 15% since you can't
do *args.
[Note, however, that this speed argument is a bit silly when __init__
fires two extremely slow signals. Improving signal performance --
which is very high on my todo list -- will probably make this whole
performance debate a bit silly]
Jacob
>>> import this
...
There should be one-- and preferably only one --obvious way to do it.
...
On top of that, as I keep saying, it leads to brittle code -- I've
been bitten a number of times.
For example, say you've got this model::
class Person(models.Model):
first = models.CharField()
last = models.CharField(blank=True)
And somewhere -- maybe in a data import script -- you've got::
p = Person(None, first, last)
Now you decide you need to change you model a bit::
class Person(models.Model):
first = models.CharField()
middle = models.CharField()
last = models.CharField(blank=True)
Then you run the data import script and get a bunch of people with
last names stored under ``person.middle``.
Relying on the order of fields in the model definition is asking for a
heaping load of fail. Hence my desire to see it go away.
Jacob
I'm +1 on removing __init__(*args), for exactly the same reasons. I
can see the use case for handling user-instantaition of models from
custom queries, but I agree with Jacob that the right solution is to
make a clean API interface for this use case, not to open an API entry
point that is inherently prone to bugs.
Yours,
Russ Magee %-)
Huh?
+1 to that.
No, wait, make that +1 to erasing the ruttin' posargs "feature" from the
gorram *language*. Dong ma?
I'll be in my bunk.
--
Nicola Larosa - http://www.tekNico.net/
I'm a leaf on the wind. Watch how I soar.
I'm not arguing against removing positional argument support from model
constructors, just wondering about the 1.0 focus.
--Ned.
--
Ned Batchelder, http://nedbatchelder.com
Yeah, you're totally right that the feature is a perfect post-1.0
candidate. I'm just using it as a vehicle for getting up to speed on
the changes in the queryset-refactor branch so I can help Malcolm wrap
it up.
The *args thing, though, is certainly something I'd rather do pre-1.0
-- mostly because it makes the implementation of Model.__init__ much
cleaner -- so I wanted to get the discussion started.
Jacob
Whoa, now, don't go quoting Firefly on me or we'll be here all day :)
Jacob
+1 from me. I've been doing some interesting model stuff lately and
the positional arguments have caused a lot of headaches. It hasn't
bit me yet, but does make it difficult to do some tasks. I'm all in
favor of standardizing this stuff.
Michael Trier
blog.michaeltrier.com
Supporting every single conceivable way to create an instance out of the
box isn't really a good goal: the overhead that everybody ends up paying
to support the fringe cases is unfair. The phrase you're after for those
remaining cases is "custom __init__ method".
Malcolm
--
Depression is merely anger without enthusiasm.
http://www.pointy-stick.com/blog/
Good idea. Not sure why you're running into this with the work you're
doing, but it is fragile because of the auto-inserted primary keys. Also
it's going to be handy to insert more auto-generated fields up front
there, too, just to keep things neat and it's currently backwards
incompatible to do that.
Malcolm
--
Success always occurs in private and failure in full view.
http://www.pointy-stick.com/blog/
I'm late to the party!
I'm +1 on this, despite the fact that this is a backwards
incompatibility that we haven't mentioned in our discussions of final
1.0 features.
Like some others have pointed out, this would make custom SQL queries
a bit more painful, but if we introduce a custom_query() method as
Jacob suggested, that would solve it. Personally I tend to use the
*args syntax in custom queries, but it's not a huge change to rewrite
code to use custom_query().
Adrian
--
Adrian Holovaty
holovaty.com | everyblock.com | djangoproject.com
Agreed- if the dev is avoiding repeating themselves, either it'll
end up requiring people to convert args into kwargs, or vice versa,
and then invoking a custom method w/ that object- in the end slowing
it down while making it rather fugly.
> Speed also XXXX
>
> As it stands now (in QSRF r7049), args are indeed faster than kwargs::
>
> >>> t1 = timeit.Timer("Person(1, 'First', 'Last')", "from blah.models
> import Person")
> >>> t2 = timeit.Timer("Person(id=1, first='First', last='Last')",
> "from blah.models import Person")
> >>> t1.timeit()
> 25.09495210647583
> >>> t2.timeit()
> 36.52219820022583
>
> However, much of that extra time is spent dealing with the args/kwargs
> confusion; chopping out the code that handles initialization from args
> gives better timing:
In hindsight, the pops can be optimized out via caching a set of the
allowed field names, and doing a differencing of that set against
kwargs- would avoid the semi-costly exception throwing (at least
KeyError) in addition.
Additional upshot of doing that conversion is that you would get a
listing of *all* invalid kwargs provided, instead of one of
potentially many kwargs that didn't match a field name.
Might want to take a stab at that, since that likely will reduce the
runtime a bit thus reducing the gap.
> >>> t = timeit.Timer("Person(id=1, first='First', last='Last')", "from
> blah.models import Person")
> >>> t.timeit()
> 29.819494962692261
>
> So that's about a 15% speed improvement over the current
> __init__(**kwargs) at the cost of losing that same 15% since you can't
> do *args.
Honestly, I dislike from_tuple instantiation, but I dislike chucking
out *args support w/out making kwargs processing far closer in
processing speed. Reasoning is pretty simple- most peoples implicit
usage of Model.__init__ is instantiation of db returned results, row
results in other words. I'm sure there are people out there who have
far heavier writes then reads db wise, but I'd expect the majority of
django consumers are doing reads, which in my own profiling still was
one of the most costly slices of django CPU runtime.
So, if row based instantiations are the majority usage, and DBAPI
(last I read) doesn't mandate a default way to fetch a dict
instead of a row, might want to take a closer look at the cost of row
conversion for Model.__init__. It'll be more then 15% slowdown of db
access processing wise and should be measured- at the very least, to
know what fully what is gained/lost in the transition.
> [Note, however, that this speed argument is a bit silly when __init__
> fires two extremely slow signals. Improving signal performance --
> which is very high on my todo list -- will probably make this whole
> performance debate a bit silly]
Kindly tag ticket 4561 in then- signaling speed gains can occur, but
nothing will match the speed of *not* signaling, instead signaling
only when something is listening (which that ticket specifically adds
for __init__, yielding ~25% reduction).
Just to be clear, I understand the intent of making the code/api
simplier- in the same instant, we're talking about one of the most
critical chunks of django speed wise in my experience. Minor losses
in performance can occur, but I'd rather not see a 20% overhead
addition for it.
Aside from that, if you can post the patch somewhere for what you've
tweaked, I'd be curious.
Thanks,
~brian