Implementation of descriptor fields

5 views
Skip to first unread message

Adrian Holovaty

unread,
Jan 27, 2006, 7:34:47 PM1/27/06
to django-d...@googlegroups.com
I've implemented the very first bit of descriptor fields -- the bit
that does poll.choice_set instead of poll.get_choice_list().

http://code.djangoproject.com/changeset/2141

Just before that, I'd added a core_filters attribute to Managers,
which, if set, is applied to every data-access query for that manager.

http://code.djangoproject.com/changeset/2140

An implementation question, though: If foo.objects is a Manager
instance, and accessing foo.objects returns a QuerySet instance, how
would one access the other manager methods, such as
foo.objects.get_pub_date_list(), etc.? Would that mean all of the
manager methods would have to be proxied through the QuerySet
instance? Seems quite dirty, and I think this is why I originally had
been thinking of doing manager.all().

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com | chicagocrime.org

Russell Keith-Magee

unread,
Jan 27, 2006, 9:47:52 PM1/27/06
to django-d...@googlegroups.com
On 1/28/06, Adrian Holovaty <holo...@gmail.com> wrote:
>
> An implementation question, though: If foo.objects is a Manager
> instance, and accessing foo.objects returns a QuerySet instance, how
> would one access the other manager methods, such as
> foo.objects.get_pub_date_list(), etc.?

Isn't get_pub_date_list() really just a specialist type of values() call?

foo.objects -> set of all objects
foo.objects.values() -> dictionary containing values from set of all objects
foo.objects.pub_date_values() -> list containing all available dates

and more generally,

foo.objects.ATTR_values() -> set containing all available ATTRs

To maintain the full feature list of get_pub_date_list(), allow:

foo.objects.pub_date_values('year') -> set of all available years

as a special case for datetime fields.

Russ Magee %-)

Adrian Holovaty

unread,
Jan 28, 2006, 2:51:44 AM1/28/06
to django-d...@googlegroups.com
On 1/27/06, Russell Keith-Magee <freakb...@gmail.com> wrote:
> On 1/28/06, Adrian Holovaty <holo...@gmail.com> wrote:
> > An implementation question, though: If foo.objects is a Manager
> > instance, and accessing foo.objects returns a QuerySet instance, how
> > would one access the other manager methods, such as
> > foo.objects.get_pub_date_list(), etc.?
>
> Isn't get_pub_date_list() really just a specialist type of values() call?

My mistake -- get_pub_date_list() wasn't the best example. Better
examples would be completely custom manager methods, such as
create_user() and and make_random_password() in
django.contrib.auth.models.UserManager.

Russell Keith-Magee

unread,
Jan 28, 2006, 3:36:14 AM1/28/06
to django-d...@googlegroups.com
On 1/28/06, Adrian Holovaty <holo...@gmail.com> wrote:
>
> On 1/27/06, Russell Keith-Magee <freakb...@gmail.com> wrote:
> > On 1/28/06, Adrian Holovaty <holo...@gmail.com> wrote:
> > > An implementation question, though: If foo.objects is a Manager
> > > instance, and accessing foo.objects returns a QuerySet instance, how
> > > would one access the other manager methods, such as
> > > foo.objects.get_pub_date_list(), etc.?
> >
> > Isn't get_pub_date_list() really just a specialist type of values() call?
>
> My mistake -- get_pub_date_list() wasn't the best example. Better
> examples would be completely custom manager methods, such as
> create_user() and and make_random_password() in
> django.contrib.auth.models.UserManager.

Is there any particular reason that methods like that can't be class
methods of the Model itself? How is create_user a 'management'
function, rather than a 'model' function? This is especially true when
the manager is acting as a descriptor for a set of objects.

Having a quick poke around, most of the Manager methods I can see end
up working with self.klass anyway, which seems to suggest that they
_are_ model methods, but we're doing gymnastics to avoid putting them
there.

Russ Magee %-)

Luke Plant

unread,
Jan 28, 2006, 5:32:09 AM1/28/06
to django-d...@googlegroups.com
On Saturday 28 January 2006 00:34, Adrian Holovaty wrote:
> I've implemented the very first bit of descriptor fields -- the bit
> that does poll.choice_set instead of poll.get_choice_list().

I'm up for doing some work on this, and have some time at the moment,
even if it's just converting all the tests - if you want to delegate
out some work to me so we don't overlap. I'll hang around on IRC too.

I have a remaining query about implementation:

As Robert pointed out before, you usually don't have to worry about
resetting the cache, since the objects will get GC after the template
and views stop using them. However, the Article.objects manager is
special, since it is rooted in a class, and that's going to persist,
isn't it? So any iteration of Article.objects will cause the entire
table to be read and kept in memory.

You could say - "if there are no 'where clause' params added, then don't
cache", but this wouldn't work for custom managers e.g.
Articles.goodarticles = Articles.objects.filter(score_gt=50), since they
would persist too.

One solution would be to have managers not cache by default, but you
could call Articles.objects.with_cache(True) to get an instance with
caching turned on. Related object lookups would always do that.

Luke


--
"I married Miss Right, I just didn't know her first name was 'Always'"

Luke Plant || L.Plant.98 (at) cantab.net || http://lukeplant.me.uk/

Robert Wittams

unread,
Jan 28, 2006, 7:48:22 AM1/28/06
to django-d...@googlegroups.com

I planned it so that a Manager isa QuerySet, ie it inherits from it or a
suitable common base class. No descriptor malarkeys.

So I had envisioned:

QuerySet
| |
Manager RelatedObjectSet

As part of the inheritance hierarchy. Methods like .filter(),
.order_by() etc would be defined by query set. Calling these methods
would return a duplicate of the current object with the appropriate
changes made to the query specification inside it. This would allow the
easy definition of multiple managers. (Ie the result of a filter
operation would still have all the manager methods defined on it.)

So I think bare QuerySets would only arise as the result of operations
like & and | ... might be wrong though.

Robert

Luke Plant

unread,
Jan 28, 2006, 7:58:48 AM1/28/06
to django-d...@googlegroups.com
Some more implementation questions:

Presumably:

Article.objects.order_by(('title',)).order_by(('something_else',))
==
Article.objects.order_by(('something_else',))

so that you can override an ordering (I can't see many use cases for the
alternative i.e. cumulative ordering)

But what about these:

Article.objects.order_by(('title',)) &
Article.objects.order_by(('something_else',))

Which then leads to the obvious question:
Article.objects.order_by(('title',)) |
Article.objects.order_by(('something_else',))

Do we just say the result is undefined, and pick one? This is the most
sane I think, because the semantics of '&' is the same as filter():

Article.objects.filter(title='foo').filter(date=now)
== (Article.objects.filter(title='foo') &
Article.objects.filter(date=now))

Luke


?

Robert Wittams

unread,
Jan 28, 2006, 8:13:15 AM1/28/06
to django-d...@googlegroups.com
Luke Plant wrote:
> Some more implementation questions:
>
> Presumably:
>
> Article.objects.order_by(('title',)).order_by(('something_else',))
> ==
> Article.objects.order_by(('something_else',))
>
> so that you can override an ordering (I can't see many use cases for the
> alternative i.e. cumulative ordering)
>
> But what about these:
>
> Article.objects.order_by(('title',)) &
> Article.objects.order_by(('something_else',))
>
> Which then leads to the obvious question:
> Article.objects.order_by(('title',)) |
> Article.objects.order_by(('something_else',))
>
> Do we just say the result is undefined, and pick one? This is the most
> sane I think, because the semantics of '&' is the same as filter():
>
> Article.objects.filter(title='foo').filter(date=now)
> == (Article.objects.filter(title='foo') &
> Article.objects.filter(date=now))
>
> Luke

hugo and I talked on IRC about this, probably a couple of months ago
now. The best we could come up with for the | case was "undefined" -
probably remove the ordering clause completely. If you want an ordering,
do it after everything else :

(this example ends up kind of silly)

Article.objects.order_by(('title',)) |
Article.objects.order_by(('something_else',))

should be done as

(Article.objects | Article.objects).order_by('something_else')

Also, note that order by takes varargs rather than a tuple.

Luke Plant

unread,
Jan 28, 2006, 8:34:57 AM1/28/06
to django-d...@googlegroups.com
> hugo and I talked on IRC about this, probably a couple of months ago
> now. The best we could come up with for the | case was "undefined" -
> probably remove the ordering clause completely.

That would mean that if the manager, for instance, happened to be
ordered, you would lose ordering if you did an 'or' query, even
between. I think it should be:
- if only one of them has ordering, use it
- if they both have an ordering, pick one

This means if they both happen to have the same ordering it will work
out right. Or you could check for different orderings (which shouldn't
be an expensive comparison) and in that case remove all ordering.

Luke

> Also, note that order by takes varargs rather than a tuple.

Noted (I'm already having a go). I had started off exactly as you
proposed in the other e-mail before it arrived, so hopefully I'm on the
right lines.

Luke

Luke Plant

unread,
Jan 28, 2006, 9:15:12 AM1/28/06
to django-d...@googlegroups.com
OK, more holes in the proposal. Everything that used to be kwarg to
Manager().get_list() (or other methods), now has to alter the state of
the QuerySet, otherwise caching won't work. So we have the following
that still need sorting out:

select_related
limit
offset
params
select
where
tables

limit and offset can be done using slice notation. For consistency, all
the rest could have their own methods that return new instances:
.select_related(true_or_false)
.params(*args)
.select({})
.where([])
.tables([])

But I'm tending towards combining the last 4, all of which are for doing
extra related lookups, into a single .extras():

.extras(params=(), select={}, where=[], tables=[])

This seems nicer to me, as some of them don't make sense on their own.
Also, when combining QuerySets using & or |, seeing as these extra bits
don't fit into the whole 'sets' mentality, I think the result should be
to remove them from the new QuerySet. Thoughts?

Robert Wittams

unread,
Jan 28, 2006, 9:23:22 AM1/28/06
to django-d...@googlegroups.com
Luke Plant wrote:
>>hugo and I talked on IRC about this, probably a couple of months ago
>>now. The best we could come up with for the | case was "undefined" -
>>probably remove the ordering clause completely.
>
>
> That would mean that if the manager, for instance, happened to be
> ordered, you would lose ordering if you did an 'or' query, even
> between. I think it should be:
> - if only one of them has ordering, use it
> - if they both have an ordering, pick one
>
> This means if they both happen to have the same ordering it will work
> out right. Or you could check for different orderings (which shouldn't
> be an expensive comparison) and in that case remove all ordering.
>
> Luke

Hm, that does make sense. +1 .

Luke Plant

unread,
Jan 28, 2006, 9:37:06 AM1/28/06
to django-d...@googlegroups.com
On Saturday 28 January 2006 14:23, Robert Wittams wrote:

> Hm, that does make sense. +1 .

Except this sentence, which I never managed to finish:

> > That would mean that if the manager, for instance, happened to be
> > ordered, you would lose ordering if you did an 'or' query, even
> > between.

Should say: ...even between two managers that have the same ordering.

:-)

Adrian Holovaty

unread,
Jan 29, 2006, 1:19:55 AM1/29/06
to django-d...@googlegroups.com
On 1/28/06, Russell Keith-Magee <freakb...@gmail.com> wrote:
> > My mistake -- get_pub_date_list() wasn't the best example. Better
> > examples would be completely custom manager methods, such as
> > create_user() and and make_random_password() in
> > django.contrib.auth.models.UserManager.
>
> Is there any particular reason that methods like that can't be class
> methods of the Model itself? How is create_user a 'management'
> function, rather than a 'model' function? This is especially true when
> the manager is acting as a descriptor for a set of objects.

We had this discussion a while back; the goal is to separate
"instance-level" functionality from "table-level" functionality.
Anything that doesn't explicitly act on a model instance goes in the
manager. With that in mind, I'm not sure having Manager subclass
QuerySet (and making it a descriptor) is the best strategy, because a
Manager potentially does a lot more than return database queries. That
definition is a bit limiting. I'll play around with code Sunday and
see if I can come up with something.

Robert Wittams

unread,
Jan 29, 2006, 10:38:39 AM1/29/06
to django-d...@googlegroups.com
> We had this discussion a while back; the goal is to separate
> "instance-level" functionality from "table-level" functionality.
> Anything that doesn't explicitly act on a model instance goes in the
> manager. With that in mind, I'm not sure having Manager subclass
> QuerySet (and making it a descriptor)

Does it need to be a descriptor? Caching issues?

> is the best strategy, because a
> Manager potentially does a lot more than return database queries. That
> definition is a bit limiting. I'll play around with code Sunday and
> see if I can come up with something.
>
> Adrian

I can't see any conceptual issues with the manager being described as:

A set of objects that provides access to default and custom
functionality over the whole set. I think we need to stop immediately
thinking in terms of database queries, and look at things at a higher
level first.

One (imo nice) thing which falls out of this definition is that the
'self' argument to custom manager methods (in a subclass of manager) is
the particular collection of objects you invoked the function on. This
makes it quite similar to eg defining a subclass of list with extra
methods (silly contrived example follows):

class UserManager(Manager):
def find_users(predicate):
return [user for user in self if predicate(user)]

So if you had multiple managers which defined different restrictions due
to filter definitions, this function would work as expected within the
restricted sets.

Adrian Holovaty

unread,
Jan 29, 2006, 11:58:46 PM1/29/06
to django-d...@googlegroups.com
On 1/28/06, Luke Plant <luke....@gmail.com> wrote:
> As Robert pointed out before, you usually don't have to worry about
> resetting the cache, since the objects will get GC after the template
> and views stop using them. However, the Article.objects manager is
> special, since it is rooted in a class, and that's going to persist,
> isn't it? So any iteration of Article.objects will cause the entire
> table to be read and kept in memory.
>
> You could say - "if there are no 'where clause' params added, then don't
> cache", but this wouldn't work for custom managers e.g.
> Articles.goodarticles = Articles.objects.filter(score_gt=50), since they
> would persist too.
>
> One solution would be to have managers not cache by default, but you
> could call Articles.objects.with_cache(True) to get an instance with
> caching turned on. Related object lookups would always do that.

I've finally wrapped my brain around this problem, and it's a tricky
one. A common idiom is to do this in a view:

def some_view(request):
person_list = Person.objects
return render_to_response('foo/person_list', {'person_list':
person_list})

As Luke mentioned, we cannot cache the result of Person.objects,
because the class persists. (The next time somebody does
Person.objects, he will get the result of the cache, which is not what
we want.)

But we *do* want caching, because if we don't have it in this case, it
will do a separate database query each time person_list is iterated
over. That's not good, either.

The goal is: Only one database query, and no persistent caching across
accesses to Person.objects.

One way to solve this problem would be to require people to use list():

def some_view(request):
person_list = list(Person.objects)
return render_to_response('foo/person_list', {'person_list':
person_list})

But that's a horrible hack, IMO, and it's would be hard to explain to
new Django users why they'd need to do that.

This comes back to the Person.objects.all() thing that I'd mentioned
earlier. With all(), there's no strange special-case behavior.
Person.objects.all() would return a cache-filled object, and there
wouldn't be a need for a QuerySet to know whether it could accept
caching or not.

def some_view(request):
person_list = Person.objects.all()
return render_to_response('foo/person_list', {'person_list':
person_list})

This would further simplify things by making a Manager's methods
*return* QuerySets instead of having a Manager *be* a QuerySet. The
downside is it's slightly more typing to do Person.objects.all() than
Person.objects -- but that's the only downside I see. Thoughts?

Eugene Lazutkin

unread,
Jan 30, 2006, 1:42:08 AM1/30/06
to django-d...@googlegroups.com
Adrian Holovaty wrote:
>
> This would further simplify things by making a Manager's methods
> *return* QuerySets instead of having a Manager *be* a QuerySet. The
> downside is it's slightly more typing to do Person.objects.all() than
> Person.objects -- but that's the only downside I see. Thoughts?


.all() is not that much of a typing. And it is explicit. Now you found
one more reason to use it. Why not? +1.

Thanks,

Eugene

luke....@gmail.com

unread,
Jan 30, 2006, 7:38:58 AM1/30/06
to Django developers
Adrian Holovaty wrote:

> This comes back to the Person.objects.all() thing that I'd mentioned
> earlier. With all(), there's no strange special-case behavior.
> Person.objects.all() would return a cache-filled object, and there
> wouldn't be a need for a QuerySet to know whether it could accept
> caching or not.
>
> def some_view(request):
> person_list = Person.objects.all()
> return render_to_response('foo/person_list', {'person_list':
> person_list})
>
> This would further simplify things by making a Manager's methods
> *return* QuerySets instead of having a Manager *be* a QuerySet. The
> downside is it's slightly more typing to do Person.objects.all() than
> Person.objects -- but that's the only downside I see. Thoughts?

Glad you brought this up, I was intending to post along similar lines.

Consider this way of creating custom managers:

goodArticles = objects.filter(score_gt=50)

Presumably this needs the same behaviour as a Manager i.e. doesn't
cache. So the the filter method needs to return a non-caching
QuerySet. To get a caching version, you would have to do .all(), so
you would have to do this everytime you actually wanted to get data:

Article.objects.all().filter(blah..)

Would that be acceptable? Alternatively, we could say that the above
method of creating custom managers is wrong, and you should use
subclassing instead.

Also, with this syntax, you could still have Managers *being* QuerySets
(and it would probably be easier to implement it this way). All that
happens is that they are non-caching, and all the methods on them (or
just .all(), depending on the answer to the above) return new sets with
caching turned on.

Finally, the other idea of using list() turns out really bad. From
what I can tell, list(iterable) calls __len__ on that iterable if it
exists (after having fetched the data, for some unknown reason), so
with the current behaviour of __len__ on QuerySets,
list(Article.objects) fetches the data twice if caching is disabled (as
currently). You actually have to do list(iter(Article.objects)).
Nasty!

Luke

Amit Upadhyay

unread,
Jan 30, 2006, 8:02:51 AM1/30/06
to django-d...@googlegroups.com
Presumably this needs the same behaviour as a Manager i.e. doesn't
cache.  So the the filter method needs to return a non-caching
QuerySet.  To get a caching version, you would have to do .all(), so
you would have to do this everytime you actually wanted to get data:

Article.objects.all().filter(blah..)

Have not been following the discussion very closely, but why not have a parameter use_cache (defaulting to True) in the get_list and making it completely explicit. Sometimes I like the conveniece of not storing something in a variable, like from within a template, and calling the get_list [or get_set or whatever it is called in this branch] multiple times ( {% if object.get_related_list %} {% for related_item in object.get_related_list %} # do something ... ) while at other times I need the accurate current list.

--
Amit Upadhyay
Blog: http://www.rootshell.be/~upadhyay
+91-9867-359-701

Luke Plant

unread,
Jan 30, 2006, 8:18:57 AM1/30/06
to Django developers

Amit Upadhyay wrote:

> Have not been following the discussion very closely, but why not have a
> parameter use_cache (defaulting to True) in the get_list and making it
> completely explicit. Sometimes I like the conveniece of not storing
> something in a variable, like from within a template, and calling the
> get_list [or get_set or whatever it is called in this branch] multiple times

The problem is, there isn't even a 'get_set' in this branch, so there
isn't anywhere to specify that parameter. Fetching the data is done
implicitly by iterating over the set. In this context, something like a
objects.use_cache() method would be more appropriate - it would return
a QuerySet with caching. It's then equivalent to the above .all()
proposal.

Also, we don't want to add a keyword argument to filter(), because they
all are used for specifying where clauses. Positional arguments to
filter() are used for Q() objects (or will be, once that functionality
gets put back in).

Luke

Robert Wittams

unread,
Jan 30, 2006, 10:01:38 AM1/30/06
to django-d...@googlegroups.com

I can't see a non-evil way around this ATM with the caching issue. I
think it would be clearer what is going on if the method is called
.cached() rather than .all(), though....

Robert

Brant Harris

unread,
Jan 30, 2006, 11:04:01 AM1/30/06
to django-d...@googlegroups.com
Although, I think it'd be great to have it as propsed, at some point
the OO of your ORM needs to make consessions to the SQL of your ORM.
Which goes back to my earlier proposal.

That said... Person.objects could be a property that returns the
Manager/Query. In this way you could send a cacheless version of it
every time it is called, and would work for your first bit of code, as
desired.

Luke Plant

unread,
Jan 30, 2006, 1:49:43 PM1/30/06
to django-d...@googlegroups.com
On Monday 30 January 2006 16:04, Brant Harris wrote:

> That said... Person.objects could be a property that returns the
> Manager/Query. In this way you could send a cacheless version of it
> every time it is called, and would work for your first bit of code,
> as desired.

I attempted something like this first time (I think you mean it would
return a version with caching *enabled* each time, right?), but removed
it again. I wasn't sure how it would interact with things like
_default_manager, and also if you create custom managers based on the
the default one, Article.objects. I don't know enough about whether
creating the managers performance-wise either - funny things go on with
events - maybe Adrian could advise on that.

Luke

--
"I'm at peace with the world. I'm completely serene. I know why I was
put here and why everything exists. I am here so everybody can do what
I want. Once everybody accepts it, they'll be serene too." (Calvin and
Hobbes)

Brant Harris

unread,
Jan 30, 2006, 5:11:36 PM1/30/06
to django-d...@googlegroups.com
On 1/30/06, Luke Plant <luke....@gmail.com> wrote:
> I attempted something like this first time (I think you mean it would
> return a version with caching *enabled* each time, right?), but removed
> it again. I wasn't sure how it would interact with things like
> _default_manager, and also if you create custom managers based on the
> the default one, Article.objects. I don't know enough about whether
> creating the managers performance-wise either - funny things go on with
> events - maybe Adrian could advise on that.
>
> Luke

Er, I meant the property would return a query object with the cache
reset. The best way to do it would be to have the metaclass move the
given manager to something like "__manager__", and then the property
would do:

def get_manager(self):
q = self.__manager__.filter()
return q

It's surely hacky, but I think it needs to be to match the proposed
functionality.

Reply all
Reply to author
Forward
0 new messages