DescriptorFields status/Manager API change

Joseph Kocherhans

Jan 25, 2006, 11:52:29 AM
to django-d...@googlegroups.com
Is anyone planning on tackling the descriptor fields proposal anytime
soon? If not, I can give it a shot.

I'm not entirely clear as to how to do the ManyToMany and the
auto-generated end of the ForeignKey fields yet. It really feels like
they should just return a Manager instance (that automatically filters
on the original object's id) when they are accessed as an attribute of
an object, and that the Manager should (at least partially) adopt the
proposed descriptor API.

At any rate, Managers and ManyToMany/OneToMany attributes do extremely
similar things, and I think they could share an api and maybe even
share most implementation. If managers behaved like object attributes,
this is what I'm worried about:

MyModel.objects.filter(creator__exact=5)
for obj in MyModel.objects:
    print obj

It seems like people might want to use the objects attribute with
different filter/ordering criteria multiple times. Some sort of
.clone() method would help. In the simple case you just use
MyModel.objects directly, but if you want different result sets, use
.clone()

object_set1 = MyModel.objects.clone()
object_set2 = MyModel.objects.clone()

object_set1.filter(creator__exact=1)
object_set2.filter(creator__exact=2)

Maybe unifying the attribute and manager APIs isn't worth the
trouble. I'm sure there are some issues I'm overlooking here, so
please point out any problems you see.

Joseph

Jason Davies

Jan 25, 2006, 12:16:05 PM
to Django developers

Joseph Kocherhans wrote:

> At any rate, Managers and ManyToMany/OneToMany attributes do extremely
> similar things, and I think they could share an api and maybe even
> share most implementation. If managers behaved like object attributes,
> this is what I'm worried about:
>
> MyModel.objects.filter(creator__exact=5)
> for obj in MyModel.objects:
>     print obj

I was under the impression that you'd do something like:

for obj in MyModel.objects.filter(creator__exact=5):
    print obj

i.e. .filter() doesn't do anything to the existing MyModel.objects, but
returns a new lazy collection object with extra lookup params. Thus
you could do:

object_set1 = MyModel.objects.filter(creator__exact=1)
object_set2 = MyModel.objects.filter(creator__exact=2)
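
To make that concrete, here is a minimal sketch of a collection whose filter() returns a new lazy object instead of changing the one it was called on. LazyQuery and the in-memory rows are assumptions for illustration only (not Django code), and plain keyword names stand in for the creator__exact style lookups:

```python
# Hypothetical sketch: LazyQuery and its method names are illustrative,
# not Django's actual classes.
class LazyQuery(object):
    def __init__(self, rows, filters=None):
        self._rows = rows                    # stand-in for a database table
        self._filters = dict(filters or {})  # accumulated lookup params

    def filter(self, **kwargs):
        # Return a *new* query; the receiver is left untouched.
        merged = dict(self._filters)
        merged.update(kwargs)
        return LazyQuery(self._rows, merged)

    def __iter__(self):
        # Nothing is evaluated until iteration starts.
        for row in self._rows:
            if all(row.get(k) == v for k, v in self._filters.items()):
                yield row


rows = [{"id": 1, "creator": 1}, {"id": 2, "creator": 2}, {"id": 3, "creator": 1}]
base = LazyQuery(rows)
object_set1 = base.filter(creator=1)
object_set2 = base.filter(creator=2)
```

Because filter() never touches base, the two result sets stay independent and no .clone() call is needed.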

Regards,
Jason

Joseph Kocherhans

Jan 25, 2006, 12:40:41 PM
to django-d...@googlegroups.com
On 1/25/06, Jason Davies <jason....@gmail.com> wrote:
>
>
> Joseph Kocherhans wrote:
>
> > At any rate, Managers and ManyToMany/OneToMany attributes do extremely
> > similar things, and I think they could share an api and maybe even
> > share most implementation. If managers behaved like object attributes,
> > this is what I'm worried about:
> >
> > MyModel.objects.filter(creator__exact=5)
> > for obj in MyModel.objects:
> >     print obj
>
> I was under the impression that you'd do something like:
>
> for obj in MyModel.objects.filter(creator__exact=5):
>     print obj

If this were the case then it would be impossible to combine filter
and order_by, at least in this way:

MyModel.objects.filter(creator__exact=5)
MyModel.objects.order_by('status')
for obj in MyModel.objects:
    print obj

For that code it seems like MyModel.objects should be a lazy
collection containing objects with creator=5 and ordered by status,
but in this case you would get an error because MyModel.objects
wouldn't support iteration.

If MyModel.objects (rather than MyModel.objects.filter()) instantiated
and returned a lazy collection, something like this should work as
expected:

for obj in MyModel.objects.filter(creator__exact=5).order_by('status'):
    print obj

Or this:

myobjects = MyModel.objects
myobjects.filter(creator__exact=5)
myobjects.order_by('status')
for obj in myobjects:
    print obj

Also, the current proposal doesn't apply to manipulators, just fields.
For the above you would still use managers like this:

MyModel.objects.get_list(creator__exact=5, order_by=('status',))

Joseph

Adrian Holovaty

Jan 25, 2006, 12:46:09 PM
to django-d...@googlegroups.com
On 1/25/06, Joseph Kocherhans <jkoch...@gmail.com> wrote:
> Is anyone planning on tackling the descriptor fields proposal anytime
> soon? If not, I can give it a shot.

I was planning on starting this last night but got wrapped up in other
stuff. I'd like to start on it myself later this evening.

> MyModel.objects.filter(creator__exact=5)
> for obj in MyModel.objects:
>     print obj

If managers behaved like that, how would one do the equivalent of get_object()?

Adrian

--
Adrian Holovaty
holovaty.com | djangoproject.com | chicagocrime.org

Joseph Kocherhans

Jan 25, 2006, 12:46:22 PM
to django-d...@googlegroups.com
On 1/25/06, Joseph Kocherhans <jkoch...@gmail.com> wrote:
> On 1/25/06, Jason Davies <jason....@gmail.com> wrote:
> >
> >
> > Joseph Kocherhans wrote:
> >
> > > At any rate, Managers and ManyToMany/OneToMany attributes do extremely
> > > similar things, and I think they could share an api and maybe even
> > > share most implementation. If managers behaved like object attributes,
> > > this is what I'm worried about:
> > >
> > > MyModel.objects.filter(creator__exact=5)
> > > for obj in MyModel.objects:
> > >     print obj
> >
> > I was under the impression that you'd do something like:
> >
> > for obj in MyModel.objects.filter(creator__exact=5):
> >     print obj
>
> If this were the case then it would be impossible to combine filter
> and order_by, at least in this way:
>
> MyModel.objects.filter(creator__exact=5)
> MyModel.objects.order_by('status')
> for obj in MyModel.objects:
>     print obj
>
> For that code it seems like MyModel.objects should be a lazy
> collection containing objects with creator=5 and ordered by status,
> but in this case you would get an error because MyModel.objects
> wouldn't support iteration.

Oops... MyModel.objects would return an unfiltered/unordered lazy
collection. My mistake.

Joseph

Adrian Holovaty

Jan 25, 2006, 12:50:10 PM
to django-d...@googlegroups.com
On 1/25/06, Joseph Kocherhans <jkoch...@gmail.com> wrote:
> If this were the case then it would be impossible to combine filter
> and order_by, at least in this way:
>
> MyModel.objects.filter(creator__exact=5)
> MyModel.objects.order_by('status')
> for obj in MyModel.objects:
>     print obj
>
> For that code it seems like MyModel.objects should be a lazy
> collection containing objects with creator=5 and ordered by status,
> but in this case you would get an error because MyModel.objects
> wouldn't support iteration.

This is different than what I'd been envisioning -- I was under the
impression MyModel.objects wouldn't contain state. A manager shouldn't
contain state of the filter() queries that were passed to it. (This is
a good argument for treating managers differently than many-to-one and
many-to-many QueryResult objects.)

Joseph Kocherhans

Jan 25, 2006, 12:50:29 PM
to django-d...@googlegroups.com
On 1/25/06, Adrian Holovaty <holo...@gmail.com> wrote:
>
> On 1/25/06, Joseph Kocherhans <jkoch...@gmail.com> wrote:
> > Is anyone planning on tackling the descriptor fields proposal anytime
> > soon? If not, I can give it a shot.
>
> I was planning on starting this last night but got wrapped up in other
> stuff. I'd like to start on it myself later this evening.
>
> > MyModel.objects.filter(creator__exact=5)
> > for obj in MyModel.objects:
> >     print obj
>
> If managers behaved like that, how would one do the equivalent of get_object()?

Probably just MyModel.objects.get(1) where the first arg is assumed to
be the pk. You could also pass in slug__exact='test' or whatever.

Joseph

Joseph Kocherhans

Jan 25, 2006, 1:03:58 PM
to django-d...@googlegroups.com
On 1/25/06, Adrian Holovaty <holo...@gmail.com> wrote:
>

Why should field attributes contain filter state, and not managers
though? (Probably neither of them should BTW.) Allowing
filters/ordering on attributes seems like a really convenient thing,
but I think the semantics of attribute access are getting in the way
here. At any rate, having managers work one way, and "to-many"
attributes another is going to be confusing to people.

Joseph

Luke Plant

Jan 25, 2006, 1:51:11 PM
to django-d...@googlegroups.com
On Wed, 25 Jan 2006 11:46:09 -0600 Adrian Holovaty wrote:

> If managers behaved like that, how would one do the equivalent of
> get_object()?

... and get_values() (which doesn't fit the 'sets' paradigm at all,
since it can contain duplicates)?

The original proposal was only for model instance attributes as far
as I understood it. This does bring up a problem with the proposal.
The similarity between
Foo.objects.get_list()
and
foo.get_bar_list()

makes it strange and confusing for them to use different syntax, (as
Joseph pointed out), but is it possible for them to use the same syntax?
You would have to do away with anything that isn't 'set'-like.

(I was just about to post something about the get_values() method and
sets, until I realised the original proposal didn't include the manager
methods, but I'm glad to see my nagging thoughts did have something
behind them!).

Luke

--
"I have had a perfectly lovely evening. However, this wasn't it."
(Groucho Marx)

Luke Plant || L.Plant.98 (at) cantab.net || http://lukeplant.me.uk/

Luke Plant

Jan 25, 2006, 2:17:39 PM
to django-d...@googlegroups.com

... also, what happens with exceptions, i.e. when do they get thrown?
This applies to all the lazy collections, and also foreign key relationships:

e.g. from the wiki:
> article.reporter
> article.reporter.id (Doesn't do a DB query)

The first could throw Reporter.DoesNotExist, but the second doesn't.
I'm not sure that is a great plan really - and similar things apply to
the collections -- you have to start putting exception handling in
strange places, or round bigger blocks. I think this needs thinking
through properly too before any work starts. (I haven't thought these
things through - I'm just flagging up some worries before I rush out
for the evening).
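
A rough sketch of how the timing could play out, under the assumption that the related object is a lazy proxy that already knows its key. LazyRelated, DoesNotExist and the REPORTERS table are made up for illustration, and this is only one possible behaviour, not what the wiki page specifies:

```python
# Hypothetical sketch: every name here is illustrative, not Django's API.
class DoesNotExist(Exception):
    pass


REPORTERS = {1: {"fname": "Joe"}}      # stand-in for the reporter table


class LazyRelated(object):
    """Proxy for a related row that is fetched on first real attribute use."""

    def __init__(self, pk, table):
        self.id = pk                    # known from the foreign key column
        self._table = table
        self._row = None

    def __getattr__(self, name):
        # Reached only for attributes not set in __init__, so never for `id`.
        if name.startswith("_"):
            raise AttributeError(name)
        if self._row is None:
            try:
                self._row = self._table[self.id]   # the simulated query
            except KeyError:
                raise DoesNotExist("no row with pk=%r" % self.id)
        try:
            return self._row[name]
        except KeyError:
            raise AttributeError(name)


ghost = LazyRelated(99, REPORTERS)     # dangling key: nothing raised yet
```

With this design, ghost.id succeeds while ghost.fname raises DoesNotExist, so the failure surfaces at whatever line first reads a non-key attribute rather than where the relation was obtained.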

Adrian Holovaty

Jan 25, 2006, 10:57:12 PM
to django-d...@googlegroups.com
I've written up my latest proposal here:

http://code.djangoproject.com/wiki/DescriptorFields

It turns out related-object lookup can cleanly use the manager API.
Thoughts? If there are no big objections, let's start converting the
magic-removal unit tests to use this new syntax, and I'll start
implementation.

Joseph Kocherhans

Jan 25, 2006, 11:15:41 PM
to django-d...@googlegroups.com
On 1/25/06, Adrian Holovaty <holo...@gmail.com> wrote:
>
> I've written up my latest proposal here:
>
> http://code.djangoproject.com/wiki/DescriptorFields
>
> It turns out related-object lookup can cleanly use the manager API.
> Thoughts? If there are no big objections, let's start converting the
> magic-removal unit tests to use this new syntax, and I'll start
> implementation.

I'm much happier with this. Good work Adrian!

One question though... it doesn't say anything about accessing fields
via the class anymore. Is that still the plan? Article.sites,
Article.reporter, etc would return the field object, and
article_obj.site_set returns a manager.

Also, it would be cool if the _set stuff wasn't forced on us, there's
a disconnect between creating an attribute called 'sites' in your
model, and accessing that attribute via 'site_set'. Allowing users to
set the attribute name on both sides would require some trickery, but
it could be done. (I can think of a couple of ways.) That said, I can
live with the '_set' names.

Joseph

Adrian Holovaty

Jan 25, 2006, 11:25:50 PM
to django-d...@googlegroups.com
On 1/25/06, Joseph Kocherhans <jkoch...@gmail.com> wrote:
> One question though... it doesn't say anything about accessing fields
> via the class anymore. Is that still the plan? Article.sites,
> Article.reporter, etc would return the field object, and
> article_obj.site_set returns a manager.

Yeah, definitely -- that's still planned. I was a bit delete-happy in
editing that wiki page. :)

> Also, it would be cool if the _set stuff wasn't forced on us, there's
> a disconnect between creating an attribute called 'sites' in your
> model, and accessing that attribute via 'site_set'. Allowing users to
> set the attribute name on both sides would require some trickery, but
> it could be done. (I can think of a couple of ways.) That said, I can
> live with the '_set' names.

Good call about the "sites" thing. Really, that attribute name isn't
used at all (from what I can tell)...hmmm. What are your ideas?

Joseph Kocherhans

Jan 25, 2006, 11:36:06 PM
to django-d...@googlegroups.com
On 1/25/06, Adrian Holovaty <holo...@gmail.com> wrote:
>

Either pass in the attribute name of the related object (or as I'd
like to call it, "the ugly way"):

class Article(models.Model):
    sites = models.ManyToManyField(Site, related_attr='articles')

Or, steal an idea from SQLObject and allow strings rather than class names
for OneToOne, ManyToMany, and ForeignKey (I think internally,
SQLObject uses the string to lookup the object from inside the current
module, but it's been awhile since I looked at it):

class Article(models.Model):
    sites = models.ManyToManyField('Site')

That way people *can* define both sides with plain old attribute
assignment, but they don't *have* to. If someone *didn't* specify one
of the attributes, it should get created anyhow using the '_set' name.
This option would make the most sense if there was one more field type
that could define the 'other' side of a ForeignKey. Finding a decent
name for that field type might be challenging though ;)
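
A minimal sketch of the string-reference option, assuming a module-level registry that models add themselves to. MODEL_REGISTRY, register() and the related_model property are hypothetical names, and this is not how SQLObject or Django actually resolve names:

```python
# Hypothetical sketch: the registry and class names are illustrative only.
MODEL_REGISTRY = {}


class ManyToManyField(object):
    def __init__(self, to):
        self._to = to              # either a class or its name as a string

    @property
    def related_model(self):
        # Resolve string references lazily, once all models are registered.
        if isinstance(self._to, str):
            return MODEL_REGISTRY[self._to]
        return self._to


def register(cls):
    MODEL_REGISTRY[cls.__name__] = cls
    return cls


@register
class Article(object):
    sites = ManyToManyField("Site")    # Site is not defined yet


@register
class Site(object):
    articles = ManyToManyField(Article)
```

Deferring the lookup until related_model is read is what lets both sides of the relation be declared by plain attribute assignment, regardless of definition order.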

Joseph

Jacob Kaplan-Moss

Jan 25, 2006, 11:49:44 PM
to django-d...@googlegroups.com
On Jan 25, 2006, at 9:57 PM, Adrian Holovaty wrote:
> I've written up my latest proposal here:
>
> http://code.djangoproject.com/wiki/DescriptorFields

Yum!

I agree with Joseph -- I like this new syntax a *lot*.

I do have a few semantic questions, though. WARNING: Much nit-
picking follows; I should lead with the disclaimer that I'm *very*
happy with the syntax; I just want to make sure we get it as perfect
as possible!

First:

> Also, it would be cool if the _set stuff wasn't forced on us, there's
> a disconnect between creating an attribute called 'sites' in your
> model, and accessing that attribute via 'site_set'.

I *very* much agree with this.

Consider::

class Article(models.Model):
    headline = models.CharField(maxlength=50)
    reporter = models.ForeignKey(Reporter)
    sites = models.ManyToManyField(Site)

The fact that I can't do ``article.sites.all()`` seems wrong -- is
there a reason for the foo_set name over the given field name that
I'm missing? This also deals with the edge case where I have
something like::

class Foo(models.Model):
    bars = models.ManyToManyField(Bar)
    bar_set = models.CharField()

Second, I'm still unhappy that it's not clear when a database lookup
is performed. Luckily, I think the proposed syntax makes it easy to
write a few rules so that you'll always know when the db is being
hit. How does this sound:

1. ``Manager.get`` always performs a lookup.
2. Any other manager method only actually hits the database when
iterated.
3. ``object.foreign_key`` always performs a lookup.
4. ``object.m2m_set`` is essentially a manager, so it behaves as
for 1 and 2.

There is a question of how/whether to cache database lookups::

article = Article.objects.get(pk=1)
article.reporter.fname # obviously performs a lookup
article.reporter.lname # does this?

I'd expect that subsequent accesses to the same foreign key do not
perform additional lookups, but if that's so there needs to be a way
to clear the cached object in case you have an object you need to
hang on to for a while. I'd suggest ``article.clear_relation_cache()``.

There's also the question of caching many to many lookups; my gut is
that Django shouldn't::

list(article.site_set) # hits the db
list(article.site_set) # does it again

Finally, it seems like the manager has a lot of methods, and I think
some of them are redundant. For one, ``Reporter.objects.all().distinct()``
strikes me as less obvious than ``Reporter.objects.all(distinct=True)``.

In fact, I have trouble understanding why we need ``all()`` at all,
actually; what's the difference between these two?

::

Reporter.objects.all()
Reporter.objects.filter()

Why not rename "filter" to "list" (or "iterator" if we want to be
pedantic) and collapse the two methods into one?

Similarly, the chaining of filter/order_by/in_bulk seems crufty to
me; are these equivalent?

::

Reporter.objects.filter(fname="Joe").order_by("fname")
Reporter.objects.order_by("fname").filter(fname="Joe")

I'm a big fan of the "One True Way" principle, so that rubs me the
wrong way. Why not simply have order_by/distinct/etc. be kwargs to
the manager functions? If there's a good reason to do the "chained
methods" that's fine, but let's not be too clever for our own sakes, eh?

Adrian, thanks again for thinking this through -- I think this change
will make code a *lot* more readable, and of course a lot more fun.

Jacob

hugo

Jan 26, 2006, 3:23:09 AM
to Django developers
>I'm a big fan of the "One True Way" principle, so that rubs me the
>wrong way. Why not simply have order_by/distinct/etc. be kwargs to
>the manager functions? If there's a good reason to do the "chained
>methods" that's fine, but let's not be too clever for our own sakes, eh?

One nice thing about chained methods is, you can pass around the query
object and modify it after the fact. Think "curried queries" - you
build a partial query and pass it to a function, which itself can add
more query specifications or ordering or stuff like that, without
needing to know about what the actual query is. This opens up a really
nice way to build generic code: you only need to put those query
specifications into your code that _your_ code needs, any filtering for
example can be done outside.
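
As a small illustration of that "curried query" idea, here is a sketch in which a generic function adds its own ordering to whatever query it is handed. The Query class, the headlines() helper and the sample rows are all hypothetical stand-ins:

```python
# Hypothetical sketch: Query is an illustrative stand-in, not Django's class.
class Query(object):
    def __init__(self, rows, tests=(), sort_key=None):
        self.rows, self.tests, self.sort_key = rows, tuple(tests), sort_key

    def filter(self, **kwargs):
        tests = self.tests + tuple(kwargs.items())
        return Query(self.rows, tests, self.sort_key)

    def order_by(self, field):
        return Query(self.rows, self.tests, field)

    def __iter__(self):
        found = [r for r in self.rows if all(r[k] == v for k, v in self.tests)]
        if self.sort_key is not None:
            found.sort(key=lambda r: r[self.sort_key])
        return iter(found)


def headlines(query):
    # Generic code: adds its own ordering, knows nothing about the filters.
    return [row["headline"] for row in query.order_by("headline")]


articles = Query([
    {"headline": "B", "site": 1},
    {"headline": "A", "site": 1},
    {"headline": "C", "site": 2},
])
```

The caller decides the filtering, e.g. headlines(articles.filter(site=1)), and headlines() works unchanged on the full set or any subset.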

And I think all/filter is quite nice, too - that way you can see
directly whether a query starts with the full set or whether you work on
a subset. Sure, this can be done with one single method with
parameters, but I rather like the distinguished method names - it's
more readable to me.

bye, Georg

hugo

Jan 26, 2006, 3:30:18 AM
to Django developers
>Good call about the "sites" thing. Really, that attribute name isn't
>used at all (from what I can tell)...hmmm. What are your ideas?

Regardless of what you do in the end: please no "magic name invention",
especially if it is something like "attribute sites becomes site_set" -
that way we would be back in magic pluralization/singularization land
;-)

Is there any reason why the article_obj.sites attribute can't hold the
manager already? We never directly access that attribute anyway - as it
doesn't carry any value of meaning. So Article.sites would be the
ManyToMany field and article_obj.sites would be the manager - looks
nice to me. That way you can just do article_obj.sites.all() to get all
linked objects. If people want to have the _set in the name, they can
always just name the attribute "site_set" instead of "sites".
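
In Python descriptor terms this could look roughly like the sketch below, where one attribute yields the field on the class and a manager on an instance. RelatedManager, the dict used as a join table and the pk attribute are assumptions made up for the example:

```python
# Hypothetical sketch: these classes are illustrative, not Django's API.
class RelatedManager(object):
    def __init__(self, owner_instance, links):
        self.owner_instance, self._links = owner_instance, links

    def all(self):
        return list(self._links.get(self.owner_instance.pk, []))


class ManyToManyField(object):
    def __init__(self, links):
        self._links = links          # stand-in for the join table

    def __get__(self, instance, owner):
        if instance is None:
            return self              # Article.sites -> the field object
        return RelatedManager(instance, self._links)   # article_obj.sites


class Article(object):
    sites = ManyToManyField({1: ["example.com", "example.org"]})

    def __init__(self, pk):
        self.pk = pk
```

Here Article.sites is the field itself, while Article(1).sites.all() returns the linked objects, so no second attribute name has to be invented.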

bye, Georg

Robert Wittams

Jan 26, 2006, 4:18:53 AM
to django-d...@googlegroups.com

Any reason here why Manager couldn't inherit from Query (or a common
base type, eg QuerySet ), so it can be used directly as well as/instead
of via .all() ?

Also, you need to mention that Query instances can be combined via & and
| ( and support the other set stuff), but that the results are lazily
calculated.
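
A minimal sketch of lazy combination via & and |, assuming queries are represented as row predicates over an in-memory list. The QuerySet class below is hypothetical and only shows that nothing runs until iteration:

```python
# Hypothetical sketch: QuerySet here is an illustrative stand-in.
class QuerySet(object):
    def __init__(self, rows, predicate=lambda row: True):
        self.rows, self.predicate = rows, predicate

    def filter(self, **kwargs):
        def test(row, prev=self.predicate):
            return prev(row) and all(row[k] == v for k, v in kwargs.items())
        return QuerySet(self.rows, test)

    def __and__(self, other):
        return QuerySet(self.rows, lambda r: self.predicate(r) and other.predicate(r))

    def __or__(self, other):
        return QuerySet(self.rows, lambda r: self.predicate(r) or other.predicate(r))

    def __iter__(self):
        # Predicates run only here, when results are actually requested.
        return (r for r in self.rows if self.predicate(r))


reporters = QuerySet([
    {"fname": "Joe", "lname": "Smith"},
    {"fname": "Joe", "lname": "Jones"},
    {"fname": "Ann", "lname": "Smith"},
])
joes = reporters.filter(fname="Joe")
smiths = reporters.filter(lname="Smith")
```

Both joes & smiths and joes | smiths are just new QuerySet objects; the rows are examined only when one of them is iterated.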

Jacob Kaplan-Moss

Jan 26, 2006, 9:53:30 AM
to django-d...@googlegroups.com
On Jan 26, 2006, at 2:23 AM, hugo wrote:
> One nice thing about chained methods is, you can pass around the query
> object and modify it after the fact. Think "curried queries" - you
> build a partial query and pass it to a function, which itself can add
> more query specifications or ordering or stuff like that, without
> needing to know about what the actual query is. This opens up a really
> nice way to build generic code: you only need to put those query
> specifications into your code that _your_ code needs, any filtering for
> example can be done outside.

Ah, interesting -- I hadn't thought about that situation. So
essentially a Query is a curried lookup until you iterate it, yes?

What happens to a query after it's been iterated? For example, how
does the following behave?

::

people = Reporter.objects.filter(fname="Joe")
for p in people:
    print p

people2 = people.order_by('fname')
for p in people:
    print p

> And I think all/filter is quite nice, too - that way you can see
> directly whether a query starts with the full set or whether you work on
> a subset. Sure, this can be done with one single method with
> parameters, but I rather like the distinguished method names - it's
> more readable to me.

Fair enough -- my tastes run the other way towards a single method,
but it's really not that big a deal.

Jacob

Adrian Holovaty

Jan 26, 2006, 11:28:57 AM
to django-d...@googlegroups.com
On 1/25/06, Jacob Kaplan-Moss <ja...@jacobian.org> wrote:
> > Also, it would be cool if the _set stuff wasn't forced on us, there's
> > a disconnect between creating an attribute called 'sites' in your
> > model, and accessing that attribute via 'site_set'.
>
> I *very* much agree with this.
>
> Consider::
>
> class Article(models.Model):
>     headline = models.CharField(maxlength=50)
>     reporter = models.ForeignKey(Reporter)
>     sites = models.ManyToManyField(Site)
>
> The fact that I can't do ``article.sites.all()`` seems wrong -- is
> there a reason for the foo_set name over the given field name that
> I'm missing? This also deals with the edge case where I have
> something like::

The reason is consistency. If we enforce the "_set" thing (or whatever
other name we come up with), that means there's a simple rule to
remember:

* To access related objects, whether they're one-to-many or
many-to-many, just use the attribute called "modelname_set" on your
object, where modelname is the lowercase name of the related model.

If, instead, we allow the article.sites thing (instead of
article.site_set), that introduces a special-case for many-to-many
relationships.

Make sense? But, yeah, I realize that the "sites" attribute name in
that model is essentially meaningless. Any other ideas?

> Second, I'm still unhappy that it's not clear when a database lookup
> is performed. Luckily, I think the proposed syntax makes it easy to
> write a few rules so that you'll always know when the db is being
> hit. How does this sound:
>
> 1. ``Manager.get`` always performs a lookup.
> 2. Any other manager method only actually hits the database when
> iterated.
> 3. ``object.foreign_key`` always performs a lookup.
> 4. ``object.m2m_set`` is essentially a manager, so it behaves as
> for 1 and 2.

I was thinking this would behave exactly as before --

1. Manager.get() always performs a lookup.
2. Any other default manager method only actually hits the
database when iterated (or repr()'d?).
3. object.foreign_key performs a lookup only the first time --
unless select_related=True was used on the lookup of the parent
object, in which case the value would already be cached.
4. object.m2m_set is essentially a manager, so it behaves as for 1 and 2.

> There is a question of how/whether to cache database lookups::
>
> article = Article.objects.get(pk=1)
> article.reporter.fname # obviously performs a lookup
> article.reporter.lname # does this?

No, that second one wouldn't perform a lookup. This would work exactly
as before, just like article.get_reporter().fname and
article.get_reporter().lname -- it's cached the first time it's
accessed.

> I'd expect that subsequent accesses to the same foreign key do not
> perform additional lookups, but if that's so there needs to be a way
> to clear the cached object in case you have an object you need to
> hang on to for a while. I'd suggest ``article.clear_relation_cache()``.

We haven't had this up to this point, but -- sure. :)
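
For what it's worth, a rough sketch of first-access caching plus a reset hook. CachedForeignKey, the LOOKUPS log and the REPORTERS table are made up for the example, and clear_relation_cache() is just the name suggested above, not an existing method:

```python
# Hypothetical sketch: names are illustrative; clear_relation_cache() is
# only the method name suggested in this thread.
LOOKUPS = []                                   # records each simulated query
REPORTERS = {7: {"fname": "Joe", "lname": "Bloggs"}}


class CachedForeignKey(object):
    def __init__(self, name, table):
        self.id_attr = name + "_id"
        self.cache_attr = "_" + name + "_cache"
        self.table = table

    def __get__(self, instance, owner):
        if instance is None:
            return self
        if self.cache_attr not in instance.__dict__:
            LOOKUPS.append(self.id_attr)       # simulated database hit
            pk = getattr(instance, self.id_attr)
            instance.__dict__[self.cache_attr] = self.table[pk]
        return instance.__dict__[self.cache_attr]


class Article(object):
    reporter = CachedForeignKey("reporter", REPORTERS)

    def __init__(self, reporter_id):
        self.reporter_id = reporter_id

    def clear_relation_cache(self):
        # Drop every cached related object so the next access re-queries.
        for key in [k for k in self.__dict__ if k.endswith("_cache")]:
            del self.__dict__[key]


article = Article(reporter_id=7)
article.reporter["fname"]      # first access: one lookup
article.reporter["lname"]      # served from the cache
```

After article.clear_relation_cache(), the next read of article.reporter performs a fresh lookup.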

> There's also the question of caching many to many lookups; my gut is
> that Django shouldn't::
>
> list(article.site_set) # hits the db
> list(article.site_set) # does it again

I believe Django currently *does* cache many-to-many lookups, so I'd
been thinking it would continue to do so -- but I don't really mind
either way.

> Finally, it seems like the manager has a lot of methods, and I think
> some of them are redundant. For one, ``Reporter.objects.all().distinct()``
> strikes me as less obvious than ``Reporter.objects.all(distinct=True)``

Two reasons for making distinct a method instead of a keyword argument --

1. As Hugo pointed out, this makes it possible to chain queries.
2. It makes it possible to leave the "__exact" off, because we can be
positive any keyword argument to filter() is a field lookup, not a
meta argument such as "distinct" or "limit". For example, if you have
a field name called "distinct", you'd be able to do
Reporter.objects.filter(distinct='blah'), which would be the
equivalent of Reporter.objects.filter(distinct__exact='blah').
Granted, this is more of a side benefit than a reason *to* do it, but
I like it because it lets us trim the "__exact", which is the most
common lookup type.
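
A tiny sketch of the second point, assuming every keyword passed to filter() is a field lookup. parse_lookups() is a hypothetical helper, and it is simplified in that it only splits on the first "__", so it ignores lookups that span relations:

```python
# Hypothetical sketch: parse_lookups() is an illustrative helper.
def parse_lookups(**kwargs):
    """Split each keyword into (field, lookup_type, value).

    A keyword with no "__" suffix defaults to the 'exact' lookup type.
    """
    parsed = []
    for key, value in sorted(kwargs.items()):
        field, sep, lookup = key.partition("__")
        parsed.append((field, lookup if sep else "exact", value))
    return parsed
```

Since no keyword is reserved for meta options, parse_lookups(distinct='blah') can safely be read as a lookup on a field named "distinct", identical to parse_lookups(distinct__exact='blah').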

> In fact, I have trouble understanding why we need ``all()`` at all,
> actually; what's the difference between these two?
>
> Reporter.objects.all()
> Reporter.objects.filter()
>
> Why not rename "filter" to "list" (or "iterator" if we want to be
> pedantic) and collapse the two methods into one?

It's for readability, like Hugo said, but I'm not 100% sold on it
either way. Come to think of it, get() accepts filter arguments, so,
if we have a list(), it'd be nice and consistent if list() also
accepted filter arguments.

> Similarly, the chaining of filter/order_by/in_bulk seems crufty to
> me; are these equivalent?
>
> Reporter.objects.filter(fname="Joe").order_by("fname")
> Reporter.objects.order_by("fname").filter(fname="Joe")
>
> I'm a big fan of the "One True Way" principle, so that rubs me the
> wrong way. Why not simply have order_by/distinct/etc. be kwargs to
> the manager functions? If there's a good reason to do the "chained
> methods" that's fine, but let's not be too clever for our own sakes, eh?

Again, it's all about chaining the methods to be able to pass around
the query object, and limiting the filter() arguments to remove the
"meta" arguments.

Tim Keating

Jan 26, 2006, 1:29:53 PM
to Django developers
hugo wrote:
> One nice thing about chained methods is, you can pass around the query
> object and modify it after the fact. Think "curried queries" - you
> build a partial query and pass it to a function, which itself can add
> more query specifications or ordering or stuff like that, without
> needing to know about what the actual query is. This opens up a really
> nice way to build generic code: you only need to put those query
> specifications into your code that _your_ code needs, any filtering for
> example can be done outside.
>
> And I think all/filter is quite nice, too - that way you can see
> directly whether a query starts with the full set or whether you work on
> a subset. Sure, this can be done with one single method with
> parameters, but I rather like the distinguished method names - it's
> more readable to me.

+1 to this argument from me. I, too, like "all/filter" -- explicit is
better than implicit.

TK

Jacob Kaplan-Moss

Jan 26, 2006, 2:17:20 PM
to django-d...@googlegroups.com
On Jan 26, 2006, at 10:28 AM, Adrian Holovaty wrote:
> The reason is consistency. If we enforce the "_set" thing (or whatever
> other name we come up with), that means there's a simple rule to
> remember:
[snip]

> Make sense? But, yeah, I realize that the "sites" attribute name in
> that model is essentially meaningless. Any other ideas?

It seems to me that the translation from ``sites`` in the model to
``site_set`` in the instance is "worse" than having an inconsistency
between m2m and o2m relations. In fact, I don't really have a
problem with the two relation types behaving differently.

I'd suggest that m2m relations use the attribute name, and that we
introduce an option for ForeignKey that lets you override the o2m name::

class Article:
    writer = meta.ForeignKey(Reporter, related_name="articles")
    sites = meta.ManyToManyField(Site)

s = Site.objects.get(pk=1)
r = Reporter.objects.get(pk=1)

s.article_set.all()
r.articles.all()

That is, if ``related_name`` isn't given then ``OBJECT_set`` is used.
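
The naming rule itself is small enough to sketch; reverse_accessor_name() is a hypothetical helper written only to spell the rule out, not an existing function:

```python
# Hypothetical sketch: reverse_accessor_name() is an illustrative helper.
def reverse_accessor_name(model_name, related_name=None):
    """Name of the attribute added to the *related* class."""
    if related_name is not None:
        return related_name              # explicit override wins
    return model_name.lower() + "_set"   # fallback convention
```

So reverse_accessor_name("Article") gives "article_set", and passing related_name="articles" gives "articles".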

<impression voice="DHH">Convention over Configuration!</impression>

> I was thinking this would behave exactly as before --
>
> 1. Manager.get() always performs a lookup.
> 2. Any other default manager method only actually hits the
> database when iterated (or repr()'d?).
> 3. object.foreign_key performs a lookup only the first time --
> unless select_related=True was used on the lookup of the parent
> object, in which case the value would already be cached.
> 4. object.m2m_set is essentially a manager, so it behaves as for
> 1 and 2.

Perfect.

>> I'd expect that subsequent accesses to the same foreign key do not
>> perform additional lookups, but if that's so there needs to be a way
>> to clear the cached object in case you have an object you need to
>> hang on to for a while. I'd suggest
>> ``article.clear_relation_cache()``.
>
> We haven't had this up to this point, but -- sure. :)

That's a good point; is it worth adding this method at all? Is anyone
actually going to need it?

> I believe Django currently *does* cache many-to-many lookups, so I'd
> been thinking it would continue to do so -- but I don't really mind
> either way.

I'm also OK either way, but let's make it clear (somewhere) which it is.

> Two reasons for making distinct a method instead of a keyword
> argument --
>
> 1. As Hugo pointed out, this makes it possible to chain queries.
> 2. It makes it possible to leave the "__exact" off, because we can be
> positive any keyword argument to filter() is a field lookup, not a
> meta argument such as "distinct" or "limit". For example, if you have
> a field name called "distinct", you'd be able to do
> Reporter.objects.filter(distinct='blah'), which would be the
> equivalent of Reporter.objects.filter(distinct__exact='blah').
> Granted, this is more of a side benefit than a reason *to* do it, but
> I like it because it lets us trim the "__exact", which is the most
> common lookup type.

Yeah, I'm convinced -- I had trouble reproducing your reasoning, but
now that you explain it it makes perfect sense.

> It's for readability, like Hugo said, but I'm not 100% sold on it
> either way. Come to think of it, get() accepts filter arguments, so,
> if we have a list(), it'd be nice and consistent if list() also
> accepted filter arguments.

That's kinda what I would expect. The only problem is that list()
doesn't actually return a list -- it returns a Query that can be
iterated over... Hm...

OK, what about this: make ``objects`` callable::

Before                              After

Article.objects.get_list()          Article.objects()
Article.objects.get_list(**kw)      Article.objects(**kw)
Article.objects.get_object(**kw)    Article.objects.get(**kw)
Article.objects.get_values(**kw)    Article.objects.values(**kw)

That's more concise in the common case -- I'd guess that get_list()
is far and away the most common method used -- and it does away with
the filter/all distinction.

Thoughts?

Jacob

Adrian Holovaty

Jan 26, 2006, 2:38:26 PM
to django-d...@googlegroups.com
On 1/26/06, Jacob Kaplan-Moss <ja...@jacobian.org> wrote:
> It seems to me that the translation from ``sites`` in the model to
> ``site_set`` in the instance is "worse" than having an inconsistency
> between m2m and o2m relations. In fact, I don't really have a
> problem with the two relation types behaving differently.
>
> I'd suggest that m2m relations use the attribute name, and that we
> introduce an option for ForeignKey that lets you override the o2m name::
>
>     class Article:
>         writer = meta.ForeignKey(Reporter, related_name="articles")
>         sites = meta.ManyToManyField(Site)
>
>     s = Site.objects.get(pk=1)
>     r = Reporter.objects.get(pk=1)
>
>     s.article_set.all()
>     r.articles.all()
>
> That is, if ``related_name`` isn't given then ``OBJECT_set`` is used.

OK, this sounds good to me.
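The naming rule agreed on above can be sketched as a small helper. This is only an illustration; `reverse_accessor_name` is a hypothetical name and not part of any Django API.

```python
def reverse_accessor_name(model_name, related_name=None):
    """Return the attribute name used on the related object.

    An explicit related_name wins; otherwise fall back to OBJECT_set,
    where OBJECT is the lower-cased name of the model holding the field.
    """
    if related_name is not None:
        return related_name
    return model_name.lower() + "_set"


default_name = reverse_accessor_name("Article")
custom_name = reverse_accessor_name("Article", related_name="articles")
```

With the `Article` model quoted above, `sites` (no `related_name`) would be reachable from a `Site` as `article_set`, while `writer` would be reachable from a `Reporter` as `articles`.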

> OK, what about this: make ``objects`` callable::
>
>     Before                              After
>
>     Article.objects.get_list()          Article.objects()
>     Article.objects.get_list(**kw)      Article.objects(**kw)
>     Article.objects.get_object(**kw)    Article.objects.get(**kw)
>     Article.objects.get_values(**kw)    Article.objects.values(**kw)
>
> That's more concise in the common case -- I'd guess that get_list()
> is far and away the most common method used -- and it does away with
> the filter/all distinction.

The problem with this is that Article.objects is a Manager instance,
so it would be slightly ugly and special-casish to have to specify
behavior for __call__(). If you wanted to override the functionality
for a custom manager, you'd have to override __call__(). That's not
"magic" per se, but it's still a bit of a special case. Gotta say I
really like the all() and filter() explicitness.

Robert Wittams

Jan 26, 2006, 2:51:19 PM
to django-d...@googlegroups.com
Jacob Kaplan-Moss wrote:
>     class Article:
>         writer = meta.ForeignKey(Reporter, related_name="articles")
>         sites = meta.ManyToManyField(Site)
>
>     s = Site.objects.get(pk=1)
>     r = Reporter.objects.get(pk=1)
>
>     s.article_set.all()
>     r.articles.all()
>
> That is, if ``related_name`` isn't given then ``OBJECT_set`` is used.

Given that this is identical to my original plan for descriptor fields (
don't randomly rename things, only make up names when no name is
provided by the user), I'm all for this. Making <whatever>_set into
religion was not the aim...

>
> That's kinda what I would expect. The only problem is that list()
> doesn't actually return a list -- it returns a Query that can be
> iterated over... Hm...
>
> OK, what about this: make ``objects`` callable::
>
>     Before                              After
>
>     Article.objects.get_list()          Article.objects()
>     Article.objects.get_list(**kw)      Article.objects(**kw)
>     Article.objects.get_object(**kw)    Article.objects.get(**kw)
>     Article.objects.get_values(**kw)    Article.objects.values(**kw)
>
> That's more concise in the common case -- I'd guess that get_list() is
> far and away the most common method used -- and it does away with the
> filter/all distinction.
>
> Thoughts?

I don't really like it. I much prefer the idea of the manager being
something you can iterate over directly, and filter being a method to
use when you want to filter it, that returns an object of the same type
but just filtered down. I.e., Article.objects is just a set that happens to
be backed by the database, and just happens to be filterable. Seems
pretty intuitive to me. No .all() or __call__ required.

I don't think the same can be said for making random objects callable...
I have no idea what this means, it's just entirely arbitrary. Things
should generally only pretend to be functions when that is their main
purpose.
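A rough sketch of the "set that happens to be filterable" idea, assuming an in-memory list of dicts in place of the database. `ObjectSet` is a hypothetical name; nothing here is the actual Manager code.

```python
class ObjectSet:
    """Iterable collection whose filter() returns another ObjectSet."""

    def __init__(self, rows, predicates=()):
        self._rows = rows                    # stand-in for a database table
        self._predicates = tuple(predicates)

    def filter(self, **criteria):
        def matches(row, criteria=criteria):
            return all(row.get(key) == value for key, value in criteria.items())

        # Same type back, just narrowed; the original set is unchanged.
        return ObjectSet(self._rows, self._predicates + (matches,))

    def __iter__(self):
        for row in self._rows:
            if all(predicate(row) for predicate in self._predicates):
                yield row


articles = ObjectSet([
    {"headline": "Foo", "creator": 1},
    {"headline": "Bar", "creator": 2},
])
by_creator_1 = articles.filter(creator=1)
```

Iterating `articles` directly yields everything, with no `.all()` or `__call__` involved, and `by_creator_1` is a new object of the same type.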

luke....@gmail.com

Jan 26, 2006, 4:52:16 PM
to Django developers
I'm not a fan of .all() either - if you can do .filter() on
Article.objects, then surely Article.objects is already a collection of
some kind. That's what it sounds like: Article.objects == all the
'Article' objects.

Also, if Query instances act as sets, they should support len(), and
you don't need .count() :

reporter_obj.get_article_count() -> len(reporter_obj.article_set)
Article.objects.get_count() -> len(Article.objects)

This, too, makes much more sense without the .all()

Luke

luke....@gmail.com

Jan 26, 2006, 5:17:28 PM
to Django developers
This is great, but one remaining issue:

If the related objects lookups always use the corresponding manager,
then this:

> article_obj.reporter

would be equivalent to:

> Reporter.objects.get(pk=article_obj.reporter_id)

That can throw a Reporter.DoesNotExist exception, which might not just
be a corner case, i.e. it might be perfectly allowable in your model to
have articles with a null reporter.

In the proposal (the original one at least), article_obj.reporter.id
wouldn't do a DB lookup, so .reporter must be lazy, which kind of makes
things worse - you can use reporter.id, but reporter.name will blow up
on you.

There are two use cases AFAICS:
- in view code, you want to know immediately if the object doesn't
exist, and not have to do some silly tricks to get the lazy object to
initialise from the db and throw any appropriate exceptions
- in template code, you might want:
    {% if article_obj.reporter %}
    Written by {{ article_obj.reporter.name }}
    {% endif %}

Can we do both these? Perhaps a .lazyget() method on objects will help
- foreign key fields always translate into that. The __get__()
descriptor on the article_obj.reporter always does a .lazyget(), which
returns a lazy object which only has the id set. This way we can still
do article_obj.reporter.id without DB access. The lazy object needs a
__nonzero__ method that will first initialise it, and if it fails to
initialise it then return false, otherwise true.

.get(), on the other hand, doesn't get a lazy object, but immediately
throws exceptions if it's not there.
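The .lazyget() idea above can be sketched roughly as follows. Everything here is an assumption made for illustration: the `REPORTERS` dict stands in for the table, and `get`, `lazyget` and `LazyObject` are hypothetical names. The sketch is written for Python 3, where the truth hook is `__bool__`; `__nonzero__` is the Python 2 spelling.

```python
class DoesNotExist(Exception):
    pass


REPORTERS = {1: {"name": "Joe"}}  # stand-in for the reporters table


def get(pk):
    """Eager lookup: raises immediately if the row is missing."""
    try:
        return REPORTERS[pk]
    except KeyError:
        raise DoesNotExist(pk)


class LazyObject:
    """Knows its id up front; loads the rest on first real use."""

    def __init__(self, pk):
        self.id = pk
        self._data = None

    def _load(self):
        if self._data is None:
            self._data = get(self.id)
        return self._data

    def __bool__(self):
        # Truth testing forces the load; a missing row means False.
        try:
            self._load()
        except DoesNotExist:
            return False
        return True

    __nonzero__ = __bool__  # Python 2 name for the same hook

    def __getattr__(self, name):
        # Only reached for attributes not set in __init__.
        data = self._load()
        try:
            return data[name]
        except KeyError:
            raise AttributeError(name)


def lazyget(pk):
    return LazyObject(pk)


present = lazyget(1)
missing = lazyget(99)
```

`missing.id` works without a lookup, `bool(missing)` is False, and `missing.name` raises `DoesNotExist`, which matches the template and view use cases described above.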

Think that covers it - are there any holes?

Luke

Joseph Kocherhans

Jan 26, 2006, 6:45:22 PM
to django-d...@googlegroups.com
On 1/26/06, luke....@gmail.com <luke....@gmail.com> wrote:
>
> I'm not a fan of .all() either - if you can do .filter() on
> Article.objects, then surely Article.objects is already a collection of
> some kind. That's what it sounds like: Article.objects == all the
> 'Article' objects.

And there's the original problem that started this mess ;-) Let me see
if I can lay it all out. (Sorry to pick on your response Luke, not
trying to single you out.)

Case 1:
Let's assume that MyModel.objects or my_object.related_set *IS* a
Query object (in other words, it has state):

MyModel.objects # returns Q1
MyModel.objects.filter(*args) # Q1 with filters applied

q = MyModel.objects # q is Q1
q.filter(*args) # q is still Q1, but with filters now
q.order_by('test') # q is still Q1 + filters + ordering

q2 = MyModel.objects # oh shit, this is Q1 + filters + ordering,
                     # but I expected a new Q

If you want to get 2 different iterators with different filter
criteria, you need some sort of .clone() method on Query objects.

q2 = MyModel.objects.clone() # ahhh... a new Q :)
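Case 1 can be reproduced with a small stateful toy, shown here only to make the aliasing problem concrete. `StatefulQuery` is a hypothetical class, and its `clone()` simply deep-copies the current state.

```python
import copy


class StatefulQuery:
    """Query that mutates itself in place (the Case 1 assumption)."""

    def __init__(self):
        self.filters = {}
        self.ordering = []

    def filter(self, **criteria):
        self.filters.update(criteria)   # changes this object

    def order_by(self, *fields):
        self.ordering = list(fields)    # changes this object

    def clone(self):
        return copy.deepcopy(self)


class MyModel:
    objects = StatefulQuery()           # one shared instance


q = MyModel.objects
q.filter(creator=1)
q.order_by("test")

q2 = MyModel.objects                    # same object as q, filters included

fresh = MyModel.objects.clone()         # independent copy
fresh.filter(creator=2)
```

Because `q2` is the very same object as `q`, it already carries the filter and ordering; only the clone can be changed without affecting `MyModel.objects`.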

Case 2:
Let's assume that MyModel.objects and my_object.related_set *RETURN* a
*NEW* Query object. (In other words, it is stateless.) You might expect
this:

MyModel.objects # Q1
MyModel.objects.filter(*args) # Q1 with filters applied
MyModel.objects.order_by(*args) # Q1 with ordering applied

But in fact, you'll get this:

MyModel.objects # Q1
MyModel.objects.filter(*args) # Q2
MyModel.objects.order_by(*args) # Q3

But this works fine:

q = MyModel.objects # q is Q1
q.filter(*args) # q is Q1 with filters applied
q.order_by(*args) # q is Q1 with ordering applied

My point was that you have to explain the difference between the last
two cases, or if MyModel.objects is stateful (as in Case 1) you need
that clone method I talked about. (Also, non-intuitive)

The syntax Robert proposed is very appealing to me, but AFAICS, the
facts that follow from that syntax are confusing as hell. If I'm
missing something, please let me know.


> Also, if Query instances act as sets, they should support len(), and
> you don't need .count() :

Ian Bicking has kept len() out of SQLObject result sets even though it
seems really intuitive to use. Here's a rundown of what I remember
about his argument: __len__ would run "count(*)" against the db. I
think iter() calls len() implicitly for performance reasons, so you'd
be running a useless count(*) every time you started iterating over a
Query object. On the other hand, maybe if the iterator has already
cached the objects, __len__ could just call len() on the cache. It
might be possible to work something out.
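One way the cache-aware `__len__` suggested above might look, as a sketch only. `ResultSet` is a hypothetical class, the list passed in stands for the table, and `count()` stands for a `count(*)` query.

```python
class ResultSet:
    def __init__(self, rows):
        self._rows = rows          # stand-in for the database table
        self._cache = None         # filled on first iteration
        self.count_queries = 0     # how many count(*) calls were "sent"

    def count(self):
        self.count_queries += 1
        return len(self._rows)

    def __iter__(self):
        if self._cache is None:
            self._cache = list(self._rows)
        return iter(self._cache)

    def __len__(self):
        # Prefer the cache; only fall back to count(*) when nothing
        # has been fetched yet.
        if self._cache is not None:
            return len(self._cache)
        return self.count()


rs = ResultSet(["a", "b"])
size_before = len(rs)              # no cache yet: one count(*) query
for _ in rs:                       # a plain for loop fills the cache
    pass
size_after = len(rs)               # answered from the cache
```

After the loop, `len(rs)` no longer needs a `count(*)` query, so `count_queries` stays at 1.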

Joseph

Robert Wittams

Jan 26, 2006, 7:14:15 PM
to django-d...@googlegroups.com
Joseph Kocherhans wrote:
> On 1/26/06, luke....@gmail.com <luke....@gmail.com> wrote:
>
>>I'm not a fan of .all() either - if you can do .filter() on
>>Article.objects, then surely Article.objects is already a collection of
>>some kind. That's what it sounds like: Article.objects == all the
>>'Article' objects.
>
>
> And there's the original problem that started this mess ;-) Let me see
> if I can lay it all out. (Sorry to pick on your response Luke, not
> trying to single you out.)
>
> Case 1:
> Let's assume that MyModel.objects or my_object.related_set *IS* a
> Query object (in other words, it has state):

I have absolutely no idea how you arrived at this weirdo stateful model
- it clearly makes absolutely no sense wrt concurrency.

> Case 2:
> Let's assume that MyModel.objects and my_object.related_set *RETURN* a
> *NEW* Query object. (In other words, it is stateless.) You might expect
> this:
>
> MyModel.objects # Q1
> MyModel.objects.filter(*args) # Q1 with filters applied
> MyModel.objects.order_by(*args) # Q1 with ordering applied
>
> But in fact, you'll get this:
>
> MyModel.objects # Q1
> MyModel.objects.filter(*args) # Q2
> MyModel.objects.order_by(*args) # Q3
>
> But this works fine:
>
> q = MyModel.objects # q is Q1
> q.filter(*args) # q is Q1 with filters applied
> q.order_by(*args) # q is Q1 with ordering applied

I really have no idea what you are talking about here. Sorry.

The only interpretation I can make is that you are worried about making
multiple database calls when you . The only alternative to this is to
reimplement a relational database in python. Do you really want to do
the filtering client side? Or what? I am having real trouble grasping
your issue here.

Luke Plant

Jan 26, 2006, 7:39:52 PM
to django-d...@googlegroups.com
On Thursday 26 January 2006 23:45, Joseph Kocherhans wrote:

> Let's assume that MyModel.objects or my_object.related_set *IS* a
> Query object (in other words, it has state):

Its state is simply an empty set of where clauses and an empty order by
list and these never actually change.

> q = MyModel.objects # q is Q1
> q.filter(*args) # q is still Q1, but with filters now

No, q.filter(*args) is a *new* set, it doesn't change q

> q.order_by('test') # q is still Q1 + filters + ordering

Again, a new set, which doesn't include the filter at all. The clone
would be done by the filter() and order_by() methods.

Unless I'm missing something about why that isn't possible?
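The behaviour described in this reply, where `filter()` and `order_by()` do the cloning themselves, might look like the sketch below. `Query` is a toy stand-in; the attribute names are assumptions.

```python
class Query:
    """Immutable-style query: every refinement returns a new Query."""

    def __init__(self, filters=None, ordering=()):
        self.filters = dict(filters or {})
        self.ordering = tuple(ordering)

    def filter(self, **criteria):
        merged = dict(self.filters)
        merged.update(criteria)
        return Query(merged, self.ordering)    # self is left untouched

    def order_by(self, *fields):
        return Query(self.filters, fields)     # self is left untouched


q = Query()
filtered = q.filter(creator=1)                 # new set; q is still empty
ordered = q.order_by("test")                   # new set; has no filter
chained = q.filter(creator=1).order_by("test")
```

Only `chained` has both the filter and the ordering; calling `q.order_by("test")` on the unfiltered `q` gives a set with no filter at all.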

> Ian Bicking has kept len() out of SQLObject result sets even though
> it seems really intuitive to use. Here's a rundown of what I remember
> about his argument: __len__ would run "count(*)" against the db. I
> think iter() calls len() implicitly for performance reasons, so you'd
> be running a useless count(*) every time you started iterating over a
> Query object. On the other hand, maybe if the iterator has already
> cached the objects, __len__ could just call len() on the cache. It
> might be possible to work something out.

I'm happy with that, but I've done some quick tests and it doesn't seem
that __len__ is called when you get or use the iterator.
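A quick test along those lines could look like this. `CountingSet` is a hypothetical class that just counts how often `__len__` is invoked.

```python
class CountingSet:
    def __init__(self, items):
        self._items = list(items)
        self.len_calls = 0

    def __iter__(self):
        return iter(self._items)

    def __len__(self):
        self.len_calls += 1
        return len(self._items)


s = CountingSet([1, 2, 3])

seen = []
for item in s:                 # obtains and consumes an iterator
    seen.append(item)
calls_after_loop = s.len_calls

as_list = list(s)              # list() may consult __len__ as a size hint
```

A plain `for` loop leaves `len_calls` at 0. The `list()` constructor is a different matter: in CPython it may ask the object for its length as a preallocation hint, so a `__len__` that issues a query could still be triggered that way.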

The only issue I've come across is that since managers are used to do
related object lookups, and related objects are cached, so are all
queries done by the manager - so iterating over Article.objects will
retrieve all the articles and keep them in a cache. But I imagine some
weakrefs will be adequate to sort that out.

Luke

--
I heard a man say that brigands demand your money or your life, whereas
women require both. (Samuel Butler)

Luke Plant || L.Plant.98 (at) cantab.net || http://lukeplant.me.uk/

--
"I imagine bugs and girls have a dim suspicion that nature played a
cruel trick on them, but they lack the intelligence to really
comprehend the magnitude of it." (Calvin and Hobbes)

Joseph Kocherhans

Jan 26, 2006, 8:01:28 PM
to django-d...@googlegroups.com
On 1/26/06, Robert Wittams <rob...@wittams.com> wrote:
>
> Joseph Kocherhans wrote:
> > On 1/26/06, luke....@gmail.com <luke....@gmail.com> wrote:
> >
> >>I'm not a fan of .all() either - if you can do .filter() on
> >>Article.objects, then surely Article.objects is already a collection of
> >>some kind. That's what it sounds like: Article.objects == all the
> >>'Article' objects.
> >
> >
> > And there's the original problem that started this mess ;-) Let me see
> > if I can lay it all out. (Sorry to pick on your response Luke, not
> > trying to single you out.)
> >
> > Case 1:
> > Let's assume that MyModel.objects or my_object.related_set *IS* a
> > Query object (in other words, it has state):
>
> I have absolutely no idea how you arrived at this weirdo stateful model
> - it clearly makes absolutely no sense wrt concurrency.

Wow. I'm not even sure how I got to that point now.


> > Case 2:
> > Let's assume that MyModel.objects and my_object.related_set *RETURN* a
> > *NEW* Query object. (In other words, it is stateless.) You might expect
> > this:
> >
> > MyModel.objects # Q1
> > MyModel.objects.filter(*args) # Q1 with filters applied
> > MyModel.objects.order_by(*args) # Q1 with ordering applied
> >
> > But in fact, you'll get this:
> >
> > MyModel.objects # Q1
> > MyModel.objects.filter(*args) # Q2
> > MyModel.objects.order_by(*args) # Q3
> >
> > But this works fine:
> >
> > q = MyModel.objects # q is Q1
> > q.filter(*args) # q is Q1 with filters applied
> > q.order_by(*args) # q is Q1 with ordering applied
>
> I really have no idea what you are talking about here. Sorry.
>
> The only interpretation I can make is that you are worried about making
> multiple database calls when you . The only alternative to this is to
> reimplement a relational database in python. Do you really want to do
> the filtering client side? Or what? I am having real trouble grasping
> your issue here.

Never mind. I was making an (erroneous and old) assumption that you
could modify the state of a query instance like:

q.filter()
q.order_by()

When in fact Adrian's proposal specifically says that filter() and
order_by() would return new query instances, not modify q's state (a
dict of filter criteria or whatever). I'm sorry :( To do the above, you
would have to chain the methods like:

q.filter().order_by()

or reassign q

q = q.filter()
q = q.order_by()

I'm happy now. +1 on MyModel.objects -> new Query instance rather than
MyModel.objects.all()
Sorry for the confusion. Hopefully this clears things up for someone
else as well.

Joseph

hugo

Jan 27, 2006, 3:27:27 AM
to Django developers
>Ah, interesting -- I hadn't thought about that situation. So
>essentially a Query is a curried lookup until you iterate it, yes?

Yep.

>What happens to a query after it's been iterated? For example, how
>does the following behave?

I'd say it should memoize its result - so it is only queried once per
request to reduce database hits. But there should maybe be a .reset()
method on a manager to reset the memoized data, so you can rerun your
query if you need to (won't happen that often in web code, but might be
needed in batch code).

bye, Georg

Russell Keith-Magee

Jan 27, 2006, 7:11:42 AM
to django-d...@googlegroups.com
On 1/27/06, Jacob Kaplan-Moss <ja...@jacobian.org> wrote:
> I'd suggest that m2m relations use the attribute name, and that we
> introduce an option for ForeignKey that lets you override the o2m name::
>
>     class Article:
>         writer = meta.ForeignKey(Reporter, related_name="articles")
>         sites = meta.ManyToManyField(Site)
>
>     s = Site.objects.get(pk=1)
>     r = Reporter.objects.get(pk=1)
>
>     s.article_set.all()
>     r.articles.all()
>
> That is, if ``related_name`` isn't given then ``OBJECT_set`` is used.

Two quick notes:

1) ManyToManyField will need a related_name argument, not just o2m
fields, so as to handle reverse direction queries (e.g.,
sites.article_set). This may have been implied, but so far all the
discussion and all the examples have been about ForeignKey renaming.

2) I have a minor problem with the _set suffix: to me, _set implies
uniqueness in the returned results, which will not exist unless
.distinct()/distinct=True is used. Either distinct needs to be turned
on by default and disabled by parameter/filter (which I have argued
for in a previous thread), or a suffix that does not imply uniqueness
is required. Possible candidates: _list, _objects

Russ Magee %-)

Robert Wittams

Jan 27, 2006, 7:40:24 AM
to django-d...@googlegroups.com
Russell Keith-Magee wrote:
> 2) I have a minor problem with the _set suffix: to me, _set implies
> uniqueness in the returned results, which will not exist unless
> .distinct()/distinct=True is used. Either distinct needs to be turned
> on by default and disabled by parameter/filter (which I have argued
> for in a previous thread), or a suffix that does not imply uniqueness
> is required. Possible candidates: _list, _objects

Could you provide an example where you would actually end up with
duplicate results here? I'm having trouble thinking of how this would
actually occur. AFAICT, each child object will only show up once in the
results. If there is one, I agree that it would make sense to use
distinct by default.

There was a huge long thread a while back in which other names were
discussed. Look it up in the archive if you like.

The main reason I like _set is that it suggests a bunch of operations
and semantics that are fairly natural: those of the built-in set type in
Python 2.4. _list is incredibly misleading (see the previous threads),
_objects is fairly vague...

Russell Keith-Magee

Jan 27, 2006, 7:55:52 AM
to django-d...@googlegroups.com
On 1/27/06, hugo <g...@hugo.westfalen.de> wrote:

> >What happens to a query after it's been iterated? For example, how
> >does the following behave?
>
> I'd say it should memoize its result - so it is only queried once per
> request to reduce database hits. But there should maybe be a .reset()
> method on a manager to reset the memoized data, so you can rerun your
> query if you need to

Why not make the query object itself be the thing that is reset? e.g.

people = Reporter.objects.filter(fname="Joe")
for p in people:
    print p

people.reset()
people = people.filter(lname="Smith")

for p in people:
    print p

Would print all the reporters named Joe, reset the query, then rerun a
revised query to print all the reporters named Joe Smith.
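A memoizing query with a `reset()` method, as discussed here, could be sketched like this. `CachedQuery` and `fetch_people` are hypothetical; the callable stands in for the actual database round trip.

```python
class CachedQuery:
    """Runs its fetch callable once and replays the result until reset."""

    def __init__(self, fetch):
        self._fetch = fetch        # callable standing in for the DB query
        self._cache = None

    def __iter__(self):
        if self._cache is None:
            self._cache = list(self._fetch())
        return iter(self._cache)

    def reset(self):
        # Drop the memoized rows so the next iteration hits the DB again.
        self._cache = None


hits = []                          # one entry per simulated DB hit


def fetch_people():
    hits.append(1)
    return ["Joe Smith", "Joe Bloggs"]


people = CachedQuery(fetch_people)
first = list(people)
second = list(people)              # served from the cache
hits_before_reset = len(hits)

people.reset()
third = list(people)               # queried again after reset()
```

Two iterations cost one simulated query; after `reset()` the next iteration costs a second one.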

Taking this approach a little further, it could also address Adrian's
Manager __call__ problem with Jacob's Article.objects() proposal.
Rather than exposing the manager itself, expose an interface that can
be used as a factory for producing Query objects. Keep the Manager
internally as a mechanism for managing database connections and SQL
composition, but don't expose it as the Article.objects member.

On the class itself, Article.objects(), Article.values(),
Article.in_bulk() become factory methods for producing Query objects
which, when iterated, provide objects of the expected type (instances,
dictionaries, etc).

filter, order_by, etc are kept as methods on a query object itself,
rather than methods on the manager. If you want to apply a filter, use
Article.objects().filter(headline="foo"). The metaphor here is 'make a
basic query object, then narrow it with a filter'.

On class instances, article_obj.sites() becomes the analogous factory
method for queries.

This approach also simplifies one use case for multiple managers -
pre-filtered Managers. If you need every query to have a particular
filter pre-applied, add a class method that returns
objects().filter(...) as required.

The one problem I can see is what to do with
article_obj.sites().clear(). I don't have a solution for this one,
other than to suggest making 'sites' an object that returns a query on
__call__. This is the same special case that Adrian objected to, but
it now only applies to modifying relationships in m2m queries, rather
than to every single class and query in the system.

Russ Magee %-)

Robert Wittams

Jan 27, 2006, 8:13:49 AM
to django-d...@googlegroups.com

> Taking this approach a little further, it could also address Adrian's
> Manager __call__ problem with Jacob's Article.objects() proposal.
> Rather than exposing the manager itself, expose an interface that can
> be used as a factory for producing Query objects. Keep the Manager
> internally as a mechanism for managing database connections and SQL
> composition, but don't expose it as the Article.objects member.
>
> On the class itself, Article.objects(), Article.values(),
> Article.in_bulk() become factory methods for producing Query objects
> which, when iterated, provide objects of the expected type (instances,
> dictionaries, etc).
>
> filter, order_by, etc are kept as methods on a query object itself,
> rather than methods on the manager. If you want to apply a filter, use
> Article.objects().filter(headline="foo"). The metaphor here is 'make a
> basic query object, then narrow it with a filter'.
>
> On class instances, article_obj.sites() becomes the analogous factory
> method for queries.

I've got to say, this is absolutely horrible and non-obvious. API design
should not be an exercise in how clever or confusing you can be - it
should be non-surprising. Article.objects acting as a set of Articles is
non-surprising. Article.objects.filter(whatever="fish") returning a new
filtered set is non-surprising.

> This approach also simplifies one use case for multiple managers -
> pre-filtered Managers. If you need every query to have a particular
> filter pre-applied, add a class method that returns
> objects().filter(...) as required.

And this is simpler than

class Something:
    name = CharField(maxlength=100)
    objects = Manager()
    bad_objects = objects.filter(name__startswith="bad")

how? (Yes, it's a contrived example.)

Russell Keith-Magee

Jan 27, 2006, 8:19:51 AM
to django-d...@googlegroups.com
On 1/27/06, Robert Wittams <rob...@wittams.com> wrote:
>
> Russell Keith-Magee wrote:
> > 2) I have a minor problem with the _set suffix: to me, _set implies
> > uniqueness in the returned results, which will not exist unless
> > .distinct()/distinct=True is used.
>
> Could you provide an example where you would actually end up with
> duplicate results here? I'm having trouble thinking of how this would
> actually occur. AFAICT, each child object will only show up once in the
> results. If there is one, I agree that it would make sense to use
> distinct by default.

Any query involving joins over a m2o or m2m relation; if there are
multiple matches in the related table, there will be multiple rows
with the fields from the primary table: e.g.,

daily_planet = Newspaper.objects.get_object(name='Daily Planet')
daily_planet.reporter_set.filter(article__headline__startswith="Foo")

will duplicate results for every reporter that has written more than
one article with a headline starting with 'Foo'.
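The duplication is easy to reproduce at the SQL level. The sketch below uses the standard library's sqlite3 module with a made-up two-table schema (`reporter`, `article`) and made-up rows; it is not the SQL Django generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE reporter (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE article (
        id INTEGER PRIMARY KEY,
        reporter_id INTEGER,
        headline TEXT
    );
    INSERT INTO reporter VALUES (1, 'Clark'), (2, 'Lois');
    INSERT INTO article VALUES
        (1, 1, 'Foo one'),
        (2, 1, 'Foo two'),
        (3, 2, 'Bar');
""")

# Reporters joined to their articles, filtered on the article headline.
join = """
    SELECT {distinct} r.name
    FROM reporter r
    JOIN article a ON a.reporter_id = r.id
    WHERE a.headline LIKE 'Foo%'
"""

plain = conn.execute(join.format(distinct="")).fetchall()
unique = conn.execute(join.format(distinct="DISTINCT")).fetchall()
conn.close()
```

The reporter with two matching articles appears twice in `plain` and once in `unique`, which is the case for either applying DISTINCT by default or choosing a suffix that does not promise uniqueness.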

> There was a huge long thread a while back in which other names were
> discussed. Look it up in the archive if you like.

Ok. I missed that discussion the first time around. I now agree that
_set is the best of the bunch. Fixing distinct seems the better
approach.

Russ Magee %-)

Robert Wittams

Jan 27, 2006, 8:25:35 AM