M2M through intermediary

11 views
Skip to first unread message

akonsu

unread,
Aug 15, 2008, 5:24:55 PM8/15/08
to Django users
hello,

a sample code that i am having problem with is below. the problem is
that my filter seems to return more items than the all() operator. i
am lost please help:

class Part(models.Model) :
name = models.SlugField(unique = True)
def __str__(self) : return self.name

class Plate(models.Model) :
name = models.SlugField(unique = True)
parts = models.ManyToManyField(Part, through = 'PlatePart')
plates = models.ManyToManyField('self', symmetrical = False)
def __str__(self) : return self.name

class PlatePart(models.Model) :
name = models.SlugField()
part = models.ForeignKey(Part)
plate = models.ForeignKey(Plate)
def __str__(self) : return self.name

...
...
...

plate0, _ = Plate.objects.get_or_create(name = 'plate0')
plate1, _ = Plate.objects.get_or_create(name = 'plate1')

part0, _ = Part.objects.get_or_create(name = 'part0')

pp0, _ = PlatePart.objects.get_or_create(name = '__a', plate = plate0,
defaults = { 'part' : part0 })
pp1, _ = PlatePart.objects.get_or_create(name = '__b', plate = plate0,
defaults = { 'part' : part0 })

print plate0.parts.all()
print plate0.parts.filter(platepart__name__startswith = '__')

the output is like this:

[<Part: part0>, <Part: part0>]
[<Part: part0>, <Part: part0>, <Part: part0>, <Part: part0>]

Russell Keith-Magee

unread,
Aug 15, 2008, 10:59:38 PM8/15/08
to django...@googlegroups.com
On Sat, Aug 16, 2008 at 5:24 AM, akonsu <ako...@gmail.com> wrote:
>
> hello,
>
> a sample code that i am having problem with is below. the problem is
> that my filter seems to return more items than the all() operator. i
> am lost please help:

This is a bit of a leaky abstraction between the SQL/relational world
and the Django ORM. The answers that are returned are completely
correct, but it takes a little explaining to see why.

The first query - plate0.parts.all() - turns in to the following sql

{'time': '0.000', 'sql': u'SELECT "myapp_part"."id",
"myapp_part"."name" FROM "myapp_part" INNER JOIN "myapp_platepart" ON
("myapp_part"."id" = "myapp_platepart"."part_id") WHERE
"myapp_platepart"."plate_id" = 1 '}

That is, only the part and platepart tables are involved; since your
query doesn't involve any terms on the part itself, Django can
shortcut the query and not involve the part table. Since there are two
platepart instances, and the parts.all() query returns the part
associated with each of those instances. In this case, there two parts
are actually the same object, so there is duplication in the result
set.

The second query - 'plate0.parts.filter(...)' - does involve a term on
the part table, so the part table needs to be involved:

{'time': '0.000', 'sql': u'SELECT "myapp_part"."id",
"myapp_part"."name" FROM "myapp_part" INNER JOIN "myapp_platepart" ON
("myapp_part"."id" = "myapp_platepart"."part_id") INNER JOIN
"myapp_platepart" T4 ON ("myapp_part"."id" = T4."part_id") WHERE
("myapp_platepart"."plate_id" = 1 AND T4."name" LIKE \\_\\_% ESCAPE
\'\\\' )'}

(This explanation is going to get messy, so I hope it makes sense - if
it doesn't read up on how joins work in SQL, and see if you can
untangle the mess).

What happens here is that there are two instances in the platepart
table that match, each of which is then joined into the part table, so
there are two part instances. However, because of the join conditions,
there are actually 4 matches - platepart 1 with part 1, platepart 1
with part 2, platepart 2 with part 1 and platepart 2 with part 2. So,
as a result of including the part table, you end up with duplicated
duplicated results - and hence 4 returned objects.

The good news is that the general solution here is to use distinct().
If you had queried 'plate0.parts.all().distinct()' or
'plate0.parts.filter(..).distinct()', you would get a single Part
returned for each query.

Yours,
Russ Magee %-)

akonsu

unread,
Aug 16, 2008, 12:50:31 AM8/16/08
to Django users
Russ,

thanks for the detailed analyses.

i totally agree with the first join. it makes perfect sense to me. the
second one though i would write thus:

select *
from myapp_platepart
inner join myapp_plate on myapp_platepart.plate_id=myapp_plate.id
inner join myapp_part on myapp_platepart.part_id=myapp_part.id;

this join returns only two rows, just like the first one. i would say
the way the second join is constructed currently is a bug. would you
agree?

konstantin

On Aug 15, 10:59 pm, "Russell Keith-Magee" <freakboy3...@gmail.com>
wrote:

Malcolm Tredinnick

unread,
Aug 16, 2008, 11:27:17 AM8/16/08
to django...@googlegroups.com

On Fri, 2008-08-15 at 21:50 -0700, akonsu wrote:
> Russ,
>
> thanks for the detailed analyses.
>
> i totally agree with the first join. it makes perfect sense to me. the
> second one though i would write thus:
>
> select *
> from myapp_platepart
> inner join myapp_plate on myapp_platepart.plate_id=myapp_plate.id
> inner join myapp_part on myapp_platepart.part_id=myapp_part.id;
>
> this join returns only two rows, just like the first one. i would say
> the way the second join is constructed currently is a bug. would you
> agree?

I haven't read traced through all the details, but seeing the duplicate
rows and the fact you're using a related field through many-to-many
makes me fairly sure you've rediscovered ticket #8046, which is indeed a
bug.

See also
http://groups.google.com/group/django-users/browse_thread/thread/940db5787c444514/f13e948cdc523d33?lnk=gst&q=Uzi#f13e948cdc523d33

Regards,
Malcolm


Reply all
Reply to author
Forward
0 new messages