[Question] Many-To-Many query where only pk is returned

135 views
Skip to first unread message

Cristiano Coelho

unread,
Nov 18, 2015, 7:41:22 PM11/18/15
to Django developers (Contributions to Django itself)
Hello there,

Lets say I have these two models (sorry about the spanish names!) ( Django 1.8.6 and MySQL backend )

class Especialidad(models.Model):
    nombre = models.CharField(max_length=250, blank=False, unique=True)


class Usuario(AbstractBaseUser): 
    permisosEspecialidad = models.ManyToManyField("Especialidad", blank=True)

Let u be some Usuario instance, and the following query:

u.permisosEspecialidad.all().values_list('pk',flat=True)

The actual printed query is:

SELECT `samiBackend_especialidad`.`id`
FROM `samiBackend_especialidad`
INNER JOIN `samiBackend_usuario_permisosEspecialidad` ON ( `samiBackend_especialidad`.`id` = `samiBackend_usuario_permisosEspecialidad`.`especialidad_id` )
WHERE `samiBackend_usuario_permisosEspecialidad`.`usuario_id` = 8

As my understanding, since I'm only selecting the id field which is already present in the intermediary table (and is also a FK), the actual join is redundant, as I have all the info I need in this case.

So the query could work like this

SELECT `samiBackend_usuario_permisosEspecialidad`.`especialidad_id`
FROM `samiBackend_usuario_permisosEspecialidad`
WHERE `samiBackend_usuario_permisosEspecialidad`.`usuario_id` = 8


I guess this works this way because this particular case might be hard to detect or won't be compatible with any additional query building, however, for ForeignKey relations, this optimization is already done (If you select the primary key from the selected model only, it wont add a join)

What would be the complications to implement this? Would it worth the effort?


Josh Smeaton

unread,
Nov 18, 2015, 10:04:31 PM11/18/15
to Django developers (Contributions to Django itself)
It might be a bit early in the day for me, but isn't that query already optimised? That is, it's already eliminated a join. It hasn't joined to the "Especialidad" table, it's only joined to the intermediate table. I *think* the join to the intermediate table is necessary because there could be duplicates.

Given the tables:

Usuario(pk):
1
2

Intermediate(usurario_id, especialidad_id):
1, 1
1, 2

Especialidad(pk)
1
2

Joining Usuario to Intermediate will return 4 results in SQL (2 for each pk on Usuario) unless there was a distinct in there somewhere. I haven't tested, so I'm not sure if django does duplicate elimination, but I'm pretty sure it doesn't.

Does this look right to you, or am I missing something?

Cheers

Eduardo Erlo

unread,
Nov 19, 2015, 7:00:13 AM11/19/15
to django-d...@googlegroups.com
I guess that Cristiano may be right.

Making the query only on the Intermediate table would be enough for this case, because the two columns that we need for the query are on this table, and looks like Django is using a JOIN only to get the "id" of Especialidad.

Here's an example:

GIven the Users...

In [2]: Usuario.objects.all().values_list('pk', flat=True)
Out[2]: [1, 2]

...and the Especialidads...

In [3]: Especialidad.objects.all().values_list('pk', flat=True)
Out[3]: [1, 2]

...and the M2M relations.

In [5]: Usuario.objects.get(pk=1).permisosEspecialidad.all().values_list('pk', flat=True)
Out[5]: [1, 2]

In [6]: Usuario.objects.get(pk=2).permisosEspecialidad.all().values_list('pk', flat=True)
Out[6]: [2]

----

The current Django query:

SELECT "samiBackend_especialidad"."id" FROM "samiBackend_especialidade" INNER JOIN "samiBackend_usuario_permisosEspecialidad" ON ( "samiBackend_especialidad"."id" = "samiBackend_usuario_permisosEspecialidad"."especialidade_id" ) WHERE "samiBackend_usuario_permisosEspecialidad"."usuario_id" = 1;
Results: 1, 2

SELECT "samiBackend_especialidade"."id" FROM "samiBackend_especialidad" INNER JOIN "samiBackend_usuario_permisosEspecialidad" ON ( "samiBackend_especialidad"."id" = "samiBackend_usuario_permisosEspecialidad"."especialidade_id" ) WHERE "samiBackend_usuario_permisosEspecialidad"."usuario_id" = 2;
Results: 2

...and the simpler query (just on the Intermediate table)

SELECT "samiBackend_usuario_permisosEspecialidad"."especialidad_id" FROM "samiBackend_usuario_permisosEspecialidad"  WHERE "samiBackend_usuario_permisosEspecialidad"."usuario_id" = 1;
Results: 1, 2

SELECT "samiBackend_usuario_permisosEspecialidad"."especialidad_id" FROM "samiBackend_usuario_permisosEspecialidad"  WHERE "samiBackend_usuario_permisosEspecialidad"."usuario_id" = 2;
Results: 2


I may be missing something as well, but I think that this optimization may be apropriated.






--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/9eddc20d-8740-48d1-a38d-fd7a00a56f2a%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.



--
Eduardo Erlo

charettes

unread,
Nov 19, 2015, 9:51:51 AM11/19/15
to Django developers (Contributions to Django itself)
Hi Cristiano,

If I get it correctly you'd like m2m querying to start with the intermediary (FROM) table and JOIN the referenced one only if more fields than the primary key are selected.

class Book(models.Model):
    name = models.CharField(max_length=100)

class Author(models.Model):
    books = models.ManyToMany(Book)

author = Author.objects.get(pk=1)
author.books.values_list('pk')

Would result in the following query:
SELECT book_id FROM author_books WHERE author_id = 1;

Instead of:
SELECT id FROM book JOIN author_books ON (book.id = author_books.book_id) WHERE author_id = 1;

I think this is a sensible optimization but I wonder about its feasibility. It looks like the `pk` reference would require some special handling to reference `book_id` since it's not actually a primary key on the intermediate table.

Simon

Cristiano Coelho

unread,
Nov 19, 2015, 4:03:20 PM11/19/15
to Django developers (Contributions to Django itself)
I think you didn't understand the query. You would filter by the intermediate table directly, and filter by Usuario (as you want all Especialidad from one Usuario), yielding 2 results as it should with a single select from one table. Right now django translates it into a more complicated select with a join from one of the tables with the intermediary one, which should be needed only if you are getting data from the main table.

Cristiano Coelho

unread,
Nov 19, 2015, 4:05:11 PM11/19/15
to Django developers (Contributions to Django itself)
You are right. I believe an optimization like this would probably help just a few people, as only fetching data from the intermediary table is a rare thing. But if it is an easy change, which improves performance, why not? However I think the change is quite complicated.

charettes

unread,
Nov 19, 2015, 4:19:17 PM11/19/15
to Django developers (Contributions to Django itself)
In the meanwhile you can directly query the intermediate model instead:

Author.books.through.objects.filter(author_id=author.id).values_list('book_id')

Anssi Kääriäinen

unread,
Nov 20, 2015, 4:05:20 AM11/20/15
to django-d...@googlegroups.com
We din't currently do join trimming from the beginning of the query's tables, only from the end of the tables.

Fixing this issue only doesn't seem worth it, as the current join trimmig code can't be adapted to handle trimming from the beginning of the query's tables. But we might be able to trim any non-necessary table from the query in the compiler stage. If we were able to do that, then there might be a point in implementing that. I know we sometimes include non-necessary tables in the query for example in multitable inheritance scenarios.

 - Anssi
--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
Reply all
Reply to author
Forward
0 new messages