Selecting data from intermediate table in prefetch_related results in duplicate rows

496 views
Skip to first unread message

Tomáš Ehrlich

unread,
May 20, 2018, 5:42:04 AM5/20/18
to django...@googlegroups.com
Hello,
I have a two models (User, Project) in m2m relationship with intermediate (Role) table.

When I’m selecting all users in project and I also want to select corresponding role, I simply annotate one field from Role table using F expression:

Users.objects.all().filter(projects__name=‘Django’).annotate(role=F(‘roles__role’))

Annotate in this case doesn’t create new join, because `roles` table is already joined to filter on data from `projects` table. This works well.


However, I tried to same in `prefetch_related` and I’m getting duplicate rows, because there’s a new JOIN statement added. (Usecase: Selecting all projects in DB with all users per project)

The SQL statement with `prefetch_related`, but without `annotate` looks like this:

roles = Prefetch(
    ‘users',
    queryset=User.objects.all()
)
qs = Project.objects.prefetch_related(roles)

SELECT
  ("users_role"."project_id") AS "_prefetch_related_val_project_id",
  — other fields here
FROM "users_user"
  INNER JOIN "users_role" ON ("users_user"."id" = "users_role"."user_id")
WHERE "users_role"."project_id” IN (1, 2, 3, 4, 5)


As you can see, the table `users_role` is already joined, so I’m basically looking for Django ORM expression which generates following SQL query:

SELECT
  ("users_role"."project_id") AS "_prefetch_related_val_project_id”,
  “users_role_.”role”,
  — other fields here
FROM "users_user"
  INNER JOIN "users_role" ON ("users_user"."id" = "users_role"."user_id")
WHERE "users_role"."project_id” IN (1, 2, 3, 4, 5)


Unfortunatelly, following expression generates incorrect SQL:

roles = Prefetch(
    ‘users',
    queryset=User.objects.all().annotate(role=F('roles__role'))
)
qs = Project.objects.prefetch_related(roles)

SELECT
  ("users_role"."project_id") AS "_prefetch_related_val_project_id",
  "users_role"."role"         AS “role”,
  — other fields here
FROM "users_user"
  LEFT OUTER JOIN "users_role" ON ("users_user"."id" = "users_role"."user_id")
  INNER JOIN "users_role" T3 ON ("users_user"."id" = T3."user_id")
WHERE T3."project_id" IN
      (1, 2, 3, 4, 5)

The extra `left outer join` causes duplicate entries.


I’ve found one ticket (https://code.djangoproject.com/ticket/27144) which seems to be relevant, but it’s old and closed.

Any ideas? Is it bug or is there really a reason to include extra JOIN? I’m not very skilled in relational algebra.

Thank you in advance!


Cheers,
   Tom



signature.asc

Tomáš Ehrlich

unread,
May 20, 2018, 5:58:20 AM5/20/18
to django...@googlegroups.com
I completely forgot about `extra` method and it seems it does solve the problem:

roles = Prefetch(
    ‘users',
    queryset=User.objects.extra(select={'role': 'users_role.role'})
)
qs = Project.objects.prefetch_related(roles))

However, I’m still curious why `annotate` doesn’t work when used inside `prefetch_related`.


1. Is it bug? Should it work in the same way as when used outside `prefetch_related`?
2. Is there a way to avoid `extra` method? It should be avoided at all cost, after all.


Thank you in advance!
   Tom


20. 5. 2018 v 11:41, Tomáš Ehrlich <tomas....@gmail.com>:
signature.asc
Reply all
Reply to author
Forward
0 new messages