Django annotations questions - why is the queyset extended after an annotation (with many to many)

alonn

unread,

Mar 1, 2023, 12:54:53 PM3/1/23

to PyWeb-IL

Consider the following (reduced for bravity) model:

class Experiment(models.Model):
"""
"""

interpretations = models.ManyToManyField(to="studies.Theory",
related_name="experiments_interpretations",
limit_choices_to=Q(parent__isnull=False),
through="studies.Interpretation")
paradigms = models.ManyToManyField(to="studies.Paradigm", related_name="experiments",
limit_choices_to=Q(parent__isnull=False)) # validator at least one

class Theory(models.Model):
class Meta:
verbose_name_plural = "theories"

name = models.CharField(null=False, blank=False, max_length=100)
parent = models.ForeignKey(null=True, blank=True, to="studies.Theory", on_delete=CASCADE, related_name="children")

class Paradigm(models.Model):
parent = models.ForeignKey(null=True,
blank=True,
related_name="child_paradigm", to="studies.Paradigm",
on_delete=SET_NULL)
name = models.CharField(null=False, blank=False, max_length=100)

Note the many to many from experiment to theory through interpretation.

I need to filter experiments via the interpretation type and then for each theory, attach a list experiments broken by paradigm. and I try to achieve this via django orm with PostgreSQL (with an arraysubquery for the list . while with other cases I've managed, in this specific case I just can't nail it. because for some reason once I add the annotation , the queryset, adds more items :( (instead of 7 expected in my test, it's 10)

the following code

base_subquery = Paradigm.objects.filter(experiments__in=OuterRef("children__experiments")) \

.filter(experiments__in=interpretations_experiments) \
.values("name") \
.annotate(series_name=F("name"))

subquery = base_subquery \
.annotate(experiment_count=Count("experiments")) \
.order_by() \
.annotate(data=JSONObject(key=F("series_name"), value=F("experiment_count"))) \
.values_list("data")

interpretations_experiments = Interpretation.objects.filter(type=self.interpretation).filter(
experiment__in=self.experiments).values("experiment")
parent_theories = Theory.objects.filter(parent__isnull=True)

result = interpretations_experiments \

.annotate(series_name=F("name")) \
.values("series_name") \
.annotate(series=ArraySubquery(subquery)) \
.values("series_name", "series")

If I remove the "annotate" clause with the arraysubquery, I get the expected result of item per theory (but without the array I need) but if I add annotate, I get an extra three (per paradigm connected to experiment => theory, or something like that)

Can anyone explain me what's going on? and how can I fix this? the django orm implicit joins and group bys always make it very hard to me to reason about the SQL results of various queries which are more then simple SELECT filters

I'd appreciate any help

אורי‎

unread,

Mar 1, 2023, 12:59:55 PM3/1/23

to pywe...@googlegroups.com

Which version of Django? Did you try other versions? It looks to me this is a question for Stack Overflow.

אורי

u...@speedy.net

--
You received this message because you are subscribed to the Google Groups "PyWeb-IL" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pyweb-il+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pyweb-il/4bc938dc-c190-4aaa-a8e7-2493f35e6c56n%40googlegroups.com.

alonn

unread,

Mar 1, 2023, 1:23:19 PM3/1/23

to PyWeb-IL

django 4.1 . and I'll ask there also, but from my experience Haki, Shai, Meir, and some others here are more knowledgable with those specific corners

אורי‎

unread,

Mar 1, 2023, 1:36:27 PM3/1/23

to pywe...@googlegroups.com

Hi alonn,

I'm not sure if this is the case, but is it OK that an annotation changes the number of records received by a query? It looks to me this might be unintended, and therefore it may be a bug in Django, in this case you can submit a bug report to Django.

אורי

u...@speedy.net

To view this discussion on the web visit https://groups.google.com/d/msgid/pyweb-il/b42203c6-3894-4f70-8cec-64e3460328ebn%40googlegroups.com.

Shai Berger

unread,

Mar 1, 2023, 1:37:20 PM3/1/23

to pywe...@googlegroups.com

On Wed, 1 Mar 2023 09:54:53 -0800 (PST)
alonn <alon...@gmail.com> wrote:
>
> I'd appreciate any help
>

Hi,

Your example is still a little "a lot" to analyze; try printing the SQL.
In your example code, you can do that with

print(str(result.query))

Do that with and without the suspected annotation, and if you don't yet
see the light, please bring the results here.

HTH,
Shai.

alonn

unread,

Mar 1, 2023, 2:19:47 PM3/1/23

to PyWeb-IL

Sorry missed this model which is the through model between them experiment and theory, I'll add the queries as shai suggested, although I didn't why it's happening

class Interpretation(models.Model):
experiment = models.ForeignKey(to="studies.Experiment", on_delete=CASCADE, related_name="theories")
theory = models.ForeignKey(to="studies.Theory", on_delete=CASCADE, related_name="experiments",
limit_choices_to=Q(parent__isnull=False))
type = models.CharField(null=False, blank=False, choices=InterpretationsChoices.choices, max_length=30)

I really doubt it's a django bug, it probably me doing something wrong, this is the kind of issues, I miss working with sqlalchemy

Also indentation got broken when pasting here, this is really for brevity

Shai Berger

unread,

Mar 1, 2023, 4:17:24 PM3/1/23

to pywe...@googlegroups.com

Hi Alon,

This still doesn't seem right.

On Wed, 1 Mar 2023 11:19:47 -0800 (PST)
alonn <alon...@gmail.com> wrote:

>
> class Interpretation(models.Model):
> experiment = models.ForeignKey(to="studies.Experiment", on_delete=CASCADE, related_name="theories")
> theory = models.ForeignKey(to="studies.Theory", on_delete=CASCADE, related_name="experiments",
> limit_choices_to=Q(parent__isnull=False))
> type = models.CharField(null=False, blank=False, choices=InterpretationsChoices.choices, max_length=30)
>

but in the queries you gave, you do (effectively)

interpretations_experiments = (
Interpretation.objects
.filter(type=self.interpretation)
.filter(experiment__in=self.experiments)
.values("experiment")
.annotate(series_name=F("name"))
...
)

but for that to work, Interpretation (the main model of the
QuerySet) needs to have a "name" field, and it doesn't.

alonn

unread,

Mar 1, 2023, 5:19:25 PM3/1/23

to PyWeb-IL

Thanks, the name is from the theory (the Interpretation is a through model between experiment and theory)

and the main model in the queryset is Theory and not as I now see I've misleadingly wrote when I tried to make this simpler for brevity

parent_theories = Theory.objects.filter(parent__isnull=True)

result = parent_theories \ .. etc

but thanks to your good suggestion before I found the issue. there was a left join on theory, so without a distinct there, you can get multiple theories..

I've should've looked at the actual query earlier (which I did, but not copied to a clean place with SQL highlighting and formatting) , sorry for the mess

And While yes, it's never a bug in PostgreSQL and rarely in Django, it's usually my own code the problem,

I do wish there was a more explicit way to define joins, group by, etc..

Reply all

Reply to author

Forward