Consider the following (reduced for bravity) model:
class Experiment(models.Model):
"""
"""
interpretations = models.ManyToManyField(to="studies.Theory",
related_name="experiments_interpretations",
limit_choices_to=Q(parent__isnull=False),
through="studies.Interpretation")
paradigms = models.ManyToManyField(to="studies.Paradigm", related_name="experiments",
limit_choices_to=Q(parent__isnull=False)) # validator at least one
class Theory(models.Model):
class Meta:
verbose_name_plural = "theories"
name = models.CharField(null=False, blank=False, max_length=100)
parent = models.ForeignKey(null=True, blank=True, to="studies.Theory", on_delete=CASCADE, related_name="children")
class Paradigm(models.Model):
parent = models.ForeignKey(null=True,
blank=True,
related_name="child_paradigm", to="studies.Paradigm",
on_delete=SET_NULL)
name = models.CharField(null=False, blank=False, max_length=100)
Note the many to many from experiment to theory through interpretation.
I need to filter experiments via the interpretation type and then for each theory, attach a list experiments broken by paradigm. and I try to achieve this via django orm with PostgreSQL (with an arraysubquery for the list . while with other cases I've managed, in this specific case I just can't nail it. because for some reason once I add the annotation , the queryset, adds more items :( (instead of 7 expected in my test, it's 10)
the following code
base_subquery = Paradigm.objects.filter(experiments__in=OuterRef("children__experiments")) \
.filter(experiments__in=interpretations_experiments) \
.values("name") \
.annotate(series_name=F("name"))
subquery = base_subquery \
.annotate(experiment_count=Count("experiments")) \
.order_by() \
.annotate(data=JSONObject(key=F("series_name"), value=F("experiment_count"))) \
.values_list("data")
interpretations_experiments = Interpretation.objects.filter(type=self.interpretation).filter(
experiment__in=self.experiments).values("experiment")
parent_theories = Theory.objects.filter(parent__isnull=True)
result = interpretations_experiments \
.annotate(series_name=F("name")) \
.values("series_name") \
.annotate(series=ArraySubquery(subquery)) \
.values("series_name", "series")
If I remove the "annotate" clause with the arraysubquery, I get the expected result of item per theory (but without the array I need) but if I add annotate, I get an extra three (per paradigm connected to experiment => theory, or something like that)
Can anyone explain me what's going on? and how can I fix this? the django orm implicit joins and group bys always make it very hard to me to reason about the SQL results of various queries which are more then simple SELECT filters