Django annotations questions - why is the queyset extended after an annotation (with many to many)

16 views
Skip to first unread message

alonn

unread,
Mar 1, 2023, 12:54:53 PM3/1/23
to PyWeb-IL
Consider the following (reduced for bravity) model:

class Experiment(models.Model):
    """
    """
    
    interpretations = models.ManyToManyField(to="studies.Theory",
                                             related_name="experiments_interpretations",
                                             limit_choices_to=Q(parent__isnull=False),
                                             through="studies.Interpretation")
    paradigms = models.ManyToManyField(to="studies.Paradigm", related_name="experiments",
                                       limit_choices_to=Q(parent__isnull=False))  # validator at least one
  
class Theory(models.Model):
    class Meta:
        verbose_name_plural = "theories"

    name = models.CharField(null=False, blank=False, max_length=100)
    parent = models.ForeignKey(null=True, blank=True, to="studies.Theory", on_delete=CASCADE, related_name="children")

class Paradigm(models.Model):
    parent = models.ForeignKey(null=True,
                               blank=True,
                               related_name="child_paradigm", to="studies.Paradigm",
                               on_delete=SET_NULL)
    name = models.CharField(null=False, blank=False, max_length=100)

Note the many to many from experiment to theory through interpretation. 
I need to filter experiments via the interpretation type  and then for each theory, attach a list experiments broken by paradigm. and I try to achieve this via django orm with PostgreSQL (with an arraysubquery for the list . while with other cases I've managed, in this specific case I just can't nail it. because for some reason once I add the annotation , the queryset, adds more items :( (instead of 7 expected in my test, it's 10) 

the following code 
base_subquery = Paradigm.objects.filter(experiments__in=OuterRef("children__experiments")) \
            .filter(experiments__in=interpretations_experiments) \
            .values("name") \
            .annotate(series_name=F("name"))
subquery = base_subquery \
            .annotate(experiment_count=Count("experiments")) \
            .order_by() \
            .annotate(data=JSONObject(key=F("series_name"), value=F("experiment_count"))) \
            .values_list("data")

interpretations_experiments = Interpretation.objects.filter(type=self.interpretation).filter(
            experiment__in=self.experiments).values("experiment")
        parent_theories = Theory.objects.filter(parent__isnull=True)
result =  interpretations_experiments \
            .annotate(series_name=F("name")) \
            .values("series_name") \
            .annotate(series=ArraySubquery(subquery)) \
            .values("series_name", "series")
If I remove the "annotate" clause with the arraysubquery, I get the expected result of item per theory (but without the array I need) but if I add annotate, I get an extra three (per paradigm connected to experiment => theory, or something like that)

Can anyone explain me what's going on? and how can I fix this? the django orm implicit joins and group bys always make it very hard to me to reason about the SQL results of various queries which are more then simple SELECT filters

I'd appreciate any help

אורי

unread,
Mar 1, 2023, 12:59:55 PM3/1/23
to pywe...@googlegroups.com
Which version of Django? Did you try other versions? It looks to me this is a question for Stack Overflow.

--
You received this message because you are subscribed to the Google Groups "PyWeb-IL" group.
To unsubscribe from this group and stop receiving emails from it, send an email to pyweb-il+u...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/pyweb-il/4bc938dc-c190-4aaa-a8e7-2493f35e6c56n%40googlegroups.com.

alonn

unread,
Mar 1, 2023, 1:23:19 PM3/1/23
to PyWeb-IL
django 4.1 . and I'll ask there also, but from my experience Haki, Shai, Meir, and some others here are more knowledgable with those specific corners

אורי

unread,
Mar 1, 2023, 1:36:27 PM3/1/23
to pywe...@googlegroups.com
Hi alonn,

I'm not sure if this is the case, but is it OK that an annotation changes the number of records received by a query? It looks to me this might be unintended, and therefore it may be a bug in Django, in this case you can submit a bug report to Django.

Shai Berger

unread,
Mar 1, 2023, 1:37:20 PM3/1/23
to pywe...@googlegroups.com
On Wed, 1 Mar 2023 09:54:53 -0800 (PST)
alonn <alon...@gmail.com> wrote:
>
> I'd appreciate any help
>

Hi,

Your example is still a little "a lot" to analyze; try printing the SQL.
In your example code, you can do that with

print(str(result.query))

Do that with and without the suspected annotation, and if you don't yet
see the light, please bring the results here.

HTH,
Shai.

alonn

unread,
Mar 1, 2023, 2:19:47 PM3/1/23
to PyWeb-IL
Sorry missed this model which is the through model between them experiment and theory, I'll add the queries as shai suggested, although I didn't why it's happening

class Interpretation(models.Model):
    experiment = models.ForeignKey(to="studies.Experiment", on_delete=CASCADE, related_name="theories")
    theory = models.ForeignKey(to="studies.Theory", on_delete=CASCADE, related_name="experiments",
                               limit_choices_to=Q(parent__isnull=False))
    type = models.CharField(null=False, blank=False, choices=InterpretationsChoices.choices, max_length=30)

I really doubt it's a django bug, it probably me doing something wrong, this is the kind of issues, I miss working with sqlalchemy 

Also indentation got broken when pasting here, this is really for brevity 

Shai Berger

unread,
Mar 1, 2023, 4:17:24 PM3/1/23
to pywe...@googlegroups.com
Hi Alon,

This still doesn't seem right.

On Wed, 1 Mar 2023 11:19:47 -0800 (PST)
alonn <alon...@gmail.com> wrote:

>
> class Interpretation(models.Model):
> experiment = models.ForeignKey(to="studies.Experiment", on_delete=CASCADE, related_name="theories")
> theory = models.ForeignKey(to="studies.Theory", on_delete=CASCADE, related_name="experiments",
> limit_choices_to=Q(parent__isnull=False))
> type = models.CharField(null=False, blank=False, choices=InterpretationsChoices.choices, max_length=30)
>

but in the queries you gave, you do (effectively)

interpretations_experiments = (
Interpretation.objects
.filter(type=self.interpretation)
.filter(experiment__in=self.experiments)
.values("experiment")
.annotate(series_name=F("name"))
...
)

but for that to work, Interpretation (the main model of the
QuerySet) needs to have a "name" field, and it doesn't.

alonn

unread,
Mar 1, 2023, 5:19:25 PM3/1/23
to PyWeb-IL
Thanks, the name is from the theory (the Interpretation is a through model between experiment and theory)
and the main model in the queryset is Theory and not as I now see I've misleadingly wrote when I tried to make this simpler for brevity 

parent_theories = Theory.objects.filter(parent__isnull=True)
result =  parent_theories \ .. etc 

 but thanks to your good suggestion before I found the issue. there was a left join on theory, so without a distinct there, you can get multiple theories.. 
I've should've looked at the actual query earlier (which I did, but not copied to a clean place with SQL highlighting and formatting) , sorry for the mess 

And While yes, it's never a bug in PostgreSQL and rarely in Django, it's usually my own code the problem,
I do wish there was a more explicit way to define joins, group by, etc.. 


Reply all
Reply to author
Forward
0 new messages