Django query not returning all objects

238 views
Skip to first unread message

Koonal Bharadwaj

unread,
Aug 18, 2016, 1:27:42 AM8/18/16
to Django users
Hello,

The issue:

When trying to order_by on a related model duplicate entries are created. I solved this with an OrderedSet that is applied after paginating (http://code.activestate.com/recipes/576694/). However, when I try to paginate the queryset, all the results are not returned. It's missing a few table rows. 

Models:
class HeartFlowCase(models.Model):
   
"""
 
    A Hearflow Case.
 
    This serves at the true state of a case as it is processed through the system.
    It also holds the results of each step of the processing.

    """

   
# Primary
    hf_id = models.CharField('HeartFlow ID', max_length=200, blank=True, unique=True)
    canonical_data
= models.ForeignKey('cases.CaseData', to_field='uuid', related_name="canonical_data")


class CaseData(models.Model):
   
"""
    Extracted and computed values related to a single processing run of a case.

    A HeartFlowCase can have multiple runs associated with it.
    The one which is accepted to be the "clinically accurate" one is the one referred to as 'canonical_data' by
    the HeartFlowCase itself.

    """

   
# Primary
    heartflowcase = models.ForeignKey(HeartFlowCase, related_name='data', blank=True, null=True)
    uuid
= models.CharField('Case Data UUID', max_length=200, blank=False, null=False, unique=True)
    deadline
= models.DateTimeField('Deadline', blank=True, default=get_default_deadline)


As you can see, there is a ForeignKey to canonical CaseData from HeartFlowCase and a ForeignKey to HeartFlowCase from CaseData. So you can have multiple CaseDatas per HeartFlowCase. 

Structure:
HeartFlowCase
    |
    data - CaseData1
        \
         \ CaseData2

For example: 

Total number of HeartFlow objects are 5 with 2 CaseDatas each. If I order_by deadline on CaseData as:
cases = HeartFlowCase.objects.all().order_by(data__deadline)
this returns duplicates since there are multiple CaseDatas, which is fine. Now I try and paginate it by applying:
paginator = Paginator(cases, 2)
cases
= paginator.page(1)

Now the SQL query has a LIMIT and OFFSET given. If the order_by gives duplicate HeartFlowCase objects this hits the LIMIT number and the results that are returned are missing a HeartFlowCase object. So if 2 duplicates are returned per case after applying the OrderedSet I get only one case back and missing one case since it was never returned from the database. As I goto the next page:
cases = paginator.page(2)
that missingHeartFlowCase object that was not returned from the first page queryset is lost forever and is never displayed. 

I hope this is clear. Please let me know if I can clarify further. Any help would greatly be appreciated. Thanks.

Constantine Covtushenko

unread,
Aug 18, 2016, 4:59:19 AM8/18/16
to django...@googlegroups.com
Hi Koonal,

As said in django doc you can use `distinct()` to remove duplicated rows from first query.

I believe with this your pagination should works as expected.

Regards,


--
You received this message because you are subscribed to the Google Groups "Django users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.
To post to this group, send email to django...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-users/61ec03f6-3325-49fe-bcdc-a7ca50784dc0%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Todor Velichkov

unread,
Aug 18, 2016, 6:20:34 AM8/18/16
to Django users, constantine...@gmail.com
Another solution would be to annotate min/max deadline date and order by the annotation. Something like:

cases = HeartFlowCase.objects.all().annotate(min_deadline=Min('data__deadline')).order_by('min_deadline')
To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.

Koonal Bharadwaj

unread,
Aug 18, 2016, 9:19:35 PM8/18/16
to Django users, constantine...@gmail.com
Hi Constantine,

Thanks for the quick reply. 

No luck with distinct. I tried that before (cases = cases.distinct() ), right after the order_by but it seems to have no effect, returning multiple duplicates.

To unsubscribe from this group and stop receiving emails from it, send an email to django-users...@googlegroups.com.

Koonal Bharadwaj

unread,
Aug 18, 2016, 10:05:50 PM8/18/16
to Django users, constantine...@gmail.com
Hi Todor,

I have implemented something similar:
cases = HeartFlowCase.objects.all().annotate(Count('hf_id', distinct=True))

I also tried your suggestion and both seem to produce the same results. The number of duplicates is minimized but not removed entirely. Also the same problem when applying it to pagination still persists. 

Thanks for the suggestion. 

Simon Charette

unread,
Aug 19, 2016, 4:10:30 PM8/19/16
to Django users, constantine...@gmail.com
For reference there's a Django ticket to suggest using this technique.

Cheers,
Simon

[1] https://code.djangoproject.com/ticket/19842

Constantine Covtushenko

unread,
Aug 19, 2016, 9:27:22 PM8/19/16
to django...@googlegroups.com
Hi Koonal,

Sorry for not exact suggestion.

I hope that combination of  `annotate` and `values` will be helpful, see this link for more detailed explanation.

Pleas try something like following:

cases = HeartFlowCase.objects.values(‘all fields required in your view’).annotate(min_deadline=Min('data__deadline')).order_by('min_deadline')

It is the Todor's suggestion extended with value method run.

I believe that you need `GROUP BY` in your SQL query and it generated by `values` as said in Django documentation.


To unsubscribe from this group and stop receiving emails from it, send an email to django-users+unsubscribe@googlegroups.com.

To post to this group, send email to django...@googlegroups.com.
Visit this group at https://groups.google.com/group/django-users.
Reply all
Reply to author
Forward
0 new messages