On Fri, Nov 1, 2013, Javier Guerra Giraldez <jav...@guerrag.com> wrote:
>have you tried eliminating the second IN relationship? something like
>
>entities = entity.get_descendants()
>
>items = BibliographicRecord.objects.filter(authored__researcher__person__member_of__entity__in=entities).distinct()
Indeed I have, but in that form it takes around 1770ms, compared to around 1540ms in the original form. What I actually do is:
# breaking apart the queries allows the use of values_list()
entities = self.entity.get_descendants(
include_self=True
).values_list('id', flat=True)
# and the set() here is about 230ms faster than putting a distinct() on
# the first query
researchers = set(Researcher.objects.filter(
person__entities__in=entities
).values_list('person', flat=True))
self.items = BibliographicRecord.objects.listable_objects().filter(
authored__researcher__in=researchers,
).distinct()
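To make the shape of that two-step approach concrete at the SQL level, here is a rough sketch using Python's built-in sqlite3 rather than Django. The schema (tables membership, authored, record) is a hypothetical, heavily simplified stand-in for the real Django-generated tables, and unlike the real queryset the final query selects only the record id rather than every column.

```python
import sqlite3

# Hypothetical, simplified schema: names are assumptions for illustration only.
SCHEMA = """
CREATE TABLE membership (person_id INTEGER, entity_id INTEGER);
CREATE TABLE authored (record_id TEXT, person_id INTEGER);
CREATE TABLE record (id TEXT PRIMARY KEY, title TEXT);
"""

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO membership VALUES (?, ?)",
                     [(1, 10), (1, 11), (2, 11), (3, 12)])
    conn.executemany("INSERT INTO record VALUES (?, ?)",
                     [("a", "Alpha"), ("b", "Beta"), ("c", "Gamma")])
    conn.executemany("INSERT INTO authored VALUES (?, ?)",
                     [("a", 1), ("a", 2), ("b", 2), ("c", 3)])
    return conn

def listable_record_ids(conn, entity_ids):
    # Step 1: fetch only the person ids (like values_list(..., flat=True))
    # and de-duplicate them in Python with set() instead of SQL DISTINCT.
    placeholders = ",".join("?" * len(entity_ids))
    persons = {row[0] for row in conn.execute(
        f"SELECT person_id FROM membership WHERE entity_id IN ({placeholders})",
        list(entity_ids))}
    # Step 2: one join against the precomputed id list; DISTINCT is still
    # needed because a record can have several matching authors.
    placeholders = ",".join("?" * len(persons))
    rows = conn.execute(
        "SELECT DISTINCT r.id FROM record r JOIN authored a ON a.record_id = r.id "
        f"WHERE a.person_id IN ({placeholders})",
        sorted(persons))
    return sorted(row[0] for row in rows)
```

For example, listable_record_ids(build_db(), [10, 11]) finds persons 1 and 2 and returns the ids of the records they authored, with record "a" appearing once even though it has two matching authors.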
I think that's partly because this way the SELECT doesn't have to grab all the fields of publications_bibliographicrecord.
But the real killer is the combination of ordering (whether set on the queryset or in the model's Meta, it doesn't matter) with distinct(): as soon as either one is removed, the execution time drops to around 250ms.
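One way to see why the pair hurts: with both in place the database has to de-duplicate the whole joined result over every selected column and then sort it. A workaround sometimes suggested is to de-duplicate on the primary key alone in a subquery and order only the outer query, which in Django would look something like BibliographicRecord.objects.filter(pk__in=subset.values('pk')).order_by(...), where subset is the filtered queryset without distinct(). Whether that is actually faster depends on the database and its planner. The sketch below uses sqlite3 with a hypothetical two-table schema to compare the two SQL shapes; it only checks that they return the same rows and prints SQLite's query plans, and claims nothing about timings on a real database.

```python
import sqlite3

def build_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE record (id TEXT PRIMARY KEY, title TEXT, year INTEGER);
        CREATE TABLE authored (record_id TEXT, person_id INTEGER);
    """)
    conn.executemany("INSERT INTO record VALUES (?, ?, ?)",
                     [("a", "Alpha", 2011), ("b", "Beta", 2013), ("c", "Gamma", 2012)])
    # Record "a" has two matching authors, so the plain join yields it twice.
    conn.executemany("INSERT INTO authored VALUES (?, ?)",
                     [("a", 1), ("a", 2), ("b", 2), ("c", 3)])
    return conn

# Shape of filter(...).distinct() plus ordering: the join is de-duplicated
# over every selected column and then sorted.
DISTINCT_JOIN = """
    SELECT DISTINCT r.id, r.title, r.year
    FROM record r JOIN authored a ON a.record_id = r.id
    WHERE a.person_id IN (1, 2)
    ORDER BY r.year DESC
"""

# Alternative shape: de-duplicate on the key alone inside a subquery, then
# fetch and sort the full rows with no DISTINCT in the outer query.
PK_SUBQUERY = """
    SELECT r.id, r.title, r.year
    FROM record r
    WHERE r.id IN (SELECT a.record_id FROM authored a WHERE a.person_id IN (1, 2))
    ORDER BY r.year DESC
"""

conn = build_db()
distinct_rows = conn.execute(DISTINCT_JOIN).fetchall()
subquery_rows = conn.execute(PK_SUBQUERY).fetchall()
for sql in (DISTINCT_JOIN, PK_SUBQUERY):
    # EXPLAIN QUERY PLAN shows where SQLite needs temporary B-trees
    # for the DISTINCT and ORDER BY steps.
    for plan_row in conn.execute("EXPLAIN QUERY PLAN " + sql):
        print(plan_row)
```

On a real backend, running EXPLAIN on the SQL Django prints for str(queryset.query) would be the equivalent check.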
That's for 55000 BibliographicRecords returned by that last query (before distinct() is applied; distinct() reduces them to 28000).
That seems excessive to me.
BibliographicRecord has a custom primary key, and its id values look like "d9ce7e2f-663e-4fc6-8448-b214c6915aed:web-of-science". Could that be affecting performance?
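A quick way to get a feel for whether long text keys matter is to time DISTINCT plus ORDER BY over keys shaped like these against plain integers. The sketch below does that with sqlite3; the table names, the 28000-key count and the "uuid:web-of-science" key format are assumptions modelled on the numbers above, and SQLite timings will not necessarily carry over to the database actually in use.

```python
import sqlite3
import time
import uuid

def make_table(conn, name, keys):
    conn.execute(f"CREATE TABLE {name} (k)")
    # Insert every key twice so DISTINCT has real duplicates to remove.
    conn.executemany(f"INSERT INTO {name} VALUES (?)", [(k,) for k in keys * 2])

def time_distinct(conn, name, repeats=5):
    # Average wall-clock time of a DISTINCT + ORDER BY over the key column.
    start = time.perf_counter()
    for _ in range(repeats):
        rows = conn.execute(f"SELECT DISTINCT k FROM {name} ORDER BY k").fetchall()
    return (time.perf_counter() - start) / repeats, len(rows)

n = 28000
text_keys = [f"{uuid.uuid4()}:web-of-science" for _ in range(n)]
int_keys = list(range(n))
conn = sqlite3.connect(":memory:")
make_table(conn, "text_keys", text_keys)
make_table(conn, "int_keys", int_keys)
for name in ("text_keys", "int_keys"):
    seconds, count = time_distinct(conn, name)
    print(f"{name}: {count} distinct rows, {seconds * 1000:.1f} ms per query")
```

If the gap here is small compared with the difference between 1540ms and 250ms, the key format is probably not the main cost.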
Daniele