Re: bug


Luke Benstead

Jun 26, 2024, 4:56:24 AM
to Jacob Gur, djangae-users
+ djangae-users so more people have visibility

So in answer to your questions:

If the result set is large, then it seems each subsequent chunked query could get progressively slower?

Yes, paginating OR queries is bad for performance, but there isn't really any other option than what we're doing. Let me give you an extreme example: say you create 100 entities with a field called page, and you set 10 entities to page 1, 10 to page 2, etc. - so 10 entities on each page.

Now, you run a query like this:

pages = [x for x in range(10)]
qs = MyThings.objects.filter(page__in=pages).order_by("page")
Paginator(qs, 10).page(1)

This will run 10 queries, one for each page.

So on the first page, we query offset=0, limit=10, so we fetch 10 results for each sub-query of the OR and it turns out that all the results are actually in the first sub-query, so the other 9 queries were pointless (but there's no way for us to know that).

Then you go for page 2, which is offset=10, limit=10... but your only option here is to NOT apply that offset to each sub-query: all 10 results we need for page 2 will be in the second sub-query, and if we offset that one by 10 we'd skip exactly the results we need. So instead we have to fetch each sub-query with offset=0, limit=20 and apply the offset in memory after merging.
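To make the in-memory merge concrete, here's a minimal sketch of the strategy (not Djangae's actual code - fetch_or_page, the tuple rows and the heapq-based merge are all illustrative assumptions):

```python
# Hypothetical sketch: paginate an OR query by fetching each sub-query from
# the start with limit = offset + page_size, merging the sorted sub-results,
# then discarding the first `offset` rows in memory.
import heapq

def fetch_or_page(subquery_results, offset, page_size):
    """subquery_results: one sorted list of rows per OR sub-query."""
    limit = offset + page_size
    # Each sub-query is fetched from offset=0, because we can't know how
    # the requested offset is distributed across the sub-queries.
    partial = [rows[:limit] for rows in subquery_results]
    merged = heapq.merge(*partial)
    # Skip the offset in memory, then take one page.
    page = []
    for i, row in enumerate(merged):
        if i >= offset + page_size:
            break
        if i >= offset:
            page.append(row)
    return page

# 10 entities on each "page" value, as in the example above:
subqueries = [[(p, n) for n in range(10)] for p in range(10)]
print(fetch_or_page(subqueries, offset=10, page_size=10))  # the 10 rows with page value 1
```

Note that for offset=10 every skipped row came from the first sub-query, which is exactly the waste described above: we fetched 20 rows from each of the 10 sub-queries just to return 10.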

The merging code in ORQuery.fetch is kind of scary. I didn't try to wrap my head around it. I guess you're comfortable it handles all edge cases

I'm *reasonably* comfortable 😄

I'll look into your bug today and let you know when I've figured out why it's not working on prod but is on the emulator...

On Tue, 25 Jun 2024 at 20:55, Jacob Gur <ja...@fareclock.com> wrote:
Sorry for bugging you. A couple other things:
1) Just curious about the offset approach used for merging OR queries, and how it affects performance. If the result set is large, then it seems each subsequent chunked query could get progressively slower?
2) The merging code in ORQuery.fetch is kind of scary. I didn't try to wrap my head around it. I guess you're comfortable it handles all edge cases :-)

Thanks and much appreciated!



On Tue, Jun 25, 2024 at 3:45 PM Jacob Gur <ja...@fareclock.com> wrote:
Here's screenshot of diff for visualization:

image.png



On Tue, Jun 25, 2024 at 3:43 PM Jacob Gur <ja...@fareclock.com> wrote:
Also, for some reason it does not repro in a unit test, only in Cloud Datastore.


On Tue, Jun 25, 2024 at 3:42 PM Jacob Gur <ja...@fareclock.com> wrote:
Hi Luke!

It looks like ChunkedResultset._fetch has an infinite loop bug, or maybe its caller does, based on what it yields. I was seeing it occur on a query like this:

SELECT (*) FROM timeclock_tcemployee WHERE (active=True AND account_id=716859318297101 AND roles array_contains_all ['worker'] AND labels array_contains 3191065176237289) OR (active=True AND account_id=716859318297101 AND roles array_contains_all ['worker'] AND labels array_contains 966208829868309) OR (active=True AND account_id=716859318297101 AND roles array_contains_all ['worker'] AND labels array_contains 4365024065786519) OR (active=True AND account_id=716859318297101 AND roles array_contains_all ['worker'] AND labels array_contains 1443946820361155)

Changing it to the following resolves the issue:

def _fetch(self):
    total = self.limit or math.inf

    offset = 0
    limit = self.chunk_size
    while offset < total:
        entities = list(self.query.fetch(offset=offset, limit=limit))

        for entity in entities:
            yield entity

        # A short chunk means the underlying query is exhausted.
        if len(entities) < limit:
            break

        offset += self.chunk_size

Thanks!


--

--
Luke Benstead - Head of Engineering
p.ota.to - Interested in working with us? We're hiring!

   

Potato London Limited a company registered in England and Wales with company number 07178897 at 18 Upper Ground Sea Containers, London, England, SE1 9GL VAT Reg No GB988351763. This e-mail communication, including any attachment, is intended only for the individual(s) or entity named above and to others who have been specifically authorised to receive it. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer does not consent to email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of Potato London Limited shall be understood as neither given nor endorsed by it.

Jacob Gur

Jun 26, 2024, 5:05:43 AM
to Luke Benstead, djangae-users
Thanks Luke!

But it seems that the offset/limit code is always used to chunk fetch calls when a queryset has more than 50 entities, regardless of whether the caller specified an offset/limit. Am I understanding that correctly? If so, each chunk is fetched with a progressively larger offset. The limit bug I referenced earlier exacerbates the issue, since the code is using the limit like a high-water mark instead of a count.
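For a rough sense of the cost, assuming each skipped row still has to be walked by the backend (a hypothetical cost model; total_reads is made up for illustration):

```python
# Rough illustration: with offset-based chunking, the backend still walks
# over the skipped rows, so fetching N entities in chunks of C costs about
# offset + C row-reads per chunk, which grows quadratically in N/C.
def total_reads(n_entities, chunk_size):
    reads = 0
    offset = 0
    while offset < n_entities:
        fetched = min(chunk_size, n_entities - offset)
        reads += offset + fetched  # skipped rows + returned rows
        offset += fetched
    return reads

print(total_reads(1000, 50))    # 10500 row-reads with 50-entity chunks
print(total_reads(1000, 1000))  # 1000 row-reads in a single fetch
```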

Luke Benstead

Jun 26, 2024, 8:13:13 AM
to Jacob Gur, djangae-users
Oh! No, that's slightly different: we retrieve the query in chunks to avoid using up too much RAM while we merge the multiple result sets.

However, a hard limit of 50 is bad for performance, so I'm fixing that now. I'm still unsure what's causing the infinite loop...

Luke Benstead

Jun 26, 2024, 8:35:49 AM
to Jacob Gur, djangae-users

I'm still confused about why you were getting an infinite loop; that loop should only break when fetch stops returning entities. I've refactored it to make it a bit cleaner, so that might resolve it...

Jacob Gur

Jun 26, 2024, 11:07:26 AM
to Luke Benstead, djangae-users
Thanks Luke.

I guess I'm confused about how ChunkedResultset._fetch sets:
limit = offset + self.chunk_size
Should limit just be self.chunk_size?
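A tiny self-contained repro of what I mean, assuming the limit was recomputed each iteration (fetch and DATA here are stand-ins for self.query.fetch and the backing entities, not the real Djangae code):

```python
DATA = list(range(100))  # pretend datastore entities

def fetch(offset, limit):
    # limit behaves as a count of rows to return, not a high-water mark
    return DATA[offset:offset + limit]

def chunked(chunk_size, buggy=False):
    offset = 0
    while True:
        # buggy variant: limit = offset + chunk_size, as in _fetch today
        limit = offset + chunk_size if buggy else chunk_size
        entities = fetch(offset, limit)
        yield from entities
        if len(entities) < limit:
            break
        offset += chunk_size

print(len(list(chunked(10))))              # 100 - each row yielded once
print(len(list(chunked(10, buggy=True))))  # 200 - rows re-yielded
```

Because limit is a count, the second buggy fetch returns 20 rows but the offset only advances by one chunk, so subsequent fetches re-yield rows that were already emitted.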

Luke Benstead

Jun 26, 2024, 11:35:17 AM
to Jacob Gur, djangae-users
Oh!! Yes you're right! I'll fix in a bit :)

Jacob Gur

Jun 26, 2024, 11:40:07 AM
to Luke Benstead, djangae-users
Thank you Luke!