Also, gcloudc turned a Django query into two Datastore queries based on a filter property__overlap=['value1', 'value2']. Should it just use Datastore's new native IN operator? gcloudc would still need to branch if there are more than 10 values.

Thanks!

On Wed, Jul 3, 2024 at 2:05 PM Jacob Gur <ja...@fareclock.com> wrote:

Also, the cost in latency and billing is huge.

On Wed, Jul 3, 2024 at 2:05 PM Jacob Gur <ja...@fareclock.com> wrote:

Thanks! Yeah, it seems like Google steers people away from offsets, so they probably haven't paid much attention to bugs with them.

On Wed, Jul 3, 2024 at 2:03 PM Luke Benstead <lu...@potatolondon.com> wrote:

This looks like a massive bug! I'll look into it a bit more tomorrow.

As for cursors: they don't work for all queries, so they'd have to be conditional (we should definitely make use of them, though!)

On Wed, 3 Jul 2024, 18:57 Jacob Gur <ja...@fareclock.com> wrote:

I think I found the issue, and it has to do with running offset queries on Datastore.

I wrote a local Python script against Datastore with a query that fetches 16 entities. Then I added a fetch offset of 2000, and it took 2-3 minutes to run and still returned 8 entities. I would have expected it to return quickly with no results, but it does not. See the script below, with some values redacted.

With this behavior, ChunkedResultset._fetch could turn into an infinite loop. In fact, for this very same query running on GAE, the logging statement logs the following:

> INFO 2024-07-03T16:50:49.633830Z datastore fetch query=<google.cloud.datastore.query.Query object at 0x3e7f82590b60> filters=[<active = 'True'>, <account_id = '111'>, <roles = 'worker'>, <labels = '222'>] offset=0 limit=2000
> INFO 2024-07-03T16:50:51.247278Z datastore fetch query=<google.cloud.datastore.query.Query object at 0x3e7f82590b60> filters=[<active = 'True'>, <account_id = '111'>, <roles = 'worker'>, <labels = '222'>] offset=2000 limit=2000
> INFO 2024-07-03T16:52:33.535224Z datastore fetch query=<google.cloud.datastore.query.Query object at 0x3e7f82590b60> filters=[<active = 'True'>, <account_id = '111'>, <roles = 'worker'>, <labels = '222'>] offset=4000 limit=2000
> INFO 2024-07-03T16:55:52.074123Z datastore fetch query=<google.cloud.datastore.query.Query object at 0x3e7f82590b60> filters=[<active = 'True'>, <account_id = '111'>, <roles = 'worker'>, <labels = '222'>] offset=6000 limit=2000

As you can see, it just keeps looping, apparently because each fetch unexpectedly yields some results, and eventually GAE times out.

It seems like Datastore has some bug with offsets, and their docs say to avoid them. Should we perhaps just remove that optimization?
from google.cloud import datastore


def run():
    ds_client = datastore.Client(project='my-project')
    query = ds_client.query(kind='my-model')
    query.add_filter(filter=datastore.query.PropertyFilter('active', '=', True))
    query.add_filter(filter=datastore.query.PropertyFilter('account_id', '=', 1111))
    query.add_filter(filter=datastore.query.PropertyFilter('roles', '=', 'worker'))
    query.add_filter(filter=datastore.query.PropertyFilter('labels', '=', 222))

    count = 0
    for entity in query.fetch(offset=2000):
        count += 1

    print(f'Fetched {count} entities')
if __name__ == '__main__':
    run()

On Wed, Jul 3, 2024 at 1:27 PM Jacob Gur <ja...@fareclock.com> wrote:

I need to add some more logging. Let me get back to you. Thanks.

On Wed, Jul 3, 2024 at 1:19 PM Luke Benstead <lu...@potatolondon.com> wrote:

Hi Jacob,

I think there's a problem with your fix: if a chunk returns exactly as many items as the limit, it'll go around the loop again and hit the same problem. I think your fix reduces the chance of hitting it but doesn't totally fix it.

Any idea what is being returned when you hit the infinite loop? What the yielded count is? This is very odd...

On Wed, 3 Jul 2024, 18:16 Jacob Gur <ja...@fareclock.com> wrote:

Hi Luke!

There still appears to be a query loop bug inside the ChunkedResultset._fetch method. See the screenshot diff below for the line of code and what it needs to change to. The impact of this bug is that it will do an extra loop with a large offset. I don't know why, but even if there are fewer results than the offset (i.e., no results at that offset), the Datastore query with a large offset takes a very long time until GAE raises the deadline-exceeded exception.

Ignore the logging statement there, which I added for debugging.

Thanks!
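[Editor's note] A toy model of the chunked-fetch loop discussed above may make the failure mode clearer. All names here are invented and do not match the real gcloudc code; this is only a sketch of the two termination conditions under discussion.

```python
# Hypothetical model of a chunked offset/limit fetch loop. With the original
# "stop only on an empty chunk" condition, a backend that wrongly keeps
# returning a few entities at any offset (as the logs above suggest) never
# terminates. Stopping on any *short* chunk terminates on the first round,
# though, as noted above, a chunk of exactly `limit` items still triggers
# one more round trip.

def chunked_fetch(run_query, limit, stop_on_short_chunk, max_rounds=10):
    """Fetch in offset/limit chunks; returns (results, rounds, exhausted)."""
    results, offset, rounds = [], 0, 0
    while rounds < max_rounds:
        rounds += 1
        chunk = run_query(offset=offset, limit=limit)
        results.extend(chunk)
        if stop_on_short_chunk:
            if len(chunk) < limit:   # fixed: a short chunk means no more data
                return results, rounds, True
        else:
            if not chunk:            # original: only an *empty* chunk stops us
                return results, rounds, True
        offset += limit
    return results, rounds, False    # gave up: would have looped forever


def buggy_backend(offset, limit):
    # Misbehaves the way the reproduction above shows: 8 entities come back
    # no matter how large the offset is.
    return ['entity'] * 8


results, rounds, done = chunked_fetch(buggy_backend, limit=2000, stop_on_short_chunk=False)
print(done)          # False: never sees an empty chunk, spins until the cap

results, rounds, done = chunked_fetch(buggy_backend, limit=2000, stop_on_short_chunk=True)
print(done, rounds)  # True 1: the first chunk (8 < 2000) ends the loop
```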
Potato London Limited a company registered in England and Wales with company number 07178897 at 18 Upper Ground Sea Containers, London, England, SE1 9GL VAT Reg No GB988351763. This e-mail communication, including any attachment, is intended only for the individual(s) or entity named above and to others who have been specifically authorised to receive it. Privileged/Confidential Information may be contained in this message. If you are not the addressee indicated in this message (or responsible for delivery of the message to such person), you may not copy or deliver this message to anyone. In such case, you should destroy this message and kindly notify the sender by reply email. Please advise immediately if you or your employer does not consent to email for messages of this kind. Opinions, conclusions and other information in this message that do not relate to the official business of Potato London Limited shall be understood as neither given nor endorsed by it.
--
You received this message because you are subscribed to the Google Groups "djangae-users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to djangae-user...@googlegroups.com.
To view this discussion on the web, visit https://groups.google.com/d/msgid/djangae-users/CAArNv2JnZsuXdWxcW%2BbhSdDPnwsQBFKOfMwvLmSCSHYkgwKj1A%40mail.gmail.com.
import time

from google.cloud import datastore


def run():
    ds_client = datastore.Client(project='my-project')
    query = ds_client.query(kind='my-kind')
    query.add_filter(filter=datastore.query.PropertyFilter('active', '=', True))
    query.add_filter(filter=datastore.query.PropertyFilter('account_id', '=', 111))
    query.add_filter(filter=datastore.query.PropertyFilter('roles', '=', 'worker'))
    query.add_filter(filter=datastore.query.PropertyFilter('labels', '=', 222))

    start = time.time()
    count = 0
    for entity in query.fetch():
        count += 1
    end = time.time()
    print(f'Fetched all {count} entities in {end - start} seconds')

    real_count = count
    for offset in range(0, real_count + 2):
        start = time.time()
        count = 0
        for entity in query.fetch(offset=offset):
            count += 1
        end = time.time()
        print(f'Fetched at offset={offset} {count} entities in {end - start} seconds')


if __name__ == '__main__':
    run()
(env) [tools] $ python development/test_query.py
Fetched all 16 entities in 1.8421251773834229 seconds
Fetched at offset=0 16 entities in 1.5357460975646973 seconds
Fetched at offset=1 15 entities in 1.4241530895233154 seconds
Fetched at offset=2 14 entities in 1.3997318744659424 seconds
Fetched at offset=3 13 entities in 1.454319953918457 seconds
Fetched at offset=4 12 entities in 1.4750661849975586 seconds
Fetched at offset=5 11 entities in 1.4487709999084473 seconds
Fetched at offset=6 10 entities in 1.6332170963287354 seconds
Fetched at offset=7 9 entities in 1.4814810752868652 seconds
Fetched at offset=8 8 entities in 1.4843201637268066 seconds
Fetched at offset=9 7 entities in 1.4107389450073242 seconds
Fetched at offset=10 6 entities in 1.2657389640808105 seconds
Fetched at offset=11 5 entities in 1.359346866607666 seconds
Fetched at offset=12 4 entities in 1.2958650588989258 seconds
Fetched at offset=13 15 entities in 2.125680923461914 seconds
Fetched at offset=14 14 entities in 2.1337952613830566 seconds
Fetched at offset=15 13 entities in 2.309364080429077 seconds
Fetched at offset=16 12 entities in 2.101933002471924 seconds
Fetched at offset=17 11 entities in 2.0870232582092285 seconds
Fetched all 47 entities in 2.624439001083374 seconds
Fetched at offset=0 47 entities in 2.8041152954101562 seconds
Fetched at offset=1 46 entities in 2.3104848861694336 seconds
Fetched at offset=2 45 entities in 2.514388084411621 seconds
Fetched at offset=3 44 entities in 2.5804989337921143 seconds
Fetched at offset=4 43 entities in 2.185914993286133 seconds
Fetched at offset=5 42 entities in 2.1105079650878906 seconds
Fetched at offset=6 41 entities in 1.9990060329437256 seconds
Fetched at offset=7 40 entities in 1.9735119342803955 seconds
Fetched at offset=8 39 entities in 1.8566601276397705 seconds
Fetched at offset=9 38 entities in 1.9719581604003906 seconds
Fetched at offset=10 37 entities in 1.9301698207855225 seconds
Fetched at offset=11 36 entities in 2.037950038909912 seconds
Fetched at offset=12 35 entities in 1.9101719856262207 seconds
Fetched at offset=13 34 entities in 2.175400972366333 seconds
Fetched at offset=14 33 entities in 1.993534803390503 seconds
Fetched at offset=15 32 entities in 2.045619010925293 seconds
Fetched at offset=16 31 entities in 1.7902143001556396 seconds
Fetched at offset=17 30 entities in 1.9727048873901367 seconds
Fetched at offset=18 29 entities in 1.8399991989135742 seconds
Fetched at offset=19 28 entities in 1.8167150020599365 seconds
Fetched at offset=20 27 entities in 1.9758448600769043 seconds
Fetched at offset=21 26 entities in 1.971527099609375 seconds
Fetched at offset=22 25 entities in 1.9573183059692383 seconds
Fetched at offset=23 24 entities in 2.007323741912842 seconds
Fetched at offset=24 23 entities in 2.0852410793304443 seconds
Fetched at offset=25 22 entities in 1.9327712059020996 seconds
Fetched at offset=26 21 entities in 2.124237060546875 seconds
Fetched at offset=27 20 entities in 2.164341688156128 seconds
Fetched at offset=28 19 entities in 2.2032740116119385 seconds
Fetched at offset=29 18 entities in 2.085136890411377 seconds
Fetched at offset=30 17 entities in 2.1753549575805664 seconds
Fetched at offset=31 16 entities in 2.2737879753112793 seconds
Fetched at offset=32 15 entities in 2.457875967025757 seconds
Fetched at offset=33 14 entities in 2.5094480514526367 seconds
Fetched at offset=34 13 entities in 2.2802958488464355 seconds
Fetched at offset=35 12 entities in 2.217987060546875 seconds
Fetched at offset=36 11 entities in 2.2501230239868164 seconds
Fetched at offset=37 46 entities in 3.4906251430511475 seconds
Fetched at offset=38 45 entities in 3.5711519718170166 seconds
Fetched at offset=39 44 entities in 3.6711418628692627 seconds
Fetched at offset=40 43 entities in 3.581292152404785 seconds
Fetched at offset=41 42 entities in 3.4699208736419678 seconds
Fetched at offset=42 41 entities in 3.5942070484161377 seconds
Fetched at offset=43 40 entities in 3.5041091442108154 seconds
Fetched at offset=44 39 entities in 3.6253280639648438 seconds
Fetched at offset=45 38 entities in 3.8075268268585205 seconds
Fetched at offset=46 37 entities in 3.6504292488098145 seconds
Fetched at offset=47 36 entities in 3.92195725440979 seconds
Fetched at offset=48 35 entities in 3.9876527786254883 seconds
Fetched all 47 entities in 1.096101999282837 seconds
Fetched at offset=0 47 entities in 0.8123199939727783 seconds
Fetched at offset=1 46 entities in 0.7693791389465332 seconds
Fetched at offset=2 45 entities in 0.7635159492492676 seconds
Fetched at offset=3 44 entities in 0.8107190132141113 seconds
Fetched at offset=4 43 entities in 0.8266890048980713 seconds
Fetched at offset=5 42 entities in 0.8234360218048096 seconds
Fetched at offset=6 41 entities in 0.7394850254058838 seconds
Fetched at offset=7 40 entities in 0.9313828945159912 seconds
Fetched at offset=8 39 entities in 0.9470729827880859 seconds
Fetched at offset=9 38 entities in 0.8133449554443359 seconds
Fetched at offset=10 37 entities in 0.8025968074798584 seconds
Fetched at offset=11 36 entities in 0.7505018711090088 seconds
Fetched at offset=12 35 entities in 0.7745420932769775 seconds
Fetched at offset=13 34 entities in 0.7269790172576904 seconds
Fetched at offset=14 33 entities in 0.7658612728118896 seconds
Fetched at offset=15 32 entities in 0.7415847778320312 seconds
Fetched at offset=16 31 entities in 0.8065919876098633 seconds
Fetched at offset=17 30 entities in 0.6893661022186279 seconds
Fetched at offset=18 29 entities in 0.7347326278686523 seconds
Fetched at offset=19 28 entities in 0.6246979236602783 seconds
Fetched at offset=20 27 entities in 0.5889348983764648 seconds
Fetched at offset=21 26 entities in 0.5839731693267822 seconds
Fetched at offset=22 25 entities in 0.5588181018829346 seconds
Fetched at offset=23 24 entities in 0.60282301902771 seconds
Fetched at offset=24 23 entities in 0.623826265335083 seconds
Fetched at offset=25 22 entities in 0.544029951095581 seconds
Fetched at offset=26 21 entities in 0.5274021625518799 seconds
Fetched at offset=27 20 entities in 0.5954887866973877 seconds
Fetched at offset=28 19 entities in 0.6200449466705322 seconds
Fetched at offset=29 18 entities in 0.546410083770752 seconds
Fetched at offset=30 17 entities in 0.5713531970977783 seconds
Fetched at offset=31 16 entities in 0.5711748600006104 seconds
Fetched at offset=32 15 entities in 0.7145779132843018 seconds
Fetched at offset=33 14 entities in 0.5594689846038818 seconds
Fetched at offset=34 13 entities in 0.5883598327636719 seconds
Fetched at offset=35 12 entities in 0.5411069393157959 seconds
Fetched at offset=36 11 entities in 0.6010549068450928 seconds
Fetched at offset=37 10 entities in 0.7733020782470703 seconds
Fetched at offset=38 9 entities in 0.7477700710296631 seconds
Fetched at offset=39 8 entities in 0.7131767272949219 seconds
Fetched at offset=40 7 entities in 0.6722078323364258 seconds
Fetched at offset=41 6 entities in 0.6086330413818359 seconds
Fetched at offset=42 5 entities in 0.6574337482452393 seconds
Fetched at offset=43 4 entities in 0.6047208309173584 seconds
Fetched at offset=44 3 entities in 0.6393558979034424 seconds
Fetched at offset=45 2 entities in 0.6811807155609131 seconds
Fetched at offset=46 1 entities in 0.6426198482513428 seconds
Fetched at offset=47 0 entities in 1.0678982734680176 seconds
Fetched at offset=48 0 entities in 0.6798508167266846 seconds
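[Editor's note] For comparison, a well-behaved offset query over N total results should yield max(0, N - offset) entities, so the counts should fall monotonically to zero:

```python
# Expected counts for the 16-entity run above, assuming correct offset
# handling. The actual output instead jumps back up (e.g. 15 entities at
# offset=13), which is the anomaly under discussion.
N = 16
expected = [max(0, N - offset) for offset in range(N + 2)]
print(expected)
```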
Iterating pages doesn't seem to help:

import time

from google.cloud import datastore


def _create_query(ds_client: datastore.Client) -> datastore.Query:
    query = ds_client.query(kind='my_kind')
    query.add_filter(filter=datastore.query.PropertyFilter('active', '=', True))
    query.add_filter(filter=datastore.query.PropertyFilter('account_id', '=', 111))
    query.add_filter(filter=datastore.query.PropertyFilter('roles', '=', 'worker'))
    query.add_filter(filter=datastore.query.PropertyFilter('labels', '=', 222))
    return query


def _iterate_query_iterator(query_iterator):
    start = time.time()
    count = 0
    for page in query_iterator.pages:
        for entity in page:
            count += 1
    end = time.time()
    return count, (end - start)


def run():
    ds_client = datastore.Client(project='my-project')

    count, elapsed = _iterate_query_iterator(_create_query(ds_client).fetch())
    print(f'Fetched all {count} entities in {elapsed} seconds')

    real_count = count
    for offset in range(0, real_count + 2):
        count, elapsed = _iterate_query_iterator(_create_query(ds_client).fetch(offset=offset))
        print(f'Fetched at offset={offset} {count} entities in {elapsed} seconds')


if __name__ == '__main__':
    run()
Fetched all 16 entities in 1.5504603385925293 seconds
Fetched at offset=0 16 entities in 1.3184325695037842 seconds
Fetched at offset=1 15 entities in 1.3266000747680664 seconds
Fetched at offset=2 14 entities in 1.549025058746338 seconds
Fetched at offset=3 13 entities in 1.5165982246398926 seconds
Fetched at offset=4 12 entities in 1.3854012489318848 seconds
Fetched at offset=5 11 entities in 1.448840856552124 seconds
Fetched at offset=6 10 entities in 1.2203941345214844 seconds
Fetched at offset=7 9 entities in 1.526397943496704 seconds
Fetched at offset=8 8 entities in 1.5369889736175537 seconds
Fetched at offset=9 7 entities in 1.2339091300964355 seconds
Fetched at offset=10 6 entities in 1.290881633758545 seconds
Fetched at offset=11 5 entities in 1.2578182220458984 seconds
Fetched at offset=12 4 entities in 1.3468360900878906 seconds
Fetched at offset=13 15 entities in 2.0339272022247314 seconds
Fetched at offset=14 14 entities in 1.9105310440063477 seconds
Fetched at offset=15 13 entities in 2.0260910987854004 seconds
Fetched at offset=16 12 entities in 1.8350520133972168 seconds
Fetched at offset=17 11 entities in 2.0160040855407715 seconds

I also opened a Google Cloud Support ticket over this issue. It seems like a backend issue when using offset with a zig-zag merge and list properties.

On Fri, Jul 5, 2024 at 2:36 PM Luke Benstead <lu...@potatolondon.com> wrote:

I've created an MR with cursor support here: https://gitlab.com/potato-oss/google-cloud/django-gcloud-connectors/-/merge_requests/161

I've also slightly changed how we iterate the query (using the `.pages` attribute of the iterator, rather than iterating the iterator itself). I have no idea if it will work around this, but it's worth a try...
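[Editor's note] A toy model of the cursor-based pagination the MR above moves towards: rather than re-scanning from the start with a growing offset, each page returns an opaque cursor marking where the next page should resume. Everything here is invented for illustration (a plain integer stands in for Datastore's opaque cursor).

```python
# Simulated cursor pagination over an in-memory dataset. Unlike an offset,
# the cursor lets each page resume exactly where the previous one ended,
# with no repeated scanning of earlier results.

DATA = [f'entity-{i}' for i in range(47)]


def fetch_page(cursor=None, page_size=20):
    """Return (results, next_cursor); next_cursor is None when exhausted."""
    start = cursor or 0
    page = DATA[start:start + page_size]
    next_cursor = start + page_size if start + page_size < len(DATA) else None
    return page, next_cursor


results, cursor = [], None
while True:
    page, cursor = fetch_page(cursor)
    results.extend(page)
    if cursor is None:
        break

print(len(results))  # 47: three pages of 20, 20 and 7
```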
On Fri, 5 Jul 2024 at 19:22, Alessandro Artoni <art...@potatolondon.com> wrote:

Good luck with that - last time we reported something, we got an answer (not a fix) a year later 😬

I'm AFK, but it would be good to see if we can monkey-patch a fix or work around the bug in some way.
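[Editor's note] The IN-operator branching raised at the top of the thread could look roughly like this. The 10-value limit is taken from that message and should be checked against current Datastore quotas; the helper name is invented.

```python
# Sketch: use one native IN filter when the value list is small enough,
# otherwise split it into batches and run one query per batch, merging
# the results afterwards.

IN_LIMIT = 10  # assumed per-IN-filter value limit; verify against current docs


def in_filter_batches(values, limit=IN_LIMIT):
    """Split a value list into batches, each usable in a single IN filter."""
    return [values[i:i + limit] for i in range(0, len(values), limit)]


batches = in_filter_batches(list('abcdefghijkl'))
print(len(batches))  # 2: a batch of 10 values, then the remaining 2
```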