Python: iterating over a large dataset


Andrea Cimino

unread,
Sep 14, 2016, 6:18:42 AM9/14/16
to Google App Engine
Hope this is the right place to ask this question:

I am using DjangoAppengine, a wrapper around the Python App Engine SDK that makes it possible to work with Django on App Engine.
I am trying to query a table that has more than 10,000 entries. Unfortunately App Engine raises this error:

Timeout: The datastore operation timed out, or the data was temporarily unavailable.

I am running my query on a backend to avoid the 60-second request limit imposed on the standard
machines, but I think there is still a 60-second time limit on datastore queries.
I tried to understand what happens by using Appstats, which produced the chart attached in the screenshot.
As you can see, there are a lot of Next() calls. I thought these would fetch the next batches in order to get all the results
and avoid the 60-second query limit, but that does not seem to be the case.

I know that cursors should be "the way" to avoid this issue.
This is a snippet from the Google-provided documentation:

# Start a query for all Person entities
people = Person.all()

# If the application stored a cursor during a previous request, use it
person_cursor = memcache.get('person_cursor')
if person_cursor:
    people.with_cursor(start_cursor=person_cursor)

# Iterate over the results
for person in people:
    # Do something

The question is: when I use the `for person in people:` statement,
should I break out of the for loop and collect the cursor to avoid the timeout issue?
From the code above it seems that I iterate over all the Person entities without caring about
the cursor.

Thanks,
Andrea




Screenshot_20160914_121033.png

Kaan Soral

unread,
Sep 15, 2016, 2:45:52 AM9/15/16
to Google App Engine
Without going into the code, you should:

a) Fetch ~100
b) Update Cursor
c) Stop if you are close to the time limit

so that you can re-run the routine with the cursor later on.

So basically, instead of `for person in people:`, use a while loop and fetch entities manually, rather than iterating over the query with the `in` operator.
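To make the a/b/c steps concrete, here is a minimal sketch of the control flow. The App Engine specifics from the thread (Person.all(), query.cursor(), memcache) are replaced with a list-backed stand-in, and the batch size and deadline values are illustrative assumptions, not something the SDK prescribes:

```python
import time

BATCH_SIZE = 100   # a) fetch ~100 entities per round trip (assumed value)
DEADLINE = 50.0    # c) seconds; stop well before the 60-second limit (assumed value)

def fetch_batch(dataset, cursor, size=BATCH_SIZE):
    """Stand-in for query.with_cursor(...).fetch(size): returns one batch
    plus the cursor position where the next run should resume."""
    batch = dataset[cursor:cursor + size]
    return batch, cursor + len(batch)

def process_some(dataset, cursor=0, deadline=DEADLINE, handle=lambda e: None):
    """Process batches until the data runs out or the deadline nears.
    Returns the cursor to persist (e.g. in memcache) for the next run,
    or None once everything has been processed."""
    start = time.time()
    while True:
        batch, cursor = fetch_batch(dataset, cursor)
        if not batch:
            return None                  # finished: clear the saved cursor
        for entity in batch:
            handle(entity)               # ... do something with the entity ...
        # b) cursor was updated after the fetch; c) stop near the deadline
        if time.time() - start > deadline:
            return cursor                # re-run the routine later from here
```

On App Engine, the returned cursor would be stored with memcache.set('person_cursor', cursor) and passed to query.with_cursor(start_cursor=...) on the next run, as in the documentation snippet above.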