Very slow fetch page queries

Richard Cheesmar

unread,

Aug 31, 2017, 4:31:20 AM8/31/17

to Google App Engine

The following query seems to take an way too long
    
    t0 = time.time()
    
    # Fetch the entities and return result with cursor
    result, next_cursor, more = MyModel.query(MyModel.active == True, MyModel.offline == False).fetch_page(pagination, start_cursor=cursor)
    
    logging.info(time.time() - t0)



On the local development machine it takes nearly 2 seconds and there are less than 150 entities in the model

When live with around 800 entities it takes nearly 2 seconds to retrieve the first 250 entities and then around 4 seconds to retrieve the remainder. There are two separate requests to retrieve each set on the live machine. 

I've set the time on the query only, therefore no other processing is taken into account in these timings.


Enter code here...

Yannick (Cloud Platform Support)

unread,

Aug 31, 2017, 4:53:41 PM8/31/17

to Google App Engine

Hello Richard, since you're not mentioning it could you tell us whether or not you've defined indexes for this specific query? There's also no indication of how large these entities are. Do you get the same issues for other kind of entities?

You should read through the queries best practices and make sure you're following them all.

Richard Cheesmar

unread,

Sep 1, 2017, 4:29:33 AM9/1/17

to Google App Engine

Hi, Yannick,

Yes they are fairly large and have several pickleProperties with lists of properties and some json elements. Yes an Indexes are there for the two query params active and offline in that model

Richard Cheesmar

unread,

Sep 1, 2017, 4:42:49 AM9/1/17

to Google App Engine

Yannick,

Could you please point out why the reference to query best practices when the query is clearly defined above. It's a simple page fetch with two conditions, and as described is taking 2 seconds on average to fetch 250 results and 4 seconds for 550 results on the live version...

At the moment after the first time these are fetched I'm placing the results in memcache for subsequent calls. However, I am using the free memcache as the data size does not warrant the price for the minimum of 1 gig fixed memcache, and the free memcache is not guaranteed so there are times when I have to resort to the query. As it is, these times are unacceptable and will surely be a bigger problem when data sizes grow. I will of course resort to aync calls when each fetch reaches 1000, I'll split those into 100 batches, but still why is it so slow for even 250 results?

If I can understand why this is occurring then I can try to figure out a way around it, maybe it is a bottleneck in the way my model is formed, it's size of something else. However, if I am simply querying on the two entities as above, why would that effect it?

Thanks

Richard Cheesmar

unread,

Sep 1, 2017, 7:03:43 AM9/1/17

to Google App Engine

Interestingly, I have another query which is identical except for one more entity comparison

result, next_cursor, more = MyModel.query(MyModel.active == True, MyModel.offline == False,
 MyModel.location.alpha2 == alpha2).fetch_page(pagination, start_cursor=cursor)

This query is infinitely faster.

Are Boolean model properties indexed by default?

Yannick (Cloud Platform Support)

unread,

Sep 1, 2017, 2:04:30 PM9/1/17

to Google App Engine

Hello Richard,

I referred you to the best practices because depending on your use case any one of them might be the key to your performance issues. In Cloud Datastore there are only 2 factors that determine the speed of your query.

The first of those is indexing, keeping in mind that you cannot do a query without an index existing for that query. By default every single property of your entities has two indexes created for it, one for ascending and one for descending. Any query using only one of these fields will be fast and no further optimization is required.

If your query combines 2 fields or more with equality filters, as is the case in your example, those single property indexes will be queried separately and then merged, which is not a fast operation. That is why you need to make sure that you not only have indexes for both of those properties, but you also need to have a composite index explicitly defined with active and model in that order for your MyModel kind.

So something that could explain the difference in speed is if you do not have an explicit composite index for the slow query, but do have one for the fast query.

The second factor that determines the speed of your query is the total size of the dataset returned by your query. This total size is determined both by the number of entities returned and the size of those entities. Given this it is normal that a query for a result set twice as large would take twice as long. The other thing to consider is that you should only be querying for fields that you are going to be using.

For instance, as explained in the best practices, if the only fields you care about are active and online then you could use a projection query to retrieve only those fields.

I hope this helps answer your questions.

Richard Cheesmar

unread,

Sep 2, 2017, 12:41:35 PM9/2/17

to Google App Engine

Hi, Yannick,

I have solved the issue :).

What I had to do was take the main property of the model entities that I need to send to the client and and save them to another new model.

The subsequent query does not require any equality tests on properties just a straight fetch_page and returns the larger set of data in 0.000469923019409 ms compared to the 4 seconds it took the other query.

So the problem obviously lies in the amount of data that the model entities contain and have to retrieve. Is there some measurement for the degradation of speed in relationship to entity size documented somewhere?

The only downfall in this is that I had to do a large amount of redesign and have the same data in two places. The later of which cannot be avoided without major redesign.

Anyway, I'm a happy dev this evening.

Alex Martelli

unread,

Sep 2, 2017, 2:10:15 PM9/2/17

to google-a...@googlegroups.com

To fetch only some of an entity's properties, consider a projection query, e.g https://cloud.google.com/appengine/docs/standard/python/datastore/projectionqueries

Alex

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine+unsubscribe@googlegroups.com.
To post to this group, send email to google-appengine@googlegroups.com.
Visit this group at https://groups.google.com/group/google-appengine.
To view this discussion on the web visit https://groups.google.com/d/msgid/google-appengine/bc0d605e-4f12-49af-8cdf-9a0c1f53b097%40googlegroups.com.

For more options, visit https://groups.google.com/d/optout.

Richard Cheesmar

unread,

Sep 3, 2017, 1:44:55 AM9/3/17

to Google App Engine

I couldn't use a projection query on the PickleProperty I wanted it was too large to index

To unsubscribe from this group and stop receiving emails from it, send an email to google-appengi...@googlegroups.com.
To post to this group, send email to google-a...@googlegroups.com.

Visit this group at https://groups.google.com/group/google-appengine.

Yannick (Cloud Platform Support)

unread,

Sep 4, 2017, 9:26:08 AM9/4/17

to Google App Engine

Hello Richard, I'm glad to hear you were able to resolve your issue. There is no explicitly documented relation between the latency and the size of the result set, other than to say the former is dependent on the latter. This should effectively be a linear relationship.

Reply all

Reply to author

Forward