Datastore queries are unpredictably slow when a limit is applied

620 views
Skip to first unread message

Arjunkumar Udainath

unread,
Mar 4, 2017, 12:01:57 PM3/4/17
to Google App Engine
I have an app that's sort of like a to-do application. Any given user would have "Tasks" that are pending at any given time. The number of tasks pending for a user can range from 0 to 5000. I need to fetch the pending tasks (or at least a small batch) for each user when they load the page.

How do I make sense of these latencies:

Scenario 1:

User A: Has 2500 pending tasks. The query limit is 500 and results fetched on the first request are obviously 500. Time taken: 5767 milliseconds (5.7 seconds).

User B: Has 2 pending tasks. The query limit is 500 and results fetched on the first request are obviously 2. Time taken: 7124 milliseconds (7.1 seconds).

Scenario 2:

User A: Has 2500 pending tasks. The query limit is 10 and results fetched on the first request are obviously 10. Time taken: ~400 milliseconds (1/2 second).

User B: Has 2 pending tasks. The query limit is 10 and results fetched on the first request are obviously 2. Time taken: 5-6 seconds.

I thought I could speed up the query if I knew what the limit would be for users with fewer than 500. So there is a user with 551 tasks. Here's what happened:

Scenario 3:

User A: Has 2500 pending tasks. The query limit is 551 and results fetched on the first request are obviously 551. Time taken: 6251 milliseconds (6 seconds).

User B: Has 551 pending tasks. The query limit is 551 and results fetched on the first request are obviously 551. Time taken: 13053 milliseconds (13 seconds).

I can't find a pattern here. How do I optimize my queries so that users with less data can get their pages loaded quickly.

Adam (Cloud Platform Support)

unread,
Mar 4, 2017, 5:33:39 PM3/4/17
to Google App Engine
Can you let us know what language runtime you're using, preferably with a sample of code which shows your query and fetch options?

Arjunkumar Udainath

unread,
Mar 4, 2017, 5:45:52 PM3/4/17
to Google App Engine
Additional info:
The query has 4 "equals" filters on 4 different properties (3 strings and 1 Boolean). It does not have a sort order applied.

Arjunkumar Udainath

unread,
Mar 5, 2017, 11:12:53 AM3/5/17
to Google App Engine
My replies take a long time to get posted here since I'm new.

My application is written in Java and instance class of my app is F4. There are a total of 922,758 entities of that Kind in the Datastore.

Here's the function:

public static Map <String , Object> getEntitiesUsingQueryCursor( String kind , int limit , int chunkSize , String currentCursor, String account, String user, Boolean status, String dept ) throws Exception
        {

            String nextCursor = null;

            Entity entity = null;

            List <Entity> listOfEntity = new ArrayList <Entity>();

            Map <String , Object> result = new HashMap <String , Object>();


            DatastoreService datastore = DatastoreServiceFactory.getDatastoreService();
            com.google.appengine.api.datastore.Query q = new com.google.appengine.api.datastore.Query( kind );

List <Filter> listOfFilter = new ArrayList <Filter>();
Filter filter1 = new FilterPredicate( "accountID" , FilterOperator.EQUAL ,  account);
Filter filter2 = new FilterPredicate( "assigneeID" , FilterOperator.EQUAL ,  user);
Filter filter3 = new FilterPredicate( "departmentID" , FilterOperator.EQUAL ,  dept);
Filter filter4 = new FilterPredicate( "open" , FilterOperator.EQUAL ,  status); //Boolean
listOfFilter.add( filter1 );
listOfFilter.add( filter2 );
listOfFilter.add( filter3 );
listOfFilter.add( filter4 );
Filter filterParams1 = filterParams = CompositeFilterOperator.and( listOfFilter );
q.setFilter( filter );

            PreparedQuery pq = datastore.prepare( q );
            FetchOptions fetchOptions = FetchOptions.Builder.withLimit(limit).prefetchSize( chunkSize ).chunkSize( chunkSize );// limit & chunksize - 500

            if ( !StringUtil.isBlank( currentCursor ) )
                fetchOptions.startCursor( Cursor.fromWebSafeString( currentCursor ) );

            QueryResultIterable <Entity> results = pq.asQueryResultIterable( fetchOptions );
            QueryResultIterator <Entity> iterator = results.iterator();

            while ( iterator.hasNext() )
                {
                    entity = iterator.next();
                    listOfEntity.add( entity );
                }

            if(listOfEntity.size() == limit)
                nextCursor = iterator.getCursor().toWebSafeString();

            result.put( "cursor" , nextCursor );
            result.put( "entity" , listOfEntity );

            return result;
}

Adam (Cloud Platform Support)

unread,
Mar 6, 2017, 6:57:02 PM3/6/17
to google-a...@googlegroups.com
Datastore query times are not guaranteed to be consistent, as Datastore is a shared, distributed service. Since you're performing 4 equality filters this can add to the variation. The question of how to optimize for speed is a bit broad and depends on your data model. With that being said I can offer a few general suggestions:

1) You can speed up queries, trading extra storage and write costs, by defining manual indexes on properties which are frequently queried. See Datastore Indexes for more details on this as well as the related document Index Selection and Advanced Search. .

2) Related to the above, if you only need to fetch a subset of properties instead of the whole entity you can define indexes on those properties and use Projection Queries, which improves speed by fetching from the index instead of the entity.

3) Use Objectify. Objectify will automatically cache entities in Memcache, and will perform a keys-only query + cached get-by-key when possible (also called a "hybrid" query). Initial queries will be slower with this method but subsequent queries will be significantly faster as the entities will be pulled from Memcache.

4) Try to reduce the actual number of queries to the Datastore if possible. There are a pair of answers on Stack Overflow asked by the same user which mention two strategies for this. One is denormalization, which refers to intentionally duplicating properties across different entities (trading more storage for speed). Another is to prefetch and store in memory the results of a query on one kind and use it as a lookup for another (which offloads some of the filtering from the Datastore to your application, which can be faster in certain situations).

5) If possible, alter your data model to reduce the number of filters required for a query. For example, you may not need to have one entity with several foreign key relationships and instead be able split it into different entities / queries.

Arjunkumar Udainath

unread,
Mar 9, 2017, 8:31:54 AM3/9/17
to Google App Engine
Thanks for your help. Adding the index worked out for us. 

Thank you for providing the other suggestions as well. 
Reply all
Reply to author
Forward
0 new messages