Querying a small amount of data (30 entities) from datastore is excessively slow.

6 views
Skip to first unread message

John

unread,
Jun 4, 2008, 12:33:22 AM6/4/08
to Google App Engine
Hi There,

I am having trouble determining why getting a small amount data from
the datastore is excessively slow. I have done a fair bit of debugging
to try to isolate what is happening but can't seem to make any
progress.

I am writing an RFID application which receives messages from 1 or
more RFID devices and uses the data to triangulate the location of an
RFID tag. I have a page which I use for debugging my app which loads
the 30 most recent messages received from the RFID gateways. My
problem is that the page that loads the messages loads extremely slow.

In my case I have approximately 1100 messages in my datastore on my
machine (I have been using the SDK) and it is taking 14 seconds to
load the page. Initially I thought that perhaps there was something
about the SDK that was causing things to run slow so I tried uploading
my application to the server but it still runs slow and I am getting
messages in the logs saying:

"This request used a high amount of CPU, and was roughly 2.1 times
over the average request CPU limit. High CPU requests have a small
quota, and if you exceed this quota, your app will be temporarily
disabled."

What is strange is that when I view the contents of the datastore
using the Data Viewer, the page loads extremely quickly. I can't see
why what I am doing is any different than using the data viewer.

Here is some example code:

class RFindRFIDMessage(db.Model):
xml = db.TextProperty()
date = db.DateTimeProperty(auto_now_add=True)
type = db.IntegerProperty()
tagID = db.IntegerProperty()
state = db.IntegerProperty()
dateTimeReaderReceived = db.DateTimeProperty()
rawSignalStrength = db.IntegerProperty()
signalStrengthDBm = db.IntegerProperty()
gatewayID = db.IntegerProperty()
# References
gateway = db.ReferenceProperty(Gateway)
tag = db.ReferenceProperty(Tag)

The code that loads the messages is as follows:

class Messages(webapp.RequestHandler):
def get(self):
messagesQuery = RFindRFIDMessage.all().order('-date')
messages = messagesQuery.fetch(30)

templateValues = {
'messages': messages
}

path = os.path.join(os.path.dirname(__file__),
'messages.html')
self.response.out.write(template.render(path, templateValues))

messages.html is a DJango template that simply iterates over the
messages and displays them in a table:

<tr>
<th>Message Type</th>
<th>Gateway ID</th>
<th>Tag ID</th>
<th>Tag Type</th>
<th>Tag State</th>
<th>Date/Time Added</th>
<th>Date/Time Gateway Received</th>
<th>Signal Strength</th>
</tr>
{% for message in messages %}
<tr>
<td>{{ message.type }}</td>
<td>{{ message.gatewayID }}</td>
<td>{{ message.tagID }}</td>
<td>{{ message.tag.type }}</td>
<td>{{ message.state }}</td>
<td>{{ message.date }}</td>
<td>{{ message.dateTimeReaderReceived }}</td>
<td>{{ message.signalStrengthDBm }}</td>
</tr>
<tr>
<td colspan="8">
<table>
<tr>
<th>Tag ID</th>
<th>Signal Strength</th>
<th>Link Quality</th>
</tr>
{% for refTagResp in message.referencetagresponse_set %}
<tr>
<td>{{ refTagResp.tag.tagID }}</td>
<td>{{ refTagResp.signalStrengthDBm }}</td>
<td>{{ refTagResp.linkQuality }}</td>
</tr>
{% endfor %}
</table>
</td>
</tr>
{% endfor %}


- I have verified that an index has been created on the date property
of the RFindRFIDMessage by looking at the index.yaml file
- I have tried removing the ordering by date but I still get the same
results
- I have tried commenting out the usage of the back reference but I
still get the same results
- I have tried reducing the fetch method call so only one message is
returned but that still takes 5 seconds to complete execution
- I have cleared the datastore and initially the page runs very
quickly but the execution time appears to grow linearly with the
number of messages in the datastore
- I have tried setting messages = None after the call to get the data
from the datastore to see if most of the time is being taken in
processing the template; this also reduces the execution time to 5
seconds
- While doing all of these tests, there are no other users on my local
machine and nothing else using the application

So far if I am doing something wrong it isn't immediately obvious. If
anyone has any insight, please let me know.

Cheers,

John

Ross Ridge

unread,
Jun 4, 2008, 3:40:13 AM6/4/08
to Google App Engine
John wrote:
> In my case I have approximately 1100 messages in my datastore on my
> machine (I have been using the SDK) and it is taking 14 seconds to
> load the page. Initially I thought that perhaps there was something
> about the SDK that was causing things to run slow so I tried uploading
> my application to the server but it still runs slow and I am getting
> messages in the logs saying:
>
> "This request used a high amount of CPU, and was roughly 2.1 times
> over the average request CPU limit. High CPU requests have a small
> quota, and if you exceed this quota, your app will be temporarily
> disabled."

What was the actual time for this request? You can find it in the
logs for you app. My guess that it was about 500 ms. That's not too
bad for an atypical worst case. What happens to the request times
after you repeat the request a few times?
...
> - While doing all of these tests, there are no other users on my local
> machine and nothing else using the application
>
> So far if I am doing something wrong it isn't immediately obvious. If
> anyone has any insight, please let me know.

Don't try to measure the performance of the datastore using the SDK
running on your machine. It's performance characteristics are nothing
like the real thing. You'll need to do your testing on the actual
servers.

Ross Ridge

Marzia Niccolai

unread,
Jun 4, 2008, 1:05:09 PM6/4/08
to google-a...@googlegroups.com
Hi,
 Ross is exactly right, but it bears repeating.  The performance of the development web server will always be slower than Google App Engine, but this is especially true with the datastore.

 The development web server stores all of the data in a single file, so the more entities that you load, the slower the performance becomes.  This will not be a factor when you upload your application.

-Marzia

John

unread,
Jun 4, 2008, 10:15:31 PM6/4/08
to Google App Engine
Hi Ross,

I can't see where I can single out the time taken for a single request
except by using the dashboard viewer and selecting milliseconds /
request.

With 44 messages in my datastore on the server and requesting the page
that displays the last 30 messages, the length of time per request is
spiking to 2600 milliseconds. I am using the profiler as well so I
would expect that this will slow things down a bit.

You are right in that I do see a speedup when running my app on
google's servers versus using the SDK.

In my case with similar data (not the same data but same number of
messages, 44 in this case), using the profiler I can see that the same
request takes 2.467 CPU seconds on google's servers and 6.426 CPU
seconds using the SDK. I am still not thrilled about the performance
though because when I use the dataviewer to view the messages the page
loads instantaneously.

I am going to do some more investigation to see if there is anything I
can do to make my app page load faster. I am also concerned because I
am still getting messages about exceeding the average CPU quota.

Thanks,

John

John

unread,
Jun 4, 2008, 10:31:46 PM6/4/08
to Google App Engine
Update:

If I reduce the number of entities to 10 I see significant speedup,
i.e. using the profiler on the same dataset, the request completes in
0.320 CPU seconds. Perhaps the datastore is optimized to return small
numbers of entities.

Anyways, if I find anything else I will post again.

Ross Ridge

unread,
Jun 5, 2008, 2:50:41 AM6/5/08
to Google App Engine
John wrote:
> I can't see where I can single out the time taken for a single request
> except by using the dashboard viewer and selecting milliseconds /
> request.

If you look in the "Logs" section, set the filter to Warning or lower,
you should see entries like this:

06-04 08:53PM 45.313 /search_library?
artist=&style=Standard&type=Action 200 517ms 206 kb ...
See details
W 06-04 08:53PM 45.815 This request used a high amount of CPU, and
was ro

In this case the request took 517ms. If the URL is long then the
numbers might not be visible in your browser, but you can either view
the HTML source or cut and paste the entry to see it all.

Ross Ridge

DennisP

unread,
Jun 5, 2008, 4:49:46 PM6/5/08
to Google App Engine
"Perhaps the datastore is optimized to return small numbers of
entities."

...this would be a nasty state of affairs for anyone who, like me,
wants to write a discussion forum on GAE. Slashdot frequently has
hundreds of comments on an article, and occasionally over a thousand.
> > >                             Ross Ridge- Hide quoted text -
>
> - Show quoted text -

peterk

unread,
Jun 5, 2008, 6:02:49 PM6/5/08
to Google App Engine
I've not looked closely at your code, but out of interest I through 35
entities into my datastore. I have a page that retrieves them all, and
spits them out into XML. Obviously there are differences between my
entities and yours, but FWIW, the page loads as fast now with 35
entities as it did with 5 (i.e. very).

I am similarly retrieving all, ordering based on a date, and sending
to a template to display as a XML document.

Brett Morgan

unread,
Jun 5, 2008, 7:37:43 PM6/5/08
to google-a...@googlegroups.com
Question for ya. What happens if you keep the entire comment tree as a pickled blob inside an article entity?

Yes it's a pain on several fronts - you get contention on comment posts, it's nasty to use inside of the admin tool, you can't index the pickled blob.

But you get the ability to render the comment tree nice and fast. =)
--

Brett Morgan http://brett.morgan.googlepages.com/
Reply all
Reply to author
Forward
0 new messages