How to determine datastore reads?

367 views
Skip to first unread message

J

unread,
Oct 12, 2011, 1:31:29 AM10/12/11
to google-a...@googlegroups.com

Since GAE has released the comparison billing for the new pricing model, I've been trying to "optimize" my app to reduce costs.  I was initially over the free quota limits for datastore reads and writes.  I was able to lower the datastore writes to fall under the free tier by eliminating certain writes and also by eliminating unnecessary indexes on certain properties.

But unfortunately, I can't seem to figure out where the reads are coming from.  I initially thought the reads were mostly coming from two cron jobs that I have configured:

1) A job that ran every 1 minute to query for and delete "expired" objects for 1 of my entities.  
2) A job that ran every 1 minute to clean up old session entities.  I'm actually using the com.google.apphosting.utils.servlet.SessionCleanupServlet to do this (which I found somewhere online). BTW - is there a better way to remove old sessions?

I changed both jobs to run every 60 minutes instead (thinking this would make a huge difference) but my datastore reads are still high and not too different from previous numbers.

I've also examined my app (not a lot of code) and don't see where I could be making so many datastore reads.

Anyone know how I can profile my app to see where datastore reads are coming from?  BTW I'm using the Java version of GAE.

Thanks,
J

Gerald Tan

unread,
Oct 12, 2011, 1:46:18 PM10/12/11
to google-a...@googlegroups.com
Are you using a lot of COUNT or OFFSET/LIMIT queries?

J

unread,
Oct 13, 2011, 12:36:22 AM10/13/11
to google-a...@googlegroups.com
Not really.  I have one COUNT query that is run during the aforementioned cron job (which runs once every 60 minutes now).  But the # of entities for that type is very low, typically between 1-10, so I wouldn't think the COUNT query would add too many reads.

Is there something about COUNT queries that I should be aware of?

J

unread,
Oct 13, 2011, 11:34:45 PM10/13/11
to google-a...@googlegroups.com
Anyone know a way I can profile my app to see where my datastore reads are coming from?

Rishi Arora

unread,
Oct 14, 2011, 12:43:03 AM10/14/11
to google-a...@googlegroups.com
Appstats
On Thu, Oct 13, 2011 at 10:34 PM, J <ja...@mapstagram.com> wrote:
Anyone know a way I can profile my app to see where my datastore reads are coming from?

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/HohHHRXvFmUJ.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

djidjadji

unread,
Oct 15, 2011, 9:17:02 AM10/15/11
to google-a...@googlegroups.com
Where in the Appstats result can I find the number of Datastore Read
Ops used in this request or in a certain RPC of the request?

Op 14 oktober 2011 06:43 heeft Rishi Arora <rishi...@ship-rack.com>
het volgende geschreven:

Rishi Arora

unread,
Oct 15, 2011, 3:39:49 PM10/15/11
to google-a...@googlegroups.com
Actually, appstats is more consistent with the old pricing model, and so, it only shows up number of datastore queries executed, puts and gets, for each RPC.  Here's a screenshot from the link I posted earlier:

The request history section shows recent requests, and if you expand a request, it'll show you datastore puts, gets, and queries. I believe gets translate to reads one-to-one, but puts depend on how many indexes get modified in the process.  One put will almost always translate to multiple data store writes (one of each index that gets updated, and one for the entity being modified)

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.

djidjadji

unread,
Oct 16, 2011, 8:38:55 AM10/16/11
to google-a...@googlegroups.com
The RPC.Get and RPC.RunQuery can result in many Datastore Read Ops.
They can return more then one object and thus read multiple index
entries and multiple objects.

Google is able to tally the Datastore Read and Write Ops. Why is there
no API call that will return these numbers for the current request?

I have changed part of my application to construct and read an
aggregate object (containing the attributes of many small objects).
The result is that the number of Datastore Read Ops dropped
considerable. I'm still having a large number of Read Ops left and
have no idea which requests are responsible for the bulk of them.

It takes a long time to find the culprit because the billing stats
lags quit some time.

Op 15 oktober 2011 21:39 heeft Rishi Arora <rishi...@ship-rack.com>
het volgende geschreven:

J

unread,
Oct 16, 2011, 11:13:53 AM10/16/11
to google-a...@googlegroups.com
I agree.  As I originally posted, I'm having difficulty figuring out where the datastore reads are coming from.  I installed AppStats, and it shows 3 "RunQuery"s and one "memcache.Get" per request.  So does this mean each request results in 3 datastore reads?  

Also, isn't memcache backed by the datastore?  So does this mean gets from memcache result in a datastore read?

Any have a better idea of how I can figure out where my datastore reads are coming from?  

MLTrim

unread,
Oct 18, 2011, 7:09:15 AM10/18/11
to google-a...@googlegroups.com
I'm in your same situation and I'm fighting to make the number of Datastore Reads low, but sadly without success.

At first sight I thought the culprit was Googlebot but also after some adjustment the number of reads are not changed.

Is it possible to know, with a clear example, how many Datastore Reads are hidden inside the Count and the Query operations?

I have 110K Datastore Reads operations in 24h with just 2K requests (Bots+Organic).. that means something like 10+ Datastore Reads per request (and I'm caching a lot of stuff!)

Please, help us!




Gerald Tan

unread,
Oct 18, 2011, 9:09:46 AM10/18/11
to google-a...@googlegroups.com
If I'm not wrong Count will cost 1 Datastore Small-op per Entity counted, so if the count is N, it will cost N Datastore Small-op.
Each Query operation should cost 1 Datastore Read per Entity returned.

Obviously, caching is the best method to bring down the cost, but it will also depend on your caching strategy,

MLTrim

unread,
Oct 19, 2011, 3:35:04 AM10/19/11
to google-a...@googlegroups.com
Is it possible to have an official answer from the App Engine team?
I think these informations are vital for our applications; how can I optimize something if I don't know what to optimize :)?

Having a simple Model like this:

class Example(db.Model):
foo = db.StringProperty()
bars= db.ListProperty(str)

and 1000 entities in the datastore, I'm interested in the cost of these kind of operations:

entities_count =  Example.all(keys_only = True).filter('bars=','spam').count()
entities_count = Example.all().count(10000)
entities = Example.all().fetch(10000)
entities = Example.all().filter('bars=','spam').filter('bars=','fu').fetch(10000)
entities = Example.all().fetch(10000, offset=500)

Cross-post here:
http://stackoverflow.com/questions/7807529/how-many-datastore-reads-consume-each-fetch-count-and-query-operations

Brandon Wirtz

unread,
Oct 19, 2011, 4:25:12 AM10/19/11
to google-a...@googlegroups.com

I build test cases for my codes subroutines and bench mark them. This is an important step in any Database testing.  I have worked with organizations that had full time staff who sat around all day coming up with different ways to get the same query result and documenting, speed, load, disk access, and memory requirements.

 

Enable billing, run 1000 operations in a loop, look at your usage. Then do the same thing with which ever other ways you are considering implementing.

 

I often do these tests BEFORE picking a platform for deployment.

--

You received this message because you are subscribed to the Google Groups "Google App Engine" group.

To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/xeuJzGl4jpYJ.

MLTrim

unread,
Oct 20, 2011, 5:34:46 AM10/20/11
to google-a...@googlegroups.com
Checkin for an entire day the "Datastore reads" quota value and the log requests, I have finally found the causes of the high number of Datastore reads.
1.Query with offset on pagination: I use cursors of course but I gave the option to fallback to query with offset in case of cache-miss or direct access to a given page.
2.Filters that emulates the startwith function using the + u'\ufffd'trick: it probably has to scan and fetch all the entities to compare the strings, resulting in a high count of Datastore reads.
3.Query that fetches large dataset: where applicable, I'm planning to aggregate and store data in blob in order to spend just one Datastore read to get thousands of entries.

My application lightly uses the count() function so the number of small Database reads seems to stay in a decent range.

Michele









Gerald Tan

unread,
Oct 20, 2011, 7:06:33 AM10/20/11
to google-a...@googlegroups.com

1.Query with offset on pagination: I use cursors of course but I gave the option to fallback to query with offset in case of cache-miss or direct access to a given page.

Yes... http://code.google.com/appengine/docs/python/datastore/queryclass.html#Query_fetch
You will be charged offset+limit Datastore Reads, for queries with limit/offset. Logically, the skipped entities *should* be charged under Datastore Small-ops since the entities aren't actually returned, but from what I've seen they are charged under Datastore Reads.
 
2.Filters that emulates the startwith function using the + u'\ufffd'trick: it probably has to scan and fetch all the entities to compare the strings, resulting in a high count of Datastore reads.

If the property is indexed, this shouldn't be necessary.

Reply all
Reply to author
Forward
0 new messages