Understanding keys-only query costs

245 views
Skip to first unread message

Ryan Chazen

unread,
Jan 4, 2013, 4:12:37 PM1/4/13
to google-a...@googlegroups.com
Hi,

Could anybody help me out with understand keys-only query costs?

I've done the following tests:

Fetch 2 items using a regular query (fully entity fetched)
- (3 RPCs, cost=350, billed_ops=[DATASTORE_READ:5])
Fetch 2 items using a keys-only query, and then fetch the full item (cache miss)
- (9 RPCs, cost=370, billed_ops=[DATASTORE_READ:5, DATASTORE_SMALL:2])
Fetch 2 items using a keys-only query, and fetch the full entity from memcache (cache hit)
- (5 RPCs, cost=230, billed_ops=[DATASTORE_READ:3, DATASTORE_SMALL:2])

So even with full cache hit, a keys-only query still uses 3 full datastore reads?
I'm assuming keys-only queries should only be used when the hit chance is high? eg only use keys-only queries on data that is expected to have been accessed recently.

Johan Euphrosine

unread,
Jan 4, 2013, 4:50:44 PM1/4/13
to Google App Engine .
Hi Ryan,

A query cost 1 read + 1 read per entity retrieved or 1 small read per key if that's a key only query as described in the documentation:

So fetching 2 items using a regular query should cost:
1 DATASTORE_READ (query) + 2 DATASTORE_READ (entity read)

Fetching 2 items using a key only query then a get_multi should cost:
1 DATASTORE_READ (query) + 2 DATASTORE_SMALL (key read) + 2 DATASTORE_READ (entity read)

Fetching 2 items using a key only query then getting them from the cache should cost:
1 DATASTORE_READ (query) + 2 DATASTORE_SMALL (key read)

I would recommend to experiment with the new AppStats shell (/_ah/stats/shell on your appstats enabled app) to figure out query cost, for example this query:
q = User.query(User.data == 'foo')
results in the following appstats trace if there is only 3 entities with data == 'foo':
@164ms datastore_v3.RunQuery real=10ms api=0ms cost=280 billed_ops=[DATASTORE_READ:4]

Here is slidedeck explaining some optimization pattern for datastore operations using interactive appstats examples (click the Run button on each slide).

Hope that helps, and let me know if you experience anything different.




--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/O7xPE09_GIwJ.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.



--
Johan Euphrosine (proppy)
Developer Programs Engineer
Google Developer Relations

Ryan Chazen

unread,
Jan 4, 2013, 4:59:41 PM1/4/13
to google-a...@googlegroups.com
That link - /_ah/stats/shell - does not seem to be working for me on either the dev server or the live server. I get a 404 error.

I think I found the issue - the query that is confusing me is using an 'IN' filter, eg property IN [1,2]
In this case, for a keys-only query, would each member of the 'IN' count as a full read? If I did IN[1,2,3,4,5,6,7,8,9,10] and the query returned no results, would this use 10 read ops?

Guido van Rossum

unread,
Jan 4, 2013, 5:49:34 PM1/4/13
to google-a...@googlegroups.com
On Fri, Jan 4, 2013 at 1:59 PM, Ryan Chazen <rya...@gmail.com> wrote:
> That link - /_ah/stats/shell - does not seem to be working for me on either
> the dev server or the live server. I get a 404 error.

Does /_ah/stats work for you? If it doesn't work, you haven't enabled
Appstats in your app.yaml (it's in the builtins section). If that does
work, you must configure the appstats shell in your
appengine_config.py for it to work in production, but it should be on
by default in the dev server. (Python only.)

--
--Guido van Rossum (python.org/~guido)

Ryan Chazen

unread,
Jan 5, 2013, 2:44:43 AM1/5/13
to google-a...@googlegroups.com, gu...@python.org

Ah - I'm using Java and not Python. Appstats does work, and I have set it up on /appstats - this is where I got the write costs in the original post.
I guess the /shell is not part of the Java version yet?

At any rate, above image is from doing an IN query filter across numbers 1..10 with none of them returning any results. Each query is run completely separately, and they are not even run concurrently!
This means that IN queries are basically useless on AppEngine - why do you even have support for them if they don't work? Are there any plans on the horizon to make 'IN' queries work as in every other nosql database?

Ryan Chazen

unread,
Jan 5, 2013, 6:04:41 AM1/5/13
to google-a...@googlegroups.com, gu...@python.org

Ouch ouch, it's even worse than expected. I ran an query with an IN filter with the same value repeated. eg IN[0,0,0,0,0,0,0,0,0,0,0] - The query runs completely in serial doing an independent query for each of the 0s. You are charged for the same entity returned multiple times, and the entity is returned to you multiple times even though it's obviously the same entity.

The IN filter should definitely be marked as deprecated in the SDK docs, and there should be a large warning against using it - it is ALWAYS better to use the async datastore to create multiple queries with an EQUALs filter - it will cost exactly the same, return the same data, and be far faster as it runs in parallel. The whole IN filter seems like a "user trap" on Google's part - at the minimum, the IN query should be changed to run in parallel!


alex

unread,
Jan 5, 2013, 7:37:28 AM1/5/13
to google-a...@googlegroups.com
are you running this on dev or production server?


--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/kJ4oHDPjEPoJ.

Johan Euphrosine

unread,
Jan 5, 2013, 7:59:12 AM1/5/13
to Google App Engine ., Guido van Rossum
Part of what your describing looks documented here:

"""
The IN operator compares value of a property to each item in a list. The IN operator is equivalent to many = queries, one for each value, that are ORed together. An entity whose value for the given property equals any of the values in the list can be returned for the query.

Note: The IN and != operators use multiple queries behind the scenes. For example, the IN operator executes a separate underlying datastore query for every item in the list. The entities returned are a result of the cross-product of all the underlying datastore queries and are de-duplicated. A maximum of 30 datastore queries are allowed for any single GQL query.

"""
If you experience a different behavior than the documented one I encourage you to fill a detailed defect on the public issue tracker (with Component-Datastore): https://code.google.com/p/googleappengine/issues/entry?template=Other%20defect

Or a documentation defect (with Component-Documentation) if you think the documentation could be improved.

Feel free to link the issue(s) on that thread, and I will make sure (after acknowledging them) that they are escalated to the right team.

Thanks in advance for your contributions.

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/kJ4oHDPjEPoJ.

To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

Ryan Chazen

unread,
Jan 5, 2013, 8:21:35 AM1/5/13
to google-a...@googlegroups.com
Both dev and live server - the screenshots above are from dev, but live behaves the same way...

I've changed my code now to do multiple async EQUALs queries instead of the one IN query. I've also made it cache the keys-only results of each EQUALs query to memcache, so it can save work if it's already done a sector before.
This has sped it up a huge amount (avg 300ms per request down to well under 100ms), so it seems to be the right way to do it. If everything is in memcache already, then it has a cost of 0 as well now, which is nice...

If you're curious as to what I'm using it for, it's a geohash style map for an experimental game... basically Google maps on a fake map.
You can play around with an early version of it over here: http://2.gloopsh.appspot.com/Admin.html#com.rc.gloopsh.admin.AdminWorldMap$AdminWorldMapPlace

Left click+drag to move the map around, mousewheel to zoom in and out of the map.
You can see it loading the sectors as you scroll to them, which was running the IN query across each sector.

--

Thanks Johan, but I don't think there is any real defect here - 'IN' does work as documented - it's just not very useful as currently implemented. Instead of being 'IN' as in other DBs, it's just newquery(EQUALS) or newquery(EQUALS) or ...
Obviously it's a pretty bad idea to use those kind of queries! Most of AppEngine DS tries to stop you from making bad queries that don't perform well, so it's just a bit out of place and I'm sure it has blindsided a lot of people so far.

Johan Euphrosine

unread,
Jan 5, 2013, 9:53:01 AM1/5/13
to Google App Engine .


On Sat, Jan 5, 2013 at 5:21 AM, Ryan Chazen <rya...@gmail.com> wrote:

Thanks Johan, but I don't think there is any real defect here - 'IN' does work as documented - it's just not very useful as currently implemented. Instead of being 'IN' as in other DBs, it's just newquery(EQUALS) or newquery(EQUALS) or ...
Obviously it's a pretty bad idea to use those kind of queries! Most of AppEngine DS tries to stop you from making bad queries that don't perform well, so it's just a bit out of place and I'm sure it has blindsided a lot of people so far.


Hi Ryan,

You were reporting before that the corresponding OR queries weren't run concurrently, filling a Defect on the public issue tracker about this would be a gniceood way to start a discussion with the datastore engineering team.

Hope that all makes sense.

Ryan Chazen

unread,
Jan 5, 2013, 10:14:05 AM1/5/13
to google-a...@googlegroups.com
Hey Johan,

I've filed it as a feature request : http://code.google.com/p/googleappengine/issues/detail?id=8627&thanks=8627&ts=1357398564

However, I very much doubt to ever see this request fulfilled!
In any event, thank you everyone who replied for your time!

Regards,
Ryan

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.

Johan Euphrosine

unread,
Jan 5, 2013, 10:17:56 AM1/5/13
to Google App Engine .
On Sat, Jan 5, 2013 at 7:14 AM, Ryan Chazen <rya...@gmail.com> wrote:
Hey Johan,

I've filed it as a feature request : http://code.google.com/p/googleappengine/issues/detail?id=8627&thanks=8627&ts=1357398564

However, I very much doubt to ever see this request fulfilled!

Thanks for your contribution Ryan, I will make sure it gets escalated to the datastore team.
Reply all
Reply to author
Forward
0 new messages