fetching by keys (as strings) is kind of slow


Andreas

Feb 14, 2012, 4:05:00 PM
to google-a...@googlegroups.com
I'm just profiling my app and I have to say I'm getting surprising results.

Let's say I have a list of 500 keys as strings, not Key objects.
db.get(keylist) takes 3 seconds?!

How does this take that long? I mean, I already have the keys.
Would it be (significantly) faster with Key objects?

andreas

Robert Kluin

Feb 15, 2012, 12:25:20 AM
to google-a...@googlegroups.com
What is the data model like? Do the entities have a lot of
properties, list properties, or datetime properties? Have you tried
this with a very simple entity, perhaps even with no properties?

Robert

> --
> You received this message because you are subscribed to the Google Groups "Google App Engine" group.
> To post to this group, send email to google-a...@googlegroups.com.
> To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
> For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
>

Andreas

Feb 15, 2012, 8:56:12 AM
to google-a...@googlegroups.com
The entities have an average of 10 properties but no list properties.
It's a subclass of a PolyModel, and there are 2 datetime properties and one object property which stores into the blobstore.
I haven't tried with 'simple' entities yet, but I think I will today.

Robert Kluin

Feb 16, 2012, 1:29:11 AM
to google-a...@googlegroups.com
Hey Andreas,
One thing to note is that PolyModels use lists in the background to
facilitate querying. If the class hierarchy is deep, it could add some
overhead. Try this with small entities; if there's a big difference,
you'll know it is your data model.

Robert

Jeff Schnitzer

Feb 16, 2012, 1:52:34 AM
to google-a...@googlegroups.com
3s for fetching 500 doesn't seem wildly out of sorts.

You can speed it up a lot (at the cost of consistency) by doing an eventually consistent get instead of a strongly consistent get.  I don't know the way to do that in Python.

Jeff



Andreas

Feb 16, 2012, 9:03:21 AM
to google-a...@googlegroups.com
Really? We are not talking about fetching a million entities.
I would expect to fetch 500 entities within a second, if not a lot less.
But obviously this is not the case.

Kaan Soral

Feb 16, 2012, 5:05:08 PM
to google-a...@googlegroups.com
Why don't you use get_async(keys)?

Andreas

Feb 16, 2012, 5:07:35 PM
to google-a...@googlegroups.com
Because I need the entities and do operations on them.
Why should get_async be faster? It only means I can do other stuff, not related to the entities being fetched, while it gets them.
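
To make that point concrete, here is a minimal sketch using only the Python standard library. It is not App Engine code: fake_get and unrelated_work are hypothetical stand-ins, and the 0.2 second delay is an assumed value. It only illustrates that starting a fetch asynchronously helps when there is other work to overlap with it; the wait for the entities themselves does not get shorter.

```python
import time
from concurrent.futures import ThreadPoolExecutor

FETCH_SECONDS = 0.2  # assumed stand-in for the datastore round trip


def fake_get(keys):
    """Hypothetical blocking batch get; NOT the App Engine API."""
    time.sleep(FETCH_SECONDS)
    return [{"key": k} for k in keys]


def unrelated_work():
    """Work that does not need the fetched entities."""
    time.sleep(FETCH_SECONDS)
    return "done"


def blocking(keys):
    """Fetch first, then do the unrelated work: the two delays add up."""
    start = time.perf_counter()
    entities = fake_get(keys)
    unrelated_work()
    return entities, time.perf_counter() - start


def overlapped(keys):
    """Start the fetch, do the unrelated work meanwhile, then wait."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fake_get, keys)  # comparable to starting an async get
        unrelated_work()                      # runs while the fetch is in flight
        entities = future.result()            # comparable to waiting on the result
    return entities, time.perf_counter() - start


keys = ["key-%d" % i for i in range(500)]
entities_a, blocking_seconds = blocking(keys)
entities_b, overlapped_seconds = overlapped(keys)
```

In this sketch the overlapped version finishes sooner only because unrelated_work exists. If the very next step needs the entities, both versions wait the full fetch time.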


Kaan Soral

Feb 16, 2012, 5:10:36 PM
to google-a...@googlegroups.com
You are right. For a second there I thought "get" would fetch those entities sequentially, but it probably fetches them asynchronously and takes 3 seconds anyway.

supercobra

Feb 16, 2012, 5:30:36 PM
to google-a...@googlegroups.com
There is no way fetching a key takes 3 sec. There is a problem either
on your side or on GAE's.

Andreas

Feb 16, 2012, 5:39:09 PM
to google-a...@googlegroups.com
It's not one key, it's 500.


Jeff Schnitzer

Feb 16, 2012, 7:21:46 PM
to google-a...@googlegroups.com
The datastore is not fast.  And why do you think it would be?  Because you can fetch thousands of items out of RAM on your laptop in a fraction of a second?

The GAE datastore is a key/value store distributed on a gigantic cluster of probably thousands of machines, each of which is busily working on not just your load/store problem but thousands of other people's. Your laptop has the dataset cached in RAM, and even if it didn't, your data is probably stored sequentially on a single spindle. GAE has to fetch your entities from up to 500 separate machines in the cluster, very likely off of disk.

The problems you find at scale will not show up when you query MySQL on your laptop. GAE is already operating "at scale", so it performs in the slow but predictable-on-average way that gigantic computing architectures do.

Treat the datastore like a key/value store that likes really big entities.  There are things you can do to make fetching 500 keys at a time a little faster (example: fetch in eventual consistency mode), but performance is going to suck compared to what you are used to on other platforms... unless you've actually run those other platforms at scale.  The best thing to do is adjust your architecture so you're fetching 1 fat entity instead of 500 little ones.  It's not always possible.
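
As a rough illustration of the "1 fat entity instead of 500 little ones" idea, here is a minimal standard-library sketch. The dict named store is a hypothetical in-memory stand-in for a key/value store, and the record layout and key names are made up for the example; nothing here is the datastore API.

```python
import json
import zlib


def pack(records):
    """Serialize many small records into one compressed blob."""
    return zlib.compress(json.dumps(records).encode("utf-8"))


def unpack(blob):
    """Reverse of pack(): decompress and decode back into a dict of records."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))


# Hypothetical in-memory stand-in for a key/value store.
store = {}

small_records = {
    "item-%d" % i: {"name": "item %d" % i, "score": i} for i in range(500)
}

# 500 little entities: one stored value, and later one lookup, per record.
for key, record in small_records.items():
    store[key] = json.dumps(record)

# 1 fat entity: every record lives under a single key, so one lookup gets all.
store["all-items"] = pack(small_records)

fat = unpack(store["all-items"])
```

The trade-off is that any change to a single record means rewriting the whole blob, and the packed value has to stay within the datastore's per-entity size limit, so this only fits data that is read together and written rarely.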

Jeff

Andreas

Feb 17, 2012, 11:55:01 AM
to google-a...@googlegroups.com
I understand the scale of the datastore, and of course it won't perform like my little DB on my machine with a few entities in it. I'm actually not comparing my local environment with the GAE production environment.

I was really expecting a lot more speed for a key query, but I guess I'm wrong.
Still, 3 seconds for 500 entities seems a little too much, but that is probably just how I was expecting it to perform.

Jeff Schnitzer

Feb 17, 2012, 2:11:23 PM
to google-a...@googlegroups.com
By the way, I don't know that you can't improve the performance of your fetches. For example, serialization costs tend to represent a much higher percentage of the fetch time in Python-land than in Java-land. If you're using Python, you might be able to improve this by optimizing your data model. I don't know; you would need to profile carefully using appstats.
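
Appstats is the right tool on App Engine itself. Purely to illustrate what "fetch time versus serialization time" means, here is a small standard-library sketch that times the two steps separately. fake_fetch_raw and the JSON payload are hypothetical stand-ins, not how the datastore encodes entities.

```python
import json
import time
from contextlib import contextmanager

timings = {}


@contextmanager
def timed(label):
    """Accumulate wall-clock seconds spent inside the with-block under label."""
    start = time.perf_counter()
    try:
        yield
    finally:
        timings[label] = timings.get(label, 0.0) + (time.perf_counter() - start)


def fake_fetch_raw(keys):
    """Hypothetical stand-in that returns serialized payloads for the keys."""
    return [json.dumps({"key": k, "props": list(range(10))}) for k in keys]


keys = ["key-%d" % i for i in range(500)]

with timed("fetch"):
    raw = fake_fetch_raw(keys)

with timed("deserialize"):
    entities = [json.loads(payload) for payload in raw]

for label, seconds in sorted(timings.items()):
    print("%-12s %.6f s" % (label, seconds))
```

If the deserialize share dominates in a real profile, slimming the data model is worth trying; if the fetch share dominates, it is not.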

Jeff

Brandon Wirtz

Feb 17, 2012, 2:21:31 PM
to google-a...@googlegroups.com

Or build a shim as a Java app that does your fetches (and gets you another couple of megs of memcache :)

GordonHo

Feb 18, 2012, 11:17:33 AM
to google-a...@googlegroups.com
I am having a similar problem.
Occasionally I need to fetch all entities of a given table (about 5000). This really takes way too long (talking about ~40-60 seconds).

So far I haven't found an easy way to fetch lots of entities from the datastore. At some point I will probably create one big fat entity containing all the others, but this will involve quite some work.

Jeff Schnitzer

Feb 18, 2012, 11:57:31 PM
to google-a...@googlegroups.com
Switch to eventual consistency mode. In Java it's ReadPolicy.Consistency.EVENTUAL (look at the datastore configuration classes). It speeds things up quite a lot in my totally unscientific tests.

Jeff


Andreas

Mar 21, 2012, 3:49:15 PM
to google-a...@googlegroups.com
OK guys, now fetching 500 entities by keys as strings takes somewhere between 8 and 10 seconds?!
3s was already too much... 10 is not acceptable.


Andreas

Mar 21, 2012, 4:45:55 PM
to google-a...@googlegroups.com
Tested ndb.get_multi(). The same query with the same keys takes 0.7 seconds... how is this possible?

Jeff Schnitzer

Mar 21, 2012, 7:14:31 PM
to google-a...@googlegroups.com
There are a lot of possible answers to that question. Have you
enabled appstats? That is the best way to figure out what is actually
going on.

Jeff

Andreas

Mar 21, 2012, 8:26:48 PM
to google-a...@googlegroups.com
By "query" here I don't really mean querying and filtering. I have a list of keys and I get() them.
What I think is going on is that ndb does the fetch a bit smarter than the db module.

ndb is consistently under 1s while db takes multiple seconds.

Jeff Schnitzer

Mar 21, 2012, 8:45:06 PM
to google-a...@googlegroups.com
If I had to guess, I would suspect the difference is NDB's caching
layer pulling data from memcache. But I don't know. The only way to
find out is to run appstats and find out what's actually going on.
Stop speculating, start measuring. It's the same datastore
underneath.
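
For what that guess would look like in principle, here is a minimal read-through cache sketch in plain Python. It is not NDB's implementation: CountingStore, cached_get_multi, and the plain dict used as the cache are hypothetical stand-ins for the datastore and memcache. It only shows why a repeated get of the same keys can skip the slow store entirely.

```python
class CountingStore:
    """Hypothetical slow backing store that counts how often it is hit."""

    def __init__(self, data):
        self.data = data
        self.calls = 0

    def get_multi(self, keys):
        self.calls += 1
        return {k: self.data[k] for k in keys if k in self.data}


def cached_get_multi(keys, cache, store):
    """Return entities in key order, asking the store only for cache misses."""
    hits = {k: cache[k] for k in keys if k in cache}
    misses = [k for k in keys if k not in hits]
    if misses:
        fetched = store.get_multi(misses)
        cache.update(fetched)  # remember results for the next call
        hits.update(fetched)
    return [hits.get(k) for k in keys]


store = CountingStore({"key-%d" % i: {"n": i} for i in range(500)})
cache = {}
keys = sorted(store.data)

first = cached_get_multi(keys, cache, store)   # cold: goes to the store
second = cached_get_multi(keys, cache, store)  # warm: served from the cache
```

If something like this is what makes the second measurement fast, a test against keys that have never been read before should show the difference, which is exactly what appstats would reveal.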

Jeff

Andreas

Mar 21, 2012, 9:12:32 PM
to google-a...@googlegroups.com
You're right. I haven't had the time yet to find out what the difference is, but I will.
Another possibility is that ndb is making async gets for the keys, unlike db...
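
A sketch of what that idea could look like, with the caveat that nothing in this thread establishes that ndb actually works this way: split the key list into chunks and fetch the chunks concurrently. fake_batch_get is a hypothetical stand-in whose latency grows with batch size, and the chunk size of 100 is an arbitrary assumption.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_batch_get(keys):
    """Hypothetical batch get whose latency grows with the batch size."""
    time.sleep(0.001 * len(keys))
    return [k.upper() for k in keys]


def chunked(seq, size):
    """Split seq into consecutive slices of at most size items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def parallel_get(keys, chunk_size=100):
    """Fetch the chunks concurrently and return results in the original order."""
    if not keys:
        return []
    chunks = chunked(keys, chunk_size)
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        parts = list(pool.map(fake_batch_get, chunks))  # map preserves chunk order
    return [entity for part in parts for entity in part]


keys = ["key-%d" % i for i in range(500)]
entities = parallel_get(keys)
```

Whether this helps in practice depends on where the time actually goes, so it is a hypothesis to check with appstats rather than a conclusion.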