speed up fetching of larger data set


GordonHo

Jan 18, 2012, 1:17:55 PM1/18/12
to google-a...@googlegroups.com
hi,

I occasionally run into the problem of having to fetch several thousand entities from the datastore.
So far I've tried to solve the speed issue by splitting the fetch into several chunks that are fetched at the same time.

However, I'm still quite unsatisfied with the speed: fetching ~6000 entities takes about 45 seconds, using a batch size of 200 for the fetch.

Does anyone have experience with how to speed this up?

cheers,

gordon

Andreas

Jan 18, 2012, 1:26:18 PM1/18/12
to google-a...@googlegroups.com
45 seconds to fetch 6k entities sounds a little weird to me.
What do you do with those entities after fetching them?
Did you profile this operation with Appstats?

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/7SkembCEJ9AJ.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.

Waleed Abdulla

Jan 26, 2012, 1:01:50 PM1/26/12
to google-a...@googlegroups.com
You didn't mention whether you're fetching the entities by their keys or with a query. If you can fetch by key, it's faster. Also, increase your batch size:

    db.get([list of 1000 keys])

Another thing you might want to try is async operations, so you trigger the fetches one after another without waiting, and then collect the data as it comes in.
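A minimal sketch of that chunk-and-async approach, assuming the Python `google.appengine.ext.db` API (`db.get_async` exists in SDK 1.5.0 and later); the `chunks` and `fetch_all` names are made up for this example, and the import is guarded so the helpers also load outside the SDK:

```python
try:
    from google.appengine.ext import db  # only available inside the App Engine SDK
except ImportError:
    db = None

def chunks(seq, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

def fetch_all(keys, batch_size=1000):
    """Fire all batch gets up front, then collect results as they arrive."""
    rpcs = [db.get_async(batch) for batch in chunks(keys, batch_size)]
    entities = []
    for rpc in rpcs:
        entities.extend(rpc.get_result())  # blocks only until this batch is done
    return entities
```

The point is that all RPCs are in flight before the first `get_result()` call, so the batches overlap instead of running one after another.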

Are you on master/slave or the high-replication datastore? I haven't tested it myself, but I hear that HRD helps in this case because it reduces the occasional extended delays when reading from the datastore, and since you're fetching 6000 entities, the odds of one of them getting stuck and taking several seconds to load are higher.

Ikai Lan (Google)

Jan 26, 2012, 2:29:06 PM1/26/12
to google-a...@googlegroups.com
Between the way the datastore works and deserialization, fetching 6000 entities is never going to be efficient.

What problem are you trying to solve? Is it possible to reduce the number of entities? For instance, can you store the data for 1000 entities in 6 entities and fetch only 6 big entities? There's a pricing impact here as well: fetching 6 entities will be significantly cheaper than fetching 6000, because you are charged per datastore op, not by the size of the entities fetched.
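One way to sketch that aggregation idea is to pack many small records into a single entity as a serialized blob (entities are limited to 1 MB, so batches must be sized accordingly). `RecordBatch`, `pack`, and `unpack` are hypothetical names, assuming the Python `db` API; the SDK import is guarded so the serialization helpers also run standalone:

```python
import pickle

def pack(records):
    """Serialize a list of small records into one blob value."""
    return pickle.dumps(records, pickle.HIGHEST_PROTOCOL)

def unpack(blob):
    """Restore the original list of records from the blob."""
    return pickle.loads(blob)

try:
    from google.appengine.ext import db  # only available inside the App Engine SDK
except ImportError:
    db = None

if db:
    class RecordBatch(db.Model):
        """Hypothetical container: one entity holding many small records."""
        payload = db.BlobProperty(required=True)

    def store_batch(key_name, records):
        RecordBatch(key_name=key_name, payload=db.Blob(pack(records))).put()

    def load_batch(key_name):
        batch = RecordBatch.get_by_key_name(key_name)
        return unpack(batch.payload) if batch else []
```

Reading back 6000 records then costs a handful of gets instead of 6000 datastore ops, at the price of losing per-record indexing and updates.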

--
Ikai Lan 
Developer Programs Engineer, Google App Engine