Here we do 1000 individual get()s:
http://kornewald.appspot.com/get
Here we do 1000 individual fetch()es for the same entities:
http://kornewald.appspot.com/fetch
Here we do four batch-get()s of 250 entities each:
http://kornewald.appspot.com/batchget
Here we do four batch-fetch()es for 250 entities each:
http://kornewald.appspot.com/batchfetch
The number returned is the time needed for retrieving the entities, so
the first two basically show the time per single get()/fetch().
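(For reference, the batch variants presumably differ from the individual ones only in how the 1000 keys are grouped. The grouping itself can be sketched in plain Python — `chunks` below is a generic stand-in, not App Engine's API:)

```python
def chunks(seq, size):
    """Split seq into consecutive batches of at most `size` items."""
    return [seq[i:i + size] for i in range(0, len(seq), size)]

keys = list(range(1000))      # stand-ins for the 1000 datastore keys
batches = chunks(keys, 250)   # four batch-get()s of 250 entities each
```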
Is there anything wrong with the benchmark code?
Our previous benchmarks showed a much more significant difference
(fetch() was 3x slower). Now it's merely a 30% difference, and the few
milliseconds can hardly be noticed by the end user.
Can we stop designing models like crazy around key names? In most
cases there is hardly any benefit to justify the added complexity or
inconvenience (e.g., not being able to change the key name afterwards).
It looks like the only case where batch-get()s are useful is when you
can't formulate a single fetch() for the same kind of query.
Bye,
Waldemar
--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
Yes, here the individual fetch() calls indeed cost more than twice as
much, but I'd rather pay slightly more (it won't be 2x anyway, once
all queries and other cost factors are taken into account) and not
worry about the productivity overhead that key names add. In
benchmarks I ran a pretty long time ago (when App Engine was very
young), a single get() took around 17ms and a fetch() for a single
entity could take 35-90ms, depending on how lucky you were. That was
significant enough to be noticeable on pages which loaded a lot of
individual entities, so we accepted that, for example, usernames
can't easily be changed afterwards. The new numbers totally change
the game.
Hopefully, someday we'll also not have to care about startup times
anymore (even if we have to pay a few $/month for pre-warmed instances
or give up the free quota for warm instances).
Bye,
Waldemar
for query performance, we turned on a new code path that parallelizes
internal operations and bigtable scans and lookups more aggressively,
which is likely the reason for the improvements of query fetches vs.
gets that you saw.
for fault tolerance, we're now doing more retries in the backend
automatically, usually up to the full 30s request deadline for most
calls - basically everything except transaction commits, which retries
client side instead of in the backend. (if you're using python, you
might now want to try db.run_in_transaction_custom_retries() with a
high number of retries, e.g. 10, instead of just
db.run_in_transaction(). similar java support should be coming soon.)
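(The client-side retry behavior mentioned above can be sketched with a
generic wrapper — this is an illustrative stand-in, not the actual
db.run_in_transaction_custom_retries() implementation:)

```python
def run_with_retries(func, retries=10):
    """Call func until it succeeds or the retry budget is spent,
    mimicking client-side transaction retries (generic sketch only)."""
    for attempt in range(retries + 1):
        try:
            return func()
        except Exception:
            if attempt == retries:
                raise

attempts = {"n": 0}

def flaky_commit():
    """Hypothetical commit that collides twice before succeeding."""
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("transaction collision")
    return "committed"

result = run_with_retries(flaky_commit, retries=10)
```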
we'll mention more detail in the official release notes and blog post,
but based on a day or so of results so far, we're already seeing a
substantial drop in error rate, mostly due to reduced timeouts, across
the board. we're also seeing that error rate is much less spiky, which
is always good.
Will every query that works directly on a (composite) datastore index
be almost as fast as db.get()?
Why don't you also increase the number of retries for
run_in_transaction?
Bye,
Waldemar
as a general rule, no. queries that scan indices will always have to
do at least two serial disk seeks, one to read the index and one to
look up the entities the index rows point to. gets only need a single
disk seek, since they have the entities' primary key.
having said that, one vs two disk seeks isn't always the dominating
factor for latency. instead, python protocol buffer decoding might
dominate, or the bigtable tablet server RPCs might, e.g. if you're
fetching many entities from many different tablet servers. (we'll
issue a lot of RPCs in parallel on your behalf, but not arbitrarily
many.)
note that this only applies to queries that use an index. kindless
ancestor queries and queries on __key__, for example, scan the
entities table directly, so they'll only need a single disk seek.
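(The one-vs-two-seeks point can be modeled with two hypothetical
in-memory tables — `entities` stands in for the entity table, and
`name_index` for an index on a "name" property; both dicts are made up
for illustration:)

```python
entities = {"k1": {"name": "a"}, "k2": {"name": "b"}}
name_index = {"a": ["k1"], "b": ["k2"]}

def get(key):
    # one "seek": key -> entity
    return entities[key]

def query_by_name(value):
    # two serial "seeks": index -> keys, then keys -> entities
    return [entities[k] for k in name_index[value]]
```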
> Why don't you also increase the number of retries for
> run_in_transaction?
agreed, we probably will, hopefully as soon as 1.3.2.