great questions! i can tackle at least some of these.
as a disclaimer, everything i say here refers to the production
environment. the SDK behaves differently and, as discussed in a few
other threads, is inefficient in a number of ways.
On Sep 11, 12:43 pm, Bill <billk...@gmail.com> wrote:
> 1) Will timeout issues on put/transactions be removed when we go pay-
> as-you-go or should we develop production apps with these limits in
> mind?
datastore timeouts and request deadlines will still exist after we've
launched billing, so yes, you'll want to develop with them in mind.
> Exact # of puts or transactions you can reasonably expect to
> work within one request before quota issue.
exact numbers will always depend on the size and shape of your data.
having said that, you should be able to put or delete a large number
of entities, e.g. in the hundreds or more, if you pass multiple
entities or keys in a single put() or delete() call:
http://code.google.com/appengine/docs/datastore/functions.html
you may also be able to write or delete more entities if the ratio of
entities to entity groups in the put() or delete() call is high, i.e.
if many of the entities belong to a small number of entity groups.
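as a rough illustration of the batching idea above, here's a minimal sketch. put_in_batches and its batch_size default are hypothetical names and values i made up for the example; put_fn stands in for a batch-capable call like db.put, which accepts a list of entities. the point is just that the number of calls grows with the number of batches, not the number of entities.

```python
def put_in_batches(entities, put_fn, batch_size=100):
    """Call put_fn once per batch instead of once per entity.

    put_fn is a stand-in for a batch-capable call such as db.put
    (hypothetical wiring; batch_size=100 is an arbitrary choice).
    Returns the number of calls made.
    """
    calls = 0
    for start in range(0, len(entities), batch_size):
        # one call carries a whole slice of entities
        put_fn(entities[start:start + batch_size])
        calls += 1
    return calls


# usage with a fake put function that just records what it was given
written = []
n_calls = put_in_batches(list(range(250)), written.extend, batch_size=100)
print(n_calls, len(written))  # 3 250
```

in a real app you'd pick the batch size by experiment, since the limit depends on the size and shape of your data.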
> 3) Best practices for (de)normalization and entity sizes. A gut
> reaction some developers might take when approaching datastore is to
> denormalize and put stuff in fewer tables. What are the costs of
> keeping many small entities and using reference properties instead?
storing and querying on reference properties doesn't cost any more
than storing and querying on non-reference properties. the one
reference property feature that incurs extra cost is the automatic
dereferencing, which fetches the referenced entity the first time you
access the property:
http://code.google.com/appengine/docs/datastore/typesandpropertyclasses.html#ReferenceProperty
> For example, in a many-to-many relationship, we could have 3 Kinds: A,
> B, and join(A,B). This is just like a traditional relational DB with
> a join model. What are the costs of traversing implicit collection
> sets defined by the reference properties in the join Kind? If you
> have a limited relationship between two entities, when does using a
> ListProperty (of keys, for example) make sense, especially in light of
> the cap on indexed properties per entity?
you almost always want to model one-to-many relationships with
reference properties. similarly, you almost always want to model
many-to-many relationships with a list of keys, i.e.
ListProperty(db.Key). with these, "related to X" queries won't cost
any more than any other query. using a "join" kind, on the other hand,
incurs an additional fetch for each result entity on top of
the join kind query. the main use case for join kinds is when you
want to impose additional criteria on the join at runtime.
rafe kaplan's google i/o talk describes these techniques in detail:
http://sites.google.com/site/io/working-with-google-app-engine-models
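to make the cost difference concrete, here's a toy in-memory sketch. FakeDatastore, the Person/Group/Membership kinds, and the round_trips counter are all made up for illustration; none of this is the App Engine API. it just counts one round trip per query and one per get, so you can see the list-of-keys approach answer "who is in this group" with a single query, while the join kind needs the query plus one fetch per result.

```python
class FakeDatastore(object):
    """Toy in-memory store that counts round trips. Not the App Engine API."""

    def __init__(self):
        self.entities = {}    # (kind, name) key -> dict of properties
        self.round_trips = 0

    def put(self, key, props):
        self.entities[key] = props

    def get(self, key):
        self.round_trips += 1
        return self.entities[key]

    def query(self, kind, prop, value):
        """Entities of `kind` whose `prop` equals `value` or is a list containing it."""
        self.round_trips += 1
        results = []
        for key in sorted(self.entities):
            if key[0] != kind:
                continue
            stored = self.entities[key].get(prop)
            if stored == value or (isinstance(stored, list) and value in stored):
                results.append(self.entities[key])
        return results


def members_via_list(ds, group_key):
    # list-of-keys model: one query returns the Person entities directly
    return ds.query("Person", "groups", group_key)


def members_via_join(ds, group_key):
    # join-kind model: one query for Membership rows, then one get per row
    memberships = ds.query("Membership", "group", group_key)
    return [ds.get(m["person"]) for m in memberships]


def build():
    ds = FakeDatastore()
    group = ("Group", "chess")
    for name in ("ann", "bob", "cat"):
        person = ("Person", name)
        ds.put(person, {"name": name, "groups": [group]})
        ds.put(("Membership", name + "-chess"), {"person": person, "group": group})
    return ds, group


ds, group = build()
members_via_list(ds, group)
print(ds.round_trips)  # 1
ds.round_trips = 0
members_via_join(ds, group)
print(ds.round_trips)  # 4
```

a batch get could collapse the per-result fetches into fewer calls, but the extra fetch work on top of the join kind query is still there.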
> 4) Benchmarks! I've been meaning to run tests on costs for different
> datastore operations:
> - Direct get using key or id
> - Direct get using list of key/id
> - Fetches using filters
> - Iterative get from a query
> - How the above 3 (direct w/ key, bulk fetch, iterative get) scale
> with request size.
> - Delete/Put
> - The big hit using transactions
as always, these will depend noticeably on the size and shape of
your data. i can give a few rules of thumb, though.
direct gets by key will usually be the fastest operation.
single-property queries, i.e. queries with a single filter or sort order,
should generally be fast. queries that use a user-defined index should
generally be fast too.
queries with equals filters on multiple properties that use the
built-in indexes have some extra overhead, which is roughly a fixed
cost per query result. the overhead will depend on (you can probably
guess what's next) the size and shape of your data. if these queries
aren't as fast as you'd like in your app, adding a dedicated index
(or indexes) will speed them up.
finally, transactions shouldn't add a prohibitive amount of overhead.
in many cases, doing a number of writes in a transaction can actually
be (a little) faster than doing them outside of a transaction. are you
seeing a noticeable slowdown with transactions, compared to without?
> Is it a big win to come up with a good key naming scheme or does that
> bite you in other ways?
you mean, providing a key_name instead of having the datastore
allocate an id?
http://code.google.com/appengine/docs/datastore/keysandentitygroups.html#Kinds_Names_and_IDs
performance should be the same with key_name vs. id. the main
difference is that key_name allows a (limited) form of querying
without actually querying. for example, say you're writing a wiki, and
you put the page name in key_name. when a request for a page comes in,
you can construct a key with that key_name in memory and get() it
directly, as opposed to querying with an equals filter.
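here's a toy sketch of the wiki example. ToyStore and its methods are invented for illustration and only mimic the idea; in the real python API the direct lookup would be something like Model.get_by_key_name() or db.get() on a key built with db.Key.from_path(). the sketch shows that when the page name is the key_name, the handler can address the entity directly instead of searching for it.

```python
class ToyStore(object):
    """Toy in-memory store keyed by (kind, key_name). Not the App Engine API."""

    def __init__(self):
        self._by_key = {}

    def put(self, kind, key_name, props):
        self._by_key[(kind, key_name)] = props

    def get(self, kind, key_name):
        """Direct lookup: the key is built in memory from kind + key_name."""
        return self._by_key.get((kind, key_name))

    def query_equals(self, kind, prop, value):
        """Equality query: has to search rather than address a single key."""
        return [props for (k, _), props in self._by_key.items()
                if k == kind and props.get(prop) == value]


store = ToyStore()
store.put("WikiPage", "FrontPage", {"title": "FrontPage", "body": "welcome"})

# the page name from the request is all we need to build the key
page = store.get("WikiPage", "FrontPage")

# the alternative: look the page up with an equals filter on title
matches = store.query_equals("WikiPage", "title", "FrontPage")
```

both paths return the same page here; the difference is that the first never runs a query at all.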