--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To view this discussion on the web visit https://groups.google.com/d/msg/google-appengine/-/bW3pKqHKcSEJ.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.
1) Limit of one write per second per entity group.
This one is usually really easy to work around; just break your app up
into many entity groups. Most apps tend to naturally break down by
user or by business or some other categorization that has a low
per-unit write rate. As long as you are cognizant of the eventual
consistency of queries and the XG transaction limits, you can scale up
to any write rate. If each user is a separate entity group, it's like
each user is running on its own little database.
The problems tend to appear when you need accurate runtime counts
across changing data. At this point look into sharded counters. You
shouldn't need to do this often.
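The sharded-counter idea can be sketched in plain Python. In a real App Engine app each shard would be its own datastore entity (and hence its own entity group); the dict below just stands in for the datastore, and all names are illustrative:

```python
import random

NUM_SHARDS = 20  # more shards => higher sustainable aggregate write rate

# Stand-in for the datastore: one counter value per shard entity.
shards = {i: 0 for i in range(NUM_SHARDS)}

def increment():
    # Pick a random shard so concurrent writers rarely collide on the
    # same entity group (the 1 write/sec limit applies per shard).
    shard = random.randrange(NUM_SHARDS)
    shards[shard] += 1

def get_count():
    # Reads pay the price: the total is the sum across all shards.
    return sum(shards.values())

for _ in range(100):
    increment()
print(get_count())  # → 100
```

The trade-off is exactly the one described above: writes spread across N entity groups, but an accurate count requires reading all N shards.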
2) Throughput limit on tablet splits for increasing index values
You won't hit this until many hundreds of writes per second. The
problem is when you have an index on a more-or-less monotonically
increasing field like say a timestamp. When the index is updated, the
writes will always be to the end of the table... and you'll get a "hot
tablet" that will split (causing a delay), then another "hot tablet"
since you're always writing to the end. The HRD helps in that it
gives you a multiple of the total write rate, but you still get a
limit.
Ikai wrote about this (and drew some awesome cartoons) here:
http://ikaisays.com/2011/01/25/app-engine-datastore-tip-monotonically-increasing-values-are-bad/
If you design with these two issues in mind, you shouldn't have any
problem doing thousands of writes per second... or whatever you can
afford.
Jeff
2012/1/29 Andrei-Ştefăniţă Cocorean <andrei....@gmail.com>:
Robert
Wild guess: There are very few people who hit this write rate limit
so nobody really thinks about it. Even Ikai's advice is to ignore it
until it becomes a problem, because for 99.9% of apps it never will
be.
Jeff
Robert
The "sharding" in GAE-land works a little differently from the way you think.
There's the notion of an Entity Group, which is probably closest to a
traditional data federation, but with a twist: you typically create
zillions of tiny entity groups, say, one for each customer. The
sharding is quite transparent; you only notice it when you write to
the same EG too fast or you try to run transactions across EGs.
The kind of sharding you would have to do to escape the hot tablet
problem is sharding the values of a particular field. The index
tablets span all EGs. So you might create 4 "versions" of the login
timestamp (say, prefixed with a different letter) and then issue four
queries when you want to query for the last 100 people that logged in.
In this case, you just pick a random prefix every time you write the
field... there's no need to make it stable.
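This prefix trick can be sketched as follows (a simulation, not the datastore API — the prefix set, key format, and query merging are all illustrative):

```python
import random

PREFIXES = ["a", "b", "c", "d"]  # four "versions" of the index value

def sharded_timestamp(ts):
    # On write: prepend a random prefix, so new index rows land in one
    # of four regions of the index instead of one hot tablet at the end.
    return "%s|%013d" % (random.choice(PREFIXES), ts)

def latest_logins(rows, limit=100):
    # On read: one (simulated) query per prefix, each sorted by
    # timestamp descending, then merge the partial results.
    results = []
    for p in PREFIXES:
        bucket = [r for r in rows if r[0].startswith(p + "|")]
        bucket.sort(reverse=True)  # zero-padded, so string order == numeric
        results.extend(bucket[:limit])
    results.sort(key=lambda r: r[0].split("|", 1)[1], reverse=True)
    return results[:limit]

rows = [(sharded_timestamp(t), "user%d" % t) for t in range(1000)]
latest = latest_logins(rows, limit=3)
print([k.split("|", 1)[1] for k, _ in latest])
# → ['0000000000999', '0000000000998', '0000000000997']
```

Because the prefix is chosen at random on every write, no single index region is always "the end", which is the whole point of the workaround.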
Jeff
In the past I've worked around this problem in several different
ways. The best is to see if there is a natural way to "shard" the
index such that you won't need to do queries across the shards. If
you can do that, you're done. It also depends on what the problem
value is used for. In some cases it may not need to be so precise, so
you can spread the load around a bit by randomizing the value within
some acceptable range of error. I've also used the task-queue to
control the rate at which the problem entities get written. This can
help if you've got very bursty write rates.
Robert
When I have timestamps on high write-rate entities that are
non-critical, for example "expiration" times that are used only for
cleanup, I'll sometimes add a random jitter of several hours to spread
the writes out a bit. I'd be surprised if changing it by a few
seconds helped much -- but it could. Keep in mind, there will already
be some degree of randomness since the instance clocks have some
slight variation. If you're hitting this issue, I'd give it a shot
though. If it works it could at least buy you some time to get a
better fix.
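A minimal sketch of the jitter idea, assuming an expiration timestamp that is only used for cleanup (the function name and jitter window are made up for illustration):

```python
import random
import time

JITTER_SECONDS = 4 * 3600  # spread writes over up to a few hours

def expiration_with_jitter(ttl_seconds, now=None):
    # For non-critical timestamps, add random jitter so the indexed
    # values don't all pile up at the hot end of the index.
    now = time.time() if now is None else now
    return now + ttl_seconds + random.uniform(0, JITTER_SECONDS)

# e.g. a one-day TTL lands somewhere in a four-hour window after it:
exp = expiration_with_jitter(86400, now=0.0)
print(86400 <= exp <= 86400 + JITTER_SECONDS)  # → True
```

The acceptable jitter window depends entirely on how precise the value needs to be; as noted above, a few seconds of spread may not help much, while a few hours can.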
I don't think there is a fixed number of rows per shard. I think it
is split up by data size, and I don't think the exact number is
publicly documented. Maybe you can roughly figure it out via
experimentation.
Robert
It is interesting that sharding is determined by access patterns. Is
that something you can elaborate on at all? ;)
Robert
Hi All,

I hope I can get some advice on a similar topic to that posted above. I need to record time-series data at high frequency, and timestamps are very important.

During a peak period each data point might be recording at 1-second update rates, but during off-peak periods I expect 10-30 second update rates. My thinking is that each data point could be its own tablet; this would result in between 100,000 and 1,000,000 tablets.

The other point to note is that it is OK if I have a delay before data is written to the database, so I could dump to the database after I have collected every 10 or 20 data points (possibly using Memcache).

From my reading generally and on this thread, it should be possible if I can split each data point onto its own tablet. I am still unsure how I can make the cost manageable, in particular whether I can use Memcache as a method to reduce the number of writes.

Best Regards
Brad
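The buffering idea could be sketched like this (names hypothetical; a real app would buffer in memcache and flush from a task queue rather than an in-process list, since instances can die with buffered data):

```python
BATCH_SIZE = 20

class PointBuffer(object):
    """Accumulate readings and flush them to storage in batches,
    trading write frequency (and cost) for a bounded delay."""

    def __init__(self, flush_fn, batch_size=BATCH_SIZE):
        self.flush_fn = flush_fn      # called with a list of points
        self.batch_size = batch_size
        self.pending = []

    def record(self, timestamp, value):
        self.pending.append((timestamp, value))
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        if self.pending:
            # One datastore write per batch instead of one per point.
            self.flush_fn(list(self.pending))
            self.pending = []

stored = []
buf = PointBuffer(stored.append, batch_size=10)
for t in range(25):
    buf.record(t, t * 1.5)
print(len(stored))         # → 2   (two full batches flushed)
print(len(buf.pending))    # → 5   (points awaiting the next flush)
```

Batching 10-20 points per entity write also reduces per-point indexing, which is where the hot-tablet pressure and much of the cost comes from.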