Question on key design: Datastore errors and tablets

Ulrich

Feb 17, 2010, 12:30:51 PM
to Google App Engine
Hi,

I have read the following:
"Timeouts due to datastore issues --- [...] The most common example of
this occurs when you are rapidly inserting a large number of entities
of the same kind, with auto-generated IDs. In this case, most inserts
hit the same range of the same tablet, and the single tablet server is
overwhelmed with writes. [...] If this does affect your app, the
easiest solution is to use more evenly distributed IDs instead of the
auto-allocated ones [...]"
(http://code.google.com/appengine/articles/handling_datastore_errors.html)

Let's say I have a model "Parent" and a model "Child". For Parent
entities, I use key names that are evenly distributed. For Child
entities, I use auto-generated key IDs and _no_ key names, but all
Child entities are children of Parent entities, so the paths to the
children contain the evenly distributed key names of the parents.
If I have many write operations on children that are in the same
entity group, the described error could occur. But what happens if my
write operations are on children that are in different entity groups?
Their IDs are auto-generated and not evenly distributed, but their
paths contain the evenly distributed key names.

Nick Johnson (Google)

Feb 17, 2010, 1:32:16 PM
to google-a...@googlegroups.com
Hi Ulrich,

Good question! The point being made in the article refers to the global distribution of the complete key, so writes to these children will be well distributed, and you won't have to worry about this source of contention.
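To illustrate why the complete key matters, here is a rough sketch. It assumes a simplified model in which a key is just the tuple (parent kind, parent key name, child kind, child ID) and keys sort by that whole path. The real datastore key encoding differs, and the parent names and IDs below are made up.

```python
import random

# Simplified model: a complete key is the path (parent kind, parent key name,
# child kind, child ID), and keys sort lexicographically by that whole path.
# This is an illustration only, not the datastore's real key encoding.

def child_key(parent_name, child_id):
    return ("Parent", parent_name, "Child", child_id)

def make_keys(count, seed=0):
    rng = random.Random(seed)
    # Hypothetical evenly distributed parent key names (random hex strings).
    parents = ["%08x" % rng.getrandbits(32) for _ in range(count)]
    # Child IDs are consecutive, roughly as auto-allocated IDs would be.
    return [child_key(p, 1000 + i) for i, p in enumerate(parents)]

keys = make_keys(8)
ordered = sorted(keys)
# Position of each consecutively created child in the global key order.
positions = [ordered.index(k) for k in keys]
print(positions)
```

Because the parent key name comes first in the path, it decides where each child lands in the overall ordering, so consecutive child IDs end up scattered rather than adjacent.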

-Nick Johnson
 

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.




--
Nick Johnson, Developer Programs Engineer, App Engine
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047

Eli Jones

Feb 17, 2010, 2:02:14 PM
to google-a...@googlegroups.com
I understand the process of evenly distributing IDs, since they are integer values. Is there a canonical App Engine way to evenly distribute key_names?

Just make sure key_name1 and key_name2 don't have their i-th letters "too close" to each other? How far is far enough?

Does doing even distribution matter if you aren't using auto-generated IDs?

Thanks for the information.

Ulrich

Feb 17, 2010, 2:33:08 PM
to google-a...@googlegroups.com
Hi Nick,

Thanks for your fast answer!


Nick Johnson (Google)

Feb 18, 2010, 12:59:37 PM
to google-a...@googlegroups.com
Hi Eli,

Using a randomly generated ID like a uuid is perfectly satisfactory to achieve an even distribution.
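A minimal sketch of that suggestion, using Python's standard uuid module. The helper name random_key_name is made up for illustration; the resulting string would then be used as the entity's key_name.

```python
import uuid

def random_key_name():
    # uuid4 values are random, so successive names do not sort next to
    # each other the way sequential names do.
    return uuid.uuid4().hex

names = [random_key_name() for _ in range(5)]
for name in names:
    print(name)
```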

On Wed, Feb 17, 2010 at 7:02 PM, Eli Jones <eli....@gmail.com> wrote:
> Does doing even distribution matter if you aren't using auto-generated IDs?

It certainly can - if you insert, in order, "aaaa", "aaab", "aaac", etc, you'll encounter the same problem at very high volumes as you'd see with auto generated IDs.
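The effect can be sketched with a toy simulation. It assumes a key space cut into four fixed ranges by made-up split points; real tablets are split dynamically, so this only illustrates the skew, not actual Bigtable behaviour.

```python
import bisect
import itertools
import random
import string

# Hypothetical split points dividing the key space into 4 ranges ("tablets").
SPLITS = ["g", "n", "t"]

def range_index(key_name):
    # Index of the range that a key name falls into.
    return bisect.bisect_right(SPLITS, key_name)

def writes_per_range(key_names):
    counts = [0] * (len(SPLITS) + 1)
    for name in key_names:
        counts[range_index(name)] += 1
    return counts

def sequential_names(count):
    # "aaaa", "aaab", "aaac", ... in insertion order.
    letters = itertools.product(string.ascii_lowercase, repeat=4)
    return ["".join(t) for t in itertools.islice(letters, count)]

def random_names(count, seed=0):
    rng = random.Random(seed)
    return ["".join(rng.choice(string.ascii_lowercase) for _ in range(4))
            for _ in range(count)]

print(writes_per_range(sequential_names(1000)))  # every write hits range 0
print(writes_per_range(random_names(1000)))      # writes spread over all ranges
```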

-Nick Johnson

peterk

Feb 18, 2010, 7:28:50 PM
to Google App Engine
What about keynames like:

counter_standard_dbf
counter_standard_clo

or would something like

dbfo01la_counter_standard
clo091b_counter_standard

work better?

I'm thinking of cases where you may use keynames that can in some way
be constructed/predicted for fast access later,
like <username>_counter_standard

Would the common pre-fix or post-fix make for close distribution? :|

Nick Johnson (Google)

Feb 19, 2010, 6:58:42 AM
to google-a...@googlegroups.com
On Fri, Feb 19, 2010 at 12:28 AM, peterk <peter...@gmail.com> wrote:
> Would the common pre-fix or post-fix make for close distribution? :|

Either one will work fine - Bigtable will split tablets based on key to ensure no tablet gets too big. Long identical prefixes just mean that the split will be based on later characters in the string.
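As a rough illustration of that point, the sketch below picks boundaries that cut a sorted set of key names into equal-sized ranges. The user names are made up, and the boundary selection is a simplification rather than Bigtable's actual splitting logic.

```python
def split_points(key_names, parts):
    # Choose boundaries that cut the sorted keys into equal-sized ranges.
    ordered = sorted(key_names)
    step = len(ordered) // parts
    return [ordered[i * step] for i in range(1, parts)]

# Hypothetical user names, used once after a common prefix and once
# before a common suffix.
users = ["alice", "bob", "carol", "dave", "erin", "frank", "grace", "heidi"]
prefixed = ["counter_standard_" + u for u in users]
suffixed = [u + "_counter_standard" for u in users]

print(split_points(prefixed, 4))
print(split_points(suffixed, 4))
```

In both cases the boundaries fall on the same users; with the shared prefix they are simply decided by the characters that come after it.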

What's important for key distribution for really high update rates is the distribution of key names/IDs for those updates: If they all go to a single tablet (eg, they make up a small proportion of the total range of IDs you're employing), they will be limited by what that tablet server can support. If they are widely spread out within the range you're using, regardless of what that range is, you'll be fine.
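One crude way to picture "a small proportion of the total range" is to measure how much of the sorted key space a batch of updates spans. The key names and the covered_fraction helper below are hypothetical and only meant as a sketch.

```python
def covered_fraction(all_keys, updated_keys):
    # Fraction of the sorted key space spanned by a batch of updates.
    ordered = sorted(all_keys)
    positions = [ordered.index(k) for k in updated_keys]
    return (max(positions) - min(positions) + 1) / len(ordered)

all_keys = ["user%04d" % i for i in range(1000)]
clustered = all_keys[500:510]   # ten neighbouring keys
spread = all_keys[::100]        # ten keys across the whole range

print(covered_fraction(all_keys, clustered))  # 0.01
print(covered_fraction(all_keys, spread))     # 0.901
```

The clustered batch touches about 1% of the range and would mostly land on one tablet; the spread batch covers about 90% of it.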

-Nick Johnson
 

peterk

Feb 19, 2010, 8:04:27 AM
to Google App Engine
Thanks Nick, I understand now. So I guess the easiest thing to do is
to have a random component in your keynames...at least for apps I'm
considering I don't think I'd have any other way to reasonably ensure
the range of keynames in a given (batch) update were well distributed.


Nick Johnson (Google)

Feb 19, 2010, 8:12:47 AM
to google-a...@googlegroups.com
Hi Peter,

Bear in mind that you only have to even worry about this if you're expecting hundreds of QPS of inserts to the same model.

If you are in this situation, hashing some stable information from your model may be sufficient to generate a well distributed key name.
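A minimal sketch of that idea, assuming the stable information is a user name and reusing the counter_standard suffix from earlier in the thread. The hashed_key_name helper is hypothetical, and SHA-1 is used only to scatter names, not for security.

```python
import hashlib

def hashed_key_name(stable_value, suffix="counter_standard"):
    # Hash a stable attribute (for example a user name) so that key names
    # which would otherwise sort next to each other end up far apart.
    digest = hashlib.sha1(stable_value.encode("utf-8")).hexdigest()
    return "%s_%s" % (digest, suffix)

print(hashed_key_name("user0001"))
print(hashed_key_name("user0002"))
```

Since the hash is deterministic, the key name can still be reconstructed later from the user name for a direct lookup.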

-Nick Johnson


peterk

Feb 19, 2010, 9:02:14 AM
to Google App Engine
Say I did a batch update of 500 entities all of the same model...could
this breach the '100s of qps' requirement that could lead to tablets
getting too hot? I've seen benchmarks (http://blog.dantup.com/pi/bm_put_perf.png)
that show 500 entities being batch put in ~4s, which suggests an average
put rate of 100+ per second in such a case.

Or would that be 'ok'? :) I'm guessing if I were in a situation where
multiple such batch updates could be occurring simultaneously or in a
tight timeframe, then I'd be more likely to run into this...depending
on how busy my app became, I'd possibly need to start doing that (i.e.
lots of such large batch updates happening in short timeframes).



Nick Johnson (Google)

Feb 19, 2010, 10:51:06 AM
to google-a...@googlegroups.com
Hi peterk,

On Fri, Feb 19, 2010 at 2:02 PM, peterk <peter...@gmail.com> wrote:
> Say I did a batch update of 500 entities all of the same model...could
> this breach the '100s of qps' requirement that could lead to tablets
> getting too hot?

Only if you're doing such puts at a high rate. It's the sustained rate that matters, not the instantaneous rate.
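The difference can be worked through with the numbers from the question (500 entities put in about 4 seconds). The one-batch-per-minute interval below is an assumed figure, chosen only for illustration.

```python
def instantaneous_rate(entities, put_seconds):
    # Write rate while a single batch put is in flight.
    return entities / put_seconds

def sustained_rate(entities, seconds_between_batches):
    # Average write rate when one batch is issued per interval.
    return entities / seconds_between_batches

print(instantaneous_rate(500, 4.0))   # 125.0 writes/s during the put
print(sustained_rate(500, 60.0))      # roughly 8.3 writes/s averaged over a minute
```

A single 500-entity batch briefly reaches 125 writes per second, but at one batch per minute the sustained rate stays far below the hundreds of QPS mentioned earlier in the thread.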

-Nick Johnson
 

Michael Hermus

May 14, 2012, 4:31:10 PM
to google-a...@googlegroups.com
I know this is an old thread, but relevant to a recent topic of interest. My question is:

If the sustained QPS is indeed high, is there any advantage to doing batch put operations (as it relates to the 'hot tablet' issue with sequential index values)? In other words, if I am trying to write 1000 entities/second using timestamp as an indexed property, will batching them into groups of 250 or 500 also batch the index row writes that are going to the 'hot tablet', thus mitigating the problem to some extent?

