How is an entity key generated?

31 views
Skip to first unread message

Oliver Zheng

unread,
Jun 12, 2009, 7:22:59 PM6/12/09
to Google App Engine
Is it a hash function in that the resulting string is randomly
distributed among its range? It seems like in a distributed hash table
like BigTable, this would be how it's implemented.

The reason I'm asking is, I want to do a random query. I want to
select a random entity from an entity group. I thought if the key is
distributed like this, then I could generate a random string like it
myself, and simply query for the first result bigger than this. Is
this viable?

Thanks.

Oliver Zheng

unread,
Jun 12, 2009, 7:26:05 PM6/12/09
to Google App Engine
I forgot to include this question:

What is the overhead of querying like this for just the first result,
vs just getting with the key? What is the magnitude of difference in
time?

Nick Johnson (Google)

unread,
Jun 15, 2009, 9:34:34 AM6/15/09
to google-a...@googlegroups.com
Hi Oliver,

On Sat, Jun 13, 2009 at 12:26 AM, Oliver Zheng <goo...@oliverzheng.com> wrote:

I forgot to include this question:

What is the overhead of querying like this for just the first result,
vs just getting with the key? What is the magnitude of difference in
time?

A get operation takes about 20-40 ms, while a query takes 160-200ms. See the system status console: http://code.google.com/status/appengine/detail/datastore/2009/06/15#ae-trust-detail-datastore-get-latency



On Jun 12, 4:22 pm, Oliver Zheng <goo...@oliverzheng.com> wrote:
> Is it a hash function in that the resulting string is randomly
> distributed among its range? It seems like in a distributed hash table
> like BigTable, this would be how it's implemented.

Bigtable isn't actually a distributed hash table. For more details, check out the paper at http://labs.google.com/papers/bigtable.html .

A key consists of an app ID and a series of (type, id_or_name) pairs, where type is the name of your datastore class, and id_or_name is an autogenerated integer ID, or a user-provided name (in the case where you provided a key_name at construction time). Root entities have exactly one of these pairs, while child entities have one for each parent, in addition to their own - that is, the key describes the chain of entities.

In the case of 'stringified' keys, what you are seeing is the base64 encoding of the protocol buffer containing the key. You can verify this by going to shell.appspot.com and entering:

---
from google.appengine.ext import db
db.Key(mystr)
---

Where mystr is the key string - it will output the full key.
 

>
> The reason I'm asking is, I want to do a random query. I want to
> select a random entity from an entity group. I thought if the key is
> distributed like this, then I could generate a random string like it
> myself, and simply query for the first result bigger than this. Is
> this viable?

No, but you could do the same with key components, assuming you know your IDs or names are evenly distributed. A better approach, however, is to associate a random number between 0 and 1 with each entity, and query on that.

-Nick Johnson


>
> Thanks.



gae123

unread,
Jun 16, 2009, 10:43:07 PM6/16/09
to Google App Engine
Nick,

can we consider what you write below to be part of the API or an
internal implementation detail we whould not be relying on?

Thanks

Nick Johnson (Google)

unread,
Jun 17, 2009, 6:43:49 AM6/17/09
to google-a...@googlegroups.com
I would not rely on the encoding str(key) remaining the same, no. The only guarantee is that it will be interconvertible - eg, db.Key(str(key)) == key. I can't think of a practical reason why you should be relying on this in the first place, though.

-Nick Johnson
--
Nick Johnson, App Engine Developer Programs Engineer
Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047

Samus_

unread,
Jul 3, 2009, 12:29:09 AM7/3/09
to Google App Engine
actually, the first thing that came to mind for this was to use it as
part of a url to access specific entities since the reference says
it's url-safe: http://code.google.com/appengine/docs/python/datastore/keyclass.html#Key
and also the same idea is shown on this example:
http://code.google.com/appengine/docs/python/datastore/creatinggettinganddeletingdata.html#Getting_an_Entity_Using_a_Key
but of course if the encoding method is not guaranteed to remain the
same then this is a bad idea (cool URIs don't change right?) good to
know thanks.

On Jun 17, 7:43 am, "Nick Johnson (Google)" <nick.john...@google.com>
wrote:

Nick Johnson (Google)

unread,
Jul 3, 2009, 5:25:17 AM7/3/09
to google-a...@googlegroups.com
Hi Samus,

On Fri, Jul 3, 2009 at 5:29 AM, Samus_<mail2...@gmail.com> wrote:
>
> actually, the first thing that came to mind for this was to use it as
> part of a url to access specific entities since the reference says
> it's url-safe: http://code.google.com/appengine/docs/python/datastore/keyclass.html#Key
> and also the same idea is shown on this example:
> http://code.google.com/appengine/docs/python/datastore/creatinggettinganddeletingdata.html#Getting_an_Entity_Using_a_Key
> but of course if the encoding method is not guaranteed to remain the
> same then this is a bad idea (cool URIs don't change right?) good to
> know thanks.

Using keys in URLs is more or less an officially 'blessed' approach.
For that reason, I'd revise my earlier statement and say that while
it's not totally impossible for the key encoding format to change in
future, it's extremely unlikely.

That said, you can provide much more user-friendly URLs if you use
just the fields you need from the key - www.mysite.com/123 is much
more user friendly than www.mysite.com/somelongbase64string.

-Nick Johnson

Samus_

unread,
Jul 5, 2009, 12:47:03 PM7/5/09
to Google App Engine
Hi Nick, thanks for your reply! yes indeed a numeric id would be more
readable so, just to be clear and for future reference, your
suggestion is to use entity.key().id() to construct the url and
Model.get_by_id() to retireve the entity right?

On Jul 3, 6:25 am, "Nick Johnson (Google)" <nick.john...@google.com>
wrote:
> Hi Samus,
>
> On Fri, Jul 3, 2009 at 5:29 AM, Samus_<mail2sa...@gmail.com> wrote:
>
> > actually, the first thing that came to mind for this was to use it as
> > part of a url to access specific entities since the reference says
> > it's url-safe:http://code.google.com/appengine/docs/python/datastore/keyclass.html#Key
> > and also the same idea is shown on this example:
> >http://code.google.com/appengine/docs/python/datastore/creatinggettin...
> > but of course if the encoding method is not guaranteed to remain the
> > same then this is a bad idea (cool URIs don't change right?) good to
> > know thanks.
>
> Usingkeysin URLs is more or less an officially 'blessed' approach.
> For that reason, I'd revise my earlier statement and say that while
> it's not totally impossible for the key encoding format to change in
> future, it's extremely unlikely.
>
> That said, you can provide much more user-friendly URLs if you use
> just the fields you need from the key -www.mysite.com/123is much
> more user friendly thanwww.mysite.com/somelongbase64string.
>
> -Nick Johnson
>
>
>
>
>
> > On Jun 17, 7:43 am, "Nick Johnson (Google)" <nick.john...@google.com>
> > wrote:
> >> I would not rely on the encodingstr(key) remaining the same, no. The only

Nick Johnson (Google)

unread,
Jul 6, 2009, 6:04:06 AM7/6/09
to google-a...@googlegroups.com
Hi Samus_,

That's correct.

-Nick Johnson

David

unread,
Jul 6, 2009, 9:58:06 AM7/6/09
to Google App Engine

> The reason I'm asking is, I want to do a random query. I want to
> select a random entity from an entity group. I thought if the key is
> distributed like this, then I could generate a random string like it
> myself, and simply query for the first result bigger than this. Is
> this viable?
>

Google App Engine aside, this is a not a good way to select a random
key if you want to pick a key uniformly at random. Suppose the keys
were integers from 1 to 100, and there were three entities with random
keys of, say, {12,88,97}. You would be more likely to select 88 by
your method, since it has a longer gap preceding it.

Reply all
Reply to author
Forward
0 new messages