creating unique numeric IDs in datastore (sample code)
  23 messages - Collapse all
vrypan  
From: vrypan <vry...@gmail.com>
Date: Sat, 26 Apr 2008 23:10:53 -0700 (PDT)
Local: Sun, Apr 27 2008 2:10 am
Subject: creating unique numeric IDs in datastore (sample code)
I wrote some code to implement the equivalent of a "unique
auto_increment index" in datastore (or better, to overcome the lack of
it).

You can find the code here: http://vrypan.net/log/2008/04/27/unique-integer-ids-in-google-datastore/

It looks like it's working, but I'm a Python newbie, so please point
out any mistakes or performance considerations!


dundeemt  
From: dundeemt <dunde...@gmail.com>
Date: Sun, 27 Apr 2008 08:28:46 -0700 (PDT)
Local: Sun, Apr 27 2008 11:28 am
Subject: Re: creating unique numeric IDs in datastore (sample code)
On Apr 27, 1:10 am, vrypan <vry...@gmail.com> wrote:

> I wrote some code to implement the equivalent of a "unique
> auto_increment index" in datastore (or better, to overcome the lack of
> it).

> You can find the code here: http://vrypan.net/log/2008/04/27/unique-integer-ids-in-google-datastore/

> It looks like it's working, but I'm a Python newbie, so please point
> out any mistakes or performance considerations!

I don't think that an auto-increment field is the way to go.  It is
viable when you only have one database, but I don't think that is how GAE
operates.  Someone step in and correct me if I'm wrong.  The datastore
for your app is going to/can be replicated out to other machines based
on geographic usage.  This would mean that there exist times when
datastore' != datastore''.  Over time datastore' would be sync'd with
datastore'' so that datastore' == datastore''.  This would lead one
to believe that there will be times when an auto-increment
field cannot be synchronized, or that the result of the
synchronization would be less than satisfactory.  My belief that
auto-increment fields are the wrong idea in this environment is
strengthened by the fact that they are not offered as an intrinsic
data type in the Model or Expando classes.

The way to go, in my opinion, is to use UUIDs. (see links below)
  http://docs.python.org/lib/module-uuid.html
  http://www.faqs.org/rfcs/rfc4122.html

1) Data access is very expensive; using a UUID should be faster.
2) UUID1 or UUID4 would be the types to consider.
3) UUID1 is preferable, as it introduces some machine significance,
which should make the chance of a collision even more remote
than for a UUID4 (random).

Maybe one of the Google engineers would want to comment.  Also, it
would be nice if GAE supplied a UUID property as one of the datastore
value types.
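
A minimal sketch of the UUID approach suggested above, using only the standard-library uuid module. The helper name new_record_id is an assumption made for illustration; nothing here is GAE-specific, and how the value would be stored in the datastore is left open.

```python
import uuid


def new_record_id(random_only=False):
    """Return a new identifier as a 32-character hex string.

    uuid1() mixes in the host's node ID and a timestamp;
    uuid4() is built from random bits only.
    """
    generated = uuid.uuid4() if random_only else uuid.uuid1()
    return generated.hex


# The ID exists before anything is written to a database.
record_id = new_record_id()
```

Because the ID is generated locally, no extra round trip to the datastore is needed to obtain it.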

-Jeff


Lee O  
From: "Lee O" <lee...@gmail.com>
Date: Sun, 27 Apr 2008 13:51:33 -0700
Local: Sun, Apr 27 2008 4:51 pm
Subject: Re: [google-appengine] Re: creating unique numeric IDs in datastore (sample code)

Note that there are also unique IDs for each object in the datastore. They
also increment. The only problem is that they are not specific to the model
type, but rather simply based on the objects in the store.

--
Lee Olayvar
http://www.leeolayvar.com

Dave Hanson  
From: Dave Hanson <d...@drhanson.net>
Date: Sun, 27 Apr 2008 15:11:46 -0700 (PDT)
Local: Sun, Apr 27 2008 6:11 pm
Subject: Re: creating unique numeric IDs in datastore (sample code)
When I need an id field, I use the unique id for each entity in the
datastore. It's a bit of a pain, because you must save new objects once
before their ids are available, e.g.,

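from google.appengine.ext import db, webapp

class Label(db.Model):
  name = db.StringProperty(required=True)
  ...
  id = db.IntegerProperty()

class LabelHandler(webapp.RequestHandler):
  def post(self):
    ...
    label = Label(name=self.request.get('name'))
    label.put()                  # first save: the datastore assigns the key
    label.id = label.key().id()  # copy the numeric id into the property
    label.put()                  # second save: persist the id property
    ...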


vrypan  
From: vrypan <vry...@gmail.com>
Date: Sun, 27 Apr 2008 17:15:55 -0700 (PDT)
Local: Sun, Apr 27 2008 8:15 pm
Subject: Re: creating unique numeric IDs in datastore (sample code)
Everyone who mentioned object.key().id() is right. There are some
implications, however:

1. Is key().id() really faster than my suggestion?
It should be. But by how much?

2. I like Dave Hanson's example. However, what happens if someone
tries to access label.id before it is stored back?
Ensuring consistency there may take more resources, or not?

3. My "counter" approach has an extra bonus: it's useful when you need
to know the number of objects stored. Remember, there is no "select
count()" in datastore, and the key().id() sequence is not well defined
(the next object you store may have an id that is not the last object's
id + 1).

Jeff, I don't think your synchronization concerns are valid in this
case. It looks like the implementation I suggested is consistent (the
counter increments take place in a transaction) and the actual piece
of information that needs to be replicated between servers is a
relatively small object. That said, I have no insight into how Google's
infrastructure works, so I may be totally wrong. :-)

It would be nice if Google provided a stress-testing service for our
apps. Something like ab (the Apache benchmarking tool) for AppEngine,
hosted by Google?

--Panayotis.



dundeemt  
From: dundeemt <dunde...@gmail.com>
Date: Sun, 27 Apr 2008 18:02:15 -0700 (PDT)
Local: Sun, Apr 27 2008 9:02 pm
Subject: Re: creating unique numeric IDs in datastore (sample code)


I too am hoping someone from Google will weigh in on the issue.
Since I don't have a GAE account yet, I've had to do most of my
design/testing based on assumptions about synchronization and
replication.  And I could be very wrong. ;)  However, using UUIDs in
places where I would have used auto-incrementing is not that big of a
deal.  A UUID is a surrogate key, just as an auto-increment field is, and
it can be created prior to "put"ting the record, so additional db
interactions are not necessary.  So the idea does have certain
advantages.  As to keeping a count of records, that can be memoized
so the penalty for a COUNT(*) is amortized over a large
number of requests.  (I'm not a big fan of surrogate keys but I am a
pragmatist on the topic.)
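
One way to read the memoization idea above is a cached count that is only recomputed once it has expired. This is a sketch under stated assumptions: CachedCount, ttl_seconds, and the count_fn callback are hypothetical names, and count_fn stands in for whatever expensive counting query the app would actually run.

```python
import time


class CachedCount(object):
    """Cache the result of an expensive count for ttl_seconds."""

    def __init__(self, count_fn, ttl_seconds=60.0, clock=time.time):
        self._count_fn = count_fn      # hypothetical expensive count
        self._ttl = ttl_seconds
        self._clock = clock            # injectable so it can be tested
        self._value = None
        self._expires_at = None

    def get(self):
        now = self._clock()
        if self._expires_at is None or now >= self._expires_at:
            # Cache miss or stale entry: pay for the count once.
            self._value = self._count_fn()
            self._expires_at = now + self._ttl
        return self._value
```

The trade-off is that the returned count may be up to ttl_seconds out of date.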

While the dev environment is apparently quite good at mocking the
Python portion and basic table operations, it leaves a lot
unanswered about the implications of replication and synchronization.
I'm not sure anyone (non-Google) completely understands it with
regard to those two topics.  And I don't fault the dev environment;
it's a nice piece of work, but it isn't GAE proper<g>, nor could it be
expected to be.

Regards,

Jeff


Jeremey Barrett  
From: "Jeremey Barrett" <jeremey.barr...@gmail.com>
Date: Sun, 27 Apr 2008 23:41:49 -0500
Local: Mon, Apr 28 2008 12:41 am
Subject: Re: [google-appengine] Re: creating unique numeric IDs in datastore (sample code)

On Sun, Apr 27, 2008 at 5:11 PM, Dave Hanson <d...@drhanson.net> wrote:

>  When I need an id field, I use the unique id for each entity in the
>  datastore. It's a bit of a pain, because you must save new objects once
>  before their ids are available, e.g.,

>  class Label(db.Model):
>   name = db.StringProperty(required=True)
>   ...
>   id = db.IntegerProperty()

This would be better done with a property, I think:

class Label(db.Model):
    name = db.StringProperty(required=True)

    def get_id(self):
        return self.key().id()
    id = property(get_id)

Then you can do:

  label = Label(name='foo')
  label.put()
  print label.id

You could put more intelligence into get_id() of course.

I'm not sure there's a compelling reason to do any of this, though,
since label.key().id() is quite simple if you want a simple numeric id.

Regards,
Jeremey.


Ben the Indefatigable  
From: Ben the Indefatigable <bcbry...@gmail.com>
Date: Mon, 28 Apr 2008 05:43:17 -0700 (PDT)
Local: Mon, Apr 28 2008 8:43 am
Subject: Re: creating unique numeric IDs in datastore (sample code)
On Apr 27, 11:28 am, dundeemt <dunde...@gmail.com> wrote:

> I don't think that an auto-increment field is the way to go.  It is
> viable when you only have 1 database but I don't think that is how GAE
> operates.  Someone step in and correct me if I'm wrong.  The datastore
> for your app is going to/can be replicated out to other machines based
> on geographic usage.  This would mean that there exist times when
> datastore' != datastore'' -- over time datastore' would be sync'd with
> datastore'' so that datastore' == datastore''   -- this would lead one
> to believe that there will be times when the idea of an auto-increment
> field will not be synchronizable or that the result of the
> synchronization would be less than satisfactory.  My belief that auto-
> increment fields are the wrong idea in this environment is
> strengthened by the fact that they are not offered as an intrinsic
> data type in the Model or Expando classes.

On Apr 27, 8:15 pm, vrypan <vry...@gmail.com> wrote:

> Jeff, I don't think your synchronization concerns are valid in this
> case. It looks like the implementation I suggested is consistent (the
> counter increments take place in a transaction) and the actual piece
> of information that needs to be replicated between servers is a
> relatively small object. That said, I have no insight on how google
> infrastructure works, so I may be totally wrong. :-)

Jeff, synchronization is not a concern. The datastore hides the issue
of multiple machines. Even when you don't have much data, it is likely
that the master copies of different entities of your data reside on
different machines, because your data is never assigned or bound to one
machine; the BigTable infrastructure is shared by many applications
from the beginning. Rest assured that "datastore == datastore" as you
put it. The whole point of a transaction on an entity is that you can
rely on the integrity of that entity regardless of the datastore's
underlying implementation.

Mahmoud  
From: Mahmoud <mahmoud.ar...@gmail.com>
Date: Mon, 28 Apr 2008 07:37:39 -0700 (PDT)
Local: Mon, Apr 28 2008 10:37 am
Subject: Re: creating unique numeric IDs in datastore (sample code)
Why do you need an auto_increment id?

The datastore already generates a globally _unique_ key for each
entity. Uniqueness is assured by choosing a long enough random
alphanumerical string, which makes collisions practically impossible
and eradicates the need for expensive transactions. Moreover, there is
also a unique numerical id that is locally unique for that entity
type.

As for record/entity counts, numerous posts have suggested storing the
count somewhere in the datastore. However, one would have to update
the count whenever entities are added or deleted. This adds a penalty
to write operations, but makes fetching the count trivial.

-Mahmoud



vrypan  
From: vrypan <vry...@gmail.com>
Date: Mon, 28 Apr 2008 08:09:26 -0700 (PDT)
Local: Mon, Apr 28 2008 11:09 am
Subject: Re: creating unique numeric IDs in datastore (sample code)
On Apr 28, 5:37 pm, Mahmoud <mahmoud.ar...@gmail.com> wrote:

> Why do you need an auto_increment id?

> As for record/entity counts, numerous posts have suggested storing the
> count somewhere in the datastore. However, one would have to update
> the count whenever entities are added or deleted. This adds a penalty
> to write operations, but makes fetching the count trivial.

Storing the count is exactly the same problem! After all, if storing
the count were so trivial, you would use its current value as a unique,
auto-increment id, no?
You have to update the counter in a transaction, making sure that you
have the actual number and that two concurrent actions don't both
increase it from x to x+1.
Back to where we started :-)
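
The race described above can be shown without any datastore at all. This is a plain-Python illustration in which a dict stands in for the stored counter (an assumption made for the example), and the interleaving of the two requests is written out by hand so that it is deterministic.

```python
# A dict stands in for the stored counter entity (an assumption
# made for illustration; no datastore is involved).
store = {"count": 5}

# Two requests read the current value before either writes it back.
seen_by_a = store["count"]
seen_by_b = store["count"]

# Each request writes "what I read, plus one".
store["count"] = seen_by_a + 1
store["count"] = seen_by_b + 1

# Two increments happened, but the counter only moved from 5 to 6,
# so both requests would hand out the same "unique" ID.
lost_updates = 2 - (store["count"] - 5)
```

Running the read, increment, and write as one indivisible unit is what rules out this interleaving.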

Ben the Indefatigable  
From: Ben the Indefatigable <bcbry...@gmail.com>
Date: Mon, 28 Apr 2008 09:59:34 -0700 (PDT)
Local: Mon, Apr 28 2008 12:59 pm
Subject: Re: creating unique numeric IDs in datastore (sample code)
A better title for this post might have been "sequence number" or
"count" rather than unique ID. I imagine there are genuine uses for
this, because it really just boils down to run_in_transaction on a
function that increments the number.
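
The shape of that idea can be sketched in plain Python. Note the assumptions: the run_in_transaction defined here is a local stand-in built on a lock, not the datastore call of the same name, and the _store dict and next_id helper are hypothetical. The stand-in only mimics the guarantee that the read, increment, and write are not interleaved with another caller's.

```python
import threading

_lock = threading.Lock()
_store = {"counter": 0}   # stands in for a single counter entity


def run_in_transaction(func, *args):
    """Local stand-in: run func while no other caller can interleave."""
    with _lock:
        return func(*args)


def _increment(name):
    # Read, modify, and write happen as one unit.
    _store[name] += 1
    return _store[name]


def next_id(name="counter"):
    """Return the next sequence number for the named counter."""
    return run_in_transaction(_increment, name)
```

Each call to next_id() returns a value one higher than the previous call, so the latest value doubles as a count of IDs issued. Every caller contends for the same counter, which is the performance cost raised earlier in the thread.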
