Auto generated keys vs UUID


Jaganathan K

Aug 31, 2010, 4:46:54 AM
to Google App Engine, Shanmuganandh G, Ramya Shan
Hi Google, Nick Johnson and others

Does Google recommend using UUID (UUID4 to be specific) as key names
for entities in GAE instead of auto generated numeric ids?

Does this hold good in all use cases in the universe?

Is this the reason why import/export of auto-generated IDs is not
yet supported in BulkUpload? Should I wait until this is supported, or
should I refactor my code to use UUID key names?

Is there a demo page that compares bulk read/write performance for
entities that have 64-bit numeric ids versus 128-bit UUID key names?

Some homework that I did:

http://groups.google.com/group/google-appengine/browse_thread/thread/bc868376ffa37619/db9bda21c95ea806?lnk=gst&q=uuid#db9bda21c95ea806

http://groups.google.com/group/google-appengine/browse_thread/thread/a6fcce84837c34b/d4c9d5b3c7869222?lnk=gst&q=uuid#d4c9d5b3c7869222

http://groups.google.com/group/google-appengine/browse_thread/thread/7dedb7d65bdf4f/37f2733fd9f572b9?lnk=gst&q=uuid#37f2733fd9f572b9

Thanks
Jagan

Nick Johnson (Google)

Sep 2, 2010, 10:47:41 AM
to google-a...@googlegroups.com
Hi Jagan,

On Tue, Aug 31, 2010 at 9:46 AM, Jagan <ksj...@gmail.com> wrote:
> Hi Google, Nick Johnson and others
>
> Does Google recommend using UUID (UUID4 to be specific) as key names
> for entities in GAE instead of auto generated numeric ids?
>
> Does this hold good in all use cases in the universe?

This depends entirely on your use case. Unless you expect an extremely high insertion rate of new entities (e.g., >= 100 QPS) with no natural key name, it's not necessary to use a UUID. UUIDs take more space than regular integer IDs.
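
For the high-insertion-rate case mentioned above, a minimal sketch of generating UUID4 key names with Python's standard `uuid` module might look like this. The helper name `new_key_name` is hypothetical, and the commented lines assume a `db.Model` subclass named `Note` that is not shown in this thread.

```python
import uuid


def new_key_name():
    """Return a random UUID4 as a 32-character lowercase hex string."""
    return uuid.uuid4().hex


key_name = new_key_name()

# On App Engine the string would be passed as the entity's key name, e.g.
#   note = Note(key_name=key_name)   # `Note` is a hypothetical db.Model subclass
#   note.put()
```

Because the name is random, no shared counter has to be consulted when an entity is created; the cost is the longer key.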
 

> Is this the reason why import/export of auto-generated IDs is not
> yet supported in BulkUpload? Should I wait until this is supported, or
> should I refactor my code to use UUID key names?

It's possible to do this using the bulkloader, but it requires you to override a method to generate key names.
 

> Is there a demo page that compares bulk read/write performance for
> entities that have 64-bit numeric ids versus 128-bit UUID key names?

The performance should be roughly the same - but the space used will be significantly larger in the case of UUIDs.
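
To make the space difference concrete, the sketch below compares the raw width of a 64-bit integer ID with a few textual encodings of a UUID4. These are sizes of the values themselves, not of the datastore's on-disk or index encoding, which is not described in this thread; the labels and variable names are illustrative.

```python
import base64
import struct
import uuid

u = uuid.uuid4()

sizes = {
    # Fixed-width 64-bit signed integer
    "int64 id": len(struct.pack(">q", 123456789)),
    # The UUID's 128 bits as raw bytes
    "uuid raw bytes": len(u.bytes),
    # Hex digits without hyphens
    "uuid hex string": len(u.hex),
    # Canonical form with hyphens
    "uuid canonical string": len(str(u)),
    # URL-safe base64 of the raw bytes with '=' padding stripped
    "uuid base64 string": len(base64.urlsafe_b64encode(u.bytes).rstrip(b"=")),
}

for label, size in sorted(sizes.items(), key=lambda item: item[1]):
    print("%-22s %2d bytes" % (label, size))
```

If UUID key names are needed, the base64 form is one way to trim the string from 32 or 36 characters to 22.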

-Nick Johnson
 

--
You received this message because you are subscribed to the Google Groups "Google App Engine" group.
To post to this group, send email to google-a...@googlegroups.com.
To unsubscribe from this group, send email to google-appengi...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/google-appengine?hl=en.




--
Nick Johnson, Developer Programs Engineer, App Engine Google Ireland Ltd. :: Registered in Dublin, Ireland, Registration Number: 368047

Jaganathan K

Sep 13, 2010, 9:35:32 AM
to google-a...@googlegroups.com
Hi Nick and others

Thank you for your response.

More inline below ...

On Thu, Sep 2, 2010 at 8:17 PM, Nick Johnson (Google) <nick.j...@google.com> wrote:
> It's possible to do this using the bulkloader, but it requires you to override a method to generate key names.

Were you speaking about the old bulkloader? From looking at the code at <sdk-home>\google\appengine\tools\bulkloader.py, I think that might not work: even if we return a Key constructed with the id, the code later calls loader.create_entity with only the key name. Here is the snippet that I am referring to:

Lines 1260-1267:
    for line_number, values in rows:
      key = loader.generate_key(line_number, values)
      if isinstance(key, datastore.Key):
        parent = key.parent()
        key = key.name()
      else:
        parent = None
      entity = loader.create_entity(values, key_name=key, parent=parent)
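
The effect of that loop can be reproduced with a small stand-in. `FakeKey` below is a hypothetical substitute for `datastore.Key`, written under the assumption that `name()` returns `None` when the key holds a numeric id; `resolve_key_name` mirrors only the branch quoted above and is not SDK code.

```python
class FakeKey(object):
    """Hypothetical stand-in for datastore.Key: holds either an id or a name."""

    def __init__(self, id=None, name=None, parent=None):
        self._id, self._name, self._parent = id, name, parent

    def id(self):
        return self._id

    def name(self):
        return self._name

    def parent(self):
        return self._parent


def resolve_key_name(key):
    """Mirror the quoted branch: a Key is reduced to (key_name, parent)."""
    if isinstance(key, FakeKey):
        return key.name(), key.parent()
    return key, None


# A key built from a numeric id has no name, so the id is dropped here.
key_name, parent = resolve_key_name(FakeKey(id=42))
print(key_name)  # None

# A plain string key name passes through unchanged.
print(resolve_key_name("note-1"))  # ('note-1', None)
```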


Anyway, I tried the following technique in the new bulkloader:

- kind: XYZ
  connector: csv
  connector_options:
  property_map:
    - property: __key__
      external_name: key
      import_transform: transform.create_foreign_key('XYZ', True)
      export_transform: transform.key_id_or_name_as_string

It is working. Is this good enough?

But I have one general doubt here. I read in one forum that, with these techniques, the sequence number (?) that GAE maintains for the next generated id won't be incremented or updated, so when our app tries to create a new entity, its auto-generated id might clash. Is this true? Is there any link that describes how auto-increment ids work on App Engine?
 
 


Thanks
Jagan

--
Let the words of our mouth and the meditations of our heart
Be acceptable in Thy sight here tonight!
- Rivers of Babylon (album)

Jaganathan K

Sep 18, 2010, 12:56:56 AM
to google-a...@googlegroups.com, Nick Johnson (Google)
Hi Nick and others

Finally, I tried my app on the imported DB today. The import was done as per the technique mentioned in the previous mail.

As expected, it threw the following error when I initiated an action that creates a new entity:

the id allocated for a new entity was already in use, please try again
Traceback (most recent call last):
  [some trace truncated]
  File "/base/data/home/apps/notesonatree/1-5.344881683710780408/notesonatree/core/models.py", line 111, in create_note
    new_note.put()
  File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/db/__init__.py", line 893, in put
    return datastore.Put(self._entity, rpc=rpc)
  File "/base/python_runtime/python_lib/versions/1/google/appengine/api/datastore.py", line 293, in Put
    raise _ToDatastoreError(err)
InternalError: the id allocated for a new entity was already in use, please try again
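
The error is consistent with a simple mental model in which the datastore keeps a counter for automatic ids and writes with explicit ids do not advance it. The toy class below is only that model: `ToyDatastore`, its methods, and `IdInUseError` are hypothetical names, and the real allocator is not described in this thread.

```python
class IdInUseError(Exception):
    """Raised when an automatically allocated id already exists."""


class ToyDatastore(object):
    def __init__(self):
        self.entities = {}   # id -> value
        self.next_id = 1     # counter used only for automatic ids

    def put_with_id(self, entity_id, value):
        # Explicit ids (e.g. from a bulk import) bypass the counter.
        self.entities[entity_id] = value

    def put_auto(self, value):
        # Take the next id from the counter, then check for a collision.
        entity_id = self.next_id
        self.next_id += 1
        if entity_id in self.entities:
            raise IdInUseError("id %d already in use" % entity_id)
        self.entities[entity_id] = value
        return entity_id


store = ToyDatastore()
for imported_id in (1, 2, 3):
    store.put_with_id(imported_id, "imported")

try:
    store.put_auto("new note")
except IdInUseError as err:
    print("clash: %s" % err)  # clash: id 1 already in use
```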

:D . Is there a way to work around this? Is this a work in progress?

Also, please try to answer my previous mail if possible.

Thanks
Jagan 


Nick Johnson (Google)

Sep 20, 2010, 5:15:47 AM
to Jaganathan K, google-a...@googlegroups.com
Hi,

You can use allocate_id_range (see http://code.google.com/appengine/docs/python/datastore/functions.html) to allocate IDs to cover the range that you imported, to prevent the datastore automatically allocating any IDs in that range.
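
A hedged sketch of that approach: compute the smallest range that covers the imported ids and reserve it once, before the app creates new entities. The helpers `id_range` and `reserve_imported_ids` and the `allocate` parameter are hypothetical; on App Engine the callable would presumably be `db.allocate_id_range`, but its exact arguments and return values should be checked against the documentation linked above. A stub is passed here so the example runs without the SDK.

```python
def id_range(imported_ids):
    """Return (start, end) covering every imported numeric id."""
    ids = list(imported_ids)
    if not ids:
        raise ValueError("no imported ids to reserve")
    return min(ids), max(ids)


def reserve_imported_ids(allocate, kind_key, imported_ids):
    """Reserve the covering range through the supplied allocator callable."""
    start, end = id_range(imported_ids)
    return allocate(kind_key, start, end)


# Stub allocator that only records its arguments (not the real API).
calls = []


def fake_allocate(kind_key, start, end):
    calls.append((kind_key, start, end))
    return "reserved"


result = reserve_imported_ids(fake_allocate, "XYZ", [17, 3, 1042, 256])
print(calls)   # [('XYZ', 3, 1042)]
print(result)  # reserved
```

Passing the allocator in as a parameter keeps the range arithmetic testable offline; in a deployed app the stub and the string `"XYZ"` would be replaced by the real function and a key for the imported kind.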

-Nick Johnson
