Pre-allocating datastore IDs - plumbing this with db.Model


Tim

Jul 27, 2011, 10:49:28 AM
to google-a...@googlegroups.com

This is a bit of an advanced question for the experts with regards to Keys, IDs, Models and the db.allocate_ids() functionality - those who can answer it won't need much more explanation, so apologies if it doesn't make much sense to everyone else (but you may want to file it away in case you ever need to do the same).

My one-page-webapp uses a datastore framework that gets a bit distraught when it has to create new items and allocate temporary IDs which will later get replaced with "true" IDs from the datastore (I've been using the actual Key strings as ids), so I thought I'd have a look at pre-allocating IDs.

In the absence of cookbook examples, I've concluded
  • I can pre-allocate a range of IDs for a given class using db.allocate_ids(), but as individual keys encode the id and the parent instance (if any) then I can't convert these pre-allocated IDs into pre-allocated Keys if I may be using parent objects.
  • So I should change the client to use IDs (or maybe "type+id" tuples), and have queries to the GAE server return items with 'obj.key().id()' rather than 'str(obj.key())' 
  • The client can be given a range of pre-allocated IDs that it can safely assign to new records as it creates them, knowing they won't change when the item is created in the datastore
  • For updating/deleting items on GAE, I'll replace calls to "db.get(key)" (where the key came from the client, and yes I then check the object is valid etc before I proceed) with "MyModel.get_by_id(ids)" (I know the types of objects by this stage so it's not like I'm doing a heterogeneous fetch)
  • I'll still store references to other objects by db.ReferenceProperty type (ie key) rather than simple id as it makes migration of data easier (ie use of IDs rather than Keys is a client layer convenience only)
  • I'll implement a new model base class for all db.Model derived classes, that defines a new optional "id" parameter and constructs the correct key (my Python knowledge of the constructs is a bit green, so I think this does what I expect - kind() is a reserved but undocumented instance method that looks safer than using __name__, judging by the SDK source)
class BaseModel(db.Model):
    def __init__(self, id=None, parent=None, **args):
        if id != None:
            args["key"] = db.Key.from_path(self.kind(), id, parent=parent)
        db.Model.__init__(self, parent=parent, **args)

The db.Model ctor docs say that "key" can't be used with parent or key_name. The SDK source implies the last line of the above should be fine, but would I do better to omit the "parent=parent" parameter and rely on the fact that, if needed, the parent object is looked up from the key?
  • I can then make instances as before 'obj = MyModel(someproperty=97, another="Hello world")' but I now have a special optional "id" property for construction. I haven't changed parent or property semantics or the like.
Does the above look reasonable enough? I take it the efficiency is pretty much the same (ie there's no great added overhead such as extra database calls introduced by any of the above), but as I'm no great Python expert, and the calls above are documented but not always explained, am I laying myself open to a world of pain to come?

Cheers

--
Tim

Tim

Jul 27, 2011, 11:05:12 AM
to google-a...@googlegroups.com
Forgot one extra question about the docs

Key docs say

<quote>
  Key.from_path(kind, id_or_name, parent=None, namespace=None, **kwds)
[snip]
  For example, the following call creates a key for an entity of kind Address with the numeric ID 9876 whose parent is an entity of kind User with the named key 'Boris':
k = Key.from_path('User', 'Boris', 'Address', 9876)
</quote>

I take it the example is incorrect in that the params are jumbled compared to the docs. Again I checked the source, but my Python-fu with respect to "*args" and "**kwargs" and the like is a little vague...

I take it the example should be something like

  k = Key.from_path('Address', 9876, parent=Key.from_path('User', 'Boris'))
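
For comparison, the SDK's Key.from_path takes a flat sequence of alternating kind and id_or_name values, ancestors first, so the documented call and the nested "parent=" form should name the same key. The sketch below only models that pairing in plain Python; flatten_path and its tuple-of-pairs representation are hypothetical stand-ins, not the SDK implementation.

```python
def flatten_path(*pairs, **kwds):
    """Return the full ancestor path as a tuple of (kind, id_or_name) pairs.

    `parent` is an already-flattened path (a tuple of pairs) or None.
    """
    parent = kwds.pop("parent", None)
    if kwds:
        raise TypeError("unexpected keyword arguments: %r" % sorted(kwds))
    if not pairs or len(pairs) % 2 != 0:
        raise ValueError("path must alternate kind and id_or_name")
    own = tuple(zip(pairs[0::2], pairs[1::2]))
    return (parent or ()) + own


# The flat form used in the documentation example...
flat = flatten_path('User', 'Boris', 'Address', 9876)
# ...and the nested form using parent=.
nested = flatten_path('Address', 9876, parent=flatten_path('User', 'Boris'))
```

In this model both calls yield the same path, with the User pair first and the Address pair last.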

--
T


Tim

Jul 31, 2011, 6:11:39 AM
to google-a...@googlegroups.com

So I guess no one else pre-allocates IDs then :)

Maybe a googler could at least explain/fix the sample in the documentation of Key.from_path (http://code.google.com/appengine/docs/python/datastore/keyclass.html#Key_from_path) as previously noted.

And a question about IDs: they're defined as being Python "longs" (arbitrary length integers) but 0 is not a valid ID (stated in the docs for Key.from_path()) - will the automatic ID allocation ever allocate a negative range at all? If the answer is "yes" or "we may reserve the right to do this in the future" then an explicit note in the docs might help too (I take it from "0 not valid" that you'll never get a range that spans 0 of course).

I only ask because I'll be passing the IDs to javascript code to handle, and as JS doesn't have an arbitrary precision integer type, I'll be manipulating  them as strings.
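
The precision concern can be illustrated with a short sketch. JavaScript numbers are IEEE 754 doubles, and Python's float uses the same format, so it stands in for the JS behaviour here. The helper names ids_to_json and ids_from_json are hypothetical.

```python
import json


def ids_to_json(ids):
    """Serialize datastore IDs as decimal strings so a JS client never parses them as numbers."""
    return json.dumps([str(i) for i in ids])


def ids_from_json(payload):
    """Parse the decimal strings sent back by the client into Python integers."""
    return [int(s) for s in json.loads(payload)]


big_id = 2 ** 53 + 1
# A double cannot tell this ID apart from its neighbour.
same_as_neighbour = float(big_id) == float(big_id - 1)
# Round-tripping as a string preserves it exactly.
round_tripped = ids_from_json(ids_to_json([big_id]))
```

Keeping the IDs as strings on the wire means the client only ever compares and echoes them, which avoids any arithmetic on values a double cannot hold.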

--
T



Stephen

Jul 31, 2011, 7:26:07 AM
to google-a...@googlegroups.com
IDs are 64bit positive integers.

If you allocate IDs for an entity called 'Sequence' you can use them
for any other entity kind, including entities with parents which would
otherwise use a unique range if auto allocated. IDs from 'Sequence'
will be unique within your datastore, which is a superset of the
requirement that IDs be unique per-entity group. However, you must
always then use an ID from Sequence and never allow auto-allocation.

If some of your entities have parents you can't universally Kind.get_by_id().

Tim

Jul 31, 2011, 11:59:19 AM
to google-a...@googlegroups.com


On Sunday, 31 July 2011 12:26:07 UTC+1, Stephen wrote:
> IDs are 64bit positive integers.

OK, that's handy to know - I haven't tried reserving a negative range (or anything past 2^63) but as a python long is arbitrarily large I was guessing it was quite likely there'd be some such limit in the native API underneath.

> If you allocate IDs for an entity called 'Sequence' you can use them
> for any other entity kind, including entities with parents which would
> otherwise use a unique range if auto allocated. IDs from 'Sequence'
> will be unique within your datastore, which is a superset of the
> requirement that IDs be unique per-entity group. However, you must
> always then use an ID from Sequence and never allow auto-allocation.


Really? My reading was that if I allocate a range of IDs, I can create keys using those IDs as well as create Keys without specifying an ID as the Key code will simply be doing the same (ie allocating further ranges) internally.

<quote>
allocate_ids(model, count)
Allocates a batch of IDs in the datastore for a datastore kind and parent combination.
IDs allocated in this manner will not be used by the datastore's automatic ID sequence generator and may be used in Keys without conflict.
</quote>

So you can safely mix auto-allocation and pre-allocation freely, but you shouldn't mix either auto-allocation or pre-allocation with arbitrary self determined allocation unless checked with allocate_id_range(). In my case, I've previously auto-allocated IDs for all my objects, and I'll now be switching to pre-allocation and explicit Key generation for selected model classes.

> If some of your entities have parents you can't universally Kind.get_by_id().


I think I know what you're saying, but for anyone else reading, the point is that Kind.get_by_id() MUST be given both the id AND the parent that was specified when the Key was made (so it's easier if you're not using parents as you always know the parent is None). In my case, when I fetch by ID I will actually know the parent due to the way my datastore and queries are organised. But as far as I can see, if you just have the ID but not the parent, then there is no way to get the matching Key (but you could, for example, store the ID as a property on your object and then query on that if need be, at the cost of an extra property and index overhead).

Cheers

--
Tim

Stephen

Jul 31, 2011, 2:13:06 PM
to google-a...@googlegroups.com
On Sun, Jul 31, 2011 at 4:59 PM, Tim <mee...@gmail.com> wrote:
>
> Really? My reading was that if I allocate a range of IDs, I can create keys
> using those IDs as well as create Keys without specifying an ID as the Key
> code will simply be doing the same (ie allocating further ranges)
> internally.

I thought your concern was allocating IDs in advance of knowing
whether an entity has a parent. Entities with different parents (or no
parent) have IDs allocated from different sequence counters by
default. All I'm saying is: rather than jump through hoops trying to
allocate an ID from the correct range, allocate all IDs from one. But
if so, be consistent. Maybe I misunderstand your question...

Nick Johnson (Google)

Jul 31, 2011, 9:30:11 PM
to google-a...@googlegroups.com
Hi Tim,

An ID is part of the key. When you call db.allocate_ids, you pass in a datastore key which specifies the kind name and parent (if any) for which to allocate the ID range. The IDs returned can then be used to create keys as you normally would from an ID:

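from google.appengine.ext import db

# db.allocate_ids returns the first and last allocated IDs, both inclusive.
start_range, end_range = db.allocate_ids(my_key, 10)  # my_key is any datastore key
for i in range(start_range, end_range + 1):
  a_new_key = db.Key.from_path(my_key.kind(), i, parent=my_key.parent())
  # Do something with a_new_key.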

-Nick Johnson

On Thu, Jul 28, 2011 at 12:49 AM, Tim <mee...@gmail.com> wrote:

> This is a bit of an advanced question for the experts with regards to Keys, IDs, Models and the db.allocate_ids() functionality - those who can answer it won't need much more explanation, so apologies if it doesn't make much sense to everyone else (but you may want to file it away in case you ever need to do the same).
>
> My one-page-webapp uses a datastore framework that gets a bit distraught when it has to create new items and allocate temporary IDs which will later get replaced with "true" IDs from the datastore (I've been using the actual Key strings as ids), so I thought I'd have a look at pre-allocating IDs.
>
> In the absence of cookbook examples, I've concluded
>   • I can pre-allocate a range of IDs for a given class using db.allocate_ids(), but as individual keys encode the id and the parent instance (if any) then I can't convert these pre-allocated IDs into pre-allocated Keys if I may be using parent objects.

False - see above.

>   • So I should change the client to use IDs (or maybe "type+id" tuples), and have queries to the GAE server return items with 'obj.key().id()' rather than 'str(obj.key())'

Switching to passing IDs in generated pages is a good idea in general, but has absolutely no impact on the functionality.

>   • The client can be given a range of pre-allocated IDs that it can safely assign to new records as it creates them, knowing they won't change when the item is created in the datastore

Correct.

>   • For updating/deleting items on GAE, I'll replace calls to "db.get(key)" (where the key came from the client, and yes I then check the object is valid etc before I proceed) with "MyModel.get_by_id(ids)" (I know the types of objects by this stage so it's not like I'm doing a heterogeneous fetch)

This is what you should do if you're passing IDs instead of complete keys, but as noted above, has no impact on functionality.

>   • I'll still store references to other objects by db.ReferenceProperty type (ie key) rather than simple id as it makes migration of data easier (ie use of IDs rather than Keys is a client layer convenience only)

Correct.

>   • I'll implement a new model base class for all db.Model derived classes, that defines a new optional "id" parameter and constructs the correct key (my python knowledge of the constructs is a bit green, so I think this does what I expect - kind() is a reserved but undocumented instance method that looks safer than using __name__ by looking at the SDK source)

You can do this if you want, but it's purely convenience. Don't override the constructor, create a class method instead. The constructor is used to reconstitute existing entities from the datastore in addition to creating new ones, so overriding it correctly is difficult.

The kind() method is a class method, not an instance method, and is documented: http://code.google.com/appengine/docs/python/datastore/modelclass.html#Model_kind
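
The class-method approach suggested here might look roughly like the sketch below. So that it runs outside App Engine, Key and Model are minimal hypothetical stand-ins for db.Key and db.Model, and the factory name create_with_id is an assumption for illustration; in a real app the key would be built with db.Key.from_path(cls.kind(), id, parent=parent).

```python
import collections

# Stand-ins so the sketch runs without the SDK; they are not the real classes.
Key = collections.namedtuple("Key", "kind id parent")


class Model(object):
    def __init__(self, key=None, **props):
        self._key = key
        self.props = props

    @classmethod
    def kind(cls):
        return cls.__name__

    def key(self):
        return self._key


class BaseModel(Model):
    @classmethod
    def create_with_id(cls, id, parent=None, **props):
        """Build a new instance whose key uses a pre-allocated ID.

        The constructor is left untouched, so anything that relies on the
        stock constructor signature keeps working.
        """
        key = Key(cls.kind(), id, parent)
        # The parent is already carried by the key, so it is not passed again.
        return cls(key=key, **props)


class Vehicle(BaseModel):
    pass


garage_key = Key("Garage", 6789, None)
car = Vehicle.create_with_id(1234, parent=garage_key, colour="red")
```

Because kind() is called on the class, the factory works unchanged for every subclass.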

 
> class BaseModel(db.Model):
>     def __init__(self, id=None, parent=None, **args):
>         if id != None:
>             args["key"] = db.Key.from_path(self.kind(), id, parent=parent)
>         db.Model.__init__(self, parent=parent, **args)
>
> The db.Model ctor docs say that "key" can't be used with parent or key_name, looking at the SDK source implies the last line of the above should be fine, but would I do better to omit the "parent=parent" parameter and rely on the fact that, if needed, the parent object is looked up from the key?

They can't be used together because a fully qualified key includes the parent and key name, and thus there's no need to specify them separately.

>   • I can then make instances as before 'obj = MyModel(someproperty=97, another="Hello world")' but I now have a special optional "id" property for construction. I haven't changed parent or property semantics or the like.
> Does the above look reasonable enough? I take it the efficiency is pretty much the same (ie there's no great added overhead such as extra database calls introduced by any of the above) but as I'm no great python expert, and the calls above are documented but not always explained, does the above look reasonable enough or am I laying myself open to a world of pain to come?
>
> Cheers
>
> --
> Tim




--
Nick Johnson, Developer Programs Engineer, App Engine


Tim

Aug 1, 2011, 5:22:10 AM
to google-a...@googlegroups.com


On Monday, 1 August 2011 02:30:11 UTC+1, Nick Johnson (Google) wrote:
>> In the absence of cookbook examples, I've concluded
>>   • I can pre-allocate a range of IDs for a given class using db.allocate_ids(), but as individual keys encode the id and the parent instance (if any) then I can't convert these pre-allocated IDs into pre-allocated Keys if I may be using parent objects.
> False - see above.

Between your answer and Stephen's, I think I must have phrased this bit poorly, so let me try again.

I'm allocating IDs anticipating the client later creating instances of a class where different instances will have different parent objects that I can't predict at the time of allocating the IDs. Thus, allocating IDs is the best I can reasonably do in advance of sending them to the client, whereas if I was allocating IDs for objects which I knew would always have no parent, or always the same parent, then I could actually convert all the pre-allocated IDs into pre-allocated Keys (and thus leave the client logic handling Keys as it was before I started looking at pre-allocation).

For example, I've got Vehicles and Garages, and I always create a Vehicle with a parent object of the Garage where it was first kept, but Garages themselves never have a parent. The client code includes logic to create new Vehicles and create new Garages, so I can't create Vehicle Keys in advance, as the user may create a new Garage and then create a Vehicle for that new Garage (unless I get very wasteful of keys by generating enough to cover a load of possibilities etc). So instead I pre-allocate Garage IDs and Vehicle IDs(*) and let the client work with IDs only - when the client sends the details to the server of what it has made, I'll create the keys with the appropriate parents. Garages (in this poor example) have no parent objects, so I could actually convert the pre-allocated IDs into parent-less Keys and give these to the client to use as it sees fit, but it's easier to maintain one set of logic alone for both data types.

[(*) - Stephen's making the point that I can simply allocate one range of IDs and use these for new Garages or new Vehicles as appropriate. I appreciate the info, and if I had hundreds of types of objects then I may look at that, but for now I'm happy to allocate a range for each type]

Now when a client wants to update a Vehicle, it's only holding the ID, not the Key, but it knows the ID of the garage too, so it actually sends a request like "change the colour of Vehicle 1234 which happens to belong to Garage 6789", because even though there's only one vehicle with an ID of 1234, the backend needs to actually look up the Garage object to be able to call Vehicle.get_by_id() with the correct parent- it can find the Garage via Garage.get_by_id(6789, parent=None). But calling Vehicle.get_by_id(1234) without specifying the correct parent wouldn't find the object (whereas with Keys you can look up an object by itself as the Key encodes the parent).
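
The lookup behaviour described above can be mimicked with a toy store keyed by the full ancestor path. This is a plain-Python model only; put, get_by_id and the store dict are hypothetical names, not datastore calls.

```python
# Toy store keyed by the full key path, mimicking how a key encodes its ancestors.
store = {}


def put(kind, id, parent_path=(), **props):
    """Save properties under the path formed by the parent path plus (kind, id)."""
    path = parent_path + ((kind, id),)
    store[path] = props
    return path


def get_by_id(kind, id, parent_path=()):
    """Mimics Kind.get_by_id(id, parent=...): the parent is part of the lookup."""
    return store.get(parent_path + ((kind, id),))


garage = put("Garage", 6789)
put("Vehicle", 1234, parent_path=garage, colour="red")

found = get_by_id("Vehicle", 1234, parent_path=garage)  # parent supplied
missing = get_by_id("Vehicle", 1234)                    # parent omitted
```

With the parent supplied the vehicle is found; with the ID alone the lookup comes back empty, even though only one vehicle has that ID.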

Nick, thanks for confirming my other assumptions.

>>   • I'll implement a new model base class for all db.Model derived classes, that defines a new optional "id" parameter and constructs the correct key (my python knowledge of the constructs is a bit green, so I think this does what I expect - kind() is a reserved but undocumented instance method that looks safer than using __name__ by looking at the SDK source)
> You can do this if you want, but it's purely convenience. Don't override the constructor, create a class method instead. The constructor is used to reconstitute existing entities from the datastore in addition to creating new ones, so overriding it correctly is difficult.

Ah, that's just the kind of nugget that I was hoping for. OK, I'll ditch my base class rather than risk interfering with bits I don't directly see happening.
 
Cheers guys

--
Tim

Stephen

Aug 1, 2011, 6:16:40 AM
to google-a...@googlegroups.com
On Mon, Aug 1, 2011 at 10:22 AM, Tim <mee...@gmail.com> wrote:
>
> [(*) - Stephen's making the point that I can simply allocate one range of
> IDs and use these for new Garages or new Vehicles as appropriate. I
> appreciate the info, and if I had hundreds of types of objects then I may
> look at that, but for now I'm happy to allocate a range for each type]


Even though you're allocating from two ranges rather than one, that's
still fewer ranges than the datastore would use automatically, so you
still have to be careful.

For example, suppose you later add an admin web interface that can
also create new cars and add them to garages. If you let the datastore
allocate IDs, the first car you add to any garage through this new
code path will get ID 1 or thereabouts, because children get an ID
from the range associated with their parent. Eventually the
admin-generated car IDs may clash with those pre-allocated from the
global car sequence given out to remote clients, and old data will be
overwritten.

So you might want to enforce that a Car model can't be created without
passing in an ID.
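
The clash scenario can be modelled with a toy allocator. This is a minimal sketch assuming, as described in this thread, one independent counter per (kind, parent) pair starting at 1; ToyAllocator is a hypothetical stand-in and makes no datastore calls.

```python
import collections


class ToyAllocator(object):
    """Models one ID counter per (kind, parent) pair."""

    def __init__(self):
        self._next = collections.defaultdict(lambda: 1)

    def allocate_ids(self, kind, parent, count):
        """Reserve `count` IDs; returns the first and last ID, both inclusive."""
        start = self._next[(kind, parent)]
        self._next[(kind, parent)] = start + count
        return start, start + count - 1


alloc = ToyAllocator()

# Pre-allocate five Car IDs from the parent-less Car sequence for a client.
first, last = alloc.allocate_ids("Car", None, 5)
client_ids = set(range(first, last + 1))

# Another code path auto-allocates an ID for a Car under a specific garage,
# which draws from a different counter.
auto_id, _ = alloc.allocate_ids("Car", ("Garage", 6789), 1)

# If the client files one of its cars under that same garage, both entities
# would share the same full key.
collision = auto_id in client_ids
```

The two counters know nothing about each other, which is why mixing pre-allocated and auto-allocated IDs under the same parent is unsafe in this model.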

Tim

Aug 1, 2011, 12:14:07 PM
to google-a...@googlegroups.com


On Monday, 1 August 2011 11:16:40 UTC+1, Stephen wrote:

> Even though you're allocating from two ranges rather than one, that's
> still fewer ranges than the datastore would use automatically, so you
> still have to be careful.
>
> For example, suppose you later add an admin web interface that can
> also create new cars and add them to garages. If you let the datastore
> allocate IDs, the first car you add to any garage through this new
> code path will get ID 1 or thereabouts, because children get an ID
> from the range associated with their parent. Eventually the
> admin-generated car IDs may clash with those pre-allocated from the
> global car sequence given out to remote clients, and old data will be
> overwritten.
>
> So you might want to enforce that a Car model can't be created without
> passing in an ID.

I think I'm only now getting the subtlety of my faulty assumption.

db.allocate_ids for a given kind does not give unique IDs for that kind, it gives IDs that are only unique for that kind AND for a specific parent as specified by the key used to make the call to db.allocate_ids().
So I can't pre-allocate a bunch of IDs that can be used to make keys with whatever parent may be required without running the risk of duplicate IDs because the ID allocation mechanism, like the Keys themselves, is actually parent specific.

This pretty much blows pre-allocated IDs out of the water for me, at least for what I had in mind, unless I either drop entity groups altogether, or do something like put all items into an entity group per user (so the parent is something that represents the user, meaning that I know the parent to use). Of course I could just manage my own ID allocation, but I'm never really keen on that when a data persistence layer has its own ID/Key mechanism.

Oh well, back to the drawing board, thanks for the patient explanations

--
Tim

Stephen

Aug 1, 2011, 4:45:07 PM
to google-a...@googlegroups.com
On Mon, Aug 1, 2011 at 5:14 PM, Tim <mee...@gmail.com> wrote:
>
> Of course I could just manage my own ID allocation, but I'm never
> really keen on that when a data persistance layer has its own ID/Key
> mechanism.

There's nothing extra to manage here except making sure you're
consistent. Enforce this in Car's make() class method.

It's not much different than using Key.from_path('Car', '123') and
Car.get_by_key_name('123'), which is a fully supported way of managing
IDs in the datastore.

Nick Johnson (Google)

Aug 1, 2011, 9:08:07 PM
to google-a...@googlegroups.com
On Tue, Aug 2, 2011 at 2:14 AM, Tim <mee...@gmail.com> wrote:


> On Monday, 1 August 2011 11:16:40 UTC+1, Stephen wrote:
>
>> Even though you're allocating from two ranges rather than one, that's
>> still fewer ranges than the datastore would use automatically, so you
>> still have to be careful.
>>
>> For example, suppose you later add an admin web interface that can
>> also create new cars and add them to garages. If you let the datastore
>> allocate IDs, the first car you add to any garage through this new
>> code path will get ID 1 or thereabouts, because children get an ID
>> from the range associated with their parent. Eventually the
>> admin-generated car IDs may clash with those pre-allocated from the
>> global car sequence given out to remote clients, and old data will be
>> overwritten.
>>
>> So you might want to enforce that a Car model can't be created without
>> passing in an ID.
>
> I think I'm only now getting the subtlety of my faulty assumption.
>
> db.allocate_ids for a given kind does not give unique IDs for that kind, it gives IDs that are only unique for that kind AND for a specific parent as specified by the key used to make the call to db.allocate_ids().

That's correct.
 
> So I can't pre-allocate a bunch of IDs that can be used to make keys with whatever parent may be required without running the risk of duplicate IDs because the ID allocation mechanism, like the Keys themselves, is actually parent specific.

You could allocate all your IDs out of a single global pool (by specifying a fixed key to db.allocate_ids). As long as you never create an entity without specifying an ID, that will work fine - but if you do create an entity with no ID specified, there's a high probability the generated one will already exist.


> This pretty much blows pre-allocated IDs out of the water for me, at least for what I had in mind, unless I either drop entity groups altogether, or do something like put all items into an entity group per user (so the parent is something that represents the user, meaning that I know the parent to use). Of course I could just manage my own ID allocation, but I'm never really keen on that when a data persistence layer has its own ID/Key mechanism.

Entity groups bear close consideration. If you don't need them for transactional integrity, don't use them.

-Nick Johnson
 

> Oh well, back to the drawing board, thanks for the patient explanations
>
> --
> Tim


Robert Kluin

Aug 2, 2011, 12:02:24 AM
to google-a...@googlegroups.com
I currently pre-allocate ids using a global pool for each kind.
However, I treat them as key_names when using them (ie
Kind(key_name=str(id),...)).

I'm not sure if you're tied to using integer ids, but you could also
generate UUIDs. Pretty low collision risk, especially if you're using
entity groups. If I'm not mistaken, with the new pricing you'll be
charged for each id allocated by the datastore, another bonus for
UUIDs. ;)
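
A minimal sketch of the UUID idea, using only the standard library. The helper name new_key_name is hypothetical; the result would be passed as a key name in the same way as above, ie Kind(key_name=new_key_name(), ...).

```python
import uuid


def new_key_name():
    """Return a random 32-character hex string for use as a key name."""
    return uuid.uuid4().hex


# Generate a batch up front, e.g. to hand to a client before it creates records.
names = [new_key_name() for _ in range(1000)]
```

No datastore call is involved in generating these, so a client could even create them itself, at the cost of longer keys than integer IDs.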


Robert
