Re: Unique Integer ID for a non primary key field for Entities in Google App Engine


Neo

Aug 6, 2012, 9:41:19 AM
to google-a...@googlegroups.com
Anyone??

Alex Burgel

Aug 6, 2012, 11:07:34 AM
to google-a...@googlegroups.com
You could compress and then base64-encode the URL, and share that value.

Also, questions like this should probably go to Stack Overflow. You're more likely to get a useful response there.


Alex Burgel

Aug 6, 2012, 11:10:49 AM
to google-a...@googlegroups.com
Forget the compression part. Hash the URL instead, with SHA-1 or something similar. You would need to store and index the hashed value on that entity so you can query for it.
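
A minimal sketch of the hashing idea, assuming Java 8 or later (for java.util.Base64). The class and method names are made up for illustration. Note that the result is a 27-character string token, not an integer, so it only partly fits what was asked for.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper, not part of any App Engine API.
class UrlHash {
    /** Returns the SHA-1 digest of the URL as an unpadded, URL-safe Base64 string. */
    static String sha1Token(String url) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            // SHA-1 always produces 20 bytes, which encode to 27 Base64 characters.
            byte[] digest = md.digest(url.getBytes(StandardCharsets.UTF_8));
            return Base64.getUrlEncoder().withoutPadding().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-1.
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(sha1Token("http://example.com/some/long/path?x=1"));
    }
}
```

The token would be stored in an indexed property next to the URL, so that a shared token can be looked up with a query.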

hyperflame

Aug 6, 2012, 11:23:06 AM
to Google App Engine
This sounds a lot like a URL-shortening service such as TinyURL.
Perhaps investigate how they create their shortened URLs.

Could you use String.hashCode()? I haven't worked with it before, but
it might generate the ints that you need.

On Aug 1, 1:32 am, Neo wrote:
> I have an entity type, say URLInfo, which keeps info about URLs. The primary
> key of this entity is the URL itself (which makes sure that I always have
> unique URLs in the datastore). I also want a unique integer id for each URL
> so that sharing the id becomes easier. I could use GUIDs, but I would
> prefer not to. How can I achieve this? The integer ids need not be
> sequential (though that is preferred). Entities can be generated at a
> fast rate (which means I can't keep a common counter to update each time
> I generate a new URL record). This is what I have tried so far: in the
> URLInfo class, I defined a field, Long Id, and annotated it
> with @Persistent(valueStrategy = IdGeneratorStrategy.IDENTITY) in the hope
> that it would be generated automatically with a unique value. But when I
> save a new entity (with Id set to null), the entity is saved in the
> datastore but no value is assigned to this field.
>
> I am trying all this on a local machine. I am using Java/JDO.
>
> Thanks

Jeff Schnitzer

Aug 6, 2012, 1:36:38 PM
to google-a...@googlegroups.com
Just use the allocator to generate an id. Look at the javadocs for
DatastoreService.allocateIds(). It's the same mechanism that GAE uses
to autogenerate numeric ids when saving entities. It scales.

Make up some kind of fake Kind - or even use the URLInfo kind.
Allocate an id every time you construct a new URLInfo. They will not
be exactly monotonically increasing but they will be guaranteed
unique.

Strategies based on hashes are probably a bad idea. I'm guessing you
want the shortest value possible, but shorter hashes will be more
likely to produce collisions.
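
To make the collision point concrete, here is a small sketch. The URLs and the class name are invented for illustration. It shows two different strings with the same 32-bit String.hashCode(), plus the standard birthday approximation for how quickly duplicates become likely in a hash of a given width.

```java
// Illustrative only; the URLs are made-up examples.
class HashCollisionDemo {
    /** Birthday approximation: chance that n random values of the given bit width contain a duplicate. */
    static double collisionProbability(long n, int bits) {
        double space = Math.pow(2, bits);
        double pairs = (double) n * (n - 1) / 2.0;
        return 1.0 - Math.exp(-pairs / space);
    }

    public static void main(String[] args) {
        // "Aa" and "BB" have the same hashCode, so any two strings that
        // differ only in that suffix collide as well.
        String a = "http://example.com/Aa";
        String b = "http://example.com/BB";
        System.out.println(a.hashCode() == b.hashCode()); // prints "true"

        // With a 32-bit hash, about 77,000 entries already give roughly even odds of a duplicate.
        System.out.println(collisionProbability(77_163, 32));
    }
}
```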

Jeff


Neo

Aug 7, 2012, 8:08:42 AM
to google-a...@googlegroups.com, je...@infohazard.org
Hi Jeff, 

Actually, my requirement is that whenever I save a new entity, the id field should be generated automatically. If I have to do it on my own (with DatastoreService.allocateIds(), as you suggested), how do we handle scenarios like two threads on different machines trying to store newly created entities in the datastore (and thereby trying to assign different ids)? I am not well versed in all the concepts, so if you could elaborate, that would be really useful. And you are right in guessing that I need the shortest possible value. Hashes, I suppose, are not guaranteed to be unique for two different arbitrary strings.

Thanks,

Ego008

Aug 7, 2012, 10:25:42 AM
to google-a...@googlegroups.com
You can define the id yourself and use a counter to increment it.


Jeff Schnitzer

Aug 7, 2012, 12:27:29 PM
to Neo, google-a...@googlegroups.com
Now you have me confused. I thought you wanted every URLInfo to have a
unique id? This is what the allocator will give you. It generates
unique ids at ferocious rates across an entire cluster of active
machines.

If you somehow want URLInfo objects to share ids, I don't really
understand your business problem.

If you're just looking for a way to ensure that multiple servers don't
create the same URLInfo object at the same time, you need to create
the URLInfo in a transaction that checks for existence of the URLInfo
before writing.

Jeff

Michael Hermus

Aug 7, 2012, 2:23:55 PM
to google-a...@googlegroups.com, je...@infohazard.org
If you use a transaction to retrieve the entity by key (the URL) and then create one if it doesn't exist, only one entity should be successfully created in the event of a collision. Therefore, even if you create two different versions with different unique ids, only one should survive.

Neo

Aug 8, 2012, 1:29:34 AM
to google-a...@googlegroups.com, je...@infohazard.org
Hi Jeff and Michael,

Yes, I am looking for a way to ensure that multiple servers don't create the same URLInfo object at the same time.

As Michael said: If you use a transaction to retrieve the entity by key (the URL) and then create one if it doesn't exist, only one entity should be successfully created in the event of a collision. Therefore, even if you create two different versions with different unique ids, only one should survive.

My question is: do we really need to care about the lost id (as only one entity is going to survive) in the case of a collision? And could we have a code snippet for allocating and assigning an id to the URLInfo in a transaction that guarantees that there is only one instance of a URL in the datastore and that the ids of all URLs are unique and non-empty?

Thanks for your patience.

Michael Hermus

Aug 8, 2012, 7:59:35 AM
to google-a...@googlegroups.com, je...@infohazard.org
There is no reason to care about the lost Id, IMO. Presumably collisions are relatively infrequent in any case.

If you use the canonical URL as the key for the entity, it guarantees only one instance of the URL in the datastore. If you use the allocator to generate ids (as Jeff suggested), they are guaranteed to be unique for the specified entity kind. The allocation itself doesn't need to be inside the transaction, though; you could grab ids in batches and keep them in an instance-local cache. Of course, take care to ensure that a given id is used only once (the simplest way would be to synchronize the retrieval method so that only one thread runs it at a time).
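
A rough sketch of that arrangement, using only the JDK so it runs anywhere. Everything here is a stand-in: RangeAllocator is a hypothetical interface playing the role of a call such as DatastoreService.allocateIds(kind, num), and the ConcurrentHashMap with putIfAbsent replaces the datastore transaction (get by key, then put if missing). It is meant to show the shape of the logic, not real App Engine code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical stand-in for the datastore id allocator. */
interface RangeAllocator {
    /** Reserves count consecutive ids and returns the first one. */
    long allocate(long count);
}

/** Instance-local cache of ids fetched in batches. */
class IdPool {
    private final RangeAllocator allocator;
    private final long batchSize;
    private long next;      // next id to hand out
    private long remaining; // ids left in the current batch

    IdPool(RangeAllocator allocator, long batchSize) {
        this.allocator = allocator;
        this.batchSize = batchSize;
    }

    /** Synchronized so that each id is handed out exactly once. */
    synchronized long nextId() {
        if (remaining == 0) {
            next = allocator.allocate(batchSize);
            remaining = batchSize;
        }
        remaining--;
        return next++;
    }
}

/** In-memory stand-in for the URLInfo kind, keyed by URL. */
class UrlStore {
    private final ConcurrentMap<String, Long> idByUrl = new ConcurrentHashMap<>();
    private final IdPool pool;

    UrlStore(IdPool pool) {
        this.pool = pool;
    }

    /** Returns the id for the URL, creating the record if it does not exist yet. */
    long getOrCreate(String url) {
        Long existing = idByUrl.get(url);
        if (existing != null) {
            return existing;
        }
        long candidate = pool.nextId();
        // Atomic check-then-write: if another thread won the race, its id is kept.
        Long winner = idByUrl.putIfAbsent(url, candidate);
        // The loser's candidate id is simply discarded; nothing depends on it.
        return winner != null ? winner : candidate;
    }
}

class UrlStoreDemo {
    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong(1); // fake allocator state
        IdPool pool = new IdPool(n -> counter.getAndAdd(n), 10);
        UrlStore store = new UrlStore(pool);
        long first = store.getOrCreate("http://example.com/a");
        long again = store.getOrCreate("http://example.com/a");
        long other = store.getOrCreate("http://example.com/b");
        System.out.println(first == again); // prints "true"
        System.out.println(first != other); // prints "true"
    }
}
```

In the real thing, getOrCreate would be the transaction body, and an id wasted by a losing writer costs nothing because uniqueness comes from the allocator and not from the ids being contiguous.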

Neo

Aug 9, 2012, 11:46:08 PM
to google-a...@googlegroups.com, je...@infohazard.org
Hi Michael,
Thanks. That was really useful.