Generating an int64 datastore key from an email address

John Beckett

Sep 22, 2016, 6:05:19 PM
to google-appengine-go
I have an application where I need to generate an IntID for a datastore entity based on a unique email address.  For privacy and other reasons I can't directly use the email address as a StringID.

I need the generated IntIDs to have low collision rates, so my best guess is to use a cryptographically secure hash function (like SHA3) and export the first 64 bits of the hash to generate the IntID.  However, it's not clear from the documentation whether the IntID (which is an int64) always has to be positive.  I've never seen a negative ID in the datastore, but I'd like confirmation so that I know whether to make it 63 or 64 bits long.

I'd also like feedback on my suggested method of generating the IntID from an email.

Dave Day

Sep 22, 2016, 8:35:12 PM
to John Beckett, google-appengine-go
The reference for the cloud datastore discourages the use of negative numbers, so I'd stick to 63 bits (or, stringify them and use a name instead):

"The auto-allocated ID of the entity. Never equal to zero. Values less than zero are discouraged and may not be supported in the future."

--
You received this message because you are subscribed to the Google Groups "google-appengine-go" group.
To unsubscribe from this group and stop receiving emails from it, send an email to google-appengine-go+unsub...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Aaron Cannon

Sep 22, 2016, 9:30:37 PM
to John Beckett, google-appengine-go
It would not be computationally hard to generate hashes for a significant portion of all extant email addresses. I would suggest using 63 or 64 bits from an HMAC. As long as the secret key stays secret, you should be good.

Aaron

--
This message was sent from a mobile device

simon...@gmail.com

Sep 22, 2016, 9:30:46 PM
to google-appengine-go
I don't think it's safe to assume that part of a hash will have the same collision rate as the full hash - that could cause problems.
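For a rough sense of scale, the collision rate of a truncated hash can be estimated with the birthday approximation. The sketch below assumes the truncated output behaves like uniformly random 63-bit values, which is the usual modelling assumption for a cryptographic hash and not something established in this thread; the function name is illustrative only:

```go
package main

import (
	"fmt"
	"math"
)

// collisionProbability approximates the chance that at least two of n
// uniformly random values drawn from a space of 2^bits values coincide,
// using the birthday approximation 1 - exp(-n(n-1) / 2^(bits+1)).
func collisionProbability(n float64, bits int) float64 {
	x := n * (n - 1) / 2 / math.Exp2(float64(bits))
	return -math.Expm1(-x) // accurate even when x is tiny
}

func main() {
	for _, n := range []float64{1e6, 1e8, 1e9} {
		fmt.Printf("%.0e accounts: p ≈ %.2g\n", n, collisionProbability(n, 63))
	}
}
```

Under that assumption the chance of any collision is on the order of 5e-8 for a million accounts and about 5% for a billion, so the truncated hash collides far more readily than the full one, but by a predictable amount.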

Why not just use the string hash as the key instead? Or let the IntID be assigned and have the string hash on an indexed property (you can use another table to ensure uniqueness).

I think the IntID does have to be positive.

Aaron Cannon

Sep 22, 2016, 9:50:07 PM
to simon...@gmail.com, google-appengine-go
I believe that the TOTP algorithm depends on truncation being safe: it uses only part of an HMAC output.

I would be more worried about an attacker being able to figure out the email address behind an ID, if a raw hash function were used. I think you should instead use an HMAC, or another secure keyed hash.
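A minimal sketch of the keyed approach in Go, assuming HMAC-SHA256 from the standard library. The function name emailIntID, the placeholder secret, and the lowercasing and trimming of the address are all assumptions made for illustration, not something specified in this thread:

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"strings"
)

// emailIntID derives a deterministic, positive, non-zero int64 from an email
// address using HMAC-SHA256 keyed with a secret.
// Normalizing the address (trim + lowercase) is an assumption made here so
// that trivially different spellings map to the same ID.
func emailIntID(secret []byte, email string) int64 {
	normalized := strings.ToLower(strings.TrimSpace(email))

	mac := hmac.New(sha256.New, secret)
	mac.Write([]byte(normalized)) // hash.Hash.Write never returns an error
	sum := mac.Sum(nil)

	// Keep the first 64 bits and clear the sign bit to get a 63-bit value.
	id := binary.BigEndian.Uint64(sum[:8]) &^ (1 << 63)
	if id == 0 {
		id = 1 // an ID of zero is never used
	}
	return int64(id)
}

func main() {
	secret := []byte("replace-with-a-real-secret") // placeholder key
	fmt.Println(emailIntID(secret, "user@example.com"))
}
```

The result could then be passed as the intID argument of datastore.NewKey, with an empty stringID. Without the secret, an attacker cannot precompute IDs for known email addresses, but if the secret is ever changed, every derived ID changes with it.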

John Beckett

Sep 23, 2016, 5:45:37 AM
to google-appengine-go, jgbe...@gmail.com
Thanks for the info Dave.  I had my suspicions, but I couldn't find anything in the docs, so your link helps.

John Beckett

Sep 23, 2016, 6:02:27 AM
to google-appengine-go, simon...@gmail.com
> Why not just use the string hash as the key instead? Or let the IntID be assigned and have the string hash on an indexed property (you can use another table to ensure uniqueness).

That would be a solution for many applications, but not for this one.  One of the primary goals is to allow an account to be deleted fully (for privacy reasons) and, if the account is later re-created with the same email, to get the same ID as before.  For other practical reasons we can't use an email hash as a StringID, primarily because other services rely on the IntID as the identifier.

simon...@gmail.com

Sep 23, 2016, 9:34:22 AM
to google-appengine-go, simon...@gmail.com
Hence the use of another table that maps the StringID hash to the assigned IntID. You can delete the table containing the data and retain that index table, so that a re-created account gets the same ID (which suggests things are not really completely deleted).

John Beckett

Sep 24, 2016, 4:43:09 PM
to simon...@gmail.com, google-appengine-go
We really need to be sure that if a user deletes their account, all their data is really deleted, which is why we want to go the route of a deterministic generator.
