Guid conventions


Marc Gravell

Dec 6, 2017, 2:46:34 PM
to Protocol Buffers
A question on Stack Overflow earlier (https://stackoverflow.com/questions/47674930/google-protobuf-proto-file-query/4767629) reminded me that I'm not fully "up" on the conventions for using guids in protobuf.

There's no primitive / keyword for them, and AFAIK no "well known type". So: how do folks tend to handle guids? Strings? Bytes? *Should* there be a stronger guid story? Or is this just a non-issue?

Thoughts?

Marc

Adam Cozzette

Dec 7, 2017, 7:39:39 PM
to Marc Gravell, Protocol Buffers
I haven't had to store a GUID/UUID in a proto before but it seems like string or bytes would be the best choice. You would definitely want to use bytes (not string) if you're using the binary representation, since string fields are for UTF-8 only. We could consider eventually creating a well-known type but I'm not sure how much demand there is for one.
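To illustrate the string-vs-bytes point, here is a minimal Python sketch (the sample UUID value and the helper name `is_valid_utf8` are just for illustration). It shows that the text form is valid UTF-8 and 36 bytes long, while the 16-byte binary form is generally not valid UTF-8 and so would not belong in a string field.

```python
import uuid

# Arbitrary sample value, chosen only for illustration.
u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

as_text = str(u).encode("utf-8")  # hyphenated hex text, 36 bytes
as_raw = u.bytes                  # binary form, 16 bytes


def is_valid_utf8(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text_ok = is_valid_utf8(as_text)  # the text form is plain ASCII
raw_ok = is_valid_utf8(as_raw)    # this raw form contains invalid UTF-8 sequences
```

The trade-off is size versus convenience: the text form costs more than twice as many payload bytes, but it is safe to put in a string field.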


Marc Gravell

Dec 8, 2017, 3:17:45 AM
to Adam Cozzette, Protocol Buffers
One thought on the bytes vs string aspect: as shown in the SO question, it isn't beyond imagining to want to use a guid/uuid with "map" - which wouldn't be possible with (raw) bytes.

As an additional consideration for bytes: there are at least two binary representations of guid/uuid - i.e. which order the bytes come vs the visual string representation (either left to right byte equivalent, or a crazy-endian version where individual portions at different 1/2/4-byte lengths are individually reversed).
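The two layouts can be seen side by side with Python's standard `uuid` module, which exposes both as `bytes` (left to right) and `bytes_le` (the first three fields byte-reversed). The sample value is arbitrary; this is only a sketch of the ambiguity, not a statement about what any particular protobuf library emits.

```python
import uuid

# Arbitrary sample value, chosen so the byte order is easy to follow.
u = uuid.UUID("00112233-4455-6677-8899-aabbccddeeff")

# Left-to-right layout: bytes appear in the same order as the hex text.
left_to_right = u.bytes

# Mixed-endian layout: the 4-byte, 2-byte and 2-byte leading fields are
# each reversed; the trailing 8 bytes are unchanged.
mixed_endian = u.bytes_le

print(left_to_right.hex())  # 00112233445566778899aabbccddeeff
print(mixed_endian.hex())   # 33221100554477668899aabbccddeeff

# Reading the bytes with the wrong convention silently yields a different id.
misread = uuid.UUID(bytes=mixed_endian)
```

A receiver that assumes the wrong layout gets a well-formed but different identifier, which is why the payload order needs to be pinned down.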

It is these ambiguities that make me think - after hacking around it for a long time - that a specific type might be useful.

Proposal

For efficiency, I would propose a new primitive keyword in .proto terms, that is essentially the same as "bytes", but is mapped (when possible) to a target platform's natural unique identifier primitive. If no such primitive is available (JavaScript, for example), it would be decoded *creating a string*, but expanding to the hex representations - so the byte 0xB7 would contribute the two characters "B7" to the string. To be clarified: with or without hyphen group separators. The reason for this string is to facilitate use in maps, and the correct unit semantics, even on platforms without a native uuid type. In terms of the binary payload: it would be left-to-right byte-for-byte with the visual representation, so the uuid starting "010203040506..." would contribute bytes 0x010203040506...
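A rough sketch of the string fallback described above, written in Python purely for illustration. The function names and the `hyphens` flag are hypothetical, and since the hyphen question is left open in the proposal, both forms are shown.

```python
def uuid_bytes_to_text(raw: bytes, hyphens: bool = False) -> str:
    """Expand 16 payload bytes to hex text, left to right (0xB7 -> "B7")."""
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    text = raw.hex().upper()
    if hyphens:
        # Conventional 8-4-4-4-12 grouping of the 32 hex characters.
        text = "-".join((text[0:8], text[8:12], text[12:16], text[16:20], text[20:32]))
    return text


def text_to_uuid_bytes(text: str) -> bytes:
    """Inverse of uuid_bytes_to_text; accepts either form."""
    raw = bytes.fromhex(text.replace("-", ""))
    if len(raw) != 16:
        raise ValueError(f"expected 16 bytes, got {len(raw)}")
    return raw


payload = bytes(range(1, 17))          # 0x01, 0x02, ... 0x10
plain = uuid_bytes_to_text(payload)    # "0102030405060708090A0B0C0D0E0F10"
grouped = uuid_bytes_to_text(payload, hyphens=True)

# The text form is hashable and comparable, so it works as a map key.
lookup = {plain: "some value"}
```

Whichever form is chosen, it would need to be fixed by the spec, otherwise the same id could end up under two different map keys.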

The size would always be 16 bytes but would be written with the usual length prefix, so: 17 bytes overall. Any unexpected length prefix encountered would result in a deserialization failure. Where a platform doesn't have a native uuid type this 16-byte restriction would be enforced at the point of assignment/addition.
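The length rule could look something like the following Python sketch. The function names are hypothetical, and the field's tag is deliberately left out, matching the 17-byte count above (a real field would also carry its tag in front of the length prefix).

```python
UUID_SIZE = 16


def encode_uuid_payload(raw: bytes) -> bytes:
    """Length prefix (one byte, since 16 < 128) followed by the 16 bytes."""
    if len(raw) != UUID_SIZE:
        # Enforced at the point of assignment, as proposed.
        raise ValueError(f"uuid must be {UUID_SIZE} bytes, got {len(raw)}")
    return bytes([UUID_SIZE]) + raw


def decode_uuid_payload(buf: bytes) -> bytes:
    """Read the length prefix and reject anything other than 16."""
    if not buf:
        raise ValueError("empty buffer")
    length = buf[0]
    if length != UUID_SIZE:
        raise ValueError(f"unexpected uuid length prefix: {length}")
    raw = buf[1:1 + UUID_SIZE]
    if len(raw) != UUID_SIZE:
        raise ValueError("truncated uuid payload")
    return raw


encoded = encode_uuid_payload(bytes(range(16)))
decoded = decode_uuid_payload(encoded)
```

Keeping the length prefix means a parser that does not know the new type could still skip over the field like any other length-delimited value.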

Additional thought: "repeated uuid" (or whatever the keyword) *would* be allowed to use "packed" encoding. A receiver would divide the packed length prefix by 16 to determine the number of elements.
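As a sketch of the packed idea in Python (hypothetical function names, field tag omitted): the elements are concatenated behind a single varint length prefix, and the receiver recovers the count by dividing that length by 16. Rejecting a length that is not a multiple of 16 is an assumption here, in the spirit of the length check above.

```python
UUID_SIZE = 16


def encode_varint(n: int) -> bytes:
    """Base-128 varint, least significant group first."""
    out = bytearray()
    while True:
        group = n & 0x7F
        n >>= 7
        if n:
            out.append(group | 0x80)  # more groups follow
        else:
            out.append(group)
            return bytes(out)


def decode_varint(buf: bytes, pos: int = 0):
    """Return (value, position after the varint)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def encode_packed_uuids(items):
    for raw in items:
        if len(raw) != UUID_SIZE:
            raise ValueError("every element must be 16 bytes")
    body = b"".join(items)
    return encode_varint(len(body)) + body


def decode_packed_uuids(buf: bytes):
    length, pos = decode_varint(buf)
    if length % UUID_SIZE != 0:
        raise ValueError("packed length is not a multiple of 16")
    body = buf[pos:pos + length]
    if len(body) != length:
        raise ValueError("truncated packed payload")
    count = length // UUID_SIZE
    return [body[i * UUID_SIZE:(i + 1) * UUID_SIZE] for i in range(count)]


ids = [bytes([i]) * UUID_SIZE for i in range(10)]
packed = encode_packed_uuids(ids)     # 2-byte length prefix + 160 bytes
unpacked = decode_packed_uuids(packed)
```

For ten elements this costs 162 bytes, versus 170 if each element carried its own one-byte length prefix (before counting per-element tags).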

In JSON it would be formatted in the text version described above for platforms without a native uuid type.

---end

So: that's my idea of how uuids *should* be handled. But there's quite a bit of work, and it sounds like not much dramatic call for them to be added. So it probably won't happen, being realistic. But I just wanted to brain-dump that. 

Marc

