making the case for a shorter uuid

Igal @ getRailo.org

Dec 24, 2013, 5:41:23 PM
to Railo List
we're discussing an option to generate shorter UUIDs.

what do you guys think is better?

a) 23 characters that are CaSe sensitive

b) 25 characters that are not case sensitive

seems to me like it's worth having 2 more characters for the sake of not
needing to worry about CaSe in string comparisons.

any thoughts?

--
Igal Sapir
Railo Core Developer
http://getRailo.org/

Don Quist

Dec 24, 2013, 5:50:42 PM
to ra...@googlegroups.com
25 sounds fine.  No need to add case-sensitivity worries when passing values between platforms and APIs.

Adam Cameron

Dec 24, 2013, 6:01:41 PM
to ra...@googlegroups.com
Thoughts:
1) isn't a UUID a prescribed thing? (that was a rhetorical question. Yes it is).
2) more importantly... why? What's the use case that makes this a good use of your time?

-- 
Adam

Adam Cameron

Dec 24, 2013, 6:06:35 PM
to ra...@googlegroups.com
If, however, you created a createGuid() function, or put a switch on createUuid() to return a GUID rather than the Allaire-formatted UUID, that would be a reasonable idea. That said, obviously it's just a matter of inserting a hyphen @ position 23 to convert UUID to GUID, but it's perhaps time for CFML to get with the program on this.
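
A minimal Java sketch of that hyphen insertion, assuming the input is a 35-character CFML-style UUID (8-4-4-16); the class and method names are made up for illustration and the sample value is arbitrary:

```java
final class CfmlUuidToGuid {
    // CFML's createUUID() format is 8-4-4-16 (35 chars); a GUID is 8-4-4-4-12 (36 chars).
    static String toGuid(String cfmlUuid) {
        if (cfmlUuid.length() != 35) {
            throw new IllegalArgumentException("expected a 35-character CFML UUID");
        }
        // The missing hyphen goes right after the 23rd character.
        return new StringBuilder(cfmlUuid).insert(23, '-').toString();
    }

    public static void main(String[] args) {
        System.out.println(toGuid("855CA476-89B1-4BCE-B47CD837EB897156"));
    }
}
```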

-- 
Adam

Igal @ getRailo.org

Dec 24, 2013, 6:11:54 PM
to ra...@googlegroups.com
1) the format of a UUID is prescribed, and that will of course stay as-is with the default implementation for compatibility purposes.  but since we are still talking about a globally or universally unique identifier, the format is one thing and the concept is another.

2) there are reasons to create shorter universally unique identifiers, though, especially when you want to put them in URLs or save a file with a unique name.  see the youtube video ids for example -- they are much "friendlier" than the 35/36-character UUIDs.  implementation is rather simple, so it's a good use of time.

3) with respect to your other email -- we already have a function called createGUID().  see http://railodocs.org/index.cfm/function/createguid/version/4.1.0.000
--
Did you find this reply useful? Help the Railo community and add it to the Railo Server wiki at https://github.com/getrailo/railo/wiki
---
You received this message because you are subscribed to the Google Groups "Railo" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/railo/7843c34f-3409-4b61-b98b-b90dc869e11d%40googlegroups.com.
For more options, visit https://groups.google.com/groups/opt_out.

Adam Cameron

Dec 24, 2013, 6:27:03 PM
to ra...@googlegroups.com


On Tuesday, 24 December 2013 23:11:54 UTC, Igal wrote:
> 1) the format of a UUID is prescribed, and that will of course stay as-is with the default implementation for compatibility purposes.  but since we are still talking about a globally or universally unique identifier, the format is one thing and the concept is another.

You're getting confused between the "createUuid() function" - for which, yeah, you need to keep backwards compat - and what a UUID is. If you describe something as a UUID, it actually has to be a UUID. A UUID is a specific thing. It's not just some conceptually universally unique thingummy.

There's a UUID RFC (RFC 4122), and it prescribes a *UUID* as being a 128-bit number, conventionally written in hexadecimal. The mechanism to derive the number has recommendations if not prescription, but what it is to be a UUID *is* prescribed.

 

> 2) there are reasons to create shorter universally unique identifiers, though, especially when you want to put them in URLs or save a file with a unique name.  see the youtube video ids for example -- they are much "friendlier" than the 35/36-character UUIDs.  implementation is rather simple, so it's a good use of time.

I question whether they're friendlier. They're just different. We neither type them in nor read them out when they're in a URL (they're clicked on, copied and pasted, or link-shortened and then one of the former two), so their length really doesn't matter. One thing that would help, though, is to present them as a contiguous sequence of characters, rather than having the dashes, because those can cause URLs to "break" due to wrapping.

However, yes, there's a scope for presenting a type of identifier - which happens to be uniqueish - in shorter form. Presenting a UUID in base 36 instead of base 16 should do it (this was probably what you were planning?). But it is no longer a UUID if you present it like that, so you should not describe it thus.

You could call it a "unique ID tag" or something (it's probably already got a name, that said..?)

 
> 3) with respect to your other email -- we already have a function called createGUID().  see http://railodocs.org/index.cfm/function/createguid/version/4.1.0.000


I thought you did, but nothing came up on Google. I didn't try very hard though, that said. Cheers for the link.

-- 
Adam

Adam Cameron

Dec 24, 2013, 6:31:39 PM
to ra...@googlegroups.com
> 2) there are reasons to create shorter universally unique identifiers, though, especially when you want to put them in URLs or save a file with a unique name.  see the youtube video ids for example

I suspect those IDs on youtube aren't "unique" in the sense a UUID is. I would expect them to be a base 62 sequence. I mean within youtube's auspices, they don't need to be *universally* unique. Just unique. And as they're in charge of them, a sequence would be fine.

I would run with base 36 (0-9a-z) not base 62 (0-9a-zA-Z) though, tbh. IE: case-insensitive. Most of CFML is case-insensitive, so that would be more in keeping with the rest of the language.

-- 
Adam

Igal @ getRailo.org

Dec 24, 2013, 6:32:12 PM
to ra...@googlegroups.com
I'm not getting confused here.  the 25 character option that I mentioned in the first email is exactly a Base36 representation of a UUID.

and since you earlier mentioned a better use of time...

Ken

Dec 24, 2013, 6:35:11 PM
to ra...@googlegroups.com
Are you simply suggesting a higher-base representation of the underlying 16-byte UUID value? For example, we can easily represent a "standard" UUID as a base-36 string:

a = createuuid(); // -> 855CA476-89B1-4BCE-B47CD837EB897156
b = replace(a, '-', '', 'all'); // -> 855CA47689B14BCEB47CD837EB897156
c = createObject("java", "java.math.BigInteger").init(b, 16); // -> 177268350457090975001289573812373385558
d = c.toString(36); // -> 7w8da0ok3dy69gjelytcqcrpy
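
For what it's worth, the same conversion can be sketched in plain Java, including the trip back from the base-36 string to a standard UUID. This is only an illustration: the class and method names are invented here, and it assumes a standard 8-4-4-4-12 java.util.UUID rather than the CFML layout.

```java
import java.math.BigInteger;
import java.util.UUID;

final class Base36Uuid {
    static String encode(UUID uuid) {
        // Drop the hyphens and read the 32 hex digits as one non-negative 128-bit number.
        String hex = uuid.toString().replace("-", "");
        return new BigInteger(hex, 16).toString(36);
    }

    static UUID decode(String base36) {
        String hex = new BigInteger(base36, 36).toString(16);
        if (hex.length() > 32) {
            throw new IllegalArgumentException("value does not fit in 128 bits");
        }
        // Left-pad to 32 hex digits, then re-insert the hyphens as 8-4-4-4-12.
        hex = String.format("%32s", hex).replace(' ', '0');
        return UUID.fromString(hex.replaceFirst(
            "(.{8})(.{4})(.{4})(.{4})(.{12})", "$1-$2-$3-$4-$5"));
    }

    public static void main(String[] args) {
        UUID original = UUID.randomUUID();
        String shortForm = encode(original);
        System.out.println(shortForm + " -> " + decode(shortForm));
    }
}
```

Note that the encoded string is at most 25 characters; values with leading zero bits come out shorter unless they are padded.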

  - Ken

Igal @ getRailo.org

Dec 24, 2013, 6:36:50 PM
to ra...@googlegroups.com
yes, that's exactly what I'm suggesting.

Ken

Dec 24, 2013, 6:52:13 PM
to ra...@googlegroups.com
[edited]

Obviously it's minor, but I think it would be a convenient shortcut, especially if you provide options for both base-36 and base-32 (optionally avoiding i/1 o/0 ambiguity). If nothing else, shorter representations are marginally friendlier if one occasionally has to provide keys for users to enter manually, and one is lazy.
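
A rough sketch of the base-32 variant in Java, assuming a Crockford-style alphabet (which drops I, L, O and U, slightly more than the i/1 o/0 cases mentioned above). The names are hypothetical and the output is fixed at 26 characters:

```java
import java.math.BigInteger;
import java.util.UUID;

final class Base32Uuid {
    // Crockford-style alphabet: no I, L, O or U, so 1/I/L and 0/O cannot be confused.
    private static final String ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ";

    static String encode(UUID uuid) {
        BigInteger value = new BigInteger(uuid.toString().replace("-", ""), 16);
        BigInteger mask = BigInteger.valueOf(31);
        char[] out = new char[26]; // ceil(128 / 5) digits, zero-padded on the left
        for (int i = out.length - 1; i >= 0; i--) {
            // Take the lowest 5 bits as one base-32 digit, then shift them away.
            out[i] = ALPHABET.charAt(value.and(mask).intValue());
            value = value.shiftRight(5);
        }
        return new String(out);
    }

    public static void main(String[] args) {
        System.out.println(encode(UUID.randomUUID()));
    }
}
```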

 - Ken

Kai Koenig

Dec 24, 2013, 6:52:17 PM
to ra...@googlegroups.com
Leaving aside that I'm not sure there's really a need for this: if it's only about a base translation, I don't see any harm in such a function existing.

However - it'd have to be made very clear what it is that one is creating, because it's not what the majority of people would consider to be a UUID (even though it technically matches the RFC) at first glance.

How about if createguid() was changed to accept an argument "base (int)" that if omitted will create a stock-standard UUID and if supplied with for instance "36" would convert it to a base-36 string?
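
To make the idea concrete, here is a hedged Java sketch of what such an optional argument could look like. The overloads below are hypothetical and are not Railo's actual createGUID() implementation; the 2 to 36 range is simply what Java's BigInteger.toString(radix) supports.

```java
import java.math.BigInteger;
import java.util.UUID;

final class GuidWithBase {
    // Hypothetical: no argument gives the stock-standard 36-character form.
    static String createGuid() {
        return UUID.randomUUID().toString().toUpperCase();
    }

    // Hypothetical: with a base, the same 128-bit value is rendered in that radix.
    static String createGuid(int base) {
        if (base < Character.MIN_RADIX || base > Character.MAX_RADIX) {
            throw new IllegalArgumentException("base must be between 2 and 36");
        }
        String hex = UUID.randomUUID().toString().replace("-", "");
        return new BigInteger(hex, 16).toString(base);
    }

    public static void main(String[] args) {
        System.out.println(createGuid());
        System.out.println(createGuid(36));
    }
}
```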

Cheers
Kai


Adam Cameron

Dec 24, 2013, 7:04:04 PM
to ra...@googlegroups.com


On Tuesday, 24 December 2013 23:52:17 UTC, Kai Koenig wrote:
> How about if createguid() was changed to accept an argument "base (int)" that if omitted will create a stock-standard UUID and if supplied with for instance "36" would convert it to a base-36 string?


Or decouple the two requirements.

There's a requirement for generating a UUID. Railo already has that.

And there's a requirement for converting a large number between base X and base Y. CFML has formatBaseN, but it only works on integers (so a UUID is too big), and up to base 36. And only from decimal. I presume Railo's similar function has the same limitations.

Having a built-in function to perform the generic base conversions on large numbers, which one could then process a UUID with, would be a much better feature. Because it can also be used on other numbers that aren't necessarily UUIDs.
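
As an illustration only, a generic arbitrary-size base conversion is a one-liner on top of java.math.BigInteger. The class and method names are made up, and this is not a proposal for the actual built-in's signature:

```java
import java.math.BigInteger;

final class BaseConvert {
    // Converts a non-negative integer of any size between bases 2..36.
    static String convert(String value, int fromBase, int toBase) {
        return new BigInteger(value, fromBase).toString(toBase);
    }

    public static void main(String[] args) {
        // A 128-bit value written in hex, re-expressed in base 36.
        System.out.println(convert("855CA47689B14BCEB47CD837EB897156", 16, 36));
    }
}
```

Nothing here is specific to UUIDs; stripping the hyphens first is the caller's job.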

That would be a better use of your time.

That said, there's a UDF on CFLib which does that already: http://www.cflib.org/udf/baseMToBaseN (NB: I'd probably write it slightly differently if I was writing that today, but hey).

-- 
Adam

Ken

Dec 24, 2013, 7:06:58 PM
to ra...@googlegroups.com


On Tuesday, December 24, 2013 6:52:17 PM UTC-5, Kai Koenig wrote:

> How about if createguid() was changed to accept an argument "base (int)" that if omitted will create a stock-standard UUID and if supplied with for instance "36" would convert it to a base-36 string?

Seems like a reasonable approach. Regardless of how it's implemented, though, we'd end up with UUID representations that would no longer pass isvalid('uuid', n). It's easy to see this leading to new validation types being added, then the need for conversion among the types...  and thus the yak is duly shorn. But as you said, there's probably no real harm in adding an optional argument. Although there are multiple types of uuid in the spec, it seems unlikely that Adobe would add new incompatible arguments to createuuid().

 - Ken

Igal @ getRailo.org

Dec 24, 2013, 8:24:27 PM
to ra...@googlegroups.com
what other values do you expect to use other than 36?

Kai Koenig

Dec 24, 2013, 8:34:37 PM
to ra...@googlegroups.com
Potentially anything between 2 and 36. If you want to introduce flexibility in what base people use for the UUID representation, why not expose all the options they can get from the Java layer?

Cheers
Kai

--
Kai Koenig - Ventego Creative Ltd
ph: +64 4 889 3626 - mob: +64 21 928 365 /  +61 435 179 091
web: http://www.ventego-creative.co.nz

Blog in Black: http://www.bloginblack.de
2DDU Podcast: http://www.2ddu.com/
Twitter: @AgentK
--

Igal @ getRailo.org

Dec 24, 2013, 8:47:47 PM
to ra...@googlegroups.com
because the default is base 16 so any lower number means an even longer string, and other numbers below 36 don't make much sense either as they will provide very little gain.

Base36 will give a 25-character long case insensitive string, and Base64 will give a 23-character long CaSe sensitive one. 

the idea is to use url-safe characters so we probably don't want to go over 64 either (and if using Base64 then it should be a url-safe version without the '+' or '/' characters).
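
A small Java sketch of the url-safe Base64 option, assuming the 16 raw bytes of the UUID are encoded directly and the padding is dropped. Encoded this way the result is 22 characters; the 23 quoted above may reflect a different way of encoding the value. The class name is invented for the example.

```java
import java.nio.ByteBuffer;
import java.util.Base64;
import java.util.UUID;

final class Base64Uuid {
    static String encode(UUID uuid) {
        // Pack the two 64-bit halves into 16 bytes, most significant first.
        byte[] bytes = ByteBuffer.allocate(16)
            .putLong(uuid.getMostSignificantBits())
            .putLong(uuid.getLeastSignificantBits())
            .array();
        // The URL-safe alphabet uses '-' and '_' instead of '+' and '/'; "==" padding is omitted.
        return Base64.getUrlEncoder().withoutPadding().encodeToString(bytes);
    }

    public static void main(String[] args) {
        System.out.println(encode(UUID.randomUUID()));
    }
}
```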

also, if you are to store it in a database like MSSQL Server where default string comparisons are not-case-sensitive then Base36 seems like a better choice of the two.

Kai Koenig

Dec 25, 2013, 10:45:54 PM
to ra...@googlegroups.com
Yes, that's fine and I understand why YOU would use Base36 over Base64.

My question still stands though: Why, if you were to introduce a different UUID base representation, would you want to limit that to a new function providing a single "chosen baseN by Railo" instead of just adding an optional argument to the existing function that lets everyone do whatever they want?

Cheers
Kai

Jochem van Dieten

Dec 26, 2013, 3:48:25 AM
to ra...@googlegroups.com

On Dec 25, 2013 2:48 AM, Igal wrote:
> also, if you are to store it in a database like MSSQL Server where default string comparisons are not-case-sensitive then Base36 seems like a better choice of the two.

I think that is an anti-pattern that we should not cater for.

If you are storing UUIDs in a database like MS SQL Server you store them using the uniqueidentifier data type anyway, which is a 16-byte binary object. Not only does that save a lot of storage, it does binary comparisons so you don't run the risk of a locale/encoding mismatch turning your index scan into a table scan. Avoiding that worst-case scenario is far more important for a performance-minded programmer than anything else: http://jochem.vandieten.net/2008/12/15/querying-ms-sql-server-guuids-from-coldfusion/
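
To illustrate the 16-byte point, here is a minimal Java sketch that packs a UUID into its binary form and back. It assumes big-endian (network) byte order; the byte order a particular database or driver expects may differ, so treat this as a sketch and not as a recipe for any specific product. The names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.util.UUID;

final class UuidBytes {
    static byte[] toBytes(UUID uuid) {
        // 128 bits = two longs = 16 bytes, versus 36 characters for the string form.
        return ByteBuffer.allocate(16)
            .putLong(uuid.getMostSignificantBits())
            .putLong(uuid.getLeastSignificantBits())
            .array();
    }

    static UUID fromBytes(byte[] bytes) {
        if (bytes.length != 16) {
            throw new IllegalArgumentException("expected exactly 16 bytes");
        }
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        long most = buf.getLong();
        long least = buf.getLong();
        return new UUID(most, least);
    }

    public static void main(String[] args) {
        UUID id = UUID.randomUUID();
        System.out.println(id + " -> " + toBytes(id).length + " bytes -> " + fromBytes(toBytes(id)));
    }
}
```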

Which in turn means that an important question to ask is how different string representations of the UUID are going to work with a database with native UUID support. Can they be thrown at the database directly or do they need to be converted to a standard format first? I know PostgreSQL will accept both the result from createUUID() and createGUID() as well as their base16 representation, but I believe that is already the exception and the norm is that only the strict 36-character string representation and a 32-character string representation are accepted.


So I guess my position comes down to saying that I find it highly unlikely I will be using the feature as proposed, and merely unlikely if it were configurable the way Kai describes.

.Jochem


Adam Cameron

Dec 27, 2013, 5:26:32 AM
to ra...@googlegroups.com
> Yes, that's fine and I understand why YOU would use Base36 over Base64.

I think this demonstrates that coupling the two concepts of UUIDs and base conversion into one function is a "sub-optimal" idea.

 
> My question still stands though: Why, if you were to introduce a different UUID base representation, would you want to limit that to a new function providing a single "chosen baseN by Railo" instead of just adding an optional argument to the existing function that lets everyone do whatever they want?

 
Or, indeed, a different function that doesn't couple concepts together at all.

Jochem also makes another very good point... DB storage of these things is not WYSIWYG: a UUID is not a 32 (plus padding) char string, it's a 128-bit binary number, conveniently represented as a hexadecimal string in environments where strings are useful. In the DB - where these things will be destined - it's binary. And the DB provides an interface for automatically casting certain accepted formats of string (the GUID style pattern; less often CFML's rendition with the missing dash)... one of which will not be RSUUID (Railo's Special "UUID").

Furthermore (this is not the only discussion about this "feature" doing the rounds), the cited example of YouTube etc. using something similar to what's being suggested is not correct. Certainly it's a base-62 string representation of some ID (or, indeed, is actually the ID), but those IDs are not *UU*IDs. YouTube's IDs are only for YouTube. Some other site's similar scheme will be peculiar to that site. They might even use a base-62 string of the same length, but that doesn't make it a sort of UUID: there will be no attempt to make those IDs *universally* unique. Just unique within the system. A simple sequence (which is not what YouTube uses, as far as I can tell) is unique within a system. And a way to represent (or obfuscate) a sequence number in fewer characters would be to represent it in a non-decimal base, yes.

So basically - as far as I can tell - this is a poorly devised solution to a misreading of a problem in such a way that the solution doesn't match the problem anyhow.

I sincerely hope this gets a rethink before it goes forward.

-- 
Adam

Bruce Kirkpatrick

Dec 29, 2013, 9:10:10 PM
to ra...@googlegroups.com
Not sure if this is helpful, but mysql has a uuid_short() function which is a special kind of uuid.  If you define a unique server-id for each server in my.cnf, then it will allow you to use a shorter 64-bit number that is guaranteed unique for that server id for some high number of calls per second (see manual).    The number is smallest when server-id=1 - so you actually get a value that is only 18 characters until you have MANY servers.  It may be limited to 256 servers or something.  Railo could have a similar system.

select uuid_short(); returned:
239463695736897536

A bigint is just twice the size of an int field.  That's a lot more efficient than using a 25 char to store an ID and certainly less than 36 char.   I don't know why someone would want to go for a long string instead of a 64-bit number.  It definitely has an impact when you waste that much more memory on the index with thousands or millions of records.

You could further translate an id from the 64-bit value to something much shorter if you know or want to pass the server-id in the url.   You could store the first uuid_short + server_id generated in an "app_uuid_cache" table, and then future records could be an offset from the first id that was cached.   So you would start at id=1 or id=1-1 in the url.  Then you convert this to the actual id before querying.  i.e.: 
database_id = application.uuidCache["myApp"].startValue + url.id;

Then I found it was too tedious to handle biginteger in java/railo because it loses precision unless you manually cast the value to string or biginteger.
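
A small Java sketch of both points, assuming the precision loss comes from the 64-bit id passing through a double-precision number somewhere along the way. The helper names are hypothetical, and the sample value is the uuid_short() result shown above plus one.

```java
final class ShortIdOffset {
    // Rebuilds the full id from the cached start value plus the small offset taken from the URL.
    static long toDatabaseId(long startValue, long urlOffset) {
        return Math.addExact(startValue, urlOffset); // throws on overflow instead of wrapping
    }

    // Simulates the id being converted to a double and back.
    static long viaDouble(long id) {
        return (long) (double) id;
    }

    public static void main(String[] args) {
        long id = toDatabaseId(239463695736897536L, 1);
        System.out.println("as long:    " + id);
        System.out.println("via double: " + viaDouble(id));
    }
}
```

A double only carries 53 bits of mantissa, so at this magnitude neighbouring ids collapse onto the same value, which is why keeping the id as a string, long or BigInteger end to end matters.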

I came up with a different approach later that was simpler that relied on a compound key instead.
If you do server_id (int) + table_id (int), you get more efficient storage, and short url that starts at 1-1.   If you can assume the server_id somehow, then just 1.

I also learned you can use auto_increment_offset in mysql to skip values on different servers so that none of them conflict, further allowing the use of just a simple auto_increment integer for most apps that want to have uniqueness across a server farm.   So your id grows faster if you have a lot of servers, but it will probably stay less than 25 characters.   You can also segment the values by app_id or user_id to further reduce the length and guarantee uniqueness across apps.  Like if youtube had url structure like /#user_id#-#table_id#.html  they wouldn't need some long weird url since no user is going to have millions of videos, they'd likely only ever be up to 5 characters in the url.  Google creates a way more advanced system than the rest of us.  I have no idea why they have crazy urls.

I also use /#keyword#-#app_id#-#table_id#.html for most of my apps to keep the id short, to route the request to the correct app, and to allow automatic 301 redirects for SEO.   If you eliminate the id, or encode it, it creates more work and waste in different ways compared to using compound keys like this.

In my app, each domain is mapped to a site_id and all the tables grow from 1 based on the site_id using an on insert trigger.  So app_id starts at 1, table_id starts at 1 per site_id.   So  getrailo.org/blog-article-1-1.html would be the url for a blog article on my system.
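
A hedged Java sketch of pulling the compound key back out of a URL in the /#keyword#-#app_id#-#table_id#.html shape. The class name and regular expression are assumptions made for illustration; the keyword part is treated as decoration and ignored.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class CompoundKeyUrl {
    // keyword (may itself contain hyphens), then app_id, then table_id.
    private static final Pattern SLUG =
        Pattern.compile("^/?(.+)-(\\d+)-(\\d+)\\.html$");

    // Returns {appId, tableId}.
    static long[] parse(String path) {
        Matcher m = SLUG.matcher(path);
        if (!m.matches()) {
            throw new IllegalArgumentException("unrecognised path: " + path);
        }
        return new long[] { Long.parseLong(m.group(2)), Long.parseLong(m.group(3)) };
    }

    public static void main(String[] args) {
        long[] key = parse("/blog-article-1-1.html");
        System.out.println("app_id=" + key[0] + ", table_id=" + key[1]);
    }
}
```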

Multiple ways to avoid guid/uuid.

Bruce Kirkpatrick

Dec 29, 2013, 9:34:03 PM
to ra...@googlegroups.com
youtube might have user_id = 1000000000, so they might want to further segment user_id into special groups or something else unique, or encode that in base36.  Url could be /GJDGXS-1.html if they really hate long numbers.   It would still only require a 4-byte integer in the database.  They might not have to store it as a uuid.
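
For illustration, the base36 encoding of that user_id can be sketched in Java with the standard Long radix methods (the class name is made up):

```java
final class Base36Id {
    static String encode(long id) {
        // Radix 36 uses 0-9 and a-z; upper-cased here to match the URL style above.
        return Long.toString(id, 36).toUpperCase();
    }

    static long decode(String text) {
        // parseLong accepts either case for radix-36 digits.
        return Long.parseLong(text, 36);
    }

    public static void main(String[] args) {
        System.out.println(encode(1000000000L)); // GJDGXS
        System.out.println(decode("GJDGXS"));    // 1000000000
    }
}
```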

If the user chooses a unique friendly username, you could do: youtube.com/u/coolguy/1.html, that would be a reasonable solution too to stick with integer.

It does seem like more than one google app is using base 36.  Maybe they don't bother thinking about integer because they assume 4 billion records isn't good enough for them.   I may never hit 4 billion on anything I do.