Fwd: Proposal - UUID data type

Kless

Jul 14, 2008, 4:04:33 AM

---------- Forwarded message ----------
From: Jerry Stuckle <jstuck...@attglobal.net>
Date: Jul 13, 10:29 pm
Subject: Proposal - UUID data type
To: comp.databases.mysql

Kless wrote:
> On Jul 13, 9:37 pm, Jerry Stuckle <jstuck...@attglobal.net> wrote:
>> Rather, I would think you should propose the UUID data type be added to
>> the SQL standard. There are enough variations from the standards now.
> Those variations must be managed by the language and not by the RDBMS,
> whose *main work* is storing data using the specific data types.
> A lot of languages already have modules for working with UUIDs [1].

Yes, they must be managed by the language. Which is why it should be
part of the standard. That way, changing databases does not require
changing code.

>> And as it is, UUIDs can be stored as a 32-byte string. So it isn't as
>> if there is not an alternative.
> If they're stored in ASCII form (32 hex digits), that would indeed be
> very inefficient. So it's necessary that the RDBMS have a specific data
> type to handle the UUIDs.

> In PostgreSQL they're stored as 16 binary bytes [2], and the core
> database does not include any function for generating UUIDs

> [1]http://en.wikipedia.org/wiki/Uuid#Implementations
> [2]http://anoncvs.postgresql.org/cvsweb.cgi/pgsql/src/backend/utils/adt/...

Yep, which in the grand scheme of things, probably makes zero
difference. The difference between 16 and 32 bytes in any single row
is minuscule.
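
For reference, the sizes being argued about can be checked with Python's standard uuid module. This is only an illustrative sketch: the UUID value is an arbitrary example, and Python is not part of the original discussion.

```python
import uuid

# A fixed, arbitrary example UUID so the result is reproducible.
u = uuid.UUID("12345678-1234-5678-1234-567812345678")

binary_len = len(u.bytes)   # raw 128-bit value, as a dedicated type can store it
hex_len = len(u.hex)        # 32 hex digits, no punctuation
text_len = len(str(u))      # canonical text form with four hyphens

print(binary_len, hex_len, text_len)
```

The three lengths correspond to the 16-byte binary form, the 32-character hex string, and the 36-character punctuated form mentioned in the thread.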

I am not in favor of adding more database-specific types to ANY
database - and I think PostGres doing it was a mistake.

If there is a demand for it, then it should be added to the SQL
standard. That is the correct way to propose a change. That's why
there are standards.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstuck...@attglobal.net
==================

Martijn van Oosterhout

Jul 14, 2008, 4:34:28 AM

On Mon, Jul 14, 2008 at 01:04:33AM -0700, Kless wrote:
> I am not in favor of adding more database-specific types to ANY
> database - and I think PostGres doing it was a mistake.

So you think that adding full text indexing, gist/gin indexes, text,
geometric types should have waited until the SQL standard specified
them? With that kind of thinking we'd still be in the database stone
age.

One of postgresql's greatest strengths is user-defined types, let's use
it.

> If there is a demand for it, then it should be added to the SQL
> standard. That is the correct way to propose a change. That's why
> there are standards.

You are of course free to propose it to them, but the question is if
they'd listen...

Have a nice day,
--
Martijn van Oosterhout <kle...@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.


David E. Wheeler

Jul 14, 2008, 1:33:21 PM

On Jul 14, 2008, at 01:34, Martijn van Oosterhout wrote:

>> I am not in favor of adding more database-specific types to ANY
>> database - and I think PostGres doing it was a mistake.
>
> So you think that adding full text indexing, gist/gin indexes, text,
> geometric types should have waited until the SQL standard specified
> them? With that kind of thinking we'd still be in the database stone
> age.

Besides which, I seriously doubt that the SQL standard limits data
types.

Best,

David

--
Sent via pgsql-hackers mailing list (pgsql-...@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Kless

Jul 14, 2008, 1:49:15 PM

I am posting Jerry Stuckle's answer [1] here because it seems
interesting and reasonably logical to me.

[1] http://groups.google.com/group/comp.databases.mysql/browse_thread/thread/89557609239a995e
-----------------------
Quite frankly, I don't care that PostGres has user-defined types. They
restrict you to a single database, when others might be better for
other reasons.

And yes, I think other things should have been proposed to the SQL
standards committee. It doesn't take that long to get a good proposal
into the standards. No, it isn't immediate. But if there is a case to
be made for it, then the committee will act.

Then all databases get the feature, eventually.

As I said. Do it the right way. Submit your proposal. If you have a
case, it will be added to the SQL standard. If not, then it's not that
important.
-----------------------

Mark Mielke

Jul 14, 2008, 2:07:49 PM

Kless wrote:

> Yes, they must be managed by the language. Which is why it should be
> part of the standard. That way, changing databases does not require
> changing code.

You are correct that putting widely used features into a standard that is implemented by everyone is good.

This does not extend to the conclusion that one should never put in a feature until it is standard. Look at any successful software product and see how it usually leads the standard rather than follows it. People only tend to make standards once they realize things are getting out of control, which is long after the products are in use.

>> In PostgreSQL they're stored as 16 binary bytes [2], and the core
>> database does not include any function for generating UUIDs
>
> Yep, which in the grand scheme of things, probably makes zero
> difference. The difference between 16 and 32 bytes in any single row
> is minuscule.

This is incorrect. UUID at 16 bytes is already "long" in terms of being used as a primary index. In an 8K page, one can only fit 512 UUIDs (forgetting the requirement for headers) - if it was stored as 32 bytes - or 36 bytes, or 40 bytes (with punctuation), it would be at less than 256 UUIDs per page. For a join table joining one set of UUID to another set, that's < 256 vs < 128. Doubling the size of an index row roughly doubles the time to look up the value.
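
The per-page figures above can be reproduced with a few lines of arithmetic. This is a minimal sketch that, like the message, ignores page and tuple headers; the function name is made up for illustration.

```python
PAGE_SIZE = 8192  # the 8K page from the message; headers are ignored


def keys_per_page(key_size: int, keys_per_row: int = 1) -> int:
    """Upper bound on rows per page when each row holds keys_per_row keys."""
    return PAGE_SIZE // (key_size * keys_per_row)


# Key sizes discussed in the thread: binary, hex, and punctuated text forms.
single = {size: keys_per_page(size) for size in (16, 32, 36, 40)}
# A join table row holds two UUIDs, one for each side of the join.
joined = {size: keys_per_page(size, keys_per_row=2) for size in (16, 32)}

print(single)
print(joined)
```

With headers counted, the real numbers are lower, which is why the message says "less than 256" and "< 128".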

> I am not in favor of adding more database-specific types to ANY
> database - and I think PostGres doing it was a mistake.

As somebody who wrote his own module to do UUID for PostgreSQL when I needed it in PostgreSQL 8.0, I don't agree. Just as you think defining it in a standard is better than each vendor doing it their own way, I think doing it in one product is better than each user of the product doing it their own way.

> If there is a demand for it, then it should be added to the SQL
> standard. That is the correct way to propose a change. That's why
> there are standards.

Provide a real example of any similar product doing this. Exactly which enhancement to a standard was defined without even a prototype existing used in an existing product that purports to implement the standard?

I'm sure one or two examples must exist, but I cannot think of any. Every enhancement I can think of that eventually made it into a standard, was first implemented within a popular product, and then demanded as a standard to be applied to all other products.

Cheers,
mark

-- 
Mark Mielke <ma...@mielke.cc>

David E. Wheeler

Jul 14, 2008, 2:34:45 PM

On Jul 14, 2008, at 10:49, Kless wrote:

> I am posting Jerry Stuckle's answer [1] here because it seems
> interesting and reasonably logical to me.

It just sounds narrow-minded to me. See:

http://www.oreillynet.com/pub/a/network/2005/07/29/cjdate.html

Andrew Dunstan

Jul 14, 2008, 2:46:59 PM

Kless wrote:
> I am posting Jerry Stuckle's answer [1] here because it seems
> interesting and reasonably logical to me.
>
>
> [1] http://groups.google.com/group/comp.databases.mysql/browse_thread/thread/89557609239a995e
> -----------------------
> Quite frankly, I don't care that PostGres has user-defined types. They
> restrict you to a single database, when others might be better for
> other reasons.
>
> And yes, I think other things should have been proposed to the SQL
> standards committee. It doesn't take that long to get a good proposal
> into the standards. No, it isn't immediate. But if there is a case to
> be made for it, then the committee will act.
>
> Then all databases get the feature, eventually.
>
> As I said. Do it the right way. Submit your proposal. If you have a
> case, it will be added to the SQL standard. If not, then it's not that
> important.
>
>

The time taken to get something into the standard is a lifetime in
computing terms. If my client has a need for a UDF they need it now, not
when the standards committee gets around to thinking about it.

Many UDTs will be specialised to a single user, and never be candidates
for inclusion in the standards. Excluding any type that isn't in the
standard would be to throw away one of Postgres' greatest strengths, one
we are justly proud of. Maybe you don't care about that, but we do, and
our clients do.

In any case, a standard for UUIDs would almost certainly not specify how
it is to be stored, which is where we got into this discussion.

This debate started with a misconception about how Postgres actually
stores UUIDs, and doesn't seem to have gained much point since then.

cheers

andrew

Mark Mielke

Jul 14, 2008, 2:47:02 PM

Kless wrote:
> I am posting Jerry Stuckle's answer [1] here because it seems
> interesting and reasonably logical to me.
>

Jerry's answer isn't a real answer - and we don't care what MySQL does
or does not do. PostgreSQL developers are not going to invest time into
helping you get a feature into MySQL - if this is what you are trying to
do, please stop.

MySQL didn't implement SQL-standards views until what - MySQL 4 or 5?
Obviously "standards" is not their goal either. In Open Source / Free
Software, the free contributions are from people with itches that they
scratched. In a company like MySQL, it is more about business value or
somewhere in between. I was a MySQL 3.x/4.x user until I learned
PostgreSQL, and I have no intention of going back. They have so many
incorrect assumptions built into their system, that I chose to switch
databases instead of arguing with them. It's not worth my time, and I
don't intend to go back. So, I will not be helping you get UUID into
MySQL because I just don't care about MySQL...

Cheers,
mark

>
> [1] http://groups.google.com/group/comp.databases.mysql/browse_thread/thread/89557609239a995e
> -----------------------
> Quite frankly, I don't care that PostGres has user-defined types. They
> restrict you to a single database, when others might be better for
> other reasons.
>
> And yes, I think other things should have been proposed to the SQL
> standards committee. It doesn't take that long to get a good proposal
> into the standards. No, it isn't immediate. But if there is a case to
> be made for it, then the committee will act.
>
> Then all databases get the feature, eventually.
>
> As I said. Do it the right way. Submit your proposal. If you have a
> case, it will be added to the SQL standard. If not, then it's not that
> important.
> -----------------------
>

--
Mark Mielke <ma...@mielke.cc>

Gregory Stark

Jul 14, 2008, 3:09:55 PM

"Mark Mielke" <ma...@mark.mielke.cc> writes:

> Kless wrote:
>> Yes, they must be managed by the language. Which is why it should be
>> part of the standard. That way, changing databases does not require
>> changing code.
>>
>
> You are correct that putting widely used features into a standard that is
> implemented by everyone is good.
>
> This does not extend to the conclusion that one should never put in a feature
> until it is standard. Look at any successful software product and see how it
> usually leads the standard rather than follows it. People only tend to make
> standards once they realize things are getting out of control, which is long
> after the products are in use.
>

To be fair there are two types of standards. Some standards follow
implementations, others lead and prevent the babel of incompatible
implementations from ever developing. Both ways work in the right context.

But (perhaps unfortunately) the SQL spec is very definitely of the former
type. Nothing goes into the SQL spec that one of the implementors hasn't
already done.

But that said Postgres isn't one of the main participants in the spec
committee. We don't really want to add features that later get added to the
spec in an incompatible way and we have no say in the committee to avoid that
situation.

>>> In PostgreSQL they're stored as 16 binary bytes [2], and the core
>>> database does not include any function for generating UUIDs
>>>
>>
>> Yep, which in the grand scheme of things, probably makes zero
>> difference. The difference between 16 and 32 bytes in any single row
>> is minuscule.

Really? It sounds like 100% difference to me. If you applied that logic to
everything you would have a database which runs at half the speed it would
otherwise.

Keep in mind also that your primary keys have to be stored in every other
table as foreign keys too...

> I'm sure one or two examples must exist, but I cannot think of any. Every
> enhancement I can think of that eventually made it into a standard, was first
> implemented within a popular product, and then demanded as a standard to be
> applied to all other products.

C99? SMTP? NTP?

It tends to be important for network protocols since there's no gain in having
non-interoperable protocols.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
Ask me about EnterpriseDB's PostGIS support!

Tom Lane

Jul 14, 2008, 3:34:15 PM

Gregory Stark <st...@enterprisedb.com> writes:

> "Mark Mielke" <ma...@mark.mielke.cc> writes:
>> I'm sure one or two examples must exist, but I cannot think of any. Every
>> enhancement I can think of that eventually made it into a standard, was first
>> implemented within a popular product, and then demanded as a standard to be
>> applied to all other products.

> C99? SMTP? NTP?

> It tends to be important for network protocols since there's no gain in having
> non-interoperable protocols.

Actually, the IETF's mantra has always been "rough consensus and running
code" (cf http://www.faqs.org/rfcs/rfc2031.html). Network protocols
don't get standardized in advance of a working prototype, either.

(No, I take that back: there were some that did. Ever heard of OSI?)

regards, tom lane

Mark Mielke

Jul 14, 2008, 3:39:05 PM

Gregory Stark wrote:
> "Mark Mielke" <ma...@mark.mielke.cc> writes:
>> I'm sure one or two examples must exist, but I cannot think of any. Every
>> enhancement I can think of that eventually made it into a standard, was first
>> implemented within a popular product, and then demanded as a standard to be
>> applied to all other products.
>
> C99? SMTP? NTP?
>
> It tends to be important for network protocols since there's no gain in having
> non-interoperable protocols.

For C99 - GCC had most of the C99 features years before C99 started. There are now some incompatibilities that need to be dealt with.

For SMTP and NTP I think these protocols are just so old that people don't realize how much they have evolved, and how many products existed. I wasn't in the know at the time they were written (I was either a baby or in grade school), but I bet either: 1) they were written before it existed at all (not really an enhancement), or 2) they followed the prototype as it was implemented. There have been many extensions to SMTP that I am aware of, including support for SSL, that I doubt were in the standard first. The "RFC" is a "request for comment". The "STD" process came a lot later.

If we grab a phrase from RFC 1305 for NTP - "In Version 3 a new algorithm to combine the offsets of a number of peer time servers is presented in Appendix F. This algorithm is modelled on those used by national standards laboratories to combine the weighted offsets from a number of standard clocks to construct a synthetic laboratory timescale more accurate than that of any clock separately." This seems pretty clear that the "standard" was updated based upon existing implementation.

To some degree, except for the simplest of designs, it is almost bad to write down what WILL be done without having experience or a prototype to base one's conclusions on. Ivory tower stuff. The purpose of a standard is to have one common way that things are done - hopefully the best way - not just the only way that was considered. :-)

Kless

Jul 15, 2008, 11:34:01 AM

An answer of Jerry Stuckle:

-----------------------

>> Yes, they must be managed by the language. Which is why it should be
>> part of the standard. That way, changing databases does not require
>> changing code.

> You are correct that putting widely used features into a standard that
> is implemented by everyone is good.

> This does not extend to the conclusion that one should never put in a
> feature until it is standard. Look at any successful software product
> and see how it usually leads the standard rather than follows it.
> People only tend to make standards once they realize things are getting
> out of control, which is long after the products are in use.

Non-standard features just force people to stick with that one product.
In the long run, the only people who benefit are the product developers.

>>> In PostgreSQL they're stored as 16 binary bytes [2], and the core
>>> database does not include any function for generating UUIDs

>> Yep, which in the grand scheme of things, probably makes zero
>> difference. The difference between 16 and 32 bytes in any single row
>> is minuscule.

> This is incorrect. UUID at 16 bytes is already "long" in terms of
> being used as a primary index. In an 8K page, one can only fit 512
> UUIDs (forgetting the requirement for headers) - if it was stored as
> 32 bytes - or 36 bytes, or 40 bytes (with punctuation), it would be at
> less than 256 UUIDs per page. For a join table joining one set of UUID
> to another set, that's < 256 vs < 128. Doubling the size of an index
> row roughly doubles the time to look up the value.

Incorrect. Doubling the size of the index has very little effect on how
long it takes to look up a value. Intelligent databases use a binary
search so doubling the size only means one additional comparison need be
done. And heavily used indexes are generally cached in memory anyway.

>> I am not in favor of adding more database-specific types to ANY
>> database - and I think PostGres doing it was a mistake.

> As somebody who wrote his own module to do UUID for PostgreSQL when I
> needed it in PostgreSQL 8.0, I don't agree. Just as you think defining
> it in a standard is better than each vendor doing it their own way, I
> think doing it in one product is better than each user of the product
> doing it their own way.

Fine. Whatever you want for your code. But don't expect the rest of
the world to jump because you want it.

>> If there is a demand for it, then it should be added to the SQL
>> standard. That is the correct way to propose a change. That's why
>> there are standards.

> Provide a real example of any similar product doing this. Exactly
> which enhancement to a standard was defined without even a prototype
> existing used in an existing product that purports to implement the
> standard?

> I'm sure one or two examples must exist, but I cannot think of any.
> Every enhancement I can think of that eventually made it into a
> standard, was first implemented within a popular product, and then
> demanded as a standard to be applied to all other products.

Most features added to the SQL standard, for instance. Like explicit
JOINs, recursive SQL and a bunch more. Also changes to the C++ standard
such as exceptions were at least in the process of being evaluated and
approved before they were in any product.

There's a reason for having a process to propose features to a product.
And it does not require the proposed change to be in any product.

--
==================
Remove the "x" from my email address
Jerry Stuckle
JDS Computer Training Corp.
jstuck...@attglobal.net
==================

-----------------------

Abhijit Menon-Sen

Jul 15, 2008, 1:45:00 PM

At 2008-07-15 08:34:01 -0700, jona...@googlemail.com wrote:
>
> An answer of Jerry Stuckle:

Please stop cross-posting messages from this list to whatever MySQL list
you're on. It's a boring, pointless waste of time at best, and at worst
will get you written off as a troll in both places pretty soon.

-- ams

Kless

Jul 15, 2008, 4:04:16 PM

I'm sorry, but certain points needed to be answered by someone with
broad knowledge of databases, and above all of his own database. This
was the only way, and I believe it has been positive enough, at least
for the end users (everyone who chooses a database). At least it
clarifies how each community works, and what is true and what is not.

On Jul 15, 6:45 pm, a...@oryx.com (Abhijit Menon-Sen) wrote:

Mark Mielke

Jul 15, 2008, 3:38:35 PM

First - please stop copying this list - this is not the "convince Jerry to include UUID in MySQL" mailing list.

Really - I don't care what he thinks. But, on the subjects themselves and how they apply to *PostgreSQL*:

> Non-standard features just force people to stick with that one product.
> In the long run, the only people who benefit are the product developers.

I chose PostgreSQL over MySQL because it provided numerous features - both standard and non - that I needed on the day I made my decision. I don't care about the long run as a user. One might as well say 90% of the world is wrong for using Microsoft products, because it locks one into Microsoft. One can say this - and people do say this - but none of this changes the fact that 90% of the world is relatively happy with their choice. They voted with their dollars. All decisions should be made on a cost-benefit analysis - they should not be based on some arbitrary code like "I will not choose a solution that locks me in".

Additionally - in the context of MySQL - the main reason I chose PostgreSQL over MySQL is because it provided things like CREATE VIEW, which MySQL did not at the time. People such as Jerry can pretend that standards guarantee that a feature is in all products, but it seems quite clear that just because something is a standard does NOT mean it is implemented the same everywhere, or even at all. At the time I chose PostgreSQL it was my opinion that PostgreSQL was far more standards-compliant than MySQL was going to be for at least a few years. I am glad I came to the correct conclusion. MySQL implemented ACID as an afterthought. I mean - come on.

>> This is incorrect. UUID at 16 bytes is already "long" in terms of
>> being used as a primary index. In an 8K page, one can only fit 512
>> UUIDs (forgetting the requirement for headers) - if it was stored as
>> 32 bytes - or 36 bytes, or 40 bytes (with punctuation), it would be at
>> less than 256 UUIDs per page. For a join table joining one set of UUID
>> to another set, that's < 256 vs < 128. Doubling the size of an index
>> row roughly doubles the time to look up the value.
>
> Incorrect. Doubling the size of the index has very little effect on how
> long it takes to look up a value. Intelligent databases use a binary
> search so doubling the size only means one additional comparison need be
> done. And heavily used indexes are generally cached in memory anyway.

Wrong. A binary search that must read double the number of pages, and compare double the number of bytes, will take double the amount of time. There are factors that will reduce this, such as if you assume that most of the pages are in memory or cache memory, therefore the time to read the page is zero, therefore it's only the time to compare bytes - but at this point, the majority of the time is spent comparing bytes, and it's still wrong. If we add in accounting for the fact that UUID is compared using a possibly inlined memcmp() compared to treating it as a string where it is variable sized, and much harder to inline (double the number of operations), it's pretty clear that the person who would make such a statement as above is wrong.
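
A rough back-of-the-envelope model of the index sizes under dispute, not a benchmark. It assumes a simplified B-tree with 8K pages and a hypothetical 8-byte per-entry overhead; the overhead figure, the key count, and the function name are illustrative assumptions, not numbers from this thread or from PostgreSQL.

```python
PAGE_SIZE = 8192      # 8K page, as in the thread; page headers ignored
ENTRY_OVERHEAD = 8    # assumed per-entry pointer/overhead (hypothetical)


def index_shape(num_keys: int, key_size: int) -> tuple[int, int, int]:
    """Return (fanout, levels, leaf_pages) for a simplified B-tree."""
    fanout = PAGE_SIZE // (key_size + ENTRY_OVERHEAD)
    # Add levels until the tree has room for num_keys entries.
    levels, capacity = 1, fanout
    while capacity < num_keys:
        capacity *= fanout
        levels += 1
    leaf_pages = -(-num_keys // fanout)  # ceiling division
    return fanout, levels, leaf_pages


# 100 million keys is an arbitrary example size.
narrow = index_shape(100_000_000, key_size=16)
wide = index_shape(100_000_000, key_size=32)
print("16-byte keys:", narrow)
print("32-byte keys:", wide)
```

Under these assumptions both indexes end up with the same number of levels, but the 32-byte keys need roughly two thirds more leaf pages, which affects how much of the index fits in cache. The model says nothing about the per-comparison cost of fixed-size versus variable-size keys.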

As another poster wrote - why not double the size of all other data structures too. It costs nothing, right?

Why does MySQL have a 3-byte integer support if they truly believe that saving 1 byte in 4 doesn't result in a savings for keys?

Andrew Dunstan

Jul 15, 2008, 5:27:15 PM

Kless wrote:
> I'm sorry, but certain points needed to be answered by someone with
> broad knowledge of databases, and above all of his own database. This
> was the only way, and I believe it has been positive enough, at least
> for the end users (everyone who chooses a database). At least it
> clarifies how each community works, and what is true and what is not.

Nonsense. It was not at all necessary.

If someone wants to post on this mailing list they should do it
themselves. If not, you shouldn't cross-post for them.

cheers

andrew