Best practices for primary key (GUID VS Identity)

Sam I Am

unread,

May 29, 2003, 6:23:45 PM5/29/03

to

Hi All

I am about to start designing schema for a large database. I am trying to
weigh up the pros and cons for using either a GUID or auto incrementing
Identity column for the primary keys.

Any ideas or thoughts?

S

Vishal Parkar

unread,

May 29, 2003, 6:39:01 PM5/29/03

to

Refer to these url
http://www.sqlteam.com/item.asp?ItemID=2599

http://www.sqlteam.com/item.asp?ItemID=283

--
--Vishal
"Sam I Am" <sam...@rubbachicken.com> wrote in message
news:O4pw7DjJ...@TK2MSFTNGP10.phx.gbl...

John E. Huschka [MSFT]

unread,

May 29, 2003, 8:18:04 PM5/29/03

to

Personally, I like identities/GUIDs. I disagree with the article. I don't
have time to write out all the reasons (working production problem), but
I've been burned too many times in situations where having one would have
saved me.

I haven't decided on identities vs. GUIDs. The GUIDs seem to take up a lot
of space for just defining a unique key, but they do eliminate some of the
pains of identities.
--
John E. Huschka
Microsoft Business Solutions/FRx

This posting is provided "AS IS" with no warranties, and confers no rights.
**********************************************
"Vishal Parkar" <vgpa...@hotmail.com> wrote in message
news:uQPDbMjJ...@TK2MSFTNGP11.phx.gbl...

Michael Culley

unread,

May 29, 2003, 8:43:21 PM5/29/03

to

> http://www.sqlteam.com/item.asp?ItemID=2599

These sort of articles always brush over (or in this case completely miss)
the problem of changing values in non-identity primary keys. They also
conveniently miss all the other disadvantages of 'natural' primary keys and
assume that the developer is a complete fool and doesn't put a unique
index/constraint on what would have been the natural primary key.

--
Michael Culley

"Vishal Parkar" <vgpa...@hotmail.com> wrote in message
news:uQPDbMjJ...@TK2MSFTNGP11.phx.gbl...

> Refer to these url

>

Largo SQL Tools

unread,

May 29, 2003, 10:01:35 PM5/29/03

to

I read the 2599 article pointed out by Vishal and have these thoughts. I am
in the process of developing my second very large financial application
using SQL Server 2000. We currently have close to 250 tables, and will
probably have 400 or more before we're done. Every single one of our tables
uses a single Identity field as the primary key (this was the case in the
first large app I created using SQL Server 7 and 2000). I wouldn't have it
any other way. There is very little I agree with in the 2599 article. The
author states that Identity columns are bad because:

1) "They're not standard SQL. Most products have it but there's no
consistent implementation." This is only a concern if you think you might
migrate to a different database. If you intend to keep your database a SQL
Server database, this is of no relevence. In this case, Microsoft IS the
standard!

2) "They can't be updated. This violates the relational data model (not
fatal, but not good either). Duplicates can be accidentally inserted
(fatal)." Why is the fact they cannot be updated a bad thing? I prefer my
primary keys to be non-updatable. When primary keys are updateable, you
have the possibility of having hundreds or thousands of records being
updated via R.I. when a key changes. If P.K.'s change often, you can have
major concurrency issues. Also, the liklihood of accidentally inserting
duplicate records using Identity columns is zero unless you reset the
counter or set Identity_Insert On.

3) "They only create numeric values. GUID/NewID() are also numeric only, and
are hard to read." I agree GUID's are hard to read. This is why I don't use
them. The fact that Identity fields are only numeric has never been an
issue with me.

4) "Numeric values are not meaningful in many tables, and adding them
complicates relationships between other tables." Adding them complicates
relationships? I would much rather have a single primary key value related
to a single foreign key field than having two or three primary key fields
related to two or three foreign key fields. To me this later scenerio is
much more complicated (and a potential concurrency issue as I mentioned
before)! Besides, having a single Identity column as the P.K. makes
identifying a particular record very easy. I've often asked one of my
developers something like, "Take a look at record 1157 in the Account
table". He knows immediately I'm referring to the record whose P.K is 1157.
I'd much rather do this than to say, "Take a look at the record in the
Account table where field1 is 'ABC', field2 is 210, and field3 is "1/12/03".

I do agree with one thing the author said, "...Identity can be a lifesaver,
especially when relating tables." Since most of use are creating tables
that have relationships with other tables, why not use Identities? It makes
things much easier, and a single 4 byte integer value is very efficient for
SQL Server to deal with when doing lookups.

In the last 4 to 5 years, I've been the DBA and lead developer on two large
SQL Server apps as I said before. I've yet to run into a situation where I
regretted using Identity columns as P.K.'s except in two cases. The first
was where I had a table where I feared the counter could exceed 2 billion.
The second was a case where I wanted to append data in one table to the same
table in another database. The possibility of duplicates gave me some
concern. For me, the pros of using them definately outweighs the cons.

There is also another benefit I almost forgot to mention. The app I'm
involved with now is a C# app. We make extensive use of typed datasets.
Having a single Identity column as the P.K. makes dealing with datasets that
contain many related tables MUCH easier.

J.R.
Largo SQL Tools
The Finest Collection of SQL Tools Available
http://www.largosqltools.com

"Sam I Am" <sam...@rubbachicken.com> wrote in message
news:O4pw7DjJ...@TK2MSFTNGP10.phx.gbl...

Ted Bouskill

unread,

May 30, 2003, 12:29:00 AM5/30/03

to

I've only encountered one scenario where using a GUID made sense.
Especially when you consider their massive storage size.

If you think about it COM objects use GUID's and the reason is, when a COM
object is registered in the registry which is a crude database, different
developers have to make sure their GUID's don't collide. In other words,
records created in a other locations could be combined and possibly collide.
The random nature of a GUID protects against that.

The only time in my database work that I felt a GUID was justified as a PK
was a CRM application. Sales agents on the road and office staff would
create records which could be combined in the future. We needed to ensure
that new records created in any location could be combined without PK
collisions.

However, that doesn't mean I always use IDENTITY columns. Sometimes the
table itself has a unique field that will not change and that is sometimes a
good choice.

"Sam I Am" <sam...@rubbachicken.com> wrote in message
news:O4pw7DjJ...@TK2MSFTNGP10.phx.gbl...

Sam I Am

unread,

May 30, 2003, 2:45:25 AM5/30/03

to

Thanks for all the advice

This is going to be a rather large sql implementation with millions of
records and clustering. I feel more comfortable going with GUID's throughout
at this stage.

Thanks again

S

"Ted Bouskill" <tedb...@hotmail.com> wrote in message
news:%23dSx%23PmJD...@TK2MSFTNGP12.phx.gbl...

Greg Linwood

unread,

May 30, 2003, 4:23:32 AM5/30/03

to

Hi J.R.

You've given great, in-depth answer that covers a number of the important
differences between GUID / Integer.

One thing I want to address in your post though is: "I feared the counter
could exceed 2 billion.". Perhaps you've worked this out since, but just so
all is clear to whoever reads your post, identity is not limited to 2
billion in SQL 2K as columns with the integer type BIGINT (8 byte) can also
be used for identity which allow for values from -9,223,372,036,854,775,808
through 9,223,372,036,854,775,807.

Another classic scenario where GUIDs can be advantageous over IDENTITY is
where there the application design needs to know the key between master /
detail records at the application layer, BEFORE insertion into the database.
This can eliminate a network round trip between the application layer and
the DB just to generate a key in such a scenario. This is often presented in
argument about the advantages / disadvantages of IDENTITY / GUID as an
advantage for GUID, especially where the benefit of the application layer
knowing the ID before insertion into the DB out-weighs the draw-back of the
extra storage size footprint of a GUID.

Regards,
Greg Linwood
SQL Server MVP

"Largo SQL Tools" <support@largosqltools_nospam_.com> wrote in message
news:e7yd39kJ...@TK2MSFTNGP10.phx.gbl...

David Portas

unread,

May 30, 2003, 4:51:51 AM5/30/03

to

> assume that the developer is a complete fool and doesn't put a unique
> index/constraint on what would have been the natural primary key.

Yes, that's what they assume because that's what actually happens! Look
through posts to this group for many, many examples of tables without
constraints declared on natural keys. Or just look at "Northwind" where MS
demonstrates by example how to destroy the integrity of your data with
IDENTITY.

--
David Portas
------------
Please reply only to the newsgroup
--

"Michael Culley" <mcu...@NOSPAMoptushome.com.au> wrote in message
news:#8aKFRkJ...@TK2MSFTNGP10.phx.gbl...

David Portas

unread,

May 30, 2003, 4:55:20 AM5/30/03

to

> the problem of changing values in non-identity primary keys. They also

Problem? What's the problem with ON UPDATE CASCADE?

--
David Portas
------------
Please reply only to the newsgroup
--

"Michael Culley" <mcu...@NOSPAMoptushome.com.au> wrote in message
news:#8aKFRkJ...@TK2MSFTNGP10.phx.gbl...

Dan Guzman

unread,

May 30, 2003, 9:11:14 AM5/30/03

to

> This is going to be a rather large sql implementation with millions of
> records and clustering. I feel more comfortable going with GUID's
throughout
> at this stage.

Without jumping into the surrogate key debate that's be discussed here
many times, there is an additional consideration regarding GUIDs that I
didn't see mentioned in this thread. GUID values are essentially
randomly assigned under Windows 2000 and above. This can negatively
impact performance with large tables if you have a clustered primary
key/index on a GUID. Furthermore, the buffer cache hit ratio may be
lower than the natural key or identity alternative so you'll need a
strong disk subsystem to support your design decision.

--
Hope this helps.

Dan Guzman
SQL Server MVP

-----------------------
SQL FAQ links (courtesy Neil Pike):

http://www.ntfaq.com/Articles/Index.cfm?DepartmentID=800
http://www.sqlserverfaq.com
http://www.mssqlserver.com/faq
-----------------------

"Sam I Am" <sam...@rubbachicken.com> wrote in message

news:um$OTcnJD...@TK2MSFTNGP10.phx.gbl...

Joe Celko

unread,

May 30, 2003, 1:03:56 PM5/30/03

to

>> I am about to start designing schema for a large database. I am
trying to weigh up the pros and cons for using either a GUID or auto

incrementing IDENTITY column for the primary keys. <<

This is obviously your first large database project, and you have never
had course in data modeling. You have set up a False Dichotomy. A very
false one, in fact.

Neither a GUID or IDENTITY can be a key **by definition**; they are
attributes of the PHYSICAL storage and have nothing to do with the
logical model. You are starting on the physical implementation without
a logical model. And you're already trapping yourself in one of two bad
decisions that will destroy the dat integrity of the project.

The right questions to ask at the start of a project are:

1) Can I buy a package? How much does it cost?
2) Has anyone done this before? Who?
3) What are the industry standards for this application?
4) What are the entities?
5) What are the relationships?
6) What are the business rules?

Have you learned ORM yet? Might be the best place to start.

I made a living fixing database projects that get screwed up by things
like this. I usually come in after the first year. That is a "magic
number" because it is just enough time fot the database to get large,
have been used enough to be important and to have someone do an annual
report off of it. That is when the owners see that everything is crap
and that they need to clean it up.

--CELKO--
===========================
Please post DDL, so that people do not have to guess what the keys,
constraints, Declarative Referential Integrity, datatypes, etc. in your
schema are.

*** Sent via Developersdex http://www.developersdex.com ***
Don't just participate in USENET...get rewarded for it!

Joe Celko

unread,

May 30, 2003, 1:03:57 PM5/30/03

to

>> These sort of articles always brush over the problem of changing

values in non-identity primary keys. <<

I've got an article in the works on that topic. There are no problems if
you know what you are doing. But most programmers who get stuck with DB
work don't and the boss will not pay to train them.

First of all, you want industry-standard keys which can be verified
externally -- things like UPC, VIN, etc. You want check digits and
syntax rules which can be verified internally. If you have to create
your own keys, then you want to do a lot of careful work (I have a few
chapters on the design of encoding schemes in one of my book).

You then use a CREATE DOMAIN statement (or vendor equivalent) for each
key, so that any change will automatically cascade. Or use DRI actions
and cascades.

>> ... and assume that the developer is a complete fool and doesn't put

a unique index/constraint on what would have been the natural primary
key. <<

Well, he is either fool, an incompetent or a sabatuer and I try to
follow the rule that one should not attribute to malice that which can
be explained by stupidity.

Think about it for two seconds. You have a data model in which one of
the rules that you **must enforce** for the model to be valid is that
some column(s) are unique within a table.

If you know this and yet you deliberately leave it out, so that the
database will be corrupted, can you think of any other words than
"fool", "incompetent" or "sabatuer" to describe the behavior? I am
assuming "lazy @!$& bastard" falls under "incompetent".

Michael Culley

unread,

Jun 1, 2003, 5:21:11 PM6/1/03

to

> Yes, that's what they assume because that's what actually happens! Look
> through posts to this group for many, many examples of tables without
> constraints declared on natural keys. Or just look at "Northwind" where MS
> demonstrates by example how to destroy the integrity of your data with
> IDENTITY.

This is the sort of totally illogical arguement that is typical of the
natural key crowd. The fact that some developers screw it up has no bearing
on the validity of the technique.

--
Michael Culley

Michael Culley

unread,

Jun 1, 2003, 5:26:33 PM6/1/03

to

Say you got the SSN no for a person wrong and need to change it then you
have to make sure that no-one is editing any record related to this
customer. It's also possible to have information stored across many
different database systems that you cannot use cascade.

--
Michael Culley

"David Portas" <REMOVE_BEFORE_R...@acm.org> wrote in message
news:uP4CDkoJ...@TK2MSFTNGP11.phx.gbl...

Joe Celko

unread,

Jun 1, 2003, 5:34:49 PM6/1/03

to

>> This is only a concern if you think you might
migrate to a different database. If you intend to keep your database a
SQL Server database, this is of no relevence. In this case, Microsoft IS
the standard! <<

"Caesar: Pardon him, Theodotus. He is a barbarian and thinks the customs
of his tribe and island are the laws of nature." - Caesar and
Cleopatra; George Bernard Shaw 1898

LOL! I actually have an article in INTELLIGENT ENTERPRISE wntitled "I
will never have to port this code" on this topic. I guess if your
company is a total loser that will never grow, never afford a new
release, etc. then this might work.

>> Why is the fact they cannot be updated a bad thing? I prefer my
primary keys to be non-updatable <<

REAL WORLD FLASH: Currently the ISBN and UPC/EAN codes are CHAR(10) and
are increasing to CHAR(13) with a change in the check digit algorithm.
It would be nice to accomodate that.

>> the liklihood of accidentally inserting duplicate records [sic] using
Identity columns [sic] is zero unless you reset the counter or set
Identity_Insert On. <<

Wrong. Imagine you have an IDENTITY Property on a table as the primary
key and **natural key** in the data, such as a UPC code. There is
nothing to stop you from putting a million rows with the same UPC code
and destroying your data integrity.

You think a key is something you create; a key should be a set of
attributes that uniquely identify each entity in a set. And if you have
all this experience, why don't you know that a file is not a table, a
row is not a record, and a column is not a field? Is it because you
have been writing sequential file systems in SQL?

>> I would much rather have a single primary key value related to a

single foreign key field [sic] than having two or three primary key
fields [sic] related to two or three foreign key fields [sic]. <<

Why not use what is most understood by the users, such as (longtitude,
latitude) pairs? Why not use things which can be verified in the real
world? Data integrity does not seem to be an issue with you ...

Michael Culley

unread,

Jun 1, 2003, 5:46:31 PM6/1/03

to

Hi Joe,

> I've got an article in the works on that topic. There are no problems if
> you know what you are doing. But most programmers who get stuck with DB
> work don't and the boss will not pay to train them.

So what you are saying is that the natural key is more difficult to work
with?

> You then use a CREATE DOMAIN statement (or vendor equivalent) for each
> key, so that any change will automatically cascade. Or use DRI actions
> and cascades.

You don't have any of these hassles with IDENTITY PKs.

> If you know this and yet you deliberately leave it out, so that the
> database will be corrupted, can you think of any other words than
> "fool", "incompetent" or "sabatuer" to describe the behavior? I am
> assuming "lazy @!$& bastard" falls under "incompetent".

You are right, if someone uses identity pks and they don't use unique
constrains/indexes on what would have otherwise been the natural key then
they are a fool. But this is not the point! Just because some people are
foolish does not mean the identity method is somehow invalid. I see this
illogical arguement repeated again and again by the natural key crowd.

--
Michael Culley

Michael Culley

unread,

Jun 1, 2003, 6:00:24 PM6/1/03

to

Hi Joe,

> Wrong. Imagine you have an IDENTITY Property on a table as the primary
> key and **natural key** in the data, such as a UPC code. There is
> nothing to stop you from putting a million rows with the same UPC code
> and destroying your data integrity.

This is the illogical arguement I was talking about in my previous post.
What stops the same UPC code being inserted is a unique constraint/index on
the UPC code. If the developer leaves off this unique constraint/index then
they haven't done their job properly but this is in no way an arguement
against identity keys.

--
Michael Culley

Steve Kass

unread,

Jun 1, 2003, 11:37:23 PM6/1/03

to

Joe,

And just how does your database handle different products with
the same UPC code? I assume you've read the guidelines and
know that the reuse of codes is permitted. Maybe you wouldn't
do it, but someone will, and a retailer has to accomodate. I suggest
the retailer use his/her own SKU, store the UPC code as an attribute,
and heretical as you might find it, _not_ put a unique constraint on the
UPC code, cuz it's not unique.

Here's the quote, to which you didn't respond when this topic
came up in February, 2002.

(http://www.uc-council.org/reflib/00810/02-TOC/02-01.html)

If possible, an old UCC-12 (U.P.C.) Number should never be reassigned
(see Amplification 3). However, in the event one must be reassigned,
there are minimum retention periods prior to reuse or reassignment. The
minimum retention period is 30 months for apparel items and 48 months
for all other items. These periods allow a reasonable time period for
the SKU to pass through the retail product cycle. A 12-month retention
period should apply to promotion packages such as a price-off label.

Manufacturers should recognize that subsequent changes to their UCC-12
(U.P.C.) Numbers will be costly for distributors to accommodate and will
lead most retailers to view the renumbered item as essentially a new
item offered for sale.

This guideline is intended to embrace current trade practices and to
minimize the volume of UCC-12 (U.P.C.) Numbers in use, consistent with
the basic objective of facilitating automatic checkout devices and
semiautomatic reorder within distributor organizations.

A manufacturer needs to balance the effects of permanently retiring
numbers (and risk running out of numbers) and reusing numbers. If
numbers are reused, careful management is required to avoid violating
the guidelines and ensure that there are no duplicate or overlapping
assignments.

Steve kass
Drew University

Steve Kass

unread,

Jun 1, 2003, 11:40:23 PM6/1/03

to

The developer might leave off the unique constraint on UPC codes
in order to accommodate a UPC codes that has been reassigned
for a different product (with a different SKU or artificial key, of
course).

http://www.uc-council.org/reflib/00810/02-TOC/02-01.html

Steve Kass
Drew University

Michael Culley

unread,

Jun 2, 2003, 12:04:37 AM6/2/03

to

> And just how does your database handle different products with
> the same UPC code?

LOL! Joe would be in a real mess when he discovered this at his magic 1 year
into the project.

--
Michael Culley

David Portas

unread,

Jun 2, 2003, 2:30:42 AM6/2/03

to

My comment was not about the validity of the technique but was a response to
your criticism of Robvolk's article. My opinion is that, while IDENTITY can
be useful when used correctly, I too often see it used badly. The article I
think goes some way towards explaining the basics that a developer should
understand before he starts applying IDENTITY indiscriminately.

--
David Portas
------------
Please reply only to the newsgroup
--

"Michael Culley" <mcu...@NOSPAMoptushome.com.au> wrote in message

news:e6PN2NIK...@TK2MSFTNGP12.phx.gbl...

David Portas

unread,

Jun 2, 2003, 2:50:26 AM6/2/03

to

> Say you got the SSN no for a person wrong and need to change it then you
> have to make sure that no-one is editing any record related to this
> customer.

Absolutely. In many cases that's in the interests of data integrity.

--
David Portas
------------
Please reply only to the newsgroup
--

"Michael Culley" <mcu...@NOSPAMoptushome.com.au> wrote in message

news:#uV21QIK...@TK2MSFTNGP12.phx.gbl...

Michael Culley

unread,

Jun 2, 2003, 5:05:53 PM6/2/03

to

Right, but if you used an identity pk then it wouldn't be a problem.

--
Michael Culley

"David Portas" <REMOVE_BEFORE_R...@acm.org> wrote in message

news:#YOfQMNK...@TK2MSFTNGP11.phx.gbl...

Arthur Hoornweg

unread,

Jun 23, 2003, 4:39:48 AM6/23/03

to

The beauty of the GUIDS is that databases can be completely independant
for as long as you want, and then merged into one single database
without colliding primary keys.

This makes them ideal for offline work where users get an independant
copy of the main database, are free to make changes and additions, and
subsequently synchronize with the main database.

--
Arthur Hoornweg

Please remove the ".net" from my e-mail address
if you want to reply by e-mail!

Tibor Karaszi

unread,

Jun 23, 2003, 6:23:20 AM6/23/03

to

However, if two rows actually represent the same "physical" entity, you don't get help with
catching this duplicate. (Two sides of the same coin :-) .)

--
Tibor Karaszi, SQL Server MVP
Archive at: http://groups.google.com/groups?oi=djq&as_ugroup=microsoft.public.sqlserver

"Arthur Hoornweg" <arthur....@wanadoo.nl.net> wrote in message
news:MPG.1960db3e2...@msnews.microsoft.com...

SamIAm

unread,

Jul 1, 2003, 9:16:12 PM7/1/03

to

"> This is obviously your first large database project, and you have never
> had course in data modeling. You have set up a False Dichotomy. A very
> false one, in fact."

A little too wound up there or too arrogant??

I am not a DBA. Thank g-d. This is not the kind of help and direction I was
looking for.

If someone posted a question on web development asking something that I may
consider simple, would I insult them and tell them to go and take courses
and remind them how I normally would have to save them later??

No obviously not.

"Joe Celko" <anon...@devdex.com> wrote in message
news:OsSB21sJ...@TK2MSFTNGP11.phx.gbl...

Dmand

unread,

Jul 2, 2003, 1:37:36 AM7/2/03

to

Why can't an Identity be a Key? If I want a customer ID say and I want that
to be an integer that is unique, WTF is wrong with using the system
allocated Identity value? Same goes for a GUID, if I want to whack a unique
stamp on something that isn't necessarily unique based on it's other
properties then it does the job.

Anyway, I ain't a DBA either, but here's my 2 cents. The Identity takes less
storage and will match quicker in joins (int versus string). The existence
of Identities in your table may impact update performance as the next
Identity requires knowledge of the current table data. A GUID on the other
hand is randomly generated, though if you make it a unique key (PK) then the
usual performance impact of a unique constraint will still come into to play
and more so with a GUID as again it's a case of int versus string. Oh, and
yeah, what are the business rules? ;-)

"SamIAm" <sam...@rubbachicken.com> wrote in message
news:OfF2HeDQ...@tk2msftngp13.phx.gbl...

David Portas

unread,

Jul 2, 2003, 2:54:32 AM7/2/03

to

> Why can't an Identity be a Key? If I want a customer ID say and I want
that

An IDENTITY can't be a *natural* key because it is not "a subset of the
attributes which uniquely identify the entity" i.e. a Primary Key. IDENTITY
can be used as an artificial, surrogate key and the pros and cons of doing
so have often been repeated in this group.

However, if you do use identity as a key you MUST also declare the true,
natural key of the table as non-NULL and UNIQUE otherwise you have no
natural key and therefore no uniqueness and no RI.

Michael Culley

unread,

Jul 2, 2003, 10:35:52 PM7/2/03

to

> Why can't an Identity be a Key? If I want a customer ID say and I want
that
> to be an integer that is unique, WTF is wrong with using the system
> allocated Identity value? Same goes for a GUID, if I want to whack a
unique
> stamp on something that isn't necessarily unique based on it's other
> properties then it does the job.

Using either an identity or a guid is easier in pretty much every way. The
problem with making things easier for yourself is that you look more like a
beginner and this is frowned upon by the 'experts'.

--
Michael Culley