Using Guid as Primary Key

Gops S

unread,

Dec 2, 2012, 7:51:54 AM12/2/12

to sqlal...@googlegroups.com

Can I use the Guid as primary key? I am newbie to sql and mysql management.

I am worried about auto-generation of primary when it is defined as integer when we backup and restore the database.

Can anyone throw light on using auto-increment id as primary key vs using guid string id as primary key?

Thanks,

Gopal

Wolfgang Keller

unread,

Dec 2, 2012, 4:30:01 PM12/2/12

to sqlal...@googlegroups.com

> Can I use the Guid as primary key? I am newbie to sql and mysql
> management.

Using such auto-generated surrogate keys is always a really bad idea
and the straightest and shortest way to data inconsistency hell
(especially through duplicates).

Sincerely,

Wolfgang

Michael Bayer

unread,

Dec 2, 2012, 10:15:30 PM12/2/12

to sqlal...@googlegroups.com

you've seen two guids generate as duplicates ? the chances of that are smaller than something one would ever see in 100K years.

guids aren't very efficient when it comes to indexes on foreign key/primary key columns though, they take up a lot of space.

anyway you can use GUIDS in sqlalchemy just like:

import uuid
Column(String(32), default=lambda: uuid.uuid4().hex, primary_key=True)

Wolfgang Keller

unread,

Dec 4, 2012, 4:46:57 PM12/4/12

to sqlal...@googlegroups.com

> >> Can I use the Guid as primary key? I am newbie to sql and mysql
> >> management.
> >
> > Using such auto-generated surrogate keys is always a really bad idea
> > and the straightest and shortest way to data inconsistency hell
> > (especially through duplicates).
>
> you've seen two guids generate as duplicates ?

You missed my point - *exactly*. ;-)

The very point of "natural" (as opposed to "surrogate") keys is that any
real world "thing" (such as e.g. a person) represented by a database
entity must have exactly one *and only one* matching record in the
corresponding database table.

Auto-generated "GUID"s are exactly what does *not* prevent the
generation of several records for one single real-world "thing". By
simple multiple manual entry of the same data several times due to
operator error. Identifying a "thing" by a natural identifier, i.e. one
that is actually unique for each and every "thing" *in the real-world*
does prevent such inconsistency.

I once came across an article describing one of the probably worst
cases of database havoc caused by surrogate keys in computer history -
the EMR system of the US veterans health administration. According to
this article, they face the problem today that each and every individual
veteran has an arbitrary, unknown number of corresponding database
records - and there's no way to reconcile this mess due to the sheer
volume and various technical issues such as typos etc.

Sincerely,

Wolfgang

Michael Bayer

unread,

Dec 4, 2012, 5:28:56 PM12/4/12

to sqlal...@googlegroups.com

oh, well this is the "natural keys are better" argument. Unfortunately, surrogate integer PKs are necessary in most real-world installations as they provide far better performance than a typical natural key, when you consider that they are also copied out to all the referencing FK columns and are present in many indexes. Even though SQLA totally supports natural keys as well as it can, I don't generally use natural PKs in my real-world contracts. The DBAs I work with won't allow them.

I'd say any system that actually has any human being *manually entering* a surrogate PK value as part of the application's user interface is intrinsically broken. Especially a GUID value.

I will often add a UNIQUE constraint to the "natural" PK of the table, next to the surrogate PK, to avoid the data duplication issues you refer to. I've used surrogate PKs for decades without data duplication issues.

That said, I did have a bad experience with GUIDs, which were in fact "natural" guids that were generated deterministically from other elements of the data - which was that the performance suffered terribly, specifically because of all that heavy GUID data copied out to all the referencing FKs and the associated indexes.

Christopher Lee

unread,

Dec 6, 2012, 2:30:49 PM12/6/12

to sqlal...@googlegroups.com

I use guid primary keys fairly heavily. There is a performance impact due to the size of the key; you just can't fit as much of the index in memory as you otherwise could. On the flip side, when you have multiple database shards, you never need to worry about keys being duplicated, so you can move data between shards at will. (Of course you can solve that problem with integer keys; it is just more complex, and hence more error prone.)

In addition to their size (bloating both the rows and the indexes, causing fewer rows to fit in memory), there are a few more caveats:

Guids are not necessarily random (it depends on how you generate the guid). Like Michael said, that can lead to worst-case performance, since hashing them naively can lead to many collisions. Either use mathematically random guids, or double-check your hash table performance, or both.

Guids are much more random than auto-incrementing integers. Since databases tend to cluster on disk around their primary key, multiple inserts will spread themselves across the disk. That can be very good for performance, but it is bad if you tend to query for rows in order. With integer keys, when you page in one row, if you query for the next primary key, that data is likely to be sitting resident in database memory, or at least on the disk cache.

I used guid primary keys in Microsoft's SQLServer (around 4-5 years ago), on a table that had high numbers of inserts and deletes. The database had trouble maintaining proper table statistics, which led to the query optimizer making some poor life choices, resulting in very poor performance. I think that is specific to the database and our poor high-churn architecture, but it is worth noting. I have not have that problem in MySQL or PostgreSQL.

None of these problems are insurmountable, of course. It just pays to stay on your toes and test your performance regularly.

--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlal...@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.

Reply all

Reply to author

Forward