Foreign keys in Django models without id fields?

2,957 views
Skip to first unread message

Andy Gimblett

unread,
Jul 25, 2005, 2:36:06 PM7/25/05
to django...@googlegroups.com
Hi all,

I picked up Django for a try on Saturday, and I must say, I've been
very impressed so far, and I can see big things in the future... :-)
Having said that, I have a question/some concerns regarding foreign
keys. Apologies in advance, because it's a bit epic...

The short version:

Is there a (working) way to have foreign key relationships between
classes in my model, while avoiding arbitrary "id"
columns/attributes?

Also, am I just being silly worrying about this?

The long version:

Consider the following simplified version of my problem: we aim to
keep track of money spent by various people.
We have two classes:
Person (ie somebody spending money) and
Event (ie a money spending event).
Each Event is associated with exactly one Person.
Each Person may be associated with many events.
So we have a many-to-one relationship between Event and Person.
And of course I want to require that the Person exists before an Event
with that Person is created.

Natural candidate for a foreign key - right?

Now, if I was doing this directly in Postgres, I'd know what to do:

----

BEGIN;
CREATE TABLE keys2_persons (
person varchar(30) NOT NULL PRIMARY KEY
);
CREATE TABLE keys2_events (
moment date NOT NULL,
cost numeric(8, 2) NOT NULL,
person varchar(30) NOT NULL REFERENCES keys2_persons
);
COMMIT;

----

Unfortunately, I can't find a way to reproduce this situation in
Django. Judging by the tutorial, the obvious way to do this is with a
model that looks like this:

----

# keys1/models/keys1.py

class Person(meta.Model):
fields = (
meta.CharField('person', maxlength=30),
)

class Event(meta.Model):
fields = (
meta.DateField('moment'),
meta.FloatField('cost', max_digits=8, decimal_places=2),
meta.ForeignKey(Person),
)

---

... which gives me SQL that looks like:

----

BEGIN;
CREATE TABLE keys1_persons (
id serial NOT NULL PRIMARY KEY,
person varchar(30) NOT NULL
);
CREATE TABLE keys1_events (
id serial NOT NULL PRIMARY KEY,
moment date NOT NULL,
cost numeric(8, 2) NOT NULL,
person_id integer NOT NULL REFERENCES keys1_persons (id)
);
COMMIT;

----

Now, that's fair enough, and it works, but I have two problems:

1. Well, yick. Using unique IDs is such a, well, "MySQL" way of
doing things, and I've got into the habit of doing my databases
properly (ie with meaningful primary keys, and with foreign key
relationships directly between fields) of late...

2. It means that I have to remember to use "person_id" in things
like this:

admin = meta.Admin(
list_display = ('moment', 'cost', 'person_id'),
)

although it does at least seem intelligent enough to display
the object, not its ID. Still, I'd rather call it "person", I
think.

Anyway, I was quite pleased when I saw the "primary_key" and
"to_field" attributes in the model API documentation, because it
seemed a way around this. So I rewrote the above as:

----

# keys2/models/keys2.py

class Person(meta.Model):
fields = (
meta.CharField('person', maxlength=30, primary_key=True),
)

class Event(meta.Model):
fields = (
meta.DateField('moment'),
meta.FloatField('cost', max_digits=8, decimal_places=2),
meta.ForeignKey(Person, to_field='person')
)

----

... which looked likely, but unfortunately gives us this SQL:

----

BEGIN;
CREATE TABLE keys2_persons (
person varchar(30) NOT NULL PRIMARY KEY
);
CREATE TABLE keys2_events (
id serial NOT NULL PRIMARY KEY,
moment date NOT NULL,
cost numeric(8, 2) NOT NULL,
person_id varchar(30) NOT NULL REFERENCES keys2_persons (person)
);
COMMIT;

----

... which is almost what I want but not quite: in particular, I still
have a field called person_id instead of person (even though it
doesn't reference an INT id field). Even worse, the keys2 model,
while accepted by "django-admin.py install", seems to break the admin
interface - eg, I can't add a Person. So maybe I should raise a bug
here - or am I doing something stupid?

So, as I said, I'm wondering two things:

1. Is what I'm trying to do even possible? (And I wonder how it
works for, eg, compound keys - can we have compound primary
keys?)

2. Should I even be worrying about this? Will I ever need to
manipulate these IDs directly? I haven't seen enough of what
happens at the Python object level to know if I ever have to
"join" stuff manually there... Or do I just pull an Event
object out, and get its Person attached for free? What about
creating these things? I've seen Tutorial 1 but that didn't
really answer my question...

Finally, almost as an aside, I believe the ForeignKey example in the
model API documentation at
http://www.djangoproject.com/documentation/model_api/ is misleading:

A many-to-one relationship to the primary key in another
object. So, to give a Topping object a many-to-one relationship to
Pizza (i.e. there are many toppings on a pizza):

meta.ForeignKey(Pizza)

The problem is that this should, of course, be a many-to-many
relationship. Pizzas have many Toppings, and the same Topping can
occur on multiple Pizzas. As a consequence, I was quite confused for
a little while when I read that the first time.

So I think a better example should be chosen. In the spirit of not
just complaining without making a constructive suggestion, I suggest
to use a many-to-one relationship from Pizza to Size: each pizza can
only be one Size ("9 inch", "12 inch", "half-meter rectangular
behemoth", etc.) but each size can apply to multiple pizzas. I'm not
sure how good a suggestion that is, but there you go. :-)

Phew!

Enormous respect to anyone who's managed to read this far, and also to
anyone who can give me an intelligent answer. I won't be at all
suprised if I'm missing something stupid, but OTOH if this is a
problem which hasn't been encountered before, maybe it'll all be worth
it... :-)

Cheers!

-Andy


--
Andy Gimblett
http://gimbo.org.uk/

Jacob Kaplan-Moss

unread,
Jul 25, 2005, 3:42:02 PM7/25/05
to django...@googlegroups.com
Hey Andy --

The short version: Django can indeed handle this.

The long version: Django's trying to help you design better databases
-- please let it!

On Jul 25, 2005, at 1:36 PM, Andy Gimblett wrote:

[snip lots of examples]


> 1. Well, yick. Using unique IDs is such a, well, "MySQL" way of
> doing things, and I've got into the habit of doing my databases
> properly (ie with meaningful primary keys, and with foreign key
> relationships directly between fields) of late...
>

This isn't a "MySQL way" -- it's the RIGHT way of designing
databases. Django's designed to do the right thing, and in good
database design, every table should have a unique ID, preferably an
integer. Read up on data normalization if you like; you'll find that
unique IDs are pretty much agreed on as a best practice.

Think about this: if you've got this table:

"""
CREATE TABLE keys2_persons (
person varchar(30) NOT NULL PRIMARY KEY
);
"""

How would you have multiple people with the same name (I know at
least two "John Roberts"...)? If a value can be duplicated, it can't
be a primary key.

Please, for the love of the person who has to maintain your code, let
Django help you design a good schema.

That said, if you must have a bad design:

[snip more examples]


> 1. Is what I'm trying to do even possible? (And I wonder how it
> works for, eg, compound keys - can we have compound primary
> keys?)
>

Yes, although you'll need to jump through a few hoops. Your keys2.py
almost has it, done; all you're missing is a "name" option to the
Person ForeignKey, so your model should look like this:

"""
class Person(meta.Model):
fields = (
meta.CharField('person', maxlength=30, primary_key=True),
)

class Event(meta.Model):
fields = (
meta.DateField('moment'),
meta.FloatField('cost', max_digits=8, decimal_places=2),
meta.ForeignKey(Person, name="person", to_field='person')
)

"""

Which will create the SQL you want -- but again, you really shouldn't
do this; this is going to bite you in the ass when you try to add
that second "John Roberts" to the database.


> Even worse, the keys2 model,
> while accepted by "django-admin.py install", seems to break the admin
> interface - eg, I can't add a Person. So maybe I should raise a bug
> here - or am I doing something stupid?
>

I'm not sure what you mean by "breaking the admin"; more details
would be nice here -- there's good chance it's breaking because your
schema's too weird for Django. I think the admin should work with
badly designed schemas.

And I wouldn't call you "stupid", really; database design can be
tricky, and I'm certainly responsible for my fair share of schema
abominations. However, Django's been designed to specifically
prevent you (and me) from making these types of mistakes; that's why
you're having to fight it.



> 2. Should I even be worrying about this? Will I ever need to
> manipulate these IDs directly? I haven't seen enough of what
> happens at the Python object level to know if I ever have to
> "join" stuff manually there... Or do I just pull an Event
> object out, and get its Person attached for free? What about
> creating these things? I've seen Tutorial 1 but that didn't
> really answer my question...
>

Exactly -- you shouldn't really be worrying about it; getting a
related object is as simple as "my_event.get_person()". 99% of the
time, you won't need to worry about IDs; they'll just sit there and
look pretty.


> Finally, almost as an aside, I believe the ForeignKey example in the
> model API documentation at
> http://www.djangoproject.com/documentation/model_api/ is misleading:
>

[snip]

Yes, it is -- and your example is *much* better; I'm going to replace
my example with yours when I've got a moment; thanks for the suggestion!

I hope this helps you understand Django's behavior here,

Jacob.


Andy Gimblett

unread,
Jul 26, 2005, 3:49:03 AM7/26/05
to django...@googlegroups.com
On Mon, Jul 25, 2005 at 02:42:02PM -0500, Jacob Kaplan-Moss wrote:
>
> The short version: Django can indeed handle this.

Jolly good. :-)

Er, ok...

> The long version: Django's trying to help you design better
> databases -- please let it!

:-)

> This isn't a "MySQL way"

Sorry - sloppy language on my part which, now that I think about it,
probably makes no sense to anyone else. It arises in my mind because
of some personal history with MySQL... Back before it didn't didn't
handle referential integrity and a whole lot of other stuff very well
or at all, I lived in Maintenance Hell dealing with databases which
used this style. Since then I've mainly dealt with Postgres and have
got used to doing things right, although I hear MySQL has improved
greatly since then, and is almost a real RDBMS. ;-)

> -- it's the RIGHT way of designing databases.

I'm afraid I have to disagree with you there - although I can see how
that style might make writing an object-relational wrapper easier.

> Django's designed to do the right thing, and in good database
> design, every table should have a unique ID, preferably an integer.

Sorry, but no. What you mean is, every table should have a primary
key. While uniquely identifying each row, this isn't necessarily a
"unique ID" in and of itself. In particular, there's certainly no
preference for using an integer, and in most cases the primary key
will turn out to be a compound key consisting of several fields from
the table. In some cases, it might even be _all_ of the fields - but
that's OK. The only case where an "artificial" integer ID is
necessary is when there really isn't a candidate key - there's no
subset of fields which uniquely identifies rows, so you have to add
one yourself. _That's_ when you use an integer ID, but it should be
the only time.

Unfortunately, of course, I chose two examples where that happens to
be the case - which confuses this discussion somewhat. :-)

(Apologies for lecturing... It's what I do. I assume you actually
know this but I want to let you know that despite initial appearances,
I do too.)

> Read up on data normalization if you like; you'll find that unique
> IDs are pretty much agreed on as a best practice.

Really? I'd love to see some references there, because that's news to
me. Instead, see above. I suspect it's one of those things that's
become "best practice" (maybe particularly in the open source world?)
because "everybody does it and it works and doesn't tend to bite you
in the ass" - but really, it's entirely redundant a lot of the time.

(Although again, I can see that it could help make an ORM a lot easier
to write, and if the ORM does its job properly, hey, who cares?)

> Think about this: if you've got this table:
>
> """
> CREATE TABLE keys2_persons (
> person varchar(30) NOT NULL PRIMARY KEY
> );
> """
>
> How would you have multiple people with the same name (I know at
> least two "John Roberts"...)? If a value can be duplicated, it can't
> be a primary key.

Yes - this is a terrible example on my part and I should never have
used it. You're absolutely right, there's no candidate key, so we
need to add an artificial unique ID, which is most easily done with an
integer. And yes, the same is true for my events table too: even _all
the fields_ don't uniquely identify a row, so we absolutely need an
artificial ID there.

But I still disagree that, in general, artifical IDs are The Right
Thing.

A slightly better (and real) example might be this: departments at a
university are uniquely identified by a string, eg 'COMPSCI' for
Computer Science, 'MATHS' for Mathematics, etc. These really are
unique: there will never be another department whose code is 'COMPSCI'
(ignoring mergers with other universities!). So that's an obvious
primary key. Degree schemes are coded in a similar fashion: 'XCSCS'
for BSc Computer Science, 'CSCSWGRM' (or something similar) for BSc
Computer Science and German, etc. Finally, each degree scheme has a
single "home department" responsible for running it (even if it's a
joint degree). So here I might have:

CREATE TABLE department (
department_code varchar(10) NOT NULL PRIMARY KEY,
department_name varchar(80) NOT NULL
);
CREATE TABLE degree_scheme (
degree_scheme_code varchar(20) NOT NULL PRIMARY KEY,
scheme_name varchar(80) NOT NULL,
home_department varchar(10) NOT NULL REFERENCES department (department_code)
);

Here, we essentially have artifical unique IDs, but they're not
integers. Adding unique ID integers to these tables would be entirely
redundant.

I'm still not entirely happy with this example, TBH. With some
thought I'm sure I could come up with tables where the primary key
doesn't involve artificial codes at all, and really is dependent on
just the "real" data. But hopefully I've made my point.

> Please, for the love of the person who has to maintain your code,
> let Django help you design a good schema.

Well, that's me, and I have to say I much prefer writing joins based
on actual meaningful data fields rather than arbitrary IDs - but
that's just from my personal experience. :-) But maybe I don't ever
need to write joins, if Django hides all this from me...

> Which will create the SQL you want -- but again, you really
> shouldn't do this; this is going to bite you in the ass when you try
> to add that second "John Roberts" to the database.

Agreed - persons is a bad motivational example for this technique.
However, it's very nice to know that it _is_ possible - thanks!

Assuming we can agree that I do, despite initial appearance, know how
to design databases, and that I'd like to avoid artificial unique IDs
if possible, are compound keys (ie ones consisting of multiple fields)
handled by Django OK? Or do I have to use artificial ID numbers
there? If compound keys _aren't_ handled OK, then I'll probably just
give up and let Django add all the redundant ID fields it wants to, to
be honest. ;-)

> I'm not sure what you mean by "breaking the admin"; more details
> would be nice here -- there's good chance it's breaking because your
> schema's too weird for Django. I think the admin should work with
> badly designed schemas.

I'll look into it again more carefully and consider raising a bug...

> Exactly -- you shouldn't really be worrying about it; getting a
> related object is as simple as "my_event.get_person()". 99% of the
> time, you won't need to worry about IDs; they'll just sit there and
> look pretty.

Great. That's exactly the news I wanted to hear. :-)

> Yes, it is -- and your example is *much* better; I'm going to
> replace my example with yours when I've got a moment; thanks for the
> suggestion!

Great stuff - thanks!

> I hope this helps you understand Django's behavior here,

Yes, thanks. Apologies again for the duff examples. Many thanks for
the excellent feedback, and for getting me to think more carefully
about my schema...

Best,

-Andy

--
Andy Gimblett
Computer Science Department
University of Wales Swansea
http://www.cs.swan.ac.uk/~csandy/

Gwyn Evans

unread,
Jul 26, 2005, 4:22:38 AM7/26/05
to django...@googlegroups.com
Hi,
Doesn't the issue with using data as a primary key come when the
data needs to change? What about if the user changes their name for
some reason, e.g. marriage, sex-change, politial protest, etc, or if
the Maths Dept starts issuing BAs rather than BScs? :-)

I'm not saying that there's never going to be a place for
data-dependent primary keys, but it seems to me that there's enough
cases where there's a potential for change to justify independant
primary keys as the norm, at least.

The preference for it being an int/long is merely data storage
efficiency. Note that there may be a significant cost in the storage
requirements for the foreign keys of your composite-key table which
may outweigh the cost of the integer based primary key!

/Gwyn

Andy Gimblett

unread,
Jul 26, 2005, 4:44:21 AM7/26/05
to django...@googlegroups.com
> Doesn't the issue with using data as a primary key come when the
> data needs to change? What about if the user changes their name for
> some reason, e.g. marriage, sex-change, politial protest, etc, or if
> the Maths Dept starts issuing BAs rather than BScs? :-)

Sure - but isn't that what triggers and stored procedures are for? ;)

With regard to your examples, I already agreed that a person's name is
a poor choice of primary key, and if the Maths department starts
issuing BAs, well, that doesn't make any difference, does it? We just
add more degree scheme rows to the right table. "BA in Mathematics"
is a genuinely different thing from "BSc in Mathematics". Changing
the name there would be inappropriate.

However, I'm sure you can think of better examples where name change
is valid and then yes, life is a lot easier with independent primary
keys...

> I'm not saying that there's never going to be a place for
> data-dependent primary keys, but it seems to me that there's enough
> cases where there's a potential for change to justify independant
> primary keys as the norm, at least.

Yeah, perhaps I'm just getting too hung up on "scripture", and maybe
I'm out of touch with open source best practice. This is an
interesting debate, I must say.

OTOH I thought Django was "for perfectionists". ;-)

> The preference for it being an int/long is merely data storage
> efficiency. Note that there may be a significant cost in the
> storage requirements for the foreign keys of your composite-key
> table which may outweigh the cost of the integer based primary key!

Oh, sure. But my concern isn't really efficiency, storage, etc. but
rather comprehension. Of course, one can argue that triggers and
stored procedures don't necessarily aid comprehension. ;-)

All the best,

Jonathan

unread,
Jul 26, 2005, 9:48:50 AM7/26/05
to django...@googlegroups.com
I think the admin problem you mentioned may be the same as I have filed
in ticket 81:
http://code.djangoproject.com/ticket/81

Here I am looking a creating an online pilot logbook and wanted to
store the airfield information (currently just name). However all
airfields globally have a unique 4-letter ICAO identifying code which
would make sense to be the primary key. However as pasted into the
ticket, django doesn't like this - at least not in the admin interface.

Cheers,
Jonathan.

Jacob Kaplan-Moss

unread,
Jul 26, 2005, 10:16:46 AM7/26/05
to django...@googlegroups.com
Hey Andy --


> Sorry, but no. What you mean is, every table should have a primary
> key. While uniquely identifying each row, this isn't necessarily a
> "unique ID" in and of itself. In particular, there's certainly no
> preference for using an integer, and in most cases the primary key
> will turn out to be a compound key consisting of several fields from
> the table. In some cases, it might even be _all_ of the fields - but
> that's OK. The only case where an "artificial" integer ID is
> necessary is when there really isn't a candidate key - there's no
> subset of fields which uniquely identifies rows, so you have to add
> one yourself. _That's_ when you use an integer ID, but it should be
> the only time.
>

That's a really good point -- there's nothing particularly meaningful
about the use of an integer as a primary key. However, it can be
*really* hard to tell what's actually a dependent value in key's
clothing. Looking through every model used on our sites -- 213
classes -- I can't find a single case where I'd want to use anything
but the provided automatic integer unique ID.

(That said, there's one example in Django itself that uses a non-
integer unique ID; I'll let you dig it up if you're interested).


> (Apologies for lecturing... It's what I do. I assume you actually
> know this but I want to let you know that despite initial appearances,
> I do too.)
>

I'm also sorry if I come across as lecturing; it's hard to tell what
someone's experience level is, and my instinct is to assume that
people know jack about database design (this is an opinion formed
after years of maintaining other people's code) I suppose you could
say I have post-traumatic-database-maintenance stress disorder :)

Anyway, my point shouldn't be to lecture you on how to design tables;
I'm trying to explain to you why Django does what it does (including
our reasoning about best practices that led to that behavior).


>> Read up on data normalization if you like; you'll find that unique
>> IDs are pretty much agreed on as a best practice.
>>
>
> Really? I'd love to see some references there, because that's news to
> me. Instead, see above. I suspect it's one of those things that's
> become "best practice" (maybe particularly in the open source world?)
> because "everybody does it and it works and doesn't tend to bite you
> in the ass" - but really, it's entirely redundant a lot of the time.
>

You're right that I might have spoken too soon; I can't really find
much that explicitly says "integer keys". However, the "it doesn't
bite you in the ass" principle is, for me at least, synonymous with
best practices. I guess there's really two different worlds that
collide when it comes to database design: there's the CompSci set-
theory crowd who have all sorts of mathematical proofs for how and
why you should do this or not do that; and then there's the I'm-on-a-
deadline programmer who mostly cares about doing things fast and
minimizing future tearing-out-of-the-hair. I'm certainly in the
latter group -- I could probably fumble through a definition of what
3NF is, but that's about it -- and so I guess I speak more from
experience than from any solid conceptual grounding.

Besides, what do you really loose by having an "entirely redundant"
unique integer field in each database? Storage space, to be sure, so
if you're designing for systems where storage is measured in bytes,
perhaps it's an issue. However, our database -- over three million
rows in 276 tables -- consumes about 0.05% of our available disk
space, so I find it hard to care about the overhead of those integers.

Cleanliness might be an issue -- redundant rows are never
particularly elegant -- but I think that's offset by consistency.

Speed certainly isn't an issue; it's faster to join on integer keys
than on anything else.

So, *is* there a compelling reason not to just stick an integer
primary key in every table? To me it seems the benefits outweigh any
problems.


> A slightly better (and real) example might be this: departments at a
> university are uniquely identified by a string, eg 'COMPSCI' for
> Computer Science, 'MATHS' for Mathematics, etc. These really are
> unique: there will never be another department whose code is 'COMPSCI'
> (ignoring mergers with other universities!).
>
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

But, see, you've just made my point for me -- it's nearly impossible
to find an example where "meaningful" primary keys don't sometimes
change. For example, at a previous job we had a database that keyed
of a person's social security number, which we assumed would never
change. Wrong -- a customer had her identity stolen and the
government issued her a new SSN. It took three programmers nearly a
day to replace her SSN in all the places it appeared with the new one!

The point is, using "meaningless" unique IDs saves you from ever
having to deal with these corner cases. Again, Django's not going to
prevent you from shooting your future self in the foot; think of the
hoops to jump through to use non-integer primary keys as a trigger
lock. If you really know what you're doing, go ahead.


> Assuming we can agree that I do, despite initial appearance, know how
> to design databases, and that I'd like to avoid artificial unique IDs
> if possible, are compound keys (ie ones consisting of multiple fields)
> handled by Django OK? Or do I have to use artificial ID numbers
> there? If compound keys _aren't_ handled OK, then I'll probably just
> give up and let Django add all the redundant ID fields it wants to, to
> be honest. ;-)
>

OK, so let's bring this back to Django. I'm not really sure I fully
understand what a compound key is; it's basically where you say "a
row isn't uniquely identified by any single key, but instead by the
unique amalgam of these N fields," right?

If so, you're right in thinking that Django doesn't handle that case;
it's got mechanisms for ensuring composite uniqueness -- see
"unique_together" (http://www.djangoproject.com/documentation/
model_api/) -- but not for composite primary keys. Do you think this
is an important enough feature to add to Django's model API? I think
it would probably be most useful as a way of ensuring that Django can
successfully "wrap" existing schemas, which is certainly a notable
goal of ours.


>> I'm not sure what you mean by "breaking the admin"; more details
>> would be nice here -- there's good chance it's breaking because your
>> schema's too weird for Django. I think the admin should work with
>> badly designed schemas.
>>
>
> I'll look into it again more carefully and consider raising a bug...
>

Please do; as much as I believe that my way is the "right" way, I
don't want to force that belief onto others; Django should be able to
handle anything you throw at it.

You know, this is really fun -- keep it coming!

Jacob


xtian

unread,
Jul 26, 2005, 11:10:40 AM7/26/05
to django...@googlegroups.com
On 7/26/05, Jacob Kaplan-Moss <ja...@jacobian.org> wrote:
>
> ...
> So, *is* there a compelling reason not to just stick an integer
> primary key in every table? To me it seems the benefits outweigh any
> problems.
>

Not an argument against synthetic keys, but one possible reason not to
go with integer keys is that they will only be unique within the
current database. While that's generally all you'd need, and certainly
what I'd use most of the time, sometimes it can be handy to have GUIDs
as synthetic primary keys. If you're migrating entities from one
database to another (content management import/export is a context
I've seen it), it enables you to determine whether an item is new in
the destination, or an update to an existing entity (so you can do
some kind of conflict resolution/merge).

That said, GUIDs certainly take more space and are harder to generate
and use (especially debugging!) than integers. I'd normally only use
them for high-level entities that might need some kind of cross-system
reference.

>
> OK, so let's bring this back to Django. I'm not really sure I fully
> understand what a compound key is; it's basically where you say "a
> row isn't uniquely identified by any single key, but instead by the
> unique amalgam of these N fields," right?
>
> If so, you're right in thinking that Django doesn't handle that case;
> it's got mechanisms for ensuring composite uniqueness -- see
> "unique_together" (http://www.djangoproject.com/documentation/
> model_api/) -- but not for composite primary keys. Do you think this
> is an important enough feature to add to Django's model API? I think
> it would probably be most useful as a way of ensuring that Django can
> successfully "wrap" existing schemas, which is certainly a notable
> goal of ours.
>

The only situation where I might prefer a composite key to a single
synthetic one is in a many-to-many joining table - I think Django
handles these, but I'm not sure whether it adds the extra integer id
to the intermediate links.

However, if the relationship table had any extra attributes (say start
and end dates for a video rental), then I'd be inclined to give it
full entity status, and an integer id.

It's been interesting reading - thanks to both parties!

xtian

Arthur Lee

unread,
Jul 26, 2005, 2:02:12 PM7/26/05
to django...@googlegroups.com
Jacob Kaplan-Moss wrote:
A slightly better (and real) example might be this: departments at a
university are uniquely identified by a string, eg 'COMPSCI' for
Computer Science, 'MATHS' for Mathematics, etc.  These really are
unique: there will never be another department whose code is 'COMPSCI'
(ignoring mergers with other universities!).

    
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

But, see, you've just made my point for me -- it's nearly impossible  
to find an example where "meaningful" primary keys don't sometimes  
change.  For example, at a previous job we had a database that keyed  
of a person's social security number, which we assumed would never  
change.  Wrong -- a customer had her identity stolen and the  
government issued her a new SSN.  It took three programmers nearly a  
day to replace her SSN in all the places it appeared with the new one!

The point is, using "meaningless" unique IDs saves you from ever  
having to deal with these corner cases.  Again, Django's not going to  
prevent you from shooting your future self in the foot; think of the  
hoops to jump through to use non-integer primary keys as a trigger  
lock.  If you really know what you're doing, go ahead.

  
But this can't be true.  Using unique integers we actually fail in this scenario.  If we are to use integers assigned in sequence, usually starting from 0 or 1, and we were to use this for all databases, merging of two databases will almost guarantee to cause an id collision.  OTOH, had we used a meaningful string, it will probably had a lower chance of such collision.  Furthermore, I would like to point out that it is possible to easily change a primary key, provided the database had properly defined relationships.  Some (good) database management system will allow such changes to be cascaded if the  foreign keys are properly defined.

Arthur

gheorghe

unread,
Aug 2, 2005, 2:47:59 PM8/2/05
to Django users
Another scenario not uncomon where integer ID are not the best is when
2 tables from 2 location need to be merged into one table (ex. collect
data from 2 stores and maintain it centraly somewhere).

There are many cases where people are working closely to the database
like datawarehosing or if there is more then one application accessing
the database
some business rules need to better be programmed in the database
itself.
If I was in the situation of working closely to the database and most
of my relationships are bases on integers instead of some well
organised data structures, what would I do if I need to join, reason
abount values in agregation or apply businness rules based on values if
my most important data (data that sets the relationships and structure)
is just integers?

The problem with value changing is no problem if the database has all
the relationships set as needed so cascading updates/deletes happen
automaticaly at least when unig MSSQL server.

xtian

unread,
Aug 3, 2005, 3:32:48 AM8/3/05
to django...@googlegroups.com
On 8/2/05, gheorghe <gmi...@gmail.com> wrote:
>
> Another scenario not uncomon where integer ID are not the best is when
> 2 tables from 2 location need to be merged into one table (ex. collect
> data from 2 stores and maintain it centraly somewhere).
>
> There are many cases where people are working closely to the database
> like datawarehosing or if there is more then one application accessing
> the database
> some business rules need to better be programmed in the database
> itself.

This is the scenario I mentioned above - any kind of replication means
that ids should probably be GUIDs, to avoid clashes, and detect when
something should be an update rather than an insert.

I don't quite see what you mean about business rules needing keys that
are not synthetic, though. There's no reason you couldn't have
business rules that follow id-based foreign key relationships.

> If I was in the situation of working closely to the database and most
> of my relationships are bases on integers instead of some well
> organised data structures, what would I do if I need to join, reason
> abount values in agregation or apply businness rules based on values if
> my most important data (data that sets the relationships and structure)
> is just integers?

You can still do joins with synthetic ids.

>
> The problem with value changing is no problem if the database has all
> the relationships set as needed so cascading updates/deletes happen
> automaticaly at least when unig MSSQL server.

While the DBMS can automatically propagate the changed primary key to
all dependent tables, this can cause *huge* performance problems - it
may have to update millions of rows and associated index entries. I've
worked on systems where the volume of data something like this would
affect would require taking the production system offline, and need
DBA management of storage space.

If synthetic keys are used, this situation can be handled easily.

Cheers,
xtian

gheorghe

unread,
Aug 3, 2005, 10:39:21 AM8/3/05
to Django users
" this can cause *huge* performance problems"

True, but I believe that's why we usualy set up 2 databases one
transactinal, where we stay put on to much normalization, relatinships,
indexes etc., and query oriented databases where this performance may
no matter that much since all the database updates are "nightly bach
updaes" anyhow, everything else is only queries.

As far as "needing keys that are not synthetic" what I'm sayng is that
many times I apply rules on a key value I also join on that key while
if it was integer I'll have 2 different keys one to join and one to
condition, multiply this by a couple and you get insted of a 100 lines
query a 200 lines query, that's all i'm saying. Now probably this is
just a silly asumption and I probably only need more experience on
doing it that way to see if it 's actualy true or not.

Reply all
Reply to author
Forward
0 new messages