Composite Primary Keys

3,119 views
Skip to first unread message

David Cramer

unread,
Jul 27, 2008, 6:55:14 PM7/27/08
to Django developers
I'm to the point in a project where I *need* composite primary keys,
not using them would be a bit retarded and wasteful. So, I'm going to
write up the patch.

Here's how I was doing it before QS-RF:

- MyModel._meta.pk would return a tuple if it had multiple
- MyModel._meta.pks would always return a tuple
- pk='hello' would still work as it does now
- pk=('hello,') would work
- pk=dict(field='hello') would work (to allow for unordered pk lists)
- Composite primary keys would be created simply by passing
primary_key=True as an argument on more than one field in your model.

Anyone have any feedback, or started working on any code?

David Cramer

unread,
Jul 27, 2008, 6:59:30 PM7/27/08
to Django developers
To bring up a few more points I just noticed, that are on
http://code.djangoproject.com/wiki/MultipleColumnPrimaryKeys

# A number of things use (content_type_id, object_pk) tuples to refer
to some object -- look at the comment framework, or the admin log API.
Again, a composite PK system would need to somehow not break this.

I believe for this one there should be a PrimaryKeyField, or there
truly should be GenericForeignKey support. Expecting a charfield to
magically work here doesn't seem like a good idea. In summary, this
should break (in my opinion).

# Admin URLs; they're of the form "/app_label/module_name/pk/"; there
would need to be a way to map URLs to objects in the absence of a
primary key.

These could simply be /app_label/module_name/pk1;pk2;pk3/ -- Again,
the same problem exists that what if ";" is part of the primary key. I
don't have an answer. I see this more as an incremental change to the
framework vs a one-shot fix all.

Malcolm Tredinnick

unread,
Jul 27, 2008, 7:13:30 PM7/27/08
to django-d...@googlegroups.com

On Sun, 2008-07-27 at 15:55 -0700, David Cramer wrote:
> I'm to the point in a project where I *need* composite primary keys,
> not using them would be a bit retarded and wasteful. So, I'm going to
> write up the patch.
>
> Here's how I was doing it before QS-RF:
>
> - MyModel._meta.pk would return a tuple if it had multiple
> - MyModel._meta.pks would always return a tuple

It's be nice to do away with the need for the "pks" attribute, since it
provides more than one way to do something. Yes, it means that sometimes
you might need to test whether pk is a list or tuple, but that should be
relatively rare.

> - pk='hello' would still work as it does now
> - pk=('hello,') would work
> - pk=dict(field='hello') would work (to allow for unordered pk lists)
> - Composite primary keys would be created simply by passing
> primary_key=True as an argument on more than one field in your model.

This won't work nicely, since it doesn't allow you to specify the order
that will be used when assigning. Instead, for multi-column primary
keys, I'd suggest specifying them as an attribute on Meta:

class Meta:
primary_keys = ('username', 'location')

The order given there is the order used to specify them in assignments.
Having to reorder fields in a model to do that otherwise (if you only
used the primary_key attribute on the field) sets a kind of dangerous
precedence, since it says that might be an okay approach, except nothing
else can do it because ordering is then reserved for primary keys. We
already have a few issues that arise out of managers being treated
differently according to their order of declaration and it would be nice
to avoid that in this new feature.

> Anyone have any feedback, or started working on any code?

I've got some code that is going to land in the next week (before the
sprint, most likely) that adds most of the necessary support for
multicolumn fields. This is primarily to fix a few bugs with generic
relations in a non-specific way (it will work for all similar fields).
The stuff I'd done on multi-pk work -- which I'm not going to finish
before 1.0, because it's not that critical, so you're not duplicating
anything I'm doing there -- seemed to work nicely on top of that. So you
might want to keep an eye out for that. The main case where it's
applicable is when somebody does filter(foo__pk=(1, 2, 3)) because you
can no longer just replace "pk" with the name of the real attribute,
instead you have to treat it as multiple columns.

Regards,
Malcolm


David Cramer

unread,
Jul 27, 2008, 7:18:17 PM7/27/08
to django-d...@googlegroups.com
The Meta attribute sounds ok, my ordering concept was based on the order they are presented in the model definition. The one thing I don't like about unique/primary key's in this situation, is that there are two ways to declare them, and they differ based on if there's one or more than one. It's one of those things that has always bothered me, but if there's no one wanting to change unique_together and unique=bool, then I guess it makes sense to add primary_keys as a Meta argument.
--
David Cramer
Director of Technology
iBegin
http://www.ibegin.com/

David Cramer

unread,
Jul 28, 2008, 3:01:35 AM7/28/08
to Django developers
Here's an initial class which takes over Model._meta.pk:
http://www.pastethat.com/knsGJ

This maintains backwards compatibility from my tests (so far), but
someone may have a better approach than me.

I've got it working for syncing the database, and accessing the
primary keys. It also still works when doing pk=1 in the queries, etc.

I have to handle doing object.pk = N. We'll take the same approach as
the query logic (ordered list, dictionary, or single value for a non-
composite).

David Cramer

unread,
Jul 28, 2008, 4:37:38 AM7/28/08
to Django developers
Ok now for more fun problems...

fieldname = ForeignKey(MyModelWithAComposite)..


1) should db_column really mean db_column_prefix in this situation, or
should it be a tuple, ew?

2) should the composite keys be fieldname_pkfieldname, e.g.

class Hi:
char = CharField(primary_key=True)

class Model1:
key1 = ForeignKey(Hi)
key2 = IntField()

class Meta:
primary_keys = ('key1', 'key2')

class Model2:
fkey = ForeignKey(Model1)

Should model2 have a schema of:

id (INT)
fkey_key1_id (CHARFIELD)
fkey_key2 (INT)

Malcolm Tredinnick

unread,
Jul 29, 2008, 3:00:24 PM7/29/08
to django-d...@googlegroups.com

On Mon, 2008-07-28 at 01:37 -0700, David Cramer wrote:
> Ok now for more fun problems...
>
> fieldname = ForeignKey(MyModelWithAComposite)..
>
>
> 1) should db_column really mean db_column_prefix in this situation, or
> should it be a tuple, ew?

I don't see anything too wrong with a tuple. It's a multi-valued
attribute. Remember that this is going to be a relatively rare case, so
it's not like everybody's code will be scattered about with these
tuples. Explicit db_column setting is rare anyway, doubly so for a
composite primary key.

>
> 2) should the composite keys be fieldname_pkfieldname, e.g.
>
> class Hi:
> char = CharField(primary_key=True)
>
> class Model1:
> key1 = ForeignKey(Hi)
> key2 = IntField()
>
> class Meta:
> primary_keys = ('key1', 'key2')

By the way, I made a typo earlier. This attribute should really be a
singular noun. You only have one primary key on a model (by definition).
It has multiple columns in this case, but it's only a single key.

>
> class Model2:
> fkey = ForeignKey(Model1)
>
> Should model2 have a schema of:
>
> id (INT)
> fkey_key1_id (CHARFIELD)
> fkey_key2 (INT)

It has to, doesn't it? What's the alternative you're thinking of.

At this early point in the process, it's probably worth just making it
work with something reasonable like this and not worrying about it being
perfect first time. At that point, we can then do a more detailed review
and some things might seem better another way. As with the other big
design things people are proposing, the timing's a little unfortunate at
the moment just because all my spare cycles and then some are going
towards the 1.0 release, so I can look at things quickly (and I've
thought a lot about this particular problem in the past), but I'm not
going to be able to spend a few hours thinking about the tricky corners
for a while.

Nothing you've proposed so far looks insane, so I'd be tempted to run
with it and see what you can build.

Regards,
Malcolm


David Cramer

unread,
Jul 29, 2008, 3:39:52 PM7/29/08
to django-d...@googlegroups.com
Sorry, question #2 was more in reference to column names as the defaults.

Right now I have working code for accessing .pk, .pks, and .primary_key. They all reference the same CompositePrimaryKey instance. It also correctly adjusts primary keys so that it syncs them properly. Setting and getting the primary key in all situations throughout my code, and in the core seems to be working. There's a bit of an uglish hack making this work, as I had to do it in two stages since _meta.fields isn't populated yet in contribute_to_class, but _meta.primary_key cannot be set because it references the CompositePrimaryKey.

Right now I'm working on the RelatedField support. I'm at a design hurdle where I'm trying to take the best approach to handling multiple fields in one field. What I've done for my own stuff, is simple throw in add_to_class methods with the correct field, but in this situation that's not a great solution because we end up with extra fields on the class which aren't that meaningful.

Have any suggestions for what you'd like to see happen to handle this? I'd prefer we not have to tweak a lot of the Field internals just to support composites :)

Rock

unread,
Aug 27, 2008, 6:23:55 PM8/27/08
to Django developers
Any progress on this patch David? I would be happy to take a look at
whatever you have and perhaps help out with completing the patch.

David Cramer

unread,
Aug 27, 2008, 6:27:55 PM8/27/08
to django-d...@googlegroups.com
Really I'm stuck at an architectural point.

I have database validation and synchronization done, and the admin is working.

What is left is more or less handling relatedfield lookups. The issue is, that field's are designed to reference more than one field, so it's a tough design deicision to make on how that should be approached.


On Wed, Aug 27, 2008 at 5:23 PM, Rock <ro...@rockhoward.com> wrote:

Any progress on this patch David? I would be happy to take a look at
whatever you have and perhaps help out with completing the patch.





Alberto García Hierro

unread,
Aug 27, 2008, 7:15:16 PM8/27/08
to django-d...@googlegroups.com

El 28/08/2008, a las 0:27, David Cramer escribió:

> Really I'm stuck at an architectural point.
>
> I have database validation and synchronization done, and the admin
> is working.
>
> What is left is more or less handling relatedfield lookups. The
> issue is, that field's are designed to reference more than one
> field, so it's a tough design deicision to make on how that should
> be approached.

I think the best (and the only one right) solution involves adding
multicolumn fields to Django and doing lookups with some syntax like
Model.objects.get(pk=('foo', 1)). There are other hackish approaches,
like using hash(tuple(pk[0], pk[1], ..., pk[n])) as foreign key, that
could work. However, I won't rely on them, since I'm not sure if
hash() implementation is guaranteed to be kept as is.

On other related point, what's the status of multicolumn fields?

Regards,
Alberto

David Cramer

unread,
Aug 27, 2008, 7:17:42 PM8/27/08
to django-d...@googlegroups.com
What I had briefly discussed with malcom was using ordered tuples but switching up the defaults to use actualy field lookups.

MyModel.objects.get(pk=(1, 2)) or MyModel.objects.get(foo=1, bar=2)

If we could come up with some design for multi-column fields I'm wiling to put in the work.

Rock

unread,
Aug 27, 2008, 7:27:12 PM8/27/08
to Django developers

Well for one thing, if one of the columns happens to be named "ID", we
should use that for the relatedfields lookup column and that is that.
(BTW, does your approach allow the Django supplied ID field to be
combined with some other field(s) to make a multi-column key? This
would be bang up for future partitioning support.)

Next I would suggest adding a meta model column designation like
"id_field" to specify a field to use for related classes. This might
be a good "80/20" solution that could serve for an initial test
version.

Rock

unread,
Aug 28, 2008, 3:04:16 PM8/28/08
to Django developers
Ha! It turns out that a to_field option already exists for ForeignKey.
(I did not know that yesterday.) I have just verified that
to_field(SomeClass,"id")
works fine even if the PRIMARY KEY uses multiple columns. However, and
this
is the key point, the id field has to be marked as UNIQUE.

To prove all of this I have created a sqlresetpartition and
resetpartition command
that looks for classes to partition within your settings. It sets up
the required
multi-column primary key, marks the ID field as UNIQUE, and also does
the other
magic to create partitioned tables. (It also creates partitioned
indexes as required.)

This only works for Oracle and the solution is special cased for my
needs, but it
points the way forward. I will look over the partitioning logic for
MySQL when I get
a chance. Maybe by Django 1.5 or so we can include direct support for
partitioned models.

BTW, once I have the models properly created with partitions, none of
my other
django code requires any changes at all. Besides improved performance,
I must
say that it is a thrill to delete hundreds of thousands of old rows in
less than a
second. This is the key feature I needed for my django app.
> > design deicision to make on how that should be approached.- Hide quoted text -
>
> - Show quoted text -

Rock

unread,
Aug 28, 2008, 6:36:26 PM8/28/08
to Django developers
To be clear, the syntax is:

myfkey = models.ForeignKey(SomeClass,to_field="id")

David Cramer

unread,
Aug 28, 2008, 9:05:43 PM8/28/08
to django-d...@googlegroups.com
I'm not quite sure how that relates to Composite Primary Keys?

A ForeignKey would point to multiple internal fields, but it should look like it's a single field. At the same time, this would open up the possibility for Composite Foreign Keys, which would mean it could point to multiple public fields.


On Thu, Aug 28, 2008 at 5:36 PM, Rock <ro...@rockhoward.com> wrote:

To be clear, the syntax is:

myfkey = models.ForeignKey(SomeClass,to_field="id")






Rock

unread,
Sep 2, 2008, 3:44:01 PM9/2/08
to Django developers

One use case for Composite Primary Keys is for setting up database
partitions. In my case I am using Range-Hash partitions with the range
determined by an IntegerField called "ISOweek" and the hash working
off of the "id" field supplied by Django. To allow this partitioning
to work, the primary key must be a composite primary key incorporating
the "ISOweek" and the "id" fields. My versions of the sqlreset and
reset management functions do this while also ensuring that "id" is
marked as unique even though it is not the primary key. This allows a
ForeignKey pointed at my partitioned model to work correctly by
setting "id" as the to_field. (If "id" is not set as unique, Django
and/or the database will fail in its' attempt to set up the full
foreign key relationship.)

The initial version of Composite Primary Keys should not preclude this
scenario, however full support for setting up and managing partitioned
models need not be included at this time. (I plan to help add that
later.) The interesting point is that support for related fields for
the Composite Primary Key is not required in order to support this
particular use case.

Rock

David Cramer

unread,
Sep 3, 2008, 1:02:17 AM9/3/08
to Django developers
For anyone who's interested, it'd be great to meetup at DjangoCon to
go over a good design approach to composite fields.

Malcolm Tredinnick

unread,
Sep 3, 2008, 12:08:55 PM9/3/08
to django-d...@googlegroups.com

On Tue, 2008-09-02 at 22:02 -0700, David Cramer wrote:
> For anyone who's interested, it'd be great to meetup at DjangoCon to
> go over a good design approach to composite fields.

Take notes. There's going to be a lot going on at DjangoCon (including
celebrating), so there will be a group who are putting off thinking
about this until quieten down who'll likely be busy throughout the
weekend. In a few weeks we can start looking at this more concretely on
the list and working out how it might fit into the whole.

Regards,
Malcolm

David Cramer

unread,
Sep 4, 2008, 2:22:14 AM9/4/08
to Django developers
This is one of those things that I really need to get hammered out
(for our platform). I'm willing to do all the work, but I need a
design around it to where the patch won't get rejected :)

On Sep 3, 11:08 am, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:

Malcolm Tredinnick

unread,
Sep 4, 2008, 2:39:52 AM9/4/08
to django-d...@googlegroups.com

On Wed, 2008-09-03 at 23:22 -0700, David Cramer wrote:
> This is one of those things that I really need to get hammered out
> (for our platform). I'm willing to do all the work, but I need a
> design around it to where the patch won't get rejected :)

No question. And for that, we have this very handy mailing list with
archives that will store all the discussion. Meanwhile, this weekend, we
have a two day conference with a very full schedule, no scheduled
downtime and a limited attendance group. This is all at the end of a
long process of nothing but dawn to dusk bug fixing for a group of us,
so some of us will be relaxing. So by all means have a conversation with
some people, that's a given. But you'll still need to bring the results
to the mailing list in a coherent format so that it can be looked at
when we start to consider the patch.

This is a serious proposal. It's being taken seriously. But realise the
broader things going on as well. If you're hoping/expecting to get this
resolved in a couple of days, you are, frankly, optimistic.

Regards,
Malcolm


David Cramer

unread,
Sep 16, 2008, 2:53:49 AM9/16/08
to Django developers
So I've managed to handle the pk alias with a sketchy
CompositePrimaryKey object (which I want to replace), and I just
finished support for pk aliases (properly) in query.py (filters).

The last thing on the list, is composite fields, which would replace
the CompositePrimaryKey class. I've started a new discussion and this
is one change I won't make the decision on, as I have no real opinion
on how it's implemented.

http://groups.google.com/group/django-developers/browse_thread/thread/32f861c8bd5366a5

On Sep 4, 1:39 am, Malcolm Tredinnick <malc...@pointy-stick.com>
wrote:

Eric

unread,
Oct 4, 2008, 1:03:58 PM10/4/08
to Django developers
Hi,
i just discover this thread, I am working on this problem; you may
take a look at
http://kenai.com/projects/django-trac/pages/LegacyModule

legacy is a module in my "django hacks trac" (or djac) project; it
aims to deal with
tables with no primary key or with composite pk. It provides 2
methods:

- use of oid field (works on sqlite, oracle, postgres <= 8)
- composite pk (for mysql that provides no oid field)

cheers,
Eric

David Cramer

unread,
Oct 4, 2008, 1:06:08 PM10/4/08
to django-d...@googlegroups.com
What we hope to achieve here is full support within the Django core, as opposed to specifying some kind of model. The only thing I have left to do is implement composite fields, but there's still no API for it.

Joey Wilhelm

unread,
Oct 30, 2008, 4:40:00 PM10/30/08
to Django developers
David,

What is the current status of this patch? I'm starting up a new
project which pretty much desperately needs this support as well. I
could work around it, but the thought of adding AutoFields to all of
these models which really -do not- need them, makes me a bit ill.

I would be more than willing to help test your implementation if there
is anything usable yet. This is one of the pieces that's getting me
all twitchy waiting for it.

David Cramer

unread,
Oct 30, 2008, 4:46:09 PM10/30/08
to django-d...@googlegroups.com
It allows you to use them, automatically creates them, and has some of the admin handling done. However, there's still no API design around multi-column fields (no one seems to want to talk about it) so I'm pretty much stopped working on it.

e.g. You can't say field1 = this, field2 = that, and then say compositekey = field1,field2 you instead are forced to do key1=blah, key2=blah in all your lookups, and no easy foreignkey properties.

I'm running this on production environments, so it works fine, but I can up SVN and fix any conflicts and post a patch again.

Joey Wilhelm

unread,
Oct 30, 2008, 4:57:10 PM10/30/08
to django-d...@googlegroups.com
That would be great. The project I am working on now won't be doing anything too terribly complex just yet; I mainly need the admin support to make my life a little easier.

As to the API, I saw several proposals earlier along on this thread, but obviously nothing solid. Did anything ever come from DjangoCon on this topic? What issues still need to be addressed in this design?

On Thu, Oct 30, 2008 at 13:46, David Cramer <dcr...@gmail.com> wrote:
It allows you to use them, automatically creates them, and has some of the admin handling done. However, there's still no API design around multi-column fields (no one seems to want to talk about it) so I'm pretty much stopped working on it.

e.g. You can't say field1 = this, field2 = that, and then say compositekey = field1,field2 you instead are forced to do key1=blah, key2=blah in all your lookups, and no easy foreignkey properties.

I'm running this on production environments, so it works fine, but I can up SVN and fix any conflicts and post a patch again.

David Cramer

unread,
Oct 30, 2008, 5:12:18 PM10/30/08
to django-d...@googlegroups.com
Well GenericRelation's or whatever are an attempt to provide some composite space, but we need an expandable space, not something tied to content type's. We need to say mykey = CompositeField(field, field, field) or something.

No one wanted to discuss it at DjangoCon, so no.
Reply all
Reply to author
Forward
0 new messages