Composite primary keys

Michal Petrucha

Mar 14, 2011, 9:14:42 PM
to django-d...@googlegroups.com
Good evening (or whatever it is in everyone's timezone).

I'm an undergrad computer science student at the Faculty of
Mathematics, Physics and Informatics, Comenius University,
Bratislava, Slovakia and I'm willing to participate in this year's
GSoC. I'm interested in fixing the six-year-old open ticket in Trac
concerning the subject, http://code.djangoproject.com/ticket/373

Before I dig deeper into the issue I would like to know whether there
is interest in having this fixed and whether it is worth a full GSoC
project. Also, is there already any work regarding this issue
underway? If so, would it be reasonable for me to go on with this
project?

Anyway, if I'm to take on the task, there are quite a few design
considerations to be taken care of.

For starters, I've read through David Cramer's work on this from
two or three years ago. I'd stick to the API skeleton that was agreed on
back then; however, there remain lots of other unresolved questions.

The following list is in no way meant to be exhaustive; it's just a
list of things that came to my mind during the past hour. In fact, I'd
appreciate hearing about any other issues that should be kept in mind
which I forgot to mention.

- The composite primary key would be specified as a tuple of strings
in a primary_key attribute inside a model's Meta class instead of
having a field with primary_key=True.

- The pk property of model instances would be a tuple instead of a
single value for composite key models.

- The admin could reference composite keys using some kind of smart
string escaping, for example escaping the , (comma) and using it as
a delimiter.

- This could maybe even be used in generic relations. Is it
reasonable to support this in generic relations? The following post
suggests it might not be the best idea:
http://groups.google.com/group/django-developers/msg/dea0e360c6cd37a6

- The managers and querysets would have to be updated to handle
composite primary keys correctly.

- Consequently, support would need to be added to the SQL
compiler.

- The same holds for syncdb; inspectdb support would also be nice.

- ForeignKeys would have to be backed by multiple database columns.
How should they be named by default? How should their names be
overridden? Should db_column expect a tuple of strings? Should there
be another db_column_prefix option to prefix the names with a
common string?

- What about the ForeignKey field's attname? Should it be a tuple of
names or should it be a single string pointing to a tuple of
attributes?

- How should a person trying to add a ForeignKey field pointing to
its own model into the primary_key be punished?

- Does it make sense to make a subset of columns created by a
ForeignKey part of a primary key?

- The forms framework would need a way to pass composite ForeignKeys
as parameters.

- What about OneToOne?

Some items in this list are my ideas on how to implement something,
some are things already decided during previous attempts, others are
questions which I believe are not entirely up to me to decide on. I'll
be grateful for any comments on any of these points and as I said
earlier, any other considerations related to the topic.

Also, these would somehow have to be split into two parts: those
to be focused on from the perspective of this project and those to be
postponed to a later stage or of little relevance.

Thanks in advance for any feedback.

Michal Petrucha

Andrew Godwin

Mar 15, 2011, 11:11:43 AM
to django-d...@googlegroups.com
On 14/03/11 21:14, Michal Petrucha wrote:
> Good evening (or whatever it is in everyone's timezone).
>
> I'm an undergrad computer science student at the Faculty of
> Mathematics, Physics and Informatics, Comenius University,
> Bratislava, Slovakia and I'm willing to participate in this year's
> GSoC. I'm interested in fixing the six-year-old open ticket in Trac
> concerning the subject, http://code.djangoproject.com/ticket/373

Firstly, thanks for proposing a GSoC project - it's always nice to have
more students. Bear in mind that the list of accepted organisations
hasn't been published yet, but we're pretty hopeful Django will make it
in again this year.

> Before I dig deeper into the issue I would like to know whether there
> is interest in having this fixed and whether it is worth a full GSoC
> project. Also, is there already any work regarding this issue
> underway? If so, would it be reasonable for me to go on with this
> project?
>
> Anyway, if I'm to take on the task, there are quite a few design
> considerations to be taken care of.
>
> For starters, I've read through David Cramer's work on this from
> two-three years ago. I'd stick to the API skeleton that was agreed on
> back then, however, there remain lots of other unresolved questions.
>
> The following list is in no way meant to be exhaustive, it's just a
> list of things that came to my mind during the past hour. In fact, I'd
> appreciate other issues that would need to be kept in mind I forgot to
> mention.
>
> - The composite primary key would be specified as a tuple of strings
> in a primary_key attribute inside a model's Meta class instead of
> having a field with primary_key=True.

That seems to match with our current handling of things like unique, so
that seems fairly reasonable.

> - The pk property of model instances would be a tuple instead of a
> single value for composite key models.

So the property will vary type based on the model? Obviously this is
needed for backwards compatibility, but it's still going to affect e.g.
library authors, so we'd need to make sure everything that previously
accepted a PK also accepts a tuple.

> - The admin could reference composite keys using some kind of smart
> string escaping, for example escaping the , (comma) and using it as
> a delimiter.

That should work, though it might be a bit ugly; again, I can't think of
a better alternative.
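A minimal sketch of the comma-delimited scheme, assuming a backslash as the escape character. The helper names `quote_composite_pk` and `unquote_composite_pk` are hypothetical, not existing admin functions. Values come back as strings, so the admin would still have to convert each one with the corresponding field.

```python
def quote_composite_pk(values):
    """Join key values into one string; ',' delimits, '\\' escapes."""
    parts = []
    for value in values:
        text = str(value)
        # Escape the escape character first, then the delimiter.
        text = text.replace("\\", "\\\\").replace(",", "\\,")
        parts.append(text)
    return ",".join(parts)


def unquote_composite_pk(text):
    """Split a string produced by quote_composite_pk back into its parts."""
    parts, current = [], []
    chars = iter(text)
    for ch in chars:
        if ch == "\\":
            # The next character is taken literally (',' or '\\').
            current.append(next(chars, ""))
        elif ch == ",":
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return tuple(parts)
```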

> - This could maybe even be used in generic relations. Is it
> reasonable to support this in generic relations? The following post
> suggests it might not be the best idea:
> http://groups.google.com/group/django-developers/msg/dea0e360c6cd37a6

Generic relations are ugly and inefficient as it is; I'm not sure that
including them in something like this is strictly necessary, much as
Malcolm suggests, and probably outside the scope of a GSoC project
(something of this scale is already touching a lot of places in the
model layer as it is).

> - The managers and querysets would have to be updated to handle
> composite primary keys correctly.
>
> - Consequently, there would need to be added support in the SQL
> compiler.
>
> - The same holds for syncdb, inspectdb would be also nice.

I'd say inspectdb isn't strictly necessary - not very many people use
it. I'll also be looking at potentially merging in some schema alteration
code (i.e. moving parts of the South database backends) over the next
few months, so it's going to affect that, too, though I'm more than
happy to help fix that part (the existing creation code will definitely
need changing, at least).

> - ForeignKeys would have to be backed by multiple database columns.
> How should they be named by default? How should their names be
> overridden? Should db_column expect a tuple of strings? Should there
> be another db_column_prefix option to prefix the names with a
> common string?

These are some very good questions, and something we need a good
discussion about here; I'd personally say doing "fkname_remotecolname"
is best for the columns (which then means "author_id" still matches most
DBs, and you could also have something like "passport_country_id,
passport_number").

I'm also -0 on the idea of a db_column_prefix, but db_column is going to
need to take tuples/sequences for multi-column FKs.
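A sketch of the "fkname_remotecolname" default, assuming db_column accepts either a single string or a sequence with one name per referenced column. `default_fk_columns` is a hypothetical helper, not part of Django.

```python
def default_fk_columns(fk_name, remote_columns, db_column=None):
    """Return local column names for a (possibly multi-column) ForeignKey.

    fk_name: name of the ForeignKey attribute, e.g. "passport".
    remote_columns: column names of the referenced primary key.
    db_column: optional override; one name per referenced column.
    """
    if db_column is not None:
        if isinstance(db_column, str):
            db_column = (db_column,)
        if len(db_column) != len(remote_columns):
            raise ValueError("db_column must name %d columns, got %d"
                             % (len(remote_columns), len(db_column)))
        return tuple(db_column)
    # Default: "<fkname>_<remotecolname>" for every referenced column.
    return tuple("%s_%s" % (fk_name, col) for col in remote_columns)
```

With a single-column key this reproduces today's "author_id" naming, which is the backwards-compatibility argument made above.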

> - What about the ForeignKey field's attname? Should it be a tuple of
> names or should it be a single string pointing to a tuple of
> attributes?

I'd say a single string pointing to a tuple - I don't like fields which
'magically' make more attributes than the name they were assigned. That
also makes sense if you think of the foreign key as a single entity
composed of several values.

> - How should a person trying to add a ForeignKey field pointing to
> its own model into the primary_key be punished?

Preferably with a model validation error.
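One way such a check could look, written against plain (field name, related model name) pairs rather than real Field objects. The function name and the message wording are hypothetical.

```python
def primary_key_errors(model_name, pk_fields):
    """Return validation messages for a composite primary key.

    pk_fields: sequence of (field_name, related_model_name) pairs, where
    related_model_name is None for non-relational fields.
    """
    errors = []
    for field_name, related in pk_fields:
        # Flag any key component that points back at the model itself.
        if related == model_name:
            errors.append(
                "%s.%s: a ForeignKey to its own model cannot be part of "
                "the primary key." % (model_name, field_name))
    return errors
```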

> - Does it make sense to make a subset of columns created by a
> ForeignKey part of a primary key?

Provided they're then mixed in with at least one other column, yes.
However, I'm not sure this is strictly necessary for us to cover.

> - The forms framework would need a way to pass composite ForeignKeys
> as parameters.

Yes, that's going to be interesting. I'd love to see a more concrete
example of a proposed API here, though I expect it to involve tuples,
and some form of escaping for returning the values via POST.

> - What about OneToOne?

OneToOne fields will need to have the same capabilities, though they
share nearly all their code with ForeignKeys. A related issue is going
to be making sure model inheritance works properly - that uses
OneToOneFields between non-abstract parents and their children.

> Some items in this list are my ideas on how to implement something,
> some are things already decided during previous attempts, others are
> questions which I believe are not entirely up to me to decide on. I'll
> be grateful for any comments on any of these points and as I said
> earlier, any other considerations related to the topic.
>
> Also, these would somehow have to be split into two parts: those
> to be focused on from the perspective of this project and those to be
> postponed to a later stage or with little relevance.

Yes; I'd say that there's plenty of work here for a GSoC project, so I'd
consider limiting the scope of a first phase to make sure things work
alright (there's a lot of work even without ForeignKeys and model
inheritance - you have to touch queries, creation, model forms, the
admin, and so on).

I'd recommend taking a good look through the model layer code if you
haven't already and getting an idea of what sort of changes need to be
made; this kind of feature is going to be heavy on integration and
changes to pretty core code, so we'd want to make sure you kept
up to date with trunk reasonably well; otherwise it can be easy
to end up with an unmergeable branch.

Finally, a warning that to attempt this you'll need to have a decent
knowledge of SQL, the various database backends, and at least the usage
of the Django model layer - this is an area where several have tried and
not got very far, so it takes a little bit of determination and a
willingness to delve into some of the more complex parts of Django's
codebase.

Andrew

Russell Keith-Magee

Mar 15, 2011, 8:06:36 PM
to django-d...@googlegroups.com
On Tue, Mar 15, 2011 at 11:11 PM, Andrew Godwin <and...@aeracode.org> wrote:
> On 14/03/11 21:14, Michal Petrucha wrote:
>>
>> Good evening (or whatever it is in everyone's timezone).
>>
>> I'm an undergrad computer science student at the Faculty of
>> Mathematics, Physics and Informatics, Comenius University,
>> Bratislava, Slovakia and I'm willing to participate in this year's
>> GSoC. I'm interested in fixing the six-year-old open ticket in Trac
>> concerning the subject, http://code.djangoproject.com/ticket/373
>
> Firstly, thanks for proposing a GSoC project - it's always nice to have more
> students. Bear in mind that the list of accepted organisations hasn't been
> published yet, but we're pretty hopeful Django will make it in again this
> year.

I'd like to second what Andrew has said. This is definitely a project
with enough meat to fill a GSoC, and it's something that has been on
Django's wish list for a long time.

I also agree with Andrew's answers to your questions, with one exception:

>>  - The composite primary key would be specified as a tuple of strings
>>    in a primary_key attribute inside a model's Meta class instead of
>>    having a field with primary_key=True.
>
> That seems to match with our current handling of things like unique, so that
> seems fairly reasonable.

The difference between primary_key and unique is that you can have
multiple unique conditions, including a number of individual field
uniques *plus* a series of grouped uniques. However you can only have
one primary key field (or field set).

I'd like to suggest an alternate representation.

At present, you can make any field a primary key by setting
primary_key=True on that field. The Meta class then validates to
ensure that there is only one field that has that attribute set (and
inserts an AutoField if no field has it set). Another way to represent
primary keys would be to relax this validation constraint. If you mark
no field as primary_key=True, then an AutoField is added. If you mark
one field, then that field is the primary key. And if you mark
multiple fields, then you have a composite primary key composed of
those fields.
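The relaxed rule can be sketched with a hypothetical `FieldSpec` stand-in for real model fields. Keeping marked fields in declaration order is an assumption of this sketch, not something the proposal fixes.

```python
from dataclasses import dataclass


@dataclass
class FieldSpec:
    # Hypothetical stand-in for a model field.
    name: str
    primary_key: bool = False


def resolve_primary_key(fields):
    """None marked -> implicit AutoField "id"; one -> that field;
    several -> composite key of the marked fields, in declaration order."""
    marked = [f for f in fields if f.primary_key]
    if not marked:
        return ("id",), "auto"
    if len(marked) == 1:
        return (marked[0].name,), "single"
    return tuple(f.name for f in marked), "composite"
```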

This removes the need to introduce a new Meta flag, but more
importantly, it means you won't have to resolve discrepancies between
fields with primary_key and a Meta.primary_key option.

Yours,
Russ Magee %-)

Christophe Pettus

Mar 15, 2011, 8:14:39 PM
to django-d...@googlegroups.com

On Mar 15, 2011, at 5:06 PM, Russell Keith-Magee wrote:

> And if you mark
> multiple fields, then you have a composite primary key composed of
> those fields.

A concern here is that composite indexes, like unique, are sensitive to the ordering of the fields, which means that the ordering of the fields in the class declaration becomes important. That could, potentially, be surprising.

--
-- Christophe Pettus
x...@thebuild.com

Javier Guerra Giraldez

Mar 15, 2011, 10:15:29 PM
to django-d...@googlegroups.com, Christophe Pettus
On Tue, Mar 15, 2011 at 7:14 PM, Christophe Pettus <x...@thebuild.com> wrote:
> A concern here is that composite indexes, like unique, are sensitive to the ordering of the fields, which means that the ordering of the fields in the class declaration becomes important.

a simplistic proposal:

the order of the fields on a composite index is determined by the
exact value given to the primary_key argument.

that way, just setting primary_key=True on a few fields won't
guarantee order, but something like:

class City(models.Model):
    country = models.CharField(max_length=2, primary_key=1)
    state = models.CharField(max_length=2, primary_key=2)
    city = models.CharField(max_length=3, primary_key=3)
    name = models.CharField(max_length=20)

would set the (country,state,city) primary key in the obvious order.

in short: giving any non-falsy value to primary_key would add the
field to the key, the exact value would determine ordering.


--
Javier

Michal Petrucha

Mar 16, 2011, 1:49:17 AM
to django-d...@googlegroups.com
On Tue, Mar 15, 2011 at 09:15:29PM -0500, Javier Guerra Giraldez wrote:
> On Tue, Mar 15, 2011 at 7:14 PM, Christophe Pettus <x...@thebuild.com> wrote:
> > A concern here is that composite indexes, like unique, are
> > sensitive to the ordering of the fields, which means that the
> > ordering of the fields in the class declaration becomes important.

This is the same reason a new Meta flag has been agreed upon in the
past. (That, however, does not mean it has to be that way.)

> a simplistic proposal:
>
> the order of the fields on a composite index is determined by the
> exact value given to the primary_key argument.
>
> that way, just setting primary_key=True on a few fields won't
> guarantee order, but something like:
>
> class City(models.Model):
>     country = models.CharField(max_length=2, primary_key=1)
>     state = models.CharField(max_length=2, primary_key=2)
>     city = models.CharField(max_length=3, primary_key=3)
>     name = models.CharField(max_length=20)
>
> would set the (country,state,city) primary key in the obvious order.
>
> in short: giving any non-falsy value to primary_key would add the
> field to the key, the exact value would determine ordering.

I like this proposal. It might even be easier to implement than
fiddling with new Meta flags and everything.

One minor detail, though: just setting primary_key=True on multiple
fields would still have to guarantee some ordering, since the primary
key would be represented by a tuple. If you don't know for sure in
which order the values are, you can't really use the .pk property.

This would require a thorough explanation in the docs: if you
supply values that compare equal to primary_key, the order of fields
will be the same as in the model definition.
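Javier's ordering rule plus the tie-breaking rule can be sketched with a stable sort: falsy values are skipped, the rest are ordered by value, and fields with equal values keep their declaration order. In Python True == 1, so primary_key=True ties with primary_key=1. The function works on plain (name, value) pairs and is hypothetical.

```python
def composite_key_order(fields):
    """fields: (name, primary_key_value) pairs in declaration order."""
    # Any non-falsy value puts the field into the key.
    marked = [(name, value) for name, value in fields if value]
    # sorted() is stable, so equal values keep declaration order.
    ordered = sorted(marked, key=lambda item: item[1])
    return tuple(name for name, _ in ordered)
```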

Michal Petrucha

Yishai Beeri

Mar 16, 2011, 2:58:57 AM
to django-d...@googlegroups.com

This feels like something that wants a named-tuple (or a full blown dict).
Alternatively, provide a method on the model class that takes the
name=value arguments (queryset style) and returns the right pk tuple.
Otherwise, the exact ordering of the fields in the pk tuple becomes yet
another implicit(!) part of the model's contract - and any code that wants
to use this model will be that much more brittle.
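A sketch of the method-based alternative, assuming the model declares its key fields as a class attribute (`pk_fields`, `PK` and `make_pk` are hypothetical names). Building the key by keyword means callers never depend on positional order, and the result is a namedtuple, so it still compares equal to a plain tuple.

```python
from collections import namedtuple


class City:
    # Hypothetical declaration of the composite key, in key order.
    pk_fields = ("country", "state", "city")
    PK = namedtuple("CityPK", pk_fields)

    @classmethod
    def make_pk(cls, **values):
        """Build the pk from name=value pairs, queryset style.

        namedtuple raises TypeError for missing or unexpected names.
        """
        return cls.PK(**values)
```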

Johannes Dollinger

Mar 16, 2011, 5:24:12 AM
to django-d...@googlegroups.com
It would be nice if support for composite primary keys would be implemented as a special case of general composite fields. There would be no need for new Meta options:

class Foo(Model):
    x = models.FloatField()
    y = models.FloatField()
    coords = models.CompositeField((x, y), db_index=True)
    a = models.ForeignKey(A)
    b = models.ForeignKey(B)
    pair = models.CompositeField((a, b), primary_key=True)

A CompositeField descriptor would then return a namedtuple of its values and would support queries:

filter(coords__x=42)
filter(coords=(1,2))

Adding the individual fields may be optional, e.g., CompositeField((FloatField(), FloatField()), db_index=True).

This has been proposed before: http://groups.google.com/group/django-developers/browse_thread/thread/32f861c8bd5366a5
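How such queries might be translated can be sketched as a rewrite from lookups on the composite field into ANDed lookups on its component columns. The `composites` mapping and the function are hypothetical; a real implementation would live in the query machinery.

```python
def expand_composite_lookup(lookup, value, composites):
    """Rewrite a lookup on a composite field into component lookups.

    composites: composite name -> tuple of component names,
    e.g. {"coords": ("x", "y")}. Returns lookups to be ANDed together.
    """
    name, _, rest = lookup.partition("__")
    if name not in composites:
        return {lookup: value}
    components = composites[name]
    if rest:
        # coords__x=42 -> x=42; coords__x__gt=0 -> x__gt=0
        component, sep, tail = rest.partition("__")
        if component in components:
            return {component + sep + tail: value}
        raise ValueError("unsupported lookup on composite field: %s" % lookup)
    # coords=(1, 2) -> x=1 AND y=2
    if len(value) != len(components):
        raise ValueError("expected %d values for %s" % (len(components), name))
    return dict(zip(components, value))
```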

__
Johannes

Christophe Pettus

Mar 16, 2011, 11:43:00 AM
to django-d...@googlegroups.com

On Mar 16, 2011, at 2:24 AM, Johannes Dollinger wrote:

> I would be nice if support for composite primary keys would be implemented as a special case of general composite fields.

It's appealing, but the reality is that no existing back-end actually has such an animal as a composite field. In all of these cases, what we're really creating is a composite index on a set of standard fields. Introducing a more powerful index-creation syntax into Django isn't a bad idea, but we shouldn't call it a "field" if it is not.

Carl Meyer

Mar 16, 2011, 12:13:26 PM
to Django developers
I'm not expressing an opinion one way or another on composite primary
key syntax, but I don't agree here that a Django model "field" must
map one-to-one to a database column. It already does not, in the case
of ManyToManyField, and at some point I would like to introduce
(irrespective of composite primary keys) a more general ORM
abstraction for composite fields (i.e. model Fields that map to more
than one database column) as a path to cleaning up the implementation
of GenericForeignKey.

Carl

Christophe Pettus

Mar 16, 2011, 12:58:09 PM
to django-d...@googlegroups.com

On Mar 16, 2011, at 9:13 AM, Carl Meyer wrote:

> I'm not expressing an opinion one way or another on composite primary
> key syntax, but I don't agree here that a Django model "field" must
> map one-to-one to a database column.

That's fair, but a composite index lacks some of the characteristics of a field (assignability, for example). Most DBs don't have functions that explicitly iterate over indexes, so such a thing isn't really readable, either.

It might be appealing to have a models.Index base class that represents an index on a table, and have db_index=True be a shortcut to creating one. That might be more machinery than we want just for composite primary keys though.

Jacob Kaplan-Moss

Mar 17, 2011, 10:33:43 AM
to django-d...@googlegroups.com

I like this quite a bit. Of all the various syntaxes proposed here so
far, this is the first one that feels like it "fits" with the rest of
Django, and the first one I'm +1 on.

I'm sensitive to Christophe's point that a "composite field" doesn't
map to a relational concept very well, but quite frankly that ship has
sailed. We've got ManyToManyFields, GenericForeignKeys, and once you
branch out into the ecosystem you find TagFields, PickleFields (ugh)
and so forth.

Jacob

Mike Axiak

Mar 17, 2011, 11:18:33 AM
to django-d...@googlegroups.com, Jacob Kaplan-Moss
Just to be clear, for this to be valid syntax doesn't this idea have to be written as::

   class Foo(Model):
      x = models.FloatField()
      y = models.FloatField()
      coords = models.CompositeField(('x', 'y'), db_index=True)

      a = models.ForeignKey(A)
      b = models.ForeignKey(B)
      pair = models.CompositeField(('a', 'b'), primary_key=True)

(Note the quotes around the field names.)

Not that it matters too much, but I think any discussion of syntax should use valid Python.

Cheers,
Mike


Łukasz Rekucki

Mar 17, 2011, 11:28:11 AM
to django-d...@googlegroups.com
On 17 March 2011 16:18, Mike Axiak <mca...@gmail.com> wrote:
> Just to be clear, for this to be valid syntax doesn't this idea have to be
> written as::
>    class Foo(Model):
>       x = models.FloatField()
>       y = models.FloatField()
>       coords = models.CompositeField(('x', 'y'), db_index=True)
>       a = models.ForeignKey(A)
>       b = models.ForeignKey(B)
>       pair = models.CompositeField(('a', 'b'), primary_key=True)
> (Note the quotes around the field names.)

Actually, it works without the quotes as long as the fields are
defined in the same class before the CompositeField:
http://ideone.com/LPNzS

--
Łukasz Rekucki

Michal Petrucha

Mar 17, 2011, 3:58:04 PM
to django-d...@googlegroups.com
On Thu, Mar 17, 2011 at 09:33:43AM -0500, Jacob Kaplan-Moss wrote:
> On Wed, Mar 16, 2011 at 4:24 AM, Johannes Dollinger
> <emul...@googlemail.com> wrote:
> > It would be nice if support for composite primary keys would be
> > implemented as a special case of general composite fields. There
> > would be no need for new Meta options:
> >
> > class Foo(Model):
> >    x = models.FloatField()
> >    y = models.FloatField()
> >    coords = models.CompositeField((x, y), db_index=True)
> >    a = models.ForeignKey(A)
> >    b = models.ForeignKey(B)
> >    pair = models.CompositeField((a, b), primary_key=True)
> >
> > A CompositeField descriptor would then return a namedtuple of its
> > values and would support queries:
> >
> >    filter(coords__x=42)
> >    filter(coords=(1,2))
> >
> > Adding the individual fields may be optional, e.g.,
> > CompositeField((FloatField(), FloatField()), db_index=True).
> >
> > This has been proposed before:
> > http://groups.google.com/group/django-developers/browse_thread/thread/32f861c8bd5366a5

I must have overlooked this thread before...

> I like this quite a bit. Of all the various syntaxes proposed here so
> far, this is the first one that feels like it "fits" with the rest of
> Django, and the first one I'm +1 on.

I agree as well. This approach looks much cleaner from the design
perspective. At least the syntax is more consistent than having the
same information scattered throughout the individual fields and
several Meta attributes in different cases.

However, we'd either have a different API for specifying unique
constraints for sets of fields than for composite keys or we'd have
two options for the unique thing. (Or we'd lose backwards
compatibility.) The current API could be simulated by creating an
implicit CompositeField for each unique tuple with a reasonable
name.

A ForeignKey referencing a model with a primary CompositeField could
then act as a CompositeField itself, creating implicit fields unless
explicitly specified. Would this be a good idea? It would make it
easier to mess with the values of the fields directly, possibly
breaking the references to other rows; however, this is already
possible as things stand now.

There is one thing, though, that's bothering me a little bit... At
first glance this looks to me like a lot more work than my original
proposal. Now I'm not sure whether I should try to squeeze it all into
a single project with the primary key support and ForeignKey and
everything or rather just do the CompositeField with proper queryset
support and save the primary keys for a later time. Thoughts?

> I'm sensitive to Christophe's point that a "composite field" doesn't
> map to a relational concept very well, but quite frankly that ship has
> sailed. We've got ManyToManyFields, GenericForeignKeys, and once you
> branch out into the ecosystem you find TagFields, PickleFields (ugh)
> and so forth.

Just a note, my first approach at the composite ForeignKey field would
fall into this category anyway. I think we can't avoid that with this
kind of functionality...

Michal Petrucha

Christophe Pettus

Mar 21, 2011, 3:33:01 AM
to django-d...@googlegroups.com
I'd like to make one more pitch for a slightly different implementation here. My concern with CompositeField isn't based on the fact that it doesn't map one-to-one with a field in the table; it's that it doesn't have any of the semantics that are associated with a field. In particular, it can't be:

- Assigned to.
- Iterated over.
- Or even have a value.

My suggestion is to create an Index type that can be included in a class just like a field can. The example we've been using would then look like:

class Foo(Model):
    x = models.FloatField()
    y = models.FloatField()

    a = models.ForeignKey(A)
    b = models.ForeignKey(B)

    coords = models.CompositeIndex((x, y))
    pair = models.CompositeIndex((a, b), primary_key=True)

We could have FieldIndex (the equivalent of the current db_index=True), CompositeIndex, and RawIndex, for things like expression indexes and other things that can be specified just as a raw SQL string.
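The proposed index family might render SQL roughly as follows. The class names follow the proposal, but the constructors, the `sql()` method and the use of ALTER TABLE ... ADD CONSTRAINT ... PRIMARY KEY for the key are assumptions, and identifier quoting is omitted. Column order is taken exactly as given, which is where the ordering concern for composite indexes would be settled.

```python
class Index:
    """Hypothetical base class: something that renders index DDL."""

    def sql(self, table, name):
        raise NotImplementedError


class FieldIndex(Index):
    # Equivalent of db_index=True on a single column.
    def __init__(self, column):
        self.column = column

    def sql(self, table, name):
        return "CREATE INDEX %s ON %s (%s)" % (name, table, self.column)


class CompositeIndex(Index):
    # Columns are kept in the order given; that order is the index order.
    def __init__(self, columns, primary_key=False):
        self.columns = tuple(columns)
        self.primary_key = primary_key

    def sql(self, table, name):
        cols = ", ".join(self.columns)
        if self.primary_key:
            return ("ALTER TABLE %s ADD CONSTRAINT %s PRIMARY KEY (%s)"
                    % (table, name, cols))
        return "CREATE INDEX %s ON %s (%s)" % (name, table, cols)


class RawIndex(Index):
    # Arbitrary expression supplied as raw SQL.
    def __init__(self, expression):
        self.expression = expression

    def sql(self, table, name):
        return "CREATE INDEX %s ON %s (%s)" % (name, table, self.expression)
```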

I think this is a much better contract to offer in the API than one based on field which would have to throw exceptions left and right for most of the common field operations.

Johannes Dollinger

Mar 21, 2011, 5:03:44 AM
to django-d...@googlegroups.com

Am 21.03.2011 um 08:33 schrieb Christophe Pettus:

> I'd like to make one more pitch for a slightly different implementation here. My concern with CompositeField isn't based on the fact that it doesn't map one-to-one with a field in the table; it's that it doesn't have any of the semantics that are associated with a field. In particular, it can't be:
>
> - Assigned to.
> - Iterated over.
> - Or even have a value.

You would be able to use composite fields normally (as in "normal django field"):

>>> foo = Foo.objects.create(coords=(0, 0))
>>> foo.coords
(0, 0)
>>> foo.coords = (4, 2)
>>> foo.coords.x # == foo.x == foo.coords[0]
4

Sidenote:: Subclassing the default implementation of composite field values should be easy:

>>> type(foo.coords)
<type 'Vector'>
>>> foo.coords.length
4.4721359549995796
>>> foo.coords += foo.velocity
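A value class like `Vector` could be a namedtuple subclass. This sketch assumes a two-component (x, y) value and overrides `+` so that `+=` adds component-wise instead of concatenating tuples; how a CompositeField would be told to use such a class is left open.

```python
import math
from collections import namedtuple


class Vector(namedtuple("Vector", "x y")):
    __slots__ = ()  # keep instances as lightweight as plain tuples

    @property
    def length(self):
        return math.hypot(self.x, self.y)

    def __add__(self, other):
        # Component-wise addition; tuple's default would concatenate.
        return Vector(self.x + other[0], self.y + other[1])
```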

__
Johannes

Michal Petrucha

Mar 21, 2011, 7:20:19 AM
to django-d...@googlegroups.com
On Mon, Mar 21, 2011 at 12:33:01AM -0700, Christophe Pettus wrote:
> I'd like to make one more pitch for a slightly different
> implementation here. My concern with CompositeField isn't based on
> the fact that it doesn't map one-to-one with a field in the table;
> it's that it doesn't have any of the semantics that are associated
> with a field. In particular, it can't be:
>
> - Assigned to.
> - Iterated over.
> - Or even have a value.

I disagree. The CompositeField would need to have a value to be able
to implement a ForeignKey pointing to a model with a composite
primary key.

The CompositeField itself would be just a proxy to the actual atomic
fields. You should be able to assign a tuple (or namedtuple) to it,
specifying the actual values for the fields. Similarly, you'll be able
to retrieve its value which is a tuple or a namedtuple.

This way, the following code would work for composite primary keys the
same way it works for simple keys:

class CompositeModel(models.Model):
    a = models.IntegerField()
    b = models.IntegerField()
    key = models.CompositeField((a, b), primary_key=True)

class ReferencingModel(models.Model):
    cm = models.ForeignKey(CompositeModel)

cminstance = CompositeModel.objects.get(something)
newref = ReferencingModel()
newref.cm = cminstance.pk
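A plain-Python sketch of how that assignment might be handled, assuming the ForeignKey stores the referenced key in one local attribute per component (`cm_a` and `cm_b` are hypothetical attribute names). For brevity the descriptor returns the raw key tuple instead of fetching the related object, which a real ForeignKey would do.

```python
class CompositeForeignKey:
    """Hypothetical descriptor storing a composite reference in several
    local attributes, one per component of the referenced key."""

    def __init__(self, *attnames):
        self.attnames = attnames

    def __get__(self, instance, owner):
        if instance is None:
            return self
        return tuple(getattr(instance, name, None) for name in self.attnames)

    def __set__(self, instance, value):
        # Accept a related object (use its pk) or a bare key tuple.
        key = getattr(value, "pk", value)
        if len(key) != len(self.attnames):
            raise ValueError("expected %d key values" % len(self.attnames))
        for name, component in zip(self.attnames, key):
            setattr(instance, name, component)


class CompositeModel:
    def __init__(self, a, b):
        self.a, self.b = a, b

    @property
    def pk(self):
        return (self.a, self.b)


class ReferencingModel:
    cm = CompositeForeignKey("cm_a", "cm_b")
```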

> My suggestion is to create an Index type that can be included in a
> class just like a field can. The example we've been using would
> then look like:
>
> class Foo(Model):
>     x = models.FloatField()
>     y = models.FloatField()
>     a = models.ForeignKey(A)
>     b = models.ForeignKey(B)
>
>     coords = models.CompositeIndex((x, y))
>     pair = models.CompositeIndex((a, b), primary_key=True)
>
> We could have FieldIndex (the equivalent of the current
> db_index=True), CompositeIndex, and RawIndex, for things like
> expression indexes and other things that can be specified just as a
> raw SQL string.
>
> I think this is a much better contract to offer in the API than one
> based on field which would have to throw exceptions left and right
> for most of the common field operations.

I don't see how ForeignKeys would be possible this way.

Michal Petrucha

akaariai

Mar 21, 2011, 9:04:52 AM
to Django developers
On Mar 21, 1:20 pm, Michal Petrucha <johnn...@ksp.sk> wrote:
> > My suggestion is to create an Index type that can be included in a
> > class just like a field can.  The example we've been using would
> > then look like:
>
> > class Foo(Model):
> >    x = models.FloatField()
> >    y = models.FloatField()
> >    a = models.ForeignKey(A)
> >    b = models.ForeignKey(B)
>
> >    coords = models.CompositeIndex((x, y))
> >    pair = models.CompositeIndex((a, b), primary_key=True)
>
> > We could have FieldIndex (the equivalent of the current
> > db_index=True), CompositeIndex, and RawIndex, for things like
> > expression indexes and other things that can be specified just as a
> > raw SQL string.
>
> > I think this is a much better contract to offer in the API than one
> > based on field which would have to throw exceptions left and right
> > for most of the common field operations.
>
> I don't see how ForeignKeys would be possible this way.
>

In much the same way:

class FooBar(Model):
    a = models.ForeignKey(A)
    b = models.ForeignKey(B)
    pair = models.ForeignKey(Foo, fields=(a, b))

Note that this is very close to what SQL does. If you have a composite
unique index or composite foreign key you define the fields and then
the index / foreign key. Though I don't know how much value that
argument has in this discussion.

You could add some DRY and allow a shortcut:

class FooBar(Model):
    pair = models.ForeignKey(Foo)
    # a and b are created automatically.

Now, to make things work consistently pair should be a field. But on
the other hand when using a ModelForm, the pair should probably not be
a field of that form. This is more clear in an example having a (city,
state, country) primary key. These should clearly be separate fields
in a form.

In my opinion, whether the composite structures are called fields or
something else isn't that important. There are cases where composite
structures behave like a field and some cases where they do not. The
main problem is how the composite structures should behave in
ModelForms and serialization, should they be assignable, how they
relate to the model __init__ method, should they be in model fields
iterators, how they are used in QuerySets and so on. When these
questions are answered it is probably easier to answer if the
composite structures should be called fields or something else.

- Anssi

Jacob Kaplan-Moss

Mar 21, 2011, 3:20:41 PM
to django-d...@googlegroups.com
On Mon, Mar 21, 2011 at 2:33 AM, Christophe Pettus <x...@thebuild.com> wrote:
> I'd like to make one more pitch for a slightly different implementation here.  My concern with CompositeField isn't based on the fact that it doesn't map one-to-one with a field in the table; it's that it doesn't have any of the semantics that are associated with a field.  In particular, it can't be:
>
> - Assigned to.
> - Iterated over.
> - Or even have a value.

Obviously there's no code here yet, so we don't know exactly. I'd also
be -1 on an implementation of a CompositeField that didn't have those
values. However, it's reasonably easy to come up with a CompositeField
that is assignable, iterable, and has values. Here's the basics::

import collections

class CompositeField(object):
    def __init__(self, *fields):
        # The model Field instances this composite groups together.
        self.fields = fields

    def contribute_to_class(self, cls, name):
        # Build a namedtuple type such as Person_name(first_name, last_name).
        # Assumes the component fields were added first, so f.name is set.
        nt_name = "%s_%s" % (cls.__name__, name)
        nt_fields = " ".join(f.name for f in self.fields)
        self.nt = collections.namedtuple(nt_name, nt_fields)
        setattr(cls, name, self)

    def __get__(self, instance, owner):
        if instance is not None:
            return self.nt._make(getattr(instance, f, None)
                                 for f in self.nt._fields)
        raise AttributeError("Composite fields only work on instances.")

    def __set__(self, instance, value):
        # Assign each component back to its underlying attribute.
        for (field, val) in zip(self.nt._fields, value):
            setattr(instance, field, val)

It works, too::

class Person(models.Model):
    first_name = models.CharField(max_length=100)
    last_name = models.CharField(max_length=100)
    name = CompositeField(first_name, last_name)

# ...

>>> p = Person(first_name="Jacob", last_name="KM")
>>> p.name
Person_name(first_name='Jacob', last_name='KM')
>>> p.name.last_name
'KM'
>>> p.name = ("John", "Doe")
>>> p.last_name
'Doe'
>>> for f in p.name:
...     print(f)
...
John
Doe

Fields even sorta get saved correctly to the DB with just these few
lines of code. Of course there's a lot missing here to correctly
handle actual composite keys -- this tiny example won't work in
querysets, for example -- but the basics of the Python-side behavior's
right there.
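To try the descriptor behaviour without a Django project, here is a minimal sketch that assumes a plain Python class stands in for the model. As a hypothetical variant, it takes attribute *names* instead of Field objects and calls `contribute_to_class` by hand, which is the hook Django's model metaclass would normally invoke.

```python
import collections


class CompositeField(object):
    """Hypothetical variant: exposes several attributes as one namedtuple."""

    def __init__(self, *field_names):
        self.field_names = field_names

    def contribute_to_class(self, cls, name):
        # Stand-in for the hook Django's model metaclass would call.
        nt_name = "%s_%s" % (cls.__name__, name)
        self.nt = collections.namedtuple(nt_name, self.field_names)
        setattr(cls, name, self)

    def __get__(self, instance, owner):
        if instance is None:
            raise AttributeError("Composite fields only work on instances.")
        return self.nt._make(getattr(instance, f, None) for f in self.nt._fields)

    def __set__(self, instance, value):
        values = tuple(value)
        # zip() would silently drop extras or leave fields unset; be strict.
        if len(values) != len(self.nt._fields):
            raise ValueError("expected %d values" % len(self.nt._fields))
        for field, val in zip(self.nt._fields, values):
            setattr(instance, field, val)


class Person(object):
    def __init__(self, first_name=None, last_name=None):
        self.first_name = first_name
        self.last_name = last_name


# Attach the composite by hand, as the model metaclass would.
CompositeField("first_name", "last_name").contribute_to_class(Person, "name")
```

Passing names rather than Field objects sidesteps the question of whether the component fields have been named yet when the composite is attached. The length check in `__set__` is a deliberate deviation from the snippet above, so that `p.name = ("John",)` fails loudly instead of updating only `first_name`.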

> My suggestion is to create an Index type that can be included in a class just like a field can.

I think we're talking slightly different concerns here: I'm mostly
interested in the Python-side API, and to my eyes a composite field
matches more closely what's happening on the Python side of things.
Python's not generating an index, after all, so using something called
"index" for compositing multiple attributes together seems weird to
me. But at the DB level, "index" makes perfect sense. Thing is, we've
always tried to make Django's APIs behave well in Python *first*, and
then think about the DB concerns. And again, to me "composite field"
matches more closely the behavior we want out of the Python side of
things.

All that said, there's a lot to like about your Index proposal.
Perhaps there's a way we can merge these two things together somehow?

Jacob

Michal Petrucha

Mar 24, 2011, 10:44:37 AM
to django-d...@googlegroups.com
On Mon, Mar 21, 2011 at 02:20:41PM -0500, Jacob Kaplan-Moss wrote:
> On Mon, Mar 21, 2011 at 2:33 AM, Christophe Pettus <x...@thebuild.com> wrote:
> > I'd like to make one more pitch for a slightly different
> > implementation here.  My concern with CompositeField isn't based
> > on the fact that it doesn't map one-to-one with a field in the
> > table; it's that it doesn't have any of the semantics that are
> > associated with a field.  In particular, it can't be:
> >
> > - Assigned to.
> > - Iterated over.
> > - Or even have a value.
>
> Obviously there's no code here yet, so we don't know exactly. I'd also
> be -1 on an implementation of a CompositeField that didn't have those
> values. However, it's reasonably easy to come up with a CompositeField
> that is assignable, iterable, and has values. Here's the basics::
>
[snippet]

>
> Fields even sorta get saved correctly to the DB with just these few
> lines of code. Of course there's a lot missing here to correctly
> handle actual composite keys -- this tiny example won't work in
> querysets, for example -- but the basics of the Python-side behavior's
> right there.

This is exactly where I would start. One question, though: should I
use namedtuple here, or should we keep compatibility with Python
< 2.6? Maybe a module like django.utils.namedtuplecompat could provide
a fallback implementation?
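A fallback for Python < 2.6 only needs to cover what the CompositeField snippet actually uses: attribute access, `_make`, `_fields` and a readable repr. The following is a minimal sketch under that assumption; `simple_namedtuple` is a hypothetical name, and `django.utils.namedtuplecompat` is only the module name floated above, not something that exists.

```python
import operator


def simple_namedtuple(typename, field_names):
    """Hypothetical fallback: a tuple subclass with read-only named fields."""
    # On Python 2 this check would need basestring to cover unicode names.
    if isinstance(field_names, str):
        field_names = field_names.replace(",", " ").split()
    fields = tuple(field_names)

    def __new__(cls, *args, **kwargs):
        if len(args) > len(fields):
            raise TypeError("%s takes %d values" % (typename, len(fields)))
        values = list(args)
        for name in fields[len(args):]:
            if name not in kwargs:
                raise TypeError("%s missing value for %r" % (typename, name))
            values.append(kwargs.pop(name))
        if kwargs:
            raise TypeError("%s got unexpected %r" % (typename, sorted(kwargs)))
        return tuple.__new__(cls, values)

    def _make(cls, iterable):
        # Mirrors namedtuple._make: build an instance from any iterable.
        return cls(*iterable)

    def __repr__(self):
        pairs = ", ".join("%s=%r" % (n, v) for n, v in zip(fields, self))
        return "%s(%s)" % (typename, pairs)

    namespace = {
        "__slots__": (),
        "__new__": __new__,
        "_make": classmethod(_make),
        "_fields": fields,
        "__repr__": __repr__,
    }
    for index, name in enumerate(fields):
        # Each name becomes a read-only property returning tuple item `index`.
        namespace[name] = property(operator.itemgetter(index))
    return type(typename, (tuple,), namespace)


try:
    from collections import namedtuple
except ImportError:  # Python < 2.6 has no collections.namedtuple
    namedtuple = simple_namedtuple
```

With the `try`/`except` at the bottom, code that imports `namedtuple` from such a compat module gets the real implementation whenever it is available and the reduced one only on older interpreters.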

> > My suggestion is to create an Index type that can be included in a
> > class just like a field can.
>
> I think we're talking slightly different concerns here: I'm mostly
> interested in the Python-side API, and to my eyes a composite field
> matches more closely what's happening on the Python side of things.
> Python's not generating an index, after all, so using something called
> "index" for compositing multiple attributes together seems weird to
> me. But at the DB level, "index" makes perfect sense. Thing is, we've
> always tried to make Django's APIs behave well in Python *first*, and
> then think about the DB concerns. And again, to me "composite field"
> matches more closely the behavior we want out of the Python side of
> things.
>
> All that said, there's a lot to like about your Index proposal.
> Perhaps there's a way we can merge these two things together somehow?

The composite index could be achieved by the standard field option
Field.db_index applied to a CompositeField. This would still be
consistent with the rest of the API since it would not require any new
construct to do this thing.

To sum up, I believe the CompositeField could be a general solution to
both composite primary keys and composite indexes.
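As a minimal sketch of that idea, assuming a CompositeField that accepts the same `primary_key`, `unique` and `db_index` options as an ordinary field, the options alone are enough to decide which DDL to emit. Only the option handling is modelled here (the descriptor part is omitted), and the method name `constraint_sql` is hypothetical.

```python
class CompositeField(object):
    """Hypothetical: option handling only, mirroring Field's keyword options."""

    def __init__(self, *columns, **options):
        self.columns = columns
        self.primary_key = options.pop("primary_key", False)
        self.unique = options.pop("unique", False)
        self.db_index = options.pop("db_index", False)
        if options:
            raise TypeError("unexpected options: %s" % ", ".join(sorted(options)))

    def constraint_sql(self, table, name):
        """Return the DDL this composite would need, or None if it needs none."""
        cols = ", ".join('"%s"' % c for c in self.columns)
        if self.primary_key:
            return 'ALTER TABLE "%s" ADD PRIMARY KEY (%s);' % (table, cols)
        if self.unique:
            return 'CREATE UNIQUE INDEX "%s_%s" ON "%s" (%s);' % (
                table, name, table, cols)
        if self.db_index:
            return 'CREATE INDEX "%s_%s" ON "%s" (%s);' % (table, name, table, cols)
        return None
```

For example, `CompositeField("x", "y", db_index=True).constraint_sql("app_point", "coords")` would yield `CREATE INDEX "app_point_coords" ON "app_point" ("x", "y");`, so a composite index and a composite primary key differ only in the option passed, with no new construct in the model API.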

Michal Petrucha


Christophe Pettus

Mar 31, 2011, 12:54:49 PM
to django-d...@googlegroups.com

On Mar 21, 2011, at 12:20 PM, Jacob Kaplan-Moss wrote:

> I think we're talking slightly different concerns here: I'm mostly
> interested in the Python-side API, and to my eyes a composite field
> matches more closely what's happening on the Python side of things.

I agree 100%! I think I'm just drawing a different conclusion from that point, which is that indexes are more metadata on the database than a critical part of the Python API: in an imaginary perfect database (like, say, the one the SQL spec envisions), we wouldn't need to talk about indexes at all.

The more I think about it, the less I like including this directly in the field declaration part of the model, including my Index type proposal. It just doesn't seem to belong there.

What concerns me about composite fields is that they seem to be a lot of Python machinery just to accomplish the goal of allowing this annotation. If they were super-useful in their own right, that would be one thing, but I'm not sure that I see the utility of them absent indexes and foreign keys. I'm also bothered, perhaps excessively, about having two different ways of getting at the same field in the model just to support this.

So, another proposal:

In the foreign key case, just extending the ForeignKey syntax to allow for multiple related fields makes the most sense:

overThere = models.ForeignKey(OtherModel, to_field=('first_name', 'last_name', ))

For indexes on the table for the model, include the declaration in the Meta class, since that's the obvious place to stick indexing:

class SomeModel:

    class Meta:
        primary_key = 'some_field'
        indexes = ['some_field', 'some_other_field', ('field1', '-field2', ), ]
        raw_indexes = [ 'some_invariant_function(some_field)' ]

(This was proposed by someone else, and isn't original to me; apologies that I can't find the email to give credit.)

Of course, the existing syntax would still work as a shortcut for primary_key and indexes.
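The proposed `indexes` list mixes single field names, tuples for composite indexes, and a leading `-` for descending order. A minimal sketch of how such a Meta declaration could be turned into DDL, assuming those conventions; `normalize_indexes` and `create_index_sql` are hypothetical helpers and the generated index names are made up.

```python
def normalize_indexes(indexes):
    """Turn a Meta.indexes-style list into lists of (column, order) pairs."""
    result = []
    for spec in indexes:
        # A bare string is a single-column index; a tuple is a composite one.
        names = (spec,) if isinstance(spec, str) else tuple(spec)
        columns = []
        for name in names:
            if name.startswith("-"):
                columns.append((name[1:], "DESC"))
            else:
                columns.append((name, "ASC"))
        result.append(columns)
    return result


def create_index_sql(table, indexes, raw_indexes=()):
    """Hypothetical: CREATE INDEX statements for indexes and raw_indexes."""
    statements = []
    for i, columns in enumerate(normalize_indexes(indexes)):
        cols = ", ".join('"%s" %s' % (c, order) for c, order in columns)
        statements.append('CREATE INDEX "%s_idx%d" ON "%s" (%s);'
                          % (table, i, table, cols))
    for j, expr in enumerate(raw_indexes):
        # Raw expressions are passed through verbatim; the extra parentheses
        # let PostgreSQL accept arbitrary expressions, not just function calls.
        statements.append('CREATE INDEX "%s_raw%d" ON "%s" ((%s));'
                          % (table, j, table, expr))
    return statements
```

Keeping `raw_indexes` as opaque strings matches the proposal: Django would not need to understand the expression, only place it in the statement.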

Thoughts?

Michal Petrucha

Mar 31, 2011, 2:03:27 PM
to django-d...@googlegroups.com
On Thu, Mar 31, 2011 at 09:54:49AM -0700, Christophe Pettus wrote:
> What concerns me about composite fields is that they seem to be a
> lot of Python machinery just to accomplish the goal of allowing this
> annotation. If they were super-useful in their own right, that
> would be one thing, but I'm not sure that I see the utility of them
> absent indexes and foreign keys. I'm also bothered, perhaps
> excessively, about having two different ways of getting at the same
> field in the model just to support this.

Just an observation: there are already two ways of getting at the same
field in the case of primary keys (the field's own name and .pk)...

> So, another proposal:
>
> In the foreign key case, just extending the ForeignKey syntax to
> allow for multiple related fields makes the most sense:
>
> overThere = models.ForeignKey(OtherModel, to_field=('first_name', 'last_name', ))
>
> For indexes on the table for the model, include the declaration in
> the Meta class, since that's the obvious place to stick indexing:
>
> class SomeModel:
>
>     class Meta:
>         primary_key = 'some_field'
>         indexes = ['some_field', 'some_other_field', ('field1', '-field2', ), ]
>         raw_indexes = [ 'some_invariant_function(some_field)' ]
>
> (This was proposed by someone else, and isn't original to me;
> apologies that I can't find the email to give credit.)
>
> Of course, the existing syntax would still work as a shortcut for
> primary_key and indexes.
>
> Thoughts?

One thing I'm missing in this proposal is the behavior of the pk
property. Since the primary key is a tuple, I can't imagine any other
representation in .pk than some flavor of a tuple. And to make things
like SomeModel.objects.get(pk=some_value) possible, we'd still have to
implement most of the functionality of a composite field.
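To illustrate the point about `pk` lookups, here is a minimal sketch, assuming the primary key is declared as an ordered tuple of field names: a tuple passed as `pk=` has to be split back into one lookup per component column. The helper `expand_pk_lookup` is hypothetical; the expanded dict could then be passed as keyword arguments, e.g. `SomeModel.objects.get(**lookup)`.

```python
def expand_pk_lookup(pk_fields, value):
    """Hypothetical: map a composite pk value onto its component fields."""
    value = tuple(value)
    if len(value) != len(pk_fields):
        raise ValueError("expected %d values for pk, got %d"
                         % (len(pk_fields), len(value)))
    # Pair each primary-key field name with its value, in declaration order.
    return dict(zip(pk_fields, value))
```

Even this small step needs to know the ordered list of component fields and how to validate the value, which is much of what a composite field object would carry anyway.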

Michal Petrucha
