RFC: Composite fields API

Michal Petrucha

May 12, 2011, 7:41:43 AM
to django-d...@googlegroups.com
As most of you have probably noticed by now, in a week and a half I'll
start working on the implementation of composite fields. Before that
we should probably agree on the final form of the API.

This lengthy mail is mostly a recapitulation of things mentioned in
the past, like [1], [2], [3] and questions raised in these discussions
that have not been answered so far.


CompositeField will be a new type of model fields. This type will be
virtual, i. e. it won't be backed by any real database column by
itself. Instead, it will act as a proxy to a given set of atomic
fields and will be used to set options for and perform operations on
the whole set as one field.


The constructor of a CompositeField will require at least two
positional parameters; each positional parameter will be a single
atomic field. The order of these parameters will be important, as
explained below. The parameters will have to be field instances; lazy
loading won't be necessary (the recommended place for composite field
definitions will be after the atomic fields).

CompositeField will accept these three field options:
- db_index (creates a multi-column index across the underlying fields)
- primary_key (creates a composite primary key in the model)
- unique (creates a unique constraint for the set of fields)

Other field options either wouldn't make sense or would be too
difficult to implement.
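
To make the constructor rules above concrete, here is a minimal
pure-Python sketch. It is not Django code: ``Field`` and this
``CompositeField`` are hypothetical stand-ins, and the error behaviour
is an assumption rather than something settled in this thread.

```python
class Field:
    """Hypothetical stand-in for an atomic model field."""

    def __init__(self, name):
        self.name = name


class CompositeField:
    """Sketch of the proposed constructor: two or more fields, three options."""

    def __init__(self, *fields, db_index=False, primary_key=False, unique=False):
        if len(fields) < 2:
            raise ValueError("CompositeField needs at least two atomic fields")
        for field in fields:
            # Field instances are required; there is no lazy lookup by name.
            if not isinstance(field, Field):
                raise TypeError("expected a field instance, got %r" % (field,))
        # Order matters: it fixes the order of values in the composite value.
        self.fields = tuple(fields)
        self.db_index = db_index
        self.primary_key = primary_key
        self.unique = unique


first_name = Field("first_name")
last_name = Field("last_name")
full_name = CompositeField(first_name, last_name, unique=True)
```

Because the three options are keyword-only in this sketch, passing any
other option (say ``null=True``) fails with a ``TypeError`` instead of
being silently ignored.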

There is a clash with the current API here, in the ``unique`` option.
This would supersede the current ``unique_together`` Meta option. I
see three options possible:

1) Leave out the ``unique`` option and live with ``unique_together``.
This would probably imply also leaving out ``db_index``, otherwise
the API would be a complete mess.

2) Allow ``CompositeField.unique`` but also keep ``unique_together``.
The problem I see with this approach is that there would be two
quite different ways to achieve the same effect.

3) Make ``CompositeField.unique`` the way to go and deprecate
``unique_together``.
This way, specifying a unique constraint on a tuple of fields would
work the same way it works on single fields which is IMO a
significant benefit. There's, however, the issue of breaking
backwards compatibility. Furthermore, one would have to add a new
field, albeit virtual, just to create a simple constraint, which
may seem weird to some.

I don't feel like deciding this on my own, and any of the three is
fine as far as the implementation is concerned.

One minor detail, should the field silently ignore invalid options or
should it issue warnings?


Moving on...


The value of a CompositeField will be represented by an instance of a
CompositeValue class. This will be a descendant of tuple and will
resemble namedtuple present in Python >= 2.5. It will support
iteration, numbered indexing and access to individual field values
using attributes corresponding to underlying field names. The order of
values will be the same as the order of fields specified in the model
definition.

Assigning a value to a CompositeField will be possible using any
iterable as long as its length equals the number of atomic fields (and
the values can be assigned to the corresponding fields, obviously).
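
A rough sketch of the value semantics described above, built on the
standard library's ``collections.namedtuple``. The field names and the
``to_composite_value`` helper are made up for illustration; the real
class would be derived from the model definition.

```python
from collections import namedtuple

# Hypothetical atomic field names, in the order given to the composite field.
CompositeValue = namedtuple("CompositeValue", ["first_name", "last_name"])


def to_composite_value(iterable):
    """Build a CompositeValue from any iterable of the right length."""
    values = tuple(iterable)
    expected = len(CompositeValue._fields)
    if len(values) != expected:
        raise ValueError("expected %d values, got %d" % (expected, len(values)))
    return CompositeValue(*values)


value = to_composite_value(["Ada", "Lovelace"])
first, last = value                    # iteration / unpacking
assert value[0] == "Ada"               # numbered indexing
assert value.last_name == "Lovelace"   # access by field name
```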


Due to the nature of this field type, lookup filters other than
``exact`` and ``in`` would have unclear semantics and won't be
supported. The original plan was to also exclude support for ``in``
but as it turns out, ``in`` is used in several places under the
assumption that primary keys support it, for example DeleteQuery
or UpdateQuery. Therefore both filters will be implemented.


This should be everything as far as the models API is concerned. As
for the other parts of Django, the changes will be kept to a working
minimum.

Forms: Only support in ModelChoiceFields will be added for composite
primary keys; there won't be any special form field type for now.

Admin: Again, only support for composite primary keys will be added in
the quoting/unquoting function to make it possible to access such
models.

GFK: For now, GenericForeignKey won't be able to reference models with
composite primary keys.


I'm also thinking about implementing an abstract class, VirtualField.
This could be useful mainly as a base class for fields with no direct
database column. That means, it would mainly handle things like
add_to_class (adding itself to the list of virtual fields instead of
local ones), specifying arbitrary lookup filters when asked for one
etc. CompositeField could then be a descendant of this class.

However, I can't currently imagine any other use-case for this
abstract class than CompositeField. The question is, then, is there
any interest in having an abstract mechanism like this? Can anyone
imagine a use-case? (The question is, should I implement this
functionality directly inside CompositeField or factor it out into
something more general?)


I'll really appreciate each comment.

Michal Petrucha

[1] https://groups.google.com/forum/#!topic/django-developers/Eg1AHjAvNps
[2] https://groups.google.com/forum/#!topic/django-developers/Y0aAb792cTw/discussion
[3] http://people.ksp.sk/~johnny64/GSoC-full-proposal

Tom Evans

May 12, 2011, 9:49:03 AM
to django-d...@googlegroups.com
On Thu, May 12, 2011 at 12:41 PM, Michal Petrucha
<michal....@ksp.sk> wrote:
> As most of you have probably noticed by now, in a week and a half I'll
> start working on the implementation of composite fields. Before that
> we should probably agree on the final form of the API.
>
> <snip>

Hi Michal

This looks really, really good. A few comments:

Value of a composite field: It should descend from namedtuple. From
1.4 onwards, Django only supports 2.5+, so it's not necessary to fudge
things for Python 2.4

unique/unique_together: They should both be supported. unique_together
should raise a PendingDeprecationWarning, and it should disappear
according to the deprecation timeline. unique_together only exists as
a Meta option as there is no field to attach that logic to - now there
is.

I also have a few questions:

If a model has a composite field that is marked as primary_key=True,
how will this affect instance.pk? Presumably this will now return a
tuple - will this affect automatic URL generation in the admin?

Cheers

Tom

Michal Petrucha

May 12, 2011, 10:04:18 AM
to django-d...@googlegroups.com
On Thu, May 12, 2011 at 02:49:03PM +0100, Tom Evans wrote:
> Hi Michal
>
> This looks really, really good. A few comments:

Thanks for the response.

> Value of a composite field: It should descend from namedtuple. From
> 1.4 onwards, Django only supports 2.5+, so it's not necessary to fudge
> things for Python 2.4

Ah, great. That would actually make things easier, namedtuple could be
used directly without any special class factory. At the time of
writing the proposal, IIRC, it wasn't clear 2.4 would be dropped.

> unique/unique_together: They should both be supported. unique_together
> should raise a PendingDeprecationWarning, and it should disappear
> according to the deprecation timeline. unique_together only exists as
> a Meta option as there is no field to attach that logic to - now there
> is.

That's also my point of view, however, there have been objections that
I mentioned (having to name the index). That's why I'd like to hear a
few more opinions.

> I also have a few questions:
>
> If a model has a composite field that is marked as primary_key=True,
> how will this affect instance.pk? Presumably this will now return a
> tuple - will this affect automatic URL generation in the admin?

The instance.pk property will work the same way as it works for any
other field, i. e. it will be an alias for the field with
primary_key=True. In this case, the CompositeField. That means,
retrieval and assignment will happen using iterables.
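
As an illustration only, the aliasing could behave roughly like the
plain-Python property below. ``Membership`` and ``pk_fields`` are
hypothetical names, not part of Django or of the proposal.

```python
class Membership:
    """Hypothetical model whose primary key spans two atomic fields."""

    pk_fields = ("person_id", "group_id")

    def __init__(self, person_id, group_id):
        self.person_id = person_id
        self.group_id = group_id

    @property
    def pk(self):
        # Reading pk yields the atomic values in declaration order.
        return tuple(getattr(self, name) for name in self.pk_fields)

    @pk.setter
    def pk(self, values):
        values = tuple(values)
        if len(values) != len(self.pk_fields):
            raise ValueError("wrong number of values for composite pk")
        # Assignment writes through to the underlying atomic fields.
        for name, value in zip(self.pk_fields, values):
            setattr(self, name, value)


m = Membership(1, 7)
m.pk = [2, 9]  # any iterable of the right length
```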

The admin will handle this by extending quote and unquote -- one
character will be picked as a separator (probably comma) and quoted
inside the atomic values.

Looking at the code, it is already quoted anyway so the only thing
that will change is that for models with composite primary keys this
character will appear in the primary key value.
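
A simplified sketch of the idea, assuming comma as the separator. The
functions below are stand-ins written for this example, not the admin's
actual quote/unquote; the set of escaped characters is an assumption.

```python
import re

# Characters escaped inside each atomic value (assumed set; includes the
# separator "," and the escape character "_").
SPECIAL = ":/_#?;@&=+$,"


def quote_part(value):
    """Escape special characters in one atomic value as _XX (hex code)."""
    return "".join("_%02X" % ord(c) if c in SPECIAL else c for c in str(value))


def unquote_part(text):
    """Reverse quote_part for one atomic value."""
    return re.sub(r"_([0-9A-F]{2})", lambda m: chr(int(m.group(1), 16)), text)


def quote_pk(parts):
    """Join the quoted atomic values with the separator."""
    return ",".join(quote_part(p) for p in parts)


def unquote_pk(text):
    """Split on the separator, then unquote each atomic value."""
    return tuple(unquote_part(p) for p in text.split(","))


url_part = quote_pk(("a,b", "c_d"))
round_trip = unquote_pk(url_part)
```

Since the separator is always escaped inside the atomic values, a bare
comma in the URL can only be a boundary between two parts. Note that
the values come back as strings and would still need converting to the
types of the atomic fields.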

Michal

Javier Guerra Giraldez

May 12, 2011, 10:13:07 AM
to django-d...@googlegroups.com
On Thu, May 12, 2011 at 8:49 AM, Tom Evans <teva...@googlemail.com> wrote:
> unique/unique_together: They should both be supported. unique_together
> should raise a PendingDeprecationWarning, and it should disappear
> according to the deprecation timeline. unique_together only exists as
> a Meta option as there is no field to attach that logic to - now there
> is.

while i'm +1 about using unique on the composite field, i'm -0 about
deprecating unique_together. there are times when i want to ensure
composite uniqueness but don't consider those fields as part of a
composite. in those cases, i wouldn't want to define the composite
field just to say its unique.

yes, i know about the Python Zen on "only one obvious way", but in
this case it seems to be two semantically different things (even if
the generated SQL is the same). for me, readability wins

--
Javier

Carl Meyer

May 12, 2011, 8:16:51 PM
to django-d...@googlegroups.com
Hi Michal,

I'm looking forward to seeing this project take shape! Comments below:

On 05/12/2011 06:41 AM, Michal Petrucha wrote:
[..]


> The constructor of a CompositeField will require at least two
> positional parameters, each positional parameter will be a single
> atomic field. The order of these parameters will be important as
> explained below. The parameters will have to be field instances, lazy
> loading won't be necessary (the recommended place of composite field
> definitions will be after atomic fields).

This sounds fine.

> CompositeField will accept these three field options:
> - db_index (creates a multi-column index across the underlying fields)
> - primary_key (creates a composite primary key in the model)
> - unique (creates a unique constraint for the set of fields)
>
> Other field options either wouldn't make sense or would be too
> difficult to implement.
>
> There is a clash with the current API here, in the ``unique`` option.
> This would supersede the current ``unique_together`` Meta option. I
> see three options possible:
>
> 1) Leave out the ``unique`` option and live with ``unique_together``.
> This would probably imply also leaving out ``db_index``, otherwise
> the API would be a complete mess.
>
> 2) Allow ``CompositeField.unique`` but also keep ``unique_together``.
> The problem I see with this approach is that there would be two
> quite different ways to achieve the same effect.
>
> 3) Make ``CompositeField.unique`` the way to go and deprecate
> ``unique_together``.
> This way, specifying a unique constraint on a tuple of fields would
> work the same way it works on single fields which is IMO a
> significant benefit. There's, however, the issue of breaking
> backwards compatibility. Furthermore, one would have to add a new
> field, albeit virtual, just to create a simple constraint, which
> may seem weird to some.

I agree with Javier - I favor option 2. In my mind, although the final
result at the database level may be the same (a unique index across
multiple database columns), in conceptual terms at the ORM level it is
really two different things. There are many cases where I want to
specify that two fields should be unique together, but they really are
two separate fields; I'm never going to want to access it as a single
field or composite value. In this case, specifying a CompositeField
would confuse the intent and be more verbose than unique_together. I
think the conceptual distinction is clear, and it will actually be less
confusing to users to have both options available than to have
CompositeField become the only way to specify an index on multiple columns.

> One minor detail, should the field silently ignore invalid options or
> should it issue warnings?

Explicit is better than implicit, and errors should never pass silently
unless explicitly silenced. If the option is invalid, it should not just
be a warning, it should be an outright failure (though if the check is
expensive, it could possibly happen in model-validation rather than
always at runtime).

> The value of a CompositeField will be represented by an instance of a
> CompositeValue class. This will be a descendant of tuple and will
> resemble namedtuple present in Python >= 2.5. It will support
> iteration, numbered indexing and access to individual field values
> using attributes corresponding to underlying field names. The order of
> values will be the same as the order of fields specified in the model
> definition.

Yes, Tom is right of course - now that Python 2.5 is minimal version, we
can just use namedtuple.

> Assigning a value to a CompositeField will be possible using any
> iterable as long as its length equals the number of atomic fields (and
> the values can be assigned to the corresponding fields, obviously).

I mentioned this in an earlier thread, but I'd really like to see the
API allow me to specify my own class as the value class, as long as it
satisfies some basic API requirements. In my mind the long-term goal
here is that GFKs should be reasonably implementable as a CompositeField
or a CompositeField subclass without exploiting undocumented internal APIs.

If there are implementation complexities that push this feature out of
scope for GSoC, that's fine - but I want to make sure we don't make that
future expansion difficult by design choices we make now.

[...]


> I'm also thinking about implementing an abstract class, VirtualField.
> This could be useful mainly as a base class for fields with no direct
> database column. That means, it would mainly handle things like
> add_to_class (adding itself to the list of virtual fields instead of
> local ones), specifying arbitrary lookup filters when asked for one
> etc. CompositeField could then be a descendant of this class.
>
> However, I can't currently imagine any other use-case for this
> abstract class than CompositeField. The question is, then, is there
> any interest in having an abstract mechanism like this? Can anyone
> imagine a use-case? (The question is, should I implement this
> functionality directly inside CompositeField or factor it out into
> something more general?)

I wouldn't spend time on something we don't have any use case in mind
for (unless making this split makes the code easier to read and
understand). This is something that most likely could easily be done
later, if we find we need it.

Carl

onelson

May 13, 2011, 12:01:19 PM
to django-d...@googlegroups.com
I'm not that familiar with GFK's and how they work in django, but I just wanted to check... 
Will we have (non-generic) FK support for this, or is that another can-o-worms that won't get touched for some time?

Regards,
Owen

Michal Petrucha

May 14, 2011, 9:21:36 AM
to django-d...@googlegroups.com

Adding support for composite PK targets into related fields (i. e.
ForeignKey, OneToOne and ManyToMany) is something I intend to devote
the whole second half of the program to. That means, if all goes well,
this should be all right by the end of this summer.

Michal

Michal Petrucha

May 14, 2011, 11:46:18 AM
to django-d...@googlegroups.com
> > 2) Allow ``CompositeField.unique`` but also keep ``unique_together``.
> > The problem I see with this approach is that there would be two
> > quite different ways to achieve the same effect.
>
> I agree with Javier - I favor option 2. In my mind, although the final
> result at the database level may be the same (a unique index across
> multiple database columns), in conceptual terms at the ORM level it is
> really two different things. There are many cases where I want to
> specify that two fields should be unique together, but they really are
> two separate fields; I'm never going to want to access it as a single
> field or composite value. In this case, specifying a CompositeField
> would confuse the intent and be more verbose than unique_together. I
> think the conceptual distinction is clear, and it will actually be less
> confusing to users to have both options available than to have
> CompositeField become the only way to specify an index on multiple columns.

One point I forgot to mention in the original e-mail is that there
would be an inconsistency: creating a unique index will be possible
using either a unique CompositeField or unique_together whereas a
non-unique index will be possible only with an explicit
CompositeField. At least as far as I know there is currently no option
to create a non-unique index over several columns.
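
For reference, this is roughly what the two kinds of index amount to at
the database level. ``index_sql`` is a hypothetical helper and the index
naming scheme is made up; the sketch uses an in-memory SQLite database
only to show the unique variant being enforced.

```python
import sqlite3


def index_sql(table, columns, unique=False):
    """Return a CREATE INDEX statement covering several columns."""
    kind = "UNIQUE INDEX" if unique else "INDEX"
    name = "%s_%s_idx" % (table, "_".join(columns))
    column_list = ", ".join('"%s"' % c for c in columns)
    return 'CREATE %s "%s" ON "%s" (%s)' % (kind, name, table, column_list)


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "person" ("first_name" TEXT, "last_name" TEXT)')
# Roughly the effect of unique_together or of a unique CompositeField.
conn.execute(index_sql("person", ["first_name", "last_name"], unique=True))
conn.execute('INSERT INTO "person" VALUES (?, ?)', ("Ada", "Lovelace"))
try:
    conn.execute('INSERT INTO "person" VALUES (?, ?)', ("Ada", "Lovelace"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

Calling ``index_sql`` with ``unique=False`` gives the plain multi-column
index that, per the above, only ``db_index`` on a CompositeField would
be able to request.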

> > One minor detail, should the field silently ignore invalid options or
> > should it issue warnings?
>
> Explicit is better than implicit, and errors should never pass silently
> unless explicitly silenced. If the option is invalid, it should not just
> be a warning, it should be an outright failure (though if the check is
> expensive, it could possibly happen in model-validation rather than
> always at runtime).

Thanks for the pointer, model validation looks like the right place
for this.

> I mentioned this in an earlier thread, but I'd really like to see the
> API allow me to specify my own class as the value class, as long as it
> satisfies some basic API requirements. In my mind the long-term goal
> here is that GFKs should be reasonably implementable as a CompositeField
> or a CompositeField subclass without exploiting undocumented internal APIs.
>
> If there are implementation complexities that push this feature out of
> scope for GSoC, that's fine - but I want to make sure we don't make that
> future expansion difficult by design choices we make now.

I'll make it possible to insert custom hooks into the
assignment/retrieval routines which should be sufficient for the
purposes of GFKs.

> [...]
> > I'm also thinking about implementing an abstract class, VirtualField.
> > This could be useful mainly as a base class for fields with no direct
> > database column. That means, it would mainly handle things like
> > add_to_class (adding itself to the list of virtual fields instead of
> > local ones), specifying arbitrary lookup filters when asked for one
> > etc. CompositeField could then be a descendant of this class.
> >
> > However, I can't currently imagine any other use-case for this
> > abstract class than CompositeField. The question is, then, is there
> > any interest in having an abstract mechanism like this? Can anyone
> > imagine a use-case? (The question is, should I implement this
> > functionality directly inside CompositeField or factor it out into
> > something more general?)
>
> I wouldn't spend time on something we don't have any use case in mind
> for (unless making this split makes the code easier to read and
> understand). This is something that most likely could easily be done
> later, if we find we need it.

Fair enough, unless it turns out to be a better approach anyway. (-:

Michal

Luke Plant

May 14, 2011, 12:14:05 PM
to django-d...@googlegroups.com
On 12/05/11 12:41, Michal Petrucha wrote:

> 1) Leave out the ``unique`` option and live with ``unique_together``.
> This would probably imply also leaving out ``db_index``, otherwise
> the API would be a complete mess.
>
> 2) Allow ``CompositeField.unique`` but also keep ``unique_together``.
> The problem I see with this approach is that there would be two
> quite different ways to achieve the same effect.
>
> 3) Make ``CompositeField.unique`` the way to go and deprecate
> ``unique_together``.
> This way, specifying a unique constraint on a tuple of fields would
> work the same way it works on single fields which is IMO a
> significant benefit. There's, however, the issue of breaking
> backwards compatibility. Furthermore, one would have to add a new
> field, albeit virtual, just to create a simple constraint, which
> may seem weird to some.

I'd go with (2), we can easily live with these two different ways to do
something, because, from a given starting point, there is actually only
"one obvious way" to achieve what you want i.e. if you have a composite
field already, there is one obvious way to make it unique, and if you
have two separate fields, there is one obvious way to make them 'unique
together'.

> I'm also thinking about implementing an abstract class, VirtualField.
> This could be useful mainly as a base class for fields with no direct
> database column. That means, it would mainly handle things like
> add_to_class (adding itself to the list of virtual fields instead of
> local ones), specifying arbitrary lookup filters when asked for one
> etc. CompositeField could then be a descendant of this class.
>
> However, I can't currently imagine any other use-case for this
> abstract class than CompositeField. The question is, then, is there
> any interest in having an abstract mechanism like this? Can anyone
> imagine a use-case? (The question is, should I implement this
> functionality directly inside CompositeField or factor it out into
> something more general?)

Feel free to make such a class if it makes development easier, but keep
it private/undocumented until we have at least 2 use cases.

Luke

--
"Procrastination: Hard work often pays off after time, but laziness
always pays off now." (despair.com)

Luke Plant || http://lukeplant.me.uk/

Ian Clelland

May 16, 2011, 2:19:08 PM
to django-d...@googlegroups.com
On Thu, May 12, 2011 at 5:16 PM, Carl Meyer <ca...@oddbird.net> wrote:

> On 05/12/2011 06:41 AM, Michal Petrucha wrote:
> > On Thu, May 12, 2011 at 02:49:03PM +0100, Tom Evans wrote:
> > The value of a CompositeField will be represented by an instance of a
> > CompositeValue class. This will be a descendant of tuple and will
> > resemble namedtuple present in Python >= 2.5. It will support
> > iteration, numbered indexing and access to individual field values
> > using attributes corresponding to underlying field names. The order of
> > values will be the same as the order of fields specified in the model
> > definition.
>
> Yes, Tom is right of course - now that Python 2.5 is minimal version, we
> can just use namedtuple.

As far as I can tell, namedtuple was added in Python 2.6, not 2.5, so a compatibility class may still be necessary.


--
Regards,
Ian Clelland
<clel...@gmail.com>

akaariai

May 17, 2011, 5:05:10 AM
to Django developers
On May 12, 2:41 pm, Michal Petrucha <michal.petru...@ksp.sk> wrote:
> Due to the nature of this field type, other lookup filters than
> ``exact`` and ``in`` would have unclear semantics and won't be
> supported. The original plan was to also exclude support for ``in``
> but as it turns out, ``in`` is used in several places under the
> assumption that primary keys support it, for example DeleteQuery
> or UpdateQuery. Therefore both filters will be implemented.

I wonder how to implement __in lookups in SQLite3. SQLite3 doesn't
support where (col1, col2) in ((val3, val4),(val5, val6)). But other
DBs do (at least MySQL, Oracle and PostgreSQL). I do not know what
would be the best way to write something equivalent in SQLite3. The
obvious choice is to rewrite it as an OR lookup (as mentioned in the
full proposal). Maybe write it as an OR lookup for every DB for the
initial patch, and later on this can be improved to have per database
handling. In lookups with subselects are a harder problem. Those would
need to be rewritten as joined subselects with a distinct clause. [1]
Not in lookups could be still harder due to weird null handling. (1
not in (null) -> Unknown). [2]

I hope there will be an easy solution to this problem, as this feature
is something which would be really, really valuable for Django (no more
telling DBAs: by the way, no composite foreign keys...). One simple
solution would be to disallow __in lookups with subselects (or run the
subselects separately) and use OR lookups when given a list of values.
This should be relatively easy to implement and could be improved
later on.

- Anssi

[1] http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:953229842074
[2] http://asktom.oracle.com/pls/asktom/f?p=100:11:1089369944141559::::P11_QUESTION_ID:442029737684

Michal Petrucha

May 17, 2011, 10:32:54 AM
to django-d...@googlegroups.com
On Tue, May 17, 2011 at 02:05:10AM -0700, akaariai wrote:
> On May 12, 2:41 pm, Michal Petrucha <michal.petru...@ksp.sk> wrote:
> > Due to the nature of this field type, other lookup filters than
> > ``exact`` and ``in`` would have unclear semantics and won't be
> > supported. The original plan was to also exclude support for ``in``
> > but as it turns out, ``in`` is used in several places under the
> > assumption that primary keys support it, for example DeleteQuery
> > or UpdateQuery. Therefore both filters will be implemented.
>
> I wonder how to implement __in lookups in SQLite3. SQLite3 doesn't
> support where (col1, col2) in ((val3, val4),(val5, val6)). But other
> DBs do (at least MySQL, Oracle and PostgreSQL). I do not know what
> would be the best way to write something equivalent in SQLite3. The
> obvious choice is to rewrite it as an OR lookup (as mentioned in the
> full proposal). Maybe write it as an OR lookup for every DB for the
> initial patch, and later on this can be improved to have per database
> handling.

You're right, SQLite3 is the troublemaker here and the reason I wanted
to leave these lookups out initially.

Well, this depends on the level at which these lookups will be
handled. The doable, albeit somewhat hacky way is to handle
this when creating the SQL string for IN, recognize composite lookups
and turn them into a disjunction. The more robust and probably
"proper" way would be to delegate this to the database backend. The
backend could then decide whether it wants
    (col1, col2) IN ((val1, val2), (val3, val4))
or
    ((col1 = val1) AND (col2 = val2)) OR ((col1 = val3) AND (col2 = val4))
This would, however, require enhancing the backend interface.

I think I'll go with the first option, the second one would require
even more non-trivial design decisions to be made regarding the
backend interface and I think I have enough on my plate anyway.
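
A minimal sketch of the first option, i.e. turning a composite IN into a
disjunction while building the SQL string. ``composite_in_sql`` is a
hypothetical helper, not part of Django's SQL compiler, and the
``0 = 1`` fallback for an empty list is an assumption of this example.

```python
import sqlite3


def composite_in_sql(columns, rows):
    """Expand a composite IN lookup into an OR of ANDed equalities.

    Returns (sql_fragment, params) using '?' placeholders.
    """
    rows = [tuple(row) for row in rows]
    if not rows:
        # An empty IN list matches nothing.
        return "0 = 1", []
    row_clause = "(" + " AND ".join('("%s" = ?)' % c for c in columns) + ")"
    params = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each row needs one value per column")
        params.extend(row)
    return " OR ".join([row_clause] * len(rows)), params


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col1 INTEGER, col2 INTEGER)")
conn.executemany("INSERT INTO t VALUES (?, ?)", [(1, 2), (3, 4), (5, 6)])
where, params = composite_in_sql(["col1", "col2"], [(1, 2), (5, 6)])
matches = conn.execute(
    "SELECT col1, col2 FROM t WHERE %s ORDER BY col1" % where, params
).fetchall()
```

The generated fragment uses nothing beyond ``=``, ``AND`` and ``OR``, so
it does not depend on backend support for tuple comparison.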

> In lookups with subselects are a harder problem. Those would
> need to be rewritten as joined subselects with a distinct clause. [1]
> Not in lookups could be still harder due to weird null handling. (1
> not in (null) -> Unknown). [2]
>
> I hope there will be an easy solution to this problem, as this feature
> is something which would be really, really valuable for Django (no more
> telling DBAs: by the way, no composite foreign keys...). One simple
> solution would be to disallow __in lookups with subselects (or run the
> subselects separately) and use OR lookups when given a list of values.
> This should be relatively easy to implement and could be improved
> later on.

Uh oh. This is black magic, probably heavily backend-dependent, too. I
can tell for sure that I don't intend to incorporate any subquery
support for composite lookups for now. Trying to do a composite __in
lookup using a subquery will probably just throw an exception for now,
the user will be required to evaluate it himself.

Proper subquery support is something that can be addressed once the
rest of the implementation is stable.

Michal

akaariai

May 17, 2011, 11:15:16 AM
to Django developers
On May 17, 5:32 pm, Michal Petrucha <michal.petru...@ksp.sk> wrote:
> Proper subquery support is something that can be addressed once the
> rest of the implementation is stable.

To me the plan looks very reasonable (both disallowing subqueries and
converting to disjunction form), unless there is some part in the
internals which expects pk__in=qs to work. In that case it could just
be converted to something like:
    if pk is multipart_pk:
        qs = list(qs.values_list('pk_part1', 'pk_part2'))
    continue as now.

In any case, in my opinion pushing as much of this work to later
patches is the way to go. The only question is how much can be pushed
to later patches. I do not know the answer to that, unfortunately...

- Anssi

Michael P. Jung

May 19, 2011, 11:06:33 AM
to django-d...@googlegroups.com
About one year ago I wrote a CompositeField implementation, which I
proposed as a clean way of grouping fields together, while making it
easy to reuse that composition of fields:

https://bitbucket.org/mp/django-composite-field

It's a slightly different approach and is centered around defining a
field which is composed of multiple fields and can be reused in multiple
models.

<bikeshedding>
One could also call that field MultiField, MixinField, GroupField,
FieldSet,...
</bikeshedding>

I proposed this to be added in Django during the DjangoCon Europe
sprints, but sadly it didn't gain any attention. As a result I just
prepared a release on pypi including some documentation and test cases.


I'm very sorry that my comment to this RFC comes so late. I have
subscribed to the django-developers mailing list, but don't read it on a
regular basis.


--mp
