This lengthy mail is mostly a recapitulation of things mentioned in
the past, like [1], [2], [3] and questions raised in these discussions
that have not been answered so far.
CompositeField will be a new type of model field. This type will be
virtual, i. e. it won't be backed by any real database column by
itself. Instead, it will act as a proxy to a given set of atomic
fields and will be used to set options for and perform operations on
the whole set as one field.
The constructor of a CompositeField will require at least two
positional parameters; each positional parameter will be a single
atomic field. The order of these parameters will be important, as
explained below. The parameters will have to be field instances; lazy
loading won't be necessary (the recommended place for composite field
definitions will be after the atomic fields).
CompositeField will accept these three field options:
- db_index (creates a multi-column index across the underlying fields)
- primary_key (creates a composite primary key in the model)
- unique (creates a unique constraint for the set of fields)
Other field options either wouldn't make sense or would be too
difficult to implement.
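A minimal sketch of the constructor rules described above, written in
plain Python so that it runs without Django. The ``Field`` stand-in and
the ``ALLOWED_OPTIONS`` name are assumptions made for illustration, and
rejecting unsupported options with an exception is only one possible
behaviour, not a settled part of the proposal.

```python
class Field:
    """Stand-in for an atomic model field (not Django's real Field class)."""

    def __init__(self, name):
        self.name = name


class CompositeField:
    """Hypothetical sketch of the proposed virtual field."""

    # The three options listed in the proposal.
    ALLOWED_OPTIONS = {"db_index", "primary_key", "unique"}

    def __init__(self, *fields, **options):
        if len(fields) < 2:
            raise TypeError("CompositeField requires at least two fields")
        for field in fields:
            if not isinstance(field, Field):
                raise TypeError("positional arguments must be field instances")
        unknown = set(options) - self.ALLOWED_OPTIONS
        if unknown:
            raise TypeError(
                "unsupported options: %s" % ", ".join(sorted(unknown))
            )
        self.fields = fields  # the order is significant
        self.options = options


first = Field("first_name")
last = Field("last_name")
full_name = CompositeField(first, last, unique=True)
```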
There is a clash with the current API here, in the ``unique`` option.
This would supersede the current ``unique_together`` Meta option. I
see three options possible:
1) Leave out the ``unique`` option and live with ``unique_together``.
This would probably imply also leaving out ``db_index``, otherwise
the API would be a complete mess.
2) Allow ``CompositeField.unique`` but also keep ``unique_together``.
The problem I see with this approach is that there would be two
quite different ways to achieve the same effect.
3) Make ``CompositeField.unique`` the way to go and deprecate
``unique_together``.
This way, specifying a unique constraint on a tuple of fields would
work the same way it works on single fields which is IMO a
significant benefit. There's, however, the issue of breaking
backwards compatibility. Furthermore, one would have to add a new
field, albeit virtual, just to create a simple constraint, which
may seem weird to some.
I'd rather not make this decision myself; any of the three options is
fine as far as the implementation is concerned.
One minor detail: should the field silently ignore invalid options or
should it issue warnings?
Moving on...
The value of a CompositeField will be represented by an instance of a
CompositeValue class. This will be a descendant of tuple and will
resemble namedtuple present in Python >= 2.5. It will support
iteration, numbered indexing and access to individual field values
using attributes corresponding to underlying field names. The order of
values will be the same as the order of fields specified in the model
definition.
Assigning a value to a CompositeField will be possible using any
iterable as long as its length equals the number of atomic fields (and
the values can be assigned to the corresponding fields, obviously).
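As an illustration of the value semantics described above, here is a
small sketch built on ``collections.namedtuple`` from the standard
library. The field names and the ``to_composite_value`` helper are
hypothetical; the proposal only says that the value class will resemble
a namedtuple.

```python
from collections import namedtuple

# Hypothetical value class for a composite of two atomic fields.
CompositeValue = namedtuple("CompositeValue", ["first_name", "last_name"])


def to_composite_value(value):
    """Coerce any iterable of the right length into a CompositeValue."""
    items = tuple(value)
    if len(items) != len(CompositeValue._fields):
        raise ValueError(
            "expected %d values, got %d"
            % (len(CompositeValue._fields), len(items))
        )
    return CompositeValue(*items)


name = to_composite_value(["Ada", "Lovelace"])
# Iteration, numbered indexing and attribute access all work:
# list(name), name[1], name.first_name
```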
Due to the nature of this field type, lookup filters other than
``exact`` and ``in`` would have unclear semantics and won't be
supported. The original plan was to also exclude support for ``in``,
but as it turns out, ``in`` is used in several places under the
assumption that primary keys support it, for example in DeleteQuery
or UpdateQuery. Therefore both filters will be implemented.
This should be everything as far as the models API is concerned. As
for the other parts of Django, the changes will be kept to a working
minimum.
Forms: Only support in ModelChoiceFields will be added for composite
primary keys; there won't be any special form field type for now.
Admin: Again, only support for composite primary keys will be added in
the quoting/unquoting function to make it possible to access such
models.
GFK: For now, GenericForeignKey won't be able to reference models with
composite primary keys.
I'm also thinking about implementing an abstract class, VirtualField.
This could be useful mainly as a base class for fields with no direct
database column. That means, it would mainly handle things like
add_to_class (adding itself to the list of virtual fields instead of
local ones), specifying arbitrary lookup filters when asked for one
etc. CompositeField could then be a descendant of this class.
However, I can't currently imagine any other use-case for this
abstract class than CompositeField. The question is, then, is there
any interest in having an abstract mechanism like this? Can anyone
imagine a use-case? (The question is, should I implement this
functionality directly inside CompositeField or factor it out into
something more general?)
I'll really appreciate each comment.
Michal Petrucha
[1] https://groups.google.com/forum/#!topic/django-developers/Eg1AHjAvNps
[2] https://groups.google.com/forum/#!topic/django-developers/Y0aAb792cTw/discussion
[3] http://people.ksp.sk/~johnny64/GSoC-full-proposal
Hi Michal
This looks really, really good. A few comments:
Value of a composite field: It should descend from namedtuple. From
1.4 onwards, Django only supports 2.5+, so it's not necessary to fudge
things for Python 2.4
unique/unique_together: They should both be supported. unique_together
should raise a PendingDeprecationWarning, and it should disappear
according to the deprecation timeline. unique_together only exists as
a Meta option as there is no field to attach that logic to - now there
is.
I also have a few questions:
If a model has a composite field that is marked as primary_key=True,
how will this affect instance.pk? Presumably this will now return a
tuple - will this affect automatic URL generation in the admin?
Cheers
Tom
Thanks for the response.
> Value of a composite field: It should descend from namedtuple. From
> 1.4 onwards, Django only supports 2.5+, so it's not necessary to fudge
> things for Python 2.4
Ah, great. That would actually make things easier, namedtuple could be
used directly without any special class factory. At the time of
writing the proposal, IIRC, it wasn't clear 2.4 would be dropped.
> unique/unique_together: They should both be supported. unique_together
> should raise a PendingDeprecationWarning, and it should disappear
> according to the deprecation timeline. unique_together only exists as
> a Meta option as there is no field to attach that logic to - now there
> is.
That's also my point of view, however, there have been objections that
I mentioned (having to name the index). That's why I'd like to hear a
few more opinions.
> I also have a few questions:
>
> If a model has a composite field that is marked as primary_key=True,
> how will this affect instance.pk? Presumably this will now return a
> tuple - will this affect automatic URL generation in the admin?
The instance.pk property will work the same way as it works for any
other field, i. e. it will be an alias for the field with
primary_key=True. In this case, the CompositeField. That means,
retrieval and assignment will happen using iterables.
The admin will handle this by extending quote and unquote -- one
character will be picked as a separator (probably comma) and quoted
inside the atomic values.
Looking at the code, it is already quoted anyway so the only thing
that will change is that for models with composite primary keys this
character will appear in the primary key value.
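A rough sketch of how quoting with a separator could work. This is not
the admin's actual quote/unquote code; the escape format (an underscore
followed by two hex digits) and all function names are assumptions made
for illustration.

```python
SEPARATOR = ","
ESCAPE = "_"


def quote_component(value):
    """Escape the separator and the escape character in one atomic value."""
    out = []
    for ch in str(value):
        if ch in (SEPARATOR, ESCAPE):
            out.append("%s%02X" % (ESCAPE, ord(ch)))
        else:
            out.append(ch)
    return "".join(out)


def unquote_component(text):
    """Reverse quote_component."""
    out = []
    i = 0
    while i < len(text):
        if text[i] == ESCAPE:
            # The two characters after the escape are a hex code point.
            out.append(chr(int(text[i + 1:i + 3], 16)))
            i += 3
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


def quote_pk(values):
    """Join the quoted atomic values of a composite key with the separator."""
    return SEPARATOR.join(quote_component(v) for v in values)


def unquote_pk(text):
    """Split a quoted composite key back into its atomic (string) values."""
    return tuple(unquote_component(part) for part in text.split(SEPARATOR))
```

Because the separator is escaped inside each atomic value, splitting on
it is unambiguous. Note that the values come back as strings, so
converting them to the right Python types would still be up to the
individual fields.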
Michal
while i'm +1 about using unique on the composite field, i'm -0 about
deprecating unique_together. there are times when i want to ensure
composite uniqueness but don't consider those fields as part of a
composite. in those cases, i wouldn't want to define the composite
field just to say it's unique.
yes, i know about the Python Zen on "only one obvious way", but in
this case it seems to be two semantically different things (even if
the generated SQL is the same). for me, readability wins
--
Javier
I'm looking forward to seeing this project take shape! Comments below:
On 05/12/2011 06:41 AM, Michal Petrucha wrote:
[..]
> The constructor of a CompositeField will require at least two
> positional parameters, each positional parameter will be a single
> atomic field. The order of these parameters will be important as
> explained below. The parameters will have to be field instances, lazy
> loading won't be necessary (the recommended place of composite field
> definitions will be after atomic fields).
This sounds fine.
> CompositeField will accept these three field options:
> - db_index (creates a multi-column index across the underlying fields)
> - primary_key (creates a composite primary key in the model)
> - unique (creates a unique constraint for the set of fields)
>
> Other field options either wouldn't make sense or would be too
> difficult to implement.
>
> There is a clash with the current API here, in the ``unique`` option.
> This would supersede the current ``unique_together`` Meta option. I
> see three options possible:
>
> 1) Leave out the ``unique`` option and live with ``unique_together``.
> This would probably imply also leaving out ``db_index``, otherwise
> the API would be a complete mess.
>
> 2) Allow ``CompositeField.unique`` but also keep ``unique_together``.
> The problem I see with this approach is that there would be two
> quite different ways to achieve the same effect.
>
> 3) Make ``CompositeField.unique`` the way to go and deprecate
> ``unique_together``.
> This way, specifying a unique constraint on a tuple of fields would
> work the same way it works on single fields which is IMO a
> significant benefit. There's, however, the issue of breaking
> backwards compatibility. Furthermore, one would have to add a new
> field, albeit virtual, just to create a simple constraint, which
> may seem weird to some.
I agree with Javier - I favor option 2. In my mind, although the final
result at the database level may be the same (a unique index across
multiple database columns), in conceptual terms at the ORM level it is
really two different things. There are many cases where I want to
specify that two fields should be unique together, but they really are
two separate fields; I'm never going to want to access it as a single
field or composite value. In this case, specifying a CompositeField
would confuse the intent and be more verbose than unique_together. I
think the conceptual distinction is clear, and it will actually be less
confusing to users to have both options available than to have
CompositeField become the only way to specify an index on multiple columns.
> One minor detail, should the field silently ignore invalid options or
> should it issue warnings?
Explicit is better than implicit, and errors should never pass silently
unless explicitly silenced. If the option is invalid, it should not just
be a warning, it should be an outright failure (though if the check is
expensive, it could possibly happen in model-validation rather than
always at runtime).
> The value of a CompositeField will be represented by an instance of a
> CompositeValue class. This will be a descendant of tuple and will
> resemble namedtuple present in Python >= 2.5. It will support
> iteration, numbered indexing and access to individual field values
> using attributes corresponding to underlying field names. The order of
> values will be the same as the order of fields specified in the model
> definition.
Yes, Tom is right of course - now that Python 2.5 is the minimum
version, we can just use namedtuple.
> Assigning a value to a CompositeField will be possible using any
> iterable as long as its length equals the number of atomic fields (and
> the values can be assigned to the corresponding fields, obviously).
I mentioned this in an earlier thread, but I'd really like to see the
API allow me to specify my own class as the value class, as long as it
satisfies some basic API requirements. In my mind the long-term goal
here is that GFKs should be reasonably implementable as a CompositeField
or a CompositeField subclass without exploiting undocumented internal APIs.
If there are implementation complexities that push this feature out of
scope for GSoC, that's fine - but I want to make sure we don't make that
future expansion difficult by design choices we make now.
[...]
> I'm also thinking about implementing an abstract class, VirtualField.
> This could be useful mainly as a base class for fields with no direct
> database column. That means, it would mainly handle things like
> add_to_class (adding itself to the list of virtual fields instead of
> local ones), specifying arbitrary lookup filters when asked for one
> etc. CompositeField could then be a descendant of this class.
>
> However, I can't currently imagine any other use-case for this
> abstract class than CompositeField. The question is, then, is there
> any interest in having an abstract mechanism like this? Can anyone
> imagine a use-case? (The question is, should I implement this
> functionality directly inside CompositeField or factor it out into
> something more general?)
I wouldn't spend time on something we don't have any use case in mind
for (unless making this split makes the code easier to read and
understand). This is something that most likely could easily be done
later, if we find we need it.
Carl
Adding support for composite PK targets into related fields (i. e.
ForeignKey, OneToOne and ManyToMany) is something I intend to devote
the whole second half of the program to. That means, if all goes well,
this should be all right by the end of this summer.
Michal
One point I forgot to mention in the original e-mail is that there
would be an inconsistency: creating a unique index will be possible
using either a unique CompositeField or unique_together, whereas a
non-unique index will be possible only with an explicit
CompositeField. At least as far as I know, there is currently no option
to create a non-unique index over several columns.
> > One minor detail, should the field silently ignore invalid options or
> > should it issue warnings?
>
> Explicit is better than implicit, and errors should never pass silently
> unless explicitly silenced. If the option is invalid, it should not just
> be a warning, it should be an outright failure (though if the check is
> expensive, it could possibly happen in model-validation rather than
> always at runtime).
Thanks for the pointer, model validation looks like the right place
for this.
> I mentioned this in an earlier thread, but I'd really like to see the
> API allow me to specify my own class as the value class, as long as it
> satisfies some basic API requirements. In my mind the long-term goal
> here is that GFKs should be reasonably implementable as a CompositeField
> or a CompositeField subclass without exploiting undocumented internal APIs.
>
> If there are implementation complexities that push this feature out of
> scope for GSoC, that's fine - but I want to make sure we don't make that
> future expansion difficult by design choices we make now.
I'll make it possible to insert custom hooks into the
assignment/retrieval routines which should be sufficient for the
purposes of GFKs.
> [...]
> > I'm also thinking about implementing an abstract class, VirtualField.
> > This could be useful mainly as a base class for fields with no direct
> > database column. That means, it would mainly handle things like
> > add_to_class (adding itself to the list of virtual fields instead of
> > local ones), specifying arbitrary lookup filters when asked for one
> > etc. CompositeField could then be a descendant of this class.
> >
> > However, I can't currently imagine any other use-case for this
> > abstract class than CompositeField. The question is, then, is there
> > any interest in having an abstract mechanism like this? Can anyone
> > imagine a use-case? (The question is, should I implement this
> > functionality directly inside CompositeField or factor it out into
> > something more general?)
>
> I wouldn't spend time on something we don't have any use case in mind
> for (unless making this split makes the code easier to read and
> understand). This is something that most likely could easily be done
> later, if we find we need it.
Fair enough, unless it turns out to be a better approach anyway. (-:
Michal
> 1) Leave out the ``unique`` option and live with ``unique_together``.
> This would probably imply also leaving out ``db_index``, otherwise
> the API would be a complete mess.
>
> 2) Allow ``CompositeField.unique`` but also keep ``unique_together``.
> The problem I see with this approach is that there would be two
> quite different ways to achieve the same effect.
>
> 3) Make ``CompositeField.unique`` the way to go and deprecate
> ``unique_together``.
> This way, specifying a unique constraint on a tuple of fields would
> work the same way it works on single fields which is IMO a
> significant benefit. There's, however, the issue of breaking
> backwards compatibility. Furthermore, one would have to add a new
> field, albeit virtual, just to create a simple constraint, which
> may seem weird to some.
I'd go with (2), we can easily live with these two different ways to do
something, because, from a given starting point, there is actually only
"one obvious way" to achieve what you want i.e. if you have a composite
field already, there is one obvious way to make it unique, and if you
have two separate fields, there is one obvious way to make them 'unique
together'.
> I'm also thinking about implementing an abstract class, VirtualField.
> This could be useful mainly as a base class for fields with no direct
> database column. That means, it would mainly handle things like
> add_to_class (adding itself to the list of virtual fields instead of
> local ones), specifying arbitrary lookup filters when asked for one
> etc. CompositeField could then be a descendant of this class.
>
> However, I can't currently imagine any other use-case for this
> abstract class than CompositeField. The question is, then, is there
> any interest in having an abstract mechanism like this? Can anyone
> imagine a use-case? (The question is, should I implement this
> functionality directly inside CompositeField or factor it out into
> something more general?)
Feel free to make such a class if it makes development easier, but keep
it private/undocumented until we have at least 2 use cases.
Luke
--
"Procrastination: Hard work often pays off after time, but laziness
always pays off now." (despair.com)
Luke Plant || http://lukeplant.me.uk/
On 05/12/2011 06:41 AM, Michal Petrucha wrote:
> On Thu, May 12, 2011 at 02:49:03PM +0100, Tom Evans wrote:
> The value of a CompositeField will be represented by an instance of a
> CompositeValue class. This will be a descendant of tuple and will
> resemble namedtuple present in Python >= 2.5. It will support
> iteration, numbered indexing and access to individual field values
> using attributes corresponding to underlying field names. The order of
> values will be the same as the order of fields specified in the model
> definition.
Yes, Tom is right of course - now that Python 2.5 is minimal version, we
can just use namedtuple.
You're right, SQLite3 is the troublemaker here and the reason I wanted
to leave these lookups out initially.
Well, this depends on the level at which these lookups will be
handled. The doable, albeit somewhat hacky way is to handle
this when creating the SQL string for IN, recognize composite lookups
and turn them into a disjunction. The more robust and probably
"proper" way would be to delegate this to the database backend. The
backend could then decide whether it wants
(col1, col2) IN ((val1, val2), (val3, val4))
or
((col1 = val1) AND (col2 = val2)) OR ((col1 = val3) AND (col2 = val4))
This would, however, require enhancing the backend interface.
I think I'll go with the first option; the second one would require
even more non-trivial design decisions to be made regarding the
backend interface, and I think I have enough on my plate anyway.
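To make the two SQL forms above concrete, here is a small sketch of a
helper that builds either one with placeholders. The function name and
the ``supports_tuple_in`` flag are hypothetical, and ``%s`` is used only
as an example placeholder style; this is not how Django's query
compiler is structured.

```python
def composite_in_sql(columns, rows, supports_tuple_in=True):
    """Return (sql, params) for a composite IN lookup over `columns`."""
    rows = [tuple(row) for row in rows]
    if not rows:
        raise ValueError("at least one row of values is required")
    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each row must have one value per column")

    # Parameters are flattened row by row in both variants.
    params = [value for row in rows for value in row]

    if supports_tuple_in:
        # (col1, col2) IN ((%s, %s), (%s, %s))
        lhs = "(%s)" % ", ".join(columns)
        row_placeholder = "(%s)" % ", ".join(["%s"] * len(columns))
        sql = "%s IN (%s)" % (lhs, ", ".join([row_placeholder] * len(rows)))
    else:
        # ((col1 = %s) AND (col2 = %s)) OR ((col1 = %s) AND (col2 = %s))
        conjunction = "(%s)" % " AND ".join(
            "(%s = %%s)" % column for column in columns
        )
        sql = " OR ".join([conjunction] * len(rows))
    return sql, params
```

With ``supports_tuple_in=False`` the helper always produces the
disjunction, which corresponds to the first, simpler approach of
expanding the lookup when the SQL string is created.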
> In lookups with subselects are a harder problem. Those would
> need to be rewritten as joined subselects with a distinct clause. [1]
> Not in lookups could be still harder due to weird null handling. (1
> not in (null) -> Unknown). [2]
>
> I hope there will be an easy solution to this problem, as this feature
> is something which would be really, really valuable for Django (no more
> telling DBAs: by the way, no composite foreign keys...). One simple
> solution would be to disallow __in lookups with subselects (or run the
> subselects separately) and use OR lookups when given a list of values.
> This should be relatively easy to implement and could be improved
> later on.
Uh oh. This is black magic, probably heavily backend-dependent, too. I
can tell for sure that I don't intend to incorporate any subquery
support for composite lookups for now. Trying to do a composite __in
lookup using a subquery will probably just throw an exception for now;
the user will be required to evaluate the subquery themselves.
Proper subquery support is something that can be addressed once the
rest of the implementation is stable.
Michal
https://bitbucket.org/mp/django-composite-field
It's a slightly different approach and is centered around defining a
field which is composed of multiple fields and can be reused in multiple
models.
<bikeshedding>
One could also call that field MultiField, MixinField, GroupField,
FieldSet,...
</bikeshedding>
I proposed this to be added in Django during the DjangoCon Europe
sprints, but sadly it didn't gain any attention. As a result I just
prepared a release on pypi including some documentation and test cases.
I'm very sorry that my comment on this RFC comes so late. I have
subscribed to the django-developers mailing list, but don't read it on a
regular basis.
--mp