Composite fields

1,050 views
Skip to first unread message

Thomas Stephenson

unread,
Mar 4, 2015, 8:16:37 AM3/4/15
to django-d...@googlegroups.com
Considering the past two proposals I've made here, I doubt I'll get more than an echo chamber effect on this one.

For the past week or so I've spent a bit of time on a feature I've always wanted to see land in django -- composite fields. The tickets have been open in the bug tracker for quite some time (and there's a few related ones, such as multi-column primary keys that can all be killed with the one stone).

The work is available on this branch of my fork of django for the moment -- I haven't opened up a PR yet because there's still some features that are still to be implemented that will be explained below, but I want to give everybody a chance to tell me where I can stick it before I spend *too* much time on it.

So, without further ado, the proposal.


Composite Fields - Implemented

A composite field is an encapsulation of the functionality of a subset of fields on a model. Composite fields can be defined in one of two ways:

1. Statically declared composite fields

A statically declared composite field is defined in the same way a django model is defined. There are two customisable transformation functions, CompositeField.value_to_dict(self, value) and CompositeField.value_from_dict(self, value) which can be used to associate the field with a python object.

All the serialization functions are implemented via the implementations of the subfields.

For example,  

class MoneyField(models.CompositeField):
   currency_code = models.CharacterField(max_length=3)
   amount = models.DecimalField(max_digits=16, decimal_digits=4)

   ## Overriding __init__ can be used to pass field parameters to the subfields

   def value_from_dict(self, value):
       if value is None:
          return None
       return Money(**value)

   def value_to_dict(self, value):
      if value is None:
         return None
      return {attr: getattr(value, attr) for attr in ('currency_code', 'amount')}

2. Inline composite fields.

An inline composite field is declared at the field definition site on the body of a model, by providing the subfields as the 'fields' argument of the CompositeField constructor. There are no transformation parameters available to override when declaring a composite field in this fashion -- the value of the field is always available as a python `dict` as an attribute on the MyModel

class MyModel(models.Model):
    id = models.CompositeField(fields = [
       ('a', models.IntegerField()),
       ('b', models.CharField(max_length=30)
    ], primary_key=True)

This method for defining composite fields has a few drawbacks, but can be useful if the only reason to add the composite field to the model was to implement a unique_together or index_together constraint *

* Although it's still possible to do that directly on class Meta.

3. Null

Setting the value of a multi-column field to NULL is different than setting any of the individual subfields to NULL. But there are cases (e.g. Money) where we would like to be able to set `null=True` on a composite field, but still retain 'NOT NULL' constraints on each of the subfield columns.

To solve this problem, every table which implements a CompositeField will also add an implicit (semi-hidden) `isnull` subfield on the attribute, which keeps track of whether it is the value of the composite field that is null, or any of the particular subfields.


3. Querying.

The syntax for querying over the subfields of a composite field will be familiar to anyone who has queried over a relationship attribute in django.

model.objects.filter(price__currency_code='USD', price__amount__lt=Decimal('50.00')).all()

In addition, composite fields can be queried via EXACT and IN lookups. It is possible to implement custom lookups for specific statically defined fields, but not recommended and not part of the official API.

4. Restrictions

The following restrictions are currently imposed on the use of composite fields. None of these are restrictions that can't be worked around in future extensions, but they're restrictions which considerably simplify both the implementation and API.

- No related fields as a subfield of a composite field
- No nested composite fields
- No inheritance of composite fields (apart from inheriting from CompositeField itself).

5. Changes to the Field API

As discussed in the other thread I posted. I've changed the implementation so that _get_cache_name can still be dependent on the name, but I think using attname is more useful anyway.

Composite Fields -- unimplemented

These features are still not implemented

- multi column primary keys. unique_together and index_together are implemented and adding a primary key constraint should be a similar operation.
- some small issues with multi-table inheritance.
- more test coverage
- proper documentation
- anything that comes out of this thread.


  

Thomas Stephenson

unread,
Mar 4, 2015, 7:43:38 PM3/4/15
to django-d...@googlegroups.com
OK, no need for everyone to shout -- your message is heard loud and
clear. I'll go and find something else to work on.
> --
> You received this message because you are subscribed to the Google Groups
> "Django developers (Contributions to Django itself)" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to django-develop...@googlegroups.com.
> To post to this group, send email to django-d...@googlegroups.com.
> Visit this group at http://groups.google.com/group/django-developers.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/django-developers/ed326cce-6784-429b-869b-f6f66d3c77fd%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.

Curtis Maloney

unread,
Mar 4, 2015, 7:53:04 PM3/4/15
to django-d...@googlegroups.com
Have you reviewed all the existing works on trying to add composite fields?

I know several managed to reach a level of maturity that was almost merge quality, and am sure I heard some of the recent ORM changes would ease support.

On 5 March 2015 at 11:42, Thomas Stephenson <ova...@gmail.com> wrote:
OK, no need for everyone to shout -- your message is heard loud and
clear. I'll go and find something else to work on.


Wow... impatient much?  Not everyone works in your timezone, or to your schedule.  I, for one, was planning to take some time to review your proposal carefully, instead of "first thing in the morning whilst still having my coffee", as I feel it's a topic that deserves careful consideration.

Don't you think it's worth giving everyone a chance to read your email before you give up and move on?  It's only been 11 hours.  Many of us were asleep for most of that.

--
Curtis

 

Josh Smeaton

unread,
Mar 4, 2015, 8:01:33 PM3/4/15
to django-d...@googlegroups.com
Hey Thomas,

You've only waited 11 hours for a response here. Your contributions are definitely welcome, and it's obvious that you've done a lot of work and put in a lot of effort. Personally, I thank you.

The reason you're not getting a response, at this point, is because the number of people able to intelligently comment on your proposal is very small. I can think of a few people that would have some interest, including Ansii, Loic, and probably Marc. I'm interested in the ORM in general, but I really have no opinion on Composite Fields at the moment, so I have nothing to really offer you except some positive wishes.

I know there is a desire to have composite fields and keys. The last brief discussion that I can find is here: https://groups.google.com/forum/#!searchin/django-developers/composite%7Csort:date/django-developers/ww6o-1kvI28/OE_V6e8qlWgJ which calls out a particular branch where work had already commenced. AFAIK, it's also a Google Summer of Code suggested project.

So please, don't be disheartened. As I said, the number of users that could give you feedback is small, and there is a very cyclic pattern of activity amongst those developers. I would encourage you to read previous discussions and look at previous attempts, and continue with your work. It most definitely is wanted. Someone will be around, eventually, to offer help and guidance when needed.

As to your other messages on this mailing list that haven't seen a lot of activity, I wouldn't take that too personally. Open up a ticket on trac, and let the traditional triaging take place to let you know if you're on the right track or not. Feedback on the ML is usually reserved for ideas or features that are contentious, though that's not always the case. It's obvious that you have an interest in improving Django, and that you have the skill and desire to make that happen. I really do hope you continue, despite the lack of activity on the ML at the moment.

All the best :)

Josh

Thomas Stephenson

unread,
Mar 4, 2015, 9:59:44 PM3/4/15
to django-d...@googlegroups.com
Hey there, 

Yeah, I've looked through some (probably not all) of the previous proposals to support composite fields. I was going to wait until some more work had been done to split concrete fields from virtual fields (ticket 16508), because I thought that would be an ORM change that would have drastically simplified things, but in the end I've found that I was able to keep the implementation contained without it.

Sorry if I sounded impatient -- it was just a bit of humour to kickstart the conversation, since my previous thread (about a couple of small changes to the field API I'd like to make to support this proposal) had been languishing for four days without a single response.

Thomas

Michael Manfre

unread,
Mar 4, 2015, 10:02:48 PM3/4/15
to django-d...@googlegroups.com
As the others have already stated, patience is required if you want meaningful feedback.

Have you thought about how CompositeFields and SubFields will interact with migrations?

Regards,
Michael Manfre

--

Josh Smeaton

unread,
Mar 4, 2015, 10:21:28 PM3/4/15
to django-d...@googlegroups.com
With regards to the field API - that probably doesn't need an ML discussion. Open a ticket on Trac, submit a PR, and the people that will want to see it will review it. If there's confusion about how to proceed, then an ML discussion can be had, when there are actually interested parties involved. I imagine Russ and Piros (if he's still kicking around, I think he is) will have some comments once there's something concrete to review.

2 cents

Thomas Stephenson

unread,
Mar 4, 2015, 10:39:51 PM3/4/15
to django-d...@googlegroups.com
> Have you thought about how CompositeFields and SubFields will interact with migrations?

I've given it some though, but at the moment I've just hacked out the one point in the code where it makes a difference, marked it with a FIXME and was going to return to it after I could store and query values etc. The subfield is just a wrapper class for another field, so it just requires the composite field to be reconstructed. 

The biggest complication is that a composite field has either:
- An argument which contains fields which need to be reconstructed; or
- The possibility of providing extra parameters to it's subfields.

Thomas

Loïc Bistuer

unread,
Mar 4, 2015, 10:47:51 PM3/4/15
to django-d...@googlegroups.com
Hi Thomas,

fields/related.py is begging for a refactor of the existing relational fields, and ideally we'd build composite fields on top of this refactor.

The ORM has seen huge changes in the recent past: migrations, expressions, lookups, and last but not least a *public* footprint in Model._meta, each having both positive and negative (mostly backwards compatibility concerns) implications when it comes to redesigning relations. All of these didn't exist at the time of the previous two SoC attempts.

Taking a step back and looking at the problem from this new perspective takes time, and in the last couple of months both Anssi and I have been experimenting with a few ideas. Anssi has dedicated a lot of time to it in the last few weeks, let's wait for his feedback. Best case scenario, his design and implementation work and we don't have a SoC project anymore; if he can't follow through maybe he'll be interested in mentoring yet another SoC on this issue.

--
Loïc
> To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CA%2Bm8oA-6MxpsQXRFE4pMB2t8sBWjrj%3DGMDULkYFsui%2B66TwM%3DA%40mail.gmail.com.

Thomas Stephenson

unread,
Mar 4, 2015, 11:21:49 PM3/4/15
to django-d...@googlegroups.com
Thanks for that info. Just to be clear, I'm not interested in SoC grants or mentoring, I'm mostly writing this stuff for my own concerns (aimed at improving our internal use of django by contributing changes we want to see upstream). I've fought hard with management to be in a position to devote a portion of my time to open source projects we use in-house, so I'm getting paid for whatever contributions I make (assuming I can make any).

We mainly use the ORM in conjunction with the django-rest-framework, we're not interested in admin, forms, templating etc. so that's where most of my changes will be directed.

I'd be happy to put this work aside to help with any refactor of the fields/related.py module then come back to this at a later point, but I think my implementation (so far) is reasonably sound and would integrate well with whatever work that is going on with Relations.

Thomas 

Anssi Kääriäinen

unread,
Mar 5, 2015, 6:30:19 AM3/5/15
to django-d...@googlegroups.com
I've started doing some refactorings to the fields/related.py. The ultimate goal is to have field.rel (ForeignObjectRel) instances to be Field subclasses.

I first went ahead and did exactly this with the idea of changing everything in one go. Turns out this was a bad idea, after around 1700 lines changed everything was broken and there were multiple hard to debug failures.

I did a fresh start, and my plan is now to do the following:
  - First, make ForeignObjectRel to act like a Field instance (part of this is done in https://github.com/django/django/pull/4241)
  - Make ForeignObjectRel Field subclass. This will likely rename the classes to something like ReverseForeignKey, ReverseManyToMany and so on.
  - Finally, add the new reverse field instances directly to the remote model's _meta

This is just clean-up in the fields/related.py. The composite fields work doesn't need to rely on this. To get to a state where we have composite primary keys and composite joins we should:
  - Split ForeignKey to a concrete field instance, and a related field instance. After this all related fields are purely virtual. This means that author = models.ForeignKey(Author) will automatically generate a author_id = IntegerField() on the model. Unfortunately this also means model._meta will now contains two fields for each foreign key defined on the model instead of just one.
  - Add composite fields (but not yet composite foreign keys or primary keys)
  - Add composite primary keys
  - Add composite foreign keys

Addition of composite fields can be done at the same time with changes to fields/related.py, so it should be possible to start working on composite fields right away.

Michal Petrucha did a lot of work to add composite fields to Django. The syntax he had was:

class MyModel(models.Model):
    x = models.IntegerField()
    y = models.IntegerField()
    point = models.CompositeField(x, y)

I think we should stick to that.

It is essential that we don't try to do too much in one go. Even small changes tend to be hard to do. Trying to do all this in one go will result in a patch that will be nearly impossible to review, and which will conflict badly with other changes once finished.

If you want to work on the composite fields part of this, I suggest you should take a close look at what was done in Petrucha's GSoC work (https://github.com/koniiiik/django/, unfortunately I don't recall which branch contains the latest code). The code got to a point where it was pretty much commit ready. The commit didn't happen because migrations were getting ready for merge at the same time, and committing Petrucha's work would have caused severe problems for migrations.

The first part in the composite fields work should be making the point field example to work. This will mean supporting .filter(point__in=((1, 2), (2, 3))), and support for .values('point'). Both of these will be surprisingly complex to do correctly. In addition there will likely be a lot of work to do in other parts of Django, too (for example in migrations), so implementing just "simple" composite fields will be a lot of work.

I like the MoneyField idea, but lets punt composite field definitions in that way for later.

 - Anssi

Thomas Stephenson

unread,
Mar 5, 2015, 9:47:49 AM3/5/15
to django-d...@googlegroups.com
 > Turns out this was a bad idea, after around 1700 lines changed everything was broken and there were multiple hard to debug failures.

Yeah, been in this situation too many times to count. 


> Split ForeignKey to a concrete field instance, and a related field instance. After this all related fields are purely virtual. This means that author = models.ForeignKey(Author) will automatically generate a author_id = IntegerField() on the model. Unfortunately this also means model._meta will now contains two fields for each foreign key defined on the model instead of just one.

The foreign_key_id field could be a subfield of foreign key (then model._meta would not contain two entries for the foreign key). Unfortunately that would open the door to composite field inheritance, but it could be handled like enum inheritance is handled in python -- you can subclass as much as you want, but you can't declare additional fields on subclasses.

If that happened, then composite foreign keys would be a lot easier (but still work that can be deferred). 


Michal Petrucha did a lot of work to add composite fields to Django. The syntax he had was:

I'm not exactly a fan of that syntax. It works for the unique_together and index_together use cases (and the primary key use case), but it puts all the subfields on model._meta and doesn't allow you encapsulate the behaviour of composite fields outside the model definition. So you can't, for example, define a reusable `MoneyField` that represents two columns in the target model.


The first part in the composite fields work should be making the point field example to work. This will mean supporting .filter(point__in=((1, 2), (2, 3))), and support for .values('point'). Both of these will be surprisingly complex to do correctly. In addition there will likely be a lot of work to do in other parts of Django, too (for example in migrations), so implementing just "simple" composite fields will be a lot of work.

Well, I've already got that working (well, I've got point__exact working and I can add point__in easily enough, it's just a matter of adding the relevant lookup transformations to get_lookup_transform. There were some comments surrounding that function which suggest it needs a refactoring, but I don't think it does.

Thomas

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Marc Tamlyn

unread,
Mar 5, 2015, 9:58:26 AM3/5/15
to django-d...@googlegroups.com
I think `Model._meta` may well always want all the concrete underlying fields included somehow, especially for migrations work. A possibility for your external `MoneyField` could be to add other fields using `contribute_to_class`, though this may be a bad idea...

Thomas Stephenson

unread,
Mar 5, 2015, 10:07:09 AM3/5/15
to django-d...@googlegroups.com
Well, I was thinking that although the subfields would still be added to Model._meta.fields, you could

- add an 'include_subfields=False' default argument to get_fields()
- only allow direct lookups of subfields via Model._meta.get_field(composite).get_subfield(subfield) or (alternately, depending on people's tastes) Model._meta.get_field(composite__subfield)

Thomas

Markus Holtermann

unread,
Mar 5, 2015, 11:13:09 AM3/5/15
to django-d...@googlegroups.com
While I like your approach, Thomas, I should also note that I'm not an expert on those parts of Django and haven't kept up with the Michal's proposal.

With respect to migrations I like Anssi's approach to have both `author` and `author_id` in _meta.fields. It will probably simplify a couple of things and we can get rid of some workarounds.

Furthermore, I suggest that we take the time and proceed with a DEP (https://github.com/django/deps) to find common ground what the API should look like before we rush a concrete implementation.

/Markus

Thomas Stephenson

unread,
Mar 6, 2015, 1:17:13 AM3/6/15
to django-d...@googlegroups.com
I'll reiterate that I mispoke when I said that "subfields wouldn't be added to model._meta". They would be, it's just that the contributor would be the CompositeField, not the model. Subfields are just wrappers for other field types and need a reference to the composite field. The difference that I was trying to make when I said "would not be available on model._meta" is that they wouldn't be available via the public API without passing special flags. 

I'm all for creating a DEP for the proposal, it seems like a better place for nutting out these issues than a mailing list thread. What's the current index into the deps? I'm guessing that it's 191.

Thomas

Anssi Kääriäinen

unread,
Mar 6, 2015, 2:49:22 AM3/6/15
to django-d...@googlegroups.com
I think we can separate composite field implementation into two steps, first implement composite fields for user defined existing fields, then make it possible for fields to auto-create its subfields.

I'm not sure of the meta issue. It seems in some cases you'll want access to the subfields, especially if foreign keys are going to be split using this same idea.

+1 for DEP, this is exactly the kind of issue where writing a DEP will be usefull, and getting an acceptance for the proposal will make implementation easier.

  - Anssi

Thomas Stephenson

unread,
Mar 6, 2015, 5:52:56 AM3/6/15
to django-d...@googlegroups.com
Y'OK. I'll create DEP 191 for the composite fields which point to model fields defined on the model (using the syntax you provided above) and DEP 192 for composite fields which define their own subfields (using the model syntax). 

I was getting confused trying to create a specification which covers all the implementation details (so far) of both cases anyway.

Thomas Stephenson

unread,
Mar 7, 2015, 2:13:19 AM3/7/15
to django-d...@googlegroups.com
PR created. Unfortunately that's all the work I'm going to do on this over the weekend, so I'll come back and review comments on monday. Some of the reST syntax is slightly off. I will fix that up monday -- mainly just posting it so people have something to review, if they're that way inclined.

Thomas
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscribe@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscribe@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-developers+unsubscribe@googlegroups.com.
To post to this group, send email to django-developers@googlegroups.com.

Thomas Stephenson

unread,
Mar 7, 2015, 2:13:58 AM3/7/15
to django-d...@googlegroups.com
Whoops -- would be useful to provide a link

Aymeric Augustin

unread,
Mar 7, 2015, 6:31:53 AM3/7/15
to django-d...@googlegroups.com
Hello Thomas,

It’s hard for me to digest a two-page-long email on a complex topic during
the week. I bookmarked your first email when it came in. It’s Saturday, 11am,
and I dedicated my first chunk of quality brain time to reading the entire
thread. I’ll let you ponder what the effect of your second email was.

In fact, if you propose something stupid, you’ll get a quick answer, because
it’s easy to explain. Not getting answers right now means that your proposal
is good enough to require consideration. Yes, that’s counter-intuitive.

With that out of the way, here are my thoughts on your proposal. Some overlap
with what other people have said in the thread.

1) I would find it reassuring if you described the landscape of past attempts,
what good ideas you’re keeping, what bad ideas you’re throwing away — in
short, if you convinced us that you’re building on top of past attempts. The
main goal of this exercise is to guarantee that you don’t overlook a use case
or an argument that made consensus in the past. (Don’t spend too much time on
code; it my experience it’s harder to reuse and code matters much less than
design.

2) The syntax for inline composite fields doesn't look very nice. Could you
simplify it somehow? Anssi’s proposal is good. I assume that a composite field
could add the subfields to the model class if they aren't defined explicitly
and their names passed in arguments to the composite field.

3) Have you checked that all the Field APIs make sense for CompositeField?
It's quite obvious that value_from/to_dict are needed. It's less obvious that
everything else is still needed.

4) I'm wary of the extra 'isnull' column. Couldn't we required that, if the
composite field is NULL, at least one of the subfields is NULL? My idea is
that a nullable composite field would consider itself NULL if all nullable
subfields are NULL. Declaring a composite subfield nullable when all subfields
are non-nullable would be an error. In your MoneyField example, the amount
subfield should be nullable when the composite field is nullable.

5) I understand the first two restrictions. They required deeper refactorings
to be lifted. The reason for the third one is less clear. Is it just a matter
of beating the metaclass into cooperation?

Best,

--
Aymeric.

Thomas Stephenson

unread,
Mar 7, 2015, 11:16:12 PM3/7/15
to django-d...@googlegroups.com
Aymeric,

Thanks for your input. I feel some of your concerns have been addressed in the DEPs I made, which have included quite a bit of input from this thread along with the original design. That said, some of the points you've raised are new and haven't been raised by other people, so I'll give you a full reply.

1) That's still something I have to do, but more time consuming than other parts of creating the proposal and I've put it off until I have a while to do it. The original implementation was done without too much consideration of prior attempts, because I was more interested in getting something working than I was in learning from the past.

2) I agree. The new syntax I've proposed is to add one or more class-level methods to the django.db.models API. The new syntax for "inline" fields is:


class MyModel(models.Model):
    x = models.IntegerField()
    y = models.IntegerField()
    point = models.constrain(x, y, unique=True)

The reason for not using the CompositeField constructor is that a lot of the options which make sense for standalone field constructors (and all the other field classes) make absolutely no sense when you're mainly leveraging composite fields to provide a table level constraint to two existing fields. Also there are some things that are commonly done in practice (like providing `verbose_name` as a positional arg) that make adding multiple leading positional arguments difficult.

Note: It's a shame that we can't use py3's keyword only arguments here.

3) No, not all the field base API makes sense for composite fields -- in fact most of it doesn't. This is a huge problem with fields in general, not just composite fields -- there's just too much functionality on the basic Field class that assumes a one-to-one mapping between a Field and a database column and too many parameters accepted by field base that only make sense for subsets of the available field types.

e.g. 
- What does "blank" or "max_length" mean for an IntegerField?
- What does 'rel' mean for a NullBooleanField?
- What does get_col mean for a ManyToManyField?
etc.

To fix this would mean introducing significant backwards incompatible changes and, while I would support such a change, I'm not sure it's a discussion that affects the ability to implement composite fields.

4) The addition of the `isnull` field is mainly to support the ability to query for whether a composite field is NULL. If we leave it implementation defined as to how to interpret whether a specific configuration of subfields maps to a python `None` value for the composite field, then it becomes really difficult to define lookup transformations to query for null values in the table. 

But although it sounded like a good idea when I came up with it, it raises more questions than it solves. I'm more than ready to go back to the drawing board on that one.

5) Inheritance in the model API is complicated enough as it is without adding inheritance of field types. Yes, it could be supported by "beating the metaclass into submission", but it's not something that I think adds any particular value. I did change the "no subclassing at all" restriction to "only one class in an inheritance heirarchy can define subfields" when writing the DEP though, because Ansarri wants the ability for users to extend ForeignKey with their own python behaviour. 

Thomas



--
You received this message because you are subscribed to the Google Groups "Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Aron Podrigal

unread,
Mar 7, 2015, 11:36:58 PM3/7/15
to django-d...@googlegroups.com
Hi Thomas,

First, thanks for your work. I've had a plan to start working on this feature last week,
but I got too many things to work on with higher priority, Thankfully you opted for this, again Thanks.

just sharing my thoughts,

1)  I like the MoneyField model style definition.

2) I don't like the inline declaration, I would opt for Anssi's proposal.

3) as Aymeric mentioned, NOT NULL only makes sense for a real column on the database, so that should be only be handled on the SubFields.
Assigning python `None' to the CompositeField should be handled by the value_from_dict method.

4) querying syntax, sounds fare. However depending on how we decide on exposing the subfields through the Model Meta API, 
eg. if the subfields would be returned by default, then we should also allow directly querying of the subfield.

Aron

Aron Podrigal

unread,
Mar 8, 2015, 12:29:06 AM3/8/15
to django-d...@googlegroups.com
Hi Thomas, I replied earlier before you posted, looks like my message got sent actually a lot later.

having a function like  models.constrain(x, y, unique=True) makes sense for inline declarations.

About null handling, If all columns on the database are null=True and all columns have a null value, then the composite field value would be  `None'. Otherwise it would be a dict mapping to the subfields, and a query needs to specify the subfields it queries against.


Aron.


On Saturday, March 7, 2015 at 11:16:12 PM UTC-5, Thomas Stephenson wrote:
Aymeric,

Thanks for your input. I feel some of your concerns have been addressed in the DEPs I made, which have included quite a bit of input from this thread along with the original design. That said, some of the points you've raised are new and haven't been raised by other people, so I'll give you a full reply.

1) That's still something I have to do, but more time consuming than other parts of creating the proposal and I've put it off until I have a while to do it. The original implementation was done without too much consideration of prior attempts, because I was more interested in getting something working than I was in learning from the past.

2) I agree. The new syntax I've proposed is to add one or more class-level methods to the django.db.models API. The new syntax for "inline" fields is:

class MyModel(models.Model):
    x = models.IntegerField()
    y = models.IntegerField()
    point = models.constrain(x, y, unique=True)
good 

Thomas Stephenson

unread,
Mar 8, 2015, 12:13:27 PM3/8/15
to django-d...@googlegroups.com
@Aymeric

Here's why I think the `isnull` column is necessary.

Say you've got a custom field implementation:

class Point(models.CompositeField):

     x = models.IntegerField()
     y = models.IntegerField()
     # etc.

and you use this to define a coordinate system to locate objects in your models. You justify that whenever you provide a point, you'll want both an x and y coordinate, so you make both of the subfields non-null.

But then you come across a model which may or may not have a location. You'd like to say

class MaybeLocatable(models.Model):
   point = Point(null=True)

but you can't, because both of your subfields are non-nullable, so you define a nonsensical condition to apply to your Point 
"If I don't have an x-coordinate, then the point value should be considered null"
so you update the Point field appropriately (and apply all the necessary migrations).

You then want to query for all `MaybeLocatable` objects that are considered null. You can query for point__isnull=True (which is supposed to be supported). How does one define the lookup transformation for that?

If there was an implicit 'isnull' column added, then the lookup transformation for both `point__isnull=True` and `point=None` become a lookup for the database column `point__isnull = True`. If you don't add that extra column, how does the framework what transform to make? It could query for both `point__x__isnull` and `point__y__isnull`, but that wouldn't match the semantics for the column.

And you also can't say "well, look at the field definition and if a column is nullable, then use it as a marker, because what if you instead had:

class Point(models.CompositeField):
   x = models.IntegerField(null=True)
   y = models.IntegerField()
   z = models.IntegerField(null=True)

as your original class? `x` could still be used as the "marker" column, but any transformation you'd make with the above rule. So the framework can't actually define a workable `isnull` query (or an 'exact' query when the python value is `None`). The whole issue of the "marker" column is fraught anyway, because it breaks all the NOT NULL constraints on other columns anyway.



So instead of all that, the user decides that they're going to solve the problem with

class MaybeLocation(models.Model):
   has_point = models.BooleanField()
   point = models.Point(default=Point(0,0))

which is exactly the same solution as including the implicit `isnull` field when the field is created with `null=True`, except the user has to do it explicitly, and implement the logic surrounding the definition themselves.

They also can't define a new CompositeField like

class MaybePoint(...):
   has_point = models.BooleanField()
   point = Point()

because of the inheritance limitations (which I might have to lift in the medium-long term anyway, but which I'd like to preserve for as long as possible to keep the initial implementation and API simple).

The point is (no pun intended) that I can't really think of a good way to map a python None to the values of a composite field _without_ the extra implicit column. It's not the nicest solution in the world, but it works.

Thomas

Thomas Stephenson

unread,
Mar 8, 2015, 12:29:54 PM3/8/15
to django-d...@googlegroups.com
ps. There's a third option which I just thought of:

class Point(...):

   x = models.IntegerField()
   y = models.IntegerField()
   z = models.IntegerField(null=True)

   def __init__(self, null=False, **kwargs):
       if null:
         x.null = y.null = True
      super(Point, self).__init__(**kwargs)

   def deconstruct(self):
      # etc.

The framework could handle this and mandate "If a composite field has `null=True`, then all subfields are forced into the state `null=True` etc.", applying the `__init__` and `deconstruct` stuff itself.

Yep -- I like this solution better.

Thomas
       

Michael Manfre

unread,
Mar 8, 2015, 12:49:32 PM3/8/15
to django-d...@googlegroups.com
On Sun, Mar 8, 2015 at 12:12 PM, Thomas Stephenson <ova...@gmail.com> wrote:
@Aymeric

Here's why I think the `isnull` column is necessary.

Say you've got a custom field implementation:

class Point(models.CompositeField):
     x = models.IntegerField()
     y = models.IntegerField()
     # etc.

and you use this to define a coordinate system to locate objects in your models. You justify that whenever you provide a point, you'll want both an x and y coordinate, so you make both of the subfields non-null.

But then you come across a model which may or may not have a location. You'd like to say

class MaybeLocatable(models.Model):
   point = Point(null=True)

but you can't, because both of your subfields are non-nullable, so you define a nonsensical condition to apply to your Point 
"If I don't have an x-coordinate, then the point value should be considered null"
so you update the Point field appropriately (and apply all the necessary migrations).

If a point can be nullable, then the subfields should be nullable. Enforcing that both x and y have a value should be handled with a constraint and/or validators. I really don't like "_isnull" subfield. It doesn't result in good table design and adds restrictions/complication to the ORM/migrations.
 
You then want to query for all `MaybeLocatable` objects that are considered null. You can query for point__isnull=True (which is supposed to be supported). How does one define the lookup transformation for that?

If there was an implicit 'isnull' column added, then the lookup transformation for both `point__isnull=True` and `point=None` become a lookup for the database column `point__isnull = True`. If you don't add that extra column, how does the framework what transform to make? It could query for both `point__x__isnull` and `point__y__isnull`, but that wouldn't match the semantics for the column.

The ORM would do the translation based upon the Composite field being nullable. E.g. "point__isnull=True" and "point=None" would result in "[point].[x] IS NULL AND [point].[y] IS NULL"
 
And you also can't say "well, look at the field definition and if a column is nullable, then use it as a marker, because what if you instead had:

class Point(models.CompositeField):
   x = models.IntegerField(null=True)
   y = models.IntegerField()
   z = models.IntegerField(null=True)

as your original class? `x` could still be used as the "marker" column, but any transformation you'd make with the above rule. So the framework can't actually define a workable `isnull` query (or an 'exact' query when the python value is `None`). The whole issue of the "marker" column is fraught anyway, because it breaks all the NOT NULL constraints on other columns anyway.


If the composite field is nullable, then all subfields should be nullable. We can add checks to identify any invalid configurations.
 
So instead of all that, the user decides that they're going to solve the problem with

class MaybeLocation(models.Model):
   has_point = models.BooleanField()
   point = models.Point(default=Point(0,0))

which is exactly the same solution as including the implicit `isnull` field when the field is created with `null=True`, except the user has to do it explicitly, and implement the logic surrounding the definition themselves.

The more I think about it, my opinion is shifting solidly toward -1 on implicit database fields in general. I can't think of any situations when the benefit outweighs the extra complexity and standard support issues when things are implicit. There are a lot of negatives to saving 1 line of code in a model (or composite field) definition).

Your MaybeLocation example also imposes an extra database column on the user for each nullable composite field they define, even if it may be redundant for their data. For the below example, Publication would have three implicit "isnull" fields added, even if "status" defines the business rules about when deliverable, report, and poster fields are required.

class Publication(models.Modle):
    status = model.CharField(max_length=20)
    deliverable = MyDeliverableCompositeField(null=True)
    report = MyReportCompositeField(null=True)
    poster = MyPosterCompositeField(null=True)
   
 
They also can't define a new CompositeField like

class MaybePoint(...):
   has_point = models.BooleanField()
   point = Point()

because of the inheritance limitations (which I might have to lift in the medium-long term anyway, but which I'd like to preserve for as long as possible to keep the initial implementation and API simple).

The point is (no pun intended) that I can't really think of a good way to map a python None to the values of a composite field _without_ the extra implicit column. It's not the nicest solution in the world, but it works.

I'd argue that not being able to have a hierarchy of Composite fields is a feature.

Regards,
Michael Manfre 

Michael Manfre

unread,
Mar 8, 2015, 12:53:08 PM3/8/15
to django-d...@googlegroups.com
-1 on silently changing the field definition. At best this would result in an unnecessary migration when the error is discovered. Django should error out with a message that the composite field definition is invalid.

Regards,
Michael Manfre

Thomas Stephenson

unread,
Mar 9, 2015, 2:07:28 AM3/9/15
to django-d...@googlegroups.com
> It doesn't result in good table design and adds restrictions/complication to the ORM/migrations.

Well, honestly, making all subfields always nullable (whether it is or not) in order to support the composite field being potentially nullable doesn't really result in good table design either.

Enforcing that both x and y have a value should be handled with a constraint and/or validators. 

There is _no_ problem that is better handled with validators than an equivalent database constraint unless you assume that the framework is the only client your database will ever have. 

Also, I'm not sure where the "constraint" is coming from, since we've removed the possibility of providing a NOT NULL constraint on the column.

-1 on silently changing the field definition. At best this would result in an unnecessary migration when the error is discovered. Django should error out with a message that the composite field definition is invalid.

Changing the subfield definitions in the composite field constructor has to be supported, there's no way of passing parameters to subfields if you don't. What mistake is there to be found? Either you want the composite field to be nullable or you don't. You're being explicit about it by passing `null=True` into the field definition and it's a standard field parameter.



But I'm happy for that to be the final word on the matter and I'll remove it from the spec. I haven't found a single person who thinks that supporting it is a good idea, so that's probably a good indication that it isn't. 

Anssi Kääriäinen

unread,
Mar 9, 2015, 2:44:24 AM3/9/15
to django-d...@googlegroups.com
On Mon, Mar 9, 2015 at 8:06 AM, Thomas Stephenson <ova...@gmail.com> wrote:
>> It doesn't result in good table design and adds restrictions/complication
>> to the ORM/migrations.
>
> Well, honestly, making all subfields always nullable (whether it is or not)
> in order to support the composite field being potentially nullable doesn't
> really result in good table design either.
>
>> Enforcing that both x and y have a value should be handled with a
>> constraint and/or validators.
>
> There is _no_ problem that is better handled with validators than an
> equivalent database constraint unless you assume that the framework is the
> only client your database will ever have.

You'll likely need both. The database constraint is there to catch all
errors, whether generated by the framework or some other client. The
validator will supply nice user-facing error messages. This is how
unique constraints work for example: there is a model validator that
says "this value already exists in the database", but we also have
database level constraint to make sure there is no way to store
invalid values.

>> -1 on silently changing the field definition. At best this would result in
>> an unnecessary migration when the error is discovered. Django should error
>> out with a message that the composite field definition is invalid.
>
> Changing the subfield definitions in the composite field constructor has to
> be supported, there's no way of passing parameters to subfields if you
> don't. What mistake is there to be found? Either you want the composite
> field to be nullable or you don't. You're being explicit about it by passing
> `null=True` into the field definition and it's a standard field parameter.

If the user supplies the fields, then we don't want to change them
silently. We could have checks framework error for incompatible
settings however. If the field auto-creates the subfields, then it can
alter them in any way it wishes.

Making the composite field null only when all of its subfields are
null seems the right behavior to me. Otherwise you'd have strange
cases where .filter(money__amount=10) is true, but
.filter(money__isnull=True) is also true for the same row.

- Anssi

Thomas Stephenson

unread,
Mar 12, 2015, 11:07:52 PM3/12/15
to django-d...@googlegroups.com
All the null handling stuff has been removed from the specification and replaced with slightly more stringent restrictions on `value_to_dict`. There is an updated version which includes that change, plus a couple more alterations that were discussed here and on the PR available here.

Thomas

--
You received this message because you are subscribed to the Google Groups "Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.

Thomas Stephenson

unread,
Mar 17, 2015, 11:09:49 AM3/17/15
to django-d...@googlegroups.com
Not impatient or anything, but...

Bump.

Asif Saifuddin

unread,
Mar 18, 2015, 12:47:27 AM3/18/15
to django-d...@googlegroups.com
but? IRC discussion?

Thomas Stephenson

unread,
Mar 18, 2015, 4:55:14 AM3/18/15
to django-d...@googlegroups.com
IRC discussion sounds fine with me, but I'd like to be involved. What's the channel?

Thomas

Curtis Maloney

unread,
Mar 18, 2015, 5:28:43 AM3/18/15
to django-d...@googlegroups.com
Dev discussions typically happen on #django-dev on the Freenode IRC network.

--
C


Tim Graham

unread,
Mar 18, 2015, 9:33:14 AM3/18/15
to django-d...@googlegroups.com
Thomas, Please be patient. As for me, I'm focused on finishing up the 1.8 release. I've added composite fields and a link to your DEP to the 1.9 roadmap to help ensure it gets attention for that release cycle: https://code.djangoproject.com/wiki/Version1.9Roadmap

Victor Porton

unread,
Oct 16, 2016, 10:06:24 AM10/16/16
to Django developers (Contributions to Django itself)
I have implemented something like this: https://bitbucket.org/portonv/composite-fields

(I did my implementation before reading my thread.)

Please feel free to use and modify my code.

Well, now I should read this thread and https://github.com/django/deps/blob/master/draft/0191-composite-fields.rst

On Wednesday, March 4, 2015 at 3:16:37 PM UTC+2, Thomas Stephenson wrote:
Considering the past two proposals I've made here, I doubt I'll get more than an echo chamber effect on this one.

For the past week or so I've spent a bit of time on a feature I've always wanted to see land in django -- composite fields. The tickets have been open in the bug tracker for quite some time (and there's a few related ones, such as multi-column primary keys that can all be killed with the one stone).

The work is available on this branch of my fork of django for the moment -- I haven't opened up a PR yet because there's still some features that are still to be implemented that will be explained below, but I want to give everybody a chance to tell me where I can stick it before I spend *too* much time on it.

So, without further ado, the proposal.


Composite Fields - Implemented

A composite field is an encapsulation of the functionality of a subset of fields on a model. Composite fields can be defined in one of two ways:

1. Statically declared composite fields

A statically declared composite field is defined in the same way a django model is defined. There are two customisable transformation functions, CompositeField.value_to_dict(self, value) and CompositeField.value_from_dict(self, value) which can be used to associate the field with a python object.

All the serialization functions are implemented via the implementations of the subfields.

For example,  

class MoneyField(models.CompositeField):
   currency_code = models.CharacterField(max_length=3)
   amount = models.DecimalField(max_digits=16, decimal_digits=4)

   ## Overriding __init__ can be used to pass field parameters to the subfields

   def value_from_dict(self, value):
       if value is None:
          return None
       return Money(**value)

   def value_to_dict(self, value):
      if value is None:
         return None
      return {attr: getattr(value, attr) for attr in ('currency_code', 'amount')}

2. Inline composite fields.

An inline composite field is declared at the field definition site on the body of a model, by providing the subfields as the 'fields' argument of the CompositeField constructor. There are no transformation parameters available to override when declaring a composite field in this fashion -- the value of the field is always available as a python `dict` as an attribute on the MyModel

class MyModel(models.Model):
    id = models.CompositeField(fields = [
       ('a', models.IntegerField()),
       ('b', models.CharField(max_length=30)
    ], primary_key=True)

This method for defining composite fields has a few drawbacks, but can be useful if the only reason to add the composite field to the model was to implement a unique_together or index_together constraint *

* Although it's still possible to do that directly on class Meta.

3. Null

Setting the value of a multi-column field to NULL is different than setting any of the individual subfields to NULL. But there are cases (e.g. Money) where we would like to be able to set `null=True` on a composite field, but still retain 'NOT NULL' constraints on each of the subfield columns.

To solve this problem, every table which implements a CompositeField will also add an implicit (semi-hidden) `isnull` subfield on the attribute, which keeps track of whether it is the value of the composite field that is null, or any of the particular subfields.


3. Querying.

The syntax for querying over the subfields of a composite field will be familiar to anyone who has queried over a relationship attribute in django.

model.objects.filter(price__currency_code='USD', price__amount__lt=Decimal('50.00')).all()

In addition, composite fields can be queried via EXACT and IN lookups. It is possible to implement custom lookups for specific statically defined fields, but not recommended and not part of the official API.

4. Restrictions

The following restrictions are currently imposed on the use of composite fields. None of these are restrictions that can't be worked around in future extensions, but they're restrictions which considerably simplify both the implementation and API.

- No related fields as a subfield of a composite field
- No nested composite fields
- No inheritance of composite fields (apart from inheriting from CompositeField itself).

5. Changes to the Field API

As discussed in the other thread I posted. I've changed the implementation so that _get_cache_name can still be dependent on the name, but I think using attname is more useful anyway.

Composite Fields -- unimplemented

These features are still not implemented

- multi column primary keys. unique_together and index_together are implemented and adding a primary key constraint should be a similar operation.
- some small issues with multi-table inheritance.
- more test coverage
- proper documentation
- anything that comes out of this thread.


  
Reply all
Reply to author
Forward
0 new messages