GSoC Meta refactor: Bikeshedding time!!

Russell Keith-Magee

Aug 15, 2014, 9:38:30 PM
to Django Developers
Hi all,

tl;dr - Daniel's GSoC is coming to a close; we need some help verifying that we've got our taxonomy correct and API names that make sense.

The long version:

Daniel Pyrathon has been making great progress with his GSoC project to refactor and formalise the _meta object on Django models. [1]

For those who haven't been following along, the project aims to finally document the interface provided by _meta - in particular, the API methods that let you introspect the fields and relations that exist on a model. Despite its importance to tools like admin, it's never been formally listed as a stable API. And over the last 8 years, _meta has also picked up lots of cruft, so there are a few messy internal pieces, duplicated functionality, and so on.

Aside from the benefit of cleaning up and documenting a core piece of Django, this project has the side effect of providing an interface against which others can develop - which means it is possible to develop backends for other data stores that are "Django compliant". Daniel has already demonstrated this with a theoretical "Email Model" wrapper around Google's Gmail API [2]. This is a model that has no connection to Django's Model base class, but quacks enough like a Django model that it can be used in Django's admin, with Django ModelForms, etc. With a little bit of effort, it is now conceivable that SQLAlchemy, MongoDB, and many other data stores could be exposed in a way that they can be viewed in Django's admin, and modified using Django's forms.

We're probably not going to get his refactor committed by the formal end of the GSoC, but we're getting close, and Daniel has said he's interested in continuing beyond the end of the formal GSoC period to get the PR to completion. A huge Thank You goes to Daniel for his excellent efforts over the last few months.

However, now we're at the pointy end, and that means some bike shedding. 

At the core of the _meta API is a set of methods to retrieve the fields on a model - given a model, _meta allows us to query that model and discover all the fields and relations that are associated with that model. Django has lots of different field types, but we've never really formalised nomenclature for some of them. 

After a SoC worth of discussion, Daniel, myself, and various contributors on IRC and Github have ended up with the following taxonomy:

a) "Pure" data fields - things like CharField, IntegerField, etc. Fields that manifest as a single column on the model's underlying table.

b) "Relating" data fields - This means ForeignKey. Fields that manifest as a single column, but represent a relation to another model.

c) "Pure" external fields - Fields that manifest as an external table. Django doesn't really have any examples of these at present. Conceptually, something like a "document" field type in a document-based store might fall into this category.

d) "Relating" external fields - This means ManyToMany fields. Fields that are manifested as an external table, but represent a relation to a different model. 

e) "Pure" virtual fields - Fields that are conceptual wrappers around other fields on the same model. Virtual fields don't have column representations themselves; they are a wrapper around columns provided by other fields. Again, Django doesn't have any of these at present, but it's easy to think of examples of "virtual" fields like Point (a wrapper around an X and Y field). Composite fields would probably fall into this group.

f) "Relating" virtual fields - Fields that are conceptual wrappers around other fields on the same model that represent a relation to another model. Generic Foreign Key is the example here.

g) Related objects - The reverse side of (b) - a field representing all the objects that are related to this model in a singular relation (i.e., the reverse of a FK)

h) Related ManyToMany - The reverse side of (d) - a field representing all the objects that are related to this model in a multiple relation. (i.e., the reverse of a M2M)

i) Related Virtual - The reverse side of (f) - a field representing all the objects that are related to this model through a virtual relation (i.e., a GenericRelation)

So - firstly, we need a sanity check. Does this taxonomy capture all field types that you can think of? Are there any interpretations of composite fields, or any other esoteric field type (existing or imagined) that don't fit in this taxonomy?

Secondly - the hard part - naming. The current API (i.e., Django 1.6 API) is a little confused in relation to this nomenclature:

a) fields
b) No specific name; included in "fields"
c) No matching field type
d) many_to_many
e) No matching field type
f) virtual_fields (even though it's a relating type)
g) get_all_related_objects()
h) get_all_related_many_to_many_objects()
i) Included in virtual_fields (even though it's a reverse type)

So there's variation on whether to have a _fields suffix, whether to use m2m or many_to_many, whether to use attributes or methods, and on basic classification of some field types.

Here's my suggestion for a normalised set of names that match the taxonomy:

a) data_fields
b) foreign_key_fields
c) external_data_fields
d) many_to_many_fields
e) virtual_data_fields
f) virtual_foreign_key_fields
g) related_data_objects
h) related_many_to_many_objects
i) related_virtual_objects

These would all be exposed in two ways:

 * As properties (so, MyModel._meta.data_fields would be a list of type (a) fields on a model), and
 
 * As keyword arguments to get_fields(), by dropping the last part of the name (i.e., MyModel._meta.get_fields(data=True), which would return the same result as MyModel._meta.data_fields). 

I'm obviously preferring the long-form name here (many_to_many vs m2m, etc). This is obviously one point for discussion, where we seek opinions. Any other naming suggestions are also welcome.

The final form of the formal API would then be:

 * get_field(name) returns the field with a given name.

 * get_fields(data=True, foreign_key=True, ...) returns all the fields that match the given flags.

 * A set of optimised and cached properties - data_fields, foreign_key_fields, etc; essentially cached wrappers around calls to get_fields() with specific flags enabled.
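
As a rough illustration of the three pieces listed above, here is a minimal sketch in plain Python. It does not use Django at all: `ToyField`, `ToyOptions`, and the one-category-per-field `category` attribute are hypothetical stand-ins invented for this example, and the real `_meta` implementation may look quite different.

```python
from dataclasses import dataclass
from functools import cached_property


@dataclass(frozen=True)
class ToyField:
    """Hypothetical stand-in for a model field."""
    name: str
    category: str  # exactly one taxonomy bucket, e.g. "data" or "foreign_key"


class ToyOptions:
    """Hypothetical stand-in for a model's _meta object."""

    def __init__(self, fields):
        self._fields = list(fields)

    def get_field(self, name):
        # Return the field with the given name, or raise KeyError.
        for field in self._fields:
            if field.name == name:
                return field
        raise KeyError(name)

    def get_fields(self, **flags):
        # Each keyword names one category; a field is returned when the
        # flag for its category is True.
        wanted = {category for category, on in flags.items() if on}
        return [f for f in self._fields if f.category in wanted]

    # Cached properties: each is a get_fields() call with one flag enabled.
    @cached_property
    def data_fields(self):
        return self.get_fields(data=True)

    @cached_property
    def foreign_key_fields(self):
        return self.get_fields(foreign_key=True)

    @cached_property
    def many_to_many_fields(self):
        return self.get_fields(many_to_many=True)


meta = ToyOptions([
    ToyField("title", "data"),
    ToyField("author", "foreign_key"),
    ToyField("tags", "many_to_many"),
])
```

With this toy object, `meta.data_fields` and `meta.get_fields(data=True)` return equal lists, and passing several flags at once returns the union of the matching categories.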

This API replaces all the other field-related methods on _meta, including get_all_related_objects(), get_concrete_fields_with_model(), get_m2m_with_model(), and so on. Daniel's branch demonstrates that most of these extra methods don't provide any performance benefit; they just complicate the internals.

The "old" API methods can also be entirely implemented using calls to the "new" API; so there's a fully backwards-compatible path for introducing the new API.
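
One way to picture that backwards-compatible path is sketched below. `ToyOptions` is a hypothetical stand-in, not Django code; the flag names `related_data` and `related_many_to_many` are derived from the proposed property names by dropping the last part, and the wrappers ignore any extra arguments the real methods accept.

```python
class ToyOptions:
    """Hypothetical stand-in for _meta, holding field names per category."""

    def __init__(self, fields_by_category):
        self._fields_by_category = dict(fields_by_category)

    def get_fields(self, **flags):
        # "New" API: return the fields of every category whose flag is True.
        result = []
        for category, names in self._fields_by_category.items():
            if flags.get(category, False):
                result.extend(names)
        return result

    # "Old" method names, kept as thin wrappers around the new call.
    def get_all_related_objects(self):
        return self.get_fields(related_data=True)

    def get_all_related_many_to_many_objects(self):
        return self.get_fields(related_many_to_many=True)


meta = ToyOptions({
    "data": ["title"],
    "related_data": ["comment_set"],
    "related_many_to_many": ["bundle_set"],
})
```

Callers of the old names keep working, while all of the selection logic lives in one place.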

Comments welcome. Obviously, this has enormous potential to devolve into bike shedding, so I'd appreciate it if people kept that in mind. If you have a preference for something like short vs long form names, feel free to state it, but please don't let this devolve into arguments over the relative merits of pith over verbosity in API naming. It's much more important that we clarify the matters of substance - i.e., that we have a complete and correct taxonomy - not that we fixate on the names themselves.


Yours,
Russ Magee %-)

Anssi Kääriäinen

Aug 16, 2014, 3:06:51 PM
to django-d...@googlegroups.com
On Saturday, August 16, 2014 4:38:30 AM UTC+3, Russell Keith-Magee wrote:
> b) "Relating" data fields - This means ForeignKey. Fields that manifest as a single column, but represent a relation to another model.

This definition will not work when multicolumn foreign keys are introduced. Especially not with the name foreign_key_fields. This would either mean that relating data fields do contain fields that have more than a single backing column, or that foreign_key_fields do not contain all foreign key fields.

Michal Petrucha's work on virtual fields aims to make ForeignKeys virtual fields - they have one or more backing pure data fields, and then the relation is handled by a virtual field. The work done by him shows that this way works well. The patch was actually close to committable already during 1.7 development, but as it didn't play well with migrations we had to defer it. The point here is that I expect that we will want to make ForeignKeys virtual fields soonish. This doesn't play well with the categorization.
 
> d) "Relating" external fields - This means ManyToMany fields. Fields that are manifested as an external table, but represent a relation to a different model.

Should we define this category as m2m fields? Calling it many_to_many_fields, but defining it as including all external storage fields seems a bit problematic. 
 
> So - firstly, we need a sanity check. Does this taxonomy capture all field types that you can think of? Are there any interpretations of composite fields, or any other esoteric field type (existing or imagined) that don't fit in this taxonomy?

It seems the proposed API places each field in one, and only one, category. Maybe it would be better to have a categorization where fields fall into multiple categories? The categories could be data, virtual, relation, reverse_relation and m2m.

For example, an m2m field would be virtual, related or reverse_related, and of course m2m. In the future a foreign key would create a backing data field. The foreign key itself would be a virtual relation field. The reverse side of the foreign key would be virtual and reverse_related. GenericForeignKey would also be a virtual related field (with two backing data fields).

I don't see how a TranslationField would fit into the above categorization. A TranslationField is defined as a field that gets a single translation from a related translations table. So, it is the reverse side of a foreign key with an additional restriction on language (in effect generating a join condition JOIN article_translations ON article.id = article_translations.article_id AND article_translations.language = 'fi'). At least as defined, this isn't in category g, as it doesn't return all reverse objects of category b. It doesn't fit into any other category either. So, we need some changes to the wording.

As another example, we might someday want to allow fully custom join condition fields. These fields wouldn't be foreign key, external data, or many-to-many fields, nor the reverse of those categories.

> Comments welcome. Obviously, this has enormous potential to devolve into bike shedding, so I'd appreciate it if people kept that in mind. If you have a preference for something like short vs long form names, feel free to state it, but please don't let this devolve into arguments over the relative merits of pith over verbosity in API naming. It's much more important that we clarify the matters of substance - i.e., that we have a complete and correct taxonomy - not that we fixate on the names themselves.

I don't think users actually want to get fields based on the suggested categorization. I feel we get an easier to use and more flexible API if we have higher level categories and allow fields to match multiple categories. As a practical example if I want all relation fields, that is going to be hard using the suggested API. Getting all relation fields is a more realistic use case than getting related virtual objects.

If we want to have all fields match one and only one category, then we need to redefine the categories to make sure ForeignKeys as virtual fields are possible, and that more esoteric custom join based fields fit into the categorization.

BTW where are the github discussions located? I didn't spot them from the referenced PR 2894.

 - Anssi

Shai Berger

Aug 16, 2014, 3:44:53 PM
to django-d...@googlegroups.com
Hi,

It seems to me that the taxonomy doesn't handle FileField and ImageField well.
They could be bundled in with ForeignKey (as the data they really represent is
only pointed at by the related column data), but not with the current wording.

For ImageField, there is -- in addition to the above -- the relation to
height_field and width_field. It would appear to be a mix between a pure field
and a virtual field.

My 2 cents,
Shai.

Russell Keith-Magee

Aug 18, 2014, 12:45:17 AM
to Django Developers
Hi Anssi,

On Sun, Aug 17, 2014 at 3:06 AM, Anssi Kääriäinen <anssi.ka...@thl.fi> wrote:
> On Saturday, August 16, 2014 4:38:30 AM UTC+3, Russell Keith-Magee wrote:
>> b) "Relating" data fields - This means ForeignKey. Fields that manifest as a single column, but represent a relation to another model.

> This definition will not work when multicolumn foreign keys are introduced. Especially not with the name foreign_key_fields. This would either mean that relating data fields do contain fields that have more than a single backing column, or that foreign_key_fields do not contain all foreign key fields.

> Michal Petrucha's work on virtual fields aims to make ForeignKeys virtual fields - they have one or more backing pure data fields, and then the relation is handled by a virtual field. The work done by him shows that this way works well. The patch was actually close to committable already during 1.7 development, but as it didn't play well with migrations we had to defer it. The point here is that I expect that we will want to make ForeignKeys virtual fields soonish. This doesn't play well with the categorization.

Interesting.
  
>> d) "Relating" external fields - This means ManyToMany fields. Fields that are manifested as an external table, but represent a relation to a different model.

> Should we define this category as m2m fields? Calling it many_to_many_fields, but defining it as including all external storage fields seems a bit problematic.

That's exactly what I proposed in my formal naming scheme. I was being deliberately abstract in the descriptions, but in practice, I agree (d) is "many_to_many_fields", and the API should say this, unless someone can think of a good reason why it shouldn't, such as a "relating external field" that isn't an m2m relation.
 
>> So - firstly, we need a sanity check. Does this taxonomy capture all field types that you can think of? Are there any interpretations of composite fields, or any other esoteric field type (existing or imagined) that don't fit in this taxonomy?

> It seems the proposed API places each field in one, and only one, category. Maybe it would be better to have a categorization where fields fall into multiple categories? The categories could be data, virtual, relation, reverse_relation and m2m.

> For example, an m2m field would be virtual, related or reverse_related, and of course m2m. In the future a foreign key would create a backing data field. The foreign key itself would be a virtual relation field. The reverse side of the foreign key would be virtual and reverse_related. GenericForeignKey would also be a virtual related field (with two backing data fields).

I understand what you're driving at here, and I've had similar thoughts over the course of the SoC. The catch is that this makes the API for get_fields() fairly complicated.

If every field fits into one specific type, then get_fields() just requires a single boolean flag (do I include fields of type X) for each field type. We can also easily add new field types by adding new booleans to the API.

However, if a field fits into multiple categories, then it's impossible (or, at least, exceedingly complicated) to make a single call to get_fields() that will specify all your field requirements. "Get me all non-virtual data fields" requires "virtual=False, data=True, m2m=False", but "Get all virtual data fields that represent m2ms" requires "virtual=True, data=False, m2m=True". You can't pass in both sets of arguments at the same time, so you either have to make multiple calls to get_fields(), or you have to invent some sort of query syntax for get_fields() that allows union queries. 
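
The difficulty described above can be sketched with a toy model in which every field carries a set of overlapping traits. Everything here (`ToyField`, the trait names, and the reading of `flag=True` as "must have" and `flag=False` as "must not have") is an assumption made for illustration, not part of any proposed API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyField:
    """Hypothetical field carrying overlapping traits."""
    name: str
    traits: frozenset  # e.g. frozenset({"virtual", "m2m"})


def get_fields(fields, **flags):
    """Keep the fields whose traits agree with every flag given.

    flag=True  -> the field must have the trait
    flag=False -> the field must not have the trait
    (an omitted flag means "don't care")
    """
    return [
        f for f in fields
        if all((trait in f.traits) == wanted for trait, wanted in flags.items())
    ]


FIELDS = [
    ToyField("title", frozenset({"data"})),
    ToyField("tags", frozenset({"virtual", "m2m"})),
    ToyField("location", frozenset({"virtual"})),
]

# Two different questions, each a conjunction of flags.
plain_data = get_fields(FIELDS, virtual=False, data=True, m2m=False)
virtual_m2m = get_fields(FIELDS, virtual=True, data=False, m2m=True)

# Asking for both groups at once needs the union of two calls: with this
# toy data, no single combination of the three flags selects exactly
# "title" and "tags" while leaving "location" out.
both = plain_data + virtual_m2m
```

Avoiding the second call would require some richer query syntax, which is the complication the paragraph above warns about.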

Plus, at the end of the day, get_fields() is abstracted behind highly cached and optimised properties for key lookups. These properties are effectively a cached call to get_fields() with a specific set of arguments - so even if get_fields() doesn't expose a "one category per field" requirement, the API will require, at some level, names that have clear (and preferably non-overlapping) membership.

> I don't see how a TranslationField would fit into the above categorization. A TranslationField is defined as a field that gets a single translation from a related translations table. So, it is the reverse side of a foreign key with an additional restriction on language (in effect generating a join condition JOIN article_translations ON article.id = article_translations.article_id AND article_translations.language = 'fi'). At least as defined, this isn't in category g, as it doesn't return all reverse objects of category b. It doesn't fit into any other category either. So, we need some changes to the wording.

> As another example, we might someday want to allow fully custom join condition fields. These fields wouldn't be foreign key, external data, or many-to-many fields, nor the reverse of those categories.

>> Comments welcome. Obviously, this has enormous potential to devolve into bike shedding, so I'd appreciate it if people kept that in mind. If you have a preference for something like short vs long form names, feel free to state it, but please don't let this devolve into arguments over the relative merits of pith over verbosity in API naming. It's much more important that we clarify the matters of substance - i.e., that we have a complete and correct taxonomy - not that we fixate on the names themselves.

> I don't think users actually want to get fields based on the suggested categorization. I feel we get an easier to use and more flexible API if we have higher level categories and allow fields to match multiple categories. As a practical example if I want all relation fields, that is going to be hard using the suggested API. Getting all relation fields is a more realistic use case than getting related virtual objects.

Quite probably true. As a point of interest, the current (as in, 1.6) API actually doesn't differentiate between category (a) "pure data" and category (b) "relating data (i.e., FK)" fields - if you ask for "data fields" you get pure data *and* foreign keys. So, at least as far as Django's own usage is concerned, you're correct in saying that the taxonomy I've described isn't fully required.

Daniel's survey of internal usage reveals that there are three use cases for getting a list of fields in Django's internal API:

 * Get all data and m2m fields (i.e., categories  a, b, and d). This is effectively "all fields on *this* model"

 * Get all data, m2m, related objects, related m2m, and virtual fields (i.e., categories a, b, d, f, g, h, i - excluding c and e because Django doesn't currently have any fields of this type). This is "all fields on this model, or related to this model"

 * Get all m2m fields (i.e., category d)
 
So - at the very least, we need names to describe those three groups. My intention with describing a richer taxonomy is to try and give names to other groupings of interest. 

> If we want to have all fields match one and only one category, then we need to redefine the categories to make sure ForeignKeys as virtual fields are possible, and that more esoteric custom join based fields fit into the categorization.

Agreed - that's why I threw this out there for discussion :-)

Properties like "data", "virtual", "external", "related", "relating" - these are high level concepts describing the way a field manifests. However, that doesn't mean we need to expose these properties as part of the formal API.

Part of the underlying problem here -- let's say we roll out Django 1.7 with some version of this API, and in 1.8, foreign key fields change to become virtual. That effectively becomes backwards incompatible for queries that are sensitive to a "virtual" flag; but it doesn't change the underlying need to identify that a field is a foreign key. We need to capture the latter use case, but not necessarily the former.
 
> BTW where are the github discussions located? I didn't spot them from the referenced PR 2894.

The discussions on github aren't the best record of the discussions that have been had. They're mostly tied to earlier versions of the patch, and an earlier pull request (the number of which I can't seem to find right now). Unfortunately, most of the productive discussions in this area have been on IRC or voice chat, so there isn't a good archive.

Russ %-)

Russell Keith-Magee

Aug 18, 2014, 12:50:04 AM
to Django Developers
Hi Shai,

I'm not sure this is a problem. 

I agree that conceptually, part of the data for a FileField/ImageField is held "externally"; but it's a different kind of external. From the database's perspective, the record is complete and correct when you're storing a string. The fact that the string represents a file system path is a very significant implementation detail - after all, you need to know to show a file browsing dialog (or whatever UI you want) - but then, the same is true of a date field needing a date picker, and a boolean field needing a checkbox. It doesn't affect the way the database needs to interact with the data it is storing.

However, this might be a manifestation of the sort of problem Anssi raised - that the taxonomy I've suggested is too rich, and that we need to simplify to the practical use cases, rather than try and build a complex and descriptive API.

Russ %-)

Daniel Pyrathon

Aug 18, 2014, 5:10:19 AM
to django-d...@googlegroups.com
Hi All,

First of all, thanks Russell for bringing this discussion up.

Regarding get_fields complication
Throughout the development of this project, I have realised that 90% of the API usage inside and outside of Django can rely entirely on 4 or 5 cached properties.
The most used API calls are:

- get all data fields
- get all m2m fields
- get all related data
- get all related m2m
- get field by name

These 5 are by far the most used endpoints of the API. That said, there is a small set of very necessary endpoints that are called in only a few places, such as:

- get all related data with hidden fields 
- get all related data including proxy relations
- get all data fields that have a column

Some of these have been refactored in-place and are not part of the API any more. Others unfortunately are still subsets of the API, but I personally see very few people (none actually) wanting this information for other uses.
For this reason, as an end-user, you should think of this API only as a set of (cached) properties as most likely you will never need to use the get_fields API directly. To make this a little clearer, I have attached an image of the API.



Regarding FileField
I think this is still a data field. The fact that it stores an image path and its getter/setter does some magic does not change its "identity". ImageField is, based on the definition of virtual so far, a virtual field.

Anssi Kääriäinen

Aug 18, 2014, 6:03:32 AM
to django-d...@googlegroups.com
On Monday, August 18, 2014 7:45:17 AM UTC+3, Russell Keith-Magee wrote:
> I understand what you're driving at here, and I've had similar thoughts over the course of the SoC. The catch is that this makes the API for get_fields() fairly complicated.

> If every field fits into one specific type, then get_fields() just requires a single boolean flag (do I include fields of type X) for each field type. We can also easily add new field types by adding new booleans to the API.

> However, if a field fits into multiple categories, then it's impossible (or, at least, exceedingly complicated) to make a single call to get_fields() that will specify all your field requirements. "Get me all non-virtual data fields" requires "virtual=False, data=True, m2m=False", but "Get all virtual data fields that represent m2ms" requires "virtual=True, data=False, m2m=True". You can't pass in both sets of arguments at the same time, so you either have to make multiple calls to get_fields(), or you have to invent some sort of query syntax for get_fields() that allows union queries.

> Plus, at the end of the day, get_fields() is abstracted behind highly cached and optimised properties for key lookups. These properties are effectively a cached call to get_fields() with a specific set of arguments - so even if get_fields() doesn't expose a "one category per field" requirement, the API will require, at some level, names that have clear (and preferably non-overlapping) membership.

If fields are in multiple categories then users will want to do the full range of set operations on the categories. Encoding that into the API doesn't sound promising.


>> I don't think users actually want to get fields based on the suggested categorization. I feel we get an easier to use and more flexible API if we have higher level categories and allow fields to match multiple categories. As a practical example if I want all relation fields, that is going to be hard using the suggested API. Getting all relation fields is a more realistic use case than getting related virtual objects.

> Quite probably true. As a point of interest, the current (as in, 1.6) API actually doesn't differentiate between category (a) "pure data" and category (b) "relating data (i.e., FK)" fields - if you ask for "data fields" you get pure data *and* foreign keys. So, at least as far as Django's own usage is concerned, you're correct in saying that the taxonomy I've described isn't fully required.

> Daniel's survey of internal usage reveals that there are three use cases for getting a list of fields in Django's internal API:

>  * Get all data and m2m fields (i.e., categories a, b, and d). This is effectively "all fields on *this* model"

>  * Get all data, m2m, related objects, related m2m, and virtual fields (i.e., categories a, b, d, f, g, h, i - excluding c and e because Django doesn't currently have any fields of this type). This is "all fields on this model, or related to this model"

>  * Get all m2m fields (i.e., category d)

> So - at the very least, we need names to describe those three groups. My intention with describing a richer taxonomy is to try and give names to other groupings of interest.

>> If we want to have all fields match one and only one category, then we need to redefine the categories to make sure ForeignKeys as virtual fields are possible, and that more esoteric custom join based fields fit into the categorization.

> Agreed - that's why I threw this out there for discussion :-)

> Properties like "data", "virtual", "external", "related", "relating" - these are high level concepts describing the way a field manifests. However, that doesn't mean we need to expose these properties as part of the formal API.

> Part of the underlying problem here -- let's say we roll out Django 1.7 with some version of this API, and in 1.8, foreign key fields change to become virtual. That effectively becomes backwards incompatible for queries that are sensitive to a "virtual" flag; but it doesn't change the underlying need to identify that a field is a foreign key. We need to capture the latter use case, but not necessarily the former.
 
Could we go with a minimal API for get_fields()? Instead of having categorization on the get_fields() API, we could provide field flags for the categories. With field flags it is straightforward to filter the return list of get_fields(). As an example, fetching those fields which are relations but which aren't virtual: [f for f in get_fields() if f.relational and not f.virtual]. If this path is taken, then I am not sure how minimal the get_fields() API should be. We likely need flags for at least if the field is defined on local, parent or some remote model.
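
The flag-based alternative can be sketched as follows. This is a toy, not Django code: `ToyField`, the example field names, and the `origin` attribute (standing in for "defined on local, parent or some remote model") are all hypothetical, and which fields count as virtual here is only illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyField:
    """Hypothetical field that describes itself through flags."""
    name: str
    relational: bool = False
    virtual: bool = False
    origin: str = "local"  # hypothetical: "local", "parent" or "remote"


def get_fields():
    # Minimal API: return everything and let the caller filter.
    return [
        ToyField("title"),
        ToyField("author", relational=True),
        ToyField("content_object", relational=True, virtual=True),
        ToyField("comment_set", relational=True, virtual=True, origin="remote"),
    ]


# Relations that are not virtual, using the comprehension from the message.
concrete_relations = [f for f in get_fields() if f.relational and not f.virtual]

# Any other combination is just another predicate over the same flags.
local_fields = [f for f in get_fields() if f.origin == "local"]
```

Under this approach the set operations move out of the `get_fields()` signature and into ordinary Python expressions written by the caller.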

As for changing ForeignKey to virtual field plus concrete field representation - I just realized this will be backwards incompatible no matter what we do regarding categorization. A get_fields() call that includes all fields will return separate author (virtual) and author_id (concrete) fields after the split. I am not sure what we can do about this. It would be very unfortunate if we can't refactor the way ForeignKeys work due to the meta API. Any ideas how we can avoid the backwards compatibility trap?
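
To make the compatibility concern concrete, here is a small sketch of what a caller might see before and after such a split. The two functions and the `ToyField` tuple are hypothetical; they only mimic the shape of the result described above, not any real Django behaviour.

```python
from collections import namedtuple

# Hypothetical field record: a name plus whether the field is virtual.
ToyField = namedtuple("ToyField", ["name", "virtual"])


def get_fields_before():
    # One ForeignKey field named "author".
    return [ToyField("id", False), ToyField("author", False)]


def get_fields_after():
    # After the split: a virtual "author" plus a concrete "author_id".
    return [
        ToyField("id", False),
        ToyField("author", True),
        ToyField("author_id", False),
    ]


def field_names(fields):
    # Typical caller code that assumes one entry per declared field.
    return [f.name for f in fields]


before = field_names(get_fields_before())
after = field_names(get_fields_after())
```

Code that counts entries, or that builds one form input per returned field, would behave differently on `after` than on `before`, whatever names the categories are given.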

 - Anssi

Ivan Kharlamov

Aug 18, 2014, 7:33:20 AM
to django-d...@googlegroups.com
On 08/18/2014 02:03 PM, Anssi Kääriäinen wrote:
> As for changing ForeignKey to virtual field plus concrete field
> representation - I just realized this will be backwards incompatible no
> matter what we do regarding categorization. An all-fields including
> get_fields() call will return separate author (virtual) and author_id
> (concrete) fields after the split. I am not sure what we can do about
> this. It would be very unfortunate if we can't refactor the way
> ForeignKeys work due to the meta API. Any ideas how we can avoid the
> backwards compatibility trap?

Excuse me if I misunderstood the issue being discussed, but is it not
viable to (in some situations) filter out the fields that are present in
to_field of ForeignKeys?

Shai Berger

Aug 18, 2014, 7:44:49 AM
to django-d...@googlegroups.com
Hi again,

Below, ">D" are quotations from Daniel's message I'm replying to, and
">R" are from Russell's message that opened this thread.

>D *Regarding FileField*

It took me some time to make clear to myself why FileField is a data field, and not
like FK: The point is not where the data is stored and whether the DB field
contents are "it" or just a pointer -- the point is whether another model is
involved. I think this is a key distinction; wording the taxonomy in these
terms can clarify certain other issues. As an example,

>R a) "Pure" data fields - things like CharField, IntegerField, etc. Fields
>R that manifest as a single column on the model's underlying table.

this wording makes it a little unclear (at least to me) if parent fields fall
into this category (as they do not "manifest as a single column on the model's
underlying table"). But the higher-level language, "fields that store primitive
data on the model instance itself" -- with a better word for "primitive", or a
full explanation on what it means -- is something I find much clearer and more
accurate.

>D ImageField is, based on the definition of virtual so far, a virtual field.

Well, as it is neither related nor relating, the only virtual field it could be is:

>R e) "Pure" virtual fields - Fields that are conceptual wrappers around other
>R fields on the same model. Virtual fields don't have column representations
>R themselves; they are a wrapper around columns provided by other fields.

But ImageField does have its own column representation. In fact, it is not a
wrapper around the width and height fields in the way that, say,
GenericForeignKey is around the content_type and object_id fields -- if you
change these fields, nothing about the image changes. So, yes, I maintain that
it is a different beast.

>
> On Monday, August 18, 2014 6:50:04 AM UTC+2, Russell Keith-Magee wrote:
> >
> > I agree that conceptually, part of the data for a FileField/ImageField is
> > held "externally"; but it's a different kind of external. From the
> > database's perspective, the record is complete and correct when you're
> > storing a string. The fact that the string represents a file system path
> > is a very significant implementation detail - after all, you need to
> > know to show a file browsing dialog (or whatever UI you want) - but
> > then, the same is true of a date field needing a date picker, and a
> > boolean field needing a checkbox. It doesn't affect the way the database
> > needs to interact with the data it is storing.
> >

As I said above, I think focusing on the database here is a red herring. This
is a model-level API, and we should be focusing on the interactions between
models and fields, and between them and other code.

In that spirit, I think some more relevant categories might be

1) "hidden fields" (the parent_ptr?),
2) "read-only fields" (currently I suspect this only applies to DateTime fields
with auto_now=True, but I can imagine calculated aggregation fields),
3) "fields you shouldn't mess with" (id, parent_ptr, image's width_field and
height_field -- you can edit these, but in all probability you don't want to),
4) "fields you should only edit via API" (passwords)

HTH,
Shai.

Daniel Pyrathon

Aug 18, 2014, 7:46:17 AM
to django-d...@googlegroups.com
Hi Anssi

With regard to Michal Petrucha's changes, I have the feeling you are correct. A solution to this could be to provide a cached property called **foreign_keys** that will disguise the actual identity of a ForeignKey. Once Michal's changes are introduced, we could start deprecating the cached property and raise a soft warning announcing the change, just like we are doing with all the other API calls in Options.
That said, if developers decide to go with the get_fields() API, then unfortunately this will not help.

Daniel Pyrathon

Aug 18, 2014, 8:18:19 AM
to django-d...@googlegroups.com
Hi Shai,

Thanks for getting back, so..


On Monday, August 18, 2014 1:44:49 PM UTC+2, Shai Berger wrote:
Hi again,

Below, ">D" are quotations from Daniel's message I'm replying to, and
">R" are from Russell's message that opened this thread.

>D  *Regarding FileField*

It took me some time to clear for myself why FileField is a data field, and not
like FK: The point is not where the data is stored and whether the DB field
contents are "it" or just a pointer -- the point is whether another model is
involved. I think this is a key distinction; wording the taxonomy in these
terms can clarify certain other issues. As an example,

>R  a) "Pure" data fields - things like CharField, IntegerField, etc. Fields
>R  that manifest as a single column on the model's underlying table.

this wording makes it a little unclear (at least to me) if parent fields fall
into this category (as they do not "manifest as a single column on the model's
underlying table"). But the higher-level language, "fields that store primitive
data on the model instance itself" -- with a better word for "primitive", or a
full explanation on what it means -- is something I find much clearer and more
accurate.

Thanks, will change the docs.
 

>D  ImageField is, based on the definition of virtual so far, a virtual field.

Well, as it is not related nor relating, the only virtual field it could be is:

>R  e) "Pure" virtual fields - Fields that are conceptual wrappers around other
>R  fields on the same model. Virtual fields don't have column  representations
>R  themselves; they are a wrapper around columns provided by other fields.

But ImageField does have its own column representation. In fact, it is not a
wrapper around the width and height fields in the way that, say,
GenericForeignKey is around the content_type and object_id fields -- if you
change these fields, nothing about the image changes. So, yes, I maintain that
it is a different beast.
 
That's correct, sorry about that! I am looking at the implementation of ImageField and it looks like the width_field and height_field are optional (is that correct?).

>R "Pure" virtual fields - Fields that are conceptual wrappers around other fields on the same model. Virtual fields don't have column representations themselves; they are a wrapper around columns provided by other fields.

ImageField does have a column representation of its own and it requires migrations, so it looks closer to a data field, but where should we put it? As you correctly say, it's a beast of its own.
Another solution could be to refactor ImageField to make it 100% virtual compatible, but if we do this it also makes sense to refactor a lot of other ambiguities in the codebase.

Collin Anderson

Aug 18, 2014, 10:58:20 AM
to django-d...@googlegroups.com
The goal is to have "API methods that let you introspect the fields and relations that exist on a model", right? Why go through the trouble of finding the one specific type for each field (that we'll never be able to change later)? Why have a get_fields() method with an ever-growing number of kwargs?

I want all "related" fields. Easy:
(f for f in _meta.fields if hasattr(f, 'rel'))

I want all read-only fields. Easy:
(f for f in _meta.fields if not f.editable)

I want all fields that can be edited through a form. Something like:
(f for f in _meta.fields if hasattr(f, 'formfield'))

I want all "local" fields (not that you should care). Easy:
(f for f in _meta.fields if f.model == _meta.model)

I want all fields that have an actual column in the database. Something like
(f for f in _meta.fields if f.db_type())

I want all fields that function like a ManyToMany. Easy:
(f for f in _meta.fields if f.get_internal_type() == 'ManyToManyField')

I want all fields that have ForeignKey.to_field pointing to them. Something like:
set(_meta.get_field(fname) for rel in _meta.related for fname in rel.to_fields)
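
For readers who want to try the style of one-liner above without a Django project, here is a minimal sketch using stand-in objects. `FakeField` and the sample data are hypothetical; they only mimic the attribute names used in the expressions above (`rel`, `editable`, `model`) and are not Django's real field classes, so the real `_meta.fields` may behave differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeField:
    """Hypothetical stand-in for a model field; not Django's Field class."""
    name: str
    model: str                  # name of the model that declares the field
    editable: bool = True
    rel: Optional[str] = None   # name of the related model, if any


# A pretend "_meta.fields" list for a model called "Book".
fields = [
    FakeField("id", model="Book", editable=False),
    FakeField("title", model="Book"),
    FakeField("author", model="Book", rel="Author"),
    FakeField("created", model="Publication", editable=False),
]

# Related fields: test the value, because every FakeField has a `rel` attribute.
related = [f.name for f in fields if f.rel is not None]

# Read-only fields.
read_only = [f.name for f in fields if not f.editable]

# "Local" fields, i.e. declared on Book itself rather than inherited.
local = [f.name for f in fields if f.model == "Book"]

print(related, read_only, local)
```

The filters stay one line each, but every caller has to know which attribute to inspect and how to interpret it.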

Collin Anderson

Aug 18, 2014, 11:12:50 AM
to django-d...@googlegroups.com
Also, I think we should avoid discriminating between "virtual" and non-virtual (as with local vs parent). Why should it matter how a field is stored in the database? I think the distinction will make it harder to use non-relational databases.

It may be helpful to recognize if a field has a "parent" field: width_field.parent == image_field. int_field.parent == composite_field.

Daniel Pyrathon

Aug 18, 2014, 2:18:54 PM
to django-d...@googlegroups.com
Hi Collin,

Thanks for getting back
Ha! I wish it were so easy :( Unfortunately, there are quite a few edge cases that need to be taken into consideration. For example:
 * field.rel: many fields have a rel, but it points to None. There are also cases where rel points to a swapped model, and then we need to do some more work.
 * ManyToMany: do we really want to hard-code this, or do we want an internal component to handle it for us? What happens when NoSQL comes to Django (I really believe in this...)?
get_fields() also gives us the possibility of changing internals without necessarily affecting behaviour. It's that layer of abstraction that I feel is needed.
That said, if we can simplify it and still avoid duplication and provide the level of functionality it's giving us now, then that would be great.
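
The first edge case above (a `rel` attribute that exists but is None) can be illustrated without Django. This is only a sketch: both classes are hypothetical stand-ins, chosen to show that an attribute can exist and still carry no relation. Swapped models need further handling that is not shown here.

```python
class PlainField:
    """Hypothetical field whose `rel` attribute exists but is None."""
    rel = None


class RelationField:
    """Hypothetical field that really points at another model."""
    rel = "Author"


fields = [PlainField(), RelationField()]

# hasattr() is true for both objects, so it cannot tell them apart.
by_hasattr = [type(f).__name__ for f in fields if hasattr(f, "rel")]

# Checking the value filters out the field whose rel is None.
by_value = [
    type(f).__name__ for f in fields if getattr(f, "rel", None) is not None
]

print(by_hasattr)
print(by_value)
```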
 

Russell Keith-Magee

Aug 20, 2014, 4:19:33 AM
to Django Developers
I think Daniel and I might have come up with a way to meet both these requirements - a minimalist API for get_fields, with at least some protection against the known incoming backwards compatibility issue.

The summary so far: it appears that a complex taxonomy isn't especially helpful - firstly, because any complex taxonomy is going to have edge cases that are hard to categorize, but also because a complex taxonomy leads to a much more complex internal API that is going to be prone to backwards compatibility problems.

So - instead of worrying about 'virtual' and other properties like that, let's look at why the _meta API is fundamentally used - to get a list of fields that need to be handled in data processing. This primarily means forms, but other forms of serialisation are also included. In these use cases, there are always going to be per-field differences (even a CharField and an IntegerField require *slightly* different handling), so we won't focus on internal representations, storage mechanisms, or anything like that. Instead, let's focus on cardinality - a field represents some sort of data that has a cardinality with the object on which it is stored. If something has cardinality 1, you can display a single field. If it's cardinality N, you need to display a list, or some sort of inline.

This results in 3 categories that are mutually exclusive:

a) "Data fields": Fields of cardinality 0-1:

 * A CharField stores 0 or 1 strings (0 is the case of a nullable field).

 * An IntegerField stores 0 or 1 integers.

 * A FileField stores 0 or 1 file paths.

 * An ImageField stores 0 or 1 file paths - although in being modified, it might modify some other fields.

 * A ForeignKey stores 0 or 1 references to another object. 

 * A GenericForeignKey stores 0 or 1 references to another object.

 * A notional "DocumentField" on a NoSQL store references 0 or 1 external documents.

b) "ManyToMany Fields": Fields that are locally defined that represent a cardinality 0-N relationship with another object:

 * Many to Many fields store 0-N references to a second model.

c) "Related Objects": Fields that represent a cardinality 0-N relationship with this object, but aren't locally defined:

 * The 'related' side of a ForeignKey

 * The 'related' side of a ManyToMany

 * A GenericRelation representing the reverse side of a GenericForeignKey

These three types are mutually exclusive - you either have cardinality 1 *or* cardinality N, not both; and you're either locally defined on this object or you're not. I can't think of an example of "cardinality 1 data that isn't defined on this object", but it would fit into this taxonomy if it were needed; I also can't think of a field definition that would span models.

In addition to this basic classification, a field can be marked as "hidden". The immediate use for this is to hide the related_name='+' case of a FK or M2M. Looking forward, it would be used to mask fields that exist, but aren't intended to be user visible - for example, in the potential future case where a ForeignKey is split in two, or a Composite Key, there would be a "hidden" integer field (or fields) storing the actual data, and a virtual (but non-hidden) field that is the public API for manipulating the relationship. This would also be backwards compatible, because the "visible" field list hasn't changed.

Fields are also tracked according to their parentage; this is used by tools interacting with inheritance relationships to know which fields are actually on this model, and which are inherited from a base class.

This yields the following formal API for _meta:

 * get_fields(data, many_to_many, related, include_hidden, include_parents)

 * @property data_fields (=> get_fields(data=True, many_to_many=False, related=False, include_hidden=False, include_parents=True))

 * @property many_to_many_fields (=> get_fields(data=False, many_to_many=True, related=False, include_hidden=False, include_parents=True))

 * @property related_objects (=> get_fields(data=False, many_to_many=False, related=True, include_hidden=False, include_parents=True))

Does this sound any more sane as an API?
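
To make the proposed shape concrete, here is a rough sketch of how the flags and the three properties could fit together. It is not the implementation from the PR: `FakeField`, `FakeOptions`, the `kind` attribute and the default argument values are all assumptions made for illustration; only the method, flag and property names come from the proposal above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class FakeField:
    """Hypothetical field record, not a Django class."""
    name: str
    kind: str                # "data", "many_to_many" or "related"
    hidden: bool = False
    inherited: bool = False  # True if the field comes from a parent model


@dataclass
class FakeOptions:
    """Hypothetical stand-in for _meta using the proposed get_fields() flags."""
    all_fields: List[FakeField] = field(default_factory=list)

    def get_fields(self, data=True, many_to_many=False, related=False,
                   include_hidden=False, include_parents=True):
        # Default values are an assumption; the proposal only names the flags.
        requested = (("data", data), ("many_to_many", many_to_many),
                     ("related", related))
        wanted = {kind for kind, enabled in requested if enabled}
        return [
            f for f in self.all_fields
            if f.kind in wanted
            and (include_hidden or not f.hidden)
            and (include_parents or not f.inherited)
        ]

    @property
    def data_fields(self):
        return self.get_fields(data=True, many_to_many=False, related=False,
                               include_hidden=False, include_parents=True)

    @property
    def many_to_many_fields(self):
        return self.get_fields(data=False, many_to_many=True, related=False,
                               include_hidden=False, include_parents=True)

    @property
    def related_objects(self):
        return self.get_fields(data=False, many_to_many=False, related=True,
                               include_hidden=False, include_parents=True)


meta = FakeOptions([
    FakeField("title", "data"),
    FakeField("author", "data"),            # a ForeignKey is cardinality 0-1
    FakeField("tags", "many_to_many"),
    FakeField("review_set", "related"),
    FakeField("secret+", "related", hidden=True),
    FakeField("created", "data", inherited=True),
])

print([f.name for f in meta.data_fields])
print([f.name for f in meta.related_objects])
```

In this sketch each field belongs to exactly one category, while `include_hidden` and `include_parents` only narrow or widen the result.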
 
My one lingering question is whether the "many_to_many" name/category is too explicit. I can conceive how an ArrayField could be considered a data field (it stores 0-1 arrays of data), or a "many_to_many" field (because it stores 0-N instances of some data). This all hinges on whether the definition for that field category is that it is a relationship with another *model*, or if it's just cardinality N data. It's trivial to call it a Data field and just leave it at that, but I'm wondering if there might be benefit in broadening the definition of "many_to_many".

Russ %-)

Marc Tamlyn

Aug 20, 2014, 4:46:37 AM
to django-d...@googlegroups.com
I'd say ArrayField is a straight up data field at the moment. It stores 0-1 lists of data. It's no different to CommaSeparatedIntegerField (seriously, why does that exist...)

*If* PG gets the relevant update that will allow `integer[] references` (i.e. ArrayField(ForeignKey)) then this would be different, and would be more like a m2m field.

There is an argument that it's 0-N anyway, but in the implementation, both within Django and in the database, I don't think the distinction is useful at this point, from an ORM point of view in any case. From a forms point of view it's quite different.



Ivan Kharlamov

Aug 20, 2014, 7:52:54 AM
to django-d...@googlegroups.com
On 08/20/2014 12:46 PM, Marc Tamlyn wrote:
> I'd say ArrayField is a straight up data field at the moment. It stores
> 0-1 lists of data. It's no different to CommaSeparatedIntegerField
> (seriously, why does that exists...)
>
> *If* PG gets the relevant update that will allow `integer[] references`
> (i.e. ArrayField(ForeignKey)) then this would be different, and would be
> more like a m2m field.
>
> There is an argument that it's 0-N anyway, but in the implementation
> both within Django and in the database I don't think the distinction is
> useful at the point, from an ORM point of view in any case. For a forms
> point of view it's quite different.
>
>
> On 20 August 2014 09:19, Russell Keith-Magee <rus...@keith-magee.com
When I look at this situation from the point of view of forms, there are

1. Fields of cardinality 0-1
2. Fields of cardinality 0-N

and

a. Fields that do not represent reference to another model (object)
b. Fields that represent reference to another model (object)

1. and 2. are mutually exclusive; a. and b. are also mutually exclusive.

IMO, this way the future Django form would not need to care whether the
field is m2m or ArrayField(ForeignKey) or ListField(EmbeddedModelField)
because all of them would be 2. and b.

One may also want to add two mutually-exclusive subcategories to b:

b1. Relationship is locally defined
b2. Relationship is not locally defined.

Ivan Kharlamov

Aug 20, 2014, 8:28:34 AM
to django-d...@googlegroups.com
To add more examples to my proposition:

1) CharField(), IntegerField(), FileField(), ImageField()

are all members of both: a. and 1.

2) ArrayField(), DictionaryField()

are all members of both: a. and 2.

3) ForeignKey(), GenericForeignKey(), EmbeddedModelField(),
GenericRelation(),

are all members of both: b. and 1.

4) ManyToManyField(), ArrayField(ForeignKey), ListField(EmbeddedModelField)

are all members of both: b. and 2.
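
The two independent axes can be sketched as a pair of booleans per field type. This is only an illustration of the grouping above: `FieldTraits`, `TRAITS` and `label` are hypothetical names, and the mapping lists just one example from each of the four groups.

```python
from typing import NamedTuple


class FieldTraits(NamedTuple):
    """Hypothetical pair of flags for the two axes described above."""
    many_values: bool  # False: 0-1 values (group 1); True: 0-N values (group 2)
    references: bool   # False: no reference (group a); True: references another object (group b)


# One example per combination, taken from the lists above.
TRAITS = {
    "CharField": FieldTraits(many_values=False, references=False),
    "ArrayField": FieldTraits(many_values=True, references=False),
    "ForeignKey": FieldTraits(many_values=False, references=True),
    "ManyToManyField": FieldTraits(many_values=True, references=True),
}


def label(field_type: str) -> str:
    """Return the group label, e.g. 'b.2' for a field with 0-N references."""
    traits = TRAITS[field_type]
    letter = "b" if traits.references else "a"
    number = "2" if traits.many_values else "1"
    return f"{letter}.{number}"


for name in TRAITS:
    print(name, label(name))
```

A form could then pick a single or multi-value widget from `many_values` alone, without knowing how the relationship is stored.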


As Collin Anderson wrote about "virtual" fields on 08/18/2014 07:12 PM:

> Also, I think we should avoid discriminating between "virtual" and
> non-virtual (as with local vs parent). Why should it matter how a field
> is stored in the database? I think the distinction will make it harder
> to use non-relational databases.

One may want to expand his statement and say that the form, ideally,
should not care whether the field relationship is locally defined or not.

Which is not to say that b1 and b2 subcategories are not useful at all,
but they should not be needed in form representations.

Ivan Kharlamov

Aug 20, 2014, 8:42:41 AM
to django-d...@googlegroups.com
Excuse me for posting multiple emails at a time, but I'd like to make a
correction:
It just occurred to me that I misused the term 'cardinality'. The best
way to correct myself is to replace this:

1. Fields of cardinality 0-1
2. Fields of cardinality 0-N

with this:

1. Fields that can have 0-1 values.
2. Fields that can have 0-N values.


Thanks for brilliant work and best regards,
Ivan

Anssi Kääriäinen

Aug 20, 2014, 1:29:49 PM
to django-d...@googlegroups.com
On Wednesday, August 20, 2014 11:19:33 AM UTC+3, Russell Keith-Magee wrote:
I think Daniel and I might have come up with a way to meet both these requirements - a minimalist API for get_fields, with at least some protection against the known incoming backwards compatibility issue.

The summary so far: it appears that a complex taxonomy isn't especially helpful - firstly, because any complex taxonomy is going to have edge cases that are hard to categorize, but also because a complex taxonomy leads to a much more complex internal API that is going to be prone to backwards compatibility problems.

So - instead of worrying about 'virtual' and other properties like that, let's look at why the _meta API is fundamentally used - to get a list of fields that need to be handled in data processing. This primarily means forms, but other forms of serialisation are also included. In these use cases, there are always going to be per-field differences (even a CharField and an IntegerField require *slightly* different handling), so we won't focus on internal representations, storage mechanisms, or anything like that. Instead, let's focus on cardinality - a field represents some sort of data that has a cardinality with the object on which it is stored. If something has cardinality 1, you can display a single field. If it's cardinality N, you need to display a list, or some sort of inline.

This results in 3 categories that are mutually exclusive:

a) "Data fields": Fields of cardinality 0-1:
<SNIP>

b) "ManyToMany Fields": Fields that are locally defined that represent a cardinality 0-N relationship with another object:
<SNIP>

c) "Related Objects": Fields that represent a cardinality 0-N relationship with this object, but aren't locally defined:
<SNIP>
 
These three types are mutually exclusive - you either have cardinality 1 *or* cardinality N, not both; and you're either locally defined on this object or you're not. I can't think of an example of "cardinality 1 data that isn't defined on this object", but it would fit into this taxonomy if it were needed; I also can't think of a field definition that would span models.

The reverse of OneToOneField is a cardinality 1 data that isn't defined on this object.

In addition to this basic classification, a field can be marked as "hidden". The immediate use for this is to hide the related_name='+' case of a FK or M2M. Looking forward, it would be used to mask fields that exist, but aren't intended to be user visible - for example, in the potential future case where a ForeignKey is split in two, or a Composite Key, there would be a "hidden" integer field (or fields) storing the actual data, and a virtual (but non-hidden) field that is the public API for manipulating the relationship. This would also be backwards compatible, because the "visible" field list hasn't changed.

There are use cases that do not fit this categorization. For example when instantiating a model from database you will need to supply the hidden integer field data for a foreign key, but you must skip the foreign key field itself. That is, a model with relation to author is initialized as MyModel(pk=1, author_first_name='foo', author_last_name='bar') (technically this is done through *args for performance reasons), not with MyModel(pk=1, author=author_instance). Similar considerations likely apply to serialization of models.

Form fields for a model is another consideration. If one wants those fields that should have a field in a form, that is currently defined as [f for f in model._meta.fields if f.editable]. The editable fields set doesn't necessarily match the above categorization. In fact, I believe if we inspect Django's code base it will be clear there can't be any categorization where fields belong to only one category, but which fulfills all use cases in Django. It is like trying to categorize animals for every use case. If you want mammals, then categorization to sea and land creatures will not work. If you want sea creatures, then categorization to mammals and fish is useless.

The point is that I am convinced we will need to provide field flags to complement the get_fields() API no matter what API we choose for get_fields(). In fact, if we define and document a sane set of field flags, then the get_fields() API isn't that important, it just needs to be useful for the most common use cases.
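
A small sketch of what such field flags might look like, using the author example above. Everything here is hypothetical: `FlaggedField` is not a Django class, and the flag names `concrete` and `hidden` are assumptions chosen for illustration (only `editable` is mentioned in the text above). The point is that different consumers select overlapping but different subsets of the same field list.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlaggedField:
    """Hypothetical field carrying independent flags instead of one category."""
    name: str
    concrete: bool = True   # assumed flag: the field has its own column
    editable: bool = True
    hidden: bool = False    # assumed flag: not meant to be user visible


fields = [
    FlaggedField("id", editable=False),
    FlaggedField("author_first_name", hidden=True),
    FlaggedField("author_last_name", hidden=True),
    FlaggedField("author", concrete=False),  # virtual relation, no column
    FlaggedField("title"),
]

# Loading from the database needs the column-backed fields,
# including the hidden ones, and skips the virtual relation.
init_fields = [f.name for f in fields if f.concrete]

# A form wants visible, editable fields, including the virtual relation.
form_fields = [f.name for f in fields if f.editable and not f.hidden]

print(init_fields)
print(form_fields)
```

Neither list is a subset of the other, which is why a single mutually exclusive categorization cannot serve both consumers.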
 
Fields are also tracked according to their parentage; this is used by tools interacting with inheritance relationships to know which fields are actually on this model, and which are inherited from a base class.

This yields the following formal API for _meta:

 * get_fields(data, many_to_many, related, include_hidden, include_parents)

 * @property data_fields (=> get_fields(data=True, many_to_many=False, related=False, include_hidden=False, include_parents=True)

 * @property many_to_many_fields (=> get_fields(data=False, many_to_many=True, related=False, include_hidden=False, include_parents=True)

 * @property related_objects (=> get_fields(data=False, many_to_many=False, related=True, include_hidden=False, include_parents=True)

Does this sound any more sane as an API?

Yes, with the caveat that, for example, model initialization fields through *args do not map to this API, at least not after the foreign key split into virtual field + concrete fields. The same goes for editable fields. So, +1 if we also consider defining and documenting a useful set of field flags.

I wonder if a better name for the related category exists. My first instinct is that foreign key fields should match the related flag. Could it be made cleaner that these are relations defined on remote model? Maybe just remote_relations could work?

 - Anssi

Shai Berger

Aug 20, 2014, 2:21:36 PM
to django-d...@googlegroups.com
On Wednesday 20 August 2014 10:29:49 Anssi Kääriäinen wrote:
> On Wednesday, August 20, 2014 11:19:33 AM UTC+3, Russell Keith-Magee wrote:
> >
> > This yields the following formal API for _meta:
> > * get_fields(data, many_to_many, related, include_hidden,
> > include_parents)
> >
> > * @property data_fields
> >
> > * @property many_to_many_fields
> >
> > * @property related_objects
> > (=> get_fields(data=False, many_to_many=False, related=True,
> > include_hidden=False, include_parents=True)
> >
> +1 if we also consider defining and documenting a useful set of field flags.

+1 Anssi.

>
> I wonder if a better name for the related category exists. My first
> instinct is that foreign key fields should match the related flag. Could it
> be made cleaner that these are relations defined on remote model? Maybe
> just remote_relations could work?
>

Since this is a bikeshedding thread, I'll say that the name "related_objects"
makes me itch a little as well -- mostly, because each of the objects returned
by the property is not a related-object, but rather a manager of related-
objects.

I was considering "related_managers", but that is sort-of mixing the "what"
with the "how", and also thinking about the possible Array(FK) fields, I prefer
"related_collections".

Shai.

Russell Keith-Magee

Aug 20, 2014, 10:26:57 PM
to Django Developers
That's a nice conceptual grouping of the examples you've provided, but it misses an important group: related objects - that is, the objects that represent the "other side" of M2M, FK, and O2O fields. 

These certainly *could* be included based purely on their cardinality relationships (reverse FK and M2M are both 0-N, O2O are 0-1), but the fact that they're not locally defined is significant, and a reverse m2m is "not locally defined" in a slightly different way to a *forward* m2m (which isn't locally "defined" in the sense that there is no database column, but it is locally defined in the sense that there is an explicit field definition).

Yours,
Russ Magee %-)

Russell Keith-Magee

Aug 20, 2014, 11:07:50 PM
to Django Developers
On Thu, Aug 21, 2014 at 1:29 AM, Anssi Kääriäinen <anssi.ka...@thl.fi> wrote:
On Wednesday, August 20, 2014 11:19:33 AM UTC+3, Russell Keith-Magee wrote:
I think Daniel and I might have come up with a way to meet both these requirements - a minimalist API for get_fields, with at least some protection against the known incoming backwards compatibility issue.

The summary so far: it appears that a complex taxonomy isn't especially helpful - firstly, because any complex taxonomy is going to have edge cases that are hard to categorize, but also because a complex taxonomy leads to a much more complex internal API that is going to be prone to backwards compatibility problems.

So - instead of worrying about 'virtual' and other properties like that, let's look at why the _meta API is fundamentally used - to get a list of fields that need to be handled in data processing. This primarily means forms, but other forms of serialisation are also included. In these use cases, there are always going to be per-field differences (even a CharField and an IntegerField require *slightly* different handling), so we won't focus on internal representations, storage mechanisms, or anything like that. Instead, let's focus on cardinality - a field represents some sort of data that has a cardinality with the object on which it is stored. If something has cardinality 1, you can display a single field. If it's cardinality N, you need to display a list, or some sort of inline.

This results in 3 categories that are mutually exclusive:

a) "Data fields": Fields of cardinality 0-1:
<SNIP>

b) "ManyToMany Fields": Fields that are locally defined that represent a cardinality 0-N relationship with another object:
<SNIP>

c) "Related Objects": Fields that represent a cardinality 0-N relationship with this object, but aren't locally defined:
<SNIP>
 
These three types are mutually exclusive - you either have cardinality 1 *or* cardinality N, not both; and you're either locally defined on this object or you're not. I can't think of an example of "cardinality 1 data that isn't defined on this object", but it would fit into this taxonomy if it were needed; I also can't think of a field definition that would span models.

The reverse of OneToOneField is a cardinality 1 data that isn't defined on this object.

And the obvious answer was looking right at me :-). I had that mentally wrapped into (c) (because historically O2O is handled as a redundant case of FK). This suggests to me that either (a) the "related" flag is more about "objects that have a relationship with this one", rather than being specifically about cardinality, or (b) there's another group for 0-1 cardinality reverse relationships. 
 
In addition to this basic classification, a field can be marked as "hidden". The immediate use for this is to hide the related_name='+' case of a FK or M2M. Looking forward, it would be used to mask fields that exist, but aren't intended to be user visible - for example, in the potential future case where a ForeignKey is split in two, or a Composite Key, there would be a "hidden" integer field (or fields) storing the actual data, and a virtual (but non-hidden) field that is the public API for manipulating the relationship. This would also be backwards compatible, because the "visible" field list hasn't changed.

There are use cases that do not fit this categorization. For example when instantiating a model from database you will need to supply the hidden integer field data for a foreign key, but you must skip the foreign key field itself. That is, a model with relation to author is initialized as MyModel(pk=1, author_first_name='foo', author_last_name='bar') (technically this is done through *args for performance reasons), not with MyModel(pk=1, author=author_instance). Similar considerations likely apply to serialization of models.
 
Form fields for a model is another consideration. If one wants those fields that should have a field in a form, that is currently defined as [f for f in model._meta.fields if f.editable]. The editable fields set doesn't necessarily match the above categorization. In fact, I believe if we inspect Django's code base it will be clear there can't be any categorization where fields belong to only one category, but which fulfills all use cases in Django. It is like trying to categorize animals for every use case. If you want mammals, then categorization to sea and land creatures will not work. If you want sea creatures, then categorization to mammals and fish is useless.

Sure - but let me say in advance that the API I've proposed here isn't a purely theoretical exercise. Daniel has implemented this API to prove that it's sufficient to meet all Django's existing use cases. The three categories (plus the two include qualifiers) I've described meet that criterion. However, it might be missing potential future use cases, and that's really what we're trying to flesh out here.

The "editable" thing doesn't especially concern me, for exactly the reason your example demonstrates. The role of the meta API (to me, at least) is to provide a candidate list of fields that need to be dealt with as part of introspection. The only reason the flags/categories in Meta matter is the extent to which they represent the need for fundamentally different classes of data handling. If you're building a form, that means you're going to need to check the editable flag on each field. It also means deferring some behaviour to the field itself (to_python calls, calls to persist files to storage, and so on). I don't believe this means we need to embed the concept of "editability" into the Meta API. 

The point is that I am convinced we will need to provide field flags to complement the get_fields() API no matter what API we choose for get_fields(). In fact, if we define and document a sane set of field flags, then the get_fields() API isn't that important, it just needs to be useful for the most common use cases.
  
Well, no - it needs to be useful for *all* the use cases *in Django's codebase*. The end goal here is to provide a formal API definition so that someone else can take the specification, make a duck that quacks exactly like it, and use it because it is compatible with Django's internals.

As an indicative goal - I'm thinking a good GSoC project for next year would be to implement a Django-compatible model layer for SQLAlchemy. That means the student will need to implement get_fields() (or whatever API we end up with) to sufficient depth that they can expose a SQLAlchemy model in Django's Admin, using Django's forms. Daniel's proof-of-concept project wrapping an email API demonstrates that this isn't a theoretical goal - it has the potential to be real.
 
Fields are also tracked according to their parentage; this is used by tools interacting with inheritance relationships to know which fields are actually on this model, and which are inherited from a base class.

This yields the following formal API for _meta:

 * get_fields(data, many_to_many, related, include_hidden, include_parents)

 * @property data_fields (=> get_fields(data=True, many_to_many=False, related=False, include_hidden=False, include_parents=True))

 * @property many_to_many_fields (=> get_fields(data=False, many_to_many=True, related=False, include_hidden=False, include_parents=True))

 * @property related_objects (=> get_fields(data=False, many_to_many=False, related=True, include_hidden=False, include_parents=True))
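
To make the shape of that proposal concrete, here is a minimal sketch in plain Python. It is not Django's implementation: the Field and Options classes, the category/hidden/inherited attributes, and the default argument values are all assumptions made for illustration. Only the get_fields() argument names and the three property names come from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    # Hypothetical stand-in for a field object; not Django's Field class.
    name: str
    category: str            # assumed values: "data", "many_to_many", "related"
    hidden: bool = False
    inherited: bool = False  # True when the field comes from a parent model


class Options:
    """Toy _meta object following the proposed signature (defaults are assumed)."""

    def __init__(self, fields):
        self._fields = tuple(fields)

    def get_fields(self, data=True, many_to_many=False, related=False,
                   include_hidden=False, include_parents=True):
        # Collect the categories the caller asked for.
        wanted = {
            category
            for category, enabled in (
                ("data", data),
                ("many_to_many", many_to_many),
                ("related", related),
            )
            if enabled
        }
        return tuple(
            f for f in self._fields
            if f.category in wanted
            and (include_hidden or not f.hidden)
            and (include_parents or not f.inherited)
        )

    @property
    def data_fields(self):
        return self.get_fields(data=True, many_to_many=False, related=False,
                               include_hidden=False, include_parents=True)

    @property
    def many_to_many_fields(self):
        return self.get_fields(data=False, many_to_many=True, related=False,
                               include_hidden=False, include_parents=True)

    @property
    def related_objects(self):
        return self.get_fields(data=False, many_to_many=False, related=True,
                               include_hidden=False, include_parents=True)


meta = Options([
    Field("id", "data", inherited=True),
    Field("title", "data"),
    Field("tags", "many_to_many"),
    Field("comment_set", "related"),
    Field("internal+", "related", hidden=True),
])
local_data = meta.get_fields(data=True, include_parents=False)
```

In this sketch the three properties are only shorthand for the most common get_fields() calls; anything more specific, such as local_data above, goes through get_fields() directly.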

Does this sound any more sane as an API?

Yes, with the caveat that, for example, model initialization fields passed through *args do not map to this API, at least not after the foreign key is split into a virtual field + concrete fields. The same goes for editable fields. So, +1 if we also consider defining and documenting a useful set of field flags.

Sure - and the purpose of this thread is to tease out what those "useful" flags are. At the moment, it's not clear to me where exactly the conceptual holes lie from your perspective. As I said, the API I've proposed here is sufficient to meet all *current* use cases in the code base.
 
As best as I can make out, it appears you see a problem with the concept of "hidden" - because in various circumstances, different fields will be "hidden" in different ways (especially in the composite/virtual foreign key future). Taking that "future virtualised foreign key" case - if you're dealing with the database, it's the virtual field that needs to be hidden, because the database only cares about fields with an actual column/table underneath them; but if you're dealing with a form, you don't want the underlying concrete field, you want the virtual field. However, given a virtual field representation, I imagine it is possible to get back to the field (or fields) that hold the underlying representation; all that is important is that you can iterate over a list of "fields", and from there, determine a list of column names. The fact that the column name comes from a different underlying field isn't important; what's important is that the "foreign key" is only counted once in the introspection process.
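
As a rough sketch of that idea (every class and function name here is hypothetical; none of this is taken from the composite fields branch): a virtual field reports the columns of the fields underneath it, and the introspection loop counts each column once regardless of which field it was reached through.

```python
class ConcreteField:
    """Hypothetical field backed by exactly one database column."""

    def __init__(self, name, column):
        self.name = name
        self.column = column

    def get_columns(self):
        return [self.column]


class VirtualForeignKey:
    """Hypothetical virtual field: no column of its own, only underlying fields."""

    def __init__(self, name, underlying):
        self.name = name
        self.underlying = list(underlying)

    def get_columns(self):
        return [f.column for f in self.underlying]


def column_names(fields):
    """Walk a list of fields and return each column once, in first-seen order."""
    seen = set()
    columns = []
    for field in fields:
        for column in field.get_columns():
            if column not in seen:
                seen.add(column)
                columns.append(column)
    return columns


title = ConcreteField("title", "title")
author_id = ConcreteField("author_id", "author_id")
author = VirtualForeignKey("author", [author_id])

via_virtual = column_names([title, author])
via_both = column_names([title, author_id, author])
```

Both calls give the same column list, whether the caller was handed the virtual field, the concrete field, or both.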

So - what I really need here is a counterproposal from someone familiar with the composite key work. I'm not bound to any of the details of the proposal I've given here - I'm just relating the end point of work from SoC. It works with the current use cases exposed by Django, but when this hits master, we're going to need to live with it long term, so I want to make sure we're not boxing ourselves into a corner, or introducing categorisations that aren't representative.

I wonder if a better name for the related category exists. My first instinct is that foreign key fields should match the related flag. Could it be made clearer that these are relations defined on the remote model? Maybe just remote_relations could work?

I think the first step is to work out what the buckets/flags are - once we've got a clear picture of what they represent, naming discussions will make a lot more sense, since we will know what it is we're actually trying to name.

Russ %-)

Russell Keith-Magee

Aug 20, 2014, 11:09:36 PM
to Django Developers
Hi Shai,

I'll accept that there are probably better names for this - but as I said in my response to Anssi, until we actually know what "it" is, it will be more productive to focus on the categories and flags that need to exist. Once we've got those categories nailed down, the names will hopefully be more obvious (or, at least, more obviously a bikeshed).

Russ %-)

Anssi Kääriäinen

Aug 21, 2014, 2:26:45 AM
to django-d...@googlegroups.com
On Thu, 2014-08-21 at 11:07 +0800, Russell Keith-Magee wrote:




> The point is that I am convinced we will need to provide field
> flags to complement the get_fields() API no matter what API we
> choose for get_fields(). In fact, if we define and document a
> sane set of field flags, then the get_fields() API isn't that
> important, it just needs to be useful for the most common use
> cases.
>
> Well, no - it needs to be useful for *all* the use cases *in Django's
> codebase*. The end goal here is to provide a formal API definition so
> that someone else can take the specification, make a duck that quacks
> exactly like it, and use it because it is compatible with Django's
> internals.
>
>
> As an indicative goal - I'm thinking a good GSoC project for next year
> would be to implement a Django-compatible model layer for SQLAlchemy.
> That means the student will need to implement get_fields() (or
> whatever API we end up with) to sufficient depth that they can expose
> a SQLAlchemy model in Django's Admin, using Django's forms. Daniel's
> proof-of-concept project wrapping an email API demonstrates that this
> isn't a theoretical goal - it has the potential to be real.

Unfortunately I don't see SQLAlchemy models in Admin as realistic at all
by just providing a correct _meta API. Of course, the fields returned by
get_fields() calls will also need to quack the right way. Case in point,
Daniel's Gmail PoC uses Django fields internally.

Maybe there is some miscommunication here - what I meant by saying that

In fact, if we define and document a sane set of field flags,
then the get_fields() API isn't that important, it just needs to
be useful for the most common use cases.

was that if we have sane field flags, then one can always use
get_fields(**return_all_possible_fields), and then use flags on fields
to get a set of fields one is interested in.

It is trivially true that get_fields() isn't sufficient for all use
cases in Django without relying on field flags. One can't even get the
fields for a form without relying on field flags filtering.

So, get_fields() doesn't need to return the correct list of fields for all
use cases in Django, it is sufficient that one can get the correct list
of fields with further filtering.
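
A minimal sketch of that two-step pattern, with made-up names: get_all_fields() stands in for get_fields(**return_all_possible_fields), and the "editable" and "concrete" attributes are just examples of the kind of flags that would need to be documented.

```python
from collections import namedtuple

# Hypothetical field record; the flag names are illustrative assumptions.
Field = namedtuple("Field", ["name", "editable", "concrete"])


def get_all_fields():
    """Stand-in for get_fields(**return_all_possible_fields)."""
    return [
        Field("id", editable=False, concrete=True),
        Field("title", editable=True, concrete=True),
        Field("created", editable=False, concrete=True),
        Field("content_object", editable=True, concrete=False),
    ]


def filter_by_flags(fields, **flags):
    """Keep the fields whose flags match every keyword given."""
    return [
        f for f in fields
        if all(getattr(f, flag) == value for flag, value in flags.items())
    ]


form_fields = filter_by_flags(get_all_fields(), editable=True)
editable_columns = filter_by_flags(get_all_fields(), editable=True, concrete=True)
```

With flags like these, the arguments to get_fields() only have to cover the common cases; anything else is one list comprehension away.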

I am not sure if you were objecting to anything other than "it
[get_fields()] just needs to be useful for the most common use cases". I
hope I cleared up what I meant by that.

I remain convinced we should document field flags in addition to the
get_fields() API.

- Anssi


Anssi Kääriäinen

Aug 21, 2014, 2:52:55 AM
to django-d...@googlegroups.com
On Thu, 2014-08-21 at 11:07 +0800, Russell Keith-Magee wrote:




No counterproposal from me - as I said earlier I think the get_fields()
API as defined is sufficient.

We already have one field that doesn't play well with the categorization
when it comes to model serialization or initialization. That field is
GenericForeignKey. If it is returned by get_fields(data=True), then
get_fields(data=True) can't be used for model serialization or
initialization without filtering the return set further down using
flags.

In short, I see all concrete fields as an important category in itself. We
can't provide that with the proposed categorization.

The easiest way to get all the concrete fields would be [f for f in
get_fields(**return_all_fields) if f.concrete]. It is also notable that
ordering matters, and for that reason inspecting the underlying fields
of a virtual field doesn't sound promising.
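
To illustrate why the ordering matters, here is a small sketch with hypothetical names. The "concrete" flag is assumed, and content_object plays the role of a GenericForeignKey-style field with no column of its own.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    # Hypothetical field record; "concrete" is an assumed flag.
    name: str
    concrete: bool


ALL_FIELDS = [
    Field("id", True),
    Field("content_type", True),
    Field("object_id", True),
    Field("content_object", False),  # virtual, GenericForeignKey-like
]


def concrete_fields(fields):
    """Filter on the flag while preserving definition order."""
    return [f for f in fields if f.concrete]


def init_from_args(fields, *args):
    """Map positional values onto concrete fields, in order."""
    concrete = concrete_fields(fields)
    if len(args) > len(concrete):
        raise TypeError("too many positional arguments")
    return {f.name: value for f, value in zip(concrete, args)}


row = init_from_args(ALL_FIELDS, 1, 7, 42)
```

Positional initialization only works if the concrete fields come back in a stable order, which a plain filter over the full field list preserves.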

For the ForeignKey backwards compatibility problem - maybe we could get
the virtual fields patch committed for 1.8. This way we wouldn't have
any backwards compatibility problem to begin with. The composite fields
patch was otherwise close to committable, but the problem was that it
didn't play well with migrations. I have no idea what the actual
problems were. Anybody have more knowledge why composite fields do not
play well with migrations, and what is needed to solve the problems?

- Anssi

Daniel Pyrathon

Aug 21, 2014, 3:53:36 PM
to django-d...@googlegroups.com
Hi Shai,

Thanks for the comments!


On Wednesday, August 20, 2014 8:21:36 PM UTC+2, Shai Berger wrote:
On Wednesday 20 August 2014 10:29:49 Anssi Kääriäinen wrote:
> On Wednesday, August 20, 2014 11:19:33 AM UTC+3, Russell Keith-Magee wrote:
> >
> > This yields the following formal API for _meta:
> >  * get_fields(data, many_to_many, related, include_hidden,
> >                     include_parents)
> >  
> >  * @property data_fields
> >
> >  * @property many_to_many_fields
> >
> >  * @property related_objects
> >  (=> get_fields(data=False, many_to_many=False, related=True,
> >                          include_hidden=False, include_parents=True)
> >
> +1 if we also consider defining and documenting a useful set of field flags.

+1 Anssi.

>
> I wonder if a better name for the related category exists. My first
> instinct is that foreign key fields should match the related flag. Could it
> be made clearer that these are relations defined on the remote model? Maybe
> just remote_relations could work?
>

Since this is a bikeshedding thread, I'll say that the name "related_objects"
makes me itch a little as well -- mostly, because each of the objects returned
by the property is not a related-object, but rather a manager of related-
objects.

The related_objects property should return instances of RelatedObject. What do you mean by
a manager of related objects?
 

I was considering "related_managers", but that is sort-of mixing the "what"
with the "how", and also thinking about the possible Array(FK) fields, I prefer
"related_collections".

What about straight up "relations"?

get_fields(data, m2m, relations)
 

Shai.

Shai Berger

Aug 21, 2014, 5:20:46 PM
to django-d...@googlegroups.com
Hi Daniel,

On Thursday 21 August 2014 22:53:36 Daniel Pyrathon wrote:
>
> On Wednesday, August 20, 2014 8:21:36 PM UTC+2, Shai Berger wrote:
> > makes me itch a little as well -- mostly, because each of the objects
> > returned by the property is not a related-object, but rather a manager of
> > related-objects.
>
> The related_objects property should return instances of
> RelatedObject. What do you mean by a manager of related objects?
>

I think I confused data with metadata. If I understand correctly, a
RelatedObject applied to a model instance yields a manager, the same way a
field applied to a model instance yields a data item.
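
A toy illustration of that data/metadata distinction, using plain Python descriptors. The class names are invented, and this only mirrors the analogy; it is not how Django's descriptors are implemented.

```python
class FieldDescriptor:
    """Metadata on the class; yields a data item on an instance."""

    def __init__(self, name):
        self.name = name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # class access returns the metadata object itself
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        instance.__dict__[self.name] = value


class Manager:
    """Hypothetical manager bound to one owning instance."""

    def __init__(self, owner, rows):
        self.owner = owner
        self.rows = rows

    def all(self):
        return [row for row in self.rows if row["author"] is self.owner]


class RelatedDescriptor:
    """Metadata on the class; yields a manager on an instance."""

    def __init__(self, rows):
        self.rows = rows

    def __get__(self, instance, owner=None):
        if instance is None:
            return self
        return Manager(instance, self.rows)


BOOKS = []  # stand-in for the remote model's table


class Author:
    name = FieldDescriptor("name")
    book_set = RelatedDescriptor(BOOKS)

    def __init__(self, name):
        self.name = name


ann = Author("Ann")
bob = Author("Bob")
BOOKS.append({"title": "First", "author": ann})
BOOKS.append({"title": "Second", "author": bob})
```

Here Author.book_set is the metadata object, while ann.book_set is a manager whose all() returns only Ann's rows.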

I must admit, the entire set of names of the *RelatedObjectDescriptor family
of classes has always confused me quite a bit.

> > I was considering "related_managers", but that is sort-of mixing the
> > "what"
> > with the "how", and also thinking about the possible Array(FK) fields, I
> > prefer
> > "related_collections".
>
> What about straight up "relations"?
>
> get_fields(data, m2m, relations)
>

I take Russell's advice on this: Let's first clear up the concepts, then worry
about the names.

Thanks,
Shai.

Collin Anderson

Aug 22, 2014, 1:35:23 PM
to django-d...@googlegroups.com
I like the direction the API is heading.

RE: composite fields:
Would it help to have an attribute on the individual fields like: field.parent = TheCompositeField()? This might help determine which fields to pass into __init__(), and might help when serializing.
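
Something like the following, as a very rough sketch. All names are made up, and treating "fields with no parent" as the ones to pass into __init__() is just one possible reading of the idea.

```python
class Field:
    """Hypothetical field carrying an optional back-reference to its composite."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent


class CompositeField(Field):
    """Hypothetical composite that owns its component fields."""

    def __init__(self, name, component_names):
        super().__init__(name)
        self.components = [Field(n, parent=self) for n in component_names]


def top_level_fields(fields):
    """Fields not owned by a composite."""
    return [f for f in fields if f.parent is None]


def components_by_parent(fields):
    """Group component field names under the name of their composite."""
    groups = {}
    for f in fields:
        if f.parent is not None:
            groups.setdefault(f.parent.name, []).append(f.name)
    return groups


location = CompositeField("location", ["latitude", "longitude"])
title = Field("title")
all_fields = [title, location] + location.components
```

A serializer could use components_by_parent() to emit each composite once, rather than emitting both the composite and its parts.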

RE: related objects:
I'm interested in doing a djangocon.us sprint to determine the feasibility of turning related objects into actual virtual fields.

Russell Keith-Magee

Aug 22, 2014, 8:09:04 PM
to Django Developers
Hi Collin,

Both of these issues relate to Michal Petrucha's (koniiiik) Composite Field work, which has been the subject of a couple of GSoC projects. As I understand it, the issues you've described have all been resolved in that branch; as Anssi said in a previous thread, it's mostly ready to go, and would have been in 1.7, except for some interesting interactions with migrations. If you want to contribute, getting that branch, getting it up to speed with the current master, and taking it for a spin would be a great help. 

The relevant GitHub repositories are:


I don't remember if they're completely independent, or if one depends on the other; you'll need to dig up the django-dev discussions from last year (or bug Michal on IRC :-) to confirm their current state and usage instructions.

Yours,
Russ Magee %-)