b) "Relating" data fields - This means ForeignKey. Fields that manifest as a single column, but represent a relation to another model.
d) "Relating" external fields - This means ManyToMany fields. Fields that are manifested as an external table, but represent a relation to a different model.
So - firstly, we need a sanity check. Does this taxonomy capture all field types that you can think of? Are there any interpretations of composite fields, or any other esoteric field type (existing or imagined) that don't fit in this taxonomy?
Comments welcome. Obviously, this has enormous potential to devolve into bike shedding, so I'd appreciate it if people kept that in mind. If you have a preference for something like short vs long form names, feel free to state it, but please don't let this devolve into arguments over the relative merits of pith over verbosity in API naming. It's much more important that we clarify the matters of substance - i.e., that we have a complete and correct taxonomy - not that we fixate on the names themselves.
On Saturday, August 16, 2014 4:38:30 AM UTC+3, Russell Keith-Magee wrote:
> b) "Relating" data fields - This means ForeignKey. Fields that manifest as a single column, but represent a relation to another model.
This definition will not work when multicolumn foreign keys are introduced, especially not with the name foreign_key_fields. Either relating data fields would contain fields that have more than a single backing column, or foreign_key_fields would not contain all foreign key fields.
Michal Petrucha's work on virtual fields aims to make ForeignKeys virtual fields - they have one or more backing pure data fields, and then the relation is handled by a virtual field. His work shows that this approach works well. The patch was actually close to committable already during 1.7 development, but as it didn't play well with migrations we had to defer it. The point here is that I expect that we will want to make ForeignKeys virtual fields soonish. This doesn't play well with the categorization.
> d) "Relating" external fields - This means ManyToMany fields. Fields that are manifested as an external table, but represent a relation to a different model.
Should we define this category as m2m fields? Calling it many_to_many_fields, but defining it as including all external storage fields seems a bit problematic.
> So - firstly, we need a sanity check. Does this taxonomy capture all field types that you can think of? Are there any interpretations of composite fields, or any other esoteric field type (existing or imagined) that don't fit in this taxonomy?
It seems the proposed API has fields in one, and only one category. Maybe it would be better to have a categorization where fields fall into multiple categories? The categories could be data, virtual, relation, reverse_relation and m2m.
For example, an m2m field would be virtual, related or reverse_related, and of course m2m. In the future a foreign key would create a backing data field. The foreign key itself would be a virtual relation field. The reverse side of the foreign key would be virtual and reverse_related. GenericForeignKey would also be a virtual related field (with two backing data fields).
I don't see how a TranslationField would fit into the above categorization. A TranslationField is defined as a field that gets a single translation from a related translations table. So, it is the reverse side of a foreign key with an additional restriction on language (in effect generating a join condition JOIN article_translations ON article.id = article_translations.article_id AND article_translations.language = 'fi'). At least as defined, this isn't in category g, as it doesn't return all reverse objects of category b. It doesn't fit into any other category either. So, we need some changes to the wording.
As another example, we might someday want to allow fully custom join condition fields. These fields wouldn't be foreign key, external data, or many-to-many fields, nor the reverse of those categories.
> Comments welcome. Obviously, this has enormous potential to devolve into bike shedding, so I'd appreciate it if people kept that in mind. If you have a preference for something like short vs long form names, feel free to state it, but please don't let this devolve into arguments over the relative merits of pith over verbosity in API naming. It's much more important that we clarify the matters of substance - i.e., that we have a complete and correct taxonomy - not that we fixate on the names themselves.
I don't think users actually want to get fields based on the suggested categorization. I feel we get an easier to use and more flexible API if we have higher level categories and allow fields to match multiple categories. As a practical example if I want all relation fields, that is going to be hard using the suggested API. Getting all relation fields is a more realistic use case than getting related virtual objects.
If we want every field to match one and only one category, then we need to redefine the categories to make sure ForeignKeys as virtual fields are possible, and that more esoteric custom join based fields fit into the categorization.
BTW, where are the GitHub discussions located? I didn't spot them from the referenced PR 2894.
I understand what you're driving at here, and I've had similar thoughts over the course of the SoC. The catch is that this makes the API for get_fields() fairly complicated.
If every field fits into one specific type, then get_fields() just requires a single boolean flag (do I include fields of type X) for each field type. We can also easily add new field types by adding new booleans to the API.
However, if a field fits into multiple categories, then it's impossible (or, at least, exceedingly complicated) to make a single call to get_fields() that will specify all your field requirements. "Get me all non-virtual data fields" requires "virtual=False, data=True, m2m=False", but "Get all virtual data fields that represent m2ms" requires "virtual=True, data=False, m2m=True". You can't pass in both sets of arguments at the same time, so you either have to make multiple calls to get_fields(), or you have to invent some sort of query syntax for get_fields() that allows union queries.
Plus, at the end of the day, get_fields() is abstracted behind highly cached and optimised properties for key lookups. These properties are effectively a cached call to get_fields() with a specific set of arguments - so even if get_fields() doesn't expose a "one category per field" requirement, the API will require, at some level, names that have clear (and preferably non-overlapping) membership.
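To make that trade-off concrete, here is a rough sketch. It is not Django code: the Field and TaggedField classes, both helper functions, and the sample field names are invented for illustration, and the flag names simply mirror the ones quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    category: str  # exactly one of "data", "m2m", "virtual"


@dataclass(frozen=True)
class TaggedField:
    name: str
    tags: frozenset  # any subset of {"data", "m2m", "virtual"}


def get_fields(fields, data=False, m2m=False, virtual=False):
    """One category per field: each flag just switches a category on."""
    wanted = {
        name
        for name, enabled in (("data", data), ("m2m", m2m), ("virtual", virtual))
        if enabled
    }
    return [f.name for f in fields if f.category in wanted]


def get_fields_tagged(fields, **required):
    """Several categories per field: each flag is a constraint to satisfy."""
    return [
        f.name
        for f in fields
        if all((tag in f.tags) == value for tag, value in required.items())
    ]


exclusive = [
    Field("title", "data"),
    Field("tags", "m2m"),
    Field("content_object", "virtual"),
]
overlapping = [
    TaggedField("title", frozenset({"data"})),
    TaggedField("tags", frozenset({"virtual", "m2m"})),
    TaggedField("content_object", frozenset({"virtual"})),
]

# Exclusive categories: a single call expresses "data or m2m".
one_call = get_fields(exclusive, data=True, m2m=True)

# Overlapping categories: the two requirements contradict each other as
# keyword arguments, so the union has to be assembled from two calls.
two_calls = (
    get_fields_tagged(overlapping, virtual=False, data=True, m2m=False)
    + get_fields_tagged(overlapping, virtual=True, data=False, m2m=True)
)
```

With exclusive categories the boolean flags compose naturally; with overlapping ones the caller ends up doing the union by hand, which is the complication described above.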
> I don't think users actually want to get fields based on the suggested categorization. I feel we get an easier to use and more flexible API if we have higher level categories and allow fields to match multiple categories. As a practical example if I want all relation fields, that is going to be hard using the suggested API. Getting all relation fields is a more realistic use case than getting related virtual objects.
Quite probably true. As a point of interest, the current (as in, 1.6) API actually doesn't differentiate between category (a) "pure data" and category (b) "relating data (i.e., FK)" fields - if you ask for "data fields" you get pure data *and* foreign keys. So, at least as far as Django's own usage is concerned, you're correct in saying that the taxonomy I've described isn't fully required.
Daniel's survey of internal usage reveals that there are three use cases for getting a list of fields in Django's internal API:
* Get all data and m2m fields (i.e., categories a, b, and d). This is effectively "all fields on *this* model"
* Get all data, m2m, related objects, related m2m, and virtual fields (i.e., categories a, b, d, f, g, h, i - excluding c and e because Django doesn't currently have any fields of this type). This is "all fields on this model, or related to this model"
* Get all m2m fields (i.e., category d)
So - at the very least, we need names to describe those three groups. My intention with describing a richer taxonomy is to try and give names to other groupings of interest.
> If we want every field to match one and only one category, then we need to redefine the categories to make sure ForeignKeys as virtual fields are possible, and that more esoteric custom join based fields fit into the categorization.
Agreed - that's why I threw this out there for discussion :-)
Properties like "data", "virtual", "external", "related", "relating" - these are high level concepts describing the way a field manifests. However, that doesn't mean we need to expose these properties as part of the formal API.
Part of the underlying problem here -- let's say we roll out Django 1.7 with some version of this API, and in 1.8, foreign key fields change to become virtual. That effectively becomes backwards incompatible for queries that are sensitive to a "virtual" flag; but it doesn't change the underlying need to identify that a field is a foreign key. We need to capture the latter use case, but not necessarily the former.
Hi again,
Below, ">D" are quotations from Daniel's message I'm replying to, and
">R" are from Russell's message that opened this thread.
>D *Regarding FileField*
It took me some time to make clear to myself why FileField is a data field, and not
like FK: The point is not where the data is stored and whether the DB field
contents are "it" or just a pointer -- the point is whether another model is
involved. I think this is a key distinction; wording the taxonomy in these
terms can clarify certain other issues. As an example,
>R a) "Pure" data fields - things like CharField, IntegerField, etc. Fields
>R that manifest as a single column on the model's underlying table.
this wording makes it a little unclear (at least to me) if parent fields fall
into this category (as they do not "manifest as a single column on the model's
underlying table"). But the higher-level language, "fields that store primitive
data on the model instance itself" -- with a better word for "primitive", or a
full explanation on what it means -- is something I find much clearer and more
accurate.
>D ImageField is, based on the definition of virtual so far, a virtual field.
Well, as it is neither related nor relating, the only virtual field it could be is:
>R e) "Pure" virtual fields - Fields that are conceptual wrappers around other
>R fields on the same model. Virtual fields don't have column representations
>R themselves; they are a wrapper around columns provided by other fields.
But ImageField does have its own column representation. In fact, it is not a
wrapper around the width and height fields in the way that, say,
GenericForeignKey is around the content_type and object_id fields -- if you
change these fields, nothing about the image changes. So, yes, I maintain that
it is a different beast.
I think Daniel and I might have come up with a way to meet both these requirements - a minimalist API for get_fields, with at least some protection against the known incoming backwards compatibility issue.
The summary so far: it appears that a complex taxonomy isn't especially helpful - firstly, because any complex taxonomy is going to have edge cases that are hard to categorize, but also because a complex taxonomy leads to a much more complex internal API that is going to be prone to backwards compatibility problems.
So - instead of worrying about 'virtual' and other properties like that, let's look at why the _meta API is fundamentally used - to get a list of fields that need to be handled in data processing. This primarily means forms, but other forms of serialisation are also included. In these use cases, there are always going to be per-field differences (even a CharField and an IntegerField require *slightly* different handling), so we won't focus on internal representations, storage mechanisms, or anything like that. Instead, let's focus on cardinality - a field represents some sort of data that has a cardinality with the object on which it is stored. If something has cardinality 1, you can display a single field. If it's cardinality N, you need to display a list, or some sort of inline.
This results in 3 categories that are mutually exclusive:
a) "Data fields": Fields of cardinality 0-1:
<SNIP>
b) "ManyToMany Fields": Fields that are locally defined that represent a cardinality 0-N relationship with another object:
<SNIP>
c) "Related Objects": Fields that represent a cardinality 0-N relationship with this object, but aren't locally defined:
<SNIP>
These three types are mutually exclusive - you either have cardinality 1 *or* cardinality N, not both; and you're either locally defined on this object or you're not. I can't think of an example of "cardinality 1 data that isn't defined on this object", but it would fit into this taxonomy if it were needed; I also can't think of a field definition that would span models.
In addition to this basic classification, a field can be marked as "hidden". The immediate use for this is to hide the related_name='+' case of a FK or M2M. Looking forward, it would be used to mask fields that exist, but aren't intended to be user visible - for example, in the potential future case where a ForeignKey is split in two, or a Composite Key, there would be a "hidden" integer field (or fields) storing the actual data, and a virtual (but non-hidden) field that is the public API for manipulating the relationship. This would also be backwards compatible, because the "visible" field list hasn't changed.
Fields are also tracked according to their parentage; this is used by tools interacting with inheritance relationships to know which fields are actually on this model, and which are inherited from a base class.
This yields the following formal API for _meta:
* get_fields(data, many_to_many, related, include_hidden, include_parents)
* @property data_fields (=> get_fields(data=True, many_to_many=False, related=False, include_hidden=False, include_parents=True))
* @property many_to_many_fields (=> get_fields(data=False, many_to_many=True, related=False, include_hidden=False, include_parents=True))
* @property related_objects (=> get_fields(data=False, many_to_many=False, related=True, include_hidden=False, include_parents=True))
Does this sound any more sane as an API?
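As a rough illustration of how those pieces might fit together, here is a toy stand-in. It is only a sketch under assumptions: the Field and Options classes, the default values of the get_fields() arguments, and the sample field names are all invented here, and the caching of the properties is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    kind: str                # "data", "many_to_many" or "related"
    hidden: bool = False     # e.g. the backing column of a split ForeignKey
    inherited: bool = False  # defined on a parent model


class Options:
    """Toy stand-in for a model's _meta object."""

    def __init__(self, fields):
        self._fields = tuple(fields)

    def get_fields(self, data=False, many_to_many=False, related=False,
                   include_hidden=False, include_parents=True):
        # The three category flags select mutually exclusive kinds.
        kinds = {
            kind
            for kind, enabled in (
                ("data", data),
                ("many_to_many", many_to_many),
                ("related", related),
            )
            if enabled
        }
        return tuple(
            f for f in self._fields
            if f.kind in kinds
            and (include_hidden or not f.hidden)
            and (include_parents or not f.inherited)
        )

    @property
    def data_fields(self):
        return self.get_fields(data=True)

    @property
    def many_to_many_fields(self):
        return self.get_fields(many_to_many=True)

    @property
    def related_objects(self):
        return self.get_fields(related=True)


meta = Options([
    Field("id", "data"),
    Field("author_id", "data", hidden=True),
    Field("author", "data"),
    Field("created", "data", inherited=True),
    Field("tags", "many_to_many"),
    Field("comment_set", "related"),
])

visible_data = [f.name for f in meta.data_fields]
local_with_hidden = [
    f.name
    for f in meta.get_fields(data=True, include_hidden=True, include_parents=False)
]
```

In this sketch the hidden author_id field never shows up in data_fields, which is the backwards compatibility property described for a future split ForeignKey.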
On Wednesday, August 20, 2014 11:19:33 AM UTC+3, Russell Keith-Magee wrote:
> I think Daniel and I might have come up with a way to meet both these requirements - a minimalist API for get_fields, with at least some protection against the known incoming backwards compatibility issue.
> <SNIP>
> The summary so far: it appears that a complex taxonomy isn't especially helpful - firstly, because any complex taxonomy is going to have edge cases that are hard to categorize, but also because a complex taxonomy leads to a much more complex internal API that is going to be prone to backwards compatibility problems.
> So - instead of worrying about 'virtual' and other properties like that, let's look at why the _meta API is fundamentally used - to get a list of fields that need to be handled in data processing. This primarily means forms, but other forms of serialisation are also included. In these use cases, there are always going to be per-field differences (even a CharField and an IntegerField require *slightly* different handling), so we won't focus on internal representations, storage mechanisms, or anything like that. Instead, let's focus on cardinality - a field represents some sort of data that has a cardinality with the object on which it is stored. If something has cardinality 1, you can display a single field. If it's cardinality N, you need to display a list, or some sort of inline.
> This results in 3 categories that are mutually exclusive:
> a) "Data fields": Fields of cardinality 0-1:
> <SNIP>
> b) "ManyToMany Fields": Fields that are locally defined that represent a cardinality 0-N relationship with another object:
> <SNIP>
> c) "Related Objects": Fields that represent a cardinality 0-N relationship with this object, but aren't locally defined:
> These three types are mutually exclusive - you either have cardinality 1 *or* cardinality N, not both; and you're either locally defined on this object or you're not. I can't think of an example of "cardinality 1 data that isn't defined on this object", but it would fit into this taxonomy if it were needed; I also can't think of a field definition that would span models.
The reverse of a OneToOneField is cardinality 1 data that isn't defined on this object.
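Put as a small table of two yes/no questions, the gap looks like this. The classify() function and the "unnamed" label are made up for this sketch; the three named categories are the ones from the quoted proposal.

```python
def classify(many: bool, local: bool) -> str:
    """Map (cardinality N?, defined on this model?) to a proposed category."""
    if local and not many:
        return "data"
    if local and many:
        return "many_to_many"
    if many:
        return "related"
    # Cardinality 1, but defined on the other model: no name in the proposal.
    return "unnamed"


forward_fk = classify(many=False, local=True)
reverse_fk = classify(many=True, local=False)
reverse_one_to_one = classify(many=False, local=False)
```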
> In addition to this basic classification, a field can be marked as "hidden". The immediate use for this is to hide the related_name='+' case of a FK or M2M. Looking forward, it would be used to mask fields that exist, but aren't intended to be user visible - for example, in the potential future case where a ForeignKey is split in two, or a Composite Key, there would be a "hidden" integer field (or fields) storing the actual data, and a virtual (but non-hidden) field that is the public API for manipulating the relationship. This would also be backwards compatible, because the "visible" field list hasn't changed.
There are use cases that do not fit this categorization. For example, when instantiating a model from the database you will need to supply the hidden integer field data for a foreign key, but you must skip the foreign key field itself. That is, a model with a relation to author is initialized as MyModel(pk=1, author_first_name='foo', author_last_name='bar') (technically this is done through *args for performance reasons), not with MyModel(pk=1, author=author_instance). Similar considerations likely apply to serialization of models.
Form fields for a model are another consideration. If one wants those fields that should have a field in a form, that is currently defined as [f for f in model._meta.fields if f.editable]. The set of editable fields doesn't necessarily match the above categorization. In fact, I believe that if we inspect Django's code base it will be clear that no categorization in which each field belongs to only one category can fulfill all use cases in Django. It is like trying to categorize animals for every use case. If you want mammals, then categorization into sea and land creatures will not work. If you want sea creatures, then categorization into mammals and fish is useless.
The point is that I am convinced we will need to provide field flags to complement the get_fields() API no matter what API we choose for get_fields(). In fact, if we define and document a sane set of field flags, then the get_fields() API isn't that important, it just needs to be useful for the most common use cases.
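A minimal sketch of what such flags could look like in practice follows. Only `editable` comes from the discussion above; the `concrete` and `is_relation` flag names, the Field class, and the sample fields are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    editable: bool = True
    concrete: bool = True      # hypothetical flag: backed by a database column
    is_relation: bool = False  # hypothetical flag: points at another model


fields = [
    Field("id", editable=False),
    Field("title"),
    Field("author_id", editable=False),                 # hidden backing column
    Field("author", concrete=False, is_relation=True),  # virtual foreign key
]

# Each use case filters on the flag it cares about instead of relying on
# one fixed partition of the fields.
form_fields = [f.name for f in fields if f.editable]
init_fields = [f.name for f in fields if f.concrete]
relation_fields = [f.name for f in fields if f.is_relation]
```

The same field list answers the form, model initialization, and "all relations" questions, each with a different filter, which is why the exact shape of get_fields() matters less once the flags are documented.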
> Fields are also tracked according to their parentage; this is used by tools interacting with inheritance relationships to know which fields are actually on this model, and which are inherited from a base class.
> This yields the following formal API for _meta:
> * get_fields(data, many_to_many, related, include_hidden, include_parents)
> * @property data_fields (=> get_fields(data=True, many_to_many=False, related=False, include_hidden=False, include_parents=True))
> * @property many_to_many_fields (=> get_fields(data=False, many_to_many=True, related=False, include_hidden=False, include_parents=True))
> * @property related_objects (=> get_fields(data=False, many_to_many=False, related=True, include_hidden=False, include_parents=True))
> Does this sound any more sane as an API?
Yes, with the caveat that, for example, model initialization fields passed through *args do not map to this API, at least not after the foreign key is split into a virtual field plus concrete fields. The same goes for editable fields. So, +1 if we also consider defining and documenting a useful set of field flags.
I wonder if a better name for the related category exists. My first instinct is that foreign key fields should match the related flag. Could it be made cleaner that these are relations defined on remote model? Maybe just remote_relations could work?
On Wednesday 20 August 2014 10:29:49 Anssi Kääriäinen wrote:
> On Wednesday, August 20, 2014 11:19:33 AM UTC+3, Russell Keith-Magee wrote:
> >
> > This yields the following formal API for _meta:
> > * get_fields(data, many_to_many, related, include_hidden,
> > include_parents)
> >
> > * @property data_fields
> >
> > * @property many_to_many_fields
> >
> > * @property related_objects
> > (=> get_fields(data=False, many_to_many=False, related=True,
> > include_hidden=False, include_parents=True)
> >
> +1 if we also consider defining and documenting a useful set of field flags.
+1 Anssi.
>
> I wonder if a better name for the related category exists. My first
> instinct is that foreign key fields should match the related flag. Could it
> be made cleaner that these are relations defined on remote model? Maybe
> just remote_relations could work?
>
Since this is a bikeshedding thread, I'll say that the name "related_objects"
makes me itch a little as well -- mostly, because each of the objects returned
by the property is not a related-object, but rather a manager of related-
objects.
I was considering "related_managers", but that is sort-of mixing the "what"
with the "how", and also thinking about the possible Array(FK) fields, I prefer
"related_collections".
Shai.