Hi all,
The deadline for 1.8 alpha is rapidly approaching, and one of the features that has been proposed for inclusion is Daniel Pyrathon's GSoC project refactoring the _meta object.
A huge thanks to Tim, Carl, Collin, Loïc, Anssi, and everyone else that has provided detailed reviews and feedback - those review notes have been invaluable. And a huge thanks, of course, to Daniel, who has done all the heavy lifting of actually *writing* the patch.
The purpose of this thread is to establish what is standing in the way of a merge. The PR isn't 100% ready for merge - there are a couple of code cleanups still required, and a handful of small design issues that still exist. I'd like this thread to either resolve these design issues, acknowledge that the issues can be deferred to post-Alpha, or get the discussion to a point where the technical board has a concrete proposition to resolve, or clearly establish that this feature isn't going to happen for 1.8.
The design issues I'm aware of aren't especially fundamental issues - they're all mostly about naming. They're not quite bikesheds, though, because the names are either significant, already in use, or likely to change meaning in the near future. So - it's important that we get the names right, but the important thing isn't really the names themselves - it's the concept underpinning the names.
The code and docs as it stands can be found on Github:
In this discussion "Old" Meta refers to the older, officially non-supported API in Django 1.7 and earlier. The "New" Meta is this patch - the API developed during GSoC as a candidate for 1.8.
Summary of API
============
Documentation for the new API is available on the PR, but as a quick summary - the proposal in this pull request is to formally document 2 methods on _meta:
* get_field(name)
Returns a single field instance with the given name.
get_field() exists in the old API, but it also accepts an 'many_to_many' argument; this argument will be deprecated as part of the new API.
* get_fields(include_parents=True, include_hidden=False)
Returns a list of fields associated with a model.
include_parents controls whether the list of fields is those defined literally on this model, or if fields on the parent model are also included (for inheritance situations).
include_hidden controls whether hidden fields are returned. Hidden fields are fields that exist for internal purposes, but aren't part of public API, or have been explicitly hidden by a user. For example, the reverse relation associated with a ForeignKey with a related_name that starts with "+" is a hidden field - the reverse relation exists, but is hidden from public use.
Proposed API usage
===============
The old API contained a large number of API endpoints (e.g., get_concrete_fields_with_model()), each of which was tuned for a specific use cases. Each of these API endpoints had their own caching scheme, and there were some interesting inconsistencies and duplication in the return values.
The new API replaces this large number of methods with two primitives from which all the other values can be derived. The intention is that instead of having a specific method that returns exactly the right results for a single use case, the new API will provide a generic list of fields, and provide the tools to filter/extend results
So - instead of providing "MyModel._meta.get_concrete_fields_with_model()", users of the new _meta API will use:
[(f, f.model) for f in MyModel._meta.get_fields() if f.concrete]
For backwards compatibility, there is an implementation of all the old methods using the new API. However, Django's own code no longer uses these methods, and they have been marked as deprecated and will be removed in a future release.
This approach takes the focus off having a comprehensive API for finding every possible field type, and moves the focus to having flags on each field that enable end-users to identify the types of fields that are appropriate for any given use case.
The flags currently defined in the docs are:
* concrete - the field has an underlying database column.
* editable - the field is designed to be user-modified. e.g., AutoFields are non-editable.
* has_relation - the field represents a relationship with another model
* hidden - fields that are concrete, but aren't intended to be used directly - they're backing for some higher-level functionality (e.g., the content_type and object_id columns for a GenericForeignKey)
All fields also have a `model` attribute that describes the model that holds the field.
Fields that define has_relation=True will also define flags for the cardinality of the relation they define:
* one_to_one
* many_to_one
* one_to_many
* many_to_many
All relation fields also have a `related_model` attribute that describes the model that the relation is pointing to. If the relation is not to a single type (e.g., GFKs), this value is None.
In future, if we add a new field type, the intention will be to add new flags as required to allow those new field types to be differentiated. This won't affect most "data" columns (e.g., a hstore or JSON field) - it's for fields that have unusual underlying representations, like Composite fields.
Outstanding issues:
===============
This API works quite well for Django internally; as I indicated earlier, all the existing Django code has been migrated to use the new API. However, there are two outstanding design issues that I am aware of. (If there are others that I've missed, please raise them!)
* Handling of Generic Foreign Keys (esp in admin)
In the old API, get_fields() didn't return GenericForeignKeys. The new API does, but they can be filtered out.
The only place this really matters in Django itself is in contrib.admin, which can't display widgets for GFKs; as a result, they need to be manually excluded from field lists.
This is currently being done with an internal method named `_get_non_gfk_field()`.
In his review notes, Carl commented that we should be avoiding field-specific names like "GFK", in favour of capturing the underlying concept. This is a fair point, but leads to the question of what the "underlying" concept is here.
At one level, there's a reasonable argument that GFKs can be singled out here - the use is an internal method, in contrib.admin, and GFKs are the only field type admin doesn't support out of the box.
At a higher level, though, a GFK is the only example of a virtual field in Django - a field. This might change, however - some of the Composite Field changes, plus possible change to ForeignKey might lead to more "virtual" fields being introduced - some of which *will* be visible in the admin.
The question: Do we -
1) Accept this particular internal specific naming of GFK as a quirk reflecting the limitations of contrib.admin
2) Try to nail down what a "virtual" field means, or find some alternative flag to identify what "GFK" is a proxy for in this case.
In my opinion, (1) is acceptable at the moment; when Composite Keys (and possibly the ForeignKey refactor) land, we can revisit what virtual means and possibly address this issue in admin at the same time. It is for this reason that I don't think virtual has a place in formal documentation for 1.8, even though the flag exists, and is used by GFKs.
* The "model", "parent_model" and "related_model" properties on fields.
The new API ensures that every field has a `model` attribute, describing the model that owns/defines the field.
However, there's a backwards compatibility problem. RelatedObject - the object used to represent the "other" side of FK and M2M relations - is also returned by get_fields(), and *it* uses the '.model' attribute to store the model that defines the field that the RelatedObject is managing. So - if Book has a foreign key to Author, Author will have a RelatedObject named book_set; and book_set.model will be Book.
Related Objects also have a "parent_model" attribute that identifies the other end of the relation - the model that is being pointed *at*.
As part of this refactor, relation fields like ForeignKey have aquired a "related_model" attribute that - so given a ForeignKey, you can tell what model it is pointing at.
This means there's an inconsistency here - if you think of RelatedObject as being a "reverse" ForeignKey/M2M, then the model should be Author, and related_model should be Book. The name "parent_model" doesn't make a whole lot of sense in this context
The question: Do we:
1) Accept this as a backwards incompatible change.
2) Accept the inconsistency in the new API.
3) Find a pair of names other than model/related_model to represent the "subject/object" pair in fields
My inclination is for (1). RelatedObject isn't stable API at present; the whole purpose of formalising _meta is to formalise some of these historical inconsistencies.
RelatedObject is an internal implementation detail, and one that's only visible if you're using the old "get_all_related_objects()" (and similar) API.
I think the inconvenience caused by making this change is worth it. The new naming is internally consistent, and much less surprising; the set of people who will be affected by this change is also the set of people who are already spelunking through _meta; it's highly unlikely they're going to have a completely painless 1.8 transition.
Summary
=======
I think PR 3114 is pretty close to ready; I've highlighted my preferred options for the two outstanding design issues, both of which are pretty easy to resolve once a decision is made.
Comments? Feedback? Objections?
Yours
Russ Magee %-)