base implementations of natural key methods

176 views
Skip to first unread message

Jian Li

unread,
May 6, 2014, 10:20:51 PM5/6/14
to django-d...@googlegroups.com
Hi!

In the course of implementing `natural_key` for many different models, I've noticed that the implementation is fairly predictable; it tends to use the fields already marked as unique. To avoid writing a separate implementation for each model, I've written a patch that implements the relevant logic on the model and manager base classes:



Details:

- The proposed implementation is recursive, which enables it to create natural foreign keys even when the foreign key structure is arbitrarily deep.

- When a natural key is not available but the user specifies `--natural-foreign`, the proposed implementation will raise an exception. This is inconsistent with the current behavior of silently falling back to using the "artificial" foreign key. However, I would argue that this is the correct behavior, and also the behavior implied by the documentation; if the user wants a natural foreign key serialization and Django is not able to provide it, Django should let the user know with an exception.

- I was unsure whether to add the model method to `ModelBase` or `Model`.

- I've manually tested a nearly identical patch against 1.5.4, but haven't had a chance to test against the development version.

Please let me know what you think, and keep up the amazing work!

Russell Keith-Magee

unread,
May 7, 2014, 8:07:55 PM5/7/14
to Django Developers
Hi Jian, 

What you've proposed is an interesting idea, and a magnificent example of what you can do by exploiting the contents of _meta - but as a proposal for Django's core, I think it falls down in a couple of critical places.

1) A model without unique_together can still implement natural key. For example, the User model has a natural key, but doesn't have a unique_together clause. So, if you've just go a unique field, you can still have a natural key, but this helper won't help. 

2) Assuming you can resolve (1) (which wouldn't be that hard in practice), multiple fields on a model can have unique=True set. How do you pick which one is *really* the natural key?

3) A model can have *multiple* unique_together clauses. Your code only appears to handle the case of "unique_together = ('field1','field2')", but it's also legitimate to say "unique_together = [('field1', 'field2'), ('field1', 'field3')]" - i.e., multiple independent unique_together clauses. Again, how do you pick which one is *really* the natural key?

4) What about if you have multiple fields with unique=True *and* multiple unique_together clauses?

5) Even if we could resolve those issues, there's a backwards compatibility issue implicit in changing behaviour around serialisation, and Django (as a project) is very sensitive to those issues.

There *might* be a solution that covers problems 1-4, and maybe even problem 5; but that solution is almost certainly going to be declarative (i.e., naming which field/field pair is the natural key). However, once you're declaring the fields that are your natural key, you're 90% of the way to an implementation of get_natural_key() -- so you have to ask why bother developing and maintaining a moderately complex automated method for something that isn't that hard to declare explicitly in 2 lines of code.

So, I'm -0 on this as currently proposed. If you've got any creative solutions to these problems, please raise them. The broader principle of having models have useful properties by virtue of their basic definition is appealing to me. I'm just not convinced that based on the details you've presented so far, we would get a net gain.

Yours,
Russ Magee %-)

Jian Li

unread,
May 7, 2014, 9:44:34 PM5/7/14
to django-d...@googlegroups.com
Hi Russ,

Thanks for the feedback!

I believe the proposed code already addresses points 1-4.

1) The implementation does in fact return the correct natural key, `(self.username,)`, for the User model.

2) When multiple fields have `unique=True`, the implementation does not choose from among them; it returns a tuple containing *all* non-auto-generated unique fields.

3) In the presence of multiple `unique_together` tuples, the implementation arbitrarily chooses the first one.

4) In the presence of both `unique_together` fields and `unique` fields, the implementation arbitrarily goes with the former.

You can see the specific algorithm at line 1422.

The implementation always chooses a natural key which uniquely identifies the relevant model. The choice may be arbitrary in the presence of multiple cues, but so is any individually-implemented choice of a natural key.

At the very worst, if the user does not agree with the choice for the specific model, they can fall back to individually implementing the natural key for that model; this is no worse than the current situation. So I would argue that the base implementation proposed here is strictly better than the status quo.

In the course of serializing about a dozen different models, I've found this to be a very useful base implementation that has saved me from the tedium of writing many lines of boilerplate.


--
You received this message because you are subscribed to the Google Groups "Django developers" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAJxq84-1ENdKbGPPbRAOFqXxmunCWxgxxwvyyaUR6rBQ1Exajw%40mail.gmail.com.
For more options, visit https://groups.google.com/d/optout.

Russell Keith-Magee

unread,
Jun 24, 2014, 8:33:37 PM6/24/14
to Django Developers
Hi Jian,

Apologies for the delay in responding.



On Thu, May 8, 2014 at 9:44 AM, Jian Li <ji...@jianli.us> wrote:
Hi Russ,

Thanks for the feedback!

I believe the proposed code already addresses points 1-4.

1) The implementation does in fact return the correct natural key, `(self.username,)`, for the User model.

2) When multiple fields have `unique=True`, the implementation does not choose from among them; it returns a tuple containing *all* non-auto-generated unique fields.

Ok - I can see how that could work. I don't necessarily agree that it's obvious, since it would produce an over specified key - but from a technical perspective, it would work.
 
3) In the presence of multiple `unique_together` tuples, the implementation arbitrarily chooses the first one.

4) In the presence of both `unique_together` fields and `unique` fields, the implementation arbitrarily goes with the former.

My apologies - I didn't catch that in my scan of your implementation.
 
You can see the specific algorithm at line 1422.

The implementation always chooses a natural key which uniquely identifies the relevant model. The choice may be arbitrary in the presence of multiple cues, but so is any individually-implemented choice of a natural key.

At the very worst, if the user does not agree with the choice for the specific model, they can fall back to individually implementing the natural key for that model; this is no worse than the current situation. So I would argue that the base implementation proposed here is strictly better than the status quo.

In the course of serializing about a dozen different models, I've found this to be a very useful base implementation that has saved me from the tedium of writing many lines of boilerplate.

That's the thing - I don't see it as boilerplate. I see it as a behaviour that *should* be explicitly defined, not implicitly derived from a model definition.

Here's another example why: Changes in schema.

You build some models, and generate some fixtures. Then you add a new unique field to your model (or remove one). You try to load your fixtures - and the load fails, because the serialisation scheme has implicitly changed. Ok, sure - you could work your way out of the problem by introducing a temporary manual definition for the serialiser, but the whole problem would have been avoided if you'd just used a manual definition in the first place.

This is one of those times where dogmatic adherence to DRY is a bad thing. DRY doesn't mean you don't ever type the same thing twice. It means that you don't ever express the same *concept* twice. And in this case, the structure of your natural keys is a concept that is independent of the uniqueness constraints in your model - they're related concepts, to be sure - but they're independent.

So - I'm still -0. If I had to provide an alternative, I'd say that a Meta definition to allow you to explicitly define your natural key:

class MyModel(Model):
    ...
    class Meta:
        natural_key = ('field1', 'field2')  

that then automatically rolls out into the appropriate definitions (but would be overridden by manually defined key functions) would be a preferable solution - but we're talking about 2 methods that are, for the most part, 2 lines of code each. I just don't see the overhead of defining natural key functions as something that is particularly onerous, and the overhead involved in building, maintaining and testing automated (or semi-automated) natural key functions just isn't worth it, IMHO.

Yours,
Russ Magee %-)

Reply all
Reply to author
Forward
0 new messages