Should annotations allow model field constraints?

386 views
Skip to first unread message

Josh Smeaton

unread,
Dec 4, 2014, 9:04:21 PM12/4/14
to django-d...@googlegroups.com
Trac user jbg has found a regression with the aggregation/annotation refactor #14030. The crux of the problem comes down to whether annotations and aggregations should respect constraints on model fields. The specific example raised is as follows:

Model.objects.aggregate(
    sum_price
=Sum('price', output_field=DecimalField(max_digits=5, decimal_places=2)))

Assuming the field "price" is defined as "DecimalField(max_digits=5, decimal_places=2)", it doesn't matter if output_field is defined or not, as the implicit F('price') will result in exactly the same behaviour.

Each row in the table will adhere to the max_digits and decimal_places restriction. It has to, it's part of the column definition. But once you sum those values together, they are going to overflow. The same problem would exist if you were to concatenate two CharFields with a max_length=10.

Ticket https://code.djangoproject.com/ticket/23941 has been opened to track this.

These constraints do not necessarily make sense because we're not persisting these values anywhere. They are read-only computations, and should not require bounds. If these values were to be persisted in another model, then the input fields on that model are required to ensure the values are valid.

I've implemented https://github.com/django/django/pull/3655/ which effectively ignores the max_digits and decimal_places arguments in the base expression class. Subclasses are free to enforce validation if it is necessary. Further, callers using aren't required to supply bounds checking in this situation. This is valid (with my patch):

Model.objects.aggregate(
    sum_price
=Sum('price', output_field=DecimalField()))

And the same goes with CharField() not requiring a max_length argument.

Can anyone see any issues with this approach?

If the above seems sensible, I'll alter the test cases to remove arguments like decimal_places and max_length, and update the documentation to call out that these kinds of arguments are not used for annotations.

Thanks,

Josh

Anssi Kääriäinen

unread,
Dec 5, 2014, 1:06:15 AM12/5/14
to django-d...@googlegroups.com
On Thu, 2014-12-04 at 18:04 -0800, Josh Smeaton wrote:

>
> I've implemented https://github.com/django/django/pull/3655/ which
> effectively ignores the max_digits and decimal_places arguments in the
> base expression class. Subclasses are free to enforce validation if it
> is necessary. Further, callers using aren't required to supply bounds
> checking in this situation.

It seems wrong to throw away constraints the user has explicitly set.
So, if you say:

.annotate(sum_age=Sum('age', output_field=DecimalField(max_digits=3)))

then it seems that the max_digits should really be 3.

So, if the user explicitly asks for max_digits=3 in the output_field
argument, then we should enforce that. Or, alternatively we should
disallow setting max_digits for explicit output_field in expressions.
But just throwing away the value the user has explicitly set is wrong.

- Anssi

Marc Tamlyn

unread,
Dec 5, 2014, 3:44:50 AM12/5/14
to django-d...@googlegroups.com
I think this should be an error personally - I'm not sure how we check that effectively and efficiently.

It reminds me of asking postgres to cast a certain type to another one - it will throw an error if any record it tries to cast can't be done.

Is it possible with say a CharField to not have a max_length parameter here, even though you can't make a concrete field without it? If not then we have another issue.

Marc


--
You received this message because you are subscribed to the Google Groups "Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/1417760145.22478.72.camel%40TTY32.
For more options, visit https://groups.google.com/d/optout.

Alex Hill

unread,
Dec 5, 2014, 6:24:25 AM12/5/14
to django-d...@googlegroups.com
First, thanks for all your hard work in this area Josh.

From a quick look at the code, my understanding is that output_field can be None, in which case the ORM will use the model's field type, or it can be specified explicitly in the query.

I think that if specified explicitly, any constraints should be honoured and an exception raised if the computed aggregate value violates a constraint. If it's not specified explicitly, any constraints on the model field should be ignored as per your reasoning above. I should be able to write Model.objects.aggregate(sum_price=Sum('price')), where the sum value violates the constraints, without any error.

That provides sensible default behaviour which can be easily overridden if necessary.

Cheers,
Alex

Carl Meyer

unread,
Dec 5, 2014, 2:11:08 PM12/5/14
to django-d...@googlegroups.com
I just commented on the ticket with more details, but I think "not
enforcing" is actually the right answer, and consistent with how other
fields behave.

I agree that it's odd that you can provide these validation parameters
(to DecimalField and many other field types) when instantiating them for
output_field, and they won't be enforced, but I don't see a reasonable
way to address that in the general case.

Well, actually, I do see one way - I think it would probably be better
if `output_field` took classes, not instances. But unfortunately I think
that would be quite hard to implement, since `get_internal_type` is an
instance method, not a classmethod.

Carl

signature.asc

Josh Smeaton

unread,
Dec 5, 2014, 6:02:53 PM12/5/14
to django-d...@googlegroups.com
Thanks for all the comments. I'll address the relevant bits below.
 
So, if the user explicitly asks for max_digits=3 in the output_field 
argument, then we should enforce that. Or, alternatively we should 
disallow setting max_digits for explicit output_field in expressions. 

I don't know how we could enforce model field arguments in the general case. For DecimalFields I removed the explicit call to format_number (from the converters) which now doesn't enforce max_digits/decimal_places, but there is no other code that checks model field constraints. I think the right choice is to disallow setting any constraints in the first place, but I don't know how we'd enforce that in the general case either.

Is it possible with say a CharField to not have a max_length parameter here, even though you can't make a concrete field without it? If not then we have another issue.

The checks framework and validators enforce these kinds of constraints, but neither are triggered when constructing fields for use in output_field. However, the latest change that allows a DecimalField to be constructed without digits/decimals fails on sqlite, and any other backend that has a db_converter that does a format_value on the result before the expression has a chance to convert it.

So it looks like we can't just ignore or disallow constraints without some more substantial changes due to backend converters.

First, thanks for all your hard work in this area Josh.

Thanks!

I think that if specified explicitly, any constraints should be honoured and an exception raised if the computed aggregate value violates a constraint.

This is similar to what Ansii thinks the behaviour should be. But we don't know if a user has provided the field, or if the field is being repurposed from the underlying field (for F() expressions), or if it was derived internally. Even if we did know, how would expressions know how to validate custom fields with custom constraints?

 I agree that it's odd that you can provide these validation parameters 
(to DecimalField and many other field types) when instantiating them for 
output_field, and they won't be enforced, but I don't see a reasonable 
way to address that in the general case. 

Exactly. db_column, db_index, blank .. none of these are applicable to expressions. How could we pick and choose which to enforce or ignore, especially when considering custom fields that we have no visibility of. 

I think your notes on the ticket are fairly accurate. Though a few things to note.

We can't work with just a class, even if the get_internal_type was a classmethod. There are other methods that are called, such as get_db_converters. We could instantiate with no args, but I'm not sure that's a better API.

So in a way, this is simply a bug in DecimalField (or perhaps in the way an internal-type of DecimalField is special-cased in ExpressionNode.convert_value), in that its "validation" constraints are applied (and result in hard errors) in data conversion, where normally field validation does not apply at all.

I think this is the best way to look at it, and I failed to communicate that position properly. As I mentioned above, I was under the wrong impression that DecimalField could be passed along without any arguments, but some backends (sqlite and oracle) do use them to convert database values to python values. This seems to be unique to DecimalFields though. If this limitation can be overcome, and documentation calls out that arguments passed to annotation fields are redundant, I think that this ticket will be satisfied.

If users have an interest in applying constraints, then those constraints can be modelled as new expressions, enforced at the database. Something like Cast(expression, Decimal(5, 2)). That's really a new feature though, and should be raised as such if the need arises.

Josh
 

Alex Hill

unread,
Dec 5, 2014, 9:48:08 PM12/5/14
to django-d...@googlegroups.com
Hi Josh,
 
This is similar to what Ansii thinks the behaviour should be. But we don't know if a user has provided the field, or if the field is being repurposed from the underlying field (for F() expressions), or if it was derived internally.

I was thinking you'd instantiate the field without the constraints in _resolve_output_field, i.e. when output_field is None. Essentially it's the same as the "just provide a class" approach: it would require a contract which says all fields must be able to be instantiated in an output_field-compatible (i.e. "unconstrained") way by providing no arguments.

That seems related to the limitation in DecimalField that we're talking about correcting anyway. Do you have a feeling of how near or far Django's built-in fields are of meeting that requirement currently?
 
We could instantiate with no args, but I'm not sure that's a better API.

It's nicer at least in the sense that you don't ignore constraints. The lack of constraints on the output field is implied by the API. I quite like that.

Cheers,
Alex

Josh Smeaton

unread,
Dec 8, 2014, 9:24:48 PM12/8/14
to django-d...@googlegroups.com
I've marked the patch as ready for checkin. Instantiating fields without any arguments for decimal field is now supported by all backends. The format_number function has learnt to understand max_digits or decimal_places being None. The format_number function needs review though. When decimal_places is None, the decimal will be rounded to max_digits (using context.prec). I've decided to throw an error if the result is rounded though, so that the value must be within max_digits.

This patch does:

- fixes the regression
- ignores precision and decimal places for annotations in the base class
- documents that model fields for output_field take no args

This patch does not:

- allow using classes instead of instances
- take into consideration any arguments passed to model fields

I don't feel that passing classes and then having to instantiate them internally is a nice enough API for the complexity it brings, especially when fields can be created without arguments by the user. If we come across a field that does not allow construction without arguments, then we should fix that field. If, in the future, we do want to support some arguments then users will have a weird choice between passing an instance or a class.

As for support of model field arguments, I think that's a separate design decision and patch, and shouldn't hold up fixing the previous behaviour of aggregation over decimal fields.

Regards,

Reply all
Reply to author
Forward
0 new messages