I do not rule out that there are other examples of use of the generated
fields that could lead to the same errors and if they come to mind, please
add them in the comments.
== Misinferred attributes
In the common example of a field generated as a concatenation of two
CharField in a Model:
{{{
#!python
class Person(models.Model):
first_name = models.CharField(max_length=255)
last_name = models.CharField(max_length=255)
full_name = models.GeneratedField(
expression=Concat(
"first_name", models.Value(" "), "last_name"
),
db_persist=True,
)
}}}
The SQL code for the generated column has an automatically inferred max
length of only 255 characters (`varchar(255)`), while the field should be
able to contain strings of 511 characters (`varchar(511)`):
{{{
#!sql
CREATE TABLE "community_person" (
"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"first_name" varchar(255) NOT NULL,
"last_name" varchar(255) NOT NULL,
"full_name" varchar(255) GENERATED ALWAYS AS (
COALESCE("first_name", '') ||
COALESCE(COALESCE(' ', '') ||
COALESCE("last_name", ''), '')
) STORED
);
}}}
==== Proposal
To solve the problem you could alternatively:
a. make the maximum length extraction process smarter for the
automatically created output fields
b. mandatorily require specifying the output field when some of the fields
involved could lead to situations like the ones above
== Missing attributes
If in the previous example, I explicitly specify the output field without
attributes, the migration is generated without generating errors:
{{{
#!python
class Person(models.Model):
first_name = models.CharField(max_length=255)
last_name = models.CharField(max_length=255)
full_name = models.GeneratedField(
expression=Concat(
"first_name", models.Value(" "), "last_name"
),
db_persist=True,
output_field=models.CharField(),
)
}}}
The SQL code contains an incorrect `varchar(None)` type:
{{{
#!sql
CREATE TABLE "community_person" (
"id" integer NOT NULL PRIMARY KEY AUTOINCREMENT,
"first_name" varchar(255) NOT NULL,
"last_name" varchar(255) NOT NULL,
"full_name" varchar(None) GENERATED ALWAYS AS (
COALESCE("first_name", '') ||
COALESCE(COALESCE(' ', '') ||
COALESCE("last_name", ''), '')
) STORED
);
}}}
which will generate an error when trying to apply the migration to the
database:
{{{
#!console
sqlite3.OperationalError: near "None": syntax error
}}}
==== Proposal
To solve the problem you could alternatively:
a. make the SQL code generation process smarter so as to generate working
SQL code
b. request to specify mandatory attributes for the output fields raising
an error in the migration creation phase
== Cumulative proposal
To solve both problems shown above, given the little time remaining before
the release of the stable version of Django 5, the two most drastic
solutions from both of the above proposals could be adopted.
==== Required output field
Always require specifying the output field ''(except when you are sure
that the extracted type cannot generate error situations?)''
Example of error message:
{{{
#!console
django.core.exceptions.FieldError: Expression doesn't contain an explicit
type. You must set output_field.
}}}
==== Required attributes
Request to specify mandatory attributes for the output fields raising an
error in the migration creation phase.
{{{
#!console
$ python3 -m manage makemigrations
SystemCheckError: System check identified some issues:
ERRORS:
community.Person.full_name: (fields.E120) CharFields must define a
'max_length' attribute.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/34944>
Django <https://code.djangoproject.com/>
The Web framework for perfectionists with deadlines.
Old description:
New description:
Continuing with my experiments with the generated fields I found these two
error situations which I reported together because they are very connected
and could lead to a single solution.
I do not rule out that there are other examples of use of the generated
fields that could lead to the same errors and if they come to mind, please
add them in the comments.
=== Misinferred attributes
==== Proposal
=== Missing attributes
==== Proposal
=== Cumulative proposal
To solve both problems shown above, given the little time remaining before
the release of the stable version of Django 5, the two most drastic
solutions from both of the above proposals could be adopted.
==== Required output field
Always require specifying the output field ''(except when you are sure
that the extracted type cannot generate error situations?)''
Example of error message:
{{{
#!console
django.core.exceptions.FieldError: Expression doesn't contain an explicit
type. You must set output_field.
}}}
==== Required attributes
Request to specify mandatory attributes for the output fields raising an
error in the migration creation phase.
{{{
#!console
$ python3 -m manage makemigrations
SystemCheckError: System check identified some issues:
ERRORS:
community.Person.full_name: (fields.E120) CharFields must define a
'max_length' attribute.
}}}
--
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:1>
* cc: Lily Foote, Mariusz Felisiak (added)
* stage: Unreviewed => Accepted
Comment:
Thank you Paolo! I reproduced both scenarios, accepting.
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:2>
* owner: nobody => Om Dahale
* status: new => assigned
Comment:
Hello, I would like to work on this issue.
As pointed out by Paolo, I can -
1. make the output_field required when it can generate errors
2. and ask to specify the required attributes like max_length on migration
creation phase.
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:3>
Comment (by Paolo Melchiorre):
Replying to [comment:3 Om Dahale]:
> Hello, I would like to work on this issue.
Since this issue is a release blocker I hope you have time to propose an
MR ASAP :)
> As pointed out by Paolo, I can -
> 1. make the output_field required when it can generate errors
> 2. and ask to specify the required attributes like max_length on
migration creation phase.
Precisely, the two points I proposed as a solution are the ones that can
be implemented faster than others.
For the point above:
1. "when it can generate errors" it can be difficult in all the cases,
maybe you can start asking every time for `output_field`
2. "attributes like max_length" we need to add tests for all model field
types
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:4>
Comment (by David Sanders):
Hm I think there are 2 tickets here… or at least 2 PRs. Perhaps forum
discussion is required but I don't think we ought to go down the route of
trying to make `resolve_expression()` smart enough to determine that
concat'ing 2 varchars means we need to add the max-lengths together –
there may be a can of worms here since expressions could be ever so
complex? 🤔
We definitely need to fix the broken "None" in the DDL 👍
> Always require specifying the output field (except when you are sure
that the extracted type cannot generate error situations?)
Perhaps 👍 "Explicit is better than implicit" and all that, though I
think that documenting "For complex expressions consider always declaring
an output_field" is a nice option.
I'm keen to hear Mariusz & Lily's thoughts.
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:5>
Comment (by Om Dahale):
Replying to [comment:4 Paolo Melchiorre]:
>
> Since this issue is a release blocker I hope you have time to propose an
MR ASAP :)
>
Okay
>
> Precisely, the two points I proposed as a solution are the ones that can
be implemented faster than others.
>
> For the point above:
>
> 1. "when it can generate errors" it can be difficult in all the cases,
maybe you can start asking every time for `output_field`
> 2. "attributes like max_length" we need to add tests for all model field
types
Noted
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:6>
Comment (by Om Dahale):
Replying to [comment:5 David Sanders]:
> Hm I think there are 2 tickets here… or at least 2 PRs. Perhaps forum
discussion is required but I don't think we ought to go down the route of
trying to make `resolve_expression()` smart enough to determine that
concat'ing 2 varchars means we need to add the max-lengths together –
there may be a can of worms here since expressions could be ever so
complex? 🤔
>
> We definitely need to fix the broken "None" in the DDL 👍
>
> > Always require specifying the output field (except when you are sure
that the extracted type cannot generate error situations?)
>
> Perhaps 👍 "Explicit is better than implicit" and all that, though I
think that documenting "For complex expressions consider always declaring
an output_field" is a nice option.
>
> I'm keen to hear Mariusz & Lily's thoughts.
So shall I continue my work on the 2 points given above?
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:7>
Comment (by Paolo Melchiorre):
Replying to [comment:5 David Sanders]:
> Perhaps forum discussion is required ...
I started a discussion in the forum:
https://forum.djangoproject.com/t/release-blocker-missing-or-misinferred-
attributes-in-output-fields-of-generated-fields/25103
> ... I don't think we ought to go down the route of trying to make
`resolve_expression()` smart enough to determine that concat'ing 2
varchars means we need to add the max-lengths together ... 🤔
In fact, in my proposal for the "Cumulative proposal", I excluded this
hypothesis.
> We definitely need to fix the broken "None" in the DDL 👍
Yes, I think it absolutely needs to be resolved.
> > Always require specifying the output field (except when you are sure
that the extracted type cannot generate error situations?)
>
> Perhaps 👍 "Explicit is better than implicit" and all that, though I
think that documenting "For complex expressions consider always declaring
an output_field" is a nice option.
It could be a good solution, but it should be highlighted that deriving
the `output_field` from the expression could hide pitfalls even in
apparently simple cases such as concatenating two `CharField` on `SQLite`.
Not all Django developers look under the hood of the `ORM` or check the
`SQL` code generated by migrations.
Perhaps it might be safer to always require the user to specify an
`output_field`.
> I'm keen to hear Mariusz & Lily's thoughts.
Me too, and also the people who collaborated on this feature like Simon,
Lily, and Adam.
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:8>
Comment (by Paolo Melchiorre):
Replying to [comment:7 Om Dahale]:
> So shall I continue my work on the 2 points given above?
I think you should.
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:9>
Comment (by Om Dahale):
Replying to [comment:9 Paolo Melchiorre]:
> Replying to [comment:7 Om Dahale]:
> > So shall I continue my work on the 2 points given above?
>
> I think you should.
Okay
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:10>
Comment (by Lily Foote):
I'm curious if there are other expressions that could lead to an increased
column size or if it's just `Concat`. Maybe we could add something to
`Concat` (and other relevant expressions) that can express that column
size change?
If that's not feasible I would lean towards detecting the case where the
column size could be invalid and require `output_field` explicitly in
those cases. I'd be sad if `GeneratedField` always required an explicit
`output_field`.
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:11>
Comment (by Paolo Melchiorre):
I published an introduction to database generated columns, using SQLite
and the new GeneratedField added in Django 5.0
In this article, I shared the work I have done in testing GenerateField
using SQLite as a database backend, and I hope it can be useful to people
working in this issue.
https://www.paulox.net/2023/11/07/database-generated-columns-part-1
-django-and-sqlite/
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:12>
Comment (by Paolo Melchiorre):
Replying to [comment:11 Lily Foote]:
> I'm curious if there are other expressions that could lead to an
increased column size or if it's just `Concat`.
Unfortunately, many other cases come to mind, for example a field
generated as a product of two IntegerField which should be a
BigIntegerField.
> If that's not feasible I would lean towards detecting the case where the
column size could be invalid and require `output_field` explicitly in
those cases. I'd be sad if `GeneratedField` always required an explicit
`output_field`.
I agree with you but from the various tests I have done there are more and
more cases in which the automatically derived output field has problems.
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:13>
Comment (by Simon Charette):
Resolving the `output_field` of combined expressions is a complex task
that the ORM only has limited support for #26355, #31506, #33397, this is
not a problem specific to `GenerateField`.
It is particularly tricky in the case of `varchar(x) + varchar(y)` as some
backends restrict the allowed length of `varchar` (e.g. MySQL as 255) and
using a `TextField` might come with backend specific limitations.
Our stance so far has been to make a best effort guess and
[https://docs.djangoproject.com/en/4.2/ref/models/expressions/#output-
field refuse the temptation to guess otherwise].
I suggest that we take the same approach here and expect the user to
specify an explicit `output_field`.
For the second part of the problem, `CharField(None)`, I think that the
addition of a check is in order. Adding a
`GeneratedField._check_output_field` that simply delegates to
`self.output_field.check(**kwargs)` seems like the way to go to catch all
misuse of `output_field`.
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:14>
Comment (by Om Dahale):
Just fyi I am getting the hang of it now and will be raising a PR soon 🙂
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:15>
Comment (by David Sanders):
Thanks Simon. I think we all seem to be in consensus about explicit
output_field… Lily & I were chatting on Discord (Paolo I believe you're on
there too) about how it seems unavoidable. 👍
@Om Dahale Thanks, I was going to suggest actually starting with an easier
ticket as I assumed you were a new contributor? But if you feel like
you're up to the task then good luck, just be aware that someone else may
need to "take over" if things stall seeing as this is a release blocker ;)
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:16>
Comment (by Simon Charette):
FWIW I create [https://github.com/django/django/pull/17453 a draft MR] to
demonstrate how `Concat` resolving could be enhanced but I still think
that we should not included it in 5.0. Might be worth a follow up
cleanup/optimization ticket though.
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:17>
Comment (by Paolo Melchiorre):
Replying to [comment:14 Simon Charette]:
> Resolving the `output_field` of combined expressions is a complex task
that the ORM only has limited support for #26355, #31506, #33397, this is
not a problem specific to `GenerateField`.
Thanks for pointing this out, it helped me put this issue in the right
perspective.
> It is particularly tricky in the case of `varchar(x) + varchar(y)` as
some backends restrict the allowed length of `varchar` (e.g. MySQL at 255)
...
I would add that, to complicate things, the restrictions on the various
types of fields also vary between different versions of the same database.
> > The SQL code for the generated column has an automatically inferred
max length of only 255 characters (varchar(255))
>
> ... In this particular case it's tempting to adjust the logic of
`Concat._resolve_output_field` to be smarter and generate a
`CharField(max_length=511)` but I suspect it might have unintended effect
and not something we'd like to include in 5.0 at this point.
> I suggest that we take the same approach here and expect the user to
specify an explicit `output_field`.
I totally agree, and thanks for rephrasing my initial proposal more
clearly.
> For the second part of the problem, `CharField(None)`, I think that the
addition of a check is in order. Adding a
`GeneratedField._check_output_field` that simply delegates to
`self.output_field.check(**kwargs)` seems like the way to go to catch all
misuse of `output_field`.
Thanks for pointing out this approach.
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:18>
Comment (by Paolo Melchiorre):
Replying to [comment:16 David Sanders]:
> Thanks Simon. I think we all seem to be in consensus about explicit
output_field… Lily & I were chatting on Discord (Paolo I believe you're on
there too) about how it seems unavoidable. 👍
Given the different time zone, I have only now read Discord, but I repeat
that, as I suspected in the description of the issue, it is inevitable at
this point to request an explicit `output_field`.
> ... just be aware that someone else may need to "take over" if things
stall seeing as this is a release blocker ;)
Thanks for making that clear.
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:19>
Comment (by Paolo Melchiorre):
Replying to [comment:17 Simon Charette]:
> FWIW I create [https://github.com/django/django/pull/17453 a draft MR]
to demonstrate how `Concat` resolving could be enhanced but I still think
that we should not included it in 5.0. Might be worth a follow up
cleanup/optimization ticket though.
To avoid mixing too many things here, I have already opened Issue #34954
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:20>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"8b1acc0440418ac8f45ba48e2dfcf5126c83341b" 8b1acc04]:
{{{
#!CommitTicketReference repository=""
revision="8b1acc0440418ac8f45ba48e2dfcf5126c83341b"
Refs #30446, Refs #34944 -- Fixed crash when adding GeneratedField with
string Value().
This should allow smarter output_field inferring in functions dealing
with text expressions.
Regression in f333e3513e8bdf5ffeb6eeb63021c230082e6f95.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:21>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"73869a51632b9f826d4d7b9e888f81d60b16e8b6" 73869a51]:
{{{
#!CommitTicketReference repository=""
revision="73869a51632b9f826d4d7b9e888f81d60b16e8b6"
[5.0.x] Refs #30446, Refs #34944 -- Fixed crash when adding GeneratedField
with string Value().
This should allow smarter output_field inferring in functions dealing
with text expressions.
Regression in f333e3513e8bdf5ffeb6eeb63021c230082e6f95.
Backport of 8b1acc0440418ac8f45ba48e2dfcf5126c83341b from main
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:22>
Comment (by Om Dahale):
Replying to [comment:14 Simon Charette]:
> For the second part of the problem, `CharField(None)`, I think that the
addition of a check is in order. Adding a
`GeneratedField._check_output_field` that simply delegates to
`self.output_field.check(**kwargs)` seems like the way to go to catch all
misuse of `output_field`.
Hello, Simon!
I raised a [https://github.com/django/django/pull/17458 PR] can you please
have a look and let me know if I am going in the right direction?
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:23>
Comment (by Om Dahale):
Replying to [comment:16 David Sanders]:
> @Om Dahale Thanks, I was going to suggest actually starting with an
easier ticket as I assumed you were a new contributor? But if you feel
like you're up to the task then good luck, just be aware that someone else
may need to "take over" if things stall seeing as this is a release
blocker ;)
Yes, understood.
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:24>
* has_patch: 0 => 1
Comment:
https://github.com/django/django/pull/17458
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:25>
Comment (by Om Dahale):
Reposting from PR..
I did this and now receiving the message to specify `max_length`
{{{
def check(self, **kwargs):
databases = kwargs.get("databases") or []
return [
*super().check(**kwargs),
*self._check_output_field(**kwargs),
*self._check_supported(databases),
*self._check_persistence(databases),
]
def _check_output_field(self, **kwargs):
# Ignore field name checks as the output_field is not a field.
self.output_field.name = self.name + "_output_field"
self.output_field.model = self.model
return self.output_field.check(**kwargs)
}}}
is this correct?
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:26>
Comment (by Om Dahale):
I am available full day today so let's wrap this up quickly
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:27>
Comment (by Om Dahale):
Hello, Paolo Simon David
Can you help me out, just let me know if it is correct and if not then
please point out the mistakes?
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:28>
* owner: Om Dahale => Mariusz Felisiak
Comment:
[https://github.com/django/django/pull/17470 PR]
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:29>
* needs_better_patch: 0 => 1
Comment:
There is a minor typo but otherwise PR looks good.
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:30>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"de4884b114534f43c49cf8c5b7f10181e737f4e9" de4884b1]:
{{{
#!CommitTicketReference repository=""
revision="de4884b114534f43c49cf8c5b7f10181e737f4e9"
Reverted "Refs #30446, Refs #34944 -- Fixed crash when adding
GeneratedField with string Value()."
This reverts commit 8b1acc0440418ac8f45ba48e2dfcf5126c83341b.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:31>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"5b1d0a6be06a14d130b2b7fa305dbc9e2a393761" 5b1d0a6]:
{{{
#!CommitTicketReference repository=""
revision="5b1d0a6be06a14d130b2b7fa305dbc9e2a393761"
[5.0.x] Reverted "Refs #30446, Refs #34944 -- Fixed crash when adding
GeneratedField with string Value()."
This reverts commit 8b1acc0440418ac8f45ba48e2dfcf5126c83341b.
Backport of de4884b114534f43c49cf8c5b7f10181e737f4e9 from main
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:32>
* needs_better_patch: 1 => 0
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:33>
* stage: Accepted => Ready for checkin
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:34>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"c705625ebff0141ed2b95dd3c8174bda8270a47f" c705625]:
{{{
#!CommitTicketReference repository=""
revision="c705625ebff0141ed2b95dd3c8174bda8270a47f"
Refs #34944 -- Propagated system checks for GeneratedField.output_field.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:36>
* status: assigned => closed
* resolution: => fixed
Comment:
In [changeset:"5875f03ce61b85dfd9ad34f7b871c231c358d432" 5875f03c]:
{{{
#!CommitTicketReference repository=""
revision="5875f03ce61b85dfd9ad34f7b871c231c358d432"
Fixed #34944 -- Made GeneratedField.output_field required.
Regression in f333e3513e8bdf5ffeb6eeb63021c230082e6f95.
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:35>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"ddbe5c86e8180edffc2bb3e327da75953a4c6c2c" ddbe5c8]:
{{{
#!CommitTicketReference repository=""
revision="ddbe5c86e8180edffc2bb3e327da75953a4c6c2c"
[5.0.x] Fixed #34944 -- Made GeneratedField.output_field required.
Regression in f333e3513e8bdf5ffeb6eeb63021c230082e6f95.
Backport of 5875f03ce61b85dfd9ad34f7b871c231c358d432 from main
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:37>
Comment (by Mariusz Felisiak <felisiak.mariusz@…>):
In [changeset:"fcc55f8c263619a4c8a36f908c8f6efc0a546e7a" fcc55f8c]:
{{{
#!CommitTicketReference repository=""
revision="fcc55f8c263619a4c8a36f908c8f6efc0a546e7a"
[5.0.x] Refs #34944 -- Propagated system checks for
GeneratedField.output_field.
Backport of c705625ebff0141ed2b95dd3c8174bda8270a47f from main
}}}
--
Ticket URL: <https://code.djangoproject.com/ticket/34944#comment:38>