Generated Field

Paolo Melchiorre

unread,

Apr 13, 2022, 6:25:48 AM4/13/22

to django-d...@googlegroups.com

Hi all,

I am at PyCon DE 2022 in Berlin with Markus and I shared with him this
idea that I have been thinking about for a few months, given his
interest I also share it with you.

I figured we could add in Django a "GeneratedField" which accepts a
"base_field" attribute in a similar way to "ArrayField" and then an
expression to generate a value for the field.

For example this model:

class Album(models.Model):
...
title = models.CharField(max_length=120)
search = GeneratedField(
SearchVectorField(),
F('title')
)

would generate an SQL code like this in PostgreSQL:

CREATE TABLE album (
...
title char(120),
search tsvector GENERATED ALWAYS AS title STORED
);

I found some documentation for different database backend about
generated column (or similar names)

# SQLite
https://www.sqlite.org/gencol.html
https://www.sqlite.org/releaselog/3_31_0.html

# PostgreSQL
https://www.postgresql.org/docs/12/ddl-generated-columns.html
https://www.postgresql.org/docs/12/release-12.html

# Oracle
https://oracle-base.com/articles/11g/virtual-columns-11gr1

# MySQL
https://dev.mysql.com/doc/refman/5.7/en/mysql-nutshell.html
https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html

# MariaDB
https://mariadb.com/kb/en/changes-improvements-in-mariadb-102/
https://mariadb.com/kb/en/generated-columns/

I'd love to hear your thoughts on this idea and if you think it might
be something worth continuing to investigate.

Greetings from PyCon DE 2022 in Berlin,
Paolo

--
https://www.paulox.net

Adam Johnson

unread,

Apr 13, 2022, 7:01:46 AM4/13/22

to Django developers (Contributions to Django itself)

I'd be interested in seeing this. Generated columns are a useful SQL feature that are missing from Django.

Nice initial research on backend coverage - it looks like they're widely supported.

Some ideas...

Is it necessary to take a base field? Can we not determine the output field type for some kinds of expression? e.g. F("some_integer_field") + 1 can be assumed to have output field type IntegerField.

Would it be worth blocking assignment to the field? Allowing "model.generated_field = ..." would at least be misleading. (This could be done with a special descriptor.)

Good luck!

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAKFO%2Bx5GHUEVdzi2awYtH5C17tTPTPh%2ByoDP%3DKC18pF8%2Bi_7PA%40mail.gmail.com.

Mariusz Felisiak

unread,

Apr 13, 2022, 7:05:49 AM4/13/22

to Django developers (Contributions to Django itself)

Related tickets:

- https://code.djangoproject.com/ticket/31300: Add function-based virtual fields on PostgreSQL and Oracle.

- https://code.djangoproject.com/ticket/31565: Support GENERATED ALWAYS columns for MySQL and PostgreSQ

Related DEP:

- https://github.com/django/deps/pull/39 - Refactor ORM with VirtualField and CompositeField

and an accepted ticket for non-database backed calculated field (see discussion https://groups.google.com/g/django-developers/c/ADSuUUuZp3Q/m/eZGYZv74AQAJ):

- https://code.djangoproject.com/ticket/28822 - Add DBCalculatedField to model to annotate models automatically

Kye Russell

unread,

Apr 13, 2022, 7:09:38 AM4/13/22

to Django developers (Contributions to Django itself)

I’d love to see this!

Kye

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.

To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/5bf85475-4d97-4249-bd68-baf9906d9fffn%40googlegroups.com.

charettes

unread,

Apr 13, 2022, 9:48:57 AM4/13/22

to Django developers (Contributions to Django itself)

> Is it necessary to take a base field? Can we not determine the output field type for some kinds of expression? e.g. F("some_integer_field") + 1 can be assumed to have output field type IntegerField.

It should be possible by simply accessing the `output_field` property of the provided expression once it's resolved yes.

e.g.

# Will crash as the output field of some_integer_field has not been resolved.

(F("some_integer_field") + 1).output_field

# Will work as F resolved to a Col pointing at SomeModel.some_integer_field (assuming it exists)

(F("some_integer_field") + 1).resolve_expression(Query(SomeModel), allow_joins=False).output_field

But as Paolo demonstrate in his example I think we might want a distinct separation between the column type and the expression used to generate the value of this field if unless we want to force users to do gymnastics to rely on type coercion.

For example, in the reported case the user would have to do `GeneratedField(SearchVector('title'))` which would actually result in `tsvector GENERATED ALWAYS AS to_tsvector(title)` STORED which is not exactly the same. It's not an issue here but I wouldn't be surprised that not being able to configured a generated field to generate a particular SQL output might be problematic in some cases.

What about we make the expected signature `GeneratedField(expression, base_field=None)` where a missing `base_field` defaults to `expression.output_field`? That would allow the exact expected SQL to be generated with `GeneratedField('title', base_field=SearchVectorField())` if there's a requirement for it.

Cheers,

Simon

Adam Johnson

unread,

Apr 13, 2022, 10:34:19 AM4/13/22

to Django developers (Contributions to Django itself)

What about we make the expected signature `GeneratedField(expression, base_field=None)` where a missing `base_field` defaults to `expression.output_field`? That would allow the exact expected SQL to be generated with `GeneratedField('title', base_field=SearchVectorField())` if there's a requirement for it.

Yes - sorry for not unclear, this is what I meant. We don't need to *require* the output field, but it will be necessary in some cases.

Bike shed: perhaps output_field is a better name than base_field, to match expressions, and since it's not the base/"source" field for the expression, but where the output/result of the expression is stored.

To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/3887ba35-fa4d-4360-badb-58097c57c59bn%40googlegroups.com.

Dan Davis

unread,

Apr 13, 2022, 4:38:49 PM4/13/22

to Django developers (Contributions to Django itself)

+1

To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAMyDDM2CWm0HOkLC%3DBjqQ92928L%3DD%3DXOZzUtA1cHdgc235oYRQ%40mail.gmail.com.

Jeremy Nauta

unread,

Dec 24, 2022, 6:53:10 PM12/24/22

to Django developers (Contributions to Django itself)

I'd love to help implement this if we can find a rough syntax! I've made a proof of concept in Postgres, and there are two outstanding limitations to address:

- The generated field value is not set until the model is reloaded from the database

- The `GENERATED ALWAYS` expression requires an argument to be injected in the the sql expression, but this is not currently possible

from django.db.backends.utils import CursorWrapper

from django.db.models import Expression, Field

from django.db.models.sql import Query

class GeneratedField(Field):

"""

Wrapper field used to support generated columns in postgres.

"""

def __init__(self, expression: Expression, db_collation: str = None, *args, **kwargs):

"""

:param expression: DB expression used to calculate the auto-generated field value

"""

self.expression = expression

self.db_collation = db_collation

kwargs['editable'] = False # This field can never be edited

kwargs['blank'] = True # This field never requires a value to be set

kwargs['null'] = True # This field never requires a value to be set

super().__init__(*args, **kwargs)

def _compile_expression(self, cursor: CursorWrapper, sql: str, params: dict):

"""

Compiles SQL and its associated parameters into a full SQL query. Usually sql params are kept

separate until `cursor.execute()` is called, but this is not possible since this function

must return a single sql string.

"""

return cursor.mogrify(sql, params).decode()

def db_type(self, connection):

"""

Called when calculating SQL to create DB column (e.g. DB migrations)

https://docs.djangoproject.com/en/4.1/ref/models/fields/#django.db.models.Field.db_type

"""

db_type = self.expression.output_field.db_type(connection=connection)

# Convert any F() references to concrete field names

query = Query(model=self.model, alias_cols=False)

expression = self.expression.resolve_expression(query, allow_joins=False)

# Compile expression into SQL

expression_sql, params = expression.as_sql(

compiler=connection.ops.compiler('SQLCompiler')(

query, connection=connection, using=None

),

connection=connection,

)

with connection.cursor() as cursor:

expression_sql = self._compile_expression(

cursor=cursor, sql=expression_sql, params=params

)

return f'{db_type} GENERATED ALWAYS AS ({expression_sql}) STORED'

def rel_db_type(self, connection):

"""

Called when calculating SQL to reference DB column

https://docs.djangoproject.com/en/4.1/ref/models/fields/#django.db.models.Field.rel_db_type

"""

return self.expression.output_field.db_type(connection=connection)

def deconstruct(self):

"""

Add custom field properties to allow migrations to deconstruct field

https://docs.djangoproject.com/en/4.1/ref/models/fields/#django.db.models.Field.deconstruct

"""

name, path, args, kwargs = super().deconstruct()

kwargs['expression'] = self.expression

if self.db_collation is not None:

kwargs['db_collation'] = self.db_collation

return name, path, args, kwargs

class GeneratedFieldQuerysetMixin:

"""

Must be added to queryset classes

"""

def _insert(self, objs, fields, *args, **kwargs):

if getattr(self.model, '_generated_fields', None) and fields:

# Don't include generated fields when performing a `model.objects.bulk_create()`

fields = [f for f in fields if f not in self.model._generated_fields()]

return super()._insert(objs, fields, *args, **kwargs)

class GeneratedFieldModelMixin:

"""

Must be added to model class

"""

def _generated_fields(cls) -> list[Field]:

"""

:return all fields of the model that are generated

"""

return [

f

for f in cls._meta.fields

if isinstance(f, GeneratedField)

]

def _do_insert(self, manager, using, fields, *args, **kwargs):

generated_fields = self._generated_fields()

if generated_fields and fields:

# Don't include generated fields when performing a `save()` or `create()`

fields = [f for f in fields if f not in generated_fields]

return super()._do_insert(manager, using, fields, *args, **kwargs)

def _do_update(self, base_qs, using, pk_val, values, *args, **kwargs):

generated_fields = self._generated_fields()

if generated_fields and values:

# Don't include generated fields when performing an `update()`

values = [v for v in values if v[0] not in generated_fields]

return super()._do_update(base_qs, using, pk_val, values, *args, **kwargs)

schinckel

unread,

Dec 24, 2022, 10:56:38 PM12/24/22

to Django developers (Contributions to Django itself)

I believe there are a bunch of similarities between the requirements of generated fields and my project django-shared-property: https://django-shared-property.readthedocs.io/en/latest/

Jeremy Nauta

unread,

Jan 2, 2023, 1:41:09 AM1/2/23

to Django developers (Contributions to Django itself)

I have implemented a first draft for this feature! Feedback and ideas are greatly appreciated.

https://github.com/django/django/pull/16417

https://code.djangoproject.com/ticket/34238

Reply all

Reply to author

Forward