Generated Field

772 views
Skip to first unread message

Paolo Melchiorre

unread,
Apr 13, 2022, 6:25:48 AM4/13/22
to django-d...@googlegroups.com
Hi all,

I am at PyCon DE 2022 in Berlin with Markus and I shared with him this
idea that I have been thinking about for a few months, given his
interest I also share it with you.

I figured we could add in Django a "GeneratedField" which accepts a
"base_field" attribute in a similar way to "ArrayField" and then an
expression to generate a value for the field.

For example this model:

class Album(models.Model):
...
title = models.CharField(max_length=120)
search = GeneratedField(
SearchVectorField(),
F('title')
)

would generate an SQL code like this in PostgreSQL:

CREATE TABLE album (
...
title char(120),
search tsvector GENERATED ALWAYS AS title STORED
);

I found some documentation for different database backend about
generated column (or similar names)

# SQLite
https://www.sqlite.org/gencol.html
https://www.sqlite.org/releaselog/3_31_0.html

# PostgreSQL
https://www.postgresql.org/docs/12/ddl-generated-columns.html
https://www.postgresql.org/docs/12/release-12.html

# Oracle
https://oracle-base.com/articles/11g/virtual-columns-11gr1

# MySQL
https://dev.mysql.com/doc/refman/5.7/en/mysql-nutshell.html
https://dev.mysql.com/doc/refman/5.7/en/create-table-generated-columns.html

# MariaDB
https://mariadb.com/kb/en/changes-improvements-in-mariadb-102/
https://mariadb.com/kb/en/generated-columns/

I'd love to hear your thoughts on this idea and if you think it might
be something worth continuing to investigate.

Greetings from PyCon DE 2022 in Berlin,
Paolo

--
https://www.paulox.net

Adam Johnson

unread,
Apr 13, 2022, 7:01:46 AM4/13/22
to Django developers (Contributions to Django itself)
I'd be interested in seeing this. Generated columns are a useful SQL feature that are missing from Django.

Nice initial research on backend coverage - it looks like they're widely supported.

Some ideas...

Is it necessary to take a base field? Can we not determine the output field type for some kinds of expression? e.g. F("some_integer_field") + 1 can be assumed to have output field type IntegerField.

Would it be worth blocking assignment to the field? Allowing "model.generated_field = ..." would at least be misleading. (This could be done with a special descriptor.)

Good luck!

--
You received this message because you are subscribed to the Google Groups "Django developers  (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/CAKFO%2Bx5GHUEVdzi2awYtH5C17tTPTPh%2ByoDP%3DKC18pF8%2Bi_7PA%40mail.gmail.com.

Mariusz Felisiak

unread,
Apr 13, 2022, 7:05:49 AM4/13/22
to Django developers (Contributions to Django itself)
Related tickets:

- https://code.djangoproject.com/ticket/31300: Add function-based virtual fields on PostgreSQL and Oracle.
- https://code.djangoproject.com/ticket/31565: Support GENERATED ALWAYS columns for MySQL and PostgreSQ

Related DEP:

- https://github.com/django/deps/pull/39 - Refactor ORM with VirtualField and CompositeField

and an accepted ticket for non-database backed calculated field (see discussion https://groups.google.com/g/django-developers/c/ADSuUUuZp3Q/m/eZGYZv74AQAJ):

- https://code.djangoproject.com/ticket/28822 - Add DBCalculatedField to model to annotate models automatically


Kye Russell

unread,
Apr 13, 2022, 7:09:38 AM4/13/22
to Django developers (Contributions to Django itself)
I’d love to see this!

Kye
--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.

charettes

unread,
Apr 13, 2022, 9:48:57 AM4/13/22
to Django developers (Contributions to Django itself)
> Is it necessary to take a base field? Can we not determine the output field type for some kinds of expression? e.g. F("some_integer_field") + 1 can be assumed to have output field type IntegerField.

It should be possible by simply accessing the `output_field` property of the provided expression once it's resolved yes.

e.g.
# Will crash as the output field of some_integer_field has not been resolved.
(F("some_integer_field") + 1).output_field

# Will work as F resolved to a Col pointing at SomeModel.some_integer_field (assuming it exists)
(F("some_integer_field") + 1).resolve_expression(Query(SomeModel), allow_joins=False).output_field

But as Paolo demonstrate in his example I think we might want a distinct separation between the column type and the expression used to generate the value of this field if unless we want to force users to do gymnastics to rely on type coercion.

For example, in the reported case the user would have to do `GeneratedField(SearchVector('title'))`  which would actually result in `tsvector GENERATED ALWAYS AS to_tsvector(title)` STORED which is not exactly the same. It's not an issue here but I wouldn't be surprised that not being able to configured a generated field to generate a particular SQL output might be problematic in some cases.

What about we make the expected signature `GeneratedField(expression, base_field=None)` where a missing `base_field` defaults to `expression.output_field`? That would allow the exact expected SQL to be generated with `GeneratedField('title', base_field=SearchVectorField())` if there's a requirement for it.

Cheers,
Simon

Adam Johnson

unread,
Apr 13, 2022, 10:34:19 AM4/13/22
to Django developers (Contributions to Django itself)
What about we make the expected signature `GeneratedField(expression, base_field=None)` where a missing `base_field` defaults to `expression.output_field`? That would allow the exact expected SQL to be generated with `GeneratedField('title', base_field=SearchVectorField())` if there's a requirement for it.

Yes - sorry for not unclear, this is what I meant. We don't need to *require* the output field, but it will be necessary in some cases.

Bike shed: perhaps output_field is a better name than base_field, to match expressions, and since it's not the base/"source" field for the expression, but where the output/result of the expression is stored.

Dan Davis

unread,
Apr 13, 2022, 4:38:49 PM4/13/22
to Django developers (Contributions to Django itself)

Jeremy Nauta

unread,
Dec 24, 2022, 6:53:10 PM12/24/22
to Django developers (Contributions to Django itself)
I'd love to help implement this if we can find a rough syntax! I've made a proof of concept in Postgres, and there are two outstanding limitations to address:

- The generated field value is not set until the model is reloaded from the database
- The `GENERATED ALWAYS` expression requires an argument to be injected in the the sql expression, but this is not currently possible

from django.db.backends.utils import CursorWrapper

from django.db.models import Expression, Field

from django.db.models.sql import Query



class GeneratedField(Field):

    """

    Wrapper field used to support generated columns in postgres.

    """


    def __init__(self, expression: Expression, db_collation: str = None, *args, **kwargs):

        """

        :param expression: DB expression used to calculate the auto-generated field value

        """


        self.expression = expression

        self.db_collation = db_collation


        kwargs['editable'] = False  # This field can never be edited

        kwargs['blank'] = True  # This field never requires a value to be set

        kwargs['null'] = True  # This field never requires a value to be set


        super().__init__(*args, **kwargs)


    def _compile_expression(self, cursor: CursorWrapper, sql: str, params: dict):

        """

        Compiles SQL and its associated parameters into a full SQL query. Usually sql params are kept

        separate until `cursor.execute()` is called, but this is not possible since this function

        must return a single sql string.

        """


        return cursor.mogrify(sql, params).decode()


    def db_type(self, connection):

        """

        Called when calculating SQL to create DB column (e.g. DB migrations)

        https://docs.djangoproject.com/en/4.1/ref/models/fields/#django.db.models.Field.db_type

        """


        db_type = self.expression.output_field.db_type(connection=connection)


        # Convert any F() references to concrete field names

        query = Query(model=self.model, alias_cols=False)

        expression = self.expression.resolve_expression(query, allow_joins=False)


        # Compile expression into SQL

        expression_sql, params = expression.as_sql(

            compiler=connection.ops.compiler('SQLCompiler')(

                query, connection=connection, using=None

            ),

            connection=connection,

        )


        with connection.cursor() as cursor:

            expression_sql = self._compile_expression(

                cursor=cursor, sql=expression_sql, params=params

            )


        return f'{db_type} GENERATED ALWAYS AS ({expression_sql}) STORED'


    def rel_db_type(self, connection):

        """

        Called when calculating SQL to reference DB column

        https://docs.djangoproject.com/en/4.1/ref/models/fields/#django.db.models.Field.rel_db_type

        """

        return self.expression.output_field.db_type(connection=connection)


    def deconstruct(self):

        """

        Add custom field properties to allow migrations to deconstruct field

        https://docs.djangoproject.com/en/4.1/ref/models/fields/#django.db.models.Field.deconstruct

        """

        name, path, args, kwargs = super().deconstruct()

        kwargs['expression'] = self.expression

        if self.db_collation is not None:

            kwargs['db_collation'] = self.db_collation

        return name, path, args, kwargs



class GeneratedFieldQuerysetMixin:

    """

    Must be added to queryset classes

    """


    def _insert(self, objs, fields, *args, **kwargs):

        if getattr(self.model, '_generated_fields', None) and fields:

            # Don't include generated fields when performing a `model.objects.bulk_create()`

            fields = [f for f in fields if f not in self.model._generated_fields()]


        return super()._insert(objs, fields, *args, **kwargs)



class GeneratedFieldModelMixin:

    """

    Must be added to model class

    """


    def _generated_fields(cls) -> list[Field]:


        """

        :return all fields of the model that are generated

        """


        return [

            f

            for f in cls._meta.fields

            if isinstance(f, GeneratedField)

        ]


    def _do_insert(self, manager, using, fields, *args, **kwargs):

        generated_fields = self._generated_fields()

        if generated_fields and fields:

            # Don't include generated fields when performing a `save()` or `create()`

            fields = [f for f in fields if f not in generated_fields]


        return super()._do_insert(manager, using, fields, *args, **kwargs)


    def _do_update(self, base_qs, using, pk_val, values, *args, **kwargs):

        generated_fields = self._generated_fields()

        if generated_fields and values:

            # Don't include generated fields when performing an `update()`

            values = [v for v in values if v[0] not in generated_fields]


        return super()._do_update(base_qs, using, pk_val, values, *args, **kwargs)



schinckel

unread,
Dec 24, 2022, 10:56:38 PM12/24/22
to Django developers (Contributions to Django itself)
I believe there are a bunch of similarities between the requirements of generated fields and my project django-shared-property: https://django-shared-property.readthedocs.io/en/latest/

Jeremy Nauta

unread,
Jan 2, 2023, 1:41:09 AM1/2/23
to Django developers (Contributions to Django itself)
I have implemented a first draft for this feature! Feedback and ideas are greatly appreciated.

https://github.com/django/django/pull/16417
Reply all
Reply to author
Forward
0 new messages