enum fields in django


Thomas Stephenson

Feb 25, 2015, 9:53:34 PM
to django-d...@googlegroups.com
As discussed in Issue 24342, I've got an implementation of EnumField that we've found useful when developing our django REST API that I'd like to add to django core. It was suggested I bring this issue up in the developers mailing list for discussion. Unfortunately, updates to the issue were delivered to my spam folder, so there has been some delay in actually raising the issue.

Basically, the implementation consists of a new field type, and a new migration operation (to register the values associated with an enum type). The field takes an `enum_type` argument and registers a type with values taken from the enum value names. The actual values associated with the names are ignored, so support for IntEnum and other enum types comes as standard.
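To illustrate the name-based mapping, here is a minimal sketch in plain Python. The helpers `to_db`, `from_db` and `db_labels` are hypothetical names, not part of the implementation described above; they only show that storing member names makes the member values irrelevant, so `Enum` and `IntEnum` behave the same way.

```python
from enum import Enum, IntEnum


class Color(Enum):
    RED = "#ff0000"
    GREEN = "#00ff00"


class Priority(IntEnum):
    LOW = 1
    HIGH = 2


def to_db(member):
    """Return the string stored in the database: the member's name."""
    return member.name


def from_db(enum_type, stored):
    """Look a member up by name; raises KeyError for unknown names."""
    return enum_type[stored]


def db_labels(enum_type):
    """Labels a database enum type would be registered with."""
    return [member.name for member in enum_type]
```

Under these assumptions, `to_db(Color.RED)` yields the name `"RED"` and `from_db(Priority, "HIGH")` returns `Priority.HIGH`, regardless of the values on the right-hand side.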

In a real implementation, the enum type would have to be checked when running migrations to ensure that values haven't been added/removed from the python class. It's not something that we've needed to deal with in our in-house implementation.
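A rough sketch of what such a check could look like, assuming the migration state records the list of names the database type was created with. The function `enum_drift` and the `Status` enum are hypothetical and only illustrate the comparison.

```python
from enum import Enum


def enum_drift(enum_type, recorded_names):
    """Compare an enum class against the names recorded by an earlier migration.

    Returns (added, removed) as sorted lists of member names.
    """
    current = set(enum_type.__members__)
    recorded = set(recorded_names)
    return sorted(current - recorded), sorted(recorded - current)


class Status(Enum):
    DRAFT = 1
    PUBLISHED = 2
    ARCHIVED = 3


# Names as they were when the type was first registered (assumed).
added, removed = enum_drift(Status, ["DRAFT", "PUBLISHED", "DELETED"])
```

A non-empty `removed` list is the case that would need special handling, since not every database can drop a value from an existing enum type.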

Any database which does not support an enum field natively would default to a CharField implementation. 
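One way the fallback could derive its arguments is sketched below. `char_field_kwargs` is a hypothetical helper; `max_length` and `choices` are the usual CharField options, but how the real field would compute them is an assumption.

```python
from enum import Enum


class Size(Enum):
    SMALL = 1
    MEDIUM = 2
    LARGE = 3


def char_field_kwargs(enum_type):
    """Build CharField-style keyword arguments from an enum's member names."""
    names = [member.name for member in enum_type]
    return {
        # The column must be wide enough for the longest name.
        "max_length": max(len(name) for name in names),
        # Each name doubles as stored value and display label here.
        "choices": [(name, name) for name in names],
    }


kwargs = char_field_kwargs(Size)
```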

Useful addition? Or does it overlap too much with the choices API? 

Thomas

Marc Tamlyn

Feb 26, 2015, 5:26:59 AM
to django-d...@googlegroups.com
I kinda like the idea of enum fields, but I feel they are likely better in theory than in practice. In theory I said I would introduce one as part of contrib.postgres, but I've been putting it off as I'm unconvinced whether it is necessarily ideal anyway when compared to choices or reference tables.

Database support: PG, MySQL and Oracle all have enum data types. However, postgres treats them somewhat differently: it requires you to explicitly create a new type, which allows the same enum type to be used across multiple tables. Which paradigm should we follow in migrations?

Python support: This isn't a major issue as some other "core" features like timezones require third party packages (pytz), but we are still requiring an external package for python < 3.4. SAY NO TO VENDORING! It's also worth noting it's a relatively new python level concept and there may be changes to it.

Ordering: I'm a little uncomfortable with the idea that you can do .order_by('my_enum_field') but you can't do sorted(qs, key=lambda o: o.my_enum_field), unless you use IntEnum, which the docs say you shouldn't.

Migration issues: Postgres does not support removing values from enum fields, although it has good support for adding values. I'm not sure Oracle supports changing enums at all, and on MySQL changing the enum is an ill-defined process with some unexpected consequences (inevitably...), which is also extremely slow. Obviously we have no such guarantee about anything when using choices at the moment; however, with an automatic migration system associated with enums, developers are likely to assume more intelligence than is present. We go to great pains in db.migrations at the moment to ensure this.
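For the PostgreSQL case, the two statements a migration would need can be sketched as plain string builders. `CREATE TYPE ... AS ENUM` and `ALTER TYPE ... ADD VALUE` are standard PostgreSQL; the helper names `create_enum_sql` and `add_value_sql` are hypothetical, and there is deliberately no "remove value" counterpart.

```python
def quote_literal(value):
    """Quote a string as a SQL literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def create_enum_sql(type_name, labels):
    """SQL that creates a standalone enum type, reusable across tables."""
    values = ", ".join(quote_literal(label) for label in labels)
    return f'CREATE TYPE "{type_name}" AS ENUM ({values});'


def add_value_sql(type_name, label):
    """SQL that appends one label to an existing enum type."""
    return f'ALTER TYPE "{type_name}" ADD VALUE {quote_literal(label)};'


create_stmt = create_enum_sql("mood", ["sad", "happy"])
add_stmt = add_value_sql("mood", "ecstatic")
```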

Philosophy: Like with choices, there are arguments against using enum at all in favour of using reference tables, the most obvious being the ability to add extra information to a reference table. On the other hand, by allowing the right-hand side of the enum in Python to be arbitrary objects (classes or something), you can nicely encapsulate a bunch of extra information. For example, consider a competition model where there are 3 or 4 different ways of working out the winner: you could have an enum with classes where you can do competition.winner_type.get_winner(). This function call may need additional context.
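A small sketch of that competition idea, using strategy objects as the enum values. All names here (`HighestScore`, `FastestTime`, `WinnerType`, the `entries` structure) are made up for illustration; the list of entries stands in for the "additional context" the call may need.

```python
from enum import Enum


class HighestScore:
    def pick(self, entries):
        return max(entries, key=lambda entry: entry["score"])


class FastestTime:
    def pick(self, entries):
        return min(entries, key=lambda entry: entry["seconds"])


class WinnerType(Enum):
    # The right-hand side is an arbitrary object carrying behaviour.
    HIGHEST_SCORE = HighestScore()
    FASTEST_TIME = FastestTime()

    def get_winner(self, entries):
        """Delegate to the strategy object stored as this member's value."""
        return self.value.pick(entries)


entries = [
    {"name": "Ana", "score": 90, "seconds": 75},
    {"name": "Ben", "score": 85, "seconds": 60},
]
by_score = WinnerType.HIGHEST_SCORE.get_winner(entries)
by_time = WinnerType.FASTEST_TIME.get_winner(entries)
```

Since only the member name would be stored in the database, the strategy objects never need to be serialised.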

Overall, I'm hovering around a -0 on a general implementation of EnumField or similar. However, I'm +0 on either a) a good third party implementation or b) a contrib.postgres specific implementation with well documented caveats. An advantage of living in contrib.postgres is that you don't need to concern yourself with automatic migrations, and contrib can have a more lenient policy on external packages. You also get a guaranteed review (me!).

Marc

--
You received this message because you are subscribed to the Google Groups "Django developers (Contributions to Django itself)" group.
To unsubscribe from this group and stop receiving emails from it, send an email to django-develop...@googlegroups.com.
To post to this group, send email to django-d...@googlegroups.com.
Visit this group at http://groups.google.com/group/django-developers.
To view this discussion on the web visit https://groups.google.com/d/msgid/django-developers/59072aa1-7e7a-4fcf-8dd1-effde66675c6%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Thomas Stephenson

Feb 26, 2015, 7:38:17 AM
to django-d...@googlegroups.com
Some general responses to your points:

Database support
Yep, not all databases support enums and not all support them the same
way. That's par for the course when you're trying to abstract across
multiple implementations of the same pseudo-standard.

Vendor lock-in
Enums are hardly vendor lock-in, they're a language feature that
happens to require a library in order to support backwards
compatibility. I accept the fact that there may be changes to it in
the near future, but even if there are, there can't be any breaking
changes to the API.

Ordering
Yes, you can't do
    sorted(qs, key=lambda o: o.my_enum_field)
But you can do
    sorted(qs, key=lambda o: o.my_enum_field.name)
which will give you the same result as the database ordering, or
    sorted(qs, key=lambda o: o.my_enum_field.value),
if you want to sort according to the values.
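The three cases can be checked with a small standalone example; the `Grade` enum is hypothetical and a plain list stands in for the queryset.

```python
from enum import Enum


class Grade(Enum):
    GOLD = 1
    SILVER = 2
    BRONZE = 3


members = [Grade.SILVER, Grade.BRONZE, Grade.GOLD]

# Sorting on an explicit key always works.
by_name = sorted(members, key=lambda m: m.name)
by_value = sorted(members, key=lambda m: m.value)

# Plain Enum members define no ordering, so sorting them directly fails.
try:
    sorted(members)
    orderable = True
except TypeError:
    orderable = False
```

Whether the name ordering matches the database ordering depends on the backend: it does for a CharField fallback, while PostgreSQL orders native enum values by their position in the type declaration.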

Migrations
Yep, a bit of a pain in the neck to support changing enum definitions,
but in general any _sane_ developer would only ever add values to an
enum after it was initially created, and you could make
`AddValueToEnum` a non-reversible operation.

I admit it does create some expectations about other places that
migrations would require more "intelligence", but extensions to the
original implementation are an inevitability and I'm sure you're quite
aware of that.

On that note, I'm kind of worried about your assertion that types in
contrib libraries aren't required to support automatic migration.
Since migrations were added, we've added support for migrations in
most of our custom field types.

Philosophy
I try to keep myself out of philosophical arguments. Reference tables
have a purpose (when you're dealing with a set of values that aren't
fully known when you're defining the dataset (eg. custom application
error code tables)), but when the dataset _is_ known in advance an
enum will save you a couple of joins per table lookup.


It depends what you guys want to do. I'm happy to put in the work to
make the implementation generic, but I'm not going to push for you
guys to implement something you don't want or don't think provides
real value for users. Contributing to contrib.postgres is a possible
option, but it's not really a postgres specific feature -- almost all
of the major database vendors support it (sqlite being as always the
obvious exception).

Thomas

charettes

Feb 26, 2015, 2:38:02 PM
to django-d...@googlegroups.com

> I try to keep myself out of philosophical arguments. Reference tables
> have a purpose (when you're dealing with a set of values that aren't
> fully known when you're defining the dataset (eg. custom application
> error code tables)), but when the dataset _is_ known in advance an
> enum will save you a couple of joins per table lookup.

You can avoid those couple of joins by making your referenced table's primary key what your enum value would have been and simply not joining it.

In this case you would simply use your referenced table for data integrity through foreign key constraint enforcement.
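A minimal sketch of that layout using the standard-library `sqlite3` module; the table and column names are invented for the example. The reference table's primary key is the label itself, so reads need no join and the foreign key only guards integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE rating_level (name TEXT PRIMARY KEY)")
conn.execute(
    "CREATE TABLE answer ("
    " id INTEGER PRIMARY KEY,"
    " rating TEXT NOT NULL REFERENCES rating_level (name))"
)
conn.executemany(
    "INSERT INTO rating_level (name) VALUES (?)",
    [("AGREE",), ("NEUTRAL",), ("DISAGREE",)],
)
conn.execute("INSERT INTO answer (rating) VALUES ('AGREE')")

# The label is already in the referencing row: no join required.
ratings = [row[0] for row in conn.execute("SELECT rating FROM answer")]

# A label missing from the reference table is rejected by the constraint.
try:
    conn.execute("INSERT INTO answer (rating) VALUES ('MAYBE')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```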

Simon

Marc Tamlyn

Feb 26, 2015, 2:52:10 PM
to django-d...@googlegroups.com
> On that note, I'm kind of worried about your assertion that types in contrib libraries aren't required to support automatic migration. Since migrations were added, we've added support for migrations in most of our custom field types.

To clarify, custom fields provided in contrib.postgres are autodetected and deconstructed appropriately. However, notably the hstore field requires an extension to be installed in the database. It turned out to be extremely invasive to autodetect when this extension was needed and when it wasn't, and it would have required probably hundreds of lines of extra complexity in the autodetector, as well as defining a formal API for "database extensions", whatever that means. Enums in postgres (custom types) would fall into the same category. As a result, the decision was taken to provide Operation subclasses appropriate for anything needed, but require users to manually add that operation to a migration where necessary and ensure the dependency tree works. Similarly, an extension operation is provided if you want to use the unaccent lookup. See https://docs.djangoproject.com/en/dev/ref/contrib/postgres/operations/.

If contrib.postgres was to gain an EnumField implementation, I would expect to see the addition of a CreateEnum operation and an AlterEnum operation, but I wouldn't expect those to necessarily be autodetected. If they were, that's great, but from my research into trying to do something similar, doing it correctly while maintaining the API boundaries between AutoDetector and SchemaEditor is likely to be an extremely complex patch, in my mind for very little gain. Especially when talking about postgres, the error messages from such a failed migration should be fairly easy to understand: "type hstore does not exist", "type my_enum does not exist", so I don't think it's a big deal to expect users to manage it by hand.

Josh Smeaton

Feb 26, 2015, 5:26:14 PM
to django-d...@googlegroups.com
> Contributing to contrib.postgres is a possible
> option, but it's not really a postgres specific feature -- almost all
> of the major database vendors support it (sqlite being as always the
> obvious exception).

This option worries me. I definitely do not like the idea of building a feature into contrib.postgres that could be built for the rest of the database backends too. It seems like a cop-out for doing less work, and really promotes one built-in backend above others. contrib.postgres should be a place for postgres-specific features, not for cutting out other backends. I'm glad you pointed that out.

Cheers 

Thomas Stephenson

Feb 26, 2015, 10:02:14 PM
to django-d...@googlegroups.com
> You can avoid those couple of joins by making your referenced table's primary key what
> your enum value would have been and simply not joining it.

> In this case you would simply use your referenced table for data integrity through
> foreign constraint enforcement.

True, but when defining the tables using an ORM it can be hard to
document avoiding those joins, whether they're native (using filter)
or (the ones you _really_ want to avoid if possible) created via a
queryset on attribute access.

It's all perfectly reasonable to do, but a bit excessive when all you
want to do is add

    class LikertScale(IntEnum):
        STRONGLY_AGREE = 5
        AGREE = 4
        NEUTRAL = 3
        DISAGREE = 2
        STRONGLY_DISAGREE = 1

to your data model.


> As a result, the decision was taken to provide Operation subclasses appropriate for
> anything needed, but require users to manually add that operation to a migration where
> necessary and ensure the dependency tree works. Similarly, an extension operation is provided
> if you want to use the unaccent lookup.

Thanks for the clarification. I hadn't really thought about the
mechanics of auto-detection except as a "nice to have". Our custom
operations are all run manually, but my thinking was that it would
probably have to be auto detected if I was going to push the
implementation upstream.

Gavin Wahl

Feb 27, 2015, 12:26:47 PM
to django-d...@googlegroups.com
I would definitely use an enum field that used PEP 435 enums instead of choices. The implementation as a real enum in postgres and choices on other databases is perfect. I've used a third-party package, https://github.com/hzdg/django-enumfields, to accomplish this in the past. The ability to use real Python enums is a great improvement over defining constants for each choice yourself.