Why not Single Table Inheritance?

Thomas Güttler

May 12, 2014, 5:27:01 AM
to django-d...@googlegroups.com
Single Table Inheritance (STI) is used by Ruby on Rails and SQLAlchemy.

Are there reasons why it is not used in Django?

I would love to see a polymorphic inheritance solution in Django.

I know that there are third-party apps which provide this, but
something like this should be in the core.

There was some discussion about this [1]. One argument against it was
that NOT NULL cannot be enforced on columns that belong to only one subclass.
But a database constraint which checks the type column together with the
data column could solve this.
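
A minimal sketch of such a constraint, using SQLite through Python's standard sqlite3 module rather than the Django ORM. The vehicle table, its type values and the cargo_capacity column are hypothetical names chosen for illustration only.

```python
import sqlite3

# Hypothetical single-table layout: every row is a vehicle, and
# cargo_capacity only applies to rows whose type is 'truck'.
SCHEMA = """
CREATE TABLE vehicle (
    id INTEGER PRIMARY KEY,
    type TEXT NOT NULL CHECK (type IN ('car', 'truck')),
    name TEXT NOT NULL,
    cargo_capacity INTEGER,
    -- Emulate a per-type NOT NULL: trucks must have a capacity,
    -- all other types must leave the column empty.
    CHECK (
        (type = 'truck' AND cargo_capacity IS NOT NULL)
        OR (type <> 'truck' AND cargo_capacity IS NULL)
    )
)
"""


def try_insert(conn, type_, name, cargo_capacity):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO vehicle (type, name, cargo_capacity) VALUES (?, ?, ?)",
                (type_, name, cargo_capacity),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
results = [
    try_insert(conn, "car", "hatchback", None),   # valid car
    try_insert(conn, "truck", "hauler", 12000),   # valid truck
    try_insert(conn, "truck", "broken", None),    # truck without a capacity
    try_insert(conn, "car", "confused", 500),     # car with a truck-only value
]
row_count = conn.execute("SELECT COUNT(*) FROM vehicle").fetchone()[0]
```

The column itself stays nullable; it is the table-level CHECK that ties its nullability to the type column. How portable such a constraint is across database backends is a separate question.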

What do you think?

Thomas Güttler

[1] https://code.djangoproject.com/wiki/ModelInheritance#a1.ModelingparentrelationsinSQL

Christian Schmitt

May 15, 2014, 11:11:42 AM
to django-d...@googlegroups.com

Tom Evans

May 15, 2014, 11:23:23 AM
to django-d...@googlegroups.com
On Thu, May 15, 2014 at 4:11 PM, Christian Schmitt
<c.sc...@briefdomain.de> wrote:
> This is already merged.
>
> https://docs.djangoproject.com/en/1.6/topics/db/models/#multi-table-inheritance
>

MTI is not STI, nor is it polymorphic.

Cheers

Tom

Shai Berger

May 16, 2014, 5:46:01 AM
to django-d...@googlegroups.com
On Monday 12 May 2014 12:27:01 Thomas Güttler wrote:
> Single Table Inheritance is used by Ruby on Rails and SQLAlchemy.
>
> Are there reasons why it is not used in Django?
>

Essentially, STI is a form of database denormalization. I think Django should
not encourage this.

> I would love to see a polymorphic inheritance solution in django.
>

Just to spell things out: You want to have models A, B(A), C(A) and D(B), so
that list(A.objects.all()) returns a list of objects of different types.

This sounds like a good idea in general, but there are devils in the
implementation details. In particular, I'd like to separate this issue from
the issue of STI -- polymorphism is an issue of processing the retrieved
records, and need not be tightly coupled to the database layout. The
polymorphism solution should work whether the records are fetched by the
equivalent of A.objects.all() or
A.objects.select_related(child_classes).all().

I think you can sort-of achieve STI by doing it the other way around: Define
all your hierarchy as abstract models, with one concrete model inheriting all
of them (I suspect any STI implementation in Django would have to do something
very similar "behind the scenes"). Pros and cons (as well as testing if this
actually works) are left as an exercise to the reader.

> I know that there are third party apps which provide this, but
> something like this should be in the core.
>

If you ignore STI, I think it is quite straightforward to solve this with a
parent model class which adds a type field, and manager methods to add the
select_related calls and "interpret" the type field properly; so I don't see an
immediate need for inclusion in core.

> There was some discussion about this [1]. One reason against it was,
> that supporting not-null is not available. But a db constraint which
> checks the type and data column could solve this.
>

Django does not currently support arbitrary constraints well, so getting this
right (and cross-backend) might be somewhat of a challenge; which, in turn,
could justify including it in core. IMO, though, it would be much more useful
to support general constraints in core, and allow this to be done by users.

Either way, concrete suggestions welcome,

Shai.

> [1]
> https://code.djangoproject.com/wiki/ModelInheritance#a1.ModelingparentrelationsinSQL

Carl Meyer

May 16, 2014, 10:06:18 AM
to django-d...@googlegroups.com
On 05/16/2014 04:46 AM, Shai Berger wrote:
> On Monday 12 May 2014 12:27:01 Thomas Güttler wrote:
>> Single Table Inheritance is used by Ruby on Rails and SQLAlchemy.
>>
>> Are there reasons why it is not used in Django?
>>
>
> Essentially, STI is a form of database denormalization. I think Django should
> not encourage this.

I agree.

>> I would love to see a polymorphic inheritance solution in django.
>>
>
> Just to spell things out: You want to have models A, B(A), C(A) and D(B), so
> that list(A.objects.all()) returns a list of objects of different types.
>
> This sounds like a good idea in general, but there are devils in the
> implementation details. In particular, I'd like to separate this issue from
> the issue of STI -- polymorphism is an issue of processing the retrieved
> records, and need not be tightly coupled to the database layout. The
> polymorphism solution should work whether the records are fetched by the
> equivalent of A.objects.all() or
> A.objects.select_related(child_classes).all().
>
> I think you can sort-of achieve STI by doing it the other way around: Define
> all your hierarchy as abstract models, with one concrete model inheriting all
> of them (I suspect any STI implementation in Django would have to do something
> very similar "behind the scenes"). Pros and cons (as well as testing if this
> actually works) are left as an exercise to the reader.
>
>> I know that there are third party apps which provide this, but
>> something like this should be in the core.
>>
>
> If you ignore STI, I think it is quite straightforward to solve this with a
> parent model class which adds a type field, and manager methods to add the
> select_related calls and "interpret" the type field properly; so I don't see an
> immediate need for inclusion in core.

You don't even need the "type" field, you can just select_related all
the subclasses and then test when iterating over the queryset which one
exists for each record. This is what InheritanceManager in
django-model-utils does.
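
At the SQL level, the idea can be sketched roughly as follows. This is not the InheritanceManager implementation, only an illustration of the technique with Python's standard sqlite3 module. The place, restaurant and shop tables are hypothetical and stand in for a parent model and two multi-table children.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE restaurant (
    place_id INTEGER PRIMARY KEY REFERENCES place(id),
    cuisine TEXT NOT NULL
);
CREATE TABLE shop (
    place_id INTEGER PRIMARY KEY REFERENCES place(id),
    opens_sunday INTEGER NOT NULL
);
INSERT INTO place VALUES (1, 'Town square'), (2, 'Luigi'), (3, 'Corner store');
INSERT INTO restaurant VALUES (2, 'italian');
INSERT INTO shop VALUES (3, 0);
""")

# One query joins every child table, which is roughly what selecting all
# related subclasses amounts to.
rows = conn.execute("""
SELECT p.id, p.name, r.place_id, r.cuisine, s.place_id, s.opens_sunday
FROM place p
LEFT JOIN restaurant r ON r.place_id = p.id
LEFT JOIN shop s ON s.place_id = p.id
ORDER BY p.id
""").fetchall()


def classify(row):
    """Pick the most specific type by checking which child row exists."""
    _id, name, restaurant_pk, cuisine, shop_pk, opens_sunday = row
    if restaurant_pk is not None:
        return ("Restaurant", name, {"cuisine": cuisine})
    if shop_pk is not None:
        return ("Shop", name, {"opens_sunday": bool(opens_sunday)})
    return ("Place", name, {})


objects = [classify(row) for row in rows]
```

No type column is stored anywhere: the presence of a joined child row is the type information, and the child tables keep their NOT NULL columns.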

I don't see a need to have this in core either. It seems to me almost a
perfect example of the occasionally-useful-but-not-essential
functionality that is well served by third-party packages. (IMO concrete
model inheritance is more often than not a questionable model-design in
the first place, so Django shouldn't be adding more support around it to
encourage its use.)

Carl

Anssi Kääriäinen

May 22, 2014, 4:05:24 AM
to django-d...@googlegroups.com
I think it is time to add a new model classmethod from_db() to Django.

The idea is to allow customization of object initialization when loading from the database. Instead of calling model.__init__ directly from the queryset iterators, Django would call model_cls.from_db(). The default implementation just calls model.__init__, but by overriding from_db() it will be possible to do interesting things. For example:
  1) It allows for faster loading of models. See https://code.djangoproject.com/ticket/19501 for some benchmarks.
  2) Possibility to return polymorphic classes from querysets. For example for STI:
    @classmethod
    def from_db(cls, using, fields, values):
        # Assume the database has a type column, and the class contains a _type_map
        # pointing to the wanted class for each different type.
        data = dict(zip(fields, values))
        model_cls = cls._type_map[data['type']]
        new = model_cls(**data)
        # Mark the instance as loaded from the database rather than newly created.
        new._state.adding = False
        new._state.db = using
        return new

  3) Allow differentiating database loading from initialization in user code. For example (pseudo-codeish) automatic update_fields on save():

    class AutoTracking(models.Model):
        fields...

        @classmethod
        def from_db(cls, using, fields, values):
            new = super().from_db(using, fields, values)
            # This step is surprisingly hard to do correctly at the moment!
            # Can't use overridden __init__ or signals as they don't know if the model is loaded from the
            # database or not.
            new._old_data = dict(zip(fields, values))
            return new

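        def save(self, *args, update_fields=None, **kwargs):
            # Only instances loaded through from_db() carry _old_data.
            old_data = getattr(self, '_old_data', None)
            if update_fields is None and old_data is not None:
                update_fields = {
                    attr_name for attr_name, v in old_data.items()
                    if getattr(self, attr_name) != v
                }
            super().save(*args, update_fields=update_fields, **kwargs)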
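
The proposed hook can be tried out without Django. The following is a plain-Python sketch under stated assumptions: Record stands in for a model base class, iterate() stands in for a queryset iterator, and the loaded_from_db attribute is invented for the demonstration. Only the from_db(cls, using, fields, values) signature is taken from the proposal above.

```python
class Record:
    """Stand-in for a model base class; this is not Django's Model."""

    _type_map = {}  # maps the stored type value to the class to instantiate

    def __init__(self, **data):
        self.loaded_from_db = False
        for key, value in data.items():
            setattr(self, key, value)

    def __init_subclass__(cls, **kwargs):
        # Register every subclass under its lowercased class name.
        super().__init_subclass__(**kwargs)
        Record._type_map[cls.__name__.lower()] = cls

    @classmethod
    def from_db(cls, using, fields, values):
        data = dict(zip(fields, values))
        # Fall back to the requested class when the type is unknown.
        model_cls = cls._type_map.get(data["type"], cls)
        new = model_cls(**data)
        new.loaded_from_db = True
        new.db = using
        return new


class Car(Record):
    pass


class Truck(Record):
    pass


def iterate(model_cls, using, fields, rows):
    """Stand-in for a queryset iterator: goes through the hook, not __init__."""
    for values in rows:
        yield model_cls.from_db(using, fields, values)


fields = ("id", "type", "name")
rows = [(1, "car", "hatchback"), (2, "truck", "hauler")]
loaded = list(iterate(Record, "default", fields, rows))
fresh = Car(id=3, type="car", name="coupe")  # created by user code
```

Because the iterator never calls __init__ itself, the hook is the single place that knows an object came from the database, which is what both the polymorphic and the change-tracking use cases need.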


So, there are several advantages to adding from_db(). The only problem I can see with this approach is that the default model initialization code path will be around 5%-10% slower due to the extra from_db() call for each row. To me that isn't a big enough slowdown to worry about. In addition, as shown in #19501, usage of from_db() allows for significantly faster model loading for special cases.

The patches in #19501 are IMO too complex. The patches try to automatically detect when we can skip calling model.__init__. Instead we should just add the from_db() hook without any fast-path automation.

Any thoughts on this idea?

 - Anssi

Shai Berger

May 22, 2014, 4:13:52 AM
to django-d...@googlegroups.com
On Thursday 22 May 2014 11:05:24 Anssi Kääriäinen wrote:
> I think it is time to add a new model classmethod from_db() to Django.
>
> The idea is to allow customization of object initialization when loading
> from database. Instead of calling directly model.__init__ from the queryset
> iterators, Django calls model_cls.from_db(). The default implementation
> calls just model.__init__, but by overriding from_db() it will be possible
> to do interesting things.

[...]

>
> Any thoughts on this idea?
>

Instinctively -- isn't it possible to achieve the same things today by
overriding __new__ ?

Anssi Kääriäinen

May 22, 2014, 5:02:48 AM
to django-d...@googlegroups.com
On 05/22/2014 11:13 AM, Shai Berger wrote:
>> Any thoughts on this idea?
>>
> Instinctively -- isn't it possible to achieve the same things today by
> overriding __new__ ?
My understanding is that achieving all the same things isn't possible.
The problem is that inside __new__ it is impossible to know if the call
to __new__ was made from database loading or from user code. It also
seems that it is impossible to alter the args and kwargs passed to
__init__(). In addition, if one wants for some reason (speed, or not
invoking __setattr__) to assign values directly to the __dict__ of the
new instance, then __new__() doesn't seem to offer any way to do that.

It is true that STI is likely possible with usage of __new__() as long
as you don't want to change the arguments to the __init__ call of the
created object.
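
A small plain-Python sketch of both points. The Animal and Dog classes and the init_calls list are hypothetical and exist only to make the behaviour observable. __new__ can pick the subclass, but __init__ then receives exactly the arguments of the original call, and nothing in __new__ reveals who made that call.

```python
class Animal:
    _type_map = {}
    init_calls = []  # records the arguments every __init__ call received

    def __new__(cls, *args, **kwargs):
        # Dispatch on the "type" keyword; fall back to the requested class.
        target = cls._type_map.get(kwargs.get("type"), cls)
        return super().__new__(target)

    def __init__(self, *args, **kwargs):
        Animal.init_calls.append((type(self).__name__, args, kwargs))
        self.__dict__.update(kwargs)


class Dog(Animal):
    pass


Animal._type_map["dog"] = Dog

# Whether this call comes from user code or from a row loader, __new__ sees
# the same thing, and __init__ receives the original, unmodified arguments.
pet = Animal(type="dog", name="Rex")
```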

As a side note I think direct assignment to __dict__ on model loading
would be a better design than the current __init__ call. For example
Django needs to do some pretty crazy stuff in __init__() to support
deferred field loading. With direct __dict__ assignment deferred model
creation is trivial. Also, loading from the database is a form of
deserialization, and when deserializing you want to load the model as it
was saved. The way to do this is to avoid __init__, __setattr__ and
descriptor __set__ calls. To avoid those, the values should be assigned
directly to the __dict__ of the object. This is also how Python's own
deserialization works. Of course, thinking about this is mostly academic.
Changing the way models are loaded from the database has severe
backwards compatibility issues. Even Django core relies on descriptor
calls in some cases. As an example, the to_python() method of custom fields is
called through a descriptor.
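
What direct __dict__ assignment looks like in plain Python, as a rough sketch. Article and its from_row() loader are hypothetical names, and the two counters exist only to show which code paths run.

```python
class Article:
    init_calls = 0
    setattr_calls = 0

    def __init__(self, title, body=""):
        Article.init_calls += 1
        self.title = title
        self.body = body

    def __setattr__(self, name, value):
        Article.setattr_calls += 1
        super().__setattr__(name, value)

    @classmethod
    def from_row(cls, fields, values):
        # Allocate without running __init__, then write straight into the
        # instance __dict__, which also skips __setattr__.
        new = cls.__new__(cls)
        new.__dict__.update(zip(fields, values))
        return new


created = Article("Hello")  # normal construction: __init__ and __setattr__ run
loaded = Article.from_row(("title", "body"), ("Stored", "text"))
# A "deferred" load simply leaves the missing field out of __dict__.
partial = Article.from_row(("title",), ("Only title",))
```

Only the normal construction touches the counters; the two loaded instances are populated without any user-visible hook running, which is both the speed benefit and the backwards compatibility problem described above.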

- Anssi

Craig de Stigter

May 25, 2014, 6:50:02 PM
to django-d...@googlegroups.com
> If you ignore STI, I think it is quite straightforward to solve this with a
> parent model class which adds a type field, and manager methods to add the
> select_related calls and "interpret" the type field properly; so I don't see an
> immediate need for inclusion in core.

Well, you don't need select_related calls at all, if you're actually storing things in one table like "single-table inheritance" implies.

I too was surprised to find Django doesn't do this, and was unable to find a good third-party app that does it.

So I wrote my own: https://github.com/craigds/django-typed-models/

It works well and we have been using it in production for a couple of years.

It does rely on a few hacks that Django doesn't officially support, like proxy models with their own fields, which unfortunately broke in Django 1.7. I'd love to see better support for this in Django core.


Regards
Craig de Stigter

Thomas Güttler

Jun 6, 2014, 3:42:32 AM
to django-d...@googlegroups.com


On 26.05.2014 00:50, Craig de Stigter wrote:
>> If you ignore STI, I think it is quite straightforward to solve this with a
>> parent model class which adds a type field, and manager methods to add the
>> select_related calls and "interpret" the type field properly; so I don't see an
>> immediate need for inclusion in core.
>
> Well, you don't need select_related calls at all, if you're actually storing things in one table like "single-table
> inheritance" implies.
>
> I too was surprised to find Django doesn't do this, and was unable to find a good third-party app that does it.
>
> So I wrote my own: https://github.com/craigds/django-typed-models/
>
> It works well and we have been using it in production for a couple years.

...

Thank you very much for your answer.

I guess a lot of developers don't want to hear the next lines:

I think it is a "not invented here" syndrome: Ruby on Rails did it before. That's
a reason to do it differently.

But I can live with an external library like django-typed-models. It does not need to be in Django core.

There is a second fear: some years ago, when I was new to database layout design, I tried
to avoid creating new tables. And I guess a lot of others did the same. If I
could use some tricky algorithm to avoid a database table, I chose to write code rather
than add a new table.

Time has passed and I have learned: structure is more important; code can be replaced.

If I can write less code with a good database layout now, I prefer less code. The Django
ORM and South handle new tables perfectly.

New tables are an expansion in one dimension; the other dimension is new columns.

That's what STI does: it creates a lot of new columns. When I first read how STI works,
I had the same old fear: new columns... that is outside my current comfort zone.

I want to use STI the next time I need model inheritance.

Regards,
Thomas

--
Thomas Güttler
http://thomas-guettler.de/


Russell Keith-Magee

Jun 6, 2014, 11:02:27 AM
to Django Developers
On Fri, Jun 6, 2014 at 3:42 PM, Thomas Güttler <h...@tbz-pariv.de> wrote:
> On 26.05.2014 00:50, Craig de Stigter wrote:
>>> If you ignore STI, I think it is quite straightforward to solve this with a
>>> parent model class which adds a type field, and manager methods to add the
>>> select_related calls and "interpret" the type field properly; so I don't see an
>>> immediate need for inclusion in core.
>>
>> Well, you don't need select_related calls at all, if you're actually storing things in one table like "single-table
>> inheritance" implies.
>>
>> I too was surprised to find Django doesn't do this, and was unable to find a good third-party app that does it.
>>
>> So I wrote my own: https://github.com/craigds/django-typed-models/
>>
>> It works well and we have been using it in production for a couple years.
>
> ...
>
> Thank you very much for your answer.
>
> I guess a lot of developers don't want to hear the next lines:
>
> I think it is a "not invented here" syndrome: Ruby on Rails did it before. That's
> a reason to do it differently.

Poppycock. 

Allow me to assure you that "What Rails Did" didn't even rate a sideways mention during the design discussions for model inheritance. You can verify this yourself with a bit of a search of the django-dev archives.

What *did* factor into the decision - the fact that, in the general case, STI means you have to make almost all the fields in your model NULLable. You lose any semblance of having an actual database schema, and end up writing a whole lot of code to re-implement the features of a database schema instead of using the well tested, robust implementation that the database provides.

When Django's model inheritance features were added - over six years ago - we made the decision to implement a solution that made the best usage of database features, not to try and turn a database into an overweight key-value store. 

Yours,
Russ Magee %-)

Shai Berger

Jun 6, 2014, 1:33:48 PM
to django-d...@googlegroups.com
Let me expand on Russell's expletives:

On Friday 06 June 2014 09:42:15 Thomas Güttler wrote:
>
> I guess a lot of developers don't want to hear the next lines:
>
> I think it is a "not invented here" syndrome: Ruby on Rails did it before.
> That's a reason to do it differently.
>

This does deserve calling out, because...

>
> If I can write less code with a good database layout now, I prefer less
> code. The Django ORM and South handle new tables perfectly.
>
> New tables are an expansion in one dimension; the other dimension is new
> columns.
>
> That's what STI does

No. If you're considering STI, then the "dimensions" are not orthogonal: STI
is an attempt to use added columns _instead_ of added tables. It is, as has
already been noted on this thread (by myself as well as others), a form of
denormalization -- which means you may get some performance benefits out of it,
but you are certain to lose important correctness guarantees.

So, asking for STI in the name of better database design is a little
inconsistent; doing this while casting doubt on the honesty of other
developers -- well, Russell gave a very succinct description of that.

Shai.

Aymeric Augustin

Jun 6, 2014, 2:05:29 PM
to django-d...@googlegroups.com
On 6 June 2014, at 09:42, Thomas Güttler <h...@tbz-pariv.de> wrote:

> I think it is a "not invented here" syndrome: Ruby on Rails did it before. That's
> a reason to do it differently.

The reason is simpler.

Rails was designed around MySQL, a database with a rather casual relationship
to data integrity. It will happily truncate data or save invalid values in the name of
performance. In the same spirit STI trades data integrity for speed. It avoids joins,
which can be very slow on MySQL, but also prevents the database from enforcing
constraints.

Django was designed around PostgreSQL, a database that cares about its data.

That explains many differences in the design of the Rails and Django ORMs.

--
Aymeric.




Craig de Stigter

Jun 10, 2014, 8:16:06 PM
to django-d...@googlegroups.com
Late reply I know but I see a lot of FUD in this thread and I want to try and clear it up.


> in the general case, STI means you have to make almost all the fields in your model NULLable. You lose any semblance of having an actual database schema, and end up writing a whole lot of code to re-implement the features of a database schema instead of using the well tested, robust implementation that the database provides.

This is just not true. Well implemented STI does *not* turn your database table into a key-value store. You can use the fields of the table just like you would normally use them. The *only* abnormal consideration here is that, if you *need* for some reason to have a field defined on a child class instead of all fields defined on the base model, then such fields must be nullable.

In my experience users of django-typed-models will have 90-100% of their fields defined on the base class for each abstract type, meaning that only 0-10% of the fields will have to be nullable. There is no deviation from normal use of the ORM, and no hoops to jump through to ensure db consistency.

In my own case, I am using django-typed-models with all the fields on the base class. In fact, django-typed-models didn't initially allow for fields on child models at all, and I still don't use that behaviour. I did merge a PR to add the possibility for fields defined on child models, and I can see how that would be useful. If you use that functionality you just have to be okay with those fields being nullable.


> Django was designed around PostgreSQL, a database that cares about its data.

And I wouldn't use MySQL if you paid me. But this is a straw man; if you're using STI you *might* be using fields that are nullable but could otherwise be non-nullable, but this shouldn't imply that you don't care about your data.


I reserve judgment on whether STI should be included in core. It works fine in a third-party app. Some better support for it in core would be helpful, since currently I'm relying on some unsupported stuff that the core devs could decide to break at any time.

But it's a better solution to certain problems than MTI is, and it doesn't deserve the bashing that some people seem to give it. Use the right tool for the job. If that's STI then use STI :)


Craig de Stigter

Florian Apolloner

Jun 11, 2014, 3:09:40 AM
to django-d...@googlegroups.com


On Wednesday, June 11, 2014 2:16:06 AM UTC+2, Craig de Stigter wrote:
> I reserve judgment on whether STI should be included in core. It works fine in a third-party app. Some better support for it in core would be helpful, since currently I'm relying on some unsupported stuff that the core devs could decide to break at any time.

We'll happily add hooks to make support easier, patches welcome.

Cheers,
Florian