Model inheritance redux


Malcolm Tredinnick

Aug 3, 2006, 1:42:26 AM
to django-d...@googlegroups.com
Clearing out my list of TODOs after OSCON...

Jacob, Adrian and I had a discussion about model inheritance last week,
trying to nail down some of the features. This resulted in a few changes
to the earlier proposal in
http://groups.google.com/group/django-developers/browse_frm/thread/1bc01ee32cfb7374/84b64c625cd1b402#84b64c625cd1b402

Pure duck typing will no longer be available (for those for whom this
sounds like Chinese, it's the case where you select from the Place table
and magically get back an ItalianRestaurant object because that's the
ultimate type of the row).

The idea of adding an extra column to each table in order to support
pure duck typing had a spectacular but ultimately brief life. Adding
extra columns is out (to be fair, I hadn't given enough thought to being
nice to legacy databases). Keeping this information in a separate table
was also viewed as difficult to maintain.

Iterating through all possible descendants of a table to find the right
type is not fun, either: Imagine a hierarchy that is three layers high
with two choices at each layer. Querying the root table would result in
a join over seven tables before you even get out of bed. Combine that
with joins for related fields and it starts to get really silly. Sure,
databases can handle it, but it's hard to get bug-free in our query
generation and the alternative does not suck enough to warrant the extra
work here.
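The seven-table figure is just a count of every model in such a hierarchy: one root, two children, four grandchildren. A quick check of the arithmetic (the function name is made up for illustration):

```python
def tables_in_full_hierarchy(layers, choices_per_layer):
    """Count the models (and so tables) in a full inheritance tree.

    Layer 0 is the root; every model in one layer has `choices_per_layer`
    subclasses in the next layer.
    """
    return sum(choices_per_layer ** depth for depth in range(layers))


# Three layers with two choices at each layer: 1 + 2 + 4 tables.
seven = tables_in_full_hierarchy(3, 2)
```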

So, new, improved inheritance API (actually, it's the "old" API that we
used to have more or less):

(1) Querying a class gives you back instances of that class. If the
class has parents, the parent fields will be available as attributes,
just like in normal inheritance.

NOTE: To make things easier, we are going to put in a constraint
that you cannot override a field from a parent (i.e. if Place
has a "name" field, you can't have a "name" field in a child).
You can override normal Python attributes, though. It's just the
Field class derivatives that become a little hard to manage
otherwise.

(2) If a class has derived classes, you can access those via attributes
on the class. For example, if Restaurant is a Place sub-class and p =
Place.objects.all()[0], then p.restaurant will be a Restaurant instance.
The trick here is that evaluating p.restaurant is lazy (we don't do it
until you need it) and may raise an AttributeError if this p instance is
actually the parent of a BaseballField and not a Restaurant.

NOTES: (1) There will probably be a way to control the attribute
name so that if Place has a restaurant field, you can still have
a Restaurant subclass. That's an enhancement that is independent
of the rest.

(2) Not sure if we should go for
Place.restaurant.italianrestaurant or just make
Place.italianrestaurant work directly (I prefer the latter, but
neither is a showstopper).

(3) Something we didn't discuss but will make sense is to have a
method to test for is_restaurant() and is_baseballfield() to
avoid having to wrap things in try...except blocks if you don't
want to (or performance is critical).

(4) Implementing a "what type am I" function is not going to
exist initially. It's semi-costly, in that you have to query
every child table and not necessarily that useful. If you are
querying the Place model it's because you generally want to work
with the Place-common fields, otherwise you would be querying
the Restaurant or BaseballField models. If you really want to do
this, it's not impossible to fake, so we're not blocking anybody
from implementing their own.

Punting on this now does not rule it out forever, since, again,
the implementation does not block adding it in the future. I
want to get something that works for the 90% case in first and
then we can extend the last 10% as experience suggests.

(3) Abstract base classes should still exist as soon as somebody comes
up with a good API (Bill and Jacob had ideas there and we need to think
that through). It's a useful optimisation case (flattening the parent
columns into the child's table) when you don't want to access the parent
directly.

For the common case where you sub-class an object as a way of sharing
common features, none of this is really too important. It will work
transparently, which is the important thing. I'm just laying out how the
edges will look on the map.

Now back to the implementation salt mine...

Cheers,
Malcolm

Russell Keith-Magee

Aug 3, 2006, 9:01:52 AM
to django-d...@googlegroups.com

On 8/3/06, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:

Sorry to see duck typing go, but I can see that the problems you mentioned could open up a real nest of vipers (especially debugging the spider-web of queries). Oh well... it was nice while it lasted :-)

> So, new, improved inheritance API (actually, it's the "old" API that we
> used to have more or less):
>
> (1) Querying a class gives you back instances of that class. If the
> class has parents, the parent fields will be available as attributes,
> just like in normal inheritance.

+1

> (2) If a class has derived classes, you can access those via attributes
> on the class.

Obviously, I'd still prefer an _actual_ duck, but this seems a reasonable alternative in a duckless world.

I assume p.restaurant will return a fully functioning Restaurant instance? i.e., p.restaurant is really just a descriptor returning Restaurant.objects.get(pk=p.id) ? If so, +1

> (2) Not sure if we should go for
> Place.restaurant.italianrestaurant or just make
> Place.italianrestaurant work directly (I prefer the latter, but
> neither is a showstopper).

+1 to the latter syntax. When operating on an instance, the exact hierarchy is irrelevant. The only problem I can see here is in dealing with repeated names in the hierarchy, to which the only solution I can see is a model validation error.

> (3) Something we didn't discuss but will make sense is to have a
> method to test for is_restaurant() and is_baseballfield() to
> avoid having to wrap things in try...except blocks if you don't
> want to (or performance is critical).

hrm... -0. I would have thought that a single isinstance() would be a better idea than a range of is_XXX() functions. e.g.,

p.isinstance(Restaurant)

(possibly using strings rather than/in addition to model modules if preferred or the implementation requires). This keeps the API small and simple, requires fewer 'magically appearing' functions which could potentially clash with user-defined functions, and keeps the API (reasonably) aligned with Python's inheritance API.

This isn't necessarily an either/or suggestion - both _could_ be supported (but I would be -1 on having both).

This approach would also allow for checking multiple baseclasses simultaneously:

p.isinstance(Restaurant, BaseballField)

In itself, this wouldn't be extraordinarily useful; if Restaurant and BaseballField both had a 'beer_price' attribute, you would still need to use p.restaurant.beer_price and p.baseballfield.beer_price... but more on this later

> (4) Implementing a "what type am I" function is not going to
> exist initially.

Not sure about this. I agree that a generic form of this type of function is inherently expensive. However, the absence of ducks kind of makes this kind of function a requirement, IMHO.

To go back to the beer_price example, getting the value of the beer_price attribute gets a bit messy; it requires nested try/catches, and some extra effort to avoid code duplication:
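A minimal sketch of the nested lookups in question, using plain stand-in classes rather than real models. The Place and Child classes here are invented for the example; the only thing taken from the proposal is that a child accessor raises AttributeError when the row has no such child.

```python
class Child(object):
    """Stand-in for a Restaurant or BaseballField row."""

    def __init__(self, beer_price):
        self.beer_price = beer_price


class Place(object):
    """Stand-in parent whose child accessors raise AttributeError when absent."""

    def __init__(self, restaurant=None, baseballfield=None):
        self._children = {"restaurant": restaurant,
                          "baseballfield": baseballfield}

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails.
        child = self.__dict__.get("_children", {}).get(name)
        if child is None:
            raise AttributeError(name)
        return child


def beer_price(p):
    # One try/except per possible child type, nested to fall through.
    try:
        return p.restaurant.beer_price
    except AttributeError:
        try:
            return p.baseballfield.beer_price
        except AttributeError:
            return None
```

Every extra child type adds another level of nesting, which is the messiness being complained about.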

The multi-input isinstance might be one way to get around this. If p.isinstance returns the instance of the first class in the provided list that matches, or None if no class matches, the following would be possible:

obj = p.isinstance(Restaurant, BaseballField)
try:
   price = obj.beer_price
   print "Beer costs", price
except:
   print "No beer!"

isinstance() continues to work like a boolean interrogation, but it also provides the ability to act as a 'narrowing' method (to use the CORBA terminology). This provides a restricted 'what type am I' function; it will only check those types that are provided, rather than the full list of potential child types.

I will concede that it isn't immediately obvious that isinstance() will return anything other than a boolean. I don't have any particular solution to this, other than providing isinstance() _and_ narrow() - but the duplicated functionality, sort-of-typed API grates against my sensibilities.

I will also concede that implementing a narrow method of this sort is not a major drama, and could be easily written by the end user. However, it does strike me as a fairly essential component of a duckless inheritance system (as demonstrated by CORBA requiring a narrow method due to C++'s class system). My crystal ball foresees complaints if it is omitted.
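As a rough idea of how small a user-written version could be, here is a sketch in plain Python. It assumes, hypothetically, that each child is reachable through an attribute named after the lowercased class name and that the attribute lookup raises AttributeError when there is no such child; the stand-in classes exist only to make the example run.

```python
class Restaurant(object):
    beer_price = 6.0


class BaseballField(object):
    beer_price = 9.5


class Place(object):
    """Stand-in parent: a child attribute exists only if the row has that child."""

    def __init__(self, child=None):
        if child is not None:
            setattr(self, type(child).__name__.lower(), child)


def narrow(place, *models):
    """Return the child of `place` for the first matching model, else None."""
    for model in models:
        try:
            return getattr(place, model.__name__.lower())
        except AttributeError:
            continue
    return None


def is_instance(place, *models):
    """Boolean form of narrow(): only the listed types are checked."""
    return narrow(place, *models) is not None
```

Only the types passed in are examined, so the cost is bounded by the caller rather than by the size of the hierarchy.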

Russ Magee %-)

Malcolm Tredinnick

Aug 3, 2006, 7:45:39 PM
to django-d...@googlegroups.com
This is why posting this stuff pays off. Somebody comes back with a
cluestick in hand and helps out. :-)

On Thu, 2006-08-03 at 21:01 +0800, Russell Keith-Magee wrote:
>
> On 8/3/06, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:

[...]


>
> (2) If a class has derived classes, you can access those via
> attributes
> on the class.
>
> Obviously, I'd still prefer an _actual_ duck, but this seems a
> reasonable alternative in an duckless world.

I could subclass everything from Duck if you like. Poor man's duck
typing. :-)

> I assume p.restaurant will return a fully functioning Restaurant
> instance? i.e., p.restaurant is really just a descriptor returning
> Restaurant.objects.get(pk=p.id) ? If so, +1

Yes, that's the idea. I can't see any reason not to do this, since the
database work is the same.
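For anyone following along, a minimal sketch of that kind of lazy accessor in plain Python. This is not the Django implementation: a dict keyed by primary key stands in for the Restaurant table (so the dict lookup plays the role of Restaurant.objects.get(pk=p.id)), and ChildAccessor, RESTAURANTS, LOOKUPS and the pk attribute are names invented for the example.

```python
RESTAURANTS = {1: "Restaurant row for pk=1"}  # stand-in for the child table
LOOKUPS = []                                  # records each simulated query


class ChildAccessor(object):
    """Descriptor that fetches the child row on first access and caches it."""

    def __init__(self, child_table, name):
        self.child_table = child_table
        self.name = name
        self.cache_name = "_%s_cache" % name

    def __get__(self, instance, owner=None):
        if instance is None:
            return self  # accessed on the class, not an instance
        if self.cache_name not in instance.__dict__:
            LOOKUPS.append(instance.pk)
            try:
                child = self.child_table[instance.pk]
            except KeyError:
                # No matching child row: behave like a missing attribute.
                raise AttributeError(self.name)
            instance.__dict__[self.cache_name] = child
        return instance.__dict__[self.cache_name]


class Place(object):
    restaurant = ChildAccessor(RESTAURANTS, "restaurant")

    def __init__(self, pk):
        self.pk = pk
```

Nothing is looked up when the Place is created, and because a miss surfaces as AttributeError, hasattr(p, "restaurant") works as a cheap test.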

> (2) Not sure if we should go for
> Place.restaurant.italianrestaurant or just make
> Place.italianrestaurant work directly (I prefer the latter, but
> neither is a showstopper).
>
> +1 to the latter syntax. When operating on an instance, the exact
> hierarchy is irrelevant. The only problem I can see here is in dealing
> with repeated names in the hierarchy, to which the only solution I can
> see is a model validation error.

I haven't worked out the API for specifying the attribute here, but
that's minor. Basically, we need something on the sub-class to say "use
this attribute name in the parent(s)". An attribute in the Meta class is
probably the solution, I guess. We already need to have an attribute for
specifying database column(s) back to the parent(s). Failure to set the
attribute and getting a name clash is a validation error.
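To make the validation rule concrete, here is a sketch of the check in plain Python. The API for naming the attribute has not been designed, so the data shapes below (a set of parent field names, and a mapping from subclass name to requested accessor name) are purely assumptions for illustration.

```python
def accessor_name_clashes(parent_fields, children):
    """Return accessor names that collide with a parent field or each other.

    `parent_fields` is a set of field names on the parent. `children` maps
    each subclass name to the accessor name it asks for, or None to fall
    back to the lowercased class name. Both shapes are hypothetical.
    """
    taken = set(parent_fields)
    clashes = []
    for class_name, requested in sorted(children.items()):
        accessor = requested or class_name.lower()
        if accessor in taken:
            clashes.append(accessor)
        taken.add(accessor)
    return clashes
```

A Place that already has a "restaurant" field clashes with a Restaurant subclass using the default name, and stops clashing once the subclass asks for a different accessor.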

[...]


> hrm... -0. I would have thought that a single isinstance() would be a
> better idea than a range of is_XXX() functions. e.g.,
>
> p.isinstance(Restaurant)
>
>
> (possibly using strings rather than/in addition to model modules if
> preferred or the implementation requires). This keeps the API small
> and simple, requires fewer 'magically appearing' functions which could
> potentially clash with user-defined functions, and keeps the API
> (reasonably) aligned with Python inheritance API.

You might be right, I may be .. (wait! Let's not finish that thought.
Billy Joel has the copyright).

All your API points are well taken. I'm just being a coward on the
implementation side. It's harder that way, but that's not really a
reason to avoid doing the Right Thing(tm).

> This isn't necessarily an either/or suggestion - both _could_ be
> supported (but I would be -1 on having both).

If an isinstance() isn't horrible, I would prefer that, for all the
reasons you mention. Agree that both is silly.

> This approach would also allow for checking multiple baseclasses
> simultaneously:
>
> p.isinstance(Restaurant, BaseballField)
>
> In itself, this wouldn't be extraordinarily useful; if Restaurant and
> BaseballField both had a 'beer_price' attribute, you would still need
> to use p.restaurant.beer_price and p.baseballfield.beer_price... but
> more on this later

Worth doing just to be in sync with how the isinstance() builtin works,
if nothing else. Your later points aren't invalid either.
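For reference, this is how the builtin behaves with several candidate classes. Note that the builtin takes them as a single tuple, whereas the p.isinstance() spelling above passes them as separate arguments. The classes here are ordinary Python classes, not models, and Unrelated is a name made up for the example.

```python
class Place(object):
    pass


class Restaurant(Place):
    pass


class BaseballField(Place):
    pass


class Unrelated(object):
    pass


r = Restaurant()
matches_one = isinstance(r, Restaurant)                        # single class
matches_any = isinstance(r, (Restaurant, BaseballField))       # tuple of classes
matches_none = isinstance(Unrelated(), (Restaurant, BaseballField))
```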

> (4) Implementing a "what type am I" function is not going to
> exist initially.
>
> Not sure about this. I agree that a generic form of this type of
> function is inherently expensive. However, the absence of ducks kind
> of makes this kind of function a requirement, IMHO.
>
> To go back to the beer_price example, getting the value of the
> beer_price attribute gets a bit messy; it requires nested try/catches,
> and some extra effort to avoid code duplication:

That is the reason for the is_XXX() (or single isinstance()) methods and
why I mentioned performance (and code readability) -- all those
try/excepts are going to be ugly and not free in Python cycles.

The question for me is whether this is really a use case or not. If
beer_price is common, why isn't it in a common base class? The only
reason I can think of is that it doesn't have the same type. Which gets
kind of interesting when you want to work with it later (although if it
has the same "interface" always, that is fine).

Kind of the reason I was punting on this was that the end-user (the
developer) would be more likely to understand their hierarchy and be
able to go to the things they cared about directly. Doing it in code
requires more general introspection and iteration over all the
possibilities. On the other hand, your typical hierarchy is going to be
a couple of layers deep with a couple of classes, so it's hardly going
to kill you.

> The multi-input isinstance might be one way to get around this. If
> p.isinstance returns the instance of the first class in the provided
> list that matches, or None if no class matches, the following would be
> possible:
>
> obj = p.isinstance(Restaurant, BaseballField)
> try:
>    price = obj.beer_price
>    print "Beer costs", price
> except:
>    print "No beer!"
>
> isinstance() continues to work like a boolean interrogation, but it
> also provides the ability to act as a 'narrowing' method (to use the
> CORBA terminology). This provides a restricted 'what type am I'
> function; it will only check those types that are provided, rather
> than the full list of potential child types.
>
> I will concede that it isn't immediately obvious that isinstance()
> will return anything other than a boolean. I don't have any particular
> solution to this, other than providing isinstance() _and_ narrow() -
> but the duplicated functionality, sort-of-typed API grates against my
> sensibilities.

Heh. I end up in "this feels like CORBA" land a lot, too, when I'm
thinking about this and trying to work out how to use it. :-)

I feel I can live without anything resembling narrow() -- or C++'s
dynamic_cast, for those more familiar with that language -- because I'm
not convinced there is an important use-case. However, if we leave it
out, it does make accessing the beer_price attribute, wherever it may
live, a little harder. It's also dangerous for me to be projecting my
usages onto the world wide group of developers. I have some experience,
but not *that* much. So, you probably have a point.

In passing, overriding __getattr__ isn't an ideal solution, either,
because it doesn't work well with lazy attribute resolving. I tried for
a while to make that work nicely, but never came away happy. (Although,
as I write this, it occurs to me that with introspection and caching
which models have which fields, we would only examine the "right"
tables. Hmmm.)
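A rough sketch of that parenthetical idea in plain Python: keep a map from field name to the child types that define it, and only consult those children when an attribute is missing on the parent. FIELD_OWNERS, Row and the way children are stored are all invented for the example; real models would need to build the map by introspection.

```python
# Hypothetical cache: which child accessors define which fields.
FIELD_OWNERS = {
    "chef_name": ["restaurant"],
    "home_team": ["baseballfield"],
    "beer_price": ["restaurant", "baseballfield"],
}


class Row(object):
    """Stand-in for a child row; fields are passed as keyword arguments."""

    def __init__(self, **fields):
        self.__dict__.update(fields)


class Place(object):
    def __init__(self, name, **children):
        self.name = name
        self._children = children  # accessor name -> Row

    def __getattr__(self, attr):
        # Reached only when normal lookup fails; skip private names so a
        # missing _children can never recurse back into this method.
        if attr.startswith("_"):
            raise AttributeError(attr)
        # Only the children known to own this field are examined.
        for child_name in FIELD_OWNERS.get(attr, ()):
            child = self._children.get(child_name)
            if child is not None:
                return getattr(child, attr)
        raise AttributeError(attr)
```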

> I will also concede that implementing a narrow method of this sort is
> not a major drama, and could be easily written by the end user.
> However, it does strike me as a fairly essential component of a
> duckless inheritance system (as demonstrated by CORBA requiring a
> narrow method due to C++'s class system). My crystal ball foresees
> complaints if it is omitted.

My crystal ball says the complaints won't stop if it is, though. :-)

We're stuck between being explicit, efficient and being almost
transparently Python-like. I'm not sure all three goals can be met, so
we're discussing where to place the "here be bears" signs.

How about this? In the interests of getting the code finished and giving
people something to try out, we punt this one initially. It can be added
later, as you mention. Unless somebody has a really big brainwave very
soon, the main design here can't change too much, so debating whether we
have a narrow()-equivalent or not probably distracts from the big
picture. My thinking is along the lines of: if narrow() wasn't
available, would it be a showstopper for this design? If so, what is the
alternative?

I think the answer to the first question is "no", so I don't feel too
unhappy putting it in the "later" basket for now.

Thanks heaps for all the feedback, Russ. Helps a lot.

Cheers,
Malcolm


Russell Keith-Magee

Aug 4, 2006, 1:36:23 AM
to django-d...@googlegroups.com
On 8/4/06, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:

> > Obviously, I'd still prefer an _actual_ duck, but this seems a
> > reasonable alternative in a duckless world.
>
> I could subclass everything from Duck if you like. Poor man's duck
> typing. :-)

Hey! Great idea! +1 :-)

> I haven't worked out the API for specifying the attribute here, but
> that's minor. Basically, we need something on the sub-class to say "use
> this attribute name in the parent(s)". An attribute in the Meta class is
> probably the solution, I guess. We already need to have an attribute for
> specifying database column(s) back to the parent(s). Failure to set the
> attribute and getting a name clash is a validation error.

Hadn't considered using the Meta class. +1 on this idea.

Related note that just occurred to me: is there any intention to extend query terms with this attribute-based inheritance syntax? i.e., is something like:

Place.objects.filter(restaurant__chef_name='Bork')

going to be possible? And if so, what about:

Place.objects.filter(Q(restaurant__chef_name='Bork') | Q(baseballfield__home_team_name='Red Sox'))

Any comments?

> > hrm... -0. I would have thought that a single isinstance() would be a
> > better idea than a range of is_XXX() functions. e.g.,
> >
> > p.isinstance(Restaurant)
>
> You might be right, I may be .. (wait! Let's not finish that thought.
> Billy Joel has the copyright).

Grrrr... I am now holding you _personally_ responsible for the fact that I have a Billy Joel song going through my head. :-)

> I feel I can live without anything resembling narrow() -- or C++'s
> dynamic_cast, for those more familiar with that language -- because I'm
> not convinced there is an important use-case. However, if we leave it
> out, it does make accessing the beer_price attribute, wherever it may
> live, a little harder. It's also dangerous for me to be projecting my
> usages onto the world wide group of developers. I have some experience,
> but not *that* much. So, you probably have a point.

Other than toy cases, I don't actually have a strong use case for this kind of situation; however, I won't presume to represent the world community of developers out there. I would only offer that the existence of CORBA's narrow() is proof of the potential for a need (my god... I'm defending the design decisions of the OMG... time to kill myself... :-)

> In passing, overriding __getattr__ isn't an ideal solution, either,
> because it doesn't work well with lazy attribute resolving. I tried for
> a while to make that work nicely, but never came away happy. (Although,
> as I write this, it occurs to me that with introspection and caching
> which models have which fields, we would only examine the "right"
> tables. Hmmm.)

Interesting idea, but one that I would think could be added transparently as an optimization post facto. IMHO, getting inheritance working is a more immediate priority.

> > My crystal ball foresees
> > complaints if it is omitted.
>
> My crystal ball says the complaints won't stop if it is, though. :-)

Sad, but true.... :-)

> I think the answer to the first question is "no", so I don't feel too
> unhappy putting it in the "later" basket for now.

I think this is a reasonable approach. I doubt it would be a showstopper (if only because it can be implemented by end users without too much drama), and I don't have an actual use case myself that would require this feature.

I mentioned it because (1) a toy example jumped to mind immediately, (2) I'm pained to not provide a feature because _I_ don't have a need for it, and (3) I could see that isinstance() was really just a booleanized form of narrow(), so most (if not all) of a narrow() implementation would be required to implement isinstance(), and if this was the case, I figured it might be worthwhile exposing the useful bit as API.

> Thanks heaps for all the feedback, Russ. Helps a lot.

Always a pleasure.

Russ %-)

Malcolm Tredinnick

Aug 4, 2006, 1:57:45 AM
to django-d...@googlegroups.com
On Fri, 2006-08-04 at 13:36 +0800, Russell Keith-Magee wrote:
>
> On 8/4/06, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:

[...]


> Related note that just occurred to me: is there any intention to extend
> query terms with this attribute-based inheritance syntax? i.e., is
> something like:
>
> Place.objects.filter(restaurant__chef_name='Bork')
>
> going to be possible? And if so, what about:
>
> Place.objects.filter(Q(restaurant__chef_name='Bork') |
> Q(baseballfield__home_team_name='Red Sox'))
>
> Any comments?

I hate it (sorry). You can do it anyway with itertools.chain (except for
ordering) and one day we might even get QuerySet unions between
disparate tables implemented and it will be transparent then, too. For
now it's just model abuse.
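A small sketch of the itertools.chain approach. Plain lists of dicts stand in for the two separately filtered result sets (one from the Restaurant model, one from the BaseballField model); the row contents are made up.

```python
from itertools import chain

# Stand-ins for two independent queries, one per child model.
restaurants = [{"name": "Luigi's", "chef_name": "Bork"}]
baseball_fields = [{"name": "Fenway Park", "home_team_name": "Red Sox"}]

# chain() walks the first result set, then the second, without copying.
combined = list(chain(restaurants, baseball_fields))
names = [row["name"] for row in combined]

# The database cannot order across both sources, so any combined
# ordering has to be done in Python afterwards.
names_sorted = sorted(names)
```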

I'm -1 on this until somebody can convince me it's actually useful and
common.

Next time I'm picking two Place types that have absolutely nothing in
common so that you can't twist my examples into cases like this.
Restaurants and SmallCometsInTheOortCloud, for example. I'm making this
too easy for you.

[...]


> I feel I can live without anything resembling narrow() -- or C++'s
> dynamic_cast, for those more familiar with that language -- because I'm
> not convinced there is an important use-case. However, if we leave it
> out, it does make accessing the beer_price attribute, wherever it may
> live, a little harder. It's also dangerous for me to be projecting my
> usages onto the world wide group of developers. I have some experience,
> but not *that* much. So, you probably have a point.
>
> Other than toy cases, I don't actually have a strong use case for this
> kind of situation; however, I won't presume to represent the world
> community of developers out there. I would only offer that the
> existence of CORBA's narrow() is proof of the potential for a need (my
> god... I'm defending the design decisions of the OMG... time to kill
> myself... :-)

I don't actually hate CORBA as much as the typical developer in the
street. I kind of like programming with it sometimes; when it works,
it's beautiful. So feel free to make the positive comparisons. However,
I think the difference between CORBA's narrow() and our situation is
that casting is a language feature in many of the language bindings that
CORBA initially supported and is a necessity when you don't have duck
typing in the language. It feels un-Pythonic and since we've already
backed away from true duck typing, I would feel more comfortable being
explicit about that.

Anyway, I think we're in agreement that we can punt this for now and get
the 90% case working first.

OK, I think you and I are at least somewhat in agreement.

Thanks,
Malcolm

Russell Keith-Magee

Aug 4, 2006, 4:10:12 AM
to django-d...@googlegroups.com
On 8/4/06, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:

> On Fri, 2006-08-04 at 13:36 +0800, Russell Keith-Magee wrote:
>
> > Place.objects.filter(restaurant__chef_name='Bork')
> >
> > Any comments?
>
> I hate it (sorry).

Apologies - I wasn't actually suggesting or requesting it. I dislike it too. It just occurred to me that, since p.name and p.restaurant both appear as attributes, we could end up fielding questions about the use of inheritance attributes in queries, and that on initial appearances it would be a right mess to interpret, let alone implement.

> Next time I'm picking two Place types that have absolutely nothing in
> common so that you can't twist my examples into cases like this.
> Restaurants and SmallCometsInTheOortCloud, for example. I'm making this
> too easy for you.

erm... p.smallcomets.great_ambience? I know some great little dives out on the solar rim... :-)

> I don't actually hate CORBA as much as the typical developer in the
> street. I kind of like programming with it sometimes; when it works,
> it's beautiful. So feel free to make the positive comparisons.

True. I shouldn't get so uppity - I work with a few ex-Iona (producers of Orbix) people.

> Anyway, I think we're in agreement that we can punt this for now and get
> the 90% case working first.

Yup.

Russ %-)

Bjørn Stabell

Aug 4, 2006, 12:15:17 PM
to Django developers
Just a question: how do the pros and cons of this compare with
single-table inheritance, as used in Rails? See:

http://twelvelabs.com/singletable/index.html

Rgds,
Bjorn

Joseph Kocherhans

Aug 4, 2006, 12:28:17 PM
to django-d...@googlegroups.com

http://www.objectmatter.com/vbsf/docs/maptool/ormapping.html

Check out the section on Horizontal Mapping vs. the one on Filtered
Mapping. Also, there's a good explanation in Martin Fowler's Patterns
of Enterprise Application Architecture.

In short:

STI is theoretically faster, but you can't really enforce db-level
integrity constraints. The proposed implementation is theoretically
slower (can involve a lot of joins for deep class hierarchies), but
not null and foreign key constraints can actually be enforced at the db
level. Those are the main points I remember, but there are many more.
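To illustrate the integrity side of that trade-off, here is a sketch of a one-table-per-class layout using SQLite from the Python standard library. The table and column names are assumptions for the example, not the schema Django will generate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.execute("CREATE TABLE place (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE restaurant ("
    " place_id INTEGER PRIMARY KEY REFERENCES place(id),"
    " chef_name TEXT NOT NULL)"
)
conn.execute("INSERT INTO place (id, name) VALUES (?, ?)", (1, "Luigi's"))
conn.execute("INSERT INTO place (id, name) VALUES (?, ?)", (2, "Fenway Park"))
conn.execute("INSERT INTO restaurant (place_id, chef_name) VALUES (?, ?)", (1, "Bork"))

# Reading a Restaurant costs one join back to the parent table.
row = conn.execute(
    "SELECT p.name, r.chef_name FROM restaurant r"
    " JOIN place p ON p.id = r.place_id"
).fetchone()


def rejected(sql, params):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql, params)
    except sqlite3.IntegrityError:
        return True
    return False


insert_child = "INSERT INTO restaurant (place_id, chef_name) VALUES (?, ?)"
missing_chef_rejected = rejected(insert_child, (2, None))   # NOT NULL violated
orphan_rejected = rejected(insert_child, (99, "Nobody"))    # no such place row
```

The child-only column can be NOT NULL because rows that are not restaurants never appear in the restaurant table.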

Joseph

Malcolm Tredinnick

Aug 4, 2006, 7:49:01 PM
to django-d...@googlegroups.com
On Fri, 2006-08-04 at 10:28 -0600, Joseph Kocherhans wrote:
> On 8/4/06, Bjørn Stabell <bjo...@gmail.com> wrote:
> >
> > Just a question; how does this compare pros and cons with single-table
> > inheritance, used in Rails? See:
> >
> > http://twelvelabs.com/singletable/index.html

You probably want to read the Wiki page and the previous threads on this
to see some of the earlier discussions. It's been suggested before
(although I realise you're asking for a comparison, rather than
necessarily making a suggestion). A brief summary below.

Joseph's point is a good start:

[...]


> STI is theoretically faster, but you can't really enforce db-level
> integrity constraints. The proposed implementation is theoretically
> slower (can involve a lot of joins for deep class hierarchies), but
> not null and foreign key constraints can actually be enforced at the db
> level. Those are the main points I remember, but there are many more.

The main constraint that is hard is NOT NULL -- columns that belong to only
some subclasses have to allow NULL. Most other things can be faked (e.g. CHECK()
conditions and other constraint constructs) in many databases, but it's
a bit of effort.
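A sketch of what that faking can look like, again with SQLite from the standard library (chosen only because it ships with Python; CHECK support varies between databases). The single table, its "kind" column and the other column names are assumptions for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE place ("
    " id INTEGER PRIMARY KEY,"
    " kind TEXT NOT NULL,"
    " name TEXT NOT NULL,"
    " chef_name TEXT,"   # nullable: baseball fields have no chef
    " home_team TEXT,"   # nullable: restaurants have no home team
    " CHECK (kind <> 'restaurant' OR chef_name IS NOT NULL),"
    " CHECK (kind <> 'baseballfield' OR home_team IS NOT NULL))"
)

insert = "INSERT INTO place (kind, name, chef_name, home_team) VALUES (?, ?, ?, ?)"
conn.execute(insert, ("restaurant", "Luigi's", "Bork", None))
conn.execute(insert, ("baseballfield", "Fenway Park", None, "Red Sox"))

# A restaurant with no chef passes the column definitions but fails the CHECK.
try:
    conn.execute(insert, ("restaurant", "Nameless Diner", None, None))
    restaurant_without_chef_rejected = False
except sqlite3.IntegrityError:
    restaurant_without_chef_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM place").fetchone()[0]
```

Each subclass-specific column needs its own conditional CHECK, which is the "bit of effort" part.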

Performance isn't as big an issue as people want to believe. Databases
are very fast. And, not surprisingly, they are very good at doing joins.
"Fast enough" is often a better goal than "as fast as possible", since
it's less fragile, easier to debug and maintain, and evolves with
changing technology more easily.

Other points against the single table model (in addition to NULL
constraints):
- doesn't integrate well with legacy databases

- doesn't work with third-party model inheritance

- multiple inheritance becomes really hard (you can't do it
perfectly with single tables for all cases). You may think this
isn't relevant, but mix-ins are surprisingly useful and some of
them will be database-backed.

- development, rollout and rollback are more painful and
error-prone (you need to keep altering tables instead of
creating and dropping separate ones)

- it's poor database table normalisation (lots of sparse
columns)

We will have a way to say that the base class (or classes) is
"abstract", meaning that its columns should be folded into the child
table. Then you can't query on the base class (maybe; it might even be
possible), but you get to share the common bits at the Python level.
That's an optimisation for people who know what they are doing and
understand the trade-offs. Most people probably won't need to care.

If you really need absolute "bits on the wire" database optimisation,
then there are many places in Django that will not be perfect for you --
although almost all of them can be worked around. The perceived
advantages of chasing the single table goal are far outweighed by the
advantages of using good database design. Sure, that's a pretty
opinionated statement, but I feel I've worked with enough small and
largish databases (significant fractions of a terabyte) to have
developed some feel for where the trade-offs lie. Performance problems
never appear in the place you expect them. :-)

Hope that answers your question in part.

Best wishes,
Malcolm

Bjørn Stabell

Aug 6, 2006, 7:01:26 AM
to Django developers
Joseph & Malcolm, thanks for the comments. Just wanted to make sure I
could stand my ground in the face of Rails-istas :)

Bjørn Stabell

Aug 7, 2006, 12:26:08 PM
to Django developers
Okay, I've got one more question. If we're not going to support
querying for an object's real type, it becomes quite cumbersome to do
anything polymorphically, which kind-of is the point of
object-orientation.

For example, to use the same URI spec & view for all the subtypes.

OPTION 1: lots of if-then-else's

def detail_view(id):
    place = Place.objects.get(pk=id)
    if place.isinstance(Place): model_name = 'Place'
    elif place.isinstance(Restaurant): model_name = 'Restaurant'
    ...
    return render_to_response("templates/place_%s.html" % model_name,
                              {'place': place})

OPTION 2: embed the type in the URI

def detail_view(model_name, id):
    ... # or map to a different view altogether

Neither seems like a very good solution to me. Am I missing something?

Alan Green

Aug 7, 2006, 5:33:22 PM
to django-d...@googlegroups.com
Hi Bjørn,

Sure. In this case you would need a discriminator attribute on Place.
I'm thinking of code along the lines of:

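class Place(models.Model):
    ...
    discriminator = models.CharField(maxlength=50)

    def save(self):
        # Record the concrete class name so it can be recovered later.
        self.discriminator = self.__class__.__name__.lower()
        models.Model.save(self)

    def autocast(self):
        # Follow the child accessor named by the discriminator.
        return getattr(self, self.discriminator)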

Then in your view:

def detail_view(id):
    place = Place.objects.get(pk=id).autocast()
    return render_to_response(
        "templates/place_%s.html" % place.discriminator,
        {"place": place})


I'd be pleased to see Django require discriminator attributes on
superclasses, and then automagically retrieve the correct subclasses
at the correct times. It seems to work well enough in ORMs such as
Hibernate. However, I think I could happily live with Malcolm's
proposal too, even if it means writing code like the above every
now-and-then.

Alan


--
Alan Green
al...@bright-green.com - http://bright-green.com

Russell Keith-Magee

Aug 7, 2006, 7:35:40 PM
to django-d...@googlegroups.com
On 8/8/06, Alan Green <alan....@gmail.com> wrote:

> Hi Bjørn,
>
> Sure. In this case you would need a discriminator attribute on Place.
> I'm thinking of code along the lines of:
>
> class Place(models.Model):
>     ...
>     discriminator = models.CharField(maxlength=50)
>     def save(self):
>         self.discriminator = self.__class__.__name__.lower()
>         models.Model.save(self)
>
>     def autocast(self):
>         return getattr(self, self.discriminator)

For those late to the discussion, it should be noted that this was one of the ideas proposed for implementing inheritance. It was rejected on two grounds:

1) Lack of support for legacy databases
2) The number of joins that would be required for queries in the general case.

However, while it isn't a viable solution as part of Django's model system, as Alan notes, you can implement the same behaviour yourself.

Yours,
Russ Magee %-)

Malcolm Tredinnick

Aug 7, 2006, 8:36:00 PM
to django-d...@googlegroups.com
On Tue, 2006-08-08 at 07:35 +0800, Russell Keith-Magee wrote:
[...]

>
> For those late to the discussion, it should be noted that this was one
> of the ideas proposed for implementing inheritance. It was rejected on
> two grounds:
>
> 1) Lack of support for legacy databases
> 2) The number of joins that would be required for queries in the
> general case.

More for the first reason than the second, I thought.

The join numbers are as small as possible when you have a discriminator
(because you can work out precisely the right tables to join); it's one
of their advantages. As soon as we threw out discriminators, true
transparent polymorphism (duck typing) became harder because the join
numbers now became O(# of nodes in the inheritance tree).
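
To put rough numbers on this, here is a small sketch. It assumes the toy hierarchy from the start of the thread (three layers, two children per node) and counts tables only; both function names are made up for illustration, and the discriminator case assumes the row's type is already known before the query is built.

```python
def tables_without_discriminator(layers, branching):
    """Every table in the tree has to be joined: 1 + b + b**2 + ..."""
    return sum(branching ** level for level in range(layers))


def tables_with_discriminator(layers):
    """Only the tables on the path from the root to the known type."""
    return layers


print(tables_without_discriminator(3, 2))
print(tables_with_discriminator(3))
```

For the three-layer, two-way hierarchy the first function gives the seven tables mentioned in the original proposal, while a known leaf type needs only the three tables on its path.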

Regards,
Malcolm


Malcolm Tredinnick

Aug 7, 2006, 8:39:04 PM
to django-d...@googlegroups.com
On Mon, 2006-08-07 at 09:26 -0700, Bjørn Stabell wrote:
> Okay, I've got one more question. If we're not going to support
> querying for an object's real type, it becomes quite cumbersome to do
> anything polymorphically, which kind-of is the point of
> object-orientation.

This stuff is hard not because we are not very clever, but because it is
quite possibly fundamentally hard. There is a famous comment from Bjarne
Stroustrup where he mentioned that AT&T (his employer) had tried very
hard to map object models onto relational databases and had come to the
conclusion it just wasn't possible to do. With a pedigree like that laying
the groundwork, I don't mind making a few compromises in order to make
some things easy and all things somehow possible.

One of the compromises we are making is deciding to make sharing common
stuff (one benefit/point of inheritance) trivially easy, possibly at the
slight expense of other things. Working with the most-derived objects is
transparent. Working from the other end of the stack, with the base
classes and trying to use polymorphism everywhere is going to be a bit
more painful. We tried not to make this sacrifice, but, as pointed out
earlier in this thread and in other threads, all other designs had their
trade-offs as well.

> For example, to use the same URI spec & view for all the subtypes.
>
> OPTION 1: lots of if-then-else's
>
> def detail_view(id):
>     place = Place.objects.get(id)
>     if place.isinstance(Place): model_name = 'Place'
>     elif place.isinstance(Restaurant): model_name = 'Restaurant'
>     ...
>     return render_to_response("templates/place_%s.html" % model_name,
>                               place)

This is why Russell was talking about a narrow()-like function
earlier. It exists in CORBA to make this type of thing easier. You can
see our conclusion on that earlier in this very thread.

Also, if you are going to need to do this sort of thing frequently, you
can build a small app much like content-type that maps (model name, pk)
to "most derived type". That would act precisely like a discriminator
column (because it *is* one) and you would be able to work smoothly with
polymorphic objects. The reason we didn't put this in the main design by
default is that we are tending to avoid auxiliary tables in core
parts of Django (note that ContentTypes is an optional model, for
example). The database maintenance and requirements to keep another
table in sync are a consideration.
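
As a very rough sketch of what such a helper app would have to track, here is an in-memory stand-in. The class and method names are hypothetical, and a real version would be a database table that has to be kept in sync on every save and delete, which is exactly the maintenance cost mentioned above.

```python
# Hypothetical in-memory stand-in for an auxiliary "most derived type" table.
# A real app would persist these rows and update them on save/delete.

class DerivedTypeRegistry:
    def __init__(self):
        # (base model name, pk) -> most derived model name
        self._rows = {}

    def record(self, base_name, pk, derived_name):
        self._rows[(base_name, pk)] = derived_name

    def forget(self, base_name, pk):
        # Must be called on delete, or the mapping goes stale.
        self._rows.pop((base_name, pk), None)

    def most_derived(self, base_name, pk):
        # Fall back to the base model when nothing more specific is recorded.
        return self._rows.get((base_name, pk), base_name)


registry = DerivedTypeRegistry()
registry.record("Place", 1, "ItalianRestaurant")
print(registry.most_derived("Place", 1))
```

A view could then look up the most-derived model name for a given Place id and pick the template or query from that.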

Best wishes,
Malcolm

Malcolm Tredinnick

Aug 7, 2006, 9:29:38 PM
to django-d...@googlegroups.com
On Tue, 2006-08-08 at 10:39 +1000, Malcolm Tredinnick wrote:
[...]

> This stuff is hard not because we are not very clever, but because it is
> quite possibly fundamentally hard. There is a famous comment from Bjarne
> Stroustrup where he mentioned that AT&T (his employer) had tried very
> hard to map object models onto relational databases and had come to the
> conclusion it just wasn't possible to do. With a pedigree like that laying
> the groundwork, I don't mind making a few compromises in order to make
> some things easy and all things somehow possible.

Urgh. That came across more arrogantly than I intended. Sorry.

I can't even find an online source for the Stroustrup comment now
(although I'm pretty sure I've run into it again recently). He didn't
say it was impossible always, just not possible in full generality. And,
of course, things like Hibernate and ActiveRecord attempt to disprove
that on a daily basis.

So let's dismiss the above paragraph as any kind of justification,
lacking supporting evidence or particular relevance.

Malcolm


Bjørn Stabell

Aug 7, 2006, 10:07:52 PM
to Django developers
Alan Green wrote:
> Sure. In this case you would need a discriminator attribute on Place.
[...]

> I'd be pleased to see Django require discriminator attributes on
> superclasses, and then automagically retrieve the correct subclasses
> at the correct times. It seems to work well enough in ORMs such as
> Hibernate. However, I think I could happily live with Malcolm's
> proposal too, even if it means writing code like the above every
> now-and-then.

Yes, I'm worried that needing polymorphism is the common case, and if
it's not blatantly obvious how to do it, people are going to get
frustrated, especially if they're used to it working in other ORMs.
They're attracted to Python and Django for certain characteristics, and
it'll be a bit of a let-down, that's all.

So, I understand the technical challenges, and that the design is the
most generic and optimized possible, and many people will appreciate
this as well. All I'm saying is that Alan's discriminator solution had
better be a Best Practice side note in the docs if it's not an option of
the framework itself :)

Rgds,
Bjorn

Alan Green

Aug 8, 2006, 12:36:06 AM
to django-d...@googlegroups.com

Even with discriminators the queries would be very wide. When querying
to retrieve a bunch of objects, you don't know which tables to join
until after you've selected the discriminator value, so you have to
outer join them all anyway.
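
To show the shape of such a query, here is a small sketch that builds the SQL text. The table names and the assumption that every child table shares its primary key column with the root table are illustrative only; this is not how Django generates queries.

```python
def build_polymorphic_select(root, children, pk="id"):
    """Build a SELECT that outer-joins every possible child table.

    Assumes (for illustration) that each child table shares the root
    table's primary key column.
    """
    joins = "".join(
        " LEFT OUTER JOIN {c} ON {c}.{pk} = {r}.{pk}".format(c=child, r=root, pk=pk)
        for child in children
    )
    return "SELECT * FROM {r}{j}".format(r=root, j=joins)


print(build_polymorphic_select("place", ["restaurant", "italianrestaurant"]))
```

Each extra child table adds one more outer join, whether or not any of the selected rows are of that type.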

I haven't said so yet, but I think the proposal you outlined is a
sensible compromise between functionality and implementation
complexity.

Cheers,

Alan.

>
> Regards,
> Malcolm

Alan Green

Aug 8, 2006, 12:39:01 AM
to django-d...@googlegroups.com
On 8/4/06, Malcolm Tredinnick <mal...@pointy-stick.com> wrote:
> Next time I'm picking two Place types that have absolutely nothing in
> common so that you can't twist my examples into cases like this.
> Restaurants and SmallCometsInTheOortCloud, for example. I'm making this
> too easy for you.

Here's a use case where the functionality you've proposed would be just fine.

My current application has Person and Organisation classes. As
people and organisations have certain attributes and relations in
common (e.g. addresses, emails and telephone numbers), my model
includes another class named Party.

That is:

- Person has foreign key to Party
- Organisation has foreign key to Party
- Address has foreign key to Party
- A party may have many addresses, and will either have a related
Person, or a related Organisation

It will be great to be able to make Party a "real" superclass.
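
In plain-Python terms, what I'm hoping for looks roughly like the sketch below. These are ordinary dataclasses rather than Django models, and the field names are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Address:
    street: str
    city: str


@dataclass
class Party:
    # Attributes shared by people and organisations live on the superclass.
    addresses: List[Address] = field(default_factory=list)
    emails: List[str] = field(default_factory=list)


@dataclass
class Person(Party):
    full_name: str = ""


@dataclass
class Organisation(Party):
    trading_name: str = ""


person = Person(full_name="Ada")
person.addresses.append(Address("1 Main St", "Sydney"))
print(person.addresses[0].city)
```

The point is that addresses hang directly off a Person or an Organisation, with no separate Party object to navigate through.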

Furthermore, because of the two-level-deep relationships between
Person, Party and Address, I have had to build custom CRUD pages for
Person administration, and for Organisation administration. With
inheritance, I'm supposing the Django admin app will now be able to show
Addresses on the Person and Organisation pages, so that's about 6
boring pages I won't have to write next time.

Cheers,

Alan.
