Many-to-many relationships with additional columns

Marty Alchin

May 31, 2007, 4:32:28 PM

to django-d...@googlegroups.com

I've been giving a bit of thought to many-to-many relationships
lately, and I (once again) ran across the task of creating a
relationship that contained additional information. I know this has
been bounced around before, but I can't seem to find any substantial
discussions on the topic, so I wrote up a wiki article[1] with some of
my thoughts.

I'll be putting up the code I've got working so far that demonstrates
that parts of what I wrote up are indeed possible. The rest should be
as well, but it will probably take some extra discussion and possibly
some patches to how related managers work. I'll deal with that once I
figure out what all would be necessary.

My main reason for bringing it here, however, is to ask if there are
indeed some previous discussions that I should consider before I go too
far in my experiments on this subject. Or if I'm just completely off
my rocker.

-Gul

[1] http://code.djangoproject.com/wiki/ManyToManyManager

Jacob Kaplan-Moss

May 31, 2007, 4:55:55 PM

to django-d...@googlegroups.com

On 5/31/07, Marty Alchin <gulo...@gamemusic.org> wrote:
> [1] http://code.djangoproject.com/wiki/ManyToManyManager

+1.

No, wait, + a whole lot more than that! I've been wanting explicit M2M
join objects for quite some time; this is a pretty neat way to
accomplish it.

Just for shits and giggles, here's another possible syntax I'd been
considering::

    class Role(models.Model):
        role = models.CharField(maxlength=255)

    class Film(models.Model):
        title = models.CharField(maxlength=255)

    class Actor(models.Model):
        name = models.CharField(maxlength=255)
        films = models.ManyToManyField(Film, through=Role,
                                       related_name="actors")

(i.e. Actor relates to Film "through" Role.)

Jacob

oggie rob

May 31, 2007, 6:26:06 PM

to Django developers

Yeah! This is much cleaner. I need to add ordering to some model
relations on an M2M field (i.e. represent all related values as an
ordered list, where relations can be shifted up & down). It looks like
that would fit much more easily using a manager & concrete Model
subclass than being restricted to a field.

I don't vote often, but +lots.

-rob

Russell Keith-Magee

May 31, 2007, 9:32:38 PM

to django-d...@googlegroups.com

On 6/1/07, Marty Alchin <gulo...@gamemusic.org> wrote:
>
> My main reason for bringing it here, however, is to ask if there are
> indeed some previous discussions that I should consider before I go too
> far in my experiments on this subject. Or if I'm just completely off
> my rocker.

You may well be completely off your rocker. I can't claim expertise in
that domain. Mostly because _I'm_ completely off my rocker, most of
the time :-).

However, you're not the first to propose this. In fact, I would doubt
that you are even the tenth. This is a pretty common request.

This is the first time that I've seen a viable mechanism for
distinguishing between queries over properties of the related object
and properties of the relation. Previously, the suggestion has been to
pseudo-add the fields of the relation model to the related classes:

    Actor.objects.filter(films__title='Fight Club')
    Actor.objects.filter(films__role='Tyler Durden')

Your suggestion seems to be more on the lines of pseudo-adding the
entire relation _model_ to the related classes:

    Actor.objects.filter(films__title='Fight Club')
    Actor.objects.filter(roles__role='Tyler Durden')

There are still some namespace clash problems, but they're much smaller
than previous suggestions, and the semantic ambiguity (i.e., the
implication of attributes on the related model that aren't actually
there) isn't as pronounced.

To add to your proposal, I would say that the pseudo-attribute should
derive its name from the relation name:

    Actor.objects.filter(film_roles__role='Tyler Durden')
    Film.objects.filter(actor_roles__role='Tyler Durden')

Specifically, this is to avoid problems when you have two m2m relations
with a relation model (not a problem for this scenario, but certainly
conceivable). It also has the advantage of reducing the semantic
ambiguity a little further.
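
To make the distinction concrete, here is a minimal sketch at the SQL level using Python's sqlite3 module. It is not Django code and not anyone's proposed implementation; the table and column names (actor, film, role) and the helper functions are assumptions chosen to mirror the examples above. The first helper filters on a property of the related object, the second on a property of the relation itself.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE film  (id INTEGER PRIMARY KEY, title TEXT);
    -- Hypothetical join table: one row per (actor, film) pair plus extra data.
    CREATE TABLE role (
        actor_id INTEGER REFERENCES actor(id),
        film_id  INTEGER REFERENCES film(id),
        role     TEXT
    );
    INSERT INTO actor VALUES (1, 'Brad Pitt'), (2, 'Edward Norton');
    INSERT INTO film  VALUES (1, 'Fight Club'), (2, 'Se7en');
    INSERT INTO role  VALUES (1, 1, 'Tyler Durden'),
                             (2, 1, 'The Narrator'),
                             (1, 2, 'David Mills');
""")

def actors_by_film_title(title):
    """Filter on the related object: the WHERE clause touches film."""
    rows = conn.execute(
        "SELECT DISTINCT a.name FROM actor a "
        "JOIN role r ON r.actor_id = a.id "
        "JOIN film f ON f.id = r.film_id "
        "WHERE f.title = ? ORDER BY a.name", (title,))
    return [name for (name,) in rows]

def actors_by_role(role):
    """Filter on the relation: the WHERE clause touches only the join table."""
    rows = conn.execute(
        "SELECT DISTINCT a.name FROM actor a "
        "JOIN role r ON r.actor_id = a.id "
        "WHERE r.role = ? ORDER BY a.name", (role,))
    return [name for (name,) in rows]
```

The second query never needs the film table at all, which is one way to see why naming the relation separately from the related model keeps the two kinds of lookup apart.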

Regarding syntax; Jacob's suggestion is the usual suggested syntax for
proposals in this direction, and I'd have to say I prefer that syntax
to your M2MManager idea. To boot, Jacob's syntax should actually fit
into the existing contribute_to_class framework without too much
difficulty.

So, put me down as a +1, but with Jacob's syntax (or a variation thereof).

Yours,
Russ Magee %-)

Marty Alchin

Jun 1, 2007, 8:51:30 AM

to django-d...@googlegroups.com

On 5/31/07, Russell Keith-Magee <freakb...@gmail.com> wrote:
> However, you're not the first to propose this. In fact, I would doubt
> that you are even the tenth. This is a pretty common request.

Yeah, I know I had seen at least one mention of it before, and I
figured it was fairly common. I was just having trouble tracking down
any prior discussions on the topic, so I had little to go on.

> Your suggestion seems to be more on the lines of pseudo-adding the
> entire relation _model_ to the related classes:
>
> Actor.objects.filter(films__title='Fight Club')
> Actor.objects.filter(roles__role='Tyler Durden')

Admittedly, I hadn't gone far enough into it to consider how filters
like that would operate, so I'm actually fairly open to using whatever
method works best. Right now, the manager actually returns a QuerySet
based entirely on the destination model, using .extra(select=...) to
add in the fields from the join model.
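
As a rough illustration of that idea, the sketch below uses Python's sqlite3 module rather than Django: the query is driven by the destination table, and one column from the join table is selected alongside it. The schema and the films_for_actor helper are assumptions for illustration only, and this is not the SQL the manager described here actually generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # rows allow access by column name
conn.executescript("""
    CREATE TABLE actor (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE film  (id INTEGER PRIMARY KEY, title TEXT);
    -- Hypothetical join table carrying one extra column.
    CREATE TABLE role  (actor_id INTEGER, film_id INTEGER, role TEXT);
    INSERT INTO actor VALUES (1, 'John Cleese');
    INSERT INTO film  VALUES (1, 'Life of Brian'), (2, 'A Fish Called Wanda');
    INSERT INTO role  VALUES (1, 1, 'Reg'), (1, 2, 'Archie Leach');
""")

def films_for_actor(actor_id):
    """Return film rows, each also carrying the join table's role column."""
    return conn.execute(
        "SELECT f.id, f.title, r.role "
        "FROM film f JOIN role r ON r.film_id = f.id "
        "WHERE r.actor_id = ? ORDER BY f.id", (actor_id,)).fetchall()

for film in films_for_actor(1):
    # Each result looks like a film, with the relationship data attached.
    print(film["title"], "as", film["role"])
```

From the caller's side every result is a film with one extra attribute, which is roughly what a template would want to see.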

It's really quite primitive at this point, since my first concern was
accessing the data from templates. Comments like yours are precisely
why I wanted to bring it to the list. :)

> There are still some namespace clash problems, but they're much smaller
> than previous suggestions, and the semantic ambiguity (i.e., the
> implication of attributes on the related model that aren't actually
> there) isn't as pronounced.

As for the namespace clash problems, I figured the manager, during its
contribute_to_class, could check the fields on each related model, and
if there are any duplicates, throw an error. That way, something like
that would fail even during 'manage.py validate'.
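
A minimal sketch of such a check in plain Python, assuming field names are available as simple lists of strings. The function names and the decision to ignore the `id` column are assumptions for illustration, not part of Django or of the code described here.

```python
def find_field_clashes(destination_fields, join_fields, ignore=("id",)):
    """Return the sorted field names that appear on both models."""
    shared = set(destination_fields) & set(join_fields)
    return sorted(shared - set(ignore))

def check_no_clashes(destination_fields, join_fields):
    """Raise early, as a validation step would, if any names collide."""
    clashes = find_field_clashes(destination_fields, join_fields)
    if clashes:
        raise ValueError(
            "join model fields clash with destination model: "
            + ", ".join(clashes))

# No clash: the join model only adds names the destination lacks.
check_no_clashes(["id", "name"], ["id", "actor", "film", "role"])
```

Running the check when the relationship is set up, rather than at query time, is what would let the problem surface during validation.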

As for semantic ambiguity, I had given a little bit of thought to it,
and there's only so much that can be done to help that situation. The
one thing I'd insist on is that the result model (such as the Actor
taken from film.actors.filter(name='John Cleese')) would never update
the relationship when its .save() is called. That, of course, is
current functionality anyway, and I would think that changing it would
cause far too many headaches. That's why the wiki article recommends
the addition of an .update() method on the manager.

> To add to your proposal, I would say that the pseudo-attribute should
> derive its name from the relation name:
>
> Actor.objects.filter(film_roles__role='Tyler Durden')
> Film.objects.filter(actor_roles__role='Tyler Durden')
>
> Specifically, this is to avoid problems when you have two m2m relations
> with a relation model (not a problem for this scenario, but certainly
> conceivable). It also has the advantage of reducing the semantic
> ambiguity a little further.

Again, I hadn't done any work yet on how filters would work, so this
seems like a reasonable approach. The manager already returns a custom
subclass of QuerySet to add in the relationship data, so adding in
extra filter behavior should be straightforward.

> Regarding syntax; Jacob's suggestion is the usual suggested syntax for
> proposals in this direction, and I'd have to say I prefer that syntax
> to your M2MManager idea. To boot, Jacob's syntax should actually fit
> into the existing contribute_to_class framework without too much
> difficulty.

I admit I hadn't considered any alternative syntax, but that's mostly
because I couldn't find any previous discussion. I'm definitely open
to alternatives, but I'll at least explain why I chose the syntax I
proposed: it falls very much in line with the existing recommendation
for defining joiner models.

The main advantage to the manager concept is that projects could
maintain all their existing models and databases, without having to
destroy anything. All they'd have to do is add the manager to
whichever models they were already using, optionally change the
related_names to make more sense, and update their other code to use
the new API. No manual schema changes would be necessary.

There is (at least) one substantial pitfall to my syntax, however.
It's fairly easy to implement basic retrieval (already done) and the
.update() method (again, already done), and the filter modifications
should be simple enough, but extending the .add() method is currently
impossible without a patch to django.db.models.fields.related.

The RelatedDescriptor subclasses the model's default manager and adds
its own .add(), so anything I put on my manager gets completely
ignored. And I'm not about to add a separate method, since that would
not only be inconsistent with other relationship types, but it would
also still leave the existing .add() in place, even though it wouldn't
function as expected.
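
The effect can be reproduced in plain Python. The sketch below is not Django's descriptor code; the class names and the make_related_manager helper are hypothetical stand-ins that mimic the pattern of building a subclass of the manager and defining add() on that subclass.

```python
class Manager:
    def all(self):
        return "all rows"

class RoleManager(Manager):
    """Hypothetical custom manager that wants add() to take join data."""
    def add(self, obj, **relation_data):
        return ("custom add", obj, relation_data)

def make_related_manager(base_manager_class):
    """Mimic a descriptor that subclasses the manager and adds its own add()."""
    class RelatedManager(base_manager_class):
        def add(self, *objs):
            return ("built-in add", objs)
    return RelatedManager

related = make_related_manager(RoleManager)()
# The subclass method comes first in the method resolution order,
# so RoleManager.add() is never reached through `related`.
print(related.add("film"))
```

Because the generated subclass sits below the custom manager in the hierarchy, its add() always wins, while inherited methods such as all() still work.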

I'm not sure how much of a patch to the RelatedDescriptor process
would need to happen to get .add() working, but it's certainly enough
trouble that alternatives are worth investigating.

So, there's a bit more to chew on; hopefully we can come up with some
final thoughts on this soon, because I'd love to see this happen.

-Gul

Benjamin Slavin

Sep 10, 2007, 5:56:59 PM

to django-d...@googlegroups.com

Hello all,

On 6/1/07, Marty Alchin <gulo...@gamemusic.org> wrote:

> So, there's a bit more to chew on; hopefully we can come up with some
> final thoughts on this soon, because I'd love to see this happen.

Marty's suggestion looks promising and seems to have a good bit of
support from the core devs. It's been a few months, and I am
wondering if there has been any movement on this.

Like Rob, I've come into a data model where an ordered-per-collection
ManyToMany relationship is necessary, and have been trying to find a
suitable solution.

Perhaps this would be a good candidate for the upcoming sprint if
nothing has been done on it thus far. (I'm hoping to be able to join
the festivities, but that depends on how a few other things play out
this week).

- Ben

Marty Alchin

Sep 10, 2007, 6:26:52 PM

to django-d...@googlegroups.com

On 9/10/07, Benjamin Slavin <benjami...@gmail.com> wrote:
> Marty's suggestion looks promising and seems to have a good bit of
> support from the core devs. It's been a few months, and I am
> wondering if there has been any movement on this.

This particular item has been low on my list of priorities lately, but
I've gotten some other things cleared away, so I hope to revisit it
soon.

> Perhaps this would be a good candidate for the upcoming sprint if
> nothing has been done on it thus far. (I'm hoping to be able to join
> the festivities, but that depends on how a few other things play out
> this week).

I actually am indeed hoping to bring it up again at the sprint! A
"proper" implementation to provide the API I described would require
some trickery in the relationship descriptors, so I'd need to discuss
things a bit with the core devs to see how best to proceed on that
front anyway.

I make no guarantees, but if it doesn't get tackled at the sprint, I
should be able to at least clear out enough other stuff to get it back
at the top of my list. Thanks for the support!

-Gul

Tai Lee

Sep 11, 2007, 7:29:09 PM

to Django developers

I think I'd like to see this functionality in trunk, too. Previously
when I needed a M2M model with sequence or other data, I just created
the M2M model explicitly with ForeignKey fields to the two related
models.

    class Role(models.Model):
        role = models.CharField(maxlength=255)

    class Film(models.Model):
        title = models.CharField(maxlength=255)

    class Actor(models.Model):
        name = models.CharField(maxlength=255)

    class ActorFilmRole(models.Model):
        actor = models.ForeignKey(Actor)
        film = models.ForeignKey(Film)
        role = models.ForeignKey(Role)
        sequence = models.IntegerField()

This seems to work, but is it missing out on some of the niceties /
helper methods of the proposed M2M? What would be the benefits of the
suggested new method over this?

Marty Alchin

Sep 11, 2007, 9:39:35 PM

to django-d...@googlegroups.com

On 9/11/07, Tai Lee <real....@mrmachine.net> wrote:
> This seems to work, but is it missing out on some of the niceties /
> helper methods of the proposed M2M? What would be the benefits of the
> suggested new method over this?

Yes, that method works, and that's the recommended way to go for now,
but as for what it's missing, it's a little more subtle. It's not
missing any functionality, that much is true. What it's missing is the
ability to quickly get from one model to another, and have
relationship fields easily accessible.

With a standard M2M field, you can just use film.actors.all() and
actor.films.all(), but with the method you identified, there's another
level between the two. This may not be too bad for you and me, but
it's murder on template designers who shouldn't need to know about
those types of things.

As for the relationship fields, they're on a separate object which
also makes it quite unintuitive to somebody who's not familiar with
the inner workings.
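
To show what that extra level looks like in practice, here is a small sketch using plain Python dataclasses instead of Django models. The class names follow the example quoted in this thread, but `role` is simplified to a string and the cast_of helper is an assumption for illustration.

```python
from dataclasses import dataclass

@dataclass
class Actor:
    name: str

@dataclass
class Film:
    title: str

@dataclass
class ActorFilmRole:
    """Explicit join object; role is simplified to a string here."""
    actor: Actor
    film: Film
    role: str
    sequence: int

def cast_of(film, relations):
    """Walk the join objects for one film, ordered by sequence."""
    links = sorted((r for r in relations if r.film is film),
                   key=lambda r: r.sequence)
    # The actor and the relationship data live on different objects,
    # so the caller has to know about the join object to reach both.
    return [(link.actor.name, link.role) for link in links]

fight_club = Film("Fight Club")
relations = [
    ActorFilmRole(Actor("Brad Pitt"), fight_club, "Tyler Durden", 2),
    ActorFilmRole(Actor("Edward Norton"), fight_club, "The Narrator", 1),
]
```

Getting from a film to its actors means going through the join objects first, which is the step a plain M2M field hides.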

What I'm working on is a way to provide the functional benefits of M2M
with relationship data (like your example), with the human-natural API
provided by the standard M2M field.