Multi-table (polymorphic?) row-specific relations

409 views
Skip to first unread message

Andrey Petrov

unread,
Apr 22, 2011, 12:20:45 AM4/22/11
to sqlal...@googlegroups.com
Hi there,

I'm trying to make a table for annotating other tables' rows, perhaps even in relation to a user. I haven't been able to find similar examples of this elsewhere so I'd like a sanity check that I'm not doing something horrendously wrong. This is in SQLAlchemy 0.7 by the way.

Here's an example:

class Tag(BaseModel):
    __tablename__ = 'tag'

    id = Column(types.Integer, primary_key=True)
    table_name = Column(types.String(64))
    user_id = Column(types.Integer)
    row_id = Column(types.Integer)

    name = Column(types.String, nullable=False)

Index('tag_backref_idx', Tag.table_name, Tag.user_id, Tag.row_id, Tag.name, unique=True)
Index('tag_name_idx', Tag.table_name, Tag.user_id, Tag.name)

So say I have a table called user, I want to be able to apply a tag to a row in it as Tag(table_name='foo', user_id=None, row_id=1, name='sometag'). The semantic is that user_id=None means it's a "global" (aka. system) tag, whereas a tag from a user would have that user_id filled respectively.

On the Foo object, I can apply a relation like this (I'm doing this via a mixin class in my code):

    @declared_attr
    def tags(cls):
        return orm.relationship(Tag,
            primaryjoin=and_(
                Tag.table_name==cls.__tablename__,
                Tag.user_id==None,
                Tag.row_id==cls.id,
            ),
            foreign_keys=[Tag.table_name, Tag.user_id, Tag.row_id],   
        )

So now I can do things like...
 
f = Session.query(Foo).get(1)
f.tags # == [list of global Tag objects applied on object f]

The goal is to be able to apply a "tag" on the system level (user_id=None) or on the user level for any row of any table (in reality there will be a couple of tables that will never have tags, but 90% will).

My questions:
  1. Can we use the polymorphic_on stuff to make this better integrated?
  2. Is there a way to make a "dynamic" ForeignKey reference such that I can associate a tag to an object that hasn't been committed (ie. no `id` yet)?

    Example:

    f = Foo()
    t = Tag(name='blah', magic_triple_column_property=f)
    Session.add(f)
    Session.add(t)
    Session.commit()

  3. Am I crazy for trying to do this? Is there a better/simpler/more rational way to do something like this?

Thanks in advance!

- Andrey

Michael Bayer

unread,
Apr 22, 2011, 12:56:05 AM4/22/11
to sqlal...@googlegroups.com
so, some background, "table_name" is a Rails-style polymorphic association, very old blog post about this at:  http://techspot.zzzeek.org/archive/2007/05/1/

So there's a hacky way there, which you can adopt to declarative, to do the "foreign key thats not really a foreign key" thing.    I don't like it since the lack of an FK means the relatoinship is entirely corruptable.   From reading the post you'll see i always find a way to use foreign keys correctly, which means an extra table.

Polymorphically, youd have FooTag, BarTag, with a joined table, the table thats joined has an FK back to the parent Foo or Bar.

Without using the polymorphic thing, which actually I usually don't for this, you can have your tags @declared_attr pull out a "secondary" table, right in the relationship there.   Or polymorphically you could generate FooTag, BarTag right there.  @declared_attr is great for all that stuff.

The polymorphic association is something I'm slowly getting ready to re-present in a declarative way, since its been coming up a lot lately and that blog post is from 2007.



--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To post to this group, send email to sqlal...@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.

Andrey Petrov

unread,
Apr 25, 2011, 12:59:50 AM4/25/11
to sqlal...@googlegroups.com
So if I understand correctly,

You would choose the (1) assoc_id/type secondary table approach that you outline in the blog post over having (2) redundant schema per-table with direct FKs (FooTag, BarTag, etc)? Why? (I have a feeling I'm misparsing what you're saying.)

(1) worries me because I would need approximately one extra row in the secondary table per row in other tables, plus the extra join. I'd need to test this but I imagine the performance implications would be significant. With (2) I get more tables, but at least it's a reasonably compact and simple representation.

I do agree that the hackish rails-style association is not ideal.

Michael Bayer

unread,
Apr 25, 2011, 10:20:02 AM4/25/11
to sqlal...@googlegroups.com

On Apr 25, 2011, at 12:59 AM, Andrey Petrov wrote:

> So if I understand correctly,
>
> You would choose the (1) assoc_id/type secondary table approach that you outline in the blog post over having (2) redundant schema per-table with direct FKs (FooTag, BarTag, etc)? Why? (I have a feeling I'm misparsing what you're saying.)

I might actually do FooTag, BarTag, but each of those would have its own table alone (i.e. concrete table style). That way each object has a collection of "tags" thats entirely portable from the rest. Only if you need to query across "all tags" or otherwise correlate different types would I deal with an intermediary table. it depends on what kind of queries you want to do.


>
> (1) worries me because I would need approximately one extra row in the secondary table per row in other tables, plus the extra join. I'd need to test this but I imagine the performance implications would be significant. With (2) I get more tables, but at least it's a reasonably compact and simple representation.
>
> I do agree that the hackish rails-style association is not ideal.

if the table is under a million rows and you're not on MySQL most differences would be negligible. The comparison of an index of (int, string) versus (int, int) comes into play as well.

Andrey Petrov

unread,
Apr 25, 2011, 2:17:51 PM4/25/11
to sqlal...@googlegroups.com
Thanks Michael!

I am crawling and annotating portions of Twitter's social graph, so I'm already approaching 1mil rows. I'm trying to hold off sharding as long as I can. :)

Concrete table style sounds most sensible, despite the schema clutter. I need to write some kind of Taggable factor to generate the appropriate table.

Andrey Petrov

unread,
Apr 25, 2011, 9:10:23 PM4/25/11
to sqlal...@googlegroups.com
One more thought:

Is there a sane way to hide a schema object within another schema object?

Specifically, I want to make a factory method (or maybe a class decorator) which generates these Tag schemas onto specific tables. Something along the lines of:

@Taggable
class User(BaseModel)
    __tablename__ = "user"

The Taggable class decorator would generate another table called "tag_user" and attach the appropriate relationship attributes to the User class. I'm thinking it'll also attach the anonymous Tag declarative class to the User class as User.TagClass or somesuch.

Is this reasonable? Is there a better way to do this than monkey-patching the User class in the Taggable decorator?

One fantasy I had was if you could do...

class TaggableMixin(object):
    @declared_attr
    def TaggableClass(cls):
        class Tag(BaseModel):
            __tablename__ = "tag_%s" % cls.__tablename__
            # ... schema here

class User(BaseModel, TaggableMixin):
    # ...

Perhaps I should try it but I don't imagine this will work.

- Andrey

Andrey Petrov

unread,
Apr 25, 2011, 9:28:45 PM4/25/11
to sqlal...@googlegroups.com
Looks like this almost-sorta works:

class TagMixin(object):
    @declared_attr
    def TagClass(cls):
        class Tag(BaseModel):
            __tablename__ = "tag_%s" % cls.__tablename__

            id = Column(types.Integer, primary_key=True)
            time_created = Column(types.DateTime, default=datetime.now, nullable=False)

            row_id = Column(types.Integer, ForeignKey(cls.id), index=True)
            name = Column(types.String, nullable=False, index=True)

# This part breaks with... sqlalchemy.exc.InvalidRequestError: Table 'tag_user' is already defined for this MetaData instance. 
#    @declared_attr
#    def tags(cls):
#        return orm.relationship(cls.TagClass, backref='tagged')

class User(BaseModel, TagMixin):
    __tablename__ = 'user'

    id = Column(types.Integer, primary_key=True)
    ...

The appropriate tables do get generated on create_all(). But as soon as I touch the User.TagClass attribute, it barfs with the same error again:

InvalidRequestError: Table 'tag_user' is already defined for this MetaData instance.  Specify 'extend_existing=True' to redefine options and columns on an existing Table object.

 
 

Michael Bayer

unread,
Apr 27, 2011, 1:12:05 PM4/27/11
to sqlal...@googlegroups.com
you should create tables inside of @declared_attr.   Usually I'm using straight Table in there but declared class should work too (though you might get a warning about the same class created twice, unless you do some kind of uniqifying, it probably would be nice to fix declarative to not call any @declared_attr twice).     For the "already have this table in this metadata" issue use the table argument "keep_existing=True".

Otherwise, I do this exact pattern you have above except explicitly:

class UserTag(TagBase, Base):
    __tablename__ = 'user_tag'

class User(Taggable, Base):
    tag_cls = UserTag

TagBase and Taggable do the rest of the work.

*maybe* this works too:

class User(Taggable, Base):
    class tag_cls(TagBase, Base):
        __tablename__ = 'user_tag'


You could also use an event like "instrument_class" or "mapper_configured" , associated with mapper() and would look for Taggable subclasses.


Michael Bayer

unread,
Apr 27, 2011, 1:19:27 PM4/27/11
to sqlal...@googlegroups.com
this works  (been working on getting this kind of thing into recipes/docs/books)

from sqlalchemy import Column, Integer, ForeignKey, String
from sqlalchemy.ext.declarative import declarative_base, declared_attr
from sqlalchemy.orm import relationship

Base = declarative_base()

class Mixin(object):
    @declared_attr
    def foob(cls):
        cls.Blah = Blah = type("%sBlah" % cls.__name__, 
            (Base, ),
            dict(
                __tablename__ = 'blah_%s' % cls.__tablename__,
                id = Column(Integer, primary_key=True),
                name_id = Column(Integer, ForeignKey(cls.id))
            )
        )

        return relationship(Blah)

class Name(Mixin, Base):
    __tablename__ = "name"

    id = Column(Integer, primary_key=True)
    name = Column(String)


class Name2(Mixin, Base):
    __tablename__ = "name2"

    id = Column(Integer, primary_key=True)
    name = Column(String)

print Name2.foob.contains(Name2.Blah())



Andrey Petrov

unread,
Apr 27, 2011, 6:59:59 PM4/27/11
to sqlal...@googlegroups.com
Ah I was really close. This worked:

class TagMixin(object):
    @declared_attr
    def tags(cls):
        class Tag(BaseModel):
            __tablename__ = "tag_%s" % cls.__tablename__

            id = Column(types.Integer, primary_key=True)
            time_created = Column(types.DateTime, default=datetime.now, nullable=False)

            row_id = Column(types.Integer, ForeignKey(cls.id), index=True)
            name = Column(types.String, nullable=False, index=True)

        cls.TagClass = Tag

        return orm.relationship(Tag, backref='tagged')

Ciaran Farrell

unread,
May 2, 2012, 5:07:10 AM5/2/12
to sqlal...@googlegroups.com
But how would you actually add a tag? For example, say, using the example you provided below, I had a table called Document, which has is 'taggable'. If I create a Document object (doc1), I can see doc1.tags, which is a list. However, how do I actually _add_ a tag to doc1? I have a TagMixin object available in dir() but no Tag object (which I would have expected) - though a Tag class exists _inside_ the scope of the TagMixin. Do I have to create a Tag object outside the TagMixin class too?

David Bowser

unread,
May 2, 2012, 8:18:21 AM5/2/12
to sqlal...@googlegroups.com

Nope, just use the one inside the scope, TagMixin.TagClass, like so for your example:

session.add(User.TagClass(tagged=doc1,name='foo'))


--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sqlalchemy/-/rSjfceG4OQEJ.

Ciaran Farrell

unread,
May 10, 2012, 3:15:55 AM5/10/12
to sqlal...@googlegroups.com
Ok, got it thanks. And how would you query the tags then? session.query(TagMix) won't work - neither will session.query(User). Is there a way of querying all tags, irrespective of what table they are 'attached' to?
To unsubscribe from this group, send email to sqlalchemy+unsubscribe@googlegroups.com.

Andrey Petrov

unread,
May 10, 2012, 11:36:01 AM5/10/12
to sqlal...@googlegroups.com
In this specific scenario, the tag sets each live a separate table sibling to the table they're representing, so it's not trivial to query over all tags across all tables.

One thing you could do is UNION all the tag tables. To do this dynamically, you'd need to keep track of all the tag tables. One way to do this is to accumulate this list in some static variable right after the ``cls.TagClass = Tag`` line, and use that to form a UNION query.

Or an easier but hackier way would be to just traverse all tables and pick out the ones prefixed with tag_...

Or perhaps this kind of schema is not ideal for your use case. :)

- Andrey
Reply all
Reply to author
Forward
0 new messages