Is it possible to define multiple model objects that refer to the same table?

Rob Crowell

Nov 15, 2012, 7:32:29 PM
to sqlal...@googlegroups.com
I'm working with a denormalized cache schema, and I've run into a situation where it would be helpful to be able to create multiple classes that extend Base but refer to the same __tablename__.  Is this possible to do?  I am getting this Exception:
    sqlalchemy.exc.InvalidRequestError: Table '[tablename]' is already defined for this MetaData instance.  Specify 'extend_existing=True' to redefine options and columns on an existing Table object.

For a little more insight: we have some attributes that always have exactly one value (the user who created the issue), and other attributes that can have one or more values (user-defined tags for the issue).  If we were being exhaustive, we would create two "cached" tables for our issues, since we sometimes want to display recent issues by user and sometimes by tag:
    * issue_user
    * issue_tag

However, we can get away with writing just one table and querying it with an appropriate group_by("user_id") to achieve the same end as having 2 tables.  Since my application should behave as if there were 2 separate cache tables (and I'd like to keep open the option of adding two separate cache tables in the future), I would like to have 2 different classes extending Base, representing the two ways in which we would query the table.  The obvious way of doing this doesn't work:

class IssueUser(Base):
    __tablename__ = 'issue_user_tag'

class IssueTag(Base):

Rob Crowell

Nov 15, 2012, 7:33:16 PM
to sqlal...@googlegroups.com
Sorry, that got cut off at the end.

class IssueTag(Base):
    __tablename__ = 'issue_user_tag'

sqlalchemy.exc.InvalidRequestError: Table 'issue_user_tag' is already defined for this MetaData instance.  Specify 'extend_existing=True' to redefine options and columns on an existing Table object.

Michael Bayer

Nov 15, 2012, 10:48:39 PM
to sqlal...@googlegroups.com

two^H^H^H three ways:

1. map to a Table:

mytable = Table("mytable", Base.metadata, Column(...))

class A(Base):
    __table__ = mytable

class B(Base):
    __table__ = mytable


1a: variant of 1, map A as you did but use __table__ on B

class A(Base):
    __tablename__ = 'mytable'

    x = Column(...)

class B(Base):
    __table__ = A.__table__

2. use single table inheritance with no discriminator

class MyTable(Base):
    __tablename__ = 'mytable'

class A(MyTable):
    # ....

class B(MyTable):
    # ...

I don't have an understanding of your querying situation yet; discriminating on group_by() seems a little strange, as group_by() is only intended to group rows for aggregates, but #1, #1a or #2 should fit the bill.





Rob Crowell

Nov 16, 2012, 2:59:36 PM
to sqlalchemy
Thanks for the help so far, Michael! I can explain a little more about what I'm trying to do (I'm using a fictional application here, but I think it translates pretty accurately into my actual application).

BACKGROUND
----------

Let's say I'm writing an issue tracking application. Each issue that
we're tracking has a type (an issue must have exactly one type), and
each issue may have an unlimited number of user-provided labels.

Each day, people browse our issue tracker and each time they do they
generate a page view on the issue. Here's an example of one day's
worth of data:

IssueA: <Views: 1,   Type: "Bug",           Labels: ["login-page (id=1)", "show-stopper (id=2)"]>
IssueB: <Views: 20,  Type: "One-Time Task", Labels: ["disk-full (id=3)", "show-stopper (id=2)"]>
IssueC: <Views: 300, Type: "Bug",           Labels: ["login-page (id=1)"]>

The BigCo. I'm working for is very interested in knowing which issues
are read by the most people, and they need the ability to generate
reports sliced by arbitrary date ranges. However, we can tolerate a
day's delay, so we write summary tables each night. Two of these
summary tables are aggregated by issue type and by label respectively,
and we also write a third table that can be used to drill down and see
page visits bucketed by both type and label:

CREATE TABLE `issue_type` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `created` datetime NOT NULL,
  `type` enum('Bug','Feature','Improvement','One-Time Task') NOT NULL,
  `num_visits` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`)
)

CREATE TABLE `issue_label` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `created` datetime NOT NULL,
  `label_id` int(10) unsigned NOT NULL,
  `num_visits` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`)
)

CREATE TABLE `issue_type_label` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `created` datetime NOT NULL,
  `type` enum('Bug','Feature','Improvement','One-Time Task') NOT NULL,
  `label_id` int(10) unsigned NOT NULL,
  `num_visits` int(10) unsigned NOT NULL,
  PRIMARY KEY (`id`)
)

So we'd run these insert statements at midnight:

INSERT INTO issue_type (created, type, num_visits) VALUES
  ("2012-11-15", "Bug", 301),
  ("2012-11-15", "One-Time Task", 20);

INSERT INTO issue_label (created, label_id, num_visits) VALUES
  ("2012-11-15", 1, 301),
  ("2012-11-15", 2, 21),
  ("2012-11-15", 3, 20);

INSERT INTO issue_type_label (created, type, label_id, num_visits) VALUES
  ("2012-11-15", "Bug", 1, 301),
  ("2012-11-15", "Bug", 2, 1),
  ("2012-11-15", "One-Time Task", 3, 20),
  ("2012-11-15", "One-Time Task", 2, 20);

Now when we want to generate the summary reports, we query one of the
first two tables (if we're generating a report aggregated by issue
type we hit issue_type; if we're generating a report aggregated by
label we hit issue_label), and when we want to run a drill-down query
by both type and label, we hit issue_type_label:

# broken down by type
SELECT type, SUM(num_visits) FROM issue_type
WHERE created >= "2012-11-01" AND created <= "2012-11-16"
GROUP BY type;

# broken down by label
SELECT label_id, SUM(num_visits) FROM issue_label
WHERE created >= "2012-11-01" AND created <= "2012-11-16"
GROUP BY label_id;

# broken down by both type and label
SELECT type, label_id, SUM(num_visits) FROM issue_type_label
WHERE created >= "2012-11-01" AND created <= "2012-11-16"
GROUP BY type, label_id;

This works fine and is very fast for our application. However, it's a
bit redundant: since each issue has exactly one type, the label summary
can be derived from issue_type_label by simply ignoring the type column.
So we can get away with not having the issue_label table at all and just
run this instead (we must keep the issue_type table, however, since a
single issue can have many labels or none at all, so its totals can't be
recovered from issue_type_label):

# broken down by label
SELECT label_id, SUM(num_visits) FROM issue_type_label
WHERE created >= "2012-11-01" AND created <= "2012-11-16"
GROUP BY label_id;


THE ISSUE
---------

In reality, we have many types of fields that we generate reports on
(not just type and label), so we have more than a dozen of these
aggregate tables rolled up by different sets of fields. While porting
our data access layer to SQLAlchemy (we have a separate process to
write these tables which we aren't going to move to SQLAlchemy), I've
created a bunch of functions that use reflection on the various Model
objects I've built to figure out what columns we need to select.

Each column that we don't want to GROUP BY or filter out with a WHERE
clause gets wrapped by func.sum() and passed to a call to
session.query(). My columns are very consistently named so I have
automated almost all of this using object reflection, which lets my
code populate which columns to select, group by, and sum on
automatically. I really don't want to write a custom call to query()
for each table type that we have since we have so many and they are
accessed in basically identical ways.
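
To give a rough idea, the query builder looks something like this (just
a sketch: the function name, the "num_" prefix convention for metric
columns, the id/created exclusions, and the assumption that every model
has a created column are stand-ins for our real naming scheme):

from sqlalchemy import func
from sqlalchemy.orm import ColumnProperty
from sqlalchemy.orm.mapper import class_mapper

def build_report_query(session, model, start, end):
    # split the mapped columns into dimensions (selected and grouped)
    # and metrics (wrapped in SUM()), based purely on naming
    props = [p for p in class_mapper(model).iterate_properties
             if isinstance(p, ColumnProperty)]
    dims = [getattr(model, p.key) for p in props
            if not p.key.startswith('num_') and p.key not in ('id', 'created')]
    sums = [func.sum(getattr(model, p.key)) for p in props
            if p.key.startswith('num_')]
    query = session.query(*(dims + sums))
    query = query.filter(model.created >= start, model.created <= end)
    return query.group_by(*dims)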

The issue, however, comes when trying to generate the report bucketed
by label. Since the issue_type_label table has both type and label_id
columns, my code will always GROUP BY both of them unless I introduce
hacks all over the place to suppress selecting and grouping by the
type column...

What I want to be able to do is to define 2 models that are both
backed by the same table, but have declared different sets of Columns
so that the reflection code I've written doesn't have to be hacked to
shreds to exclude the type column when I don't want it; I will simply
define one model that doesn't create a Column for type and the
reflection code will ignore it when generating queries. So this is
what I want to write, but it generates the InvalidRequestError:

from sqlalchemy import Column, Date, Enum, Integer, String, create_engine, desc, func
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

engine = create_engine('mysql://root:password@localhost/issue_tracker')
Session = sessionmaker(bind=engine)

Base = declarative_base()

class IssueType(Base):
    __tablename__ = 'issue_type'
    id = Column('id', Integer, primary_key=True)
    type = Column('type', Enum('Bug', 'Feature', 'Improvement', 'One-Time Task'))
    created = Column('created', Date)
    num_visits = Column('num_visits', Integer)

class IssueLabel(Base):
    __tablename__ = 'issue_type_label'
    id = Column('id', Integer, primary_key=True)
    label_id = Column('label_id', String)
    created = Column('created', Date)
    num_visits = Column('num_visits', Integer)

class IssueTypeLabel(Base):
    __tablename__ = 'issue_type_label'
    id = Column('id', Integer, primary_key=True)
    type = Column('type', Enum('Bug', 'Feature', 'Improvement', 'One-Time Task'))
    label_id = Column('label_id', String)
    created = Column('created', Date)
    num_visits = Column('num_visits', Integer)

>>> sqlalchemy.exc.InvalidRequestError: Table 'issue_type_label' is already defined for this MetaData instance. Specify 'extend_existing=True' to redefine options and columns on an existing Table object.

Correct me if I'm wrong, but I believe the methods you described above
will all inherit the same column definitions for the table. However,
that's exactly what I don't want to happen! I poked around in the
source and noticed there is a table argument called keep_existing
which I tried using, but its behavior seemed inconsistent and/or buggy
with respect to what columns were reported via the reflection I was
doing. Maybe this isn't the use case it was intended for?

It's possible that I can declare 2 different Base classes with two
separate calls to declarative_base(), and have IssueLabel extend Base1
and IssueTypeLabel extend Base2, but that solution seems
unsatisfactory to me. The problem seems so easy, and I'm really close
to solving it, but I'm not quite there yet.

Any pointers to something I missed? Thanks again!


Michael Bayer

Nov 16, 2012, 8:49:27 PM
to sqlal...@googlegroups.com
Without having a deep understanding of your system, and with only time for a cursory review at this level of detail, it seems like you just need some system of generating a Query(), or even just a select(), programmatically. You've got naming schemes and consistent inputs/outputs; I would not even use ORM mappings for this beyond just simple, traditional maps to each table.


> What I want to be able to do is to define 2 models that are both
> backed by the same table, but have declared different sets of Columns
> so that the reflection code I've written doesn't have to be hacked to
> shreds to exclude the type column when I don't want it;

This is likely a misuse of ORM mappings. An ORM mapping is generally one model for one Table, and that's it, with some very occasional use cases where you might be selecting from more than one table at once.

If you want to select subsets of columns, that's what session.query() or select() is for.

You need to write code that does this:

def my_model_one():
    return query(*[getattr(MyModel, colname) for colname in get_colnames_for_model_one()])

def my_model_two():
    return query(*[getattr(MyModel, colname) for colname in get_colnames_for_model_two()])


I wouldn't be trying to define ORM mappings that define particular subqueries, basically.
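
e.g. applied to your IssueTypeLabel mapping, roughly like this; get_colnames_for_labels() and the session are placeholders for whatever your naming scheme and setup provide:

from sqlalchemy import func

def label_report(session, start, end):
    # which columns to select and group by comes from your own naming
    # scheme; get_colnames_for_labels() just stands in for that lookup
    cols = [getattr(IssueTypeLabel, name) for name in get_colnames_for_labels()]
    return (session.query(*(cols + [func.sum(IssueTypeLabel.num_visits)]))
            .filter(IssueTypeLabel.created.between(start, end))
            .group_by(*cols))
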
> Correct me if I'm wrong, but I believe the methods you described above
> will all inherit the same column definitions for the table. However,
> that's exactly what I don't want to happen! I poked around in the
> source and noticed there is a table argument called keep_existing
> which I tried using, but its behavior seemed inconsistent and/or buggy
> with respect to what columns were reported via the reflection I was
> doing. Maybe this isn't the use case it was intended for?

the MetaData object is intended to store collections of Table objects, keyed to their name.

If you wanted to have two ORM mappings, each one mapped to a subset of columns on one Table, you need to map to a Table object:

t = Table('mytable', metadata,
    Column('x', ...),
    Column('y', ...),
    Column('z', ...),
    Column('q', ...),
    Column('p', ...),
    Column('r', ...)
)

class ClassOne(Base):
    __table__ = t

    __mapper_args__ = {'include_columns': ['x', 'y', 'z']}

class ClassTwo(Base):
    __table__ = t

    __mapper_args__ = {'include_columns': ['q', 'p', 'r']}

Check the docs for mapper() for more detail on include_columns, I'm working from memory on how it works (I've never needed it).

http://docs.sqlalchemy.org/en/latest/orm/mapper_config.html?highlight=mapper#sqlalchemy.orm.mapper


>
> Its possible that I can declare 2 different Base classes with two
> separate calls to declarative_base(),

not sure why you need two Base classes. I'd be looking to keep the usage of the ORM traditional and simple. If you're doing reporting and need different reports to generate different sets of columns, I'd be working with Query() or select() directly to pass in those lists, and backing it would be custom code on your end that keeps it organized somehow.


Rob Crowell

Nov 21, 2012, 3:11:38 PM
to sqlalchemy
Thanks so much! Your pointers were exactly what I needed,
specifically the bit which led me to discover exclude_properties.
I'll leave my working code here in case it ever helps anybody else
out:

from sqlalchemy import Column, Date, Enum, Integer, String, Table, create_engine
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import ColumnProperty, sessionmaker
from sqlalchemy.orm.mapper import class_mapper

engine = create_engine('mysql://user:password@localhost/issue_tracker')
Session = sessionmaker(bind=engine)

Base = declarative_base()

tables = {'issue_type':
              Table('issue_type', Base.metadata,
                    Column('id', Integer, primary_key=True),
                    Column('type', Enum('Bug', 'Feature', 'Improvement', 'One-Time Task')),
                    Column('created', Date),
                    Column('num_visits', Integer)),

          'issue_type_label':
              Table('issue_type_label', Base.metadata,
                    Column('id', Integer, primary_key=True),
                    Column('type', Enum('Bug', 'Feature', 'Improvement', 'One-Time Task')),
                    Column('label_id', String),
                    Column('created', Date),
                    Column('num_visits', Integer))}

def get_columns(model):
    return [x.key for x in class_mapper(model).iterate_properties
            if isinstance(x, ColumnProperty)]

class IssueType(Base):
    __table__ = tables['issue_type']


class IssueLabel(Base):
    __table__ = tables['issue_type_label']
    __mapper_args__ = {'exclude_properties': ['type']}


class IssueTypeLabel(Base):
    __table__ = tables['issue_type_label']


print "issue type:", get_columns(IssueType)
print "issue label:", get_columns(IssueLabel)
print "issue type label:", get_columns(IssueTypeLabel)


This code correctly prints the following:

issue type: ['id', 'type', 'created', 'num_visits']
issue label: ['id', 'label_id', 'created', 'num_visits']
issue type label: ['id', 'type', 'label_id', 'created', 'num_visits']
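
And for completeness, here's roughly how those column lists get consumed
when building the label report; the date literals are just example
values, and since IssueLabel excludes the type property, the GROUP BY
ends up covering only label_id:

from sqlalchemy import func

session = Session()

# group by every mapped column that isn't the surrogate key, the date,
# or the metric itself
group_cols = [getattr(IssueLabel, name) for name in get_columns(IssueLabel)
              if name not in ('id', 'created', 'num_visits')]

label_report = (session.query(*(group_cols + [func.sum(IssueLabel.num_visits)]))
                .filter(IssueLabel.created >= '2012-11-01',
                        IssueLabel.created <= '2012-11-16')
                .group_by(*group_cols))

print label_report.all()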