Temporary mapped classes for testing


Iwan Vosloo

Jun 26, 2012, 7:59:38 AM
to sqlal...@googlegroups.com
Hi there,

When we test, we sometimes create temporary classes just for a specific test (or group of tests). You might want, for example, to inherit from a class that is part of production code and override a method for testing purposes. When these temporary classes are declarative SQLAlchemy mapped classes, we run into all sorts of problems.

What we do is something along the lines of:
 1) create the tables for the actual model
 2) commit
 3) define the test classes and create their tables
 4) run the test logic
 5) roll back

Steps 1-2 happen once per test run, which contains many tests.
Steps 3-5 represent one test, so they can be repeated.

However, it seems that some cleanup of the mappers, the metadata, and the classes that inherit from Base is needed after step 5: the SQLAlchemy Table objects and mappers are now out of sync with the database (which has been rolled back).

What can one do to get all the SQLAlchemy-related state (metadata, mappers, tables, and how these were set on existing classes) back in sync with the rolled-back database schema? We know the metadata can be cleared (and reflected); we know we can clear mappers and recreate them... and we've tried several of these things to no avail. The example given below, however, should surely work without too much clearing?

We always get the first test to work, but some subsequent tests tend to fail. Of course, there are a large number of scenarios - test classes with relationships to 'real' classes, inheritance, etc. Those scenarios can complicate things. But here is the simplest basic example we'd like to get working:

from pkg_resources import require

require('sqlalchemy==0.7.7')

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import *

connection_uri = 'sqlite:///:memory:'

engine = create_engine(connection_uri, echo=True)
Session = sessionmaker(autoflush=True, autocommit=False, bind=engine)
Base = declarative_base()
session = Session()

 
class RealClass(Base):
    __tablename__ = 'normal_class'
    id = Column(Integer, primary_key=True)
    attr = Column(String(50))

Base.metadata.create_all(bind=session.connection())  # Creating the real classes (once per run)
session.commit()  # Committed also...


def test():
    class TestClass(Base):
        __tablename__ = 'simple_test_class'
        attr = Column(String(50))
        id = Column(Integer, primary_key=True)

        __table_args__ = {
            'extend_existing': True
            }

    Base.metadata.create_all(bind=session.connection())  # To create the test classes

    session.add(RealClass(attr='sdf'))
    session.flush()
    session.add(TestClass(attr='sdf'))
    session.flush()

    print session.query(RealClass).all()
    print session.query(TestClass).all()


try:
    test()  # Works
finally:
    session.rollback()

try:
    test()  # Breaks the second time round
finally:
    session.rollback()


Regards
- Iwan




Michael Bayer

Jun 26, 2012, 1:13:27 PM
to sqlal...@googlegroups.com

I do have some simple solutions to this issue, however since I suspect there may be more complexity to your use case that isn't illustrated in your example, I'll give you the whole background.

When you deal with a declarative Base, there are two registries at work.  One is the MetaData object associated with it, which has a dictionary called "tables" stuck on it.   The other is the "_decl_class_registry", which is a dictionary of class names to mapped classes.   

The purpose of these registries is primarily to facilitate configuration based on string names.   For example, if you say ForeignKey("mytable.id"), that string "mytable" needs to be resolved to a Table object; if you say relationship("SomeClass"), the string "SomeClass" needs to be resolved to a class object.   This is the biggest reason Base and MetaData need to follow the registry pattern.  

The MetaData object in particular also has several other purposes for this registry.  It serves as a "singleton" registry so that there is only one instance of a particular Table for a particular name.  This is important because SQLAlchemy's expression language uses the identity of a particular clause element to indicate its lexical identity within a statement.  If you had two distinct Table objects, both with the same name, SQLAlchemy would treat them as distinct "FROM" clauses, among other things, leading to incorrect SQL.  So the singleton registry here encourages that multiple Table objects of the same name aren't created.
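The singleton behavior of that registry can be sketched generically. This is a minimal illustration of the get-or-create registry pattern being described, not SQLAlchemy's actual MetaData code; the `Table` and `TableRegistry` classes here are hypothetical stand-ins:

```python
class Table:
    """Stand-in for a named schema object."""
    def __init__(self, name):
        self.name = name


class TableRegistry:
    """Get-or-create registry: exactly one instance per name,
    in the spirit of MetaData.tables."""
    def __init__(self):
        self.tables = {}

    def get(self, name):
        # Return the existing Table for this name, creating it only once.
        if name not in self.tables:
            self.tables[name] = Table(name)
        return self.tables[name]


metadata = TableRegistry()
t1 = metadata.get("mytable")
t2 = metadata.get("mytable")
assert t1 is t2  # same identity -> treated as the same FROM clause
```

Because `get()` always hands back the identical object for a given name, code that compares clause elements by identity sees one "mytable", never two.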

MetaData is also the "collection of record" for a set of Table objects, such as when you say "metadata.reflect()", MetaData is where the Table objects are stored.  If they weren't there, they'd fall out of scope (unless metadata.reflect() were modified to actually return the collection of Table objects).

So MetaData's registry is strong-referencing, meaning entries do not get removed when they are fully dereferenced elsewhere.  A Table can be removed from the MetaData using remove(), however this has caveats which I'll detail in a moment.

The "_decl_class_registry" on declarative Base is also not a weak-referencing dictionary - so when you make a subclass of Base and then lose all references to that class, it is still present in _decl_class_registry.  To that extent, Base does not automatically manage temporary classes.  It is possible to replace _decl_class_registry with a WeakValueDictionary, which you could pass into declarative_base() via the "class_registry" argument.  I would say that perhaps we should make _decl_class_registry weak-referencing by default in any case, as it is only used at configuration time and so is not a performance concern.  I've created #2524 to think about that, and it's likely this will be in 0.8 as it seems to work fine.
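The effect of a weak-referencing class registry can be demonstrated with the standard library alone. This is a sketch of the pattern, not declarative's actual registry; the `register()` helper is a hypothetical stand-in for what the Base does at class-creation time:

```python
import gc
import weakref

# A name -> class registry that does NOT keep its values alive,
# analogous to passing class_registry=WeakValueDictionary() to declarative_base().
registry = weakref.WeakValueDictionary()


def register(cls):
    """Record the class under its name, as declarative does at class creation."""
    registry[cls.__name__] = cls
    return cls


@register
class TempClass:
    pass


assert "TempClass" in registry

# Drop the only strong reference to the temporary class; once the garbage
# collector runs, the registry entry silently disappears with it.
del TempClass
gc.collect()
assert "TempClass" not in registry
```

With a plain dict the entry would linger forever, which is exactly the "Base does not automatically manage temporary classes" behavior described above.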

Outside of using declarative with the ORM, there is also a registry of underlying mapper() objects to their mapped classes, which is the original "classical" system of mapping classes to tables.  This registry is in fact weak-referencing.  So the original intent behind the concept of "mapping a class to a table" did include that this is a one-way setup, where when the class falls out of scope, its mapping would also (with similar caveats to those of Tables).  SQLAlchemy's test suite creates new classes, maps them, and then just forgets about them regularly.  This is supported.

So what is the big caveat of removing individual tables, individual mappers from a full configuration of tables and classes ?  The answer is that the table/class/mapper in question may be referred to by other table/class/mappers - and there is no functionality in place to surgically sever these connections.   The connections that may be established to a particular mapped class are:

1. the Table may be referred to by the ForeignKey(Constraint) of one or more remote tables.
2. the Table may be referred to by any number of mappings.
3. the mapper() may be referred to by the relationship()/backref of one or more remote mappers.
4. the mapper() may be a superclass of another mapping, in which case there's obviously a high degree of dependency, or
5. the mapper() may be a subclass of another mapping, in which case there are still connections - a superclass mapping needs to be aware of all its descendants in many cases, such as when organizing a flush as well as when using "polymorphic" queries.   These linkages are not weak-referencing at the moment, though perhaps they could be.

So when any of #1-#5 above are present, simply dropping a class from _decl_class_registry or removing a table from MetaData via remove() is not enough.  The mapper/class/Table is still referred to by other linkages.  The nature of these linkages is obviously very involved, and it's way out of scope for SQLAlchemy to be able to surgically remove individual linkages from a graph of mapped classes and tables.   This is why SQLAlchemy in general does not support the explicit removal of individual mappings and greatly downplays metadata.remove() - the problem *may* be much more complicated.

The good news about #1-#5 is that none of these scenarios will be present *if you didn't create this scenario yourself* - SQLAlchemy itself never does any of these things without being told.  If you made a Table that nobody refers to by foreign key, and you're going to drop all the mappings that point to it, then that's fine - remove it and it's gone.  If you made a mapper() on a class that is not referred to by any relationship(), and is not present within a joined/single table inheritance hierarchy, also fine - when you lose all references to that class, the mapper() is also dropped, and if you remove the class from Base._decl_class_registry, it is also gone.

What follows from the assumption that none of #1-#5 are present, is that since it's established that your mappers/classes/tables aren't referred to by other mappers/classes/tables that you aren't dropping, there is no need to even use the same MetaData object and Base as registries.   I'll illustrate using this below.

However, you stated that "You might want to inherit from a class that is part of production code".   Assuming this "production class" is mapped, that changes everything.   When you make a subclass mapping of an existing mapper, you are making not just a subclass but a new mapper() as well, and you're really changing the configuration significantly, including that your class is going to register itself with the base.    A subclass mapping is strongly referenced by its base mapper at the moment along many paths, as the mapper() is big on memoizing things so that it doesn't have to recalculate expensive things more than once.  I've added http://www.sqlalchemy.org/trac/ticket/2526 to think about this, but it would require lots of weak references, and weak references are expensive to use.

The rationale here seems to be that you'd like to override methods on your class to do different things for tests, and I'd submit that this is an architectural problem where your classes are not testable.   I'd seek here to use compositional patterns, rather than inheriting patterns, in order to do things like injecting mock data and objects into production objects.

Anyway, some more background on your example.    There's actually a bug here, which is that extend_existing in conjunction with a Table that has already been used is causing a failure in locating the primary key after an INSERT, which is because there's a memoized attribute _autoincrement_column that isn't being reset (see http://www.sqlalchemy.org/trac/ticket/2525).

It was not anticipated that extend_existing would be used for a Table that's already been used - the purpose of extend_existing is so that you can create a Table with some columns while at the same time reflecting other columns from the database.   Meaning, extend_existing is only to help create a fully formed Table, not to re-build in place a Table that is already completed.  But in this case you obviously just saw the error message and did what it said.   Usually SQLAlchemy's approach of "suggesting fixes" helps more than not but this would be an exception to that.

Anyway, the test as is can work fine, regardless of this error, by getting rid of the unnecessary extend_existing and just removing that table at the end:

    Base.metadata.remove(TestClass.__table__)

simple enough.  But as I mentioned before, if your classes/tables have no dependencies, just use a new MetaData.   Using a new Base makes it easier, but let's say you want to share the Base - you might have noticed your test also emits this warning "The classname 'TestClass' is already in the registry of this declarative base", so we'll use a WeakValueDictionary to nix that.   We'll use two MetaData objects as well:

from sqlalchemy import *
from sqlalchemy.orm import *
from sqlalchemy.ext.declarative import *
import weakref

connection_uri = 'sqlite:///:memory:'

engine = create_engine(connection_uri, echo=True)
Session = sessionmaker(autoflush=True, autocommit=False, bind=engine)
Base = declarative_base(class_registry=weakref.WeakValueDictionary())
session = Session()

 
class RealClass(Base):
    __tablename__ = 'normal_class'
    id = Column(Integer, primary_key=True)
    attr =  Column(String(50))

Base.metadata.create_all(bind=session.connection())   # Creating the real classes (once per run)
# there is no need to "commit" here.  SQLA DDL does this automatically.

def test():
    class TestBase(Base):
        # abstract base class with its own MetaData
        metadata = MetaData()
        __abstract__ = True

    class TestClass(TestBase):
        __tablename__ = 'simple_test_class'
        attr = Column(String(50))
        id = Column(Integer, primary_key=True)

    TestBase.metadata.create_all(bind=session.connection())

    session.add(RealClass(attr='sdf'))
    session.flush()
    session.add(TestClass(attr='sdf'))
    session.flush()

    print session.query(RealClass).all()
    print session.query(TestClass).all()

for i in xrange(3):
    try:
        test()
    finally:
        session.rollback()










--
You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
To view this discussion on the web visit https://groups.google.com/d/msg/sqlalchemy/-/snYsPmsRjOEJ.
To post to this group, send email to sqlal...@googlegroups.com.
To unsubscribe from this group, send email to sqlalchemy+...@googlegroups.com.
For more options, visit this group at http://groups.google.com/group/sqlalchemy?hl=en.

Iwan Vosloo

Jun 27, 2012, 4:17:07 AM
to sqlal...@googlegroups.com
Michael,

Thank you, that explanation of the background really helps.

I have two simple questions (while the rest of what you said is still being digested):


On Tuesday, June 26, 2012 7:13:27 PM UTC+2, Michael Bayer wrote:
However, you stated that "You might want to inherit from a class that is part of production code".   Assuming this "production class" is mapped, that changes everything.   When you make a subclass mapping of an existing mapper, you are making not just a subclass but a new mapper() as well, and you're really changing the configuration significantly, including that your class is going to register itself with the base.    A subclass mapping is strongly referenced by its base mapper at the moment along many paths, as the mapper() is big on memoizing things so that it doesn't have to recalculate expensive things more than once.  I've added http://www.sqlalchemy.org/trac/ticket/2526 to think about this, but it would require lots of weak references, and weak references are expensive to use.

Without knowing the intimate details, I can appreciate the nature of the problem you'd have to solve there.

Isn't it also possible to "reset" the whole configuration of tables/mappers/metadata and other registries entirely, by clearing all of them out completely, and rebuilding it all from scratch?  This could happen both before and after a test run. 

I was thinking along the lines of metadata.clear(); clear_mappers() which would clear the metadata and the registry of mappers. But I am not sure how to clear all classes that inherit from Base, or how to reconstruct everything properly?

Base.metadata.create_all(bind=session.connection())   # Creating the real classes (once per run)
# there is no need to "commit" here.  SQLA DDL does this automatically.


I understood that create_all(bind=engine) would result in an automatic commit here, but that using create_all(bind=session.connection()) does not, deferring the responsibility of transaction handling to whoever is in control of the current transaction. Is this understanding then incorrect?

Regards
- Iwan

Michael Bayer

Jun 27, 2012, 7:59:38 AM
to sqlal...@googlegroups.com
On Jun 27, 2012, at 4:17 AM, Iwan Vosloo wrote:


Without knowing the intimate details, I can appreciate the nature of the problem you'd have to solve there.

Isn't it also possible to "reset" the whole configuration of tables/mappers/metadata and other registries entirely, by clearing all of them out completely, and rebuilding it all from scratch?  This could happen both before and after a test run. 

Absolutely.   If you're using Declarative, the easiest way to do this is to just throw away your Base class and use a new one.    The clear_mappers() call will also remove all mappings, but you'd still have those Base classes with __table__ attributes stuck on them, and also clear_mappers() removes all mappings throughout the entire application space, whereas discarding a particular Base allows individual sets of mappers to be discarded without affecting other areas of the application.    clear_mappers() is really more aligned towards SQLAlchemy's "classical" mapping system for this reason and is mostly for the benefit of SQLAlchemy's own unit tests.



I was thinking along the lines of metadata.clear(); clear_mappers() which would clear the metadata and the registry of mappers. But I am not sure how to clear all classes that inherit from Base, or how to reconstruct everything properly?

This approach doesn't really work, versus just creating everything from Base again.   If you were to say clear_mappers() and metadata.clear(), the declarative classes that you still have around wouldn't really be useful.  For example if you had class MyClass(Base) and it had something like "id = Column(Integer, primary_key=True)" on it, that Column object is moved to be part of the Table object for MyClass at class creation time - "id" is replaced by an instrumented attribute that maps the "id" column internally.   Calling metadata.clear() and clear_mappers(), you'd see that your MyClass object has nothing on it anymore - the "id" attribute would be gone.    
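The way a Column gets moved off the class at creation time can be mimicked in plain Python. This is a much-simplified sketch of the declarative mechanics, not SQLAlchemy's implementation; `DeclBase`, this `Column`, and the module-level `tables` dict are hypothetical stand-ins:

```python
class Column:
    """Stand-in for a declarative column definition."""
    def __init__(self, name=None):
        self.name = name


tables = {}  # stand-in for MetaData.tables


class DeclBase:
    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # At class-creation time, move Column objects off the class and
        # into a shared per-class table record, as declarative moves
        # Column objects into the Table.
        cols = {k: v for k, v in vars(cls).items() if isinstance(v, Column)}
        tables[cls.__name__] = cols
        for k in cols:
            delattr(cls, k)  # the class itself no longer owns the Column


class MyClass(DeclBase):
    id = Column()


assert "id" not in vars(MyClass)  # the Column has been moved off the class
assert "id" in tables["MyClass"]  # it now lives only in the table record

tables.clear()  # analogous to metadata.clear(): the definition is gone for good
```

Once the shared record is cleared, nothing about "id" can be recovered from `MyClass` itself, which mirrors why `metadata.clear(); clear_mappers()` leaves declarative classes unusable.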

In "classical" SQLAlchemy that used explicit mapper() and Table calls, while day-to-day configuration was much more verbose, it at least had the behavior that you could re-run mapper() against all your classes if you had organized things that way.    With declarative it's a bit more awkward as we typically like to have class definitions at the module level, and not as the result of callables that can be called again.    

There are other ways to slice this; at the end of the day, if you are erasing all Table objects, the key is that you can produce a function that will re-create them.   If you did something like this:

class MyClass(Base):
    __table__ = my_table_registry.get('mytable')

then supposing "my_table_registry" were some object that could be re-instantiated to produce all the Table objects again, you could say something like mapper(MyClass, my_table_registry.get('mytable'), **MyClass.__mapper_args__) to re-establish MyClass (or maybe you'd want to use declarative.instrument_declarative(MyClass)).  Basically, you need to get at those Table objects again.
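Such a re-creatable registry might be sketched like this. The `TableRegistry` class and its factory dict are hypothetical illustrations of the idea, not an existing SQLAlchemy API:

```python
class TableRegistry:
    """Registry built from factory callables, so that the whole set of
    table objects can be rebuilt from scratch at any time by simply
    constructing a fresh registry."""
    def __init__(self, factories):
        self._factories = factories
        self._tables = {}

    def get(self, name):
        # Lazily build each table object the first time it is asked for.
        if name not in self._tables:
            self._tables[name] = self._factories[name]()
        return self._tables[name]


def make_mytable():
    # Factory standing in for a Table('mytable', metadata, ...) call.
    return {"name": "mytable"}


factories = {"mytable": make_mytable}

reg1 = TableRegistry(factories)
reg2 = TableRegistry(factories)   # re-instantiated: brand-new table objects
assert reg1.get("mytable") is reg1.get("mytable")      # singleton per registry
assert reg1.get("mytable") is not reg2.get("mytable")  # fresh across registries
```

The point is that the factories, not the table objects themselves, are the durable source of truth, so after wiping everything a new registry can hand out fresh objects to re-map against.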

All of this points towards not needing to recreate the entirety of the mappings.   Really, if you just need subclassing where the subclasses can be thrown away, the patch + usage in www.sqlalchemy.org/trac/ticket/2526 achieves this.   In fact, even without the patch, if you had a subclass of a mapped class and wanted to discard it, you could say:

MyBaseClass.__mapper__._inheriting_mappers.remove(MySubClass.__mapper__)
del MyBaseClass.__mapper__.polymorphic_map["mysubclass"]
MyBaseClass.__mapper__._expire_memoizations()

It's that _expire_memoizations() part that is difficult to make automatic, as it refers to an array of attributes on the mapper() object that are lazily evaluated and then memoized.   If you aren't using polymorphic loading for these mappers, then even the above steps might not be needed - the additional records in these collections might just be ignorable, and would only be present while tests are running, not while the actual app runs.
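The memoize-and-expire pattern being described can be sketched in isolation. This is a generic illustration, not the actual mapper() code; the `Mapper` class and its attribute names here are hypothetical:

```python
class Mapper:
    """Lazily computed, memoized configuration with an explicit expiry
    hook, in the spirit of mapper()._expire_memoizations()."""

    _memoized_names = ("_sorted_tables",)

    def __init__(self, tables):
        self.tables = tables

    @property
    def sorted_tables(self):
        # Compute once, then cache the result on the instance.
        if not hasattr(self, "_sorted_tables"):
            self._sorted_tables = sorted(self.tables)
        return self._sorted_tables

    def _expire_memoizations(self):
        # Throw away every cached value so it is recomputed on next access.
        for name in self._memoized_names:
            self.__dict__.pop(name, None)


m = Mapper(["b", "a"])
assert m.sorted_tables == ["a", "b"]

m.tables.append("0")                   # the configuration changed...
assert m.sorted_tables == ["a", "b"]   # ...but the stale memoized value persists

m._expire_memoizations()
assert m.sorted_tables == ["0", "a", "b"]  # recomputed after expiry
```

This is why removing a sub-mapper without expiring memoizations leaves stale state behind: the caches never notice that the configuration underneath them changed.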


Base.metadata.create_all(bind=session.connection())   # Creating the real classes (once per run)
# there is no need to "commit" here.  SQLA DDL does this automatically.


I understood that create_all(bind=engine) would result in an automatic commit here, but that using create_all(bind=session.connection()) does not, deferring the responsibility of transaction handling to whoever is in control of the current transaction. Is this understanding then incorrect?

Ah, you're right - I misread that, sorry.


Iwan Vosloo

Jun 28, 2012, 8:56:58 AM
to sqlal...@googlegroups.com
On Wednesday, June 27, 2012 1:59:38 PM UTC+2, Michael Bayer wrote:

All of this points towards not needing to recreate the entirety of mappings.   Really if you just need subclassing where the subclasses can be thrown away, the patch + usage in www.sqlalchemy.org/trac/ticket/2526 achieves this.   in fact, even without the patch, if you had a subclass of a mapped class and wanted to discard it, you could say:

MyBaseClass.__mapper__._inheriting_mappers.remove(MySubClass.__mapper__)
del MyBaseClass.__mapper__.polymorphic_map["mysubclass"]
MyBaseClass.__mapper__._expire_memoizations()

It's that _expire_memoizations() part that is difficult to make automatic, as it refers to an array of attributes on the mapper() object that are lazily evaluated and then memoized.   If you aren't using polymorphic loading for these mappers, then even the above steps might not be needed - the additional records in these collections might just be ignorable, and would only be present while tests are running, not while the actual app runs.


Ok, I understand why you favoured that solution. We have implemented what you said above and it works fine. Will keep an eye on #2526 to see what you decide about that...

Thank you
-i