Sharing metadata between alembic and SQLAlchemy


Ian McCullough

Aug 1, 2015, 6:59:41 PM
to sqlalchemy-alembic
I've been getting up to speed with SQLAlchemy and alembic and having a great time of it. This is some great stuff!

One thing that's been confounding me is this: my Alembic schema revisions are 'authoritative' for my metadata (i.e. I've started from scratch using alembic to build the schema from nothing), yet it doesn't appear that the metadata that exists in my alembic scripts can be leveraged by the models in my main app. So far, I've effectively been maintaining two versions of the metadata: one in the form of the "flattened projection" of my alembic schema revision scripts, and another in my application model scripts. I understand that there are facilities to regenerate the metadata from the RDBMS on the application side (reflection), but that seems potentially "lossy", or at least subject to the whims of whatever DBAPI provider I'm using.

Is there a way to pull this flattened projection of metadata out of alembic and into my app's models at runtime? (i.e. import alembic, read the version from the live DB, then build the metadata by playing the upgrade scripts forward, not against the database, but against a metadata instance?) It seems like a fool's errand to try to keep my app models in sync with the flattened projection of the schema revisions by hand. My assumption is that I'm missing something super-obvious here.

Thanks,
Ian


Mike Bayer

Aug 3, 2015, 9:36:19 AM
to sqlalchem...@googlegroups.com
There's a lot to say on this issue. The idea of the migrations themselves driving the metadata would be nice, and I think the recent rewrite of Django South does something somewhat analogous to this.

Also, the reorganization of Alembic operations into objects that you can hang any number of operations upon (due in Alembic 0.8) is something we could leverage to make this kind of thing happen.

However, where it gets thorny is that neither Alembic migrations nor SQLAlchemy metadata are supersets of each other. That is, there are many things in SQLAlchemy metadata that currently have no formal representation in Alembic operations; the primary example is Python-side column defaults, which have no relevance to emitting ALTER statements. On the Alembic side, a set of migrations that takes care to use only the official Alembic op.* operations, and never "execute()", is the only way to guarantee that each change is potentially representable in SQLAlchemy metadata. A migration that emits op.execute("ALTER TABLE foo ADD COLUMN xyz") wouldn't work here, and a migration with lots of conditionals and runtime logic might not be useful in this way either.
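
To make that concrete (table and column names here are made up): an upgrade step built from formal op.* calls carries enough structure that it could, in principle, be replayed against a MetaData object, while a raw execute() is opaque:

    import sqlalchemy as sa
    from alembic import op

    def upgrade():
        # Representable: table name, column name and type are explicit
        # objects that a tool could map back onto SQLAlchemy metadata.
        op.add_column("foo", sa.Column("xyz", sa.Integer(), nullable=True))

        # Opaque: the same change as raw SQL.  Alembic hands the string
        # straight to the database, so nothing is recoverable from it.
        # op.execute("ALTER TABLE foo ADD COLUMN xyz INTEGER")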

SQLAlchemy Table and Column objects also do not support removal from their parents. That would be necessary in order to represent "drop" mutations as targeted at a SQLAlchemy metadata structure. It's something that could be implemented, but SQLA has always made a point of not getting into this, because it's very complicated to handle "cascades" of dependent objects, whether that means raising an error or mimicking the other functionality of a real "drop" operation.

Finally, Alembic up until now has been organized around the opposite workflow: the MetaData is the authoritative model, and migrations are generated using tools like autogenerate to minimize how much they need to be coded by hand (and there is of course no issue of maintaining the same code in two places, because migration scripts are a fixed point in time once created). This model is practical for many reasons: all of the reasons above, plus it is compatible with applications that weren't using migrations up to a point, or were using some other system, plus it allows easy pruning of old migrations.
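
A rough sketch of that conventional setup in env.py (the import path here is hypothetical):

    # env.py (excerpt): autogenerate diffs the database against this
    # MetaData and writes the difference into a new revision script.
    from myapp.models import Base  # hypothetical import path

    target_metadata = Base.metadata

    # then, from the shell:
    #   alembic revision --autogenerate -m "add xyz column to foo"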







Ian McCullough

Aug 8, 2015, 3:50:41 PM
to sqlalchemy-alembic

So I've continued thinking about this a bit, mainly because trying to keep the database schema and object model in sync is effectively a kind of "double entry" system, and I have been burned so. many. times. in the past by (N>1)-entry systems where the different "truths" get out of sync, leading to subtle but week-ruining bugs. In the abstract, my goal is to have a single source of truth and to have that truth 'flow' through the system. (In my case, I'm trying to use the alembic version scripts, but I really don't care what form the truth takes as long as there's only one of them.)

Michael said: 
However, where it gets thorny is that neither Alembic migrations nor SQLAlchemy metadata are supersets of each other

My approach here has been to "change" this by leveraging the ability to store arbitrary, ancillary data in the `info` property of `SchemaItem`, capturing any/all additional information necessary to re-create the models (i.e. making alembic migrations a superset of the SQLAlchemy metadata that I need, for my specific purposes). Then, once I've captured that metadata, I push it up into the database (in my case, using Postgres's COMMENT feature, but in other DBs it could just be an arbitrary table), where it can be read by another, build-time tool that generates my models.py file from the database. Ideally, there would be a way to cut out the database so that you could just run the alembic scripts and get the appropriate metadata out, but going through the database is an acceptable detour for me (especially now that I've wrapped up the necessary fixtures to spool up an ephemeral, local Postgres installation). Considering that you could conceivably even ship pickled Python object graphs to this kind of "sidecar" storage, I suspect there is probably enough flexibility to capture all the SQLAlchemy metadata, if you carried this to its logical conclusion.
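
A minimal sketch of the idea (the `info` keys, table and column names are all hypothetical):

    import json

    import sqlalchemy as sa
    from alembic import op

    # Hypothetical: record model-only facts (things DDL can't express)
    # in the column's open-ended `info` dict.
    email = sa.Column(
        "email",
        sa.String(255),
        nullable=False,
        info={"python_default": "make_default_email"},
    )

    def upgrade():
        op.add_column("users", email)
        # Sidecar storage: serialize the info dict into a Postgres
        # COMMENT so a build-time tool can read it back out later.
        payload = json.dumps(email.info).replace("'", "''")
        op.execute("COMMENT ON COLUMN users.email IS '%s'" % payload)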

I realize this approach is probably too "restrictive" to be useful in the general case, but I figured I'd share my thoughts and hacks anyway. Conceptually, I think the best thing, long term, would be for alembic to be able to handle both DB schema migration and object model migration, and to serve as a single source of truth for systems willing to operate completely within alembic's purview. Based on your comments about mine being an unusual workflow, I assume many folks won't want to work this way, but for those who consider single sources of truth to be critical, I think it could be a win.

If anyone is interested in talking more about this, let me know.

Regards,
Ian

Mike Bayer

Aug 8, 2015, 4:12:13 PM
to sqlalchem...@googlegroups.com


On 8/8/15 3:50 PM, Ian McCullough wrote:

So I've continued thinking about this a bit, mainly because trying to keep the database schema and object model in sync is effectively a kind of "double entry" system, and I have been burned so. many. times. in the past by (N>1)-entry systems where the different "truths" get out of sync, leading to subtle but week-ruining bugs.

Of course, this is something we all share. I only ask that you consider that the Alembic migration files, and other similar concepts, might be thought of differently, as they are intended to be an *immutable* truth; that is, a particular migration file can never be incorrect, because it represents a point in time that has already passed. But I guess that's why you favor the migration files as that source of truth.




A technique you might want to consider, which we use in OpenStack, is that as part of our CI suite we actually run the migrations fully and then diff the schema as developed against the metadata. This check is available in Alembic via the compare_metadata function: http://alembic.readthedocs.org/en/rel_0_7/api.html#alembic.autogenerate.compare_metadata. Basically you want it to return nothing, meaning nothing has changed.
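
For reference, a minimal version of that check (the database URL and model import are placeholders):

    from pprint import pprint

    from sqlalchemy import create_engine
    from alembic.migration import MigrationContext
    from alembic.autogenerate import compare_metadata

    from myapp.models import metadata as target_metadata  # hypothetical

    # Connect to a database that has had all migrations applied, then
    # diff it against the metadata declared by the application.
    engine = create_engine("postgresql://localhost/test_db")  # placeholder
    with engine.connect() as conn:
        context = MigrationContext.configure(conn)
        diff = compare_metadata(context, target_metadata)

    # An empty diff means the migrations and the models agree.
    if diff:
        pprint(diff)
        raise AssertionError("schema drift between migrations and models")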

This allows the alembic migrations and the current metadata to remain separate, but they are checked for accuracy against each other as part of the test suite. It is of course only as accurate as SQLAlchemy reflection goes (so things like CHECK constraints, triggers, etc. that aren't currently reflected aren't included, unless you augment the comparison with them), but as far as those aspects of the model that make a difference from the Python side, e.g. names of tables and columns and datatypes, those would all be affirmed.






Ian McCullough

Aug 8, 2015, 6:12:04 PM
to sqlalchem...@googlegroups.com
Thanks for the tip! After playing with it for a bit, I think `compare_metadata` might be just enough to satisfy my paranoia for the time being. (And it's a lot less work!)

Thanks again!
Ian

