Using SA to move data between databases


Paul Johnston

Jul 24, 2007, 6:59:47 AM7/24/07
to sqlal...@googlegroups.com
Hi,

I am migrating an Access database to MSSQL server. I have coded up basic
Access support for SQLAlchemy, which I'll be committing to the 0.4 branch
shortly.

Using autocode, I now have a set of SQLAlchemy table definitions. My
question is: what's a good way to move all the data across?

Thanks for any help,

Paul

Christophe de VIENNE

Jul 24, 2007, 9:14:52 AM7/24/07
to sqlal...@googlegroups.com
Hi,

I'm in the same process, and very interested in the answer!

One idea I had is to define a universal dump format (based, for
example, on pytables), which could be used to back up and restore data
from/to various databases.
If this approach is a good one and a good implementation is proposed, it could
become an interesting addon to SA.

Regards,

Christophe


michael

Jul 24, 2007, 9:38:38 AM7/24/07
to sqlal...@googlegroups.com

With all due respect for the brilliance of SQLAlchemy.. it is not an
operating system and not a database.

Maybe I am missing the point here, but, in the two hours it took to get
a reply to the OP, one could have exported from one db (to csv) and
imported to the other one. Another alternative is to actually use the db
functionality. MSAccess and MSSQL both start with 'MS'. If I am not
mistaken, those are interoperable. One can set up a 'link' and
transfer the data, no? It has been years, but I remember doing that.

Moving data in/out of disparate data sources is a pretty common data
warehouse process. And if they are large datasets, native 'bulk'
transfers are fastest. All of which can be automated... without
intervention from the application layer. (was that blasphemy?)


--

michael


Christophe de VIENNE

Jul 24, 2007, 11:52:55 AM7/24/07
to sqlal...@googlegroups.com
Hi


I see no blasphemy, but that does not exactly address my personal
issue (which is not exactly the same as Paul's, it seems).
In a few months I will have clients running my software on mysql,
others on mssql. I want to have a common backup format, so I can
restore any backup on any supported db, and all that should be doable
by a "Toto User" (toto=dummy).
Having it in the application layer allows me to do that. And since I
hate to re-do things, my approach will most probably be to use SA to dump
and restore the data, even if it's a bit slow (the databases are not
very big), and it will always be possible to optimize the process by
doing db-specific operations.
The pytables format looks attractive for this use because it's fast,
scalable, compresses the data, and has a generic viewer.

My experience with SA is still a bit light, and I might say stupid
things without realizing it, but that's the general idea.

My two cents :-)

Regards,

Christophe

michael

Jul 24, 2007, 12:41:09 PM7/24/07
to sqlal...@googlegroups.com

Hello Christophe,

If I am reading your intent correctly (and forgive me if I am wrong), you
would like to have one backup/restore routine for the same schema but on a
variety of databases. And, you want the user to be responsible for
doing both actions.

Your references to "universal dump format" and "common backup format"
point squarely at a flat file, which can be
[tab|space|comma|pipe|etc] delimited. Since you said that your
databases will be small, text is truly universal. It does not require
python or any scripting language and can easily be viewed in any text
editor. As a side note, I have found pipe delimited to be the least
troublesome when moving between databases that have user-input data.

The first thing that comes to mind is that there should probably
already be an automated backup scheduled. Users will forget and when a
restore is needed, they will want fresh data.

Each database has its own export-to-text command and its own
import-from-text command, with related syntax. This can be triggered
with cron on all unixes; on Windows, MSSQL has its own scheduler. (Of
course, one could also have a button do the same thing, in *addition* to
routine backups.)

Are you asking for features in SA to handle scheduled backups and
user-initiated restores?

--

michael


Christophe de VIENNE

Jul 24, 2007, 1:26:19 PM7/24/07
to sqlal...@googlegroups.com
Hi michael,

2007/7/24, michael <li...@genoverly.net>:


> Hello Christophe,
>
> If I am reading your intent correctly (and forgive me if I am wrong), you would
> like to have one backup/restore routine for the same schema but on a
> variety of databases.

Yes

> And, you want the user to be responsible for
> doing both actions.

More precisely, I want to be able to provide the user a tool which
allows him to do it with minimal knowledge.

>
> Your references to "universal dump format" and "common backup format"
> point squarely at a flat file; which can sometimes be
> [tab|space|comma|pipe|etc] delimited. Since you said that your
> databases will be small, text is truly universal. It does not require
> python or any scripting language and can easily be viewable in any text
> editor. As a side note, I have found pipe delimited to be the least
> troublesome when moving between databases that have user-input data.

Agree on the flat file. A zipped archive of flat files would do the
trick, I guess.

What I find interesting with pytables is the high accessibility of the
data, even in a compressed file. This would make it easier to
implement a partial restore. But this is just an idea I'm throwing
out; maybe I'm totally wrong about what pytables would bring.
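As a rough sketch of what such an archive could look like (independent of SA; `dump_tables`/`load_tables` and the rows-as-lists-of-strings layout are invented here for illustration, not an existing format):

```python
import csv
import io
import zipfile

def dump_tables(tables, path):
    """Write each table (name -> (header, rows)) as a pipe-delimited
    text file inside a single compressed zip archive."""
    with zipfile.ZipFile(path, 'w', zipfile.ZIP_DEFLATED) as zf:
        for name, (header, rows) in tables.items():
            buf = io.StringIO()
            writer = csv.writer(buf, delimiter='|')
            writer.writerow(header)
            writer.writerows(rows)
            zf.writestr(name + '.txt', buf.getvalue())

def load_tables(path):
    """Read the archive back into the same name -> (header, rows) shape."""
    tables = {}
    with zipfile.ZipFile(path) as zf:
        for info in zf.infolist():
            text = zf.read(info.filename).decode('utf-8')
            reader = csv.reader(io.StringIO(text), delimiter='|')
            header = next(reader)
            rows = [row for row in reader]
            tables[info.filename[:-len('.txt')]] = (header, rows)
    return tables
```

Individual members of the zip stay directly readable with any text editor or unzip tool, which is part of the appeal of flat files over an opaque binary dump.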

> The first thing that comes to mind is that there should probably
> already be an automated backup scheduled. Users will forget and when a
> restore is needed, they will want fresh data.
>
> Each database has their own export-to-text command and each has their
> own import-from-text command; and related syntax. This can be triggered
> with cron for all unixes; On Windows, MSSQL has its own scheduler. (Of
> course, one could also have a button do the same thing, in *addition* to
> routine backups.)

I agree, but that's almost orthogonal to what I have in mind. What
you suggest is what should be done by a decent admin or consultant
who is installing the software.

I have two real-life cases which are not covered by this approach:

1) My 'packaging team' prepares demonstration databases as well as
ready-to-start ones. I want those databases to be accessible on our
website, easily choosable (ideally without having to know which
database server is used), and easily usable from within my app.

2) A user calls the support team, and the problem seems to come from
bizarre data in the database. I want the user (if he has the rights)
to be able to save his data, attach it to an email and send it to
the support team (or post it on a support form on the website).

> Are you asking for features in SA to handle scheduled backups and
> user initiated restores?

No. What I'd like to do is provide facilities for dumping and
restoring a complete db. And I'd like this to work if two dbs with
different engines have (almost) the same schema.
This is a bit like migrate, which provides an engine-agnostic
api for schema modifications.

One could easily build, e.g., a sadump tool on top of this, or
a database manager tool for a particular application (this is what I
intend to do).

All this is 'just a thought' for now, but motivated by an actual future need.

Christophe

Paul Johnston

Aug 5, 2007, 7:09:45 PM8/5/07
to sqlal...@googlegroups.com
Hi,

> I'm in the same process, and very interested in the answer!

I've found what I think is the best solution, and it seems quite
obvious in hindsight: define the table, do a select on the old
database and an insert on the new database. This leverages all
SQLAlchemy's cleverness in converting types, etc. and keeps the ORM out
of the picture. The code I'm using is:

model = __import__(sys.argv[1])
if sys.argv[2] == 'copy':
    seng = create_engine(sys.argv[3])
    deng = create_engine(sys.argv[4])
    for tbl in model.metadata.table_iterator():
        deng.execute(tbl.insert(),
                     [dict(x) for x in seng.execute(tbl.select())])

All it relies on is that you call your MetaData "metadata". At the
moment, it chokes on names that contain a dash (or maybe it's ones that
use key=), but I'm sure I can work around that. When I'm done, I'll put
this up as a recipe.
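One caveat with the loop above: the list comprehension pulls every source row into memory before the insert runs. A small batching helper along these lines (my own sketch, not part of SA) would bound memory while keeping the same select/insert idea:

```python
def chunks(rows, size=1000):
    """Yield the items of any iterable in lists of at most `size`,
    so each batch can go to one executemany-style insert."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:   # emit the final, possibly short, batch
        yield batch
```

The loop body would then become something like `for batch in chunks(seng.execute(tbl.select())): deng.execute(tbl.insert(), [dict(x) for x in batch])`, assuming the same old-API names used in the thread.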

Paul

sdo...@sistechnology.com

Aug 7, 2007, 2:12:12 PM8/7/07
to sqlal...@googlegroups.com

heh, adding this raw-data-copy to the autoload.py
$ python autoload.py postgres://me@srvr/db1 | python - sqlite:///db2
which copies the structure of the input db1 database into the output db2,

makes quite a database-copier/migrator...

svil

Paul Johnston

Aug 8, 2007, 4:44:57 AM8/8/07
to sqlal...@googlegroups.com
Hi,

> heh, adding this raw-data-copy to the autoload.py
> makes quite a database-copier/migrator...

Yes indeed, I used this yesterday to migrate a legacy database; it was impressively quick and easy.

I can see we've got similar requirements in this area. Perhaps you and I could work together to package up some of these techniques in a more polished manner. Maybe dbcook is the place to do this; I've still not downloaded it.

Regards,

Paul

sdo...@sistechnology.com

Aug 9, 2007, 1:58:30 AM8/9/07
to sqlal...@googlegroups.com

yes, it can be done there.
Although to me it's all plain sqlalchemy (not using anything out of
dbcook layers), just extensions of MetaData: _reflect(),
_copy_data(), _diff(). Once these things go into some MetaData
methods one way or another, it can go back into UsageRecipes, as it
would be all 10-15 lines of code.

Paul Johnston

Aug 9, 2007, 6:04:44 AM8/9/07
to sqlal...@googlegroups.com
Hi,

A little update; this code handles the case where columns have a key attribute:


model = __import__(sys.argv[1])
if sys.argv[2] == 'copy':
    seng = create_engine(sys.argv[3])
    deng = create_engine(sys.argv[4])
    for tbl in model.metadata.table_iterator():
        print tbl
        mismatch = {}
        for col in tbl.c:
            if col.key != col.name:
                mismatch[col.name] = col.key
        def rewrite(x, mismatch):
            x = dict(x)
            for m in mismatch:
                x[mismatch[m]] = x[m]
            return x
        deng.execute(tbl.insert(),
                     [rewrite(x, mismatch) for x in seng.execute(tbl.select())])

Paul

sdo...@sistechnology.com

Aug 10, 2007, 6:59:18 AM8/10/07
to sqlal...@googlegroups.com
On Thursday 09 August 2007 13:04:44 Paul Johnston wrote:
> Hi,
>
> A little update; this code handles the case where columns have a
> key attribute:
>
> model = __import__(sys.argv[1])
> if sys.argv[2] == 'copy':
>     seng = create_engine(sys.argv[3])
>     deng = create_engine(sys.argv[4])
>     for tbl in model.metadata.table_iterator():
>         print tbl
>         mismatch = {}
>         for col in tbl.c:
>             if col.key != col.name:
>                 mismatch[col.name] = col.key
>         def rewrite(x, mismatch):
>             x = dict(x)
>             for m in mismatch:
>                 x[mismatch[m]] = x[m]
>             return x
>         deng.execute(tbl.insert(), [rewrite(x, mismatch) for x in
>                      seng.execute(tbl.select())])

are you sure about the rewrite() part?
x will contain both .key and .name with the same values in them...

wouldn't this be a working equivalent? it also copes with empty tables..

-----------
def copy(metadata, src_engine, dst_engine, echo=False):
    for tbl in metadata.table_iterator():
        if echo: print tbl
        data = [dict((col.key, x[col.name]) for col in tbl.c)
                for x in src_engine.execute(tbl.select())]
        if data:
            dst_engine.execute(tbl.insert(), data)


if __name__ == '__main__':
    arg_model = sys.argv[1]
    model = __import__(arg_model)
    copy(model.metadata,
         src_engine=create_engine(sys.argv[2]),
         dst_engine=create_engine(sys.argv[3]),
         )


http://dbcook.svn.sourceforge.net/viewvc/*checkout*/dbcook/trunk/dbcook/misc/copydata.py
there is also copyall.py (at the same place) that does it all at once
(autoload + copydata):
$ python copyall.py postgres://me@srvr/db1 sqlite:///db2

===========
i'm wondering if all the unicode strings (at least table/column names)
should be converted back into plain strings, as they were before
autoload reflected them from the database.

svil

sdo...@sistechnology.com

Aug 10, 2007, 7:12:46 AM8/10/07
to sqlal...@googlegroups.com
On Thursday 09 August 2007 13:04:44 Paul Johnston wrote:
> Hi,
>
> A little update;

Also, in the same direction, a complete copy of some database seems to
consist of (at least) 3 stages:
1 recreate/remove the old one if it exists
2 copy structure
3 copy data
--------

3 is your copy loop, which is independent of db type;
2 is the autoload, which does depend on db-dialect;
i hope most of it can move into the SA-dialects themselves.

how about 1? it also depends badly on db-dialect.
see
http://dbcook.svn.sourceforge.net/viewvc/*checkout*/dbcook/trunk/dbcook/usage/sa_engine_defs.py

e.g. for sqlite, recreate means 'rm -f file';
for postgres it means 'dropdb url' + 'createdb url';
for mssql it is even more tricky...
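A sketch of how stage 1's dialect dispatch might look; this only computes the shell commands rather than running them, and the sqlite/postgres mapping is taken from the examples above (mssql is deliberately left out, as it is trickier):

```python
from urllib.parse import urlparse

def recreate_commands(db_url):
    """Return the shell commands that would remove/recreate the database
    behind db_url, dispatching on the dialect prefix (illustrative only)."""
    scheme = db_url.split(':', 1)[0]
    if scheme == 'sqlite':
        # sqlite:///path/to/file -- removing the file is enough
        return ['rm -f ' + db_url[len('sqlite:///'):]]
    if scheme == 'postgres':
        # postgres://user@host/dbname -- drop and recreate by name
        dbname = urlparse(db_url).path.lstrip('/')
        return ['dropdb ' + dbname, 'createdb ' + dbname]
    raise NotImplementedError('recreate for %r is dialect-specific' % scheme)
```

Keeping the computation separate from the execution also leaves room for a confirmation step before anything destructive actually runs.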

btw: why is the 'text_as_varchar=1' considered only if it is in the url
(see mssql.py create_connect_args()) and not if it is in the
connect_args argument to create_engine()?


svil

Paul Johnston

Aug 10, 2007, 3:53:43 PM8/10/07
to sqlal...@googlegroups.com
Hi,

> wouldn't this be working equivalent? it also copes with empty tables..

Yep, I like yours better. Thanks!

> i'm wondering if all the unicode strings (at least table/column names)
> should be converted back into plain strings as they have been before
> autoload reflecting them from database.

Well, some databases do support unicode identifier names, some don't.
I'd say don't do any conversion for now; if someone is faced with
migrating tables with unicode names to a database that doesn't support
it, well, let them sweat that one :-)

Paul

Paul Johnston

Aug 10, 2007, 4:02:46 PM8/10/07
to sqlal...@googlegroups.com
Hi,

> 1 recreate/remove the old one if it exists

> how about 1? it also does depend badly on db-dialect.

Personally, I'd do this step manually. Not sure I quite trust a script
that has the potential to "drop database".

>btw: why is the 'text_as_varchar=1' considered only if it is in url
>(see mssql.py create_connect_args()) and not if it is in the
>connect_args argument to create_engine()?

Fair question, and the short answer is because that's all I needed. We
did have a discussion about unifying create_engine args and URL params,
but it turns out there are a few gotchas. We could allow specification
in both places - is this important to you?

Paul


sdo...@sistechnology.com

Aug 11, 2007, 12:27:20 AM8/11/07
to sqlal...@googlegroups.com

> >btw: why is the 'text_as_varchar=1' considered only if it is in
> > url (see mssql.py create_connect_args()) and not if it is in the
> > connect_args argument to create_engine()?
>
> Fair question, and the short answer is because that's all I needed.
> We did have a discussion about unifying create_engine args and URL
> params, but it turns out there are a few gotchas. We could allow
> specification in both places - is this important to you?
not really important, we dug it out already, but it would be more sane
if power(connect_args) >= power(url_args), that is, connect_args is
the more general/powerful way, and the url allows a subset (or all) of
connect_args items; and not vice versa -- connect_args is the
programmatic way, so that should do _everything_..

sdo...@sistechnology.com

Aug 11, 2007, 12:47:49 AM8/11/07
to sqlal...@googlegroups.com
> >i'm Wondering if all the unicode strings (at least table/column
> > names) should be converted back into plain strings as they have
> > been before autoload reflecting them from database.
>
> Well, some databases do support unicode identifier names, some
> don't. I'd say don't do any conversion for now; if someone is faced
> with migrating tables with unicode names to a database that doesn't
> support it, well, let them sweat that one :-)

hmmm. i'll probably put that in as an option, as my model's
table/column names are never unicode, but once they go into the db,
everything gets unicoded there. so i'm not sure whether, after some
migration, the model will match the database...
e.g. sqlite turns everything into unicode and hence does not care
whether it's unicode or not - so it's all ok there; but once the
db-structure migrates into something that _does_ care about unicode
or not, trouble trouble..
is this unicoding everything a sqlite-specific behaviour?
de-unicoding should then go into sqlite-dialect-specific reflection.

Rick Morrison

Aug 13, 2007, 3:35:13 PM8/13/07
to sqlal...@googlegroups.com
It's an SQLite-ism. See the recent thread on the type system. I've had exactly this issue with SQLite vs. MSSQL.

Rick Morrison

Aug 13, 2007, 3:37:27 PM8/13/07
to sqlal...@googlegroups.com


> but it would be more sane
> if power(connect_args) >= power(url_args), that is, connect_args is
> the more general/powerful/ way, and url allows a subset (or all) of
> connect_args items; and not vice versa -- connect_args is the
> programatical way, so that should do _everything_..

If we ever get around to getting SA options in the db-url, this makes perfect sense, at least to me.
