I am migrating an Access database to MSSQL server. I have coded up basic
Access support for SQLAlchemy, which I'll be committing to the 0.4 branch
shortly.
Using autocode, I now have a set of SQLAlchemy table definitions. My
question is: what's a good way to move all the data across?
Thanks for any help,
Paul
I'm in the same process, and very interested in the answer !
One idea I had is to define a universal dump format (based, for
example, on pytables), which could be used to back up and restore data
to/from various databases.
If this approach is sound and a good implementation is proposed, it
could become an interesting add-on to SA.
Regards,
Christophe
2007/7/24, Paul Johnston <p...@pajhome.org.uk>:
With all due respect for the brilliance of SQLAlchemy.. it is not an
operating system and not a database.
Maybe I am missing the point here, but in the two hours it took to get
a reply to the OP, one could have exported from one db (to CSV) and
imported into the other one. Another alternative is to actually use the db
functionality. MSAccess and MSSQL both start with 'MS'. If I am not
mistaken, those are interoperable. One can set up a 'link' and
transfer the data, no? It has been years, but I remember doing that.
Moving data in/out of disparate data sources is a pretty common data
warehouse process. And if they are large datasets, native 'bulk'
transfers are fastest. All of which can be automated... without
intervention from the application layer. (was that blasphemy?)
--
michael
2007/7/24, michael <li...@genoverly.net>:
I see no blasphemy, but that does not exactly address my personal
issue (which is not exactly the same as Paul's, it seems).
I will have, in a few months, clients running my software on mysql,
others on mssql. I want to have a common backup format, so I can
restore any backup on any supported db, and all of that should be doable
by a "Toto User" (toto=dummy).
Having it in the application layer allows me to do that. And since I
hate to re-do things, my approach will most probably be to use SA to dump
and restore the data, even if it's a bit slow (the databases are not
very big); it will always be possible to optimize the process later by
doing db-specific operations.
The pytables format looks attractive for this use because it's fast,
scalable, compresses the data, and has a generic viewer.
My experience with SA is still a bit light, and I might be saying stupid
things without realizing it, but that's the general idea.
My two cents :-)
Regards,
Christophe
Hello Christophe,
If I am reading your intent correctly (and forgive me if I am wrong), you
would like to have one backup/restore routine for the same schema but on a
variety of databases. And you want the user to be responsible for
doing both actions.
Your references to "universal dump format" and "common backup format"
point squarely at a flat file, which can be
[tab|space|comma|pipe|etc] delimited. Since you said that your
databases will be small, text is truly universal. It does not require
python or any scripting language and can easily be viewable in any text
editor. As a side note, I have found pipe delimited to be the least
troublesome when moving between databases that have user-input data.
The first thing that comes to mind is that there should probably
already be an automated backup scheduled. Users will forget and when a
restore is needed, they will want fresh data.
Each database has its own export-to-text command and its own
import-from-text command, with related syntax. This can be triggered
with cron for all unixes; On Windows, MSSQL has its own scheduler. (Of
course, one could also have a button do the same thing, in *addition* to
routine backups.)
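Each engine's native export command differs, but the shape of the step is the same everywhere. Here's a minimal, database-neutral sketch using Python's stdlib sqlite3 and csv modules as a stand-in for the native tool; the table and file names are invented, and the pipe delimiter follows the recommendation above:

```python
import csv
import sqlite3

def export_table(conn, table, path, delimiter='|'):
    """Dump one table to delimited text, header row first."""
    cur = conn.execute('SELECT * FROM %s' % table)
    with open(path, 'w', newline='') as f:
        writer = csv.writer(f, delimiter=delimiter)
        # column names come from the cursor description
        writer.writerow([col[0] for col in cur.description])
        writer.writerows(cur)

# hypothetical example data
conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE users (id INTEGER, name TEXT)')
conn.executemany('INSERT INTO users VALUES (?, ?)', [(1, 'a'), (2, 'b')])
export_table(conn, 'users', 'users.txt')
```

The native bulk tools do the same thing much faster on big tables; this is just the text-is-universal point made concrete.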
Are you asking for features in SA to handle scheduled backups and
user initiated restores?
--
michael
2007/7/24, michael <li...@genoverly.net>:
> Hello Christophe,
>
> If I am reading your intent (and forgive me if I am wrong), you would
> like to have one backup/restore routine for the same schema but on a
> variety of databases.
Yes
> And, you want the user to be responsible for
> doing both actions.
More precisely, I want to be able to provide a tool to the user which
allows him to do it with minimal knowledge.
>
> Your references to "universal dump format" and "common backup format"
> point squarely at a flat file; which can sometimes be
> [tab|space|comma|pipe|etc] delimited. Since you said that your
> databases will be small, text is truly universal. It does not require
> python or any scripting language and can easily be viewable in any text
> editor. As a side note, I have found pipe delimited to be the least
> troublesome when moving between databases that have user-input data.
Agreed on the flat file. A zipped archive of flat files would do the
trick, I guess.
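The zipped-archive-of-flat-files idea can be sketched with nothing but the stdlib; the second half shows the partial-restore angle, pulling a single table out of the archive without extracting the rest. Table names and rows here are invented for illustration:

```python
import csv
import io
import zipfile

# one CSV member per table (made-up sample data)
tables = {
    'users':  [['id', 'name'], ['1', 'alice']],
    'orders': [['id', 'user_id'], ['10', '1']],
}

with zipfile.ZipFile('backup.zip', 'w', zipfile.ZIP_DEFLATED) as zf:
    for name, rows in tables.items():
        buf = io.StringIO()
        csv.writer(buf).writerows(rows)
        zf.writestr(name + '.csv', buf.getvalue())

# partial restore: read just one table back out of the archive
with zipfile.ZipFile('backup.zip') as zf:
    with zf.open('users.csv') as member:
        restored = list(csv.reader(io.TextIOWrapper(member, 'utf-8')))
```

pytables would add compression-with-random-access and a viewer on top of this, but the zip+CSV version already gives per-table access inside one compressed file.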
What I found interesting with pytables is the high accessibility of the
data, even in a compressed file. This would make it easier to implement
a partial restore. But this is just an idea I'm throwing out; maybe I'm
totally wrong about what pytables would bring.
> The first thing that comes to mind is that there should probably
> already be an automated backup scheduled. Users will forget and when a
> restore is needed, they will want fresh data.
>
> Each database has their own export-to-text command and each has their
> own import-from-text command; and related syntax. This can be triggered
> with cron for all unixes; On Windows, MSSQL has its own scheduler. (Of
> course, one could also have a button do the same thing, in *addition* to
> routine backups.)
I agree, but that's almost orthogonal to what I have in mind. What
you suggest is what should be done by a decent admin or consultant
who is installing the software.
I have two real-life cases which are not covered by this approach:
1) My 'packaging team' prepares demonstration databases as well as
ready-to-start ones. I want those databases to be accessible on our
website, easily selectable (ideally without having to know which
database server is used), and easily usable from within my app.
2) A user calls the support team, and the problem seems to come from
bizarre data in the database. I want the user (if he has the rights)
to be able to save his data, attach it to an email and send it to
the support team (or post it on a support form on the website).
> Are you asking for features in SA to handle scheduled backups and
> user initiated restores?
No. What I'd like to do is provide facilities for dumping and
restoring a complete db. And I'd like this to work if two dbs with
different engines have (almost) the same schema.
This is a bit like migrate, which provides an engine-agnostic
API for schema modifications.
One could easily, e.g., build a sadump tool on top of this, or
a database manager tool for a particular application (which is what I
intend to do).
All this is 'just a thought' for now, but motivated by an actual future need.
Christophe
>I'm in the same process, and very interested in the answer !
>
>
I've found what I think is the best solution, and it seems quite
obvious in retrospect. Define the table, do a select on the old
database and an insert on the new database. This leverages all of
SQLAlchemy's cleverness in converting types, etc., and keeps the ORM out
of the picture. The code I'm using is:
import sys
from sqlalchemy import create_engine

model = __import__(sys.argv[1])
if sys.argv[2] == 'copy':
    seng = create_engine(sys.argv[3])
    deng = create_engine(sys.argv[4])
    for tbl in model.metadata.table_iterator():
        deng.execute(tbl.insert(),
                     [dict(x) for x in seng.execute(tbl.select())])
All it relies on is that you call your MetaData "metadata". At the
moment, it chokes on names that contain a dash (or maybe it's ones that
use key=), but I'm sure I can work around that. When I'm done, I'll put
this up as a recipe.
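For comparison, the same select-then-insert pattern can be shown with nothing but the stdlib sqlite3 module and two throwaway in-memory databases; in the recipe above, SQLAlchemy's MetaData supplies the table list and the type conversion instead of the hand-written DDL and column list used here:

```python
import sqlite3

# two databases with the same schema (stand-ins for old and new)
src = sqlite3.connect(':memory:')
dst = sqlite3.connect(':memory:')
ddl = 'CREATE TABLE t (id INTEGER, name TEXT)'
src.execute(ddl)
dst.execute(ddl)
src.executemany('INSERT INTO t VALUES (?, ?)', [(1, 'x'), (2, 'y')])

# select on the old database, insert on the new one
rows = src.execute('SELECT id, name FROM t').fetchall()
dst.executemany('INSERT INTO t VALUES (?, ?)', rows)
```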
Paul
heh, adding this raw-data-copy to the autoload.py
$ python autoload.py postgres://me@srvr/db1 | python - sqlite:///db2
which copies the structure of the input db1 database into the output db2,
makes quite a database-copier/migrator...
svil
heh, adding this raw-data-copy to the autoload.py
makes quite a database-copier/migrator...
yes, it can be done there.
Although to me it's all plain sqlalchemy (not using anything out of
dbcook's layers), just extensions of MetaData: _reflect(),
_copy_data(), _diff(). Once these things go into some MetaData
methods one way or another, it can go back into UsageRecipes, as it
would all be 10-15 lines of code.
are you sure about the rewrite() part?
x will contain both .key and .name with the same values on them...
wouldn't this be a working equivalent? it also copes with empty tables..
-----------
import sys
from sqlalchemy import create_engine

def copy(metadata, src_engine, dst_engine, echo=False):
    for tbl in metadata.table_iterator():
        if echo:
            print tbl
        data = [dict((col.key, x[col.name]) for col in tbl.c)
                for x in src_engine.execute(tbl.select())]
        if data:
            dst_engine.execute(tbl.insert(), data)

if __name__ == '__main__':
    model = __import__(sys.argv[1])
    copy(model.metadata,
         src_engine=create_engine(sys.argv[2]),
         dst_engine=create_engine(sys.argv[3]))
http://dbcook.svn.sourceforge.net/viewvc/*checkout*/dbcook/trunk/dbcook/misc/copydata.py
there is also copyall.py (in the same place) that does it all at once
(autoload + copydata):
$ python copyall.py postgres://me@srvr/db1 sqlite:///db2
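For the sqlite-to-sqlite case, the two stages (copy structure, then copy data) can be sketched without SQLAlchemy at all, since sqlite keeps its CREATE statements in sqlite_master; autoload's job is to do the structure stage in a dialect-aware way instead. A self-contained illustration:

```python
import sqlite3

src = sqlite3.connect(':memory:')
src.execute('CREATE TABLE t (id INTEGER, name TEXT)')
src.executemany('INSERT INTO t VALUES (?, ?)', [(1, 'a')])

dst = sqlite3.connect(':memory:')

# stage 2: copy structure (sqlite stores the CREATE statements)
for (sql,) in src.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table'"):
    dst.execute(sql)

# stage 3: copy data, table by table
for (name,) in src.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"):
    rows = src.execute('SELECT * FROM %s' % name).fetchall()
    if rows:
        marks = ', '.join('?' * len(rows[0]))
        dst.executemany('INSERT INTO %s VALUES (%s)' % (name, marks), rows)
```

Across different backends the stored DDL isn't portable, which is exactly why the structure stage has to go through reflection/autoload.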
===========
i'm wondering if all the unicode strings (at least table/column names)
should be converted back into plain strings, as they were before
autoload reflected them from the database.
svil
Also, in the same direction, complete copy of some database seems to
consist of (at least) 3 stages:
1 recreate/remove the old one if it exists
2 copy structure
3 copy data
--------
3 is your copy loop, which is independent of db type;
2 is the autoload, which does depend on the db-dialect;
i hope most of it can move into the SA-dialects themselves.
how about 1? it also depends badly on the db-dialect.
see
http://dbcook.svn.sourceforge.net/viewvc/*checkout*/dbcook/trunk/dbcook/usage/sa_engine_defs.py
e.g. for sqlite, recreate means 'rm -f file';
for postgres it means 'dropdb url' + 'createdb url';
for mssql it is even more tricky...
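Stage 1 could be dispatched on the URL scheme. A rough sketch; the function name and the naive URL parsing are my own invention, and the postgres branch just shells out to the client tools, assuming they are on the PATH:

```python
import os
import subprocess

def recreate(url):
    """Destroy and recreate the database named by a simple db URL."""
    scheme, rest = url.split('://', 1)
    if scheme == 'sqlite':
        path = rest.lstrip('/')
        if os.path.exists(path):
            os.remove(path)          # sqlite: recreate means rm -f file
    elif scheme == 'postgres':
        dbname = rest.rsplit('/', 1)[-1]
        subprocess.call(['dropdb', dbname])    # assumes pg client tools
        subprocess.call(['createdb', dbname])
    else:
        raise ValueError('no recreate recipe for %s' % scheme)
```

As noted below, a script that can effectively "drop database" deserves some suspicion, so such a helper probably wants a confirmation step in any real tool.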
btw: why is the 'text_as_varchar=1' considered only if it is in url
(see mssql.py create_connect_args()) and not if it is in the
connect_args argument to create_engine()?
svil
>wouldn't this be working equivalent? it also copes with empty tables..
>
>
Yep, I like yours better. Thanks!
>i'm wondering if all the unicode strings (at least table/column names)
>should be converted back into plain strings, as they were before
>autoload reflected them from the database.
>
>
Well, some databases do support unicode identifier names, some don't.
I'd say don't do any conversion for now; if someone is faced with
migrating tables with unicode names to a database that doesn't support
it, well, let them sweat that one :-)
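Rather than converting anything, one could at least flag the identifiers that would cause that sweat up front: scan the reflected table/column names for anything beyond plain ASCII before attempting the migration. A tiny sketch; the helper name and sample identifiers are invented:

```python
def non_ascii_identifiers(names):
    """Return the names that would need attention on an ASCII-only backend."""
    return [n for n in names if any(ord(c) > 127 for c in n)]

# made-up reflected names; the last one contains a non-ASCII character
names = ['users', 'commandes', u'caf\u00e9']
suspect = non_ascii_identifiers(names)
```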
Paul
> 1 recreate/remove the old one if it exists
>how about 1? it also does depend badly on db-dialect.
>
>
Personally, I'd do this step manually. I'm not sure I quite trust a
script that has the potential to "drop database".
>btw: why is the 'text_as_varchar=1' considered only if it is in url
>(see mssql.py create_connect_args()) and not if it is in the
>connect_args argument to create_engine()?
>
>
Fair question, and the short answer is because that's all I needed. We
did have a discussion about unifying create_engine args and URL params,
but it turns out there are a few gotchas. We could allow specification
in both places - is this important to you?
Paul
hmmm. i'll probably put that in as some option, as my model's
table/column names are never unicode, but once they go into the db, all
gets unicoded there. so i'm not sure whether, after some migration, the
model will match the database...
e.g. sqlite turns everything into unicode and hence does not care
whether it's unicode or not - so it's all ok there; but once the
db-structure migrates into something that _does_ care about unicode or
not, trouble, trouble..
is this unicoding of everything a sqlite-specific behaviour?
de-unicoding should then go into sqlite-dialect-specific
reflection.