On 02/16/2016 04:02 PM, Alex Hall wrote:
> Great; I was hoping you wouldn't say that. :) I've been through them
> many, many times, trying to get the connection working. I've gone from
> error to error, and thought I had it all working when I finally got
> the create_engine line to run with no problem. Apparently I'm not as
> far along as I thought I was. Back to the drawing board.
big iron databases like DB2 are seriously painful to work with. Oracle
is similar and MS SQL Server is barely any better. These databases have
complicated, highly proprietary and arcane connectivity models, so this
is kind of par for the course when using a database like that.
>
> To keep things on topic for this thread, let me pose a general
> question. This database contains hundreds of tables, maybe thousands.
> Some are small, a few have thousands or millions of rows. Would
> automap choke on all that, or could it handle it? Will mapping all
> that fill up my ram, or have any other impact I should consider?
The rows inside the tables don't matter. Reflecting thousands of
tables is definitely not a quick process, and the speed of the operation
can be hindered further by the responsiveness of the target database's
information schema views. Reflecting tables on a platform like Oracle,
for example, incurs a half dozen queries per table, and those queries
don't run too quickly; for hundreds of tables you could be looking at
startup times in the tens of seconds at least. You'd want to do some
benchmarking against DB2 to see how well the reflection queries
perform; note these queries are part of the DB2 driver itself and were
written by the IBM folks in this case.
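If it helps, a rough timing sketch might look like the below; the
ibm_db_sa URL and the table names are made up here, substitute your
own:

    import time
    from sqlalchemy import create_engine, MetaData

    # hypothetical DB2 URL - substitute real credentials
    engine = create_engine("ibm_db_sa://user:pass@host:50000/MYDB")

    metadata = MetaData()
    start = time.time()
    # reflect a few known tables to get a per-table baseline
    metadata.reflect(engine, only=["SOME_TABLE", "ANOTHER_TABLE"])
    print("reflected %d tables in %.2f sec"
          % (len(metadata.tables), time.time() - start))

Multiply that per-table number out by your table count and you'll have
a ballpark for full reflection.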
Additionally, reflecting tables means we're building up Table / Column
structures in memory, which in most cases is not a large memory
investment; however, if you truly have thousands of tables, and these
are big legacy-style tables that in some cases have hundreds of columns
each, this will produce a significant memory footprint. Not
necessarily unworkable, but building the Python process up to a very
large size adds latency of its own.
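A quick way to eyeball that growth is a sketch like the below; note
this is Unix-only, and on Linux ru_maxrss is in kilobytes while on
OS X it's bytes (the DB2 URL is again hypothetical):

    import resource
    from sqlalchemy import create_engine, MetaData

    engine = create_engine("ibm_db_sa://user:pass@host:50000/MYDB")

    before = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    metadata = MetaData()
    metadata.reflect(engine)   # reflect everything; the expensive part
    after = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    print("%d tables, ~%d KB RSS growth"
          % (len(metadata.tables), after - before))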
Depending on what you are trying to do, you'd probably want to look into
using automap and/or reflection for only the subset of tables that you
actually need; see the "only" param at
http://docs.sqlalchemy.org/en/rel_1_0/core/metadata.html?highlight=reflect#sqlalchemy.schema.MetaData.reflect.params.only
for that. The columns reflected and/or mapped within each Table can
also be limited, but you need a little more code for that. Work on
getting connected first :).
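For what it's worth, once you are connected, a minimal sketch of the
"only" approach combined with automap (hypothetical URL and table names
again) might look like:

    from sqlalchemy import create_engine, MetaData
    from sqlalchemy.ext.automap import automap_base

    engine = create_engine("ibm_db_sa://user:pass@host:50000/MYDB")

    metadata = MetaData()
    # reflect only the tables you actually need
    metadata.reflect(engine, only=["user", "address"])

    Base = automap_base(metadata=metadata)
    Base.prepare()   # maps what's in metadata; no further reflection

    User = Base.classes.user
    Address = Base.classes.address

Since the MetaData is pre-populated, prepare() doesn't have to hit the
database again, so you pay the reflection cost only for the tables you
listed.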