reflection taking a very long time?

Alex Hall

Feb 16, 2016, 2:42:32 PM
to sqlalchemy
Hi list,
Sorry for all the emails. I've determined that my script is actually
connecting to the 400's test database. At least, a print statement
placed just after the create_engine call is printing, so I guess we're
good there.

What I'm running into now is unresponsiveness when I try to reflect or
automap the database so I can do some basic queries. As soon as I call
either
base.prepare(dbEngine, reflect=True)
or
metadata = MetaData()
metadata.reflect(dbEngine, only=['tableName'])

the script stops, hanging there with no response at all. The same
thing happened when I was trying to use an inspector on the engine.
It's an AS400, so taking a few seconds is a very long time for it.
This is being left to run for minutes and isn't doing anything. What,
if anything, did I do wrong syntactically? Is there a better way to
check that my engine is actually ready to go, or some other check I
should be making? The full script, minus anything sensitive, is below.

import globals
import logging
from sqlalchemy import *
from sqlalchemy.engine import reflection
from sqlalchemy.ext.automap import automap_base
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

logger = logging.getLogger(globals.appName+"."+__name__)

#set up the sqlalchemy objects
logger.debug("Creating database engine, base, and session.")
dbEngine = create_engine("ibm_db_sa://"+user+":"+pwd+"@"+server+":"+port+"/"+dbName)
print "connected"
Session = sessionmaker(bind = dbEngine) #note that's a capital s on Session
session = Session() #lowercase s
metadata = MetaData()
logger.debug("Creating session.")
print "Creating automap base"
base = automap_base()
print "setting up automapping"
#base.prepare(dbEngine, reflect=True)
metadata.reflect(dbEngine, only=['tableName'])

def getOrderByNumber(orderID):
    orders = base.classes.ORHED
    order = session.query(orders).filter(orders.OAORNO == orderID).first()
    print order.OAORNO
#end def getOrderByNumber

getOrderByNumber("AA111")

Alex Hall

Feb 16, 2016, 2:59:24 PM
to sqlalchemy
Upon re-reading some of the docs, I realized that my problem may still
be that initial connection. The create_engine call doesn't actually
*connect* to the database; it just sets things up. That means that my
actual connection happens later, when I try to reflect or use automap.
When that happens, the connection starts up and the script hangs. I'm
no closer to solving this, and would love to hear anyone's thoughts,
but at least I know that my thought of blaming reflect/automap is
likely incorrect.
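
A minimal sketch to test this theory, forcing that first connection
right after create_engine (same dbEngine as in my script above):

# create_engine() is lazy; connect() opens the first real network connection
connection = dbEngine.connect()
print "really connected"
connection.close()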

Mike Bayer

Feb 16, 2016, 3:28:28 PM
to sqlal...@googlegroups.com
Turning on echo=True inside create_engine() will show you what queries
are emitted as they occur, so you can see which ones are taking long
and/or hanging.
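
For example, with the same URL as in your script:

dbEngine = create_engine(
    "ibm_db_sa://"+user+":"+pwd+"@"+server+":"+port+"/"+dbName,
    echo=True)  # logs each SQL statement as it is emitted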

Alex Hall

Feb 16, 2016, 3:37:16 PM
to sqlal...@googlegroups.com
I tried that, hoping for a bit more insight into the problem. However,
unless I'm doing something wrong, I don't even get any queries. I get
my own print statements, then the script tries to connect and hangs.
I've added
dbEngine.connect()
just to be sure the problem is that first connection, and sure enough,
it hangs on that line.

Mike Bayer

Feb 16, 2016, 3:41:52 PM
to sqlal...@googlegroups.com
Well, then you're just not making any database connection. You'd need
to check your database connectivity and your connection parameters.

Alex Hall

Feb 16, 2016, 4:02:11 PM
to sqlal...@googlegroups.com
Great; I was hoping you wouldn't say that. :) I've been through them
many, many times, trying to get the connection working. I've gone from
error to error, and thought I had it all working when I finally got
the create_engine line to run with no problem. Apparently I'm not as
far along as I thought I was. Back to the drawing board.

To keep things on topic for this thread, let me pose a general
question. This database contains hundreds of tables, maybe thousands.
Some are small, a few have thousands or millions of rows. Would
automap choke on all that, or could it handle it? Will mapping all
that fill up my RAM, or have any other impact I should consider?

Mike Bayer

Feb 16, 2016, 4:11:38 PM
to sqlal...@googlegroups.com


On 02/16/2016 04:02 PM, Alex Hall wrote:
> Great; I was hoping you wouldn't say that. :) I've been through them
> many, many times, trying to get the connection working. I've gone from
> error to error, and thought I had it all working when I finally got
> the create_engine line to run with no problem. Apparently I'm not as
> far along as I thought I was. Back to the drawing board.

Big-iron databases like DB2 are seriously painful to work with. Oracle
is similar, and MS SQL Server is barely any better. These DBs have
complicated, highly proprietary, and arcane connectivity models, so this
is kind of par for the course when using a database like that.


>
> To keep things on topic for this thread, let me pose a general
> question. This database contains hundreds of tables, maybe thousands.
> Some are small, a few have thousands or millions of rows. Would
> automap choke on all that, or could it handle it? Will mapping all
> that fill up my ram, or have any other impact I should consider?

The rows inside the tables don't matter. Reflecting thousands of
tables is definitely not a quick process, and the speed of the operation
can be hindered further by the responsiveness of the target database's
information schema views. Reflecting tables on a platform like Oracle,
for example, incurs a half dozen queries per table, and those queries
don't run too quickly; for hundreds of tables you could be looking
at startup times of at least tens of seconds. You'd want to do
some benching against DB2 to see how well the reflection queries
perform; note that these queries are part of the DB2 driver itself and
were written by the IBM folks in this case.
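
A rough sketch of that kind of bench, timing the reflection of a single
table ('tableName' is a placeholder, as in your script):

import time

metadata = MetaData()
start = time.time()
# this emits the driver's reflection queries for just the one table
metadata.reflect(dbEngine, only=['tableName'])
print "reflected in %.2f seconds" % (time.time() - start)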

Additionally, reflecting tables means we're building up Table / Column
structures in memory, which in most cases is not such a large memory
investment; however, if you truly have thousands of tables, and these are
big legacy-style tables that in some cases have hundreds of columns,
this will produce a significant memory footprint. Not
necessarily unworkable, but building the Python process up to a very
large size itself adds latency.

Depending on what you are trying to do, you'd probably want to look into
using automap and/or reflection for only the subset of tables that you
actually need; look at the "only" param
http://docs.sqlalchemy.org/en/rel_1_0/core/metadata.html?highlight=reflect#sqlalchemy.schema.MetaData.reflect.params.only
for that. The columns reflected and/or mapped within each Table can
also be limited, but you need a little more code for that. Work on
getting connected first :).
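
A sketch of that pattern, reflecting just the tables you need and then
handing the pre-populated MetaData to automap (using your ORHED table
as the example):

metadata = MetaData()
# reflect only the tables we actually query
metadata.reflect(dbEngine, only=['ORHED'])
base = automap_base(metadata=metadata)
base.prepare()  # no reflect=True; prepare() maps what's already in metadata
orders = base.classes.ORHED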

Jonathan Vanasco

Feb 16, 2016, 7:02:40 PM
to sqlalchemy


On Tuesday, February 16, 2016 at 4:11:38 PM UTC-5, Mike Bayer wrote:
> Depending on what you are trying to do, you'd probably want to look into
> using automap and/or reflection for only the subset of tables that you
> actually need; look at the "only" param
> http://docs.sqlalchemy.org/en/rel_1_0/core/metadata.html?highlight=reflect#sqlalchemy.schema.MetaData.reflect.params.only
> for that. The columns reflected and/or mapped within each Table can
> also be limited, but you need a little more code for that. Work on
> getting connected first :).

You could also use sqlacodegen (https://pypi.python.org/pypi/sqlacodegen/) to reflect only once, generating the Python models for you. Then you could split the models into a series of files that you import as needed, to cut down on in-memory size.
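
Something like this, if I remember the flags right (run once from a
shell; the URL mirrors the one in your script):

sqlacodegen ibm_db_sa://user:pwd@server:port/dbName --outfile models.py

There is also a --tables option to limit generation to a subset of tables.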


Michal Petrucha

Feb 17, 2016, 4:57:56 AM
to sqlal...@googlegroups.com
On Tue, Feb 16, 2016 at 04:02:08PM -0500, Alex Hall wrote:
> Great; I was hoping you wouldn't say that. :) I've been through them
> many, many times, trying to get the connection working. I've gone from
> error to error, and thought I had it all working when I finally got
> the create_engine line to run with no problem. Apparently I'm not as
> far along as I thought I was. Back to the drawing board.

Hi Alex,

I just want to reiterate my earlier suggestion – before you try to use
any SQLAlchemy machinery at all, first try to create a connection from
your Python runtime directly, using whichever DBAPI driver you want to
use (most likely you want to create an ibm_db connection object -- do
not import anything related to SQLAlchemy at this point, neither
sqlalchemy nor ibm_db_sa), make sure you are able to execute SQL
statements using that, and only once you get this to work correctly,
try to figure out how to make it work with SQLAlchemy.
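
As a sketch, using nothing but the ibm_db driver (credentials elided,
as in your original script):

import ibm_db

# the raw DB2 connection string the driver expects
conn_str = ("DATABASE=" + dbName + ";HOSTNAME=" + server + ";PORT=" +
            port + ";PROTOCOL=TCPIP;UID=" + user + ";PWD=" + pwd + ";")
# with a full connection string, the user and password arguments stay empty
conn = ibm_db.connect(conn_str, "", "")
# SYSIBM.SYSDUMMY1 is DB2's one-row dummy table
stmt = ibm_db.exec_immediate(conn, "SELECT 1 FROM sysibm.sysdummy1")
print ibm_db.fetch_tuple(stmt)
ibm_db.close(conn)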

And, of course, you shouldn't try to get SQLAlchemy to work all at
once either. First, create an Engine with a connection string, but do
not try to run any fancy introspection or anything before you make
sure that you can execute raw SQL queries using that engine. After you
get *that* out of the way, you can start trying out more advanced
features of SQLAlchemy.
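
In code, that intermediate step might look like this:

# prove the Engine can execute raw SQL before attempting any reflection
result = dbEngine.execute("SELECT 1 FROM sysibm.sysdummy1")
print result.scalar()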

Baby steps, you know. Divide and conquer. Do not try to solve this
entire huge problem all at once. (And yes, as you are probably aware
by now, successfully connecting to an enterprise database server *is*
a huge problem.) That way you'll avoid false leads like this one.

Good luck!

Michal