'Too many connections': Where am I handling sessions incorrectly?


Jeff

Sep 15, 2011, 6:39:59 AM
to sqlalchemy
SQLAlchemy version 0.7.1
MySQL Ver 14.12 Distrib 5.0.77

We have a series of tables with one-to-many relationships:
A -> B -> C -> D -> E, etc.

Script1 has a big for loop over several hundred/thousand values. In
each loop iteration it goes through A,B,C, makes some new entries,
then calls Function1 (passing some ids from A,B,C).
Function1 makes a new entry in D, then calls Function2 (passing ids
from A,B,C,D).
Function2 makes modification to the entry in D and makes several new
entries in E.

Not far into the loop we get an error saying the MySQL database has
run out of connections:
(OperationalError) (1040, 'Too many connections')

I suspect this is due to me mishandling sessions, and I've included
the likely relevant snippets of code below. Am I doing something
obviously incorrect? The goal in the near future is to have Function1
submit Function2 as a job to a cluster with PBS, so Function2 needs to
independently access the database. Thanks!

In Script1:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import database_classes as dc
database_url = ....

engine = create_engine(database_url, echo=False)
dc.Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
....
for fname in dirList:
    session = Session()
    ....
    Function1(database_url, ....)

In Function1:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import database_classes as dc
engine = create_engine(database_url, echo=False)
Session = sessionmaker(bind=engine)
session = Session()
.....
Function2(database_url, ....)
....
session.close()

In Function2:
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import database_classes as dc
engine = create_engine(database_url, echo=False)
Session = sessionmaker(bind=engine)
session = Session()
....
session.close()



Database classes: https://github.com/jeffalstott/research_code/blob/master/database_classes.py

Script1:
https://github.com/jeffalstott/research_code/blob/master/MRC_avalanche_analyses.py

Functions:
https://github.com/jeffalstott/research_code/blob/master/criticality.py
Function1 is "avalanche_analyses"
Function2 is "avalanche_statistics"

Michael Bayer

Sep 15, 2011, 9:30:23 AM
to sqlal...@googlegroups.com

On Sep 15, 2011, at 6:39 AM, Jeff wrote:

> SQLAlchemy version 0.7.1
> MySQL Ver 14.12 Distrib 5.0.77
>
> We have a series of tables with one-to-many relationships:
> A -> B -> C -> D -> E, etc.
>
> Script1 has a big for loop over several hundred/thousand values. In
> each loop iteration it goes through A,B,C, makes some new entries,
> then calls Function1 (passing some ids from A,B,C).
> Function1 makes a new entry in D, then calls Function2 (passing ids
> from A,B,C,D).
> Function2 makes modification to the entry in D and makes several new
> entries in E.
>
> Not far into the loop we get an error saying the MySQL database has
> run out of connections:
> (OperationalError) (1040, 'Too many connections')


Your scripts call create_engine() essentially in a loop. This isn't really the appropriate usage of create_engine(). The Engine does not represent a single database connection; it is an expensive-to-create registry of information about your database and DBAPI as well as a connection pool (see http://www.sqlalchemy.org/docs/core/engines.html for an overview). Dereferencing it will eventually close out connections which were opened from it, but not immediately, as the garbage collector typically needs to find those unreachable objects first.

The appropriate scope for Engine is once per URL per application, at the module level. That means if your application has only one URL, there should be exactly one call to create_engine() in just one place, and the resulting Engine should be placed in a single module made available for other modules to import. Otherwise you're working against the intended design of create_engine().

With that, all functions that call upon the Engine will be calling upon the underlying connection pool so that the total number of connections used by the application can be managed.
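
As a rough sketch of that layout (the "db.py" module name and the URL below are placeholders of mine, not anything from Jeff's linked code), one module owns the Engine and everything else imports it:

# db.py -- the single place the Engine is created for this application
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

database_url = "mysql://user:password@host/dbname"   # placeholder

engine = create_engine(database_url, echo=False)   # created exactly once, at import time
Session = sessionmaker(bind=engine)                # factory bound to the shared connection pool

# Script1, Function1 and Function2 then do "import db" and use db.engine /
# db.Session() instead of calling create_engine() themselves.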

The guidelines for Session are less stringent, though again a single Session is generally shared among all functions and methods for a particular operation. I didn't read your script carefully, but typically a single Session is passed along to all functions that need to operate on data, so that all those functions share the same pool of objects, which all interact cleanly, not to mention within one transaction. As it stands, the script creates many new transactions.
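
For example (a sketch only; the call signatures for Function1/Function2 below are my own, not those in the linked code), Script1 could open one Session per loop iteration and hand that same Session down the call chain:

for fname in dirList:
    session = Session()
    # ... create the A, B, C rows on this session ...
    Function1(session, a_id, b_id, c_id)   # Function1 adds D and passes the
                                           # same session on to Function2
    session.commit()
    session.close()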

If you really do want to use a Session inside a function, you can forgo the usage of sessionmaker, as that function is just a helper for declaring module-level patterns. The Session constructor can be called directly, i.e. session = Session(engine).
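
In other words, a function that genuinely needs its own short-lived Session could look roughly like this (the signature and names are illustrative only):

from sqlalchemy.orm import Session

def Function2(engine, d_id):            # illustrative signature
    session = Session(engine)           # bound directly to the shared Engine; no sessionmaker needed
    try:
        # ... modify the D row, add E rows via this session ...
        session.commit()
    finally:
        session.close()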


Jeff

Sep 15, 2011, 10:20:27 AM
to sqlalchemy
Thanks for the guidance!

In a situation in which a script submits a swarm of independent jobs to a
cluster, and then finishes before some/all of those jobs have started
running, each job will need to create its own engine, yes? Or is there a
better way to do it?

On Sep 15, 2:30 pm, Michael Bayer <mike...@zzzcomputing.com> wrote:
> Your scripts call create_engine() essentially in a loop. This isn't really the appropriate usage of create_engine().

Michael Bayer

Sep 15, 2011, 11:31:55 AM
to sqlal...@googlegroups.com

On Sep 15, 2011, at 10:20 AM, Jeff wrote:

> Thanks for the guidance!
>
> In a situation which a script submits a swarm of independent jobs for
> a cluster, and then finishes before some/all of those jobs have
> started running, each job will need to create the engine, yes? Or is
> there a better way to do it?

Anytime you start a new process, that's where create_engine() would need to be called once.

When I use the Python multiprocessing library, for example, I have a function init_for_subprocess() which I can pass as the "on init" function to the various multiprocessing functions, or, if I'm writing a function that I know is the starting point of the process boundary, I put init_for_subprocess() at the top. init_for_subprocess() then ultimately does the create_engine() and establishes it as a module-level global in the appropriate place.
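
A minimal sketch of that pattern with multiprocessing (the names init_for_subprocess / run_job and the Pool setup below are my own illustration under those assumptions, not Michael's actual code):

from multiprocessing import Pool
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

database_url = "mysql://user:password@host/dbname"   # placeholder

engine = None
Session = None

def init_for_subprocess():
    # Runs once in each worker process: every process gets its own Engine
    # (and therefore its own connection pool), created exactly once.
    global engine, Session
    engine = create_engine(database_url, echo=False)
    Session = sessionmaker(bind=engine)

def run_job(ids):
    # One unit of work; uses the per-process Session factory set up above.
    session = Session()
    # ... do the analysis for this set of ids ...
    session.commit()
    session.close()

if __name__ == '__main__':
    work_items = []   # fill with whatever ids each job needs
    pool = Pool(processes=4, initializer=init_for_subprocess)
    pool.map(run_job, work_items)

For jobs submitted to a PBS cluster, the same idea applies: each submitted job is a new process, so its entry-point script calls create_engine() once at startup.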


Jeff

Sep 15, 2011, 11:41:27 AM
to sqlalchemy
Great. I implemented your suggestions and it was (predictably) a
significant speedup. Cheers!