Connection pool takes a long time to handle a closed connection

1,893 views
Skip to first unread message

Sebastian Meinhardt

unread,
Oct 25, 2018, 7:50:41 AM10/25/18
to sqlalchemy
Our database environment drops idle connections after an hour. My application uses the SQLAlchemy connection pool to deal with this. The problem is: When a connection is closed, it takes the pool a real long time (15-20 minutes) to recover. This happens with both strategies, pool_pre_ping and pool_recylce. Apparently, the pool hangs when trying to operate on the already dead connections.

Here is an excerpt from the logs. Look at the timestamps!

With pool_pre_ping:

2018-10-25 12:30:38,866:DEBUG:Connection <cx_Oracle.Connection to USER@DATABASE> checked out from pool 2018-10-25 12:30:38,868:DEBUG:Pool pre-ping on connection <cx_Oracle.Connection to USER@DATABASE> 2018-10-25 12:46:26,624:DEBUG:Pool pre-ping on connection <cx_Oracle.Connection to USER@DATABASE> failed, will invalidate pool 2018-10-25 12:46:26,626:INFO:Disconnection detected on checkout, invalidating all pooled connections prior to current timestamp (reason: InvalidatePoolError())

With pool_recycle:

2018-10-25 12:40:51,869:INFO:Connection <cx_Oracle.Connection to USER@DATABASE> exceeded timeout; recycling
2018-10-25 12:40:51,871:DEBUG:Closing connection <cx_Oracle.Connection to USER@DATABASE>
2018-10-25 12:56:39,493:ERROR:Exception closing connection <cx_Oracle.Connection to USER@DATABASE>
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 314, in _close_connection
    self._dialect.do_close(connection)
  File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 465, in do_close
    dbapi_connection.close()
cx_Oracle.OperationalError: ORA-03113: end-of-file on communication channel
Process ID: 0
Session ID: 540 Serial number: 46277
2018-10-25 12:56:39,674:DEBUG:Created new connection <cx_Oracle.Connection to USER@DATABASE>


As this is an interactive application, I cannot let the user wait for 16 minutes. While I could implement an application side timeout that creates a new pool every 59 minutes, it feels like that would be missing the point. Is there an SQLAlchemy solution to my problem? Decrease the timeout on ping/close? Recycle connections in a background threads? Am I doing something fundamentally wrong here?

Mike Bayer

unread,
Oct 25, 2018, 8:48:42 AM10/25/18
to sqlal...@googlegroups.com
On Thu, Oct 25, 2018 at 7:50 AM Sebastian Meinhardt
<seb.me...@gmail.com> wrote:
>
> Our database environment drops idle connections after an hour. My application uses the SQLAlchemy connection pool to deal with this. The problem is: When a connection is closed, it takes the pool a real long time (15-20 minutes) to recover. This happens with both strategies, pool_pre_ping and pool_recylce. Apparently, the pool hangs when trying to operate on the already dead connections.

I would classify this as an issue with cx_Oracle. Can we post an
issue over at https://github.com/oracle/python-cx_Oracle/issues and
ask for mitigation strategies? In particular, the .close() method
taking 15 minutes is entirely not acceptable. Also, the strategy
by which how the database environment is dropping these connections is
likely very important, is the Oracle database itself doing so, or is
there a proxy layer of some kind.

Unfortunately on my end I don't have any good answers, pool_recycle is
the more effective system but if even .close() is causing a problem,
this implies you'd need to build a background thread that is actively
closing out connections from the pool long before they are allowed to
sit in the pool for too long. that would definitely solve the
problem in the absense of any configurational / environmental reasons
that .close() itself can't complete quickly.


>
> Here is an excerpt from the logs. Look at the timestamps!
>
> With pool_pre_ping:
>
> 2018-10-25 12:30:38,866:DEBUG:Connection <cx_Oracle.Connection to USER@DATABASE> checked out from pool
> 2018-10-25 12:30:38,868:DEBUG:Pool pre-ping on connection <cx_Oracle.Connection to USER@DATABASE>
> 2018-10-25 12:46:26,624:DEBUG:Pool pre-ping on connection <cx_Oracle.Connection to USER@DATABASE> failed, will invalidate pool
> 2018-10-25 12:46:26,626:INFO:Disconnection detected on checkout, invalidating all pooled connections prior to current timestamp (reason: InvalidatePoolError())
>
>
> With pool_recycle:
>
> 2018-10-25 12:40:51,869:INFO:Connection <cx_Oracle.Connection to USER@DATABASE> exceeded timeout; recycling
> 2018-10-25 12:40:51,871:DEBUG:Closing connection <cx_Oracle.Connection to USER@DATABASE>
> 2018-10-25 12:56:39,493:ERROR:Exception closing connection <cx_Oracle.Connection to USER@DATABASE>
> Traceback (most recent call last):
> File "/usr/local/lib/python3.6/site-packages/sqlalchemy/pool.py", line 314, in _close_connection
> self._dialect.do_close(connection)
> File "/usr/local/lib/python3.6/site-packages/sqlalchemy/engine/default.py", line 465, in do_close
> dbapi_connection.close()
> cx_Oracle.OperationalError: ORA-03113: end-of-file on communication channel
> Process ID: 0
> Session ID: 540 Serial number: 46277
> 2018-10-25 12:56:39,674:DEBUG:Created new connection <cx_Oracle.Connection to USER@DATABASE>
>
>
> As this is an interactive application, I cannot let the user wait for 16 minutes. While I could implement an application side timeout that creates a new pool every 59 minutes, it feels like that would be missing the point. Is there an SQLAlchemy solution to my problem? Decrease the timeout on ping/close? Recycle connections in a background threads? Am I doing something fundamentally wrong here?
>
> --
> SQLAlchemy -
> The Python SQL Toolkit and Object Relational Mapper
>
> http://www.sqlalchemy.org/
>
> To post example code, please provide an MCVE: Minimal, Complete, and Verifiable Example. See http://stackoverflow.com/help/mcve for a full description.
> ---
> You received this message because you are subscribed to the Google Groups "sqlalchemy" group.
> To unsubscribe from this group and stop receiving emails from it, send an email to sqlalchemy+...@googlegroups.com.
> To post to this group, send email to sqlal...@googlegroups.com.
> Visit this group at https://groups.google.com/group/sqlalchemy.
> For more options, visit https://groups.google.com/d/optout.

Simon King

unread,
Oct 25, 2018, 8:49:17 AM10/25/18
to sqlal...@googlegroups.com
Here's how SQLAlchemy pings the database:

https://bitbucket.org/zzzeek/sqlalchemy/src/580d114405f3d200682f3a219dcb9004c81300c7/lib/sqlalchemy/pool/base.py?at=master&fileviewer=file-view-default#base.py-767

if pool._pre_ping:
if fairy._echo:
pool.logger.debug(
"Pool pre-ping on connection %s",
fairy.connection)

result = pool._dialect.do_ping(fairy.connection)
# <-- this is apparently taking 16 minutes
if not result:
if fairy._echo:
pool.logger.debug(
"Pool pre-ping on connection %s failed, "
"will invalidate pool", fairy.connection)
raise exc.InvalidatePoolError()

https://bitbucket.org/zzzeek/sqlalchemy/src/580d114405f3d200682f3a219dcb9004c81300c7/lib/sqlalchemy/engine/default.py?at=master&fileviewer=file-view-default#default.py-471

def do_ping(self, dbapi_connection):
cursor = None
try:
cursor = dbapi_connection.cursor()
try:
cursor.execute(self._dialect_specific_select_one)
finally:
cursor.close()
except self.dbapi.Error as err:
if self.is_disconnect(err, dbapi_connection, cursor):
return False
else:
raise
else:
return True

So the first question is: do you see this 15-20 minute timeout with a
plain DBAPI connection? From a python prompt, create a cx_Oracle
connection, then kill the connection from the DB side (or wait an
hour). Then try to create a cursor and execute a trivial query.

If it takes a very long time to return, you need a DBAPI-level fix.
The cx_Oracle docs suggest there is a configurable timeout parameter
(https://cx-oracle.readthedocs.io/en/latest/connection.html#Connection.callTimeout).

If it fails immediately, repeat the experiment using an SQLAlchemy engine.

Let us know how you get on,

Simon

Mike Bayer

unread,
Oct 25, 2018, 9:06:40 AM10/25/18
to sqlal...@googlegroups.com
per those docs callTimeout is new as of oracle client 18c , that's
extremely new. i run everything from the 11g rpm still.

Sebastian Meinhardt

unread,
Oct 25, 2018, 9:26:03 AM10/25/18
to sqlalchemy
Thank you for the advice. I will surely try out all the suggestions to isolate the exact cause of the problem. Unfortunately, today of all days is the annual maintenance and the system is down. I will try to reproduce the problem on a similar system.

According to our database administrator, idle connections are actively killed by some job running on the database. Thus, it is not a classic timeout or something between client and database, but the connection gets dropped by the database itself.

Mike Bayer

unread,
Oct 25, 2018, 9:48:24 AM10/25/18
to sqlal...@googlegroups.com
OK, so I believe your script should be using KILL SESSION, I've just
tried this and the effects are immediate:

>>> from sqlalchemy import *
>>> e = create_engine("oracle://scott:tiger@oracle1120")
>>> conn = e.connect()
>>> conn.scalar("select 1 from dual")
1

then in the DB console as SYS:

SQL> alter system kill session '118,59061';

then:

>>> conn.scalar("select 1 from dual")
Traceback (most recent call last):
File "/home/classic/dev/sqlalchemy/lib/sqlalchemy/engine/base.py",
line 1193, in _execute_context
context)
File "/home/classic/dev/sqlalchemy/lib/sqlalchemy/engine/default.py",
line 509, in do_execute
cursor.execute(statement, parameters)
cx_Oracle.DatabaseError: ORA-00028: your session has been killed

.. etc


the response was immediate and the client was told exactly what had happened.

Sebastian Meinhardt

unread,
Oct 25, 2018, 10:33:56 AM10/25/18
to sqlalchemy
On Thursday, October 25, 2018 at 3:48:24 PM UTC+2, Mike Bayer wrote:

OK, so I believe your script should be using KILL SESSION, I've just
tried this and the effects are immediate:

Yes, my own tests that I did in the meantime confirm this. As the original system is still down for maintenance, I used a similar database, connected, killed the session on the database side, and did not have any problems. Unfortunately, I had to use slightly different versions of Oracle database and cx_Oracle than the original system, thus I will need some further tests. But for now, it sure looks like the problem is how the database system kills the connection, and is not related to cx_Oracle or SQLAlchemy.
Reply all
Reply to author
Forward
0 new messages