Help with configuring the connect_timeout setting for when a highly available Postgres database drops

Karim Gillani

Feb 25, 2020, 11:49:19 AM
to sqlalchemy
I am having an issue where, when my HA Postgres database drops and is automatically moved to a backup database, the timeout doesn't seem to be working.

My config.py shows the following:

SQLALCHEMY_DATABASE_URI = '{engine}://{user}:{password}@{host}:{port}/{name}'.format(
    engine=DB_ENGINE,
    user=DB_USER,
    password=DB_PASSWORD,
    host=DB_HOST,
    port=DB_PORT,
    name=DB_NAME,
)
SQLALCHEMY_ENGINE_OPTIONS = {
    'pool_size': pool_size,
    'max_overflow': max_overflow,
    'pool_pre_ping': True,
    'pool_timeout': DB_POOL_TIMEOUT,
    'pool_recycle': 3600,
    'connect_args': {
        'connect_timeout': 5,
        'options': '-c statement_timeout=5000',
    },
}

Is there something I set wrong?  My timeout seems to be about 2 minutes instead of 5 secs.
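
For reference, a rough equivalent of what that configuration hands to SQLAlchemy (a sketch, assuming Flask-SQLAlchemy 2.4+, which forwards SQLALCHEMY_ENGINE_OPTIONS straight to create_engine; the pool numbers here are placeholders):

from sqlalchemy import create_engine

engine = create_engine(
    SQLALCHEMY_DATABASE_URI,        # the URI built above
    pool_size=5,                    # placeholder for pool_size
    max_overflow=10,                # placeholder for max_overflow
    pool_pre_ping=True,
    pool_timeout=30,                # placeholder for DB_POOL_TIMEOUT
    pool_recycle=3600,
    connect_args={
        'connect_timeout': 5,                      # handed to psycopg2.connect()
        'options': '-c statement_timeout=5000',    # handed to libpq
    },
)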

Any help would be appreciated,

Thanks
Karim

Karim Gillani

Feb 25, 2020, 12:07:24 PM
to sqlalchemy
Please note I am using a connection pool, and some other connections seem to recover fine. After two minutes I get this message, and then things start working again:

==> AE K:11oh392; T:1; E:(psycopg2.OperationalError) could not connect to server: Connection timed out
        Is the server running on host "patroni-mastertheq" (172.50.46.180) and accepting
        TCP/IP connections on port 5432?

(Background on this error at: http://sqlalche.me/e/e3q8)
Thanks
Karim

Mike Bayer

Feb 25, 2020, 1:37:19 PM
to noreply-spamdigest via sqlalchemy
"could not connect" means your server is unreachable from a network standpoint.  You want to make sure the server is running and that it is routable over the network you are trying to connect from.

In an HA scenario, depending on what software is in use and how it does switchover, there can be long network lags, such as when it is moving a VIP from one machine to another. You'd want to be using a proxy server such as HAProxy or PgBouncer rather than relying upon changing network routing.

Karim Gillani

Feb 26, 2020, 12:11:56 AM
to sqlalchemy
We are using OpenShift, which uses HAProxy. What I am looking for is why the connect_timeout may not be getting applied, so that we can retry the call after reconnecting to the other database pod.

It is taking up to two minutes before I get the psycopg2.OperationalError.  It's very strange.  The code will automatically retry and work after this error shows up.  I am guessing but I don't think my setup is passing the connect_timeout to libpq.

Thanks

Karim Gillani

Feb 26, 2020, 12:23:03 AM
to sqlalchemy
More information:

I tested by going into the Python pod and running a looped psql "select 1" command with and without PGCONNECT_TIMEOUT=2. Without PGCONNECT_TIMEOUT I also saw 2-minute delays, but with it set, the error came back in 2 seconds. I expected the same results by setting connect_timeout, but I don't see them.
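
A minimal sketch of the same experiment from Python (hypothetical host and credentials): psycopg2 links against libpq, so the PGCONNECT_TIMEOUT environment variable should shorten the hang there too, which helps separate "libpq ignores the timeout" from "the timeout never reaches libpq":

import os
import time
import psycopg2

os.environ["PGCONNECT_TIMEOUT"] = "2"   # libpq reads this, just as psql does

start = time.monotonic()
try:
    psycopg2.connect(host="patroni-mastertheq", user="postgres", dbname="postgres")
except psycopg2.OperationalError as exc:
    print("gave up after %.1f seconds: %s" % (time.monotonic() - start, exc))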

Thanks

Mike Bayer

Feb 26, 2020, 10:31:03 AM
to noreply-spamdigest via sqlalchemy
This would be a psycopg2-level setting, which is documented at https://www.psycopg.org/docs/module.html#psycopg2.connect ; the available values are ultimately at https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-PARAMKEYWORDS . On the SQLAlchemy side you pass these to create_engine using connect_args: https://docs.sqlalchemy.org/en/13/core/engines.html?highlight=connect_args#custom-dbapi-args

Karim Gillani

Feb 26, 2020, 11:19:28 AM
to sqlal...@googlegroups.com
Thanks. I don't believe this is a Red Hat issue as much as a configuration issue, because I can replicate the issue using psql and fix it there with psql's connect_timeout argument. With the complexity of flask-sqlalchemy, which uses SQLAlchemy to pass arguments to psycopg2 and on to libpq, it is difficult to see where the issue actually is. Based on the links you provided, I am passing the connect_timeout parameter correctly. I am not sure what to do now. I am guessing I will need to somehow figure out whether the parameter is actually being passed. I think I will play with the connection.get_dsn_parameters() function to display the parameters.
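
A minimal sketch of that check (hypothetical credentials): psycopg2's connection.get_dsn_parameters() reports the parameters libpq actually ended up with, so connect_timeout should show up there if it was passed:

import psycopg2

conn = psycopg2.connect(host="localhost", user="postgres",
                        dbname="postgres", connect_timeout=5)
print(conn.get_dsn_parameters())   # expect something like 'connect_timeout': '5'
conn.close()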

Karim

Jonathan Vanasco

Feb 26, 2020, 4:50:39 PM
to sqlalchemy
I would also throw a few debug lines into psycopg2 and ensure that library is creating a connection with the params you intend. 
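
One way to do that without patching psycopg2 itself is SQLAlchemy's do_connect event, which is handed the exact arguments about to go to psycopg2.connect() (a sketch with hypothetical credentials):

from sqlalchemy import create_engine, event

engine = create_engine(
    "postgresql+psycopg2://scott:tiger@localhost/test",
    connect_args={"connect_timeout": 5},
)

@event.listens_for(engine, "do_connect")
def log_connect_args(dialect, conn_rec, cargs, cparams):
    # cargs/cparams are exactly what the dialect passes to psycopg2.connect()
    print("connect args:", cargs, cparams)
    # returning None lets the dialect proceed with the normal connect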

Karim Gillani

Feb 26, 2020, 7:27:12 PM
to sqlalchemy
Any suggestions on how to do that? I believe that SQLAlchemy is using the psycopg2-binary package, so we can add debugging to the library. Can we override what it uses easily?

At this point I am stuck and my devs don't know how to proceed :(

Do you know of a way to display what is being sent to the libpq library via debug statements?

Thanks
Karim

Mike Bayer

Feb 27, 2020, 9:48:00 AM
to noreply-spamdigest via sqlalchemy
Can you write a script that uses psycopg2 directly and see if you can replicate the issue that way? This would eliminate SQLAlchemy as part of the issue, and you could interact with the psycopg2 developers directly:

import psycopg2

connection = psycopg2.connect(
    user="scott",
    password="tiger",
    host="localhost",
    database="test",
    connect_timeout=2,
)


There's no need to run pdb inside of psycopg2 itself. Within SQLAlchemy you could do this at lib/sqlalchemy/dialects/postgresql/psycopg2.py, but only if using raw psycopg2 as above proves that it solves your connection issue.

Karim Gillani

Feb 27, 2020, 12:57:56 PM
to sqlalchemy
Thank you for the great idea. I could not see the forest for the trees. Haha.

We have the psycopg2 script running and things seem to be working fine with that. We can see that without connect_timeout the delay is about 2 minutes, and with connect_timeout set, the delay is only 2 seconds.

Next, we are going to wrap the script (if possible) with SQLAlchemy only and test.
Then after that, we will wrap flask-sqlalchemy.
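
For the flask-sqlalchemy step, a minimal harness might look like this (a sketch with a hypothetical DSN, assuming Flask-SQLAlchemy 2.4+ so SQLALCHEMY_ENGINE_OPTIONS is honoured):

from flask import Flask
from flask_sqlalchemy import SQLAlchemy

app = Flask(__name__)
app.config["SQLALCHEMY_DATABASE_URI"] = "postgresql+psycopg2://scott:tiger@localhost/test"
app.config["SQLALCHEMY_TRACK_MODIFICATIONS"] = False
app.config["SQLALCHEMY_ENGINE_OPTIONS"] = {"connect_args": {"connect_timeout": 2}}

db = SQLAlchemy(app)

with app.app_context():
    db.session.execute("select 1")   # connection checkout is where connect_timeout applies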

Thanks
Karim


Mike Bayer

Feb 27, 2020, 1:24:25 PM
to noreply-spamdigest via sqlalchemy
OK so that script I gave you, the exact SQLAlchemy equivalent is:


from sqlalchemy import create_engine


e = create_engine("postgresql+psycopg2://scott:tiger@localhost/test", connect_args={"connect_timeout": 2})
c = e.connect()


It will also accept it as:

e = create_engine("postgresql+psycopg2://scott:tiger@localhost/test?connect_timeout=2")

However, in the above case the timeout is passed as the string "2" and not an integer, which may interfere with it working; I'm not sure. The separate connect_args approach is safer.

Karim Gillani

Feb 27, 2020, 5:30:32 PM
to sqlalchemy
Latest update:

1. psycopg2 works with connect_timeout (direct SQL call)
2. SQLAlchemy works with connect_timeout (direct SQL call, no ORM)
3. Flask-SQLAlchemy works with connect_timeout (direct SQL call, no ORM)

Next up on our troubleshooting tree:

Test SQLAlchemy with ORM calls with connect_timeout.

Weird...

If anyone wants to look at our code to see what our app is doing: https://github.com/bcgov/queue-management/api
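
A minimal sketch of such an ORM-level test (hypothetical credentials, plain sessionmaker rather than the flask-sqlalchemy setup in our app):

from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

engine = create_engine(
    "postgresql+psycopg2://scott:tiger@localhost/test",
    connect_args={"connect_timeout": 2},
)
Session = sessionmaker(bind=engine)

session = Session()
try:
    # any statement forces a connection checkout, which is where
    # connect_timeout should cut the two-minute hang short
    session.execute("select 1")
finally:
    session.close()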

Karim

Karim Gillani

Mar 2, 2020, 1:28:41 PM
to sqlalchemy
Sadly, we are still working on the ORM calls for the connect timeout. Basic calls are working; of course, ours are a lot more complex, so we are trying to get one of the complex ones into the test script.

At the same time, I am trying the pg8000 driver instead of psycopg2. That driver seems much more promising.

Very strange and the mystery continues..
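
If it helps anyone later, what I have in mind for pg8000 is roughly this (a sketch with hypothetical credentials; the assumption is that pg8000 names its connect-phase timeout parameter "timeout" rather than libpq's "connect_timeout"):

from sqlalchemy import create_engine

engine = create_engine(
    "postgresql+pg8000://scott:tiger@localhost/test",
    connect_args={"timeout": 2},    # assumed pg8000 connect/socket timeout, in seconds
)
with engine.connect() as conn:
    conn.execute("select 1")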

Karim

Karim Gillani

Mar 9, 2020, 10:01:35 PM
to sqlalchemy
OK, so we had a retry decorator around a group of ORM calls. This seems to have been the issue. If we wrap each call instead, things are much better.

I wonder if the transaction rollback is causing issues.
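
For anyone following along, the per-call wrapping looks roughly like this (a sketch; the "db" object and the retry counts are assumptions, and rolling back before each retry is the part I suspect was missing when the decorator wrapped the whole group):

import time
from functools import wraps
from sqlalchemy.exc import OperationalError

def retry_per_call(retries=3, delay=1):
    # hypothetical decorator; "db" is assumed to be the flask-sqlalchemy object
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            for attempt in range(retries):
                try:
                    return fn(*args, **kwargs)
                except OperationalError:
                    db.session.rollback()   # reset the failed transaction before retrying
                    if attempt == retries - 1:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator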

Karim