Jira (PDB-4579) Enable tcpKeepAlive on the postgres driver

Taylan Develioglu (JIRA)

Nov 6, 2019, 9:56:04 AM
to puppe...@googlegroups.com
Taylan Develioglu created an issue
 
PuppetDB / Improvement PDB-4579
Enable tcpKeepAlive on the postgres driver
Issue Type: Improvement
Affects Versions: PDB 6.7.1
Assignee: Unassigned
Components: PuppetDB
Created: 2019/11/06 6:55 AM
Priority: Normal
Reporter: Taylan Develioglu

Bringing down the network interface on our PostgreSQL server causes the connection pool to hold on to connections that were already closed on the database server for (what seems to be) an infinite time.

The network stack on the client is never notified that these connections have been closed on the peer, so PuppetDB's connection pool still believes they are active.

This caused us to run out of available connections in the connection pool until we restarted PuppetDB. The PDBReadPool_pool_ActiveConnections metric also reports a value of 25 (maximum-pool-size).

 

Can the tcpKeepAlive option of the PostgreSQL JDBC driver be enabled to prevent this class of issue from happening?
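(For reference, the driver's tcpKeepAlive=true flag ultimately sets SO_KEEPALIVE on the connection socket. A minimal stand-alone sketch of that socket-level effect — in Python rather than the driver's Java, purely as an illustration:)

```python
import socket

def keepalive_socket():
    """TCP socket with SO_KEEPALIVE enabled -- the socket-level effect
    of the JDBC driver's tcpKeepAlive=true option."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock

# With keepalive enabled, the kernel periodically probes an idle peer
# and tears the connection down if the probes go unanswered, so
# half-open connections like the ones below would eventually be
# detected instead of lingering forever.
sock = keepalive_socket()
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))  # non-zero
sock.close()
```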

 

Network link going down on the PostgreSQL server

 

[di nov  5 12:49:51 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Down
[di nov  5 12:55:03 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[di nov  5 12:55:09 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Down
[di nov  5 12:55:10 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[di nov  5 12:55:11 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Down
[di nov  5 12:55:13 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Up, 10000 Mbps full duplex, Flow control: none

 

Connections still in ESTABLISHED state on the client side

 

 
[root@puppetdb ~]# netstat -ntp|grep 10.197.29.74:5432
tcp6       0      0 10.198.174.11:39186     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:59996     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:50380     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:60952     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:33536     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:60902     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:35564     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:57950     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:45416     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:33644     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:39678     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:43846     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:55738     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:58098     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:34214     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:40098     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:41694     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:53760     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:33806     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:50358     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:60068     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:33530     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:38840     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:54616     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:36002     10.197.29.74:5432       ESTABLISHED 47079/java     

 

Actual established connections.

 

[root@pgsqldb-puppetdb ~]# netstat -ntp|grep 10.198.174.11
tcp        0      0 10.197.29.74:5432       10.198.174.11:39186     ESTABLISHED 9292/postgres: pupp 
tcp        0      0 10.197.29.74:5432       10.198.174.11:40098     ESTABLISHED 9369/postgres: pupp 
tcp        0      0 10.197.29.74:5432       10.198.174.11:39678     ESTABLISHED 9338/postgres: pupp 
tcp        0      0 10.197.29.74:5432       10.198.174.11:60902     ESTABLISHED 7652/postgres: pupp 

 

PuppetDB connection pool running out of available connections.


2019-11-06T12:43:50.504+01:00 WARN  [p.p.jdbc] Caught exception. Last attempt, throwing exception.
2019-11-06T12:43:50.506+01:00 WARN  [o.e.j.s.HttpChannel] /pdb/query/v4
javax.servlet.ServletException: java.sql.SQLTransientConnectionException: PDBReadPool - Connection is not available, request timed out after 3000ms.
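The timeout above is the pool's bounded checkout wait: with all 25 connections pinned by half-open sockets, every new request waits the configured timeout (3000 ms here) and then fails. A toy illustration of that behavior (a Python sketch with made-up names, not PuppetDB/HikariCP code):

```python
import queue

class TinyPool:
    """Toy connection pool: checkout blocks for up to timeout_ms and
    then fails, mimicking the SQLTransientConnectionException above.
    Illustration only -- not the real pool implementation."""

    def __init__(self, size, timeout_ms):
        self._free = queue.Queue()
        self._timeout = timeout_ms / 1000.0
        for i in range(size):
            self._free.put(f"conn-{i}")

    def checkout(self):
        try:
            return self._free.get(timeout=self._timeout)
        except queue.Empty:
            raise TimeoutError(
                f"Connection is not available, request timed out "
                f"after {int(self._timeout * 1000)}ms.")

    def checkin(self, conn):
        self._free.put(conn)

# All connections held (e.g. by half-open sockets): the next checkout
# waits the full timeout and then fails, just like the log above.
pool = TinyPool(size=2, timeout_ms=100)
held = [pool.checkout(), pool.checkout()]
try:
    pool.checkout()
except TimeoutError as exc:
    print(exc)
```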

This message was sent by Atlassian JIRA (v7.7.1#77002-sha1:e75ca93)

Taylan Develioglu (JIRA)

Nov 6, 2019, 9:57:02 AM
to puppe...@googlegroups.com
Taylan Develioglu updated an issue
Change By: Taylan Develioglu
Bringing down the network link on the PostgreSQL server causes the connection pool to hold on to connections that were already closed on the database server for (what seems to be) an infinite time.


The network stack on the client is never notified that these connections have been closed on the peer, so PuppetDB's connection pool still believes they are active.

This caused us to run out of available connections in the connection pool until we restarted PuppetDB. The PDBReadPool_pool_ActiveConnections metric also reports a value of 25 (maximum-pool-size).

 

Can the tcpKeepAlive option of the PostgreSQL JDBC driver be enabled to prevent this class of issue from happening?

 
h5. Network link going down on the PostgreSQL server 
{code:java}[di nov  5 12:49:51 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Down

[di nov  5 12:55:03 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[di nov  5 12:55:09 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Down
[di nov  5 12:55:10 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[di nov  5 12:55:11 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Down
[di nov  5 12:55:13 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Up, 10000 Mbps full duplex, Flow control: none

{code}
 
h5. Connections still in ESTABLISHED state on the client side 
{code:java}[root@puppetdb ~]# netstat -ntp|grep 10.197.29.74:5432
{code}
 
h5. Actual established connections. 
{code:java}[root@pgsqldb-puppetdb ~]# netstat -ntp|grep 10.198.174.11

tcp        0      0 10.197.29.74:5432       10.198.174.11:39186     ESTABLISHED 9292/postgres: pupp
tcp        0      0 10.197.29.74:5432       10.198.174.11:40098     ESTABLISHED 9369/postgres: pupp
tcp        0      0 10.197.29.74:5432       10.198.174.11:39678     ESTABLISHED 9338/postgres: pupp
tcp        0      0 10.197.29.74:5432       10.198.174.11:60902     ESTABLISHED 7652/postgres: pupp

{code}
 
h5. PuppetDB connection pool running out of available connections. 
{code:java}2019-11-06T12:43:50.504+01:00 WARN  [p.p.jdbc] Caught exception. Last attempt, throwing exception.

2019-11-06T12:43:50.506+01:00 WARN  [o.e.j.s.HttpChannel] /pdb/query/v4
javax.servlet.ServletException: java.sql.SQLTransientConnectionException: PDBReadPool - Connection is not available, request timed out after 3000ms.

{code}

Robert Roland (JIRA)

Nov 7, 2019, 12:43:03 PM
to puppe...@googlegroups.com
Robert Roland commented on Improvement PDB-4579
 
Re: Enable tcpKeepAlive on the postgres driver

After some digging, tcpKeepAlive won't help here: a bug in the underlying PostgreSQL JDBC driver prevents it from properly detecting dead connections.

We'll upgrade the PostgreSQL driver to handle this.
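The fix amounts to validating a connection before handing it out, and discarding it if the check fails (JDBC exposes this as Connection.isValid()). A hedged sketch of that validate-on-checkout pattern, with a made-up `is_alive` predicate standing in for the real liveness check:

```python
class DeadConnectionError(Exception):
    pass

def checkout_valid(pool, is_alive):
    """Pop connections from the pool until one passes a liveness
    check, discarding dead ones along the way. Sketch of the
    validate-before-use pattern; `is_alive` stands in for JDBC's
    Connection.isValid()."""
    while pool:
        conn = pool.pop(0)
        if is_alive(conn):
            return conn
        # Dead connection: silently dropped so it can't be handed
        # out again, instead of blocking the pool forever.
    raise DeadConnectionError("no live connections in pool")

# Hypothetical demo: connections modeled as (name, alive?) pairs.
pool = [("c1", False), ("c2", False), ("c3", True)]
conn = checkout_valid(pool, is_alive=lambda c: c[1])
print(conn[0])  # -> c3
```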

Robert Roland (JIRA)

Nov 7, 2019, 12:43:04 PM
to puppe...@googlegroups.com
Robert Roland assigned an issue to Robert Roland
 
Change By: Robert Roland
Assignee: Robert Roland

Robert Roland (JIRA)

Nov 22, 2019, 2:30:03 PM
to puppe...@googlegroups.com
Robert Roland updated an issue
Change By: Robert Roland
Release Notes Summary: Updated the PostgreSQL driver to properly detect dead connections before use. This resolves an issue where an unreachable PostgreSQL server could cause PuppetDB to exhaust its connection pool (thus requiring a restart).
Release Notes: Bug Fix

Zachary Kent (JIRA)

Jan 10, 2020, 12:22:05 PM
to puppe...@googlegroups.com
Zachary Kent updated an issue
Change By: Zachary Kent
Fix Version/s: PDB 6.7.4, PDB 6.3.7, PDB 5.2.12

Heston Hoffman (JIRA)

Jan 13, 2020, 5:20:06 PM
to puppe...@googlegroups.com
Heston Hoffman updated an issue
Change By: Heston Hoffman
Labels: resolved-issue-added