Jira (PDB-4579) Enable tcpKeepAlive on the postgres driver

Taylan Develioglu (JIRA)

Nov 6, 2019, 9:56:04 AM
to puppe...@googlegroups.com
Taylan Develioglu created an issue
 
PuppetDB / Improvement PDB-4579
Enable tcpKeepAlive on the postgres driver
Issue Type: Improvement
Affects Versions: PDB 6.7.1
Assignee: Unassigned
Components: PuppetDB
Created: 2019/11/06 6:55 AM
Priority: Normal
Reporter: Taylan Develioglu

Bringing down the network interface on our PostgreSQL server causes the connection pool to hold on to connections that were already closed on the database server for (what seems to be) an infinite time.

The network stack on the client is never notified that these connections have been closed on the peer, so PuppetDB's connection pool still believes they are active.

This caused us to run out of available connections in the connection pool until we restarted PuppetDB. The PDBReadPool_pool_ActiveConnections metric also reports a value of 25 (maximum-pool-size).

 

Can the tcpKeepAlive option of the PostgreSQL JDBC driver be enabled to prevent this class of issue from happening?
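(For reference, the driver's tcpKeepAlive=true flag ultimately sets SO_KEEPALIVE on the connection socket. A minimal stand-alone sketch of that socket-level effect — in Python rather than the driver's Java, purely as an illustration:)

```python
import socket

def keepalive_socket():
    """TCP socket with SO_KEEPALIVE enabled -- the socket-level effect
    of the JDBC driver's tcpKeepAlive=true option."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    return sock

# With keepalive enabled, the kernel periodically probes an idle peer
# and tears the connection down if the probes go unanswered, so
# half-open connections like the ones below would eventually be
# detected instead of lingering forever.
sock = keepalive_socket()
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE))  # non-zero
sock.close()
```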

 

Network link going down on the PostgreSQL server

 

[di nov  5 12:49:51 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Down
[di nov  5 12:55:03 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[di nov  5 12:55:09 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Down
[di nov  5 12:55:10 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[di nov  5 12:55:11 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Down
[di nov  5 12:55:13 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Up, 10000 Mbps full duplex, Flow control: none

 

Connections still in ESTABLISHED state on the client side

 

 
[root@puppetdb ~]# netstat -ntp|grep 10.197.29.74:5432
tcp6       0      0 10.198.174.11:39186     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:59996     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:50380     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:60952     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:33536     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:60902     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:35564     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:57950     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:45416     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:33644     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:39678     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:43846     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:55738     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:58098     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:34214     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:40098     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:41694     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:53760     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:33806     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:50358     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:60068     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:33530     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:38840     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:54616     10.197.29.74:5432       ESTABLISHED 47079/java          
tcp6       0      0 10.198.174.11:36002     10.197.29.74:5432       ESTABLISHED 47079/java     

 

Actual established connections.

 

[root@pgsqldb-puppetdb ~]# netstat -ntp|grep 10.198.174.11
tcp        0      0 10.197.29.74:5432       10.198.174.11:39186     ESTABLISHED 9292/postgres: pupp 
tcp        0      0 10.197.29.74:5432       10.198.174.11:40098     ESTABLISHED 9369/postgres: pupp 
tcp        0      0 10.197.29.74:5432       10.198.174.11:39678     ESTABLISHED 9338/postgres: pupp 
tcp        0      0 10.197.29.74:5432       10.198.174.11:60902     ESTABLISHED 7652/postgres: pupp 

 

PuppetDB connection pool running out of available connections.


2019-11-06T12:43:50.504+01:00 WARN  [p.p.jdbc] Caught exception. Last attempt, throwing exception.
2019-11-06T12:43:50.506+01:00 WARN  [o.e.j.s.HttpChannel] /pdb/query/v4
javax.servlet.ServletException: java.sql.SQLTransientConnectionException: PDBReadPool - Connection is not available, request timed out after 3000ms.
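The timeout above is the pool's bounded checkout wait: with all 25 connections pinned by half-open sockets, every new request waits the configured timeout (3000 ms here) and then fails. A toy illustration of that behavior (a Python sketch with made-up names, not PuppetDB/HikariCP code):

```python
import queue

class TinyPool:
    """Toy connection pool: checkout blocks for up to timeout_ms and
    then fails, mimicking the SQLTransientConnectionException above.
    Illustration only -- not the real pool implementation."""

    def __init__(self, size, timeout_ms):
        self._free = queue.Queue()
        self._timeout = timeout_ms / 1000.0
        for i in range(size):
            self._free.put(f"conn-{i}")

    def checkout(self):
        try:
            return self._free.get(timeout=self._timeout)
        except queue.Empty:
            raise TimeoutError(
                f"Connection is not available, request timed out "
                f"after {int(self._timeout * 1000)}ms.")

    def checkin(self, conn):
        self._free.put(conn)

# All connections held (e.g. by half-open sockets): the next checkout
# waits the full timeout and then fails, just like the log above.
pool = TinyPool(size=2, timeout_ms=100)
held = [pool.checkout(), pool.checkout()]
try:
    pool.checkout()
except TimeoutError as exc:
    print(exc)
```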

This message was sent by Atlassian JIRA (v7.7.1#77002-sha1:e75ca93)

Taylan Develioglu (JIRA)

Nov 6, 2019, 9:57:02 AM
to puppe...@googlegroups.com
Taylan Develioglu updated an issue
Change By: Taylan Develioglu
Bringing down the network link on the PostgreSQL server causes the connection pool to hold on to connections that were already closed on the database server for (what seems to be) an infinite time.


The network stack on the client is never notified that these connections have been closed on the peer, so PuppetDB's connection pool still believes they are active.

This caused us to run out of available connections in the connection pool until we restarted PuppetDB. The PDBReadPool_pool_ActiveConnections metric also reports a value of 25 (maximum-pool-size).

 

Can the tcpKeepAlive option of the PostgreSQL JDBC driver be enabled to prevent this class of issue from happening?

 
h5. Network link going down on the PostgreSQL server 
{code:java}[di nov  5 12:49:51 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Down

[di nov  5 12:55:03 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[di nov  5 12:55:09 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Down
[di nov  5 12:55:10 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Up, 10000 Mbps full duplex, Flow control: none
[di nov  5 12:55:11 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Down
[di nov  5 12:55:13 2019] bnx2x 0000:37:00.0 eno1: NIC Link is Up, 10000 Mbps full duplex, Flow control: none

{code}
 
h5. Connections still in ESTABLISHED state on the client side 
{code:java}[root@puppetdb ~]# netstat -ntp|grep 10.197.29.74:5432
{code}
 
h5. Actual established connections. 
{code:java}[root@pgsqldb-puppetdb ~]# netstat -ntp|grep 10.198.174.11

tcp        0      0 10.197.29.74:5432       10.198.174.11:39186     ESTABLISHED 9292/postgres: pupp
tcp        0      0 10.197.29.74:5432       10.198.174.11:40098     ESTABLISHED 9369/postgres: pupp
tcp        0      0 10.197.29.74:5432       10.198.174.11:39678     ESTABLISHED 9338/postgres: pupp
tcp        0      0 10.197.29.74:5432       10.198.174.11:60902     ESTABLISHED 7652/postgres: pupp

{code}
 
h5. PuppetDB connection pool running out of available connections. 
{code:java}2019-11-06T12:43:50.504+01:00 WARN  [p.p.jdbc] Caught exception. Last attempt, throwing exception.

2019-11-06T12:43:50.506+01:00 WARN  [o.e.j.s.HttpChannel] /pdb/query/v4
javax.servlet.ServletException: java.sql.SQLTransientConnectionException: PDBReadPool - Connection is not available, request timed out after 3000ms.

{code}

Robert Roland (JIRA)

Nov 7, 2019, 12:43:03 PM
to puppe...@googlegroups.com
Robert Roland commented on Improvement PDB-4579
 
Re: Enable tcpKeepAlive on the postgres driver

After some digging, tcpKeepAlive won't help here: a bug in the underlying PostgreSQL JDBC driver prevents it from properly detecting dead connections.

We'll upgrade the PostgreSQL driver to handle this.
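The fix amounts to validating a connection before handing it out, and discarding it if the check fails (JDBC exposes this as Connection.isValid()). A hedged sketch of that validate-on-checkout pattern, with a made-up `is_alive` predicate standing in for the real liveness check:

```python
class DeadConnectionError(Exception):
    pass

def checkout_valid(pool, is_alive):
    """Pop connections from the pool until one passes a liveness
    check, discarding dead ones along the way. Sketch of the
    validate-before-use pattern; `is_alive` stands in for JDBC's
    Connection.isValid()."""
    while pool:
        conn = pool.pop(0)
        if is_alive(conn):
            return conn
        # Dead connection: silently dropped so it can't be handed
        # out again, instead of blocking the pool forever.
    raise DeadConnectionError("no live connections in pool")

# Hypothetical demo: connections modeled as (name, alive?) pairs.
pool = [("c1", False), ("c2", False), ("c3", True)]
conn = checkout_valid(pool, is_alive=lambda c: c[1])
print(conn[0])  # -> c3
```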

Robert Roland (JIRA)

Nov 7, 2019, 12:43:04 PM
to puppe...@googlegroups.com
Robert Roland assigned an issue to Robert Roland
 
Change By: Robert Roland
Assignee: Robert Roland

Robert Roland (JIRA)

Nov 22, 2019, 2:30:03 PM
to puppe...@googlegroups.com
Robert Roland updated an issue
Change By: Robert Roland
Release Notes Summary: Updated the PostgreSQL driver to properly detect dead connections before use. This resolves an issue where an unreachable PostgreSQL server could cause PuppetDB to exhaust its connection pool (thus requiring a restart).
Release Notes: Bug Fix

Zachary Kent (JIRA)

Jan 10, 2020, 12:22:05 PM
to puppe...@googlegroups.com
Zachary Kent updated an issue
Change By: Zachary Kent
Fix Version/s: PDB 6.7.4, PDB 6.3.7, PDB 5.2.12

Heston Hoffman (JIRA)

Jan 13, 2020, 5:20:06 PM
to puppe...@googlegroups.com
Heston Hoffman updated an issue
Change By: Heston Hoffman
Labels: resolved-issue-added