Java driver 3.0.2 connection pool doesn't free connections despite maxConnectionLifetime


Christian Bayer

Nov 13, 2015, 3:15:54 PM
to mongodb-user
Hey folks,

we ran into a problem where the MongoDB Java driver v3.0.2 wouldn't free up connections to a node (blade5-9) after that node stepped down from primary to secondary, even though we have a maxConnectionLifetime of 20 minutes set for the pool. Normally you would expect all connections to that (now) secondary to be invalidated and closed. After the initial invalidation, however, they appear to reconnect automatically:
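To make the expectation concrete, here is a minimal sketch of the expiry rule we assume maxConnectionLifetime enforces on pooled connections. The class and method names are illustrative only, not driver API:

```java
import java.util.concurrent.TimeUnit;

// Hypothetical model of the expiry rule maxConnectionLifetime is expected
// to enforce on pooled connections; names are illustrative, not driver API.
public class PooledConnection {
    final long openedAtMillis;

    PooledConnection(long openedAtMillis) {
        this.openedAtMillis = openedAtMillis;
    }

    // A connection older than maxLifetimeMillis should be closed by the
    // pool rather than handed out again or reconnected.
    boolean isExpired(long nowMillis, long maxLifetimeMillis) {
        return nowMillis - openedAtMillis > maxLifetimeMillis;
    }

    public static void main(String[] args) {
        long maxLifetime = TimeUnit.MINUTES.toMillis(20);
        PooledConnection conn = new PooledConnection(0);
        System.out.println(conn.isExpired(TimeUnit.MINUTES.toMillis(5), maxLifetime)); // false
        System.out.println(conn.isExpired(TimeUnit.DAYS.toMillis(2), maxLifetime));    // true
    }
}
```

Under this rule, a connection that has been open for days (as shown below for server id 47733) should long since have been pruned.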

2015-11-08T04:00:03.847+0000 I COMMAND  [conn47669] command admin.$cmd command: replSetStepDown { replSetStepDown: 60.0 } keyUpdates:0 writeConflicts:0 numYields:0 reslen:37 locks:{} 1875ms
2015-11-08T04:00:03.847+0000 I WRITE    [conn47700] update aaaa_prd.bcwAAiaaic2t6aaadimBQMstoh query: { uuid: "bcxW6iaaiE6PaaaadokXVc88ta" }
[...] keyUpdates:0 writeConflicts:0 exception: Not primary while writing to aaaa_prd.bcwAAiaaic2t6aaadimBQMstoh code:10107 numYields:0 locks:{} 119ms
The client now detects that the connections are invalid:
[2015-11-08 04:00:03.727 GMT] INFO org.mongodb.driver.connection   Opened connection [connectionId{localValue:151324, serverValue:47700}] to blade5-9-vip:XXXX
[...]
[2015-11-08 04:00:03.847 GMT] WARN org.mongodb.driver.connection   Got socket exception on connection [connectionId{localValue:151324, serverValue:47700}] to blade5-9-vip:XXXX.
All connections to blade5-9-vip:XXXX will be closed.
On the MongoDB side, every connection invalidated this way now shows the following message:
2015-11-08T04:00:03.882+0000 I NETWORK  [conn47471] SocketException handling request, closing client connection: 9001 socket exception [SEND_ERROR] server [10.0.X.X:52310] 
The client detects the transition to secondary for that node:

[2015-11-08 04:00:03.868 GMT] INFO org.mongodb.driver.cluster   No server chosen by ReadPreferenceServerSelector{readPreference=primary}
from cluster description ClusterDescription{type=REPLICA_SET, connectionMode=MULTIPLE, all=[ServerDescription{address=blade2-9-vip:XXXX,
type=REPLICA_SET_SECONDARY, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 0, 1]}, minWireVersion=0, maxWireVersion=3,
electionId=null, maxDocumentSize=16777216, roundTripTimeNanos=484423, setName='rs10', canonicalAddress=blade2-9-vip:XXXX, hosts=[blade5-9-vip:XXXX,
blade2-9-vip:XXXX], passives=[], arbiters=[blade4-4-mongo:XXXX], primary='blade5-9-vip:XXXX', tagSet=TagSet{[]}},
ServerDescription{address=blade4-4-mongo:XXXX, type=REPLICA_SET_ARBITER, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 0, 1]},
minWireVersion=0, maxWireVersion=3, electionId=null, maxDocumentSize=16777216, roundTripTimeNanos=505066, setName='rs10', canonicalAddress=blade4-4-mongo:XXXX,
hosts=[blade5-9-vip:XXXX, blade2-9-vip:XXXX], passives=[], arbiters=[blade4-4-mongo:XXXX], primary='blade5-9-vip:XXXX', tagSet=TagSet{[]}},
ServerDescription{address=blade5-9-vip:XXXX, type=UNKNOWN, state=CONNECTING, exception={com.mongodb.MongoInterruptedException: Interrupted acquiring a permit to retrieve an item from the pool },
caused by {java.lang.InterruptedException}}]}. Waiting for 1000 ms before timing out
[...]
[2015-11-08 04:00:03.884 GMT] INFO cluster-ClusterId{value='563bb0081a761c2eecf2a777', description='null'}-blade5-9-vip:XXXX org.mongodb.driver.cluster
   Monitor thread successfully connected to server with description ServerDescription{address=blade5-9-vip:XXXX, type=REPLICA_SET_SECONDARY, state=CONNECTED,
ok=true, version=ServerVersion{versionList=[3, 0, 1]}, minWireVersion=0, maxWireVersion=3, electionId=null, maxDocumentSize=16777216, roundTripTimeNanos=803898, setName='rs10',
canonicalAddress=blade5-9-vip:XXXX, hosts=[blade5-9-vip:XXXX, blade2-9-vip:XXXX], passives=[], arbiters=[blade4-4-mongo:XXXX], primary='null', tagSet=TagSet{[]}}

Now, surprisingly, the Mongo client opens a connection to the former primary before that server has officially announced that it has become a secondary:

2015-11-08T04:00:03.891+0000 I NETWORK  [initandlisten] connection accepted from 10.0.X.X:63986 #47733 (23 connections now open)
2015-11-08T04:00:04.435+0000 I REPL     [ReplicationExecutor] replSetElect voting yea for blade2-9-vip:XXXX (0)
2015-11-08T04:00:05.839+0000 I REPL     [ReplicationExecutor] Member blade2-9-vip:XXXX is now in state PRIMARY
[...]
2015-11-08T04:10:01.845+0000 I REPL     [ReplicationExecutor] transition to RECOVERING
[...]
2015-11-08T04:13:58.438+0000 I REPL     [ReplicationExecutor] transition to SECONDARY
2015-11-08T04:13:58.439+0000 I COMMAND  [conn47994] command aaaa_prd$cmd command: compact { compact: "temp" } keyUpdates:0 writeConflicts:0 numYields:0 reslen:37 locks:{} 8298ms
2015-11-08T04:13:58.439+0000 I REPL     [ReplicationExecutor] syncing from: blade2-9-vip:XXXX
2015-11-08T04:13:58.452+0000 I NETWORK  [conn47994] end connection 10.0.4.129:34086 (120 connections now open)

While the recovery runs, all 100+ connections from our 4 clients reconnect, on top of the 10 connections used internally.
The connection with server id 47733 had been open for multiple days before, despite the 20-minute maximum connection lifetime and the 10-minute idle time. It looks like the pool reconnects these connections (which it shouldn't) and subsequently fails to shut them down properly.

$ grep "serverValue:47733" * |grep blade5-9
[2015-11-06 00:47:05.486 GMT] INFO org.mongodb.driver.connection   Opened connection [connectionId{localValue:39393, serverValue:47733}] to blade5-9-vip:XXXX
[2015-11-06 00:47:10.531 GMT] INFO org.mongodb.driver.connection   Closed connection [connectionId{localValue:39393, serverValue:47733}] to blade5-9-vip:XXXX because the pool has been closed.
[2015-11-08 04:00:03.891 GMT] INFO cluster-ClusterId{value='563bb0081a761c2eecf2a777', description='null'}-blade5-9-vip:XXXX org.mongodb.driver.connection 
Opened connection [connectionId{localValue:417362, serverValue:47733}] to blade5-9-vip:XXXX

Our setup consists of a primary, a secondary, and an arbiter, and typically there are only 10-12 connections open to any of our secondaries.
Any ideas? Could this be a bug in the driver? Do you need more logs?

Thanks,
Christian

Christian Bayer

Nov 17, 2015, 3:27:57 PM
to mongodb-user
Bump. Any ideas?

Jeff Yemin

Nov 17, 2015, 4:03:13 PM
to mongodb-user
Hey Christian,

The Mongo client, by design, opens a single connection to each non-hidden member of the replica set, regardless of whether that member has announced itself as a primary or a secondary. This connection is entirely separate from the connection pool and does not respect any settings for maximum idle time or lifetime. It is used to monitor the state of each member of the replica set in order to determine which server each operation should be sent to. Much of this is described in detail in the Server Discovery and Monitoring and Server Selection specifications.
There are also shorter blog posts about the former and the latter.
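The distinction above can be sketched as follows. This is an illustrative model only (hypothetical names, not driver internals): pool limits such as maxConnectionLifeTime govern pooled connections, while the single monitoring connection per server lives outside the pool and is exempt from them:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model only (hypothetical names, not driver internals):
// pool settings apply to pooled connections, while the single monitoring
// connection per server is managed outside the pool.
public class ServerConnections {
    static class Conn {
        final long openedAt;
        final boolean isMonitor;
        Conn(long openedAt, boolean isMonitor) {
            this.openedAt = openedAt;
            this.isMonitor = isMonitor;
        }
    }

    final List<Conn> all = new ArrayList<>();

    // Prune only pooled connections that exceeded their lifetime;
    // the monitor connection is deliberately exempt from pool limits.
    void prune(long now, long maxLifetimeMillis) {
        all.removeIf(c -> !c.isMonitor && now - c.openedAt > maxLifetimeMillis);
    }

    public static void main(String[] args) {
        ServerConnections sc = new ServerConnections();
        sc.all.add(new Conn(0, false)); // pooled connection, opened long ago
        sc.all.add(new Conn(0, true));  // monitor connection, same age
        sc.prune(2_000_000L, 1_200_000L);
        System.out.println(sc.all.size());           // only the monitor survives
        System.out.println(sc.all.get(0).isMonitor);
    }
}
```

So a single long-lived connection per member is expected and is not a pool bug by itself.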


A few questions about your description. You write:

While the recovery runs, all 100+ connections from our 4 clients reconnect, on top of the 10 connections used internally
  1. Are you saying that the driver opens 100+ connections to blade5-9 before it becomes a secondary?  
  2. What read preference is used by the application?  If it's always primary, I wouldn't expect any connections to be opened unless perhaps you're also configuring a minimum connection pool size.

Regards,
Jeff


Christian Bayer

Nov 17, 2015, 4:18:21 PM
to mongodb-user
Hey Jeff, thanks for your reply!

1. Are you saying that the driver opens 100+ connections to blade5-9 before it becomes a secondary?  

Correct. Blade5-9 is a primary that has 100+ connections open when it receives the stepdown command. (The fact that it has 100+ connections open might already be a problem, because a large portion of them don't seem to be busy.) Then, even before blade5-9 logs that it went into the recovering state to subsequently become a secondary, all 100+ connections instantly reconnect.

2. What read preference is used by the application?  If it's always primary, I wouldn't expect any connections to be opened unless perhaps you're also configuring a minimum connection pool size.

It is primary only; we don't read from secondaries at all. Sorry, I forgot to include that in the initial message. There is no minimum connection pool size, only the maximum lifetime for connections in the pool.
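For reference, the settings described in this thread map onto the 3.0.x driver's MongoClientOptions roughly as follows. This is a configuration sketch, not our exact code; the host name is a placeholder:

```java
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ReadPreference;

// Sketch of the configuration described in the thread (3.0.x driver):
// 20-minute maximum lifetime and 10-minute idle time for pooled
// connections, reads from the primary only, no minimum pool size set.
MongoClientOptions options = MongoClientOptions.builder()
        .readPreference(ReadPreference.primary())
        .maxConnectionLifeTime(20 * 60 * 1000)  // milliseconds
        .maxConnectionIdleTime(10 * 60 * 1000)  // milliseconds
        .build();
MongoClient client = new MongoClient("blade5-9-vip", options);
```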

Christian

Jeff Yemin

Nov 17, 2015, 4:24:17 PM
to mongod...@googlegroups.com
Hey Christian,

Can you provide the server logs for blade5-9 for the recovery period? If there is sensitive information in the logs and you're willing to share them with us but not with the whole world, I can provide you a way to upload them securely.


Regards,
Jeff

