Failover not working with mogngodb 2.6 and java Driver 2.12.0

902 views
Skip to first unread message

Stephan

unread,
Apr 25, 2014, 9:26:05 AM4/25/14
to mongod...@googlegroups.com
Hi,

we got a strange behavior today. Just now, the primary mongo node crashed, mongodb just stopped, process exited.
But this is not the major problem, problem is, that failover in java-Driver is not working. 

The failover in the mongo replicaset worked flawlessly, mongo node 2 took over as primary within seconds, but the java-Driver did not get this. We keep getting this exception:
  │java.lang.RuntimeException: com.mongodb.MongoServerSelectionException: Unable to connect to any server that matches {serverSelectors=[ReadPreferenceServerSelector{readPreference=primary}, Latenc│
  │yMinimizingServerSelector{acceptableLatencyDifference=15 ms}]}                                                                                                                                    │
  │        at de.caluga.morphium.writer.MorphiumWriterImpl$11.run(MorphiumWriterImpl.java:1271)                                                                                                      │
  │        at de.caluga.morphium.writer.MorphiumWriterImpl.submitAndBlockIfNecessary(MorphiumWriterImpl.java:697)                                                                                    │
  │        at de.caluga.morphium.writer.MorphiumWriterImpl.set(MorphiumWriterImpl.java:1277)                                                                                                         │
  │        at de.caluga.morphium.Morphium.set(Morphium.java:585)                                                                                                                                     │
  │        at de.caluga.morphium.Morphium.set(Morphium.java:580)                                                                                                                                     │
  │        at de.caluga.morphium.messaging.Messaging.run(Messaging.java:93)                                                                                                                          │
20│Caused by: com.mongodb.MongoServerSelectionException: Unable to connect to any server that matches {serverSelectors=[ReadPreferenceServerSelector{readPreference=primary}, LatencyMinimizingServer│ot
el│Selector{acceptableLatencyDifference=15 ms}]}                                                                                                                                                     │
ja│        at com.mongodb.BaseCluster.getServer(BaseCluster.java:80)                                                                                                                                 │
  │        at com.mongodb.DBTCPConnector.getServer(DBTCPConnector.java:663)                                                                                                                          │
  │        at com.mongodb.DBTCPConnector.access$300(DBTCPConnector.java:42)                                                                                                                          │
  │        at com.mongodb.DBTCPConnector$MyPort.getConnection(DBTCPConnector.java:508)                                                                                                               │
  │        at com.mongodb.DBTCPConnector$MyPort.get(DBTCPConnector.java:456)                                                                                                                         │
  │        at com.mongodb.DBTCPConnector.getPrimaryPort(DBTCPConnector.java:414)                                                                                                                     │
  │        at com.mongodb.DBCollectionImpl.update(DBCollectionImpl.java:257)                                                                                                                         │
  │        at com.mongodb.DBCollection.update(DBCollection.java:191)  

We got several thousand of those exception in log. First i thought it’s a bigger issue with the replicaset, but restarting the client helped at once - everything workes flawlessly again.

Any Idea what that could be?

Jeff Yemin

unread,
Apr 25, 2014, 9:41:03 AM4/25/14
to mongod...@googlegroups.com
Hi Stephan,

Thanks for reporting this.

How are you constructing the MongoClient in your application?  If using a URI, what is it?  If not, what seed list of ServerAddresses are you passing?

Can you also enable INFO-level logging and provide additional client logs?  I'm looking for entries from this logger:

   com.mongodb.driver.cluster
   

Regards,
Jeff


--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.
To view this discussion on the web visit https://groups.google.com/d/msgid/mongodb-user/7d84ebed-bdb6-4b2c-a11a-1aa33fad27a5%40googlegroups.com.
For more options, visit https://groups.google.com/d/optout.

Stephan

unread,
Apr 25, 2014, 11:12:45 AM4/25/14
to mongod...@googlegroups.com
Hi Jeff,

we pass on server Adresses, no URI. Hmm... the list of ServerAdresses is in our case, pmongo1:27017, pmongo2:27017, pmongo3:27017

Or what info excactly do you need?

I will try to reproduce this error here somehow, i obviously won't to that on the productional environment again ;-)

I'll get back to you as soon as I got that reproduced.

Jeff Yemin

unread,
Apr 25, 2014, 12:40:12 PM4/25/14
to mongod...@googlegroups.com
That's enough in terms of the MongoClient construction.  Now I just would like to see the application logs.


Stephan

unread,
Apr 25, 2014, 3:43:13 PM4/25/14
to mongod...@googlegroups.com
ok, i can reproduce the error quite simply.

I have a cluster running on localhost, 3 Nodes: 
tst:PRIMARY> rs.config()
{
"_id" : "tst",
"version" : 1,
"members" : [
{
"_id" : 0,
"host" : "127.0.0.1:27017",
"priority" : 10
},
{
"_id" : 1,
"host" : "127.0.0.1:27018",
"priority" : 5
},
{
"_id" : 2,
"host" : "127.0.0.1:27019",
"arbiterOnly" : true
}
]
}

The Test just  writes Data into one collection... and reads it with a different one.

As soon as I kill node number 2 - the writing proccess freezes during the write - for minutes! Nothing happens. When I start node 2 again, everything works fine.

This test worked with Driver Version 2.11.x fine - when killing node #2, there sometimes was a read error (exception) but, it continued...

I set the logging Level of com.mongodb to ALL, but it's not telling very much:
pr 25, 2014 9:30:57 PM com.mongodb.MultiServerCluster invalidateOldPrimaries
Information: Replica set primary has changed to 127.0.0.1:27017
21:30:58,212 INFO  [main] FailoverTests: reading...101
21:30:58,212 INFO  [main] FailoverTests: Write Errors now: 0
21:30:59,214 INFO  [main] FailoverTests: reading...101
21:30:59,214 INFO  [main] FailoverTests: Write Errors now: 0
21:31:00,218 INFO  [main] FailoverTests: reading...101
21:31:00,218 INFO  [main] FailoverTests: Write Errors now: 0
21:31:01,220 INFO  [main] FailoverTests: reading...101
21:31:01,220 INFO  [main] FailoverTests: Write Errors now: 0
21:31:02,222 INFO  [main] FailoverTests: reading...101
21:31:02,222 INFO  [main] FailoverTests: Write Errors now: 0
Apr 25, 2014 9:31:02 PM com.mongodb.MultiServerCluster invalidateOldPrimaries
Information: Replica set primary has changed to 127.0.0.1:27017

This is happening, wenn killing node 2... But the message is plain wrong: node two is on port 27018 and never was primary, the primary did not change!
When bringing up node #2 again, this is logged:
Apr 25, 2014 9:33:06 PM com.mongodb.ServerStateNotifier run
Information: Monitor thread successfully connected to server with description ServerDescription{address=127.0.0.1:27018, type=ReplicaSetSecondary, hosts=[127.0.0.1:27017, 127.0.0.1:27018], passives=[], arbiters=[127.0.0.1:27019], primary='null', maxDocumentSize=16777216, maxMessageSize=48000000, maxWriteBatchSize=1000, tags={}, setName='tst', setVersion='1', averagePingTimeNanos=472000, ok=true, state=Connected, version=ServerVersion{versionList=[2, 6, 0]}, minWireVersion=0, maxWireVersion=2}
Apr 25, 2014 9:33:07 PM com.mongodb.MultiServerCluster invalidateOldPrimaries
Information: Replica set primary has changed to 127.0.0.1:27017

Even when all nodes are up, the message still appears:
Apr 25, 2014 9:34:07 PM com.mongodb.MultiServerCluster invalidateOldPrimaries
Information: Replica set primary has changed to 127.0.0.1:27017


When I kill node 1 - the primary node - i get these exceptions:
com.mongodb.MongoServerSelectionException: Unable to connect to any server that matches {serverSelectors=[ReadPreferenceServerSelector{readPreference=primary}, LatencyMinimizingServerSelector{acceptableLatencyDifference=15 ms}]}
at com.mongodb.BaseCluster.getServer(BaseCluster.java:80)
at com.mongodb.DBTCPConnector.getServer(DBTCPConnector.java:663)
at com.mongodb.DBTCPConnector.access$300(DBTCPConnector.java:42)
at com.mongodb.DBTCPConnector$MyPort.getConnection(DBTCPConnector.java:508)
at com.mongodb.DBTCPConnector$MyPort.get(DBTCPConnector.java:456)
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:291)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:276)
at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:78)
at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:61)
at com.mongodb.DB.getCollectionNames(DB.java:510)
at com.mongodb.DB.collectionExists(DB.java:553)
at de.caluga.morphium.writer.MorphiumWriterImpl$1.run(MorphiumWriterImpl.java:126)
at de.caluga.morphium.writer.MorphiumWriterImpl.submitAndBlockIfNecessary(MorphiumWriterImpl.java:697)
at de.caluga.morphium.writer.MorphiumWriterImpl.store(MorphiumWriterImpl.java:234)
at de.caluga.morphium.Morphium.store(Morphium.java:1482)
at de.caluga.morphium.Morphium.store(Morphium.java:1474)
at de.caluga.morphium.Morphium.store(Morphium.java:1470)
at de.caluga.test.mongo.suite.FailoverTests$WriterThread.run(FailoverTests.java:95)

Problem is: once I restart node #1, the exception still occurs, no write takes place, no access to Mongo possible.

Apr 25, 2014 9:35:12 PM com.mongodb.ServerStateNotifier run
Information: Monitor thread successfully connected to server with description ServerDescription{address=127.0.0.1:27018, type=ReplicaSetSecondary, hosts=[127.0.0.1:27017, 127.0.0.1:27018], passives=[], arbiters=[127.0.0.1:27019], primary='null', maxDocumentSize=16777216, maxMessageSize=48000000, maxWriteBatchSize=1000, tags={}, setName='tst', setVersion='1', averagePingTimeNanos=377245, ok=true, state=Connected, version=ServerVersion{versionList=[2, 6, 0]}, minWireVersion=0, maxWireVersion=2}
21:35:12,550 ERROR [Thread-4] FailoverTests: Error during storage
com.mongodb.MongoServerSelectionException: Unable to connect to any server that matches {serverSelectors=[ReadPreferenceServerSelector{readPreference=primary}, LatencyMinimizingServerSelector{acceptableLatencyDifference=15 ms}]}
at com.mongodb.BaseCluster.getServer(BaseCluster.java:80)
at com.mongodb.DBTCPConnector.getServer(DBTCPConnector.java:663)
at com.mongodb.DBTCPConnector.access$300(DBTCPConnector.java:42)
at com.mongodb.DBTCPConnector$MyPort.getConnection(DBTCPConnector.java:508)
at com.mongodb.DBTCPConnector$MyPort.get(DBTCPConnector.java:456)
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:291)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:276)
at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:78)
at com.mongodb.DBCollectionImpl.find(DBCollectionImpl.java:61)
at com.mongodb.DB.getCollectionNames(DB.java:510)
at com.mongodb.DB.collectionExists(DB.java:553)
at de.caluga.morphium.writer.MorphiumWriterImpl$1.run(MorphiumWriterImpl.java:126)
at de.caluga.morphium.writer.MorphiumWriterImpl.submitAndBlockIfNecessary(MorphiumWriterImpl.java:697)
at de.caluga.morphium.writer.MorphiumWriterImpl.store(MorphiumWriterImpl.java:234)
at de.caluga.morphium.Morphium.store(Morphium.java:1482)
at de.caluga.morphium.Morphium.store(Morphium.java:1474)
at de.caluga.morphium.Morphium.store(Morphium.java:1470)
at de.caluga.test.mongo.suite.FailoverTests$WriterThread.run(FailoverTests.java:99)
21:35:12,753 ERROR [Thread-4] FailoverTests: Error during storage

There is nothing being logged in the case, this exception occurred...


while I'm preparing this post, the behavior changed. Now, when killing Node#1, I get a couple of exceptions, and then the writes just stop. - same as the behavior as when killing node number 2


Hope that helps....

If you need anything, please tell me. This problem is really causing downtimes whenever _any_ server of the replicaset goes down!

Jeff Yemin

unread,
Apr 25, 2014, 6:00:52 PM4/25/14
to mongod...@googlegroups.com
Can you double-check whether there is a SEVERE log message starting with:

   "Expecting replica set member from set"

or any other SEVERE log messages, or an INFO log message starting with

   "Server %s is no longer a member of the replica set"  

where %s is the name of the primary


Thanks,
Jeff

Jeff Yemin

unread,
Apr 25, 2014, 9:38:35 PM4/25/14
to mongod...@googlegroups.com
Hi Stephan,

Can you try to reproduce the issue using the latest 2.12.1-SNAPSHOT release, available at https://oss.sonatype.org/content/repositories/snapshots/org/mongodb/mongo-java-driver/2.12.1-SNAPSHOT/.  There are a number of bug fixes in that release that I think may be relevant to what you are experiencing.


Thanks,
Jeff

Stephan

unread,
Apr 28, 2014, 3:37:04 AM4/28/14
to mongod...@googlegroups.com
Hi Jeff,

just finished testing this with the latest snapshot, and it seems to have fixed the problem. The endless loop with exceptions is gone. Do you know when the 2.12.1 will be available? I need to install it on our production system and I hesitate a bit using a snapshot version.

Jeff Yemin

unread,
Apr 28, 2014, 6:50:42 AM4/28/14
to mongod...@googlegroups.com
Today or tomorrow.  I don't anticipate any more changes going in from the snapshot you tested.


Jeff


--
You received this message because you are subscribed to the Google Groups "mongodb-user"
group.
 
For other MongoDB technical support options, see: http://www.mongodb.org/about/support/.
---
You received this message because you are subscribed to the Google Groups "mongodb-user" group.
To unsubscribe from this group and stop receiving emails from it, send an email to mongodb-user...@googlegroups.com.
To post to this group, send email to mongod...@googlegroups.com.
Visit this group at http://groups.google.com/group/mongodb-user.

Pontus Naimell

unread,
Apr 28, 2014, 9:51:16 AM4/28/14
to mongod...@googlegroups.com

Just wanted to add
This may be the same issue

Apr 28, 2014 10:11:23 AM com.mongodb.MultiServerCluster handleReplicaSetMemberChanged
SEVERE: Expecting replica set member from set 'app2', but found one from set 'null'.  Removing XXXXXXXX:40000 from client view of cluster.
Apr 28, 2014 10:11:23 AM com.mongodb.MultiServerCluster handleReplicaSetMemberChanged
SEVERE: Expecting replica set member from set 'app2', but found one from set 'null'.  Removing XXXXXXXX:27017 from client view of cluster.
Apr 28, 2014 10:13:14 AM com.sun.jersey.spi.container.ContainerResponse mapMappableContainerException
SEVERE: The RuntimeException could not be mapped to a response, re-throwing to the HTTP container
com.mongodb.MongoServerSelectionException: Unable to connect to any server that matches {serverSelectors=[ReadPreferenceServerSelector{readPreference=secondaryPreferred}, LatencyMinimizingServerSelector{acceptableLatencyDifference=15 ms}]}
at com.mongodb.BaseCluster.getServer(BaseCluster.java:80)
...


MongoDB version 2.6.0
Driver  2.12.0

/Pontus

craiggwilson

unread,
Apr 28, 2014, 2:51:34 PM4/28/14
to mongod...@googlegroups.com
Yes, that is the same issue, or at least one that will get fixed by the same change.

Jeff Yemin

unread,
Apr 29, 2014, 11:15:46 AM4/29/14
to mongod...@googlegroups.com

Pontus Naimell

unread,
Apr 30, 2014, 6:02:56 AM4/30/14
to mongod...@googlegroups.com
Will you bump up the version number to make it official ?

Pontus Naimell

unread,
Apr 30, 2014, 6:07:42 AM4/30/14
to mongod...@googlegroups.com
Ok, so now maven updated... this was just lag

Michael

unread,
Apr 30, 2014, 9:05:51 PM4/30/14
to mongod...@googlegroups.com
Perfect.

在 2014年4月30日星期三UTC+8下午6时07分42秒,Pontus Naimell写道:
Reply all
Reply to author
Forward
0 new messages