Failover problems (again) with Java Driver


Stephan

unread,
Feb 25, 2013, 2:15:10 AM2/25/13
to mongod...@googlegroups.com
Hi,

we still have a lot of problems with failover, especially when the master node goes down.
Failover in Mongo itself works fine - the master goes down and a secondary becomes master after a couple of seconds... so far, so good.

BUT: the Java driver still tries to talk to the old master, whether it's there or not:
com.mongodb.MongoException$Network: can't call something : mongo1/10.20.100.55:27017/holidayinsider
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:295)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:257)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:310)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:295)
at com.mongodb.DBCursor._check(DBCursor.java:368)
at com.mongodb.DBCursor._hasNext(DBCursor.java:459)
at com.mongodb.DBCursor._fill(DBCursor.java:518)
at com.mongodb.DBCursor.length(DBCursor.java:532)
at de.caluga.morphium.query.QueryImpl.get(QueryImpl.java:599)
at de.caluga.morphium.Morphium.findById(Morphium.java:1442)
at de.holidayinsider.component.http.server.networkhandler.HiWebUtils.getFreeMarkerDto(HiWebUtils.java:844)
at de.holidayinsider.component.http.server.networkhandler.HiNetworkDispatcher.doGet(HiNetworkDispatcher.java:446)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:621)
--
at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:589)
at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: couldn't connect to [mongo1/10.20.100.55:27017] bc:java.net.ConnectException: Connection refused
at com.mongodb.DBPort._open(DBPort.java:214)
at com.mongodb.DBPort.go(DBPort.java:107)
at com.mongodb.DBPort.call(DBPort.java:74)
at com.mongodb.DBTCPConnector.innerCall(DBTCPConnector.java:286)
... 45 more

I always end up with these exceptions; I kept trying for several minutes - no luck. No reads, no writes, nothing works.

When the old master is up but running as a secondary, everything seems to work fine...

Can I influence the failover behavior with some settings? 
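For context, a minimal sketch of the kind of settings in question, assuming the 2.x Java driver (MongoClient appeared in 2.10); the hostnames and timeout values below are placeholders, not a recommendation:

```java
// Sketch only: connection setup for a replica set with the 2.x Java driver.
// Hostnames and timeout values are illustrative placeholders.
import com.mongodb.MongoClient;
import com.mongodb.MongoClientOptions;
import com.mongodb.ServerAddress;
import java.util.Arrays;

public class MongoSetup {
    public static void main(String[] args) throws Exception {
        MongoClientOptions options = MongoClientOptions.builder()
                .connectTimeout(5000)   // ms to wait when opening a connection
                .socketTimeout(10000)   // ms to wait on socket reads
                .build();

        // Seed list: the driver discovers the rest of the replica set from these.
        MongoClient client = new MongoClient(Arrays.asList(
                new ServerAddress("mongo1", 27017),
                new ServerAddress("mongo2", 27017)), options);
    }
}
```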

Thanks for any help.

Kind Regards,

Stephan 

Stephan

unread,
Feb 25, 2013, 2:20:42 AM2/25/13
to mongod...@googlegroups.com
Update: no, failover does not work correctly even when the master server is up but has become a secondary (after rs.stepDown()):
2013-02-25 08:20:04,756 ERROR [de.caluga.morphium.messaging.Messaging] (Thread-12) Unhandled exception not master
com.mongodb.MongoException: not master
at com.mongodb.CommandResult.getException(CommandResult.java:100)
at com.mongodb.CommandResult.throwOnError(CommandResult.java:134)
at com.mongodb.DBTCPConnector._checkWriteError(DBTCPConnector.java:142)
at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:183)
at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:155)
at com.mongodb.DBApiLayer$MyCollection.remove(DBApiLayer.java:289)
at com.mongodb.DBCollection.remove(DBCollection.java:250)
at de.caluga.morphium.WriterImpl.delete(WriterImpl.java:421)
at de.caluga.morphium.Morphium.delete(Morphium.java:1816)
at de.caluga.morphium.messaging.Messaging.run(Messaging.java:77)
2013-02-25 08:20:07,280 ERROR [de.caluga.morphium.messaging.Messaging] (Thread-12) Unhandled exception not master
2013-02-25 08:20:09,812 ERROR [de.caluga.morphium.messaging.Messaging] (Thread-12) Unhandled exception not master
(same "not master" stack trace repeated for each occurrence)

One word about our setup:
2 Nodes (Master + Secondary) + 1 Arbiter.


Why is the Mongo driver always trying to talk to the master?

We needed to set the ReadPreference to Primary because there were problems with writes being sent to the slaves, as I already posted here.
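For reference, forcing reads to the primary with the 2.x driver looks roughly like this (a sketch; `client` is assumed to be an existing `MongoClient`):

```java
// Sketch: route all reads to the primary (2.x Java driver).
client.setReadPreference(com.mongodb.ReadPreference.primary());
```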

Any idea what this could be?

Kind Regards,

Stephan

Cwolf

unread,
Feb 25, 2013, 10:00:24 AM2/25/13
to mongod...@googlegroups.com
Hi Stephan 
I am not sure you are following the Java driver's design correctly. The driver will raise an error if it cannot talk to the primary (it does / will find the new primary eventually); your application code needs to handle these errors and either retry or return an error upstream. On writing to secondaries: you never write to secondaries from a driver - you always write to the primary. You may read from the secondaries.
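The retry pattern described here can be sketched generically. The helper below is illustrative, not part of the driver; in real code you would catch `MongoException` (or `MongoException.Network`) rather than plain `Exception`:

```java
import java.util.concurrent.Callable;

// Illustrative retry helper: ride out a short failover window by retrying
// the operation instead of letting the exception escape immediately.
public class RetryExample {
    static <T> T retry(Callable<T> op, int maxAttempts, long sleepMillis) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) { // real code: catch MongoException instead
                last = e;
                Thread.sleep(sleepMillis);
            }
        }
        throw last; // all attempts failed: surface the last error upstream
    }

    public static void main(String[] args) throws Exception {
        // Simulated operation: fails twice (as during a failover), then succeeds.
        final int[] calls = {0};
        String result = retry(new Callable<String>() {
            public String call() throws Exception {
                if (++calls[0] < 3) throw new RuntimeException("not master");
                return "ok";
            }
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```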

Chase

Stephan

unread,
Apr 18, 2013, 10:38:43 AM4/18/13
to mongod...@googlegroups.com
Hi Cwolf,

That is correct - but I did not write to secondaries, the driver did! The application does handle the exceptions for a while; the problem is that these exceptions keep occurring as long as the old master has not returned to the primary state.
In other words: we have a replica set of 4 nodes, and node1 is usually the primary (due to a higher priority). Taking it down makes node2 take over the primary role. That works fine so far...
Unfortunately, all writes are still being sent to node1 - although it's not accessible - and end up in an exception ("can't call something").
And when node1 comes back up while node2 stays primary, the exceptions still occur.

Unfortunately, this only happens in our production environment - do you think the problem is that there is a hidden node?