On Thursday, October 4, 2012 7:37:17 AM UTC-4, Hummingbird wrote:
> I have a question about failover in a replica set. Here is how I have
> configured my servers.
> I have 1 PRIMARY server, 1 SECONDARY server and 1 ARBITER.
> I used the default configuration (for WriteConcern), i.e. slaveOk=false,
> safe=false, fsync=false.
> My application stores more than 40k records in the DB on startup.
> I keep slaveOk=false because one collection stores the set of ids
> (indices), and multiple threads use it to insert and fetch the index
> under which a record is stored in another collection. So there must NOT
> be any delay in reads of the indices.
> Now to test the replication fail-over, I abruptly shut down the PRIMARY
> server.
> I observed some delay before the voting happens and the next PRIMARY is
> elected.
> During this time my application tries to insert records into the
> database and keeps getting exceptions (until a PRIMARY is up and running).
> As no PRIMARY is available yet, there is some *LOSS* of data. When the
> application connects to the new PRIMARY it resumes the data
> insertion.
> The downside here is that I am losing data.
> To fix this issue I looked at the WriteConcern option that controls the
> write behavior. One of the options I found was *REPLICAS_SAFE*.
> The Mongo doc for REPLICAS_SAFE says: Exceptions are raised for network
> issues, and server errors; *waits for at least 2 servers for the write
> operation*
> So I changed my MongoURI configuration and added the following options:
> *safe=true&w=2&wtimeout=0*
> mongo.db.uri=mongodb://localhost:27021,localhost:27022,localhost:27023/?*safe=true&w=2&wtimeout=0*
> With this setting I restarted all the mongo servers and my application.
> I then stopped my PRIMARY server intermittently, hoping that the inserts
> would now wait until one of the SECONDARY servers became PRIMARY.
> However, I got the following exception:
> *WARNING: emptying DBPortPool to loclhost:27022 b/c of error*
> *java.io.IOException: couldn't connect to [/loclhost:27022]
> bc:java.net.ConnectException: Connection refused: connect*
> *…*
> *…*
> *Oct 3, 2012 9:38:24 AM com.mongodb.DBTCPConnector _set*
> *WARNING: Master switching from localhost:27022 to localhost:27021*
> Even after the warning that the *Master* had changed, the insertion
> operation did not resume.
> I am not able to figure out the real reason. What am I
> missing here? Is this how *REPLICAS_SAFE* is supposed to behave?
> What is the correct way to handle the situation where the PRIMARY goes
> down (with slaveOk=false) and still prevent data loss?
> Many Thanks.
> Hummingbird
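
One way to avoid dropping inserts during the election window described above is to retry a failed write on the application side until a new PRIMARY is available. The following is only a minimal sketch of that idea, not the driver's own mechanism: the class RetryOnFailover, the method withRetry, and the attempt and pause values are hypothetical names chosen for illustration, and the "insert" is a stub so the example runs without a MongoDB server. In real code the catch would probably be narrowed to the driver's network exception type, and retried inserts would need to be safe to repeat.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

public class RetryOnFailover {

    /**
     * Runs op, retrying up to maxAttempts times with a fixed pause between
     * attempts. Rethrows the last exception if every attempt fails.
     * (Hypothetical helper, not part of any MongoDB driver.)
     */
    public static <T> T withRetry(Callable<T> op, int maxAttempts, long pauseMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.call();
            } catch (Exception e) {
                // Assumption: any exception here means "no PRIMARY yet".
                last = e;
                if (attempt < maxAttempts) {
                    Thread.sleep(pauseMillis);
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Stub standing in for an insert: fails twice, as if an election
        // were still in progress, then succeeds.
        String result = withRetry(() -> {
            if (++calls[0] < 3) {
                throw new IOException("no primary yet");
            }
            return "inserted";
        }, 5, 10);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```

Note that in a PRIMARY + SECONDARY + ARBITER set, only one data-bearing member remains once the PRIMARY is down, so a write with w=2 cannot be acknowledged by two servers, and wtimeout=0 means the write waits with no time limit. That may be one reason the inserts appear not to resume after the master switch.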