Replica Set - Java driver - Primary Failover

RajAhm

Oct 28, 2010, 5:41:14 PM

to mongodb-user

I have set up a replica set with three mongod servers on localhost.

I am testing failover from the primary. I am using Java driver version
2.2.

I have written an application and wanted to test how the Java driver
behaves during failover, and I am puzzled, to say the least.

As soon as I CTRL-C to kill the primary mongod server, I see the
following messages in Eclipse:

===
WARNING: node down: 171.69.120.229:27018 java.io.IOException: couldn't
connect to [/171.69.120.229:27018] bc:java.net.ConnectException:
Connection refused: connect
Oct 28, 2010 2:33:38 PM com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: 171.69.120.229:27018 java.io.IOException: couldn't
connect to [/171.69.120.229:27018] bc:java.net.ConnectException:
Connection refused: connect
Oct 28, 2010 2:33:44 PM com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: 171.69.120.229:27018 java.io.IOException: couldn't
connect to [/171.69.120.229:27018] bc:java.net.ConnectException:
Connection refused: connect
Oct 28, 2010 2:33:50 PM com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: 171.69.120.229:27018 java.io.IOException: couldn't
connect to [/171.69.120.229:27018] bc:java.net.ConnectException:
Connection refused: connect
Oct 28, 2010 2:33:56 PM com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: 171.69.120.229:27018 java.io.IOException: couldn't
connect to [/171.69.120.229:27018] bc:java.net.ConnectException:
Connection refused: connect
Oct 28, 2010 2:34:02 PM com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: 171.69.120.229:27018 java.io.IOException: couldn't
connect to [/171.69.120.229:27018] bc:java.net.ConnectException:
Connection refused: connect
Oct 28, 2010 2:34:08 PM com.mongodb.ReplicaSetStatus$Node update
==================================================

Then I try to persist some JSON records and I get the following error:

Oct 28, 2010 2:28:33 PM com.mongodb.DBTCPConnector$MyPort error
SEVERE: MyPort.error called
com.mongodb.MongoInternalException: DBPort.findOne failed
at com.mongodb.DBPort.findOne(DBPort.java:123)
at com.mongodb.DBPort.runCommand(DBPort.java:129)
at com.mongodb.DBTCPConnector._checkWriteError(DBTCPConnector.java:123)
at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:153)
at com.mongodb.DBApiLayer$MyCollection.update(DBApiLayer.java:289)
at com.mongodb.DBCollection.save(DBCollection.java:534)
at com.google.code.morphia.DatastoreImpl.save(DatastoreImpl.java:638)
at com.google.code.morphia.DatastoreImpl.save(DatastoreImpl.java:685)
at com.google.code.morphia.DatastoreImpl.save(DatastoreImpl.java:679)
at com.cisco.cas.paymentgateway.CreditCardProfileThread.run(CreditCardProfileThread.java:22)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.EOFException
at org.bson.io.Bits.readFully(Bits.java:32)
at com.mongodb.Response.<init>(Response.java:34)
at com.mongodb.DBPort.go(DBPort.java:95)
at com.mongodb.DBPort.findOne(DBPort.java:115)
... 10 more

I believed Java driver 2.2 would work out which member is the new
primary, but that is not what I am observing.

Once I get an exception, I also wait 20 seconds for the replica set to
elect a new primary.
I am not sure how the Java driver and the application should behave
once the primary fails.
I am not sure this is clearly documented with a use case. This is a very
typical deployment where the primary fails and a secondary takes over, and
the application logic should be oblivious to the internal failure/recovery
mechanism.

Please enlighten me

Thanks
Rajan Bhatt

Eliot Horowitz

Oct 28, 2010, 9:46:09 PM

to mongod...@googlegroups.com

The first write after a failure of the master might fail.
The driver tries to pre-empt that, but maybe it could be better.
Can you try with the new 2.3 version?


Robert Stewart

Oct 29, 2010, 3:08:34 AM

to mongodb-user

I was seeing the same problems that Rajan reported with the 2.2 driver
earlier today. The 2.3 driver is much better. In fact, it works
exactly as expected.

I've got an app that is using Log4mongo-java to store log events in a
collection. I started the app with Log4mongo-java pointed at my three-
member replica set running on localhost with one of the members set to
arbiterOnly.

While the app was logging to primary, I killed primary. Secondary
elected itself to be the new primary after a delay of less than two
seconds.

My app logged the following to the console:

log4j:ERROR Failed to insert document to MongoDB
com.mongodb.MongoException$Network: can't say something
at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:169)
...
Caused by: java.net.SocketException: Broken pipe
at java.net.SocketOutputStream.socketWrite0(Native Method)
...

I then used the mongo shell to connect to the new primary and
confirmed that it missed only the statements that were logged in the
short period between me killing the first instance and the second
instance taking over as primary.

Dan Harvey

Oct 29, 2010, 10:49:21 AM

to mongodb-user

I am seeing the same thing too. I was using the 2.2 driver, but after
switching to the 2.3 driver nothing seems to have changed.

I am connecting to a 3 server replica set with :-

MongoOptions mo = new MongoOptions();
mo.autoConnectRetry = true;

Mongo mongo = new Mongo(Arrays.asList(
        new ServerAddress("data-node-1"),
        new ServerAddress("data-node-2"),
        new ServerAddress("data-node-3")), mo);
DB db = mongo.getDB("dm");

DBCollection docs = db.getCollection("documents");

Then I have a loop over the ids I'm fetching to benchmark the setup :-

for (String id : ids) {
    // Do stats on response time
    DBObject key = new BasicDBObject();
    key.put("id", id);

    // Benchmark this
    DBCursor results = docs.find(key);
}

When I kill the Primary node on the cluster, one of the other two
servers takes over as Primary, but I end up with the following errors
from the driver while in the loop :-

29-Oct-2010 15:37:37 com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: data-node-1:27017 java.net.SocketTimeoutException:
Read timed out
29-Oct-2010 15:37:55 com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: data-node-1:27017 java.io.IOException: couldn't
connect to [data-node-1/192.168.2.146:27017]
bc:java.net.NoRouteToHostException: No route to host
29-Oct-2010 15:38:06 com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: data-node-1:27017 java.io.IOException: couldn't
connect to [data-node-1/192.168.2.146:27017]
bc:java.net.NoRouteToHostException: No route to host
29-Oct-2010 15:38:17 com.mongodb.ReplicaSetStatus$Node update
(and so on..)

Do I need to connect in a different way to allow the driver to
automatically use the new Primary server?

Thanks,

Eliot Horowitz

Oct 29, 2010, 11:02:14 AM

to mongod...@googlegroups.com

You're connecting correctly.
You'll see those messages for now - it's a background thread checking things.
Does your app work though?

Dan Harvey

Oct 29, 2010, 1:30:17 PM

to mongodb-user

No, the app doesn't work. Adding a print statement in the loop to see
what results are returned, I find that nothing more is output once the
primary node has switched over.
I've just run it again and after 5 minutes it is still just
outputting :-

29-Oct-2010 18:28:34 com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: data-node-1:27017 java.io.IOException: couldn't
connect to [data-node-1/192.168.2.146:27017]
bc:java.net.NoRouteToHostException: No route to host

It keeps repeating, so it seems like it's not finding the new primary server?

Eliot Horowitz

Oct 29, 2010, 1:31:42 PM

to mongod...@googlegroups.com

Can you send the code you're using to test?
We have a similar test that works correctly.

Those messages don't mean it's not finding the master, just that it's
found a down node.

Dan

Oct 29, 2010, 1:36:29 PM

to mongod...@googlegroups.com

Yes, sure. I've put the code up on gist here: http://gist.github.com/653968

Eliot Horowitz

Oct 29, 2010, 1:39:55 PM

to mongod...@googlegroups.com

The first request after the master dies will fail.
So if you put a try/catch inside the for loop, you should get a couple
of failures while a new master is elected, then it should start working.
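
The advice above amounts to catching the failure per request and trying again once a new primary has been elected. A minimal sketch of that pattern follows, using only the JDK so it runs without the driver. `FailoverRetry`, `callWithRetry`, and the attempt/backoff parameters are hypothetical names for illustration, not part of the MongoDB Java driver; the `Supplier` stands in for whatever query or save call the application makes.

```java
import java.util.function.Supplier;

/** Hypothetical helper for illustration; not part of the MongoDB Java driver. */
final class FailoverRetry {

    private FailoverRetry() {}

    /**
     * Runs op, retrying on RuntimeException up to maxAttempts attempts in total
     * and sleeping backoffMillis between attempts. Rethrows the last failure
     * if every attempt fails.
     */
    static <T> T callWithRetry(Supplier<T> op, int maxAttempts, long backoffMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return op.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure; an election may still be in progress
                if (attempt < maxAttempts) {
                    try {
                        Thread.sleep(backoffMillis);
                    } catch (InterruptedException ie) {
                        Thread.currentThread().interrupt();
                        throw new IllegalStateException("interrupted while waiting to retry", ie);
                    }
                }
            }
        }
        throw last;
    }
}
```

In the benchmark loop this would wrap the code that actually iterates the cursor rather than `find()` alone, since the stack traces in this thread show the network call happening inside `DBCursor.hasNext()`.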

Dan

Oct 29, 2010, 3:32:54 PM

to mongod...@googlegroups.com

I've added a try/catch for all runtime exceptions around the inner loop (which I think is what the driver throws?), then tried the primary node failover again.

What happens is that as soon as I kill the node the queries stop, and the only output is that it can't connect to the node that just failed. It doesn't print any exceptions from the find() calls or from reading data from the DBCursor.

After about 15 minutes it outputs the following exception and starts working again :-

29-Oct-2010 19:56:47 com.mongodb.DBTCPConnector _error
WARNING: replica set mode, switching master
java.net.SocketException: No route to host
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:146)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at org.bson.io.Bits.readFully(Bits.java:35)
at org.bson.io.Bits.readFully(Bits.java:28)
at com.mongodb.Response.<init>(Response.java:35)
at com.mongodb.DBPort.go(DBPort.java:101)
at com.mongodb.DBPort.go(DBPort.java:66)
at com.mongodb.DBPort.call(DBPort.java:56)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:211)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:284)
at com.mongodb.DBCursor._check(DBCursor.java:297)
at com.mongodb.DBCursor._hasNext(DBCursor.java:420)
at com.mongodb.DBCursor.hasNext(DBCursor.java:445)
at com.mendeley.catalog.serving.test.ReadBenchmark.main(ReadBenchmark.java:55)

29-Oct-2010 19:56:47 com.mongodb.DBTCPConnector$MyPort error
SEVERE: MyPort.error called
java.net.SocketException: No route to host
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(SocketInputStream.java:146)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
at org.bson.io.Bits.readFully(Bits.java:35)
at org.bson.io.Bits.readFully(Bits.java:28)
at com.mongodb.Response.<init>(Response.java:35)
at com.mongodb.DBPort.go(DBPort.java:101)
at com.mongodb.DBPort.go(DBPort.java:66)
at com.mongodb.DBPort.call(DBPort.java:56)
at com.mongodb.DBTCPConnector.call(DBTCPConnector.java:211)
at com.mongodb.DBApiLayer$MyCollection.__find(DBApiLayer.java:284)
at com.mongodb.DBCursor._check(DBCursor.java:297)
at com.mongodb.DBCursor._hasNext(DBCursor.java:420)
at com.mongodb.DBCursor.hasNext(DBCursor.java:445)
at com.mendeley.catalog.serving.test.ReadBenchmark.main(ReadBenchmark.java:55)

So I guess it's just trying a few too many times to re-connect to the previous primary before it fails over? According to the mongod HTTP interface, the new primary is elected within 30s.

Where do you set the number of retries or a timeout for failing over to the new primary?

Thanks,

Eliot Horowitz

Oct 29, 2010, 3:34:46 PM

to mongod...@googlegroups.com

It should do it at the first failure.
Can you send the current version of the code - will give it a try.

Dan

Oct 30, 2010, 8:49:15 AM

to mongod...@googlegroups.com

The latest code with the try catch is here :- http://gist.github.com/655263

I think it is failing over on the first failure, but taking 15 minutes to actually fail the find()?

Thanks for the fast replies!

Eliot Horowitz

Oct 31, 2010, 8:53:50 PM

to mongod...@googlegroups.com

How are you killing the master?

Dan

Nov 1, 2010, 5:58:28 AM

to mongod...@googlegroups.com

By running ifconfig eth0 down on that node, so that to the others the machine appears to have gone down. I guess another way could be to kill -9 the daemon, but I don't see why there should be much difference between the two from the point of view of the rest of the system?

I've got three Debian instances set up in VirtualBox on my machine, each with MongoDB as part of the replica set.

Eliot Horowitz

Nov 1, 2010, 8:46:03 AM

to mongod...@googlegroups.com

The difference is that on an ifdown the socket may not fail quickly. On a kill -9 it would. You can set a socket timeout if you know all your queries should be fast.
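
To see why a socket timeout matters here: a read on a connection whose peer has silently vanished blocks until the OS gives up, unless the client sets a read timeout. The sketch below reproduces that with plain `java.net` sockets: a local server accepts the connection at the TCP level but never answers, and the client read is abandoned after `timeoutMillis`. `ReadTimeoutDemo` and `readTimesOut` are made-up names for illustration. In the driver, the equivalent knob is, as far as I know, the `socketTimeout` field on `MongoOptions` (milliseconds); check the version you run.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Hypothetical demo class; it does not use the MongoDB driver. */
final class ReadTimeoutDemo {

    private ReadTimeoutDemo() {}

    /**
     * Connects to a local server that never replies, then attempts a read
     * with the given timeout. Returns true if the read was abandoned with
     * SocketTimeoutException instead of blocking indefinitely.
     */
    static boolean readTimesOut(int timeoutMillis) {
        try (ServerSocket silentServer =
                     new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(silentServer.getInetAddress(),
                     silentServer.getLocalPort())) {
            // A timeout of 0 would mean "wait forever", which is the default.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read(); // no data will ever arrive from the silent server
                return false;
            } catch (SocketTimeoutException expected) {
                return true;
            }
        } catch (IOException e) {
            throw new IllegalStateException("demo setup failed", e);
        }
    }
}
```

The trade-off is the one mentioned above: a low timeout only makes sense if every legitimate query is expected to finish well within it.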

Dan

Nov 1, 2010, 9:19:57 AM

to mongod...@googlegroups.com

Ah OK, kill -9 works as expected, failing over after 15 seconds. But wouldn't the same socket time-out problem occur if, for example, the machine failed completely (power failure), taking the network down like this? So right now the socket time-out should be set a lot lower to cope with failures like this?

If that is the case, would there possibly be a way to include a heartbeat in the client? Maybe putting a "ping" in the protocol while either the client or the server is blocked waiting for a reply?

I've also noticed that if you take the network down rather than kill -9 the daemon, the failover on the replica set nodes takes longer (about 30s rather than < 1s), and the secondary nodes end up in the "recovering" state more often. Does the failure detection in MongoDB work better by design when sockets are closed rather than left hanging?

Thanks,

Eliot Horowitz

Nov 1, 2010, 11:27:08 AM

to mongod...@googlegroups.com

There is a heartbeat but it can't really interrupt an existing request.
As soon as the heartbeat detects a problem it will re-route.

As for failover time, it's the same issue. It takes longer to detect
network issues than hard issues.

Dan

Nov 1, 2010, 11:50:05 AM

to mongod...@googlegroups.com

I think for us it's fine to set the socket timeout lower, so I'll do that to help us with that type of failure.

I'm impressed with the speed of replies to this mailing list too, so thanks for helping to speed debugging this up!

RajAhm

Nov 3, 2010, 6:10:27 PM

to mongodb-user

Does the Morphia Datastore object need to be re-created after master
failover?

ds = morphia.createDatastore(m, mProps.getProperty("mongoDBDBName"));

Here ds is the Morphia Datastore object and m is the Mongo instance.

In my design I have two asynchronous threads running. One thread
waits on a blocking queue to save objects annotated with Morphia
annotations, and another thread uses BasicDBObject directly for
persistence. The BasicDBObject thread is able to save after master
failover, whereas the Morphia-annotated one is not able to save even
after failover.
Should I create a new Datastore object after master failover, or should
Morphia take care of it?

Rajan



Scott Hernandez

Nov 3, 2010, 8:40:24 PM

to mongod...@googlegroups.com

On Wed, Nov 3, 2010 at 3:10 PM, RajAhm <rajan...@sbcglobal.net> wrote:
> Does Morphia DataStore Object needs to re-created after Master
> failover ?

No, it uses the underlying Mongo instance, hence the Datastore
construction below.

> ds = morphia.createDatastore( m,
> mProps.getProperty("mongoDBDBName") );
>
> Where ds Morphia Datastore object and m is Mongo instance.
>
> In my design I have two asynchronous threads running. One thread which
> is waiting on Blocking queue to save Object which is annotated with
> Morphia annotations and another thread directly use BasicDBObject for
> persistence. This BasicDBObject is effectively able to save after
> Master failover where Morphia annotated is not able to save even after
> failover.

Are you checking for write errors? By default the datastore will make
all writes "Safe" so that exceptions can be raised. This is not the
default for the driver writes.

> Should I create new datastore object after Master failover or Morphia
> should take care of it ?

No, unless you are creating a new Mongo instance.

RajAhm

Nov 4, 2010, 1:08:11 PM

to mongodb-user

Thanks for the reply, Scott.

I am still running into some issues.
This is my sample code. In a nutshell, this thread blocks waiting for
new data to be inserted into Mongo.

===========================================
public class CreditCardProfileThread implements Runnable {

    @Override
    public void run() {
        CreateCustomerProfile ccProf = null;
        // TODO Auto-generated method stub
        System.out.println(" I am in Create Credit Card Profile runnable interface ");
        while (true) {
            try {
                System.out.println(" I am blocking on queue for Create Credit Card Profile");
                ccProf = CreditCardInfoImpl.CreditCardProfilequeue.take();

                System.out.println(" I am just unblocked and saving data - Credit Card Profile ");

                CreditCardInfoImpl.ds.save(ccProf);

            } catch (InterruptedException e) {
                // TODO Auto-generated catch block
                System.out.println(" I am here1");
                // e.printStackTrace();
            } catch (Exception ie) {

                System.out.println(" I am here2");
                System.out.println(" Exception caught : " + ie.getMessage());
                try {
                    System.out.println(" Sleeping starts ");
                    Thread.sleep(20000);
                    System.out.println(" wakeup thread ");
                    CreditCardInfoImpl.createNewMorphiaDataStore();
                } catch (InterruptedException e) {
                    // TODO Auto-generated catch block
                    System.out.println(" I am interrupted Create Card Profile Thread ");
                    // e.printStackTrace();
                }

                // Let us drain all elements to Collection of type Vector while we were down
                Vector<CreateCustomerProfile> vct = new Vector<CreateCustomerProfile>();
                CreditCardInfoImpl.CreditCardProfilequeue.drainTo(vct);
                // Add the element which we already removed from queue
                vct.add(ccProf);
                for (CreateCustomerProfile c : vct) {
                    System.out.println(" Saving a Create profile ");
                    CreditCardInfoImpl.ds.save(c);
                }

            }

        }
    }
}
========
If the thread gets data, it wakes up and stores it (this is the good case,
where all replica set members are running).

When I kill (CTRL-C) the primary member, I get an exception, and at that
point the election of a new primary starts. It may take, say, 10 seconds. I
sleep for 20 seconds and then try to save the data again, and then I get the
following exception. It seems that this thread dies when it tries to
save after the 20 second sleep.
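
One possible reading of the behaviour described above: the retry `ds.save(c)` calls in the `catch (Exception ie)` block are not themselves inside a `try`, so if a save still fails after the 20 second sleep, the exception escapes `run()` and the thread ends. A sketch of a worker loop that keeps the item and retries every save inside a `try` is below. It is an illustration under that assumption, not a confirmed diagnosis; `QueueSaver` and its `save` callback are hypothetical stand-ins for the thread and `ds.save(...)`.

```java
import java.util.concurrent.BlockingQueue;
import java.util.function.Consumer;

/** Hypothetical worker for illustration; save stands in for ds.save(...). */
final class QueueSaver<T> implements Runnable {

    private final BlockingQueue<T> queue;
    private final Consumer<T> save;
    private final long retryDelayMillis;

    QueueSaver(BlockingQueue<T> queue, Consumer<T> save, long retryDelayMillis) {
        this.queue = queue;
        this.save = save;
        this.retryDelayMillis = retryDelayMillis;
    }

    @Override
    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                T item = queue.take();      // blocks until there is work
                saveUntilSuccessful(item);  // the item is never dropped on failure
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // treat interruption as shutdown
        }
    }

    /** Retries the same item until save succeeds; every attempt is guarded. */
    private void saveUntilSuccessful(T item) throws InterruptedException {
        while (true) {
            try {
                save.accept(item);
                return;
            } catch (RuntimeException e) {
                System.err.println("save failed, will retry: " + e.getMessage());
                Thread.sleep(retryDelayMillis);
            }
        }
    }
}
```

Because each item is retried before the next one is taken, queue order is preserved and there is no need to drain the queue into a separate collection. The retry here is unbounded; a real application would probably cap it or alert after some number of failures.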

Nov 4, 2010 9:47:03 AM com.mongodb.DBTCPConnector$MyPort error
SEVERE: MyPort.error called
com.mongodb.MongoInternalException: DBPort.findOne failed
at com.mongodb.DBPort.findOne(DBPort.java:129)
at com.mongodb.DBPort.runCommand(DBPort.java:135)
at com.mongodb.DBTCPConnector._checkWriteError(DBTCPConnector.java:121)
at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:157)
at com.mongodb.DBTCPConnector.say(DBTCPConnector.java:141)
at com.mongodb.DBApiLayer$MyCollection.update(DBApiLayer.java:317)
at com.mongodb.DBCollection.save(DBCollection.java:534)
at com.google.code.morphia.DatastoreImpl.save(DatastoreImpl.java:638)
at com.google.code.morphia.DatastoreImpl.save(DatastoreImpl.java:685)
at com.google.code.morphia.DatastoreImpl.save(DatastoreImpl.java:679)
at com.cisco.cas.paymentgateway.CreditCardProfileThread.run(CreditCardProfileThread.java:22)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.EOFException
at org.bson.io.Bits.readFully(Bits.java:37)
at org.bson.io.Bits.readFully(Bits.java:28)
at com.mongodb.Response.<init>(Response.java:35)
at com.mongodb.DBPort.go(DBPort.java:101)
at com.mongodb.DBPort.go(DBPort.java:66)
at com.mongodb.DBPort.findOne(DBPort.java:121)
... 11 more
I am here2
Exception caught : DBPort.findOne failed
Sleeping starts
Nov 4, 2010 9:47:09 AM com.mongodb.ReplicaSetStatus$Node update
WARNING: node down: 171.69.120.229:27018 java.io.IOException: couldn't