Hey Jesse, thanks for the clarifications. The problem persists; here's the setup:
RE 1) I've changed the connection string to include all 3 members. It appears that MotorRSC knows of the secondary nodes, but can't connect to them (see below).
RE 2) The error "No replica set primary available for query with ReadPreference PRIMARY" has gone away: I was using internal IPs in my replica set settings, so of course Motor couldn't connect to them. I've also added the line "DB.read_preferences = ReadPreference.SECONDARY_PREFERRED" after my connection string; no difference noted.
RE 3) My replica set seems to work fine. Here are the steps that cause the problem:
A) 3-node set running. When connected to the primary, I check rs.status():
{
"set" : "rs01",
"date" : ISODate("2012-10-25T14:03:08Z"),
"myState" : 1,
"members" : [
{
"_id" : 0,
"name" : "IP_ONE_HERE:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 76,
"optime" : Timestamp(1351164396000, 2),
"optimeDate" : ISODate("2012-10-25T11:26:36Z"),
"self" : true
},
{
"_id" : 1,
"name" : "IP_TWO_HERE:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 76,
"optime" : Timestamp(1351164396000, 2),
"optimeDate" : ISODate("2012-10-25T11:26:36Z"),
"lastHeartbeat" : ISODate("2012-10-25T14:03:08Z"),
"pingMs" : 1
},
{
"_id" : 2,
"name" : "IP_THREE_HERE:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 76,
"optime" : Timestamp(1351164396000, 2),
"optimeDate" : ISODate("2012-10-25T11:26:36Z"),
"lastHeartbeat" : ISODate("2012-10-25T14:03:08Z"),
"pingMs" : 1
}
],
"ok" : 1
}
All seems okay. I have my app running with no errors. I take down the PRIMARY, and node #2 takes over. I log into mongo and check rs.status():
{
"set" : "rs01",
"date" : ISODate("2012-10-25T14:16:03Z"),
"myState" : 2,
"syncingTo" : "NEW_PRIMARY_IP:27017",
"members" : [
{
"_id" : 0,
"name" : "IP_ONE_HERE:27017",
"health" : 0,
"state" : 8,
"stateStr" : "(not reachable/healthy)",
"uptime" : 0,
"optime" : Timestamp(1351164396000, 2),
"optimeDate" : ISODate("2012-10-25T11:26:36Z"),
"lastHeartbeat" : ISODate("2012-10-25T14:15:37Z"),
"pingMs" : 0,
"errmsg" : "socket exception [CONNECT_ERROR] for IP_ONE_HERE:27017"
},
{
"_id" : 1,
"name" : "IP_TWO_HERE:27017",
"health" : 1,
"state" : 1,
"stateStr" : "PRIMARY",
"uptime" : 64666,
"optime" : Timestamp(1351164396000, 2),
"optimeDate" : ISODate("2012-10-25T11:26:36Z"),
"lastHeartbeat" : ISODate("2012-10-25T14:16:03Z"),
"pingMs" : 2
},
{
"_id" : 2,
"name" : "IP_THREE_HERE:27017",
"health" : 1,
"state" : 2,
"stateStr" : "SECONDARY",
"uptime" : 64666,
"optime" : Timestamp(1351164396000, 2),
"optimeDate" : ISODate("2012-10-25T11:26:36Z"),
"errmsg" : "syncing to: NEW_PRIMARY_IP:27017",
"self" : true
}
],
"ok" : 1
}
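For reference, the check I'm doing by eye on both outputs is roughly the following. This is only a sketch in plain Python over a dict shaped like rs.status(); the helper name find_primary and the trimmed-down sample dict are mine, not anything from Motor or PyMongo.

```python
def find_primary(status):
    """Return the name of the healthy PRIMARY in an rs.status()-shaped dict.

    Returns None when no member is both healthy (health == 1) and in
    state 1 (PRIMARY). find_primary is a hypothetical helper, not a driver API.
    """
    for member in status["members"]:
        if member["health"] == 1 and member["state"] == 1:
            return member["name"]
    return None


# Trimmed-down copy of the second rs.status() output above.
after_failover = {
    "set": "rs01",
    "members": [
        {"name": "IP_ONE_HERE:27017", "health": 0, "state": 8,
         "stateStr": "(not reachable/healthy)"},
        {"name": "IP_TWO_HERE:27017", "health": 1, "state": 1,
         "stateStr": "PRIMARY"},
        {"name": "IP_THREE_HERE:27017", "health": 1, "state": 2,
         "stateStr": "SECONDARY"},
    ],
}

print(find_primary(after_failover))
```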
All looks good, we have a new PRIMARY running on the second node. Now I fire a request to my app and get the following:
[W 121025 10:18:52 iostream:507] Connect error on fd 16: ECONNREFUSED
[W 121025 10:18:52 iostream:507] Connect error on fd 17: ECONNREFUSED
[W 121025 10:18:52 iostream:507] Connect error on fd 15: ECONNREFUSED
[E 121025 10:18:52 ioloop:435] Exception in callback <tornado.stack_context._StackContextWrapper object at 0x1553050>
  Traceback (most recent call last):
   File "/usr/local/lib/python2.7/dist-packages/tornado/ioloop.py", line 421, in _run_callback
    callback()
   File "/usr/local/lib/python2.7/dist-packages/tornado/stack_context.py", line 229, in wrapped
    callback(*args, **kwargs)
   File "/usr/local/lib/python2.7/dist-packages/motor/__init__.py", line 1324, in _to_list_got_more
    callback(None, error)
   File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 382, in inner
    self.set_result(key, result)
   File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 315, in set_result
    self.run()
   File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 343, in run
    yielded = self.gen.throw(*exc_info)
   File "/mypath/file.py", line 27, in get
    segments, uids, segment_data = yield motor.WaitAllOps(['one', 'two', 'three'])
   File "/usr/local/lib/python2.7/dist-packages/tornado/gen.py", line 335, in run
    next = self.yield_point.get_result()
   File "/usr/local/lib/python2.7/dist-packages/motor/__init__.py", line 1679, in get_result
    raise error
  AttributeError: 'NoneType' object has no attribute 'close'
If I bring my old PRIMARY back up, everything comes back online and works fine. If I'm reading the first 3 lines of the error message correctly, it seems like Motor sees the 3 seeds, but gets a refused connection? Odd note: the very first time I tried these steps, when putting the PRIMARY back online, I actually got a message saying "master has changed" (not that it helped; I need it to detect the change when the PRIMARY goes offline).
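To make the expectation concrete: what I'd like to see when the PRIMARY goes away is a failed query being retried until the new PRIMARY is found, rather than an AttributeError. Below is a rough sketch of that behaviour in plain Python, with no Motor involved. PrimaryUnavailable, retry_on_failover and flaky_query are all made-up names for illustration, and I'm assuming the driver would raise some catchable error during failover.

```python
import time


class PrimaryUnavailable(Exception):
    """Hypothetical stand-in for whatever the driver raises during failover."""


def retry_on_failover(operation, attempts=5, delay=0.5, sleep=time.sleep):
    """Call operation(); on PrimaryUnavailable, wait and retry.

    Waits delay, 2*delay, 4*delay, ... between tries and re-raises the
    last error once `attempts` tries have failed.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return operation()
        except PrimaryUnavailable as error:
            last_error = error
            if attempt < attempts - 1:
                sleep(delay * (2 ** attempt))
    raise last_error


# Fake query: fails twice (as if no PRIMARY were elected yet), then succeeds.
calls = []


def flaky_query():
    calls.append(1)
    if len(calls) < 3:
        raise PrimaryUnavailable("no primary yet")
    return ["one", "two", "three"]


# Pass a no-op sleep so the example runs instantly.
result = retry_on_failover(flaky_query, sleep=lambda seconds: None)
print(result)
```

In a real Tornado handler a blocking time.sleep would stall the IOLoop, so the wait would have to go through the IOLoop's own timeout mechanism; the sketch only shows the retry logic.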
Cheers
L