Hi!
We are running a popular iPhone messenger, http://pushme.to, on a
replica set database on Ubuntu 10.04 and 10.10, MongoDB 1.6.3. We use
the pecl Mongo extension 1.0.10 with persistent connections enabled; APC
is also used. PHP runs in FastCGI mode.
The problem: the PHP extension almost never seems able to fail over
when the master server changes.
In the simplest form, it is enough to run "rs.stepDown()" on the
master server to make all php-cgi processes go dead. They throw the
exception "Not master" and never fail over to the new master.
The more complicated error happens when the master suddenly dies. In
this case the php-cgi processes just stay blocked, and only kill -9
helps, even several minutes after a new master has been elected.
If I shut down the master mongod and a new master is not elected
immediately (say about 30 seconds pass before another node becomes
master), then the PHP extension is not able to detect the new master
and the php-cgi processes hang.
And the best part is when a master change suddenly happens in
production. :)
Well, we only have four servers in the cluster for pushme.to, and it's
not really hard to killall php-cgi on all of them, but doesn't that
just defeat the purpose of a replica set?
These situations are 100% repeatable.
We are willing to help test possible solutions (even on production).
Are there any?