Re: [mongodb-user] PHP Driver 1.3.1 slow connection, replica set


Hannes Magnusson

Dec 13, 2012, 1:25:35 PM
to mongod...@googlegroups.com
On Thu, Dec 13, 2012 at 4:31 AM, Dmitry Sinev <dmitry...@gmail.com> wrote:
> Hi,
>
> I have a problem with a slow connection to the replica set with one member
> down. My replica configuration is 2 replicas + 1 arbiter.
> My connection string is:
> $connection = new MongoClient(
> 'mongodb://localhost:27017',
> array('connect' => true, 'replicaSet' => 'storage', 'readPreference' =>
> 'primary')
> );
> Each connection takes just a bit more than 2 seconds every time I try
> to connect.
> I tried a direct connection and it works very fast so it is a replica set
> connection issue, I suppose. I tested different connection parameters
> without success.


Are you using Windows? Where is the other replica set member located?
Is it accessible from your network?
How is the other member 'down'? (mongod not running, server offline,
firewalled...)

What is the actual hostname of the server you are trying to connect
to, i.e. what does rs.status() return as the hostname? Try
connecting to that hostname instead of 'localhost' or 127.0.0.1.

-Hannes

Dmitry Sinev

Dec 13, 2012, 4:25:56 PM
to mongod...@googlegroups.com
I'm using CentOS Linux; it's a production server running PHP 5.4.8 with the mongo driver 1.3.1. The other replica set members are located in a nearby datacenter (1.5 ms, 5 hops away) for higher availability.
The server hosting the replica was turned off completely because it had failed. It has been recovered now and the problem has gone away.
I'm connecting to the local instance (storage1-2.company.com:27017) as localhost and haven't tried connecting with the full external host name, storage1-2.company.com. Do you think this might be the issue? If so, I can reproduce the situation by taking one of the servers offline.

Here is rs.status() output:

storage:PRIMARY> rs.status();
{
    "set" : "storage",
    "date" : ISODate("2012-12-13T12:34:37Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 0,
            "health" : 0,
            "state" : 8,
            "stateStr" : "(not reachable/healthy)",
            "uptime" : 0,
            "optime" : Timestamp(1355392452000, 119),
            "optimeDate" : ISODate("2012-12-13T09:54:12Z"),
            "lastHeartbeat" : ISODate("2012-12-13T09:55:23Z"),
            "pingMs" : 0,
            "errmsg" : "socket exception [CONNECT_ERROR] for storage1-1.company.com:27017"
        },
        {
            "_id" : 1,
            "name" : "storage1-2.company.com:27017",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 845986,
            "optime" : Timestamp(1355402041000, 251),
            "optimeDate" : ISODate("2012-12-13T12:34:01Z"),
            "self" : true
        },
        {
            "_id" : 2,
            "health" : 1,
            "state" : 7,
            "stateStr" : "ARBITER",
            "uptime" : 597888,
            "lastHeartbeat" : ISODate("2012-12-13T12:34:37Z"),
            "pingMs" : 97
        }
    ],
    "ok" : 1
}

Thank you.

Hannes Magnusson

Dec 13, 2012, 5:27:57 PM
to mongod...@googlegroups.com
"localhost" sometimes maps to something other than 127.0.0.1, and
your mongod instance could be listening only on a specific IP rather
than on all interfaces.

Also, the driver sanity-checks the seed list you provide:
if the hostname you give us does not match the hostname the
replica set uses internally, we will disconnect and reconnect to the
correct one to avoid any issues.
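
To make the seed-list point concrete, here is a minimal sketch, assuming the member names that appear in the rs.status() output earlier in the thread. The helper buildSeedUri() is hypothetical and not part of the driver; it only assembles the connection string so that the names match what the replica set uses internally. Whether a longer seed list shortens the delay discussed here is not established in this thread.

```php
<?php
// Hypothetical helper (not part of the driver): build a seed-list URI from
// the member names exactly as the replica set reports them.
function buildSeedUri(array $members)
{
    return 'mongodb://' . implode(',', $members);
}

// Host names taken from the rs.status() output in this thread.
$uri = buildSeedUri(array(
    'storage1-1.company.com:27017',
    'storage1-2.company.com:27017',
));
echo $uri, "\n";

// With the legacy driver the URI would then be passed on like this
// (left as a comment so the sketch runs without the mongo extension):
// $connection = new MongoClient($uri, array('replicaSet' => 'storage'));
```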

Are you however saying that the issue has gone away after the server
that was down came up again?
The example you provided, is that the actual connection string you
use? - Only one server in the seedlist?

-Hannes
> --
> You received this message because you are subscribed to the Google
> Groups "mongodb-user" group.
> To post to this group, send email to mongod...@googlegroups.com
> To unsubscribe from this group, send email to
> mongodb-user...@googlegroups.com
> See also the IRC channel -- freenode.net#mongodb

Dmitry Sinev

Dec 14, 2012, 5:51:29 AM
to mongod...@googlegroups.com
Yes, the problem goes away once all replica set members are up again, and it doesn't appear when I connect to the mongo PRIMARY without the replicaSet parameter.
This is the actual connection string: only one server in the seed list, localhost.
I did some more testing and here is what I found:
1. When I kill mongod on the second server, everything works fine.
2. When I turn the server off completely, the problem appears: it takes 2 seconds to connect to the replica set as localhost.
3. If I change localhost to the globally available DNS name (storage1-2.company.com), as it is defined in the replica set, the problem doesn't go away, but the connection takes only 1 second instead of 2.
4. When the server boots up again and mongod starts, all connection variations work fine again.

Hope this will help. Thank you.

Dmitry Sinev

Jan 21, 2013, 5:16:37 AM
to mongod...@googlegroups.com
The same problem occurs with the latest PHP driver version, 1.3.3.
Can anyone suggest how to avoid this issue?

I tried changing the server in the connection string from localhost to the real server name as defined in the replica set configuration. This didn't solve the problem, but it improved the connection time from 2 seconds to 1 second.
As a temporary solution, I do not use the replica set and connect to the server directly when the other replica is down.

Thank you.

Hannes Magnusson

Jan 21, 2013, 6:32:36 AM
to mongod...@googlegroups.com
This does sound like a DNS resolution problem on your network.
This should only happen on the first request to each process though,
as all other subsequent requests will use the persistent connections
we create at the startup.

Are you entirely 100% sure that your network doesn't do some port
routing and filtering when a server is down?

-Hannes

Dmitry Sinev

Jan 21, 2013, 9:12:48 AM
to mongod...@googlegroups.com
Thank you for your response.
I'm sure it's not a DNS problem; the setup is very simple. PHP is running in PHP-FPM mode with a FastCGI interface to the Nginx server. Maybe this is the issue?

I believe I can replicate this problem on Amazon EC2 and give you access to it, if that would help you and your team.

Hannes Magnusson

Jan 21, 2013, 10:39:47 PM
to mongod...@googlegroups.com
That would be great if you could!

I've never seen this before, even running PHP-FPM & nginx.

-Hannes

John

Jan 25, 2013, 3:11:26 AM
to mongod...@googlegroups.com
I'm hitting the exact same issue. I started testing MongoDB failover by manually failing a secondary and immediately hit this. Response time degrades by approx 1000x (ex. from 0.0052609444s to 4.0203089714s for an individual query). After digging around it appears to be multiple verification passes against the failed member (each 1000ms). Not sure if that is "as designed" or a bug. Maybe someone could chime in on that or tell me where I'm going wrong?





Hannes Magnusson

Jan 25, 2013, 5:16:13 PM
to mongod...@googlegroups.com
This is really weird.
What is your exact connection string?

AFAICT you are using a replica set with authentication, but the
authentication request is failing.
Furthermore, we seem to get duplicate primary connections, with two
different credentials?

-Hannes

Dmitry Sinev

Jan 27, 2013, 12:07:11 PM
to mongod...@googlegroups.com
Hi Hannes,

I was able to reproduce the problem on EC2 cloud instances. It happens only when the second replica's server is turned off; stopping mongod alone doesn't cause it. An iptables DROP filter reproduces it too.
I'm writing you a direct email right now with the access key, a setup description and steps to reproduce. I hope you can fix this issue.

Thank you.

John

Jan 27, 2013, 1:30:19 PM
to mongod...@googlegroups.com
@Hannes: I suspect the credential stuff is just a red herring. We use a few different credentials during normal operation (read, readwrite, raw) and my test code was accidentally mixing them up. Also, my test code was deliberately making two similar (but not exactly duplicate) calls back to back so I could compare them.

Having said all that, I think the real issue is https://jira.mongodb.org/browse/PHP-355 (which you are already involved in). As a workaround, I no longer tell the driver about the replica set and track the primary myself. If it fails, I query the other servers in the set until a new primary arises. Sprinkle in a little memcache and I avoid the timeouts associated with ongoing checking of non-primaries. We're not doing any fancy distributed reads, so this strategy seems to work just fine.
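
The workaround described above might look roughly like the following. This is only a sketch under stated assumptions, not the code actually used: findPrimary(), the $isPrimary callback and the array standing in for memcache are all hypothetical names, and the host names are placeholders. In a real setup the callback would be whatever direct check the application uses to ask a single server whether it is the primary.

```php
<?php
// Hypothetical sketch of "track the primary myself". $isPrimary stands in for
// the application's own per-server check; $cache stands in for memcache.
function findPrimary(array $hosts, $isPrimary, array &$cache)
{
    // Try the remembered primary first, so a healthy request makes one check.
    if (isset($cache['primary']) && call_user_func($isPrimary, $cache['primary'])) {
        return $cache['primary'];
    }
    // Otherwise walk the remaining members until one reports itself primary.
    foreach ($hosts as $host) {
        if (isset($cache['primary']) && $host === $cache['primary']) {
            continue; // already checked above
        }
        if (call_user_func($isPrimary, $host)) {
            $cache['primary'] = $host;
            return $host;
        }
    }
    unset($cache['primary']);
    return null; // no primary reachable right now
}

// Usage with a fake probe; 'db1' and 'db2' are placeholder host names.
$cache = array();
$current = 'db2:27017';
$probe = function ($host) use (&$current) {
    return $host === $current;
};
$hosts = array('db1:27017', 'db2:27017');
echo findPrimary($hosts, $probe, $cache), "\n";
```

The point of the cache is that non-primary members are only contacted after the remembered primary stops answering, which is what avoids the repeated timeouts against a dead member.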

Hannes Magnusson

Jan 28, 2013, 6:11:55 PM
to mongod...@googlegroups.com
Wait wait wait.

This is an expected behaviour.

Upon initial connection to mongod of each PHP process we will connect
to all the members of the replicaset.
If the server is completely turned off then we have to wait until the
packets we sent figure out that there is no server up at that IP
address and come back home to let us know (or we time out).

If the server is turned on, the packets will wind up on the correct
server. If mongod isn't running, the server will let us know in a
timely fashion and the connection attempt fails fast.
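
The difference between the two cases can be observed with plain TCP, independent of the driver. The following is a minimal sketch using PHP's stream functions; timeConnect() is a hypothetical helper and the commented-out host name is only an example. It times a connection to a local port where nothing is listening, which is the "server up, mongod not running" case.

```php
<?php
// Time a plain TCP connection attempt. This uses PHP's stream functions,
// not the MongoDB driver, and only illustrates the network behaviour.
function timeConnect($host, $port, $timeoutSeconds)
{
    $start = microtime(true);
    $socket = @stream_socket_client("tcp://$host:$port", $errno, $errstr, $timeoutSeconds);
    $elapsed = microtime(true) - $start;
    if ($socket !== false) {
        fclose($socket);
    }
    return array('connected' => $socket !== false, 'seconds' => $elapsed);
}

// Reserve a free local port, then close it so that nothing is listening.
$server = stream_socket_server('tcp://127.0.0.1:0', $errno, $errstr);
$name = stream_socket_get_name($server, false); // e.g. "127.0.0.1:54321"
$port = (int) substr($name, strrpos($name, ':') + 1);
fclose($server);

// Host is up, port is closed: the connection is refused right away.
$refused = timeConnect('127.0.0.1', $port, 2.0);
printf("closed port: connected=%s after %.4f s\n",
    var_export($refused['connected'], true), $refused['seconds']);

// A powered-off or DROP-filtered host never answers, so the same call
// would block until the timeout expires instead (example host only):
// timeConnect('storage1-1.company.com', 27017, 2.0);
```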

We will monitor the status of the replicaset and every once in a while
check if the server has come online again.
Now, every time you instantiate a new MongoClient() object, we will run
through the seed list and check for changes in the topology, and
therefore attempt to connect to the server that is down again.

To configure this connection timeout you can pass in the "connectTimeoutMS" option:
$mongoclient = new MongoClient("host1,host2", array(
    "replicaSet" => "name",
    "connectTimeoutMS" => 500, // wait half a second
));

(This option will be in 1.3.4, along with "socketTimeoutMS", which
controls data transmission timeouts.)



Now. Writing up this reply I realise how stupid this behaviour is.
Since the connections are persistent, there is no reason for us to
attempt to reconnect to the failed server on the next request as we
already know it is down.
We should be waiting until we hit the ping/ismaster interval to
attempt to reconnect to that server.
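
The idea of not retrying a known-down server until the next check interval can be sketched as simple bookkeeping. This is an illustration only and not the driver's actual code: the DownList class, its method names and the 5-second interval are all hypothetical, and the state would have to live as long as the process, the way the persistent connections do.

```php
<?php
// Hypothetical bookkeeping for "do not retry a known-down server until the
// next check interval". Names and the interval are illustrative only.
class DownList
{
    private $retryAfter = array(); // host => earliest time of the next attempt
    private $intervalSeconds;

    public function __construct($intervalSeconds)
    {
        $this->intervalSeconds = $intervalSeconds;
    }

    // Record a failed connection attempt made at time $now.
    public function markDown($host, $now)
    {
        $this->retryAfter[$host] = $now + $this->intervalSeconds;
    }

    // Record a successful connection: the host is no longer considered down.
    public function markUp($host)
    {
        unset($this->retryAfter[$host]);
    }

    // True when the host is not known to be down, or its retry time has come.
    public function shouldAttempt($host, $now)
    {
        return !isset($this->retryAfter[$host]) || $now >= $this->retryAfter[$host];
    }
}

$down = new DownList(5);
$down->markDown('host2:27017', 100);
var_dump($down->shouldAttempt('host2:27017', 101)); // still inside the interval
var_dump($down->shouldAttempt('host2:27017', 105)); // interval has elapsed
```

With something like this in place, a request that arrives while a member is marked down skips the connection attempt entirely, so only the first request after the interval expires pays the timeout.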


I'll create a ticket for this. Thanks for the heads-up!

-Hannes

Hannes Magnusson

Jan 29, 2013, 12:24:55 AM
to mongod...@googlegroups.com
Thank you!
This is now fixed in my branch on github;
https://github.com/bjori/mongo-php-driver/tree/PHP-MARK-AS-DOWN

Still haven't created a ticket for it though ;)

We'll try to include this in the upcoming 1.3.4 release.

For the record: your demo script now says "Time to connect: 0.00004
seconds" after all the processes have been warmed up :)

-Hannes

Hannes Magnusson

Jan 29, 2013, 12:43:35 AM
to mongod...@googlegroups.com
Here is the ticket for the record: https://jira.mongodb.org/browse/PHP-686

-Hannes

Dmitry Sinev

Jan 29, 2013, 4:29:51 AM
to mongod...@googlegroups.com
Thank you!

connectTimeoutMS and socketTimeoutMS will help to ease the problem, but I don't think they will solve it.
I restarted php-fpm on the test server and again it takes 1 second to connect. That is much better than it was, and now we can make this time even smaller with the new options. There is no problem once all php-fpm processes are warmed up.
I'm not sure how it will behave on the production servers under heavy load. We have a couple of servers in the replica set; each server has 250 php-fpm processes, and each process serves 500 requests and then dies, with a new process created in its place to avoid possible memory leaks. I'll upgrade once the 1.3.4 driver is released and let you know.

I have one last question. When we create a new MongoClient('mongodb://localhost:27017', array('connect' => true, 'replicaSet' => 'storage')), we get a list of hosts for the storage replica set; the mongod server knows the state of each host and can pass it back to MongoClient. So why are we trying to connect to replica set members in the DOWN state at all? Maybe we should skip those members and the problem would be gone for good?

Thank you for your help!

Hannes Magnusson

Jan 29, 2013, 3:27:05 PM
to mongod...@googlegroups.com
When you use authenticated connections, we are not allowed to access
the internal replica set information until the connection has been
authenticated.
Also, members can disagree on which other members of the replica set
are up or down due to network partitions. Member a may believe
member b is offline while member c sees both a and b online.
From the web server, you may not be able to access member a but can
talk to members b and c.

This means we have to attempt to connect to them all so we can see the
world from the web server's perspective, rather than from any individual
member's.

It looks like the fix for this bug (PHP-686) is too drastic a change to
include in 1.3.4, so we may need to wait until the 1.4.0
release, for which we currently do not have an estimate.

-Hannes

Dmitry Sinev

Jan 29, 2013, 3:56:58 PM
to mongod...@googlegroups.com
Thank you for the explanation; now I understand that it is not as simple as it might appear.
I will wait for the 1.4.0 release.