C# driver: server ping every 10 seconds causing severe problems

Khalid Salomão

Jun 27, 2013, 10:35:08 AM
to mongod...@googlegroups.com

We have run into a problem with the MongoDB C# driver's connection strategy.

We have a replica set running MongoDB 2.4.4 on AWS, with two large nodes and a small instance for the arbiter. Authentication was enabled on this cluster.

We had to run a distributed application with 10,000 processes, each doing a simple write operation on the MongoDB cluster every 5-10 minutes. The processes were single-threaded and only wrote occasionally.

The PRIMARY and SECONDARY nodes coped with the load, but the ARBITER went to 100% CPU and became unresponsive from time to time.

From my investigation, the main problem was the ping that the driver sends every 10 seconds to check whether the connection is OK.

The log files were huge, several GB, even though the log settings were 'verbose=false' and 'quiet=true'. The logs were mostly full of authentication messages, which I suppose are the result of the driver ping. The messages looked like: "... [conn2466078]  authenticate db: admin { authenticate: 1, user: ... "

Shouldn't this ping be replaced by a less disruptive approach?

I know the above example is a bit extreme, but MongoDB was overloaded by a simple operation, and the actual write operation wasn't demanding. Even in scenarios with just a few active connections, the log files fill up with the authentication message above.

Is there any option in the driver to disable this behavior?

Thanks,
Regards,


craiggwilson

Jun 27, 2013, 11:53:44 AM
to mongod...@googlegroups.com
Hi Khalid, this is very interesting. Thanks for reporting. It looks as though there are some server issues at play here, in that authentication on arbiters doesn't work. I assume these authentication messages in your log files are failures. The related server ticket is https://jira.mongodb.org/browse/SERVER-5479.

We stopped creating a new connection for the 10 second pings in version 1.8 (https://jira.mongodb.org/browse/CSHARP-585). This won't help you, due to the server ticket above, but it will reduce log noise related to the heartbeats. We also have a ticket (https://jira.mongodb.org/browse/CSHARP-676) to stop pinging arbiters altogether. There are some issues with doing that which we are trying to work through. I'd keep an eye on CSHARP-676 and SERVER-5479 for a resolution to this issue. Please comment/vote on them to help increase their visibility.

Craig

Khalid Salomão

Jun 27, 2013, 1:09:23 PM
to mongod...@googlegroups.com
Hi Craig,

Thanks for the fast response!

I am not sure that it is an arbiter authentication failure. I just looked at a standalone MongoDB server (no cluster), and similar log messages are generated there (once every couple of minutes on this test server):
Sat Jun 08 02:37:21.551 [conn473632]  authenticate db: admin { authenticate: 1, user: "sysdba", nonce: "dffe4ad163d3b1a1", key: "f6d8cb38dfe802a62a199fe4d8bb4be6" }

Just another clarification: the arbiter took the hit because it was a small machine, but the other nodes also suffered from high CPU usage.

Regarding the C# driver, we are using the latest version (1.8.1). I took a very brief look at the code, and my guess is that the heartbeats were the source of all that server load. Note that we had 10,000 applications, each with a single connection.

Thanks again,
regards,
Khalid
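
To put rough numbers on the heartbeat guess above, here is a back-of-envelope sketch in Python. It assumes (as an illustration, not a measurement) that every process sends one heartbeat to each server every 10 seconds and that heartbeats and writes are spread evenly over time. The function and variable names are made up for this example.

```python
# Back-of-envelope comparison of heartbeat traffic and write traffic.
# The figures come from the thread; the names are illustrative only.

def per_second_rate(clients: int, interval_seconds: float) -> float:
    """Operations per second when each client acts once per interval."""
    return clients / interval_seconds

CLIENTS = 10_000                 # processes, one connection each
HEARTBEAT_INTERVAL = 10          # seconds between driver pings
WRITE_INTERVAL_FAST = 5 * 60     # seconds, most frequent writes
WRITE_INTERVAL_SLOW = 10 * 60    # seconds, least frequent writes

heartbeats_per_sec = per_second_rate(CLIENTS, HEARTBEAT_INTERVAL)
writes_per_sec_max = per_second_rate(CLIENTS, WRITE_INTERVAL_FAST)
writes_per_sec_min = per_second_rate(CLIENTS, WRITE_INTERVAL_SLOW)

print(f"heartbeats/s per server: {heartbeats_per_sec:.0f}")
print(f"writes/s: {writes_per_sec_min:.1f} to {writes_per_sec_max:.1f}")
print(
    "heartbeats per write: "
    f"{heartbeats_per_sec / writes_per_sec_max:.0f} to "
    f"{heartbeats_per_sec / writes_per_sec_min:.0f}"
)
```

Under these assumptions each server sees about 1000 heartbeats per second against roughly 17 to 33 writes per second, so the heartbeats outnumber the writes by a factor of 30 to 60. This says nothing about the cost of a single heartbeat, only about how many there are.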






craiggwilson

Jun 27, 2013, 1:40:59 PM
to mongod...@googlegroups.com
Just to be clear, that is ten thousand separate applications running concurrently?

Khalid Salomão

Jun 27, 2013, 1:53:27 PM
to mongod...@googlegroups.com
Yes! It was a distributed system (doing a kind of ETL processing on lots of data), but note that each process only did a write operation every 5-10 minutes. So the load should be fairly small.
The system's architecture has since changed, but this MongoDB behavior still worries me.

craiggwilson

Jun 27, 2013, 2:17:50 PM
to mongod...@googlegroups.com
Ok then. I'm going with the assumption that these applications are always running and they don't get shut down between uses.

This means that there is a minimum of ten thousand connections. Every connection to the server takes about 1 MB of memory, so ten thousand connections take about 10 GB of memory. Obviously, your quantity of distributed applications imposes a higher server requirement for your arbiter. CSHARP-676 would be an answer to this, in that we would just stop monitoring arbiters at all.

In the meantime, it might be in your best interest to shut down your applications when they aren't in their active state. If that isn't possible, at least disconnect from MongoDB explicitly. I hardly ever recommend anyone do this, but you can call server.Disconnect(). Again, this is generally a bad idea, but it will stop the pings to all the servers when the application isn't active.

Any of that helpful?
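
The memory estimate above can be checked with a few lines of Python. This is only a sketch: the 1 MB per connection figure is the approximation quoted in this post, and the function name is made up for the example.

```python
# Rough memory estimate for open connections, using the ~1 MB per
# connection figure quoted in the post above (an approximation).

def connection_memory_mb(connections: int, mb_per_connection: float = 1.0) -> float:
    """Total memory in MB consumed by the given number of connections."""
    return connections * mb_per_connection

total_mb = connection_memory_mb(10_000)
print(f"{total_mb:.0f} MB, about {total_mb / 1000:.0f} GB ({total_mb / 1024:.2f} GiB)")
# prints: 10000 MB, about 10 GB (9.77 GiB)
```

The same total applies to any server that holds a connection from every process, which is why a small arbiter instance would feel it first.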

Khalid Salomão

Jun 27, 2013, 2:47:27 PM
to mongod...@googlegroups.com
Hi Craig,

Thanks again.

Your suggestion was our first approach: we did explicitly call disconnect.

I did not know about the 1 MB of memory for each connection, but it makes sense. I guess there is a new thread per connection.

I am sorry to keep insisting, but I got the impression that (aside from the high memory pressure of a lot of connections) the heartbeats did have some impact on the servers (and on the size of the log files). Would it be a good idea to add a configuration option to increase the time between heartbeats, or some option like a connection lifetime?

Just an idea.

But thanks very much for your time and insights,
Regards,
Khalid

craiggwilson

Jun 27, 2013, 3:41:40 PM
to mongod...@googlegroups.com
No, please keep giving good thoughts.  Yes, 1 thread per connection.  The heartbeats run isMaster on the server, so they shouldn't be killing your servers.  I would assume it's simply the quantity of connections and the state management that is going on.

I added this ticket a little while ago: https://jira.mongodb.org/browse/CSHARP-719. It is to create a setting that lets you change the 10 seconds to something else.
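
As a rough illustration of what a configurable interval would buy, the sketch below shows how the heartbeat rate per server scales with the interval for ten thousand clients. The 30 and 60 second values are hypothetical examples, not settings the driver is known to offer, and the function name is made up.

```python
# How the per-server heartbeat rate changes with the heartbeat interval.
# 10 s is the interval discussed in this thread; 30 s and 60 s are
# hypothetical values chosen only for illustration.

def heartbeat_rate(clients: int, interval_seconds: float) -> float:
    """Heartbeats per second arriving at one server."""
    return clients / interval_seconds

CLIENTS = 10_000

for interval in (10, 30, 60):
    rate = heartbeat_rate(CLIENTS, interval)
    print(f"{interval:>2} s interval: {rate:.0f} heartbeats/s per server")
```

The rate falls in inverse proportion to the interval, so a longer interval reduces heartbeat traffic but does nothing about the number of open connections.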