Re: mongoDB hangs during bulk inserts and w:"majority"


heinob

Feb 5, 2013, 8:43:25 AM
to mongod...@googlegroups.com

Addendum: Surprisingly the effect vanishes if I add

wtimeout: 3000

although no timeout ever actually occurs. That looks like a mongoDB issue, doesn't it?



On Tuesday, February 5, 2013 at 11:00:30 UTC+1, heinob wrote:

Hi,

I am running a replica set with the following configuration:

green:SECONDARY> cnf = rs.config()
{
    "_id" : "green",
    "version" : 15,
    "members" : [
        {
            "_id" : 0,
            "host" : "green1:27017"
        },
        {
            "_id" : 1,
            "host" : "green2:27017"
        },
        {
            "_id" : 2,
            "host" : "green3:27017"
        }
    ],
    "settings" : {
        "getLastErrorDefaults" : {
            "w" : "majority"
        }
    }
}
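
For reference, the addendum at the top of this message reports that the hang disappears once a wtimeout is set. A minimal sketch of how such a default could be added next to the "w" : "majority" setting above. `withWtimeout` is a hypothetical helper written for plain node.js, not something provided by the mongo shell or the driver, and 3000 ms is simply the value from the addendum.

```javascript
// Hypothetical helper (not part of MongoDB): returns a copy of a replica
// set config whose getLastErrorDefaults also carries a wtimeout in ms.
function withWtimeout(cfg, wtimeoutMs) {
  // Deep copy so the caller's config object is left untouched.
  var next = JSON.parse(JSON.stringify(cfg));
  next.settings = next.settings || {};
  next.settings.getLastErrorDefaults = next.settings.getLastErrorDefaults || {};
  next.settings.getLastErrorDefaults.wtimeout = wtimeoutMs;
  return next;
}

// In the mongo shell on the primary, the equivalent change would be roughly:
//   cnf = rs.config()
//   cnf.settings.getLastErrorDefaults.wtimeout = 3000
//   rs.reconfig(cnf)
```

Whether this only hides the underlying problem is a separate question; the sketch just shows where the setting lives.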

When I do single inserts on the database, everything works fine and the data is replicated perfectly. But when I do "bulk inserts" (meaning many inserts in a short time, about 50 inserts per second), my whole application hangs after 15-300 inserts. And not only my application: the database hangs as well. Running

db.test.find()

in the mongo shell on the primary also hangs (it never returns).

The whole process can only be "healed" by killing the primary (kill -9). If I do that, my application continues running for a while (1-5 sec) until the new primary hangs as well.

Now: if I set

w: 1

everything works fine and I can do bulk inserts indefinitely without any hang. Is this a problem of the driver, mongoDB, or my application? I am using node.js (0.8.14), node-mongodb-native (1.2.11), and mongoDB (2.2.3). Please help! Thanks in advance.
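
As an illustration of where the write concern enters on the client side, here is a minimal sketch of a single insert with an explicit concern and timeout. It assumes a `collection` object exposing `insert(doc, options, callback)` in the style of node-mongodb-native; `insertWithConcern` is a hypothetical wrapper, and the option values are the ones discussed in this thread.

```javascript
// Hypothetical wrapper: insert one document with an explicit write concern.
// `collection` is assumed to expose insert(doc, options, callback).
function insertWithConcern(collection, doc, callback) {
  // Values taken from this thread: majority acknowledgement, 3 s timeout.
  var options = { w: "majority", wtimeout: 3000 };
  collection.insert(doc, options, function (err, result) {
    // With a wtimeout set, a replication stall should come back as an
    // error here rather than as a callback that never fires.
    callback(err, result);
  });
}
```

Passing the options per operation makes it easy to compare w: 1 against w: "majority" without touching the replica set configuration.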

Andrew Emil

Feb 5, 2013, 11:26:58 AM
to mongod...@googlegroups.com
Hi Heinob,

Would it be possible for you to grab the output of mongostat while the inserts are going and post it here? 

It might also be useful if you could provide some more detailed information about the cluster that you are running mongoDB on.  

Another question I have: How long does it hang for?  Is it that it takes a very long time to execute a command, or that it just gets stuck indefinitely?  

Thanks,
Andrew

Jochen Brüggemann

Feb 5, 2013, 11:40:12 AM
to mongod...@googlegroups.com
(mongostat): Yes, I will do tomorrow.
(cluster): What information do you need?
(hang duration): The primary hangs forever. In the meantime I have figured out that the error only occurs if all three of the following conditions are met:
  1. write concern > 1 or "majority"
  2. no wtimeout set
  3. inserts > 30/sec (0-30 per sec work perfectly).
If I insert at more than 30/sec, some random insert between the 20th and the 500th never returns, and the primary is "dead" from then on. No more inserts/finds/updates work on that server in any database.
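
The three conditions above can be written down as a small predicate, which may help when trying different combinations during testing. This is only a sketch: `matchesReportedHang` is a hypothetical name, and the 30/sec threshold is my own observation on this setup, not a documented limit.

```javascript
// Hypothetical predicate encoding the three conditions listed above.
function matchesReportedHang(opts) {
  var w = opts.w;
  // Condition 1: write concern is "majority" or a number greater than 1.
  var replicated = w === "majority" || (typeof w === "number" && w > 1);
  // Condition 2: no wtimeout set (a wtimeout of 0 also means "wait forever").
  var noTimeout = !opts.wtimeout;
  // Condition 3: more than 30 inserts per second (observed, not documented).
  var fast = opts.insertsPerSec > 30;
  return replicated && noTimeout && fast;
}
```

For example, { w: "majority", insertsPerSec: 50 } matches, while the same load with wtimeout: 3000 or with w: 1 does not.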




2013/2/5 Andrew Emil <andre...@10gen.com>


Shane Spencer

Feb 5, 2013, 11:43:25 AM
to mongod...@googlegroups.com
I recently ran into a problem where I had to handle an autoreconnect for MongoClient in my code when doing a bunch of inserts (not bulk). This popped up recently with the current stable release.

Any chance you can test your client code for this and handle it? Doing so resolved my problem, and it seems to be an area MongoDB doesn't want to focus on at this point in time. Which is totally fine; it can be a pita.

- Shane

Jochen Brüggemann

Feb 5, 2013, 12:02:09 PM
to mongodb-user
What exactly do you mean by "handle an auto_reconnect"? I do not think the problem lies within the client, because it is not the client that hangs but the mongoDB primary. Or did I miss something?


2013/2/5 Shane Spencer <sh...@bogomip.com>

Shane Spencer

Feb 5, 2013, 12:14:41 PM
to mongod...@googlegroups.com
Oh sorry.  I have a problem, even with localhost, where the connection drops and makes it appear as though the MongoDB server is hanging on a series of inserts.  Even though I can connect to it using the CLI and issue requests just fine.

If you can't issue commands to the primary any more then yeh.. definitely a server side problem.

heinob

Feb 5, 2013, 12:19:04 PM
to mongod...@googlegroups.com
This is the output of mongostat, which I started a few seconds before the bulk insertion. This time it took 1542 inserts until the server hung. The mongostat output stopped at the same moment!

green@green1:~$ mongostat
connected to: 127.0.0.1
insert  query update delete getmore command flushes mapped  vsize    res faults  locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn set repl       time
     0      0      0      0       0       3       0  39.8g  80.7g  3.57g      0     .:0.0%          0       0|0     0|0   186b     5k    54 green  PRI   18:08:29
     0      0      0      0       0      17       0  39.8g  80.7g  3.57g      0     .:0.0%          0       0|0     0|0     1k     8k    54 green  PRI   18:08:30
     0      0      0      0       0       3       0  39.8g  80.7g  3.57g      0     .:0.0%          0       0|0     0|0   184b     5k    54 green  PRI   18:08:31
    23     48     24      0     190     100       0  39.8g  80.7g  3.57g      0 green_db:4.2%          0       0|0     0|1    28k    54k    54 green  PRI   18:08:32
    85    168     86      0     676     352       0  39.8g  80.7g  3.57g      0 green_db:15.4%          0       0|0     0|0   100k   184k    54 green  PRI   18:08:33
    80    160     82      0     640     325       0  39.8g  80.7g  3.57g      0 green_db:14.0%          0       0|0     0|0    94k   172k    54 green  PRI   18:08:34
    85    170     87      0     680     343       0  39.8g  80.7g  3.56g      0 green_db:17.3%          0       0|0     0|0   100k   182k    54 green  PRI   18:08:35
    42     84     44      0     336     185       0  39.8g  80.7g  3.56g      0 green_db:11.8%          0       0|0     0|0    50k    96k    54 green  PRI   18:08:36
    45     90     47      0     360     183       0  39.8g  80.7g  3.55g      0  green_db:9.7%          0       0|0     0|0    53k    98k    54 green  PRI   18:08:37
    55    110     57      0     440     225       0  39.8g  80.7g  3.55g      0 green_db:12.7%          0       0|0     0|0    65k   120k    54 green  PRI   18:08:38
insert  query update delete getmore command flushes mapped  vsize    res faults       locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn set repl       time
    63    126     65      0     504     267       0  39.8g  80.7g  3.55g      0 green_db:14.0%          0       0|0     0|0    75k   139k    54 green  PRI   18:08:39
    63    126     65      0     504     257       0  39.8g  80.7g  3.54g      0 green_db:12.7%          0       0|0     0|0    74k   136k    54 green  PRI   18:08:40
    75    150     77      0     600     303       0  39.8g  80.7g  3.54g      0 green_db:14.7%          0       0|0     0|0    88k   161k    54 green  PRI   18:08:41
    85    171     88      0     682     359       0  39.8g  80.7g  3.52g      0 green_db:17.3%          0       0|0     0|1   101k   186k    54 green  PRI   18:08:42
    86    171     87      0     686     345       0  39.8g  80.7g  3.52g      0 green_db:15.3%          0       0|0     0|0   101k   183k    54 green  PRI   18:08:43
    85    171     88      0     684     347       0  39.8g  80.7g  3.51g      0 green_db:15.5%          0       0|0     0|1   101k   183k    54 green  PRI   18:08:44
    85    171     87      0     680     356       0  39.8g  80.7g  3.51g      0 green_db:16.5%          0       0|0     0|0   101k   185k    54 green  PRI   18:08:45
    83    164     84      0     660     334       0  39.8g  80.7g  3.42g      1 green_db:15.8%          0       0|0     0|0    98k   177k    54 green  PRI   18:08:46
    86    173     89      0     688     348       0  39.8g  80.7g  3.42g      0 green_db:14.3%          0       0|0     0|0   101k   184k    54 green  PRI   18:08:47
    88    175     89      0     704     368       0  39.8g  80.7g   3.4g      0 green_db:15.8%          0       0|0     0|0   105k   192k    54 green  PRI   18:08:48
insert  query update delete getmore command flushes mapped  vsize    res faults       locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn set repl       time
    84    168     86      0     672     339       0  39.8g  80.7g  3.39g      0 green_db:17.3%          0       0|0     0|0    99k   180k    54 green  PRI   18:08:49
    82    164     84      0     656     333       0  39.8g  80.7g  3.39g      0 green_db:15.2%          0       0|0     0|0    97k   176k    54 green  PRI   18:08:50
    84    168     86      0     672     351       0  39.8g  80.7g  3.39g      0 green_db:14.5%          0       0|0     0|0   100k   183k    54 green  PRI   18:08:51

Here the output stopped and no more inserts were processed. The queries come from client-side referential integrity checks that I do before inserting.
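
In the output above the query column runs at roughly twice the insert column, which fits the integrity checks done before each insert. A rough sketch for totalling those two columns from captured mongostat text; `sumMongostat` is a hypothetical helper and assumes the column order shown above (insert first, query second).

```javascript
// Hypothetical helper: sums the insert and query columns of mongostat text.
// Header lines and anything that does not start with two numbers is skipped.
function sumMongostat(text) {
  var totals = { insert: 0, query: 0, rows: 0 };
  text.split("\n").forEach(function (line) {
    var cols = line.trim().split(/\s+/);
    if (cols.length < 2) return; // blank or too-short line
    var insert = Number(cols[0]);
    var query = Number(cols[1]);
    if (isNaN(insert) || isNaN(query)) return; // header or prompt line
    totals.insert += insert;
    totals.query += query;
    totals.rows += 1;
  });
  return totals;
}
```

Since mongostat reports per-second samples, totals computed this way will not necessarily equal the number of inserts counted by the application.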

Shane Spencer

Feb 5, 2013, 1:37:45 PM
to mongod...@googlegroups.com
And you definitely cannot make any queries after the hang ... from the CLI?



Jochen Brüggemann

Feb 5, 2013, 1:45:34 PM
to mongodb-user

Nope. Nothing!

heinob

Feb 6, 2013, 4:51:21 AM
to mongod...@googlegroups.com
Replica-Set of three separate debian machines. node.js app running on a fourth machine.

All four machines have same configuration:
- debian
- 8 GB RAM
- quad core


On Tuesday, February 5, 2013 at 17:26:58 UTC+1, Andrew Emil wrote:
Message has been deleted

Andrew Emil

Feb 8, 2013, 5:40:59 PM
to mongod...@googlegroups.com
Hello Heinob,

This is quite a mysterious problem to me, and I am very unsure what the cause could be.

Would it be possible for you to post some log output from the periods when the hang occurs?

Could you also post iostat output during these situations? It seems possible that it is some kind of hardware issue (perhaps on one of the secondaries?).

I am hoping that this new information will make it easier to diagnose your problem, as right now I really cannot say what is going on. If it is still unclear after looking at these outputs, then the next step will probably be to increase the log verbosity and try again, hoping that there is something useful in the new log output.

Thanks,
Andrew