mongos crashes in concurrent read/write load

867 views
Skip to first unread message

Yulias Stolin

unread,
Aug 7, 2012, 6:25:53 AM8/7/12
to mongodb-user
I'm using mongodb version 2.0.6. I've configured it with sharding :
3 shards (each replica set master+secondary+arbiter)
1 config sevice
1 mongos

I'm running a jmeter test with 500 concurrent threads that invokes
each some read operation on mongo.
In a batch process in single thread I'm running 10 million upserts.
I see when my collection became about 5 millions the mongos crashes.

This is the log I see in it's .log file

Tue Aug 7 12:49:53 [mongosMain] can't create new thread, closing
connection
Tue Aug 7 12:49:53 [conn143663] DBException in process: could not
initialize cursor across all shards because :
boost::thread_resource_error @
shard2/172.16.40.202:10010,172.16.40.32:10010
Tue Aug 7 12:49:53 [conn143663] DBException in process: could not
initialize cursor across all shards because :
boost::thread_resource_error @
shard2/172.16.40.202:10010,172.16.40.32:10010
Tue Aug 7 12:49:53 [mongosMain] pthread_create failed: errno:11
Resource temporarily unavailable
Tue Aug 7 12:49:53 [mongosMain] can't create new thread, closing
connection
Tue Aug 7 12:49:53 [conn143663] ERROR: Uncaught std::exception:
boost::thread_resource_error, terminating
Tue Aug 7 12:49:53 [conn143663] dbexit: rc:100
Logstream::get called in uninitialized state
Tue Aug 7 12:49:53 [mongosMain] dbexit: rc:0
Received signal 6
Backtrace: 0x54e5b5 0x7fb8a7d46c20 0x7fb8a7d46ba5 0x7fb8a7d4a6b0
0x7fb8a7d3fa71 0x5e9f04 0x7fb8a883d971 0x7fb8a7df9f3d
/opt/mongodb-linux-x86_64-2.0.6/bin/
mongos(_ZN5mongo17printStackAndExitEi+0x75)[0x54e5b5]
/lib/libc.so.6(+0x33c20)[0x7fb8a7d46c20]
/lib/libc.so.6(gsignal+0x35)[0x7fb8a7d46ba5]

What does it mean?
BR,
Yulia Stolin

Jenna

unread,
Aug 7, 2012, 11:00:51 AM8/7/12
to mongodb-user
Hello Yulia,
I believe this error message is thrown when the operating system
reaches a limit on the number of connections it can accept.  There is
a ticket in JIRA related to handling this error:
https://jira.mongodb.org/browse/SERVER-4978

Do you think this is a possible explanation? Can you post your current
ulimit (ulimit -n to get the current setting). It's possible to change
the default number of allowed connections if you haven't already:
http://www.mongodb.org/display/DOCS/Too+Many+Open+Files

I would also like to point out that if you have a single config
server, I would strongly recommend adding two additional config
servers.  These servers store information about where your data exists
in a sharded cluster, and as such, it's very important to ensure that
this data is highly durable.

Yulias Stolin

unread,
Aug 8, 2012, 7:08:38 AM8/8/12
to mongod...@googlegroups.com
Hello,

I'm also think that this is a connection issue, but I do not understand why it happens?
I'm running mongo2.0.6 on ubuntu 
>  ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 131072
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
I'm using mongo driver  2.8.0

The test I've run was in jmetter with 500 concurrent readers. In addition I have one single thread writer who's loading the 10M documents.
This is only test environment, thus I've configured 1 config server, I know that I need to run with 3 (I wanted to simplify the test meanwhile)

I do not understand what is happening with the connections? maybe there is some bug in mongos/mongod/java driver.

This is another snapshot of crash log.

Tue Aug  7 16:14:40 [mongosMain] can't create new thread, closing connection
Tue Aug  7 16:14:40 [conn723865] Socket recv() errno:104 Connection reset by peer 172.16.40.32:10020
Tue Aug  7 16:14:40 [conn723865] SocketException: remote: 172.16.40.32:10020 error: 9001 socket exception [1] server [172.16.40.32:10020
Tue Aug  7 16:14:40 [conn723865] DBClientCursor::init lazy say() failed
Tue Aug  7 16:14:40 [conn723865] DBClientCursor::init message from say() was empty
Tue Aug  7 16:14:40 [conn723865] slave no longer has secondary status: 172.16.40.32:10020
Tue Aug  7 16:14:40 [conn723865] warning: invalid result from shard3/172.16.40.202:10020,172.16.40.32:10020, retrying
Tue Aug  7 16:14:40 [conn723865] retrying parallel connection to shard3/172.16.40.202:10020,172.16.40.32:10020, 0 finished queries.
Tue Aug  7 16:14:40 [conn723865] DBException in process: could not initialize cursor across all shards because : boost::thread_resource_error @ shard3/172.16.40.202:10020,172.16.40.32:10020
Tue Aug  7 16:14:40 [conn723863] Socket recv() errno:104 Connection reset by peer 172.16.40.32:10020
Tue Aug  7 16:14:40 [conn723863] SocketException: remote: 172.16.40.32:10020 error: 9001 socket exception [1] server [172.16.40.32:10020
Tue Aug  7 16:14:40 [conn723863] DBClientCursor::init lazy say() failed
Tue Aug  7 16:14:40 [conn723863] DBClientCursor::init message from say() was empty
Tue Aug  7 16:14:40 [conn723863] slave no longer has secondary status: 172.16.40.32:10020
Tue Aug  7 16:14:40 [conn723863] warning: invalid result from shard3/172.16.40.202:10020,172.16.40.32:10020, retrying
Tue Aug  7 16:14:40 [conn723863] retrying parallel connection to shard3/172.16.40.202:10020,172.16.40.32:10020, 0 finished queries.
Tue Aug  7 16:14:40 [mongosMain] pthread_create failed: errno:11 Resource temporarily unavailable
Tue Aug  7 16:14:40 [mongosMain] can't create new thread, closing connection
Tue Aug  7 16:14:40 [conn724131] ERROR: Uncaught std::exception: boost::thread_resource_error, terminating
Tue Aug  7 16:14:40 [conn724131] dbexit:  rc:100 
Logstream::get called in uninitialized state
Tue Aug  7 16:14:40 [conn724138]   Assertion failure ! inShutdown() client/connpool.cpp 136
Logstream::get called in uninitialized state
Logstream::get called in uninitialized state
Logstream::get called in uninitialized stateTue Aug  7 16:14:40 [conn724144]   Assertion failure ! inShutdown() client/connpool.cpp 136

Logstream::get called in uninitialized state0x52b5f6 0x53613b 0x5628d8 0x562a4b 0x5c55aa 0x5c4030 0x5c4306 0x5f0c58 0x770329 0x7b6467 0x7c89c1 0x5e9747 0x7f480a270971 0x7f480982cf3d 
Logstream::get called in uninitialized state
Logstream::get called in uninitialized state
Tue Aug  7 16:14:40 [conn724140]   Assertion failure ! inShutdown() client/connpool.cpp 136
Logstream::get called in uninitialized state
Logstream::get called in uninitialized state
Tue Aug  7 16:14:40 [conn724142]   Assertion failure ! inShutdown() client/connpool.cpp 136
0x52b5f6 0x53613b 0x5628d8 0x562a4b 0x5c55aa 0x5c4030 0x5c4306 0x5f0c58 0x770329 0x7b6467 0x7c89c1 0x5e9747 0x7f480a270971 0x7f480982cf3d 
Logstream::get called in uninitialized state
Logstream::get called in uninitialized state0x52b5f6
Tue Aug  7 16:14:40 [conn724065] Socket recv() errno:104 Connection reset by peer 172.16.40.32:10020
Tue Aug  7 16:14:40 [conn724148]   Assertion failure ! inShutdown() client/connpool.cpp 136
Logstream::get called in uninitialized state
0x52b5f6 0x53613b 0x5628d8 0x562a4b 0x5c55aa 0x5c4030 0x5c4595 0x769507 0x777aac 0x7b6467 0x7c89c1 0x5e9747 0x7f480a270971 0x7f480982cf3d 
Tue Aug  7 16:14:40 [conn724065] SocketException: remote: 172.16.40.32:10020 error: 9001 socket exception [1] server [172.16.40.32:10020
Logstream::get called in uninitialized stateLogstream::get called in uninitialized state
 
0x53613bLogstream::get called in uninitialized state
 0x5628d8 0x562a4b 0x5c55aa 0x5c4030 0x5c4306 0x5f0c58 0x770329 0x7b6467 0x7c89c1 0x5e9747 0x7f480a270971 0x7f480982cf3d 
0x52b5f6 0x53613b 0x5628d8 0x562a4b 0x5c55aa 0x5c4030Tue Aug  7 16:14:40 [conn724149]   Assertion failure ! inShutdown() client/connpool.cpp 136
Logstream::get called in uninitialized state
Tue Aug  7 16:14:40 [conn724139]   Assertion failure ! inShutdown() client/connpool.cpp 136
Tue Aug  7 16:14:40 [conn724146]   Assertion failure ! inShutdown() client/connpool.cpp 136
Tue Aug  7 16:14:40 [conn724143]   Assertion failure ! inShutdown() client/connpool.cpp 136
0x52b5f6 0x53613b 0x5628d8 0x562a4b 0x5c55aa 0x5c4030 0x5c4306 0x5f0c58 0x770329 0x7b6467 0x7c89c1 0x5e9747 0x7f480a270971 0x7f480982cf3d 
0x52b5f6 0x53613b 0x5628d8 0x562a4b 0x5c55aa 0x5c4030 0x5c4306 0x5f0c58 0x770329 0x7b6467 0x7c89c1 0x5e9747 0x7f480a270971 0x7f480982cf3d 
0x52b5f6 0x53613b 0x5628d8 0x562a4b 0x5c55aa 0x5c4030 0x5c4306 0x5f0c58 0x770329 0x7b6467 0x7c89c1 0x5e9747 0x7f480a270971 0x7f480982cf3d 
 Tue Aug  7 16:14:40 [conn724065] DBClientCursor::init call() failed
0x52b5f6 0x53613b 0x5628d8 0x562a4b 0x5c55aa 0x5c4030 0x5c4306 0x5f0c58 0x770329 0x7b6467 0x7c89c1 0x5e9747 0x7f480a270971 0x7f480982cf3d 
Logstream::get called in uninitialized state
Tue Aug  7 16:14:40 [conn724065] DBException in process: could not initialize cursor across all shards because : DBClientBase::findN: transport error: 172.16.40.32:10020 query: { setShardVersion: "", init: true, configdb: "172.16.40.31:20000", serverID: ObjectId('5020fe23566de242099f4c3e'), authoritative: true } @ shard1/172.16.40.31:10000,172.16.40.32:10000

ue Aug  7 16:14:40 [conn724141]   Assertion failure ! inShutdown() client/connpool.cpp 136
Received signal 6
Backtrace: 0x54e5b5 0x7f4809779c20 0x7f4809779ba5 0x7f480977d6b0 0x7f4809772a71 0x5e9f04 0x7f480a270971 0x7f480982cf3d 
0x5c4306 0x5f0c58 0x770329 0x7b6467 0x7c89c1 0x5e9747 0x7f480a270971 0x7f480982cf3d 
0x52b5f6 0x53613b 0x5628d8 0x562a4b 0x5c55aa 0x5c4030 0x5c4595 0x769507 0x777aac 0x7b6467 0x7c89c1 0x5e9747 0x7f480a270971 0x7f480982cf3d 
Tue Aug  7 16:14:40 [conn724147]   Assertion failure ! inShutdown() client/connpool.cpp 136
Tue Aug  7 16:14:40 [conn724088]   Assertion failure ! inShutdown() client/connpool.cpp 136
Received signal 11
Backtrace: Tue Aug  7 16:14:40 [conn724084]   Assertion failure ! inShutdown() client/connpool.cpp 136
0x52b5f6 0x53613b 0x5628d8 0x562a4b 0x5c55aa 0x5c4030 0x5c4306 0x5f0c58 0x770329 0x7b6467 0x7c89c1 0x5e9747 0x7f480a270971 0x7f480982cf3d 
0x54e5b5 0x7f4809779c20 0x500942 0x50b854 0x50c544 0x52b5f6 0x739df8 0x76f9f5 0x7b6467 0x7c89c1 0x5e9747 0x7f480a270971 0x7f480982cf3d 
0x52b5f6 0x53613b 0x5628d8 0x562a4b 0x5c55aa 0x5c4030 0x5c4306 0x5f0c58 0x770329 0x7b6467 0x7c89c1 0x5e9747 0x7f480a270971 0x7f480982cf3d 
Logstream::get called in uninitialized state
Tue Aug  7 16:14:40 [mongosMain] dbexit:  rc:0 
Received signal 11
Backtrace: 0x54e5b5 0x7f4809779c20 0x7f480977f445 0x7c2419 0x5e9c1a 0x7f480a270971 0x7f480982cf3d 
Received signal 11
Backtrace: 0x54e5b5 0x7f4809779c20 0x500942 0x50b854 0x50c544 0x50d290 0x739df8 0x76f9f5 0x7b6467 0x7c89c1 0x5e9747 0x7f480a270971 0x7f480982cf3d 
Logstream::get called in uninitialized state
Logstream::get called in uninitialized state
Tue Aug  7 16:14:40 [conn723630] sharded connection to shard3/172.16.40.202:10020,172.16.40.32:10020 not being returned to the pool
Tue Aug  7 16:14:40 [conn723693] ERROR: Uncaught std::exception: call to empty boost::function, terminating
Logstream::get called in uninitialized state
Tue Aug  7 16:14:40 [conn723693] dbexit:  rc:100 
Logstream::get called in uninitialized state
Tue Aug  7 16:14:40 [conn723630] ERROR: Uncaught std::exception: call to empty boost::function, terminating
Logstream::get called in uninitialized state
Tue Aug  7 16:14:40 [conn723630] dbexit:  rc:100 
 /opt/mongodb-linux-x86_64-2.0.6/bin/mongos(_ZN5mongo12sayDbContextEPKc+0x96) [0x52b5f6]
 /opt/mongodb-linux-x86_64-2.0.6/bin/mongos(_ZN5mongo8assertedEPKcS1_j+0xfb) [0x53613b]
 /opt/mongodb-linux-x86_64-2.0.6/bin/mongos(_ZN5mongo16DBConnectionPool4_getERKSsd+0x118) [0x5628d8]
 /opt/mongodb-linux-x86_64-2.0.6/bin/mongos(_ZN5mongo16DBConnectionPool3getERKSsd+0x3b) [0x562a4b]
 /opt/mongodb-linux-x86_64-2.0.6/bin/mongos(_ZN5mongo17ClientConnections13checkVersionsERKSs+0x1ba) [0x5c55aa]
 /opt/mongodb-linux-x86_64-2.0.6/bin/mongos(_ZN5mongo15ShardConnection5_initEv+0x2a0) [0x5c4030]
 /opt/mongodb-linux-x86_64-2.0.6/bin/mongos(_ZN5mongo15ShardConnectionC1ERKSsS2_+0x76) [0x5c4306]
 /opt/mongodb-linux-x86_64-2.0.6/bin/mongos(_ZN5mongo27ParallelSortClusteredCursor5_initEv+0x818) [0x5f0c58]
 /opt/mongodb-linux-x86_64-2.0.6/bin/mongos(_ZN5mongo13ShardStrategy7queryOpERNS_7RequestE+0xbc9) [0x770329]
 /opt/mongodb-linux-x86_64-2.0.6/bin/mongos(_ZN5mongo7Request7processEi+0x187) [0x7b6467]
 /opt/mongodb-linux-x86_64-2.0.6/bin/mongos(_ZN5mongo21ShardedMessageHandler7processERNS_7MessageEPNS_21AbstractMessagingPortEPNS_9LastErrorE+0x71) [0x7c89c1]
 /opt/mongodb-linux-x86_64-2.0.6/bin/mongos(_ZN5mongo3pms9threadRunEPNS_13MessagingPortE+0x287) [0x5e9747]
 /lib/libpthread.so.0(+0x7971) [0x7f480a270971]
 /lib/libc.so.6(clone+0x6d) [0x7f480982cf3d]

William Z

unread,
Aug 23, 2012, 2:29:05 PM8/23/12
to mongod...@googlegroups.com

Hi Yulias!

Your log file contains the error message:


> Tue Aug  7 16:14:40 [mongosMain] pthread_create failed: errno:11 Resource temporarily unavailable

This indicates that your OS was unable to create a new thread within the 'mongos' process.  Each incoming connection to 'mongos' requires a new thread, and the operating system is returning the EAGAIN error.

This means that this is an OS configuration problem.

Looking at the documentation for pthread_create(), I note that the OS can return EAGAIN from pthread_create() if there aren't enough resources for the individual process, or if there aren't enough resources system-wide.

(Ref: http://www.kernel.org/doc/man-pages/online/pages/man3/pthread_create.3.html)

The output from ulimit -a indicates that you have no limit on the per-user number of processes.  You can check the number of system-wide processes by looking in the file /proc/sys/kernel/threads-max .  You can raise this limit temporarily using the 'sysctl' command: you can change it permanently by editing /etc/sysctl.conf and setting the 'kernel.max-threads' value.

Let me know if this resolves your problem or if you have further questions.  Have a great day!

 -William
Reply all
Reply to author
Forward
0 new messages